“Simulation outputs are important but that does not mean we save them forever” – Gretchen Mullendore
This week I have been attending a workshop on data curation (a key part of open science), specifically on developing guidelines for the data produced by weather and climate simulations. Open science is better science! But a blanket “you must save and provide all data” is not only onerous (especially for underserved institutions) but also not what is needed for reproducibility and reusability.
First, this post reflects my own thoughts and does not necessarily reflect the views of the attendees and organizers. There will be a report. A lot has been written about measurements, and measurements cannot be recreated. Model data, to a degree, can be regenerated. By sharing workflows, those with the appropriate resources can run the models on the provided initialization and configuration data. Furthermore, sharing workflows allows others to explore how robust the conclusions are to assumptions (sensitivity) and to reuse the workflow to address new science questions.
I really enjoyed the discussions and applaud the team’s focus on designing rubrics, as it brings the conversation up a level and enables clear measurement of the efficacy of solutions. It was also great to see a huge diversity in the career stages and “flavors” of participants. We had data creators, curators, representatives from three publishers (AGU, AMS and PLOS), data scientists and more!
Also, fittingly, there were lots of discussions around equity. Journals are increasingly requiring data to be made available (even FAIR), which can create a burden for institutions without the physical infrastructure and/or workforce to meet these requirements. There have been discussions of carving out exceptions for underserved communities. My perception is that the community here at the workshop pushed back hard against that idea because, as mentioned above, open science is better science. Rather, we need to equip those institutions to meet the open science requirements.
There were also lots of discussions on just how much data should be required to be made available in order to count as open, and how long it should be curated for. Again, the focus was on designing rubrics to guide the process. The focus should be on the goal, and the guidelines should be flexible enough to aid the scientist in achieving open science and reproducibility while also allowing the society-driven journals to meet the aspirations of their members.
It was great to be back in Grand Forks. The University of North Dakota is a great institution that, in the atmospheric sciences, punches way above its weight. Two of our three most recent hires had a background at UND, and I very much enjoy my collaborations with the team there. It was also very nice to be there during a dry, cool air outbreak in summer rather than a frigid cold air outbreak in October!