Curating Weather Simulation Data. EarthCube Workshop in North Dakota.

“Simulation outputs are important but that does not mean we save them forever” – Gretchen Mullendore

This week I have been attending a workshop on data curation (a key part of open science), specifically on developing guidelines for the data produced by weather and climate simulations. Open science is better science! But a blanket “you must save and provide all data” is not only onerous (especially for underserved institutions) but also not what is needed for reproducibility and reusability.

So many great minds focused on open science.

First, this post reflects my own thoughts and does not necessarily reflect the views of the attendees and organizers. There will be a report. A lot has been written about measurements, and measurements cannot be recreated. Model data, to a degree, can be regenerated. If workflows are shared, those with the appropriate resources can rerun the models using the provided initialization and configuration data. Furthermore, sharing workflows allows others to explore how robust the conclusions are to the assumptions made (sensitivity) and to reuse the workflow to address new science questions.

Gretchen kicking off the meeting

I really enjoyed the discussions and applaud the team’s focus on designing rubrics, as this brings the conversation up a level and enables clear measurement of how well solutions work. It was also great seeing a huge diversity in the career stages and “flavors” of participants. We had data creators, curators, representatives from three publishers (AGU, AMS and PLOS), data scientists and more!

Susan from the University of Michigan on data curation.

Also, fittingly, there were lots of discussions around equity. Open science is better science. Journals are increasingly requiring data to be made available (even FAIR), which can create a burden for institutions without the physical infrastructure and/or workforce to meet these requirements. There have been discussions of carving out exceptions for underserved communities. My perception is that the community here at the workshop pushed back hard against that idea because, as mentioned, open science is better science. Rather, we need to equip those institutions to meet the open science requirements.

There were lots of discussions on just how much data should be made available for work to count as open, and how long it should be curated for. Again, the focus was on designing rubrics to guide the process. The focus should be on the goal, with enough flexibility to help scientists achieve open science and reproducibility while also allowing society-driven journals to meet the aspirations of their members.

A nice atmosphere and a nice atmosphere!

It was great to be back in Grand Forks. The University of North Dakota is a great institution that, in atmospheric science, punches way above its weight. Two of our three most recent hires had a background at UND, and I very much enjoy my collaborations with the team there. It was also very nice to be there during a dry, cool air outbreak in summer rather than a frigid cold air outbreak in October!

SciPy Thoughts

Subtitle: too busy to blog. I have just about finished my time here at SciPy and I am both tired and energized. My excitement has not diminished since my first SciPy back in 2012. It was great to meet new people and to re-meet people I had lost contact with for various reasons, many of them pandemic related.

Good to be back in the ballroom!

My number one takeaway from SciPy is how much better organized the community is, and how, more so than any government program I have worked with, it pulls in the same direction and works in concert across many projects. The impact of organizations like the Chan Zuckerberg Initiative is clear, as is the orchestrating role of NumFocus. Also worth watching is the new Scientific Python organization, which is aimed at sustainable growth and enhancement of the ecosystem.

Queso!!!

The increasingly common language of enhancement proposals (PEPs, SPECs, ZEPs, etc.) and common governance structures is extremely pleasing, and what just blows my mind is how this is completely self-organized without any kind of edict from above.

The Scientific Python ecosystem is just that, an evolving ecosystem! It is so pleasing watching it evolve onto a sustainable track. As Ben Blaiszik said during his keynote, this software is fundamental science infrastructure. While it (very much) needs more financial support from the agencies whose science it supports (side eye at DOE), it is now in a place where any funds the ecosystem receives will be used for the good of science.

On a technical note, some great things I took home were: exciting new 3D visualization tools; Pangeo Forge forging ahead; cool ways to access HRRR from AWS as an Xarray-friendly Zarr store; James Webb Space Telescope processing running on SciPy; new ways to manage conda environments for teams; and more.

BBQ and storms!

On a professional note, my greatest enjoyment came from seeing how much my team enjoyed the meeting; three of them were at their first in-person SciPy. Joe, Max and Bhupendra seemed to completely immerse themselves in the meeting and made new connections. It was also fantastic seeing our ARM collaborators from Brookhaven Lab, Die Wang and Sid Gupta, there. This turned into a mini science meeting as well, with new connections made and new work planned. It is also a sign that open science is growing in the programs I love.

On a personal note, it was fun and a little interesting being in Austin during the pandemic. The city’s homeless problem has gotten worse, many businesses are struggling with hiring, and some old haunts have gone out of business. I really enjoyed taking advantage of the scooter scheme, clocking up 25 miles of low-carbon transport.

Great seeing our DOE EESSD-funded open science family grow at SciPy.

The news today that NumFocus is taking over from Enthought as the organizing entity for SciPy is great. Enthought has been spectacular and so supportive, but having a genuine not-for-profit in charge will help in many ways. It also opens up the possibility of SciPy moving away from Austin. I am genuinely on the fence about this. Whatever the case, I hope NumFocus takes a good look at WHY we have these meetings and comes up with some guiding principles: define what the meeting is trying to achieve, a north star to guide decisions. Then they and the chairs, committee, etc. can keep coming back to those principles and be forced to justify decisions against them. I am excited for the future, be it in Austin or elsewhere (note: the contract for Austin in 2023 is signed; this does not mean it has to be in Austin, but it does mean there is a cost to not having it there).

I’ll finish this blog post by asserting that I need to become more engaged in the community. I need to write folks like NumFocus, Quansight, 2i2c et al. into grant proposals as collaborators: not only are they better positioned to implement the workflows I want, but funding them will give back to the tools I love to use. I also need to make more time to contribute code and to continue supporting my team in contributing to free, open, community software, which is critical international science infrastructure.

SciPy 2022. Kid In a Candy Store.

Short update! I am SciPy bound. It is my first in-person conference since, well, the world stopped. The pandemic is by no means over and there is some controversy (which I will not go into, but you can GTS yourself), but that has not dampened my excitement.

My first SciPy. Red pill all the way.

One super exciting thing is that three members of my team, Bhupendra, Max and Joe, are heading to their first SciPy. I remember my first SciPy. It was like the scene from The Matrix where I took the red pill and my world changed forever. I have been in “science” for two decades plus and I have never found a community like the Scientific Python community. The smartest and kindest people I have ever met. Genuine and passionate.

Great day for traveling.

I am excited to re-meet many people I have met before (please, please forgive my memory for names; the pandemic has frazzled my already meagre skills in that area), meet new people and just learn a lot! I clearly remember attending my first SciPy in 2012, about to give a talk and wondering what to use to format code (at SciPy they show a LOT of code; it is amazing). And I heard about this cool tool called an IPython Notebook. Yeah, before Jupyter.

And that is the amazing thing about SciPy. You are, as Hamilton would say, in the room where it happens. At the very least you are in the hallway outside the room, the first to know about what happened and to use the tools of said happening. Bring It On!