Seeking input: Challenges to sustainability of open source research data tools

We are working on a research project with the goal of identifying systemic challenges to the sustainability of data driven tooling in science and scholarship - and we want your input [now closed].

Data intensive research increasingly depends on open source (OS) software and data tools. These tools meet the needs of data driven researchers across fields better than commercial offerings and are often led by researchers with a deep understanding of scientific domains. Open communities build and maintain these tools, and this work is often funded by grants and donations (Mozilla 2018; Eghbal 2016). While the scientific community’s usage of and participation in OS expands, the broader open source software community is experiencing a sustainability crisis (Eghbal 2016; see also the recent GitHub survey).

We are specifically interested in hearing from folks who work with and/or contribute to OS data tools for research and data science. For this purpose, OS data tools are defined as open source projects that facilitate any part of your data workflow, including data collection, analysis, visualization, sharing, reuse, publication, and/or collaborative data projects. Sustainability is defined as a project's long term capacity to operate stably.

Looking ahead, maintaining an innovative, independent data and research tools ecosystem is key to scientific advancement. Without coordinated effort, the open research tooling community will miss an opportunity to grow and become self sustaining. Revamping models for funding and sustaining open source projects that serve scientific and data driven research communities is timely, given the broader conversations on open source sustainability and cooperative movement within the open research space itself. The Open Source Alliance for Open Scholarship, the US Software Sustainability Institute conceptualization project, Joint Roadmap for Open Science Tools, and NumFOCUS' recent summit and sustainability workshop highlight the conversations on sustainability happening in science, scholarship, and data.

The research community is unique in many ways (structure, economies, participants), and it faces unique challenges (and opportunities) around the sustainability of software tools. Open source projects rooted in the research community (Jupyter, Dat, RStudio) have grown into widely used tools across industries. Balancing the needs of a project’s founding community with the demands of growth is a challenge for any open software project. Doing this on a limited budget, while attending to research priorities, and without fundraising, business development, or other core operational expertise on staff adds to the challenge.

Through work with our sponsored projects, Code for Science & Society (CS&S) is developing operational and management capacity in our projects and project staff to support them as they grow into sustainable entities. We do this by identifying common needs (i.e., problems faced by multiple projects), filling gaps and upskilling our community in management and operations, building shared solutions to systemic problems, and collaborating with organizations like NumFOCUS. As we continue to work with sponsored projects, we want to better understand the sustainability challenges facing open research tools so that we can help meet these challenges in the broader community.

What's Your Perspective?

Through interviews and research, we aim to develop a deeper understanding of the sector’s strengths and identify areas of need.

Do you build, maintain, and/or use open source data tools? You might work as a contractor, at an academic institution, or at a big corporation. We want to hear from you. If you have five minutes, please take this survey [closed]. If you have 15 minutes, we'd love to talk about your experiences in open source research and data-centric projects. Reach out to @codeforsociety or email us at [email protected].

Funding

This work is funded by a Gordon and Betty Moore Foundation grant to Code for Science & Society.

The Fine Print

The data generated by this survey will be used by Code for Science & Society and NumFOCUS. You may take this survey anonymously, and no questions are required. This survey will take about 5 minutes. Your responses will help Code for Science & Society and NumFOCUS develop programs and resources to better support OS data tool sustainability. Summary data (from multiple choice and check box questions) will be shared openly. Anonymous quotes (no names or project affiliations) may be included in a report, which will be shared openly. If you have additional questions about the survey, email [email protected].

References

Eghbal, Nadia. 2016. “Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure.” https://www.fordfoundation.org/media/2976/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure.pdf.

Mozilla. 2018. “Open Source Archetypes: A Framework for Purposeful Open Source.” https://blog.mozilla.org/wp-content/uploads/2018/05/MZOTS_OS_Archetypes_report_ext_scr.pdf.