From Reproducible Research to Open Science Dissemination: A Computing Platform-Centric Approach
- Room: 80/1-001 - Globe of Science and Innovation - 1st Floor
- Speaker:
- Paul Meijer, Head of Software Engineering, Allen Institute for Immunology (https://alleninstitute.org/division/immunology/). Having moved into software engineering from a research background, Paul Meijer brings a distinct perspective to software engineering for scientific research. His PhD research in Cognitive Psychology at the Max Planck Institute for Psycholinguistics and foundational research at the University of California, Santa Cruz, gave him a deep appreciation of the critical need for scientific analysis provenance and governance. At the Allen Institute for Immunology, he leads the Human Immune System Explorer (HISE), a comprehensive scientific computing platform with built-in provenance tracking and reproducibility features. HISE is designed to manage, analyze, and share the vast amounts of data generated in modern immunology research, fostering collaboration and accelerating scientific understanding of the human immune system in health and disease.
Transparency and reproducibility requirements in computationally intensive research call for solutions that integrate rigorous research practice with open science dissemination. This demo presents a scientific computing platform for big-data life science research that treats the open sharing of reproducible findings as a natural and efficient extension of the research process itself. By embedding a computational reproducibility framework directly in the platform, researchers capture a complete trace of their analysis, including data, methods, and executable tools, as the investigation unfolds. This trace allows results to be verified and re-executed while the study is still under way. It also supplies the components needed for transparent open science publication: releasing the reproducible trace gives readers access to all relevant data and lets them re-run analysis steps in a compute environment that mirrors the original infrastructure. The same provenance record offers a mechanism for contextual data governance, moving beyond traditional IT-centric metrics to policies informed by the actual use and significance of specific data sets and tools. Organizations can then set precise policies for data archival and tool discontinuation, which in turn supports the platform's long-term sustainability and its continued contribution to scientific discovery.
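As a rough illustration of what capturing a trace of data, methods, and tools could look like, the following Python sketch records each analysis step with content hashes of its inputs and output plus a small environment fingerprint, and then re-executes a step to verify it. This is not how HISE is implemented; the names `Trace`, `TraceStep`, `digest`, and `normalize`, and the sample record, are hypothetical and chosen only for this example.

```python
import hashlib
import json
import platform
from dataclasses import dataclass, field, asdict


def digest(payload: bytes) -> str:
    """Content hash used to identify a piece of data in the trace."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class TraceStep:
    # One recorded analysis step (hypothetical schema for this sketch).
    name: str
    input_digests: list
    output_digest: str
    environment: dict


@dataclass
class Trace:
    steps: list = field(default_factory=list)

    def run(self, name, func, *inputs: bytes) -> bytes:
        """Execute a step and record its inputs, output, and environment."""
        output = func(*inputs)
        self.steps.append(TraceStep(
            name=name,
            input_digests=[digest(i) for i in inputs],
            output_digest=digest(output),
            environment={"python": platform.python_version()},
        ))
        return output

    def verify(self, index, func, *inputs: bytes) -> bool:
        """Re-run a recorded step and check it reproduces the stored output."""
        step = self.steps[index]
        same_inputs = [digest(i) for i in inputs] == step.input_digests
        return same_inputs and digest(func(*inputs)) == step.output_digest

    def to_json(self) -> str:
        """Serialize the trace so it could be released alongside findings."""
        return json.dumps([asdict(s) for s in self.steps], indent=2)


def normalize(raw: bytes) -> bytes:
    # Hypothetical analysis step: trim whitespace and upper-case the record.
    return raw.strip().upper()


trace = Trace()
raw = b"  sample-001,cd4,0.42\n"  # made-up input record
cleaned = trace.run("normalize", normalize, raw)
print(trace.to_json())
print(trace.verify(0, normalize, raw))
```

Hashing inputs and outputs, rather than storing file names, means a re-execution is checked against the exact data that was used. A real platform would also need to pin the tool versions and the compute environment, which this sketch reduces to a single Python version string.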