I would like to acknowledge past and current colleagues who have enriched my knowledge of statistical software engineering.
Pictured: Alma Mater, University of Zürich (2023)
I am working on early drug development software that helps study
statisticians and early development teams evaluate the probability of
success within the Bayesian framework. The context of this work is the
oncology therapeutic area. This work will be presented in various
talks in 2024. I am passionate about good engineering practices and am
bringing this package, written in R, to the state of the art. I was
asked to share the checklist from my useR! 2024 presentation
“Software Engineering: A Statistician’s Journey”, and it will be
published here soon. Here is the open-source repository of phase1b,
of which I am the lead developer. What I enjoy about statistical
software development is creating mathematically elegant solutions that
are fit for purpose and computationally robust. It ties together my
interests in probability theory and computer science, along with my
personal value of creating accessible infrastructure.
Statistical software is a computing facility that provides a practical solution to statistical analyses. It can be written in one or more computing languages such as R, C++, Python, Julia or others. The choice of language usually depends on the purpose of the statistical analyses, the target user and the required performance.
Statistical software has existed for decades. Good software produces results efficiently and reproducibly; both qualities serve the scientific question and good scientific practice.
Mary Shaw (2002): “Acceptance of their results relies on the process of obtaining the results as well as analysis of the results themselves.”
Good software attends to the principles of both good scientific practice and good engineering practice. This requires rigorous testing and reviewing of code, writing clear and clean code, and addressing the precise statistical question.
The rationale for good statistical engineering practices is, first and foremost, to deliver a fit-for-purpose and robust facility that produces reproducible results. Clean, readable code that is properly formatted and styled makes reviewing and reading more efficient, and opens more doors for collaboration. These practices contribute to the ultimate goal: the best product available for a statistical analysis need.
Statistical software that has been widely accepted is generally published on CRAN (The Comprehensive R Archive Network); see the picture below. Currently, there are over 20’000 packages on CRAN.
CRAN is a central hub for information about R installation, R packages and documentation. “CRAN operations, most importantly hosting, checking, distributing, and archiving of R add-on packages for various platforms, crucially rely on technical, emotional, and financial support by the R community.” See more here.
References:
Got feedback? Email me here.