I would like to acknowledge the past and current colleagues who have enriched my knowledge of statistical software engineering.


Pictured: my alma mater, the University of Zürich (2023).

What am I currently working on?

I am developing software for early drug development that helps study statisticians and early development teams evaluate the probability of success within the Bayesian framework. The work is set in the oncology therapeutic area and will be presented in several talks in 2024. I am passionate about good engineering practices and am bringing this package, written in R, up to the state of the art. Following requests at my useR!2024 presentation “Software Engineering: A Statistician’s Journey”, I will publish my checklist here soon. Here is the open-source repository of phase1b, of which I am the lead developer.

What I enjoy about statistical software development is creating mathematically elegant solutions that are fit for purpose and computationally robust. It ties together my interests in probability theory and computer science, along with my personal value of building accessible infrastructure.
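To make “probability of success within the Bayesian framework” a little more concrete, here is a minimal sketch of one common quantity in early oncology trials: the posterior probability that the true response rate exceeds a reference rate p0. It assumes a binary response endpoint, a conjugate Beta prior with integer parameters, and a uniform Beta(1, 1) default prior. It is written in Python for illustration only; it is not the phase1b API, and the function names are hypothetical.

```python
from math import comb


def beta_tail_prob(a: int, b: int, threshold: float) -> float:
    """P(p > threshold) for p ~ Beta(a, b), with a and b positive integers.

    Uses the identity P(Beta(a, b) > t) = P(Binomial(a + b - 1, t) <= a - 1),
    which holds for integer shape parameters, so no special-function
    library is needed.
    """
    if a < 1 or b < 1:
        raise ValueError("shape parameters must be positive integers")
    n = a + b - 1
    return sum(
        comb(n, k) * threshold**k * (1 - threshold) ** (n - k) for k in range(a)
    )


def posterior_prob_success(
    responders: int, patients: int, p0: float, prior_a: int = 1, prior_b: int = 1
) -> float:
    """Posterior probability that the response rate exceeds p0.

    Hypothetical helper: Beta(prior_a, prior_b) prior, binomial likelihood,
    hence a Beta(prior_a + responders, prior_b + non-responders) posterior.
    """
    if not 0 <= responders <= patients:
        raise ValueError("need 0 <= responders <= patients")
    post_a = prior_a + responders
    post_b = prior_b + patients - responders
    return beta_tail_prob(post_a, post_b, p0)


if __name__ == "__main__":
    # Illustrative numbers only: 6 responders among 20 patients, p0 = 0.2.
    print(posterior_prob_success(6, 20, 0.2))
```

A team might compare such a posterior probability against a prespecified go/no-go cutoff. The integer-parameter identity keeps the sketch dependency-free; with non-integer prior parameters one would instead use a regularized incomplete beta function, for example `scipy.stats.beta.sf`.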

What is statistical software?

Statistical software is a computing tool that provides practical solutions for statistical analysis. It can be written in one or more programming languages, such as R, C++, Python or Julia. The choice of language usually depends on the purpose of the analysis, the target users and the performance requirements.

Why would anybody be interested in statistical software?

Statistical software has existed for decades. Good software produces results efficiently and reproducibly, so that it serves both the scientific question at hand and good scientific practice.

What is state-of-the-art engineering for statistical software?

Mary Shaw (2002): “Acceptance of their results relies on the process of obtaining the results as well as analysis of the results themselves.”

Good software adheres to the principles of good scientific practice and good engineering practice. This is not achievable without rigorous testing, careful code review, clear and clean code, and a precise formulation of the statistical question being addressed.

The rationale for good statistical engineering practices is, first and foremost, to deliver a fit-for-purpose and robust tool that produces reproducible results. Clean, readable code that is consistently formatted and styled makes reviewing and reading more efficient and opens more doors for collaboration. Together, these practices serve the ultimate goal: the best possible product for a given statistical analysis need.
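The testing and reproducibility points above can be illustrated with a small sketch, again in Python and with hypothetical function names: a Monte Carlo estimator that takes an explicit seed, plus two plain assert-based tests that run directly or under pytest. One test checks that the same seed reproduces exactly the same result; the other checks the estimate against a case with a known closed-form answer, the uniform Beta(1, 1) distribution, for which P(p > 0.3) = 0.7.

```python
import random


def mc_tail_prob(a: float, b: float, threshold: float, n_draws: int, seed: int) -> float:
    """Monte Carlo estimate of P(p > threshold) for p ~ Beta(a, b).

    A local random.Random instance seeded explicitly makes the result
    reproducible and leaves the global random state untouched.
    """
    rng = random.Random(seed)
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(n_draws))
    return hits / n_draws


def test_same_seed_same_result() -> None:
    # Reproducibility: identical inputs and seed must give identical output.
    first = mc_tail_prob(2, 3, 0.4, 10_000, seed=42)
    second = mc_tail_prob(2, 3, 0.4, 10_000, seed=42)
    assert first == second


def test_agrees_with_closed_form() -> None:
    # Correctness: Beta(1, 1) is uniform, so P(p > 0.3) = 0.7 exactly.
    # With 100,000 draws the standard error is about 0.0015,
    # so a tolerance of 0.01 is generous.
    estimate = mc_tail_prob(1, 1, 0.3, 100_000, seed=1)
    assert abs(estimate - 0.7) < 0.01


if __name__ == "__main__":
    test_same_seed_same_result()
    test_agrees_with_closed_form()
    print("all tests passed")
```

Passing the seed as an argument, rather than relying on a global generator, means a reviewer can rerun any reported number and obtain it exactly, which is the kind of reproducibility the practices above aim for.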

What are examples of good statistical software?

Many widely accepted statistical software packages written in R are published on CRAN, the Comprehensive R Archive Network (see the picture below). There are currently over 20,000 packages on CRAN.

What is CRAN?

CRAN is a central hub for information about R installation, R packages and documentation. “CRAN operations, most importantly hosting, checking, distributing, and archiving of R add-on packages for various platforms, crucially rely on technical, emotional, and financial support by the R community.” See more here.

References:

Shaw M (2002). What makes good research in software engineering? International Journal on Software Tools for Technology Transfer, 4(1), 1-7.

Got feedback? Email me here.