GSoC 2026 Proposed Projects
Stan is a probabilistic programming language for statistical inference. The Stan language enables sophisticated statistical modeling using Bayesian inference, allowing for more accurate and interpretable results in complex data scenarios. Interfaces for Python, Julia, R, and the Unix shell make it easy to use Stan in any programming environment, on laptops, clusters, or in the cloud. A rich ecosystem of tools for validation and visualization supports decision-making and communication.
The timeline of the GSoC internships is available at the GSoC website.
Students should be familiar with version control using git and GitHub. They should be able to program in R and know the basics of R package development. Familiarity with Bayesian statistical techniques is a must. Each project also lists the specific requirements needed to complete it. Note that some of these requirements can be learned while writing the proposal and during the community bonding period.
Below is a list of possible topics for your GSoC project.
We are also open to other topics. Contact us by opening an issue at [https://github.com/stan-dev/stan/issues]. We won't accept proposals on topics outside this idea list from people who haven't contacted us beforehand.
Keep in mind that these are only ideas and that some of them can't be solved entirely in a single GSoC project.
When writing your proposal, choose some specific tasks and make sure the scope of your proposal fits the GSoC time commitment. We expect all projects to be 350h projects. If you'd like to be considered for a 175h project, you must reach out at [https://github.com/stan-dev/stan/issues]. We will not accept applications for shorter projects from people who haven't discussed their time commitment with us before applying.
Students who work on Stan can expect their skills to grow in:
- Bayesian inference libraries
- Bayesian modelling workflow
- Software development best practices
- Usage of R packages like ggplot2, brms, testthat, etc. (depending on the project)
We recognize and appreciate the potential utility of artificial intelligence (AI) tools in modern software development. However, for all Stan projects carried out under Google Summer of Code, we require full disclosure of any AI tools used to assist with design, implementation, documentation, or debugging.
The primary goal of the program is learning and skill development, and progress toward this goal can be slowed, or even undermined, when contributors rely uncritically on large language models (LLMs) instead of engaging deeply with the underlying concepts and code. We therefore expect participants to use AI tools in a responsible, transparent, and reflective manner, ensuring that these tools support rather than replace their own understanding and contributions.
The loo package is one of the most widely used packages for cross-validation of Bayesian models and is an integral part of the Stan ecosystem. Many of the internal computations in the loo package can benefit from parallelization. This project focuses on improving the speed and scalability of loo while keeping a simple-to-use interface.
A collection of implemented and tested features enabling parallel execution of functions within the loo package. This includes parallel backends integrated with loo's existing API, documentation, and examples demonstrating the new features.
Participants should have experience with R programming, package development, and packages like mirai, future, and futurize. They should be familiar with Bayesian statistics and with cross-validation techniques, including Pareto smoothed importance sampling (PSIS) and k-fold cross-validation. Experience with the loo package is a plus.
- Expected size: 350h
- Difficulty rating: hard
- Potential mentors: Aki Vehtari, Florence Bockting, Noa Kallioinen
The bayesplot package is widely used in the Stan ecosystem for visualizing the results of Bayesian inference. This project would add new visualizations, improve warning messages to guide users towards best practices, and keep the documentation up to date to ensure a low barrier for users and future contributors.
A collection of implemented and tested features that enhance the user experience and enforce best practices when working with the bayesplot package. This includes new visualizations, improved warning messages, updated documentation, and examples demonstrating the new features. The project should also include tests to ensure the reliability and correctness of the implemented features.
Participants should have experience with R programming and package development. They should be familiar with Bayesian statistics and with methods for predictive checking. Experience with the bayesplot package is a plus.
- Expected size: 350h
- Difficulty rating: medium
- Potential mentors: Aki Vehtari, Florence Bockting, Jonah Gabry
The priorsense package implements efficient prior and likelihood sensitivity checks. It provides numerical and graphical diagnostic checks for Stan models. This project would improve the usability and efficiency of priorsense when working with large Stan models (either many variables or many posterior draws).
Modifications to existing functions that improve the handling of large posteriors from Stan models. Changes to the default output of diagnostic functions so that large posteriors are presented sensibly. Documentation reflecting the changes.
Participants should have experience with R programming and package development. Experience with Stan models or priorsense is a plus.
- Expected size: 350h
- Difficulty rating: medium
- Potential mentors: Noa Kallioinen, Osvaldo Martin