A master-level course, part of the Statistics and Data Science master's programme at Leiden University.
- Szymon M. Kiełbasa [LUMC/BDS], coordinator, [email protected]
- Ramin Monajemi [LUMC/BDS]
- Serena Della Corte
Elementary statistical skills and elements of linear algebra.
The course offers a practical introduction to a few programming languages and tools currently used in data science:
- Python is a general-purpose, high-level, easy-to-learn programming language. It provides a large number of data science libraries (e.g. for machine learning, neural networks, data manipulation, data visualization).
- SQL is a standard language used to create, query, update and manage relational databases. For example, such databases are used to store large tables with results of experiments.
- Git is a tool for tracking changes in files during the development of programs. It is the current standard for collaborative code development.
During the course you will develop Python programs of growing complexity.
You will use state-of-the-art Python data science libraries for data manipulation and visualization (e.g. pandas, Matplotlib), and you will apply several standard machine learning methods.
After the course you will be able to program simple reproducible data analyses (consisting of data reading, cleaning, simple modelling, and reporting steps).
You will also learn the fundamentals of relational databases and of the SQL language, and you will practice this knowledge on an example database (SQLite).
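As a rough illustration of what working with SQLite from Python can look like, the sketch below uses Python's built-in `sqlite3` module to create a small in-memory database, insert a few rows, and summarise them with a query. The table name `measurement`, its columns, and the values are hypothetical and are not taken from the course's example database.

```python
import sqlite3

# Assumption: an in-memory database stands in for the course's example database.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding results of experiments.
conn.execute("CREATE TABLE measurement (sample TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurement (sample, value) VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 5.0)],
)

# Count the rows and average the values per sample.
rows = conn.execute(
    "SELECT sample, COUNT(*) AS n, AVG(value) AS mean_value "
    "FROM measurement GROUP BY sample ORDER BY sample"
).fetchall()
conn.close()

for sample, n, mean_value in rows:
    print(sample, n, mean_value)
```

The `?` placeholders let `sqlite3` pass the values separately from the SQL text, which avoids building queries by string concatenation.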
First, you will work alone and practice code development. You will submit your assignments through GitHub.
Later, shared code development will be practiced in groups. The members of the group will be requested to use git to track changes in their code and to share their code with other students through GitHub.
During the course you will practice writing Python and SQL code. After the course you will be able to:
- ✍️ Create Python code using collections (list, tuple, set, dict), flow control statements (if, for, while, exceptions), context managers (with).
- ✍️ Develop user functions.
- ✍️ Use Python classes (instance variables, methods, inheritance).
- ✍️ Combine functions from the Python standard libraries (reading/writing files in different formats; math, statistics, random) into your own code.
- ✍️ Analyse example data with common data science libraries (NumPy, pandas, Matplotlib).
- Understand relational databases and apply the SQL language to create, query, and update a relational database:
  - ✍️ Understand ideas behind relational databases and elementary SQL.
  - 🚫 Use SQL to create, query, update a database.
- ✍️ Practice Python programming through running several machine learning algorithms.
- Practice individual and collaborative code development by using git and GitHub:
  - ✍️ Understand ideas behind project versioning.
  - 🚫 Use git and GitHub for individual and collaborative code development.
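To give a feeling for how several of the Python objectives above fit together, here is a minimal sketch that combines a user function, a dictionary of lists, exception handling, a context manager, a small class hierarchy, and the `statistics` module. All names (`parse_measurements`, `Sample`, `ControlSample`) and the data are hypothetical and are not part of the course materials.

```python
import statistics
from io import StringIO


def parse_measurements(lines):
    """Return a dict mapping sample name to a list of float values.

    Lines that do not have the form "name,value" are skipped.
    """
    result = {}
    for line in lines:
        try:
            name, value = line.strip().split(",")
            result.setdefault(name, []).append(float(value))
        except ValueError:
            # Raised for a wrong number of fields or a non-numeric value.
            continue
    return result


class Sample:
    def __init__(self, name, values):
        self.name = name            # instance variables
        self.values = list(values)

    def summary(self):
        return f"{self.name}: mean={statistics.mean(self.values):.2f}"


class ControlSample(Sample):
    def summary(self):
        # Inheritance: reuse the parent method and extend its result.
        return "[control] " + super().summary()


# Assumption: a StringIO object stands in for a real text file.
raw = "a,1.0\na,2.0\nb,3.5\nbroken line\nb,4.5\n"
with StringIO(raw) as handle:       # context manager closes the handle
    data = parse_measurements(handle)

samples = [ControlSample("a", data["a"]), Sample("b", data["b"])]
reports = [sample.summary() for sample in samples]
print(reports)
```

With a real file, `open("some_file.csv")` could replace `StringIO(raw)` in the `with` statement without changing the rest of the code.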
The primary source for lecture, exam and retake dates/locations is the schedule of the Essentials for Data Science course (4433ESSDSY) at https://rooster.universiteitleiden.nl/. The dates given below are copied manually and may contain mistakes, so always check the official schedule. The order and content of future lectures, as well as the submission dates of the assignments, might be adjusted.
The schedule:
- (01) Feb. 2nd, 2026 (Szymon/Ramin+Serena):
  - General course introduction
  - Python notebooks
  - Python basic
  - Python lists and tuples
  - Memory organization
  - Git/GitHub: introduction
- (02) Feb. 16th (Szymon/?):
  - Python sets and dictionaries
  - Git/GitHub: practice
  - 📙 Assignment A (not graded): start
- (03) Feb. 23rd (Szymon/?):
- (04) Mar. 02nd (Szymon/?):
  - Python object oriented programming
  - 📙 Assignment A: discussion of solutions
  - 📗 Assignment B (not graded): start
- (05) Mar. 09th (Szymon/?):
- (06) Mar. 16th (Ramin/Szymon):
  - Data manipulation: NumPy
  - 📗 Assignment B: discussion of solutions
- (07) Mar. 23rd (Ramin/?):
  - Data manipulation: pandas
  - 📘 Assignment C (graded): start
- (08) Mar. 30th (Ramin/?):
  - Data visualisation
  - 📚 Group Assignment: create groups
- (09) Apr. 13th (Szymon/?):
  - Relational databases:
  - SQL language:
    - Downloading and connecting to the example database
    - Querying and selecting data (SELECT, LIMIT, AS, ORDER BY, DISTINCT, WHERE, IN, BETWEEN, LIKE) [Exercises]
    - Grouping and summarising (GROUP BY, HAVING, COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT) [Exercises]
  - 📘 Assignment C: deadline (end-of-day)
  - 📚 Group Assignment (graded): start
- (10) Apr. 20th (Szymon/?):
  - Relational databases:
  - SQL language:
    - Modification statements (UPDATE, INSERT, DELETE) [Exercises]
    - Data definition language (CREATE TABLE, DROP TABLE)
    - Joining tables 1 (INNER JOIN, LEFT JOIN, CREATE TEMP TABLE) [Exercises]
    - Joining tables 2 (UNION, EXCEPT, INTERSECT, self joins, CROSS JOIN, subqueries, EXISTS) [Exercises]
- (11) May 04th (Szymon/Ramin):
  - Git branching and merging
  - 📘 Assignment C: grades and feedback
- (12) May 11th (Serena/?):
  - Machine Learning with sklearn [Exercises]
  - 📚 Group Assignment: deadline (end-of-day)
- (13) May 18th (Serena/Szymon):
  - Exam information, Final Q&A
  - Deep learning with Keras [Exercises]
- (--) June 01st:
  - 🏢 Exam
- (--) June 29th:
  - 🏢 Retake
- Extra materials (in case of interest):
Components of the final grade:
- Assignment C (weight 0.1 in the final grade):
  - The grade range is 1-10 for submissions before the deadline. The grade range is 1-7 for submissions after the deadline but before the feedback moment. No submissions will be accepted later (then, the grade is 1).
- Group Assignment (weight 0.2 in the final grade):
  - The grade range is 1-10 for submissions before the deadline. The grade range is 1-7 for submissions after the deadline but before the exam day. No submissions will be accepted later (then, the grade is 1).
  - To pass the course, the group assignment rounded grade must be greater than 5.5.
- Exam/Retake (weight 0.7 in the final grade):
  - The exam consists of two parts: a pen-and-paper quiz and a programming part. Usage of AI tools is prohibited during the pen-and-paper quiz part.
  - The grade range is 1-10.
  - To pass the course, the exam/retake grade must be greater than 5.5.
  - The exam will cover the course objectives marked above with ✍️, and will not cover the ones with 🚫 (these objectives are evaluated in the assignments).
The final grade:
- The final grade is calculated as a weighted mean of the grade components.
- To pass the course, the final grade needs to be greater than or equal to 6.0.
- The final grade is rounded to the nearest half integer.
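The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, exact ties are assumed to round up, the pass threshold of 6.0 is assumed to apply to the rounded final grade, and the component grades are assumed to be already rounded as required. The official calculation may differ in these details.

```python
import math

# Weights of the grade components as listed above.
WEIGHTS = {"assignment_c": 0.1, "group_assignment": 0.2, "exam": 0.7}


def round_to_half(value):
    """Round to the nearest multiple of 0.5 (assumption: exact ties round up)."""
    return math.floor(value * 2 + 0.5) / 2


def final_grade(grades):
    """Weighted mean of the components, rounded to the nearest half."""
    weighted_mean = sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)
    return round_to_half(weighted_mean)


def passes(grades):
    """Apply the three pass conditions (assumption: 6.0 applies after rounding)."""
    return (
        final_grade(grades) >= 6.0
        and grades["group_assignment"] > 5.5
        and grades["exam"] > 5.5
    )


example = {"assignment_c": 8.0, "group_assignment": 7.0, "exam": 6.0}
# 0.1 * 8 + 0.2 * 7 + 0.7 * 6 = 6.4, which rounds to 6.5
print(final_grade(example), passes(example))

# A high weighted mean does not compensate for an exam grade of 5.5.
low_exam = {"assignment_c": 10.0, "group_assignment": 10.0, "exam": 5.5}
print(final_grade(low_exam), passes(low_exam))
```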
For the course you will need to bring a laptop with properly installed Python and a development environment.
Install:
- Visual Studio Code: A free source-code editor made by Microsoft for Windows, Linux and macOS. Follow the instructions at https://code.visualstudio.com/. Run the editor and install extensions for Python development (possibly, you will not need to install Python and pip separately).
You may additionally need to install:
- Python (version >= 3.9.?, optimally >= 3.12.?): Follow the download instructions at https://www.python.org/.
- pip: The Python Package Installer. It should already be installed during Python installation. If that is not the case, follow https://pip.pypa.io/en/stable/installation/.
- git: Free and open source distributed version control system. Follow the Downloads instructions provided at https://git-scm.com/. Visual Studio Code extensions for git are recommended.
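One possible way to check the installation is a short Python script like the sketch below. It is not an official course tool: the function name `check_environment` is hypothetical, and the minimum version (3, 9) is taken from the requirement listed above.

```python
import importlib.util
import shutil
import sys


def check_environment(min_version=(3, 9)):
    """Report whether Python, pip and git appear to be available."""
    return {
        # Compare the running interpreter with the minimum required version.
        "python_ok": sys.version_info >= min_version,
        # pip counts as installed if it can be imported by this interpreter.
        "pip_found": importlib.util.find_spec("pip") is not None,
        # git counts as installed if an executable is found on the PATH.
        "git_found": shutil.which("git") is not None,
    }


status = check_environment()
for name, ok in status.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

Run it with the same Python interpreter that is selected in the editor, since several Python installations may exist on one laptop.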