
Intermediate Data Analysis with R and the Tidyverse

Boost your R skills and analyse real data with all the Tidyverse tools for cleaning, transforming, visualizing, and modelling.

This course provides a comprehensive introduction to the Tidyverse, a powerful ecosystem of R packages designed for efficient and intuitive data analysis. Students will learn how to import, clean, transform, visualize, and model data using modern R tools.

In data science, raw data is often messy, requiring careful cleaning and transformation before meaningful analysis can take place. The Tidyverse simplifies these tasks by providing a consistent and user-friendly set of tools for data manipulation, visualization, and modeling. Mastering the Tidyverse allows analysts and researchers to work more efficiently, ensuring data is well-structured and insights are clearly communicated.

The course is divided into four key sections, each focusing on a critical phase of the data analysis pipeline: 

(1) Data Import & Cleaning (Tidying the Data): Data comes in various formats: CSV files, Excel spreadsheets, databases, and even raw text files. Before any meaningful analysis can take place, data must be imported, structured, and cleaned. This section introduces the tools and best practices for handling messy data.  

Key topics include:

  • Importing data using the readr, readxl, and haven packages, which cover delimited text files (e.g., CSV), Excel workbooks, and SPSS or Stata files. JSON is not read by these three packages and needs a separate one, such as jsonlite.
  • Understanding the concept of tidy data, a structured format that simplifies analysis and makes transformations more intuitive. 
  • Reshaping data using the tidyr package, including pivoting between long and wide formats, separating and uniting columns, and handling hierarchical data structures. 
  • Dealing with missing values, duplicate entries, and inconsistencies, ensuring a clean and reliable dataset for analysis.  
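The reshaping and cleaning steps listed above can be sketched in a few lines of R. This is only an illustration, not course material: the table `scores_wide`, its column names, and its values are made up for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table (made-up values): one row per student,
# one column per exam, with a duplicated row and a missing score.
scores_wide <- tibble(
  student = c("A", "B", "B", "C"),
  exam_1  = c(7.5, 6.0, 6.0, NA),
  exam_2  = c(8.0, 5.5, 5.5, 9.0)
)

scores_tidy <- scores_wide %>%
  distinct() %>%                       # drop the exact duplicate row for student B
  pivot_longer(
    cols      = starts_with("exam_"),  # columns to stack into rows
    names_to  = "exam",                # former column names go here
    values_to = "score"                # former cell values go here
  ) %>%
  drop_na(score)                       # remove rows where the score is missing

print(scores_tidy)
```

After these steps each row holds one student, one exam, and one score, which is the "tidy" layout the later dplyr and ggplot2 steps expect.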

(2) Data Transformation (Joining & Manipulating Data): Once the data is clean, the next step is data transformation: filtering, summarizing, and modifying datasets to extract meaningful insights. This section focuses on dplyr, one of the most powerful packages in the Tidyverse, which provides a grammar for working with structured data efficiently.  

Key topics include:

  • Filtering rows, selecting columns, and sorting data using functions like filter(), select(), and arrange().
  • Summarizing and aggregating data with group_by() and summarize(), allowing for deeper insights into trends and patterns.
  • Joining multiple datasets using different types of joins (inner, left, right, full), enabling the integration of information from multiple sources.
  • Creating new variables dynamically using mutate() and case_when(), which help derive additional insights from existing data.
  • Functional programming with purrr, which allows for efficient iteration and manipulation of lists and nested data structures.
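As a rough sketch of how these dplyr verbs chain together, the example below filters, derives a variable, joins, and aggregates. The `orders` and `customers` tables and their values are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical example tables (made-up values).
orders <- tibble(
  order_id    = 1:5,
  customer_id = c(1, 1, 2, 3, 3),
  amount      = c(20, 35, 50, 5, 120)
)
customers <- tibble(
  customer_id = c(1, 2, 3),
  country     = c("NL", "IT", "NL")
)

country_summary <- orders %>%
  filter(amount >= 10) %>%                    # keep orders of at least 10
  mutate(size = case_when(                    # derive a categorical variable
    amount >= 100 ~ "large",
    amount >= 30  ~ "medium",
    TRUE          ~ "small"
  )) %>%
  left_join(customers, by = "customer_id") %>%  # add the customer's country
  group_by(country) %>%
  summarize(
    n_orders     = n(),
    total_amount = sum(amount),
    .groups      = "drop"
  ) %>%
  arrange(desc(total_amount))                 # largest total first

print(country_summary)
```

A left join is used here so that every order is kept even if a customer record were missing; an inner join would drop such orders instead.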

(3) Data Visualization: After cleaning and transforming the data, the next step is to create effective visualizations that communicate findings clearly. This section introduces ggplot2, the most widely used R package for data visualization, enabling students to craft informative and aesthetically pleasing plots.

Key topics include:

  • The Grammar of Graphics: understanding how ggplot2 structures visualizations and why this approach is so powerful.
  • Creating basic plots, including scatter plots, line charts, histograms, and bar charts.
  • Customizing aesthetics, such as color schemes, themes, labels, legends, and annotations to make plots more informative.
  • Faceting and grouping data, allowing comparisons across different categories or time periods.
  • Combining multiple plots into dashboards or complex visualizations.
  • Best practices in data visualization, ensuring clarity, accuracy, and impact in communicating insights.
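To make the layered approach concrete, here is a minimal sketch that uses the `mpg` dataset shipped with ggplot2. The choice of dataset, variables, and theme is an assumption made for illustration, not something taken from the course.

```r
library(ggplot2)

# Build the plot layer by layer: data and aesthetic mappings first,
# then a geometry, then facets, labels, and a theme.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point(alpha = 0.7) +          # scatter plot of the mapped variables
  facet_wrap(~ drv) +                # one panel per drive type
  labs(
    title  = "Fuel economy by engine size",
    x      = "Engine displacement (litres)",
    y      = "Highway miles per gallon",
    colour = "Vehicle class"
  ) +
  theme_minimal()

print(p)                             # draws the plot on the active device
```

Each `+` adds one component, so swapping `geom_point()` for another geometry or changing the facet variable leaves the rest of the specification untouched.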

(4) Data Modeling: The final step in the data analysis workflow is modeling, where data is used to make predictions or uncover hidden relationships. This section introduces the tidymodels framework, which provides a consistent and streamlined approach to building machine learning models in R.

Key topics include:

  • Introduction to machine learning models: understanding different types of models (e.g., regression, classification) and their applications.
  • Building predictive models using tidymodels, a modern framework for machine learning in R.
  • Evaluating model performance using accuracy, precision and recall, ROC curves, and cross-validation.
  • Feature engineering and selection, refining models for better accuracy and interpretability.
  • Interpreting model results and integrating them into the broader data analysis pipeline.
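The workflow outlined above (split, fit, predict, evaluate) might look roughly like this with tidymodels. The sketch assumes the built-in `mtcars` dataset and a plain linear regression, both chosen only for illustration; the course itself may use different data and models.

```r
library(tidymodels)

set.seed(123)                                   # make the random split reproducible

# Hold out 25% of the rows for testing.
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Model specification: ordinary linear regression fitted with stats::lm.
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Fit on the training data only.
lm_fit <- lm_spec %>%
  fit(mpg ~ wt + hp, data = car_train)

# Predict on unseen data and keep the true values alongside.
preds <- predict(lm_fit, new_data = car_test) %>%
  bind_cols(car_test)

# Default regression metrics from yardstick: RMSE, R-squared, MAE.
model_metrics <- metrics(preds, truth = mpg, estimate = .pred)
print(model_metrics)
```

Because the specification is separate from the fit, replacing `linear_reg()` with another parsnip model type leaves the splitting, prediction, and evaluation code unchanged.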

Continue reading below for additional course information.

Fill in the application form

Deadline: 8 December 2025 (23:59 CET)


Andrea Bassi

Andrea Bassi holds an MSc in Engineering Mathematics (Polytechnic University of Milan), with a focus on Applied Statistics. After working in Italy as a statistical consultant, he started his PhD training in Biostatistics at the Vrije Universiteit Medical Center, on the BIOMARKER project.

He currently works as a Senior Data Scientist and R consultant.

"Students should apply for Data analysis in R to discover the enormous potential of the open-source programming language R and for acquiring a series of skills and tools to analyse statistical problems of diverse nature."

Additional course information

  • Learning objectives

    By the end of this online course, students will be able to:

    • Demonstrate the ability to import, clean, and manipulate real-world datasets by applying techniques such as filtering, aggregation, and KPI computation. 
    • Interpret and present data insights using appropriate visualization tools and communication strategies. 
    • Apply and assess machine learning models to solve practical data problems. 
    • Implement and optimize automated ETL workflows through functional programming and scripting. 
  • Forms of tuition and assessment

    The course follows a hands-on, project-based approach. Each day has a morning session consisting of 2.5 hours of theory (slides) and 1.5 hours of practical work. On Tuesday and Thursday, an additional 2-hour afternoon session focuses mostly on practical exercises, discussion, and follow-up questions.

    Students will work on real-world datasets and progressively apply what they learn through guided coding challenges and projects. Collaborative learning and interactive discussions will be encouraged, ensuring a solid grasp of key data analysis techniques.

    The course ends with an individual assessment, to be completed in the week after the course. Students analyse a dataset from an external source and implement their solutions in R. The code must be submitted together with a short written summary.

  • Entry requirements

    Students from all backgrounds are welcome, but at least one undergraduate course in statistics is required to be able to follow the course.

  • Course syllabus

We are here to help!

Feel free to contact us anytime.

Contact

  • Maya Allister
  • Programme Manager


Copyright © 2025 - Vrije Universiteit Amsterdam