Basics of the R programming language – data wrangling and data visualization
We would like to invite you to join us for the first free, two-day statistical course offered as part of the COORDINATE project. The aim of COORDINATE is to mobilize the community of researchers and organizations who will drive forwards the coordinated development of birth cohort and survey research on children’s wellbeing in Europe.
The aim of the statistical course is to familiarize participants with the basics of the open source programming language R and to teach data wrangling and data visualization using the R language and software packages from its ecosystem. The course is open to academic researchers from PhD students to full professors, as well as policy practitioners and other researchers or analysts working in EU Member States and Associated Countries.
A limited number of participants will be awarded financial support for travel and accommodation. Two lunches and one dinner will be provided for all participants.
The course will be based around data provided by the Generations & Gender Programme. The Generations & Gender Programme Research Infrastructure provides scientists and policy makers with high quality and timely data about families and life course trajectories of individuals to enable researchers to contribute insights and answers to current societal and public policy challenges.
This course teaches the participants the basics of the R programming language, how to use it to prepare data for visualization and analysis, and how to visualize data. The course will be conducted in the free and popular RStudio integrated development environment. During the course, participants will be given tips and tricks that make everyday work in RStudio easier.
The course will be conducted in English. Participants are expected to bring their own laptop with R and RStudio installed. Additional information regarding the required tools, packages and data will be sent to accepted applicants.
Day 1: Basics of data wrangling using the R language
This part of the course covers the basics of the R programming language and what users need to know in order to prepare data for visualization and analysis.
We will begin this part by covering the basics of the R programming language. Participants will be shown the basic principles of R’s functioning (frequently used data types and data structures, syntax), and will be taught how to search, read, and understand R’s documentation.
Later, participants will be taught how to execute data transformations, how to cast data from the wide format to the long format (and vice versa), and how to format the data in a way consistent with the “tidy data” principles and other good data handling practices.
Day 2: Data visualization using the R language and the ggplot2 package
This part of the course covers the basic approaches to data visualization using the R programming language, with a special emphasis on longitudinal data. Aside from R’s base plotting capabilities, which will be demonstrated briefly, the course will be using the ggplot2 package from the tidyverse ecosystem.
ggplot2 is based on a concise philosophical approach to data visualization (the grammar of graphics), which provides a flexible and powerful framework for the creation of any imaginable visualization. Participants will be shown the basics of the ggplot2 syntax (geom, aes, stat, facet), and will be encouraged to build their own modular graphs which can easily be upgraded by adding additional layers.
The course will cover the most frequent types of graphs used for the visualization of continuous (histogram, density plot, boxplot, scatterplot) and categorical (bar plot, surface plot) variables. Participants will also be shown ways to use ggplot2’s syntax to display complex relationships between variables by using colors, shapes and sizes to emphasize group membership, as well as using facets to display data for different groups of participants.
We will end the course by exploring the specifics of plotting longitudinal data. Data and summary statistics visualization using line graphs will be explored in detail. Methods for visualizing differences in distributions at different time points will also be demonstrated.
Participants will learn how to use the R programming language through the RStudio interface, and how to access and read the R language documentation. They will learn the basic data structures in R and their properties. Participants will be able to handle tabular data using the R language, and to prepare the data for visualization.
Participants will learn the basic logic of the ggplot2 data visualization package and will be able to create basic plots of continuous and categorical data, as well as summary statistics, using the ggplot2 package.
Preliminary course schedule
Day 1: May 24, 2022
09:00 – 10:00 Introduction: Presentation of the Generations & Gender Programme dataset
10:00 – 13:00 Lecture: Basics of the R language, and data wrangling in R
13:00 – 14:00 Lunch
14:00 – 17:00 Exercises: Basics of the R language, and data wrangling in R
Day 2: May 25, 2022
09:00 – 13:00 Lecture: Data visualization in R
13:00 – 14:00 Lunch
14:00 – 15:30 Exercises: Data visualization in R
About the lecturers
Olga Grünwald, M.Sc. works as a survey data coordinator in the Generations and Gender Programme. In her research, she investigates the link between work and family. She is also interested in quantitative research methods and survey methodology.
Blaž Rebernjak, PhD is an assistant professor at the Department of Psychology at the Faculty of Humanities and Social Sciences of the University of Zagreb.
Denis Vlašiček, M.Psych. is a research assistant at the Croatian Social Science Data Archive (CROSSDA) and a doctoral student of Psychology at the Department of Psychology at the Faculty of Humanities and Social Sciences of the University of Zagreb.
To be eligible for participation in this course, applicants have to meet the following criteria:
· The applicant must have a contract or affiliation with a recognized academic or public institution, or a not-for-profit organization registered in an EU member state or in an associated country.
· Researchers from COORDINATE partner institutions who are not involved in the COORDINATE project are eligible to apply. However, as the objective of the project is to widen the network of researchers using child well-being data, only a limited number of places will be awarded to applicants from COORDINATE partner institutions.
· PhD students are eligible to apply; however, master's and bachelor's students are not eligible.
To apply, click here. The deadline for application submission is the 8th of April, 23:59 CEST.
Priority will be given to early career researchers interested in working with data on child wellbeing.
Applications will be reviewed by a six-member committee, which will score each application based on the information provided.
Applicants will be notified about the decision by 15 April 2022.
If you have any questions, please contact firstname.lastname@example.org