This event has passed.
Mastering the Tidyverse
17/10/2018 @ 9:00 am - 5:00 pm
The tidyverse is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. This training course covers key aspects of the tidyverse, including dplyr, lubridate, tidyr and tibbles.
The core tidyverse data structure is a tibble; this is a modern take on the data frame.
The course commences with a brief introduction to this structure.
- What are they?
- How do they differ from data frames?
- Convert a data frame to a tibble and back
- Why tibbles? They motivate the other tidyverse packages and set the theme for the day
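As a rough illustration of the conversion step listed above, here is a minimal sketch assuming the tibble package is installed. The data frame and its column names are made up for the example and are not course material.

```r
library(tibble)

# A plain data frame (hypothetical example data)
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Data frame -> tibble
tb <- as_tibble(df)

# Tibble -> data frame
df_again <- as.data.frame(tb)

# One difference from data frames: selecting a single column with [
# keeps a tibble, whereas a data frame drops to a plain vector by default
tibble_stays_tabular <- is.data.frame(tb[, "score"])
data_frame_drops     <- is.data.frame(df[, "score"])
```

Printing `tb` and `df` side by side is a quick way to see the other visible difference: a tibble reports its dimensions and column types.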
dplyr: the workhorse of the tidyverse
Before the first coffee break, we’ll tackle the dplyr package. This package forms the foundation of the tidyverse by providing a standardised data manipulation grammar.
- What is dplyr?
- The grammar of tidyverse functions
- filter() and summarise(); a brief review of Boolean algebra may be needed at this point for subsetting
- The pipe operator %>% and chaining functions into a workflow
- Some other useful dplyr functions, such as group_by()
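The functions named in this section can be chained into one workflow. The sketch below assumes dplyr is installed; the `sales` data and its columns are hypothetical and only serve to show filter(), group_by() and summarise() joined by the pipe.

```r
library(dplyr)

# Hypothetical example data
sales <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  units  = c(10, 25, 5, 40, 30),
  stringsAsFactors = FALSE
)

result <- sales %>%
  # Boolean conditions combine with & (and), | (or) and ! (not)
  filter(units >= 10 & region != "west") %>%
  # Subsequent summaries are computed once per region
  group_by(region) %>%
  summarise(total_units = sum(units), n_sales = n())
```

Each step takes a data frame and returns a data frame, which is what allows the steps to be read top to bottom as a sentence.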
Your data should be tidy. An obvious statement, except what do we mean by tidy? This section will elucidate what we mean by tidy data and how to make it part of our workflow.
- Tidy data
- What is tidy data?
- Using tidyr
- spread() and gather() for reshaping data
- separate() and unite() for splitting one column into several, or the reverse
- Dealing with missing values
- Joins for dealing with data split across multiple data frames
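The reshaping, splitting and joining steps listed above might look roughly like this. It is a sketch assuming dplyr and tidyr are installed, and all data and column names are invented for the example. Note that recent tidyr releases mark gather() and spread() as superseded by pivot_longer() and pivot_wider(), although both still work.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per year, with one missing value
wide <- data.frame(
  country = c("A", "B"),
  y2017 = c(1.2, 3.4),
  y2018 = c(1.5, NA),
  stringsAsFactors = FALSE
)

# gather(): wide -> long, dropping rows whose value is missing
long <- gather(wide, key = "year", value = "value", y2017, y2018, na.rm = TRUE)

# spread(): long -> wide again; the dropped cell reappears as NA
wide_again <- spread(long, key = year, value = value)

# separate(): split one column into several; unite(): the reverse
events <- data.frame(date = c("2018-10-17", "2018-11-02"),
                     stringsAsFactors = FALSE)
parts  <- separate(events, col = date,
                   into = c("year", "month", "day"), sep = "-")
joined <- unite(parts, col = "date", year, month, day, sep = "-")

# left_join(): bring in columns from a second data frame by a shared key
regions <- data.frame(country = c("A", "B"),
                      region = c("north", "south"),
                      stringsAsFactors = FALSE)
with_region <- left_join(long, regions, by = "country")
```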
In order to manipulate data, we need to be able to load data into R. We’ll cover the key packages and provide advice as required.
- Data storage: practical advice for managing data
- Tidyverse packages
- readr and readxl for dealing with .csv and .xls/.xlsx files
- Database connections
- Non-tidyverse packages
- Not all data sets can be loaded using tidyverse packages
- foreign package for reading data from other statistical systems (SAS, SPSS, Minitab)
We’ll finish by looking at common difficulties that may crop up in a data scientist’s day.
- Dates/times with the lubridate package
- String manipulation with stringr
- First steps in regular expressions
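To give a flavour of these last topics, here is a small sketch assuming lubridate and stringr are installed. The times reuse this event's 9:00 am to 5:00 pm schedule; the file names are hypothetical.

```r
library(lubridate)
library(stringr)

# Parse day/month/year date-times (parsed as UTC by default)
start <- dmy_hm("17/10/2018 9:00")
end   <- dmy_hm("17/10/2018 17:00")

# Extract components and compute the length of the day in hours
event_year     <- year(start)
event_month    <- month(start)
duration_hours <- as.numeric(difftime(end, start, units = "hours"))

# Hypothetical file names for the string examples
files <- c("sales_2017.csv", "sales_2018.xlsx", "notes.txt")

# Regular expressions: "\\.csv$" matches names ending in .csv,
# "[0-9]{4}" matches the first run of four digits
is_csv <- str_detect(files, "\\.csv$")
years  <- str_extract(files, "[0-9]{4}")
```

str_extract() returns NA where the pattern does not match, so `years` has a missing value for the file with no year in its name.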
If you are interested in this course, you can also contact us directly to ask about studying at our offices for a lower price.
For more information, please visit our website at Jumpingrivers.com.