Welcome! & Data Science Lifecycle

CVEN 5999 - Summer 2025

Lars Schöbitz

Welcome! 👋

Meet the lecturer

Lars Schöbitz (he/him)

Headshot of Lars Schöbitz

Learning Goals (for the course)

  1. Be able to use a common set of data science tools (R, RStudio IDE, Git, GitHub, tidyverse, Quarto) to illustrate and communicate the utility of solutions for water, sanitation, air quality, and global health.

  2. Learn to use the Quarto file format and the RStudio IDE visual editing mode to produce scholarly documents with citations, footnotes, cross-references, figures, and tables.

Why are you here?

Pick an item

Take notes for 2 minutes:
What does the item you have picked have to do with the reason for you being here?

Why are you here?

In break-out rooms

Take 2 minutes each to share with your room partner:
What does the item you have picked have to do with the reason for you being here?

From which country are you joining us?

In the Zoom chat

Share with us from which country you are joining us.

Learning Objectives (for this week)

  1. Learners can navigate the platforms (Posit Cloud, GitHub, Course Website) that are used for the course.
  2. Learners can render a Quarto file to an output file in HTML, PDF and DOCX format.
  3. Learners can list the six elements of the data science lifecycle.
  4. Learners can identify four components of a Quarto file (YAML, code chunk, R code, markdown).
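As a rough illustration of the four components named in objective 4, the R sketch below writes a minimal Quarto file to a temporary folder. The file name, title, and chunk content are made up for illustration. Rendering additionally assumes that the quarto R package and the Quarto CLI are installed, so that call is left commented out.

```r
# Three backticks, built with strrep() so this example stays easy to read
fence <- strrep("`", 3)

# Lines of a minimal, hypothetical Quarto file
qmd_lines <- c(
  "---",                          # YAML header starts
  "title: \"My first report\"",
  "format: html",                 # could also be pdf or docx
  "---",                          # YAML header ends
  "",
  "Some *markdown* text.",        # markdown
  "",
  paste0(fence, "{r}"),           # code chunk starts
  "sqrt(49)",                     # R code
  fence                           # code chunk ends
)

# Write the file to a temporary folder
qmd_path <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_path)

# Rendering (assumes the quarto package and the Quarto CLI are available):
# quarto::quarto_render(qmd_path, output_format = "html")
```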

Classroom tools

Live Coding Exercises

  • Instructor writes and narrates code out loud
  • Instructor explains elements and principles that are relevant
  • Code is displayed on second screen / split screen
  • Learners join by writing and executing the same code
  • Learners “code-along” with the instructor

Pair Programming Exercises

  • Two learners work together in a break-out session
  • One person (the driver) shares the screen and does the typing
  • The other person (the navigator) offers comments and suggestions
  • Roles get switched

Platforms and Tools

  • R
  • Posit Cloud
  • RStudio IDE
  • tidyverse R Packages
  • Quarto publishing system

cven5999-ss25.github.io/website/ 🔖

Posit Cloud

Screen setup

Who uses a setup with one screen?

“One screen” in the Zoom Chat

Screen setup

Who uses a setup with two screens?

“Two screens” in the Zoom Chat

Email from GitHub?

Please accept the invitation to the GitHub organisation for the course

Live Coding Exercise

git-configuration

Follow along on the screen

  1. Open the GitHub organisation for the course: https://github.com/cven5999-ss25
  2. You will find a repository titled: wk-02-USERNAME (with your GitHub Username)
  3. You will “clone” this repository to Posit Cloud
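The exercise itself is done live on screen. As a hedged sketch of what configuring Git from the R console can look like, the calls below use the usethis and gitcreds packages. Whether the course uses exactly these calls is an assumption, and the name and email are placeholders.

```r
# Assumes the usethis and gitcreds packages are installed, e.g. via
# install.packages(c("usethis", "gitcreds"))

# Tell Git who you are (placeholder values, replace with your own)
usethis::use_git_config(
  user.name  = "Your Name",
  user.email = "you@example.org"
)

# Store the GitHub Personal Access Token (asks for the token interactively)
gitcreds::gitcreds_set()
```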

Break

GitHub PAT from week 1

Do you have your GitHub Personal Access Token readily accessible?

10:00

Version Control

Version Control with Git and GitHub

A way to share files with others, so they can:

  • download
  • re-use
  • contribute

You can view the history of files, and jump back in time to any point.

Why is it useful?

Git and GitHub

  • Git is software for version control
  • Released in 2005
  • Popular among programmers collaboratively developing code
  • Tracks changes in a set of files (directory/folder/repository)

  • GitHub is a hosting platform for version control using Git
  • Launched in 2008, acquired by Microsoft in 2018 for US$ 7.5 billion
  • 73 million users (February 2022)
  • Facebook for Software Developers

Version Control - Terminology

Data Science Lifecycle

Deep End

Live Coding Exercise

live-data-science-lifecycle

  1. Head over to posit.cloud
  2. Open the workspace for the course cven5999-ss25
  3. Open “Projects”
  4. Open the “wk-02-USERNAME” project
  5. Follow along with me

Break

05:00

R

Packages

base R

sqrt(49)
sum(1, 2)
  • Functions come with R

R Packages

library(dplyr)
  • Installed once in the Console: install.packages("dplyr")
  • Loaded per script

Functions & Arguments

library(dplyr)
library(gapminder)  # provides the gapminder data set

filter(.data = gapminder, 
       year == 2007)
  • Function: filter()
  • Argument: .data =
  • Arguments following: year == 2007 (what to do with the data)

Objects

library(dplyr)
library(gapminder)  # provides the gapminder data set

gapminder_yr_2007 <- filter(.data = gapminder, 
                            year == 2007)
  • Function: filter()
  • Argument: .data =
  • Arguments following: year == 2007 (what to do with the data)
  • Object: gapminder_yr_2007

Operators

library(dplyr)
library(gapminder)  # provides the gapminder data set

gapminder_yr_2007 <- gapminder |> 
  filter(year == 2007) 
  • Function: filter()
  • Argument: .data = (supplied by the pipe)
  • Arguments following: year == 2007 (what to do with the data)
  • Object: gapminder_yr_2007
  • Assignment operator: <-
  • Pipe operator: |>
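To see what the pipe does without any packages, here is a small base R sketch (the vector x is made up). The pipe passes its left-hand side as the first argument of the function on its right, so the nested and the piped form give the same result. The native pipe requires R 4.1 or later.

```r
x <- c(1, 4, 9)

# Nested call: read from the inside out
nested <- sum(sqrt(x))

# Piped call: read from left to right
piped <- x |>
  sqrt() |>
  sum()

# Both forms compute the same value
identical(nested, piped)
```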

Rules

Rules of dplyr functions:

  • First argument is always a data frame
  • Subsequent arguments say what to do with that data frame
  • Always return a data frame
  • Don’t modify in place
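The rules above can be checked on a small example. This is a minimal sketch with a made-up data frame standing in for gapminder. It assumes the dplyr package is installed.

```r
library(dplyr)

# Small made-up data frame, standing in for gapminder
countries <- data.frame(
  country = c("A", "B", "C", "D"),
  year    = c(2002, 2007, 2002, 2007)
)

# First argument is the data frame, following arguments say what to do
countries_2007 <- filter(.data = countries, year == 2007)

is.data.frame(countries_2007)  # the result is a data frame
nrow(countries_2007)           # only the rows for 2007
nrow(countries)                # the original data frame is unchanged
```

Because the original is not modified, the result has to be assigned to an object (here countries_2007) to be kept.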

Course information

Weekly Structure

Monday Lecture
Tuesday
Wednesday
Thursday Feedback (grading) on assignments from previous week
Friday Homework assignment and learning reflection are due

Homework assignments

  • Weekly programming assignments
  • Graded as pass/fail (100 pts)
  • Submitted as rendered Quarto documents on GitHub
  • Weighted at 40% of the total grade

Learning reflections

  • Reflections on the different class elements (lecture, homework assignment, readings)
  • Graded as pass/fail (100 pts)
  • Minimum 100 words
  • Submitted as rendered Quarto documents on GitHub
  • Weighted at 20% of the total grade

Capstone Project

  • Data analysis project report with a data set of your choice
  • Graded as number of points out of 100 pts for pre-defined graded elements
  • Submitted as rendered Quarto document on GitHub
  • Weighted at 40% of the total grade

Grading

Conversion from percent to grades.
grade percent
A+ 97
A 93
A- 90
B+ 87
B 83
B- 80
C+ 77
C 73
C- 70
D+ 67
D 63
D- 60
F 0
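The conversion can be sketched in base R. This assumes that each percent value in the table is the lower bound for its grade, which the table does not state explicitly. The helper percent_to_grade() is hypothetical and not part of the course material.

```r
# Lower bounds and grades, copied from the table above (sorted ascending)
lower_bound <- c(0, 60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93, 97)
grade       <- c("F", "D-", "D", "D+", "C-", "C", "C+",
                 "B-", "B", "B+", "A-", "A", "A+")

# Hypothetical helper: look up the grade for one or more percentages
# (assumes percentages are between 0 and 100)
percent_to_grade <- function(percent) {
  grade[findInterval(percent, lower_bound)]
}

percent_to_grade(c(59.9, 60, 85, 93, 100))
```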

Late work policy

  • Due dates are set and all work is due on the stated date
  • Work not submitted by the due date will receive 0 pts
  • The lowest score for each of the assignments or learning reflections is dropped
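How the weights and the drop rule might combine can be sketched in base R. The scores are made up, and the sketch assumes that pass counts as 100 pts and fail as 0 pts, that exactly one lowest homework score and one lowest reflection score are dropped, and that the remaining scores are averaged. The helper mean_drop_lowest() is hypothetical, and the actual calculation used in the course may differ.

```r
# Made-up scores: pass = 100, fail or not submitted = 0
homework    <- c(100, 100, 0, 100, 100)
reflections <- c(100, 0, 100, 100, 100)
capstone    <- 85  # points out of 100

# Hypothetical helper: drop the single lowest score, then average the rest
mean_drop_lowest <- function(scores) {
  mean(sort(scores)[-1])
}

# Weights from the slides: 40% homework, 20% reflections, 40% capstone
total <- 0.40 * mean_drop_lowest(homework) +
  0.20 * mean_drop_lowest(reflections) +
  0.40 * capstone

total
```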

Homework week 2

Homework due dates

  • All material on course website
  • Homework assignment & learning reflection due: 2025-06-13

Thanks! 🌻

Slides created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

Access slides as PDF on GitHub

All material is licensed under Creative Commons Attribution Share Alike 4.0 International.