HW: Project Data Exploration
Read your data and perform initial exploration
Learning Objectives
By completing this assignment, you will:
- Create a Quarto document (`index.qmd`) in your project’s `docs/` folder
- Read your dataset into R using appropriate functions
- Perform initial data exploration and tidying
- Document your process with regular Git commits
- Create GitHub issues to track questions and challenges
Assignment Tasks
Version control requirements
Make regular commits throughout your work:
- After creating the `index.qmd` file
- After successfully reading in your data
- After each major section is completed
- Use descriptive commit messages, for example:
  - “Add index.qmd to docs folder”
  - “Successfully read raw data file”
  - “Complete initial data exploration”
  - “Document data quality issues”
  - “Perform initial data tidying and save processed data”
Create a Quarto document
Open your capstone project in Posit Cloud
Navigate to your `docs/` folder in the Files panel
Create a new Quarto document:
- Click File -> New File -> Quarto Document
- Title: “Data Exploration for [Your Project Name]”
- Author: Your name
- Save it as `index.qmd` in the `docs/` folder
Update the YAML header to include:
```yaml
---
title: "Give your project a title"
author:
  - name: "Your Full Name"
    orcid: "0000-0000-0000-0000"
    email: "your.email@colorado.edu"
    affiliation:
      - name: "University of Colorado Boulder"
        department: "Department of Civil, Environmental and Architectural Engineering"
        city: "Boulder"
        state: "CO"
        country: "USA"
date: today
format: html
editor: visual
---
```
Set up your document structure
The report must render to HTML without errors and contain at least four level 1 headings (chapters):
```markdown
# Introduction

[Brief description of your project and dataset]

# Methods

## Reading the Data

## Data Exploration Approach

## Initial Data Tidying

# Results

[This will be the core of your analysis with specific requirements]

# Conclusions

## Summary of Findings

## Questions and Next Steps
```
Read in your data
In the “Methods” chapter under “Reading the Data”, create a code chunk that:
- Loads necessary packages (at minimum `tidyverse`)
- Reads your data from the `data/raw/` folder
- Stores it in an appropriately named object
Example structure:
```r
# Load packages
library(tidyverse)

# Read data
my_data <- read_csv(here::here("data/raw/your_file.csv"))
```
If you encounter any issues reading the data, document them along with potential solutions.
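Reading problems often come from placeholder codes for missing values or from values that do not match the column type readr guessed. The sketch below is illustrative only: it writes a small made-up CSV to a temporary file so it runs anywhere, and it assumes (hypothetically) that `-999` is the dataset's missing-value code. `read_csv()` takes an `na` argument for such codes, and `problems()` lists any values readr could not parse.

```r
library(readr)

# Hypothetical example file written to a temp location so the sketch runs anywhere;
# in your project you would read from here::here("data/raw/...") instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site,date,value",
  "A,2024-05-01,3.2",
  "B,2024-05-02,NA",
  "C,2024-05-03,-999"
), csv_path)

# Treat empty strings, "NA", and the hypothetical placeholder -999 as missing
my_data <- read_csv(csv_path, na = c("", "NA", "-999"), show_col_types = FALSE)

# Rows/columns readr could not parse as the guessed type (zero rows here)
parse_issues <- problems(my_data)
parse_issues
```

Whatever `problems()` reports for your own file is a good starting point for the "document them along with potential solutions" step.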
Explore your data
In the “Results” chapter, under a level 2 heading “Initial Data Exploration”, add code chunks to:
- View the first few rows: use `head()` or `glimpse()`
- Check dimensions: use `dim()` to see the number of rows and columns
- Summarize the data: use `group_by()` and `summarize()` to compute descriptive statistics
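Applied to a small made-up tibble standing in for your own data object, the three exploration steps might look roughly like this. The columns `site` and `value` are hypothetical; replace them with your own grouping and numeric variables.

```r
library(dplyr)

# Hypothetical stand-in for your own data object (e.g. my_data from read_csv())
my_data <- tibble(
  site  = c("A", "A", "B", "B", "B"),
  value = c(3.2, 4.1, 2.5, NA, 5.0)
)

head(my_data, n = 3)  # first three rows
glimpse(my_data)      # column names, types, and first values
dim(my_data)          # number of rows and columns

# Descriptive statistics per group; na.rm = TRUE skips missing values
site_summary <- my_data |>
  group_by(site) |>
  summarize(
    n_obs      = n(),
    n_missing  = sum(is.na(value)),
    mean_value = mean(value, na.rm = TRUE),
    sd_value   = sd(value, na.rm = TRUE)
  )
site_summary
```

Counting missing values per group alongside the mean is a quick way to spot data quality issues to address in the tidying step.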
Perform initial tidying
In the “Methods” chapter under “Initial Data Tidying”:
Address at least 2-3 data quality issues you identified
Examples of tidying operations:
- Convert character dates to date format
- Standardize inconsistent categories
- Handle obvious data entry errors
- Create new variables if needed
- Use consistent variable naming conventions (e.g., `janitor::clean_names()` for the snake_case convention)
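As a rough illustration of several of these operations in one pipeline, the sketch below uses a made-up tibble whose column names, category labels, and `-999` error code are all hypothetical. It assumes the dates are stored as ISO `YYYY-MM-DD` text; other formats need a different conversion (for example a `lubridate` parser).

```r
library(dplyr)
library(stringr)
library(janitor)

# Hypothetical raw data showing common issues:
# messy column names, dates stored as text, inconsistent category labels,
# and a placeholder value (-999) that is really a data entry error
raw_data <- tibble(
  `Site Name`   = c("Creek A", "creek a ", "Creek B", "Creek B"),
  `Sample Date` = c("2024-05-01", "2024-05-02", "2024-05-03", "2024-05-04"),
  Value         = c(3.2, -999, 2.5, 4.8)
)

tidied_data <- raw_data |>
  # snake_case column names: site_name, sample_date, value
  clean_names() |>
  mutate(
    # trim whitespace and use title case so "creek a " matches "Creek A"
    site_name = str_to_title(str_trim(site_name)),
    # convert ISO-formatted character dates to the Date class
    sample_date = as.Date(sample_date),
    # treat the hypothetical placeholder -999 as missing
    value = if_else(value == -999, NA_real_, value),
    # new variable derived from an existing one
    sample_month = format(sample_date, "%Y-%m")
  )
tidied_data
```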
Save your tidied data:
```r
write_csv(tidied_data, here::here("data/processed/your_file_tidied.csv"))
```
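If the `data/processed/` folder does not exist yet, `write_csv()` cannot create it. A minimal sketch for creating the folder and confirming the saved file reads back correctly is shown below. It uses a temporary folder as a stand-in for `here::here("data/processed")` and a small hypothetical `tidied_data` object so it runs on its own.

```r
library(readr)
library(tibble)

# Hypothetical tidied data; in your project this is the object from the tidying step
tidied_data <- tibble(site_name = c("Creek A", "Creek B"), value = c(3.2, 2.5))

# Stand-in for here::here("data/processed"), using a temp folder so the sketch runs anywhere
processed_dir <- file.path(tempdir(), "data", "processed")
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

out_path <- file.path(processed_dir, "your_file_tidied.csv")
write_csv(tidied_data, out_path)

# Read the file back to confirm it was written as expected
check <- read_csv(out_path, show_col_types = FALSE)
check
```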
Create GitHub issues
As you work, create GitHub issues in your repository documenting:
- Questions about your data
- Challenges you encountered
- Decisions you need to make
- Next steps for analysis
Use descriptive titles and provide context in the issue description
Tag your instructor (@larnsce) in at least one issue where you need guidance
Tips
- If you’re unsure how to read your specific file type, search for examples or ask in an issue
- Remember to render your `index.qmd` to HTML periodically to check your output
Due Date
This assignment is due: 2025-07-04
Submission
Ensure all your changes are committed and pushed to GitHub
Your repository should now contain:
- `docs/index.qmd` with your data exploration
- `docs/index.html` (the rendered output)
- Updated data in the `data/processed/` folder
- GitHub issues documenting questions and challenges