HW: Project Data Exploration

Read your data and perform initial exploration

Learning Objectives

By completing this assignment, you will:

Create a Quarto document (index.qmd) in your project’s docs folder
Read your dataset into R using appropriate functions
Perform initial data exploration and tidying
Document your process with regular Git commits
Create GitHub issues to track questions and challenges

Assignment Tasks

Version control requirements

Make regular commits throughout your work:

After creating the index.qmd file
After successfully reading in your data
After each major section is completed
Use descriptive commit messages:
- “Add index.qmd to docs folder”
- “Successfully read raw data file”
- “Complete initial data exploration”
- “Document data quality issues”
- “Perform initial data tidying and save processed data”

Create a Quarto document

Open your capstone project in Posit Cloud
Navigate to your docs/ folder in the Files panel
Create a new Quarto document:
- Click File -> New File -> Quarto Document
- Title: “Data Exploration for [Your Project Name]”
- Author: Your name
- Save it as index.qmd in the docs/ folder

Update the YAML header to include:

---
title: "Give your project a title"
author:
  - name: "Your Full Name"
    orcid: "0000-0000-0000-0000"
    email: "your.email@colorado.edu"
    affiliation:
      - name: "University of Colorado Boulder"
        department: "Department of Civil, Environmental and Architectural Engineering"
        city: "Boulder"
        state: "CO"
        country: "USA"
date: today
format: html
editor: visual
---

Set up your document structure

The report must render to HTML without errors and contain at least four chapters at heading level 1:

# Introduction

[Brief description of your project and dataset]

# Methods

## Reading the Data

## Data Exploration Approach

## Initial Data Tidying

# Results

[This will be the core of your analysis with specific requirements]

# Conclusions

## Summary of Findings

## Questions and Next Steps

Read in your data

In the “Methods” chapter under “Reading the Data”, create a code chunk that:
- Loads necessary packages (at minimum tidyverse)
- Reads your data from the data/raw/ folder
- Stores it in an appropriately named object

Example structure:

# Load packages
library(tidyverse)

# Read data
my_data <- read_csv(here::here("data/raw/your_file.csv"))

If you encounter any issues reading the data, document them in this section along with potential solutions
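
When a file does not read cleanly, readr can report which values it could not parse. The following is a minimal sketch, assuming a CSV file. The inline data, the column names (site, date, ph) and the object names raw_data and read_issues are hypothetical stand-ins for your own file.

```r
library(readr)

# Hypothetical inline CSV standing in for a file in data/raw/.
# In your project, replace I(csv_text) with here::here("data/raw/your_file.csv").
csv_text <- "site,date,ph\nA,2024-07-01,7.1\nB,2024-07-02,not measured\n"

# Declare the column types you expect, so surprises surface as parsing problems
raw_data <- read_csv(
  I(csv_text),
  col_types = cols(
    site = col_character(),
    date = col_date(format = "%Y-%m-%d"),
    ph   = col_double()
  )
)

# List the values that could not be parsed
# (here: the text "not measured" in the numeric ph column, which becomes NA)
read_issues <- problems(raw_data)
read_issues
```

The output of problems() is a good starting point for the issues you document here and for a GitHub issue if you need help.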

Explore your data

In the “Results” chapter, add a section “Initial Data Exploration” with code chunks to:

View the first few rows: Use head() or glimpse()
Check dimensions: Use dim() to see number of rows and columns
Summarize the data: Use group_by() and summarize() to compute descriptive statistics
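
The three steps above can be sketched as follows. This is an illustration only: the toy dataset and the names my_data, site, year, ph and ph_summary are hypothetical and should be replaced with your own data and variables.

```r
library(tidyverse)

# Hypothetical toy data standing in for your own dataset
my_data <- tibble(
  site = c("A", "A", "B", "B", "C", "C"),
  year = c(2023, 2024, 2023, 2024, 2023, 2024),
  ph   = c(7.1, 6.9, 7.4, NA, 6.8, 7.0)
)

# View the first few rows and the column types
head(my_data)
glimpse(my_data)

# Number of rows and columns
dim(my_data)

# Descriptive statistics per group; na.rm = TRUE ignores missing values
ph_summary <- my_data |>
  group_by(site) |>
  summarize(
    n_obs     = n(),
    n_missing = sum(is.na(ph)),
    mean_ph   = mean(ph, na.rm = TRUE),
    .groups   = "drop"
  )
ph_summary
```

Counting missing values per group alongside the mean makes it easier to spot data quality issues to address in the tidying step.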

Perform initial tidying

In the “Methods” chapter under “Initial Data Tidying”:

Address at least two data quality issues you identified
Examples of tidying operations:
- Convert character dates to date format
- Standardize inconsistent categories
- Handle obvious data entry errors
- Create new variables if needed
- Use consistent variable naming conventions (e.g., janitor::clean_names() for snake_case)
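
A few of these operations might look like the sketch below. It assumes the janitor package is installed and that dates are stored as day/month/year text. The toy data and all column and object names are hypothetical.

```r
library(tidyverse)

# Hypothetical raw data with typical quality issues:
# inconsistent column names, dates stored as text, inconsistent categories
raw_data <- tibble(
  `Sample ID`       = c(1, 2, 3),
  `Collection Date` = c("03/07/2024", "04/07/2024", "05/07/2024"),
  Site              = c("Boulder ", "boulder", "DENVER")
)

tidied_data <- raw_data |>
  # Convert column names to snake_case
  janitor::clean_names() |>
  mutate(
    # Convert character dates (day/month/year) to Date
    collection_date = lubridate::dmy(collection_date),
    # Standardize categories: trim whitespace, then use title case
    site = str_to_title(str_squish(site))
  )

tidied_data
```

Your own data will likely need different steps. Describe each decision in the text around the code chunk so the tidying is reproducible.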

Save your tidied data:

write_csv(tidied_data, here::here("data/processed/your_file_tidied.csv"))
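
Writing fails if the target folder does not exist yet. The following sketch shows one way to guard against that and to confirm the saved file. The helper save_processed() and the toy data are hypothetical, and a temporary folder is used only so that the example runs anywhere.

```r
library(readr)

# Hypothetical helper: create the target folder if needed, then write the CSV
save_processed <- function(data, path) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  write_csv(data, path)
  invisible(path)
}

# Hypothetical tidied data standing in for your own object
tidied_data <- data.frame(site = c("A", "B"), ph = c(7.1, 6.9))

# A temporary folder keeps this sketch self-contained; in your project, pass
# here::here("data/processed/your_file_tidied.csv") instead
out_path <- file.path(tempdir(), "data", "processed", "your_file_tidied.csv")
save_processed(tidied_data, out_path)

# Read the file back to confirm it was written as expected
check <- read_csv(out_path, show_col_types = FALSE)
check
```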

Create GitHub issues

As you work, create GitHub issues in your repository documenting:
- Questions about your data
- Challenges you encountered
- Decisions you need to make
- Next steps for analysis
Use descriptive titles and provide context in the issue description
Tag your instructor (@larnsce) in at least one issue where you need guidance

Tips

If you’re unsure how to read your specific file type, search for examples or ask in an issue
Remember to render your index.qmd to HTML periodically to check your output

Due Date

This assignment is due: 2025-07-04

Submission

Ensure all your changes are committed and pushed to GitHub
Your repository should now contain:
- docs/index.qmd with your data exploration
- docs/index.html (the rendered output)
- Updated data in data/processed/ folder
- GitHub issues documenting questions/challenges