HW: Capstone Project Setup
Set up your capstone project repository
Learning Objectives
By completing this assignment, you will:
- Set up your personal capstone project repository on GitHub
- Clone the repository to Posit Cloud
- Create a proper directory structure for your project with raw and processed data folders
- Upload your dataset to the appropriate folder
- Document your data using a comprehensive README template
Assignment Tasks
Create a new repository on GitHub & clone to Posit Cloud
Open the GitHub Organisation for the course: https://github.com/cven5999-ss25
To the right of the field “Find a repository”, click on the green “New” button.
In the “Repository name” field, write capstone-project-username, replacing username with your GitHub username. Avoid using spaces. For example: capstone-project-rainbow-train for the user with the username rainbow-train.
Make sure the repository is set to Public.
Scroll down on the same page, and click “Create repository”.
In the “Quick setup” field, click on the clipboard icon next to the HTTPS URL to copy it.
Open the Posit Cloud workspace for the course: cven5999-ss25
Open the Content page
Click on New Project -> New Project from Git Repository
Paste the HTTPS URL from GitHub into the “URL of your Git Repository” field.
Keep the tick next to Add packages from the base project
Click the OK button
Wait until the project is deployed.
Set up initial directory structure
- In your capstone project on Posit Cloud, create the following directory structure:
  - data/ - for storing your datasets
    - data/raw/ - for original, unprocessed datasets
    - data/processed/ - for cleaned and processed datasets
  - docs/ - for documentation and supplementary materials
- You can create these directories by:
- Using the Files panel in RStudio: Click “New Folder” and enter the folder names
- Or using the Console with: dir.create("data"), dir.create("data/raw"), dir.create("data/processed"), and dir.create("docs")
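The four console calls above can also be written as a short loop. This is a minimal sketch, not a required approach: it assumes the working directory is the root of your capstone project, and it relies on the recursive argument of dir.create() so that data/ is created together with its subfolders.

```r
# Folders named in the assignment; data/raw and data/processed sit inside data/.
dirs <- c("data/raw", "data/processed", "docs")

for (d in dirs) {
  # recursive = TRUE also creates the parent data/ folder if it is missing;
  # showWarnings = FALSE keeps the call quiet when a folder already exists.
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every folder, including data/, now exists.
all(dir.exists(c("data", dirs)))
```

Git does not track empty folders, so data/processed/ and docs/ will not show up in the Git panel until they contain at least one file.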
Add your dataset
Upload your data file: Add your capstone project dataset to the data/raw/ folder. Your data must be in CSV or Excel format (.csv, .xlsx, or .xls).
Upload methods:
- Option 1: Use the RStudio Files panel - click “Upload” and select your data file from your computer
- Option 2: If your data is available online, download it directly using R commands in the console
File naming: Use descriptive filenames without spaces (use underscores or hyphens instead). For example: household_survey_2023.csv or water-quality-measurements.xlsx
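For Option 2, the download and the file naming rules can be combined in a small helper. The sketch below is illustrative only: the function names is_valid_data_filename() and download_raw_data() are made up for this example, and the URL in the final comment is a placeholder, not a real data source.

```r
# Check a filename against the assignment rules: no spaces, and a
# .csv, .xlsx, or .xls extension (compared case-insensitively).
is_valid_data_filename <- function(filename) {
  has_no_space <- !grepl(" ", filename, fixed = TRUE)
  ext <- tolower(tools::file_ext(filename))
  has_no_space && ext %in% c("csv", "xlsx", "xls")
}

# Download a file into data/raw/ after validating its name.
# mode = "wb" writes the file in binary mode, which keeps Excel files intact.
download_raw_data <- function(url, filename) {
  stopifnot(is_valid_data_filename(filename))
  dir.create("data/raw", recursive = TRUE, showWarnings = FALSE)
  destfile <- file.path("data/raw", filename)
  download.file(url, destfile = destfile, mode = "wb")
  invisible(destfile)
}

is_valid_data_filename("household_survey_2023.csv")  # TRUE
is_valid_data_filename("household survey 2023.csv")  # FALSE (contains spaces)

# Placeholder URL, shown only to illustrate the call:
# download_raw_data("https://example.org/household_survey_2023.csv",
#                   "household_survey_2023.csv")
```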
Create a README for your data folder
Navigate to the Files tab in the bottom right window of RStudio.
Open the data folder you just created.
Click on the “Blank File” button to create a new file.
Select “Text file”.
Enter the name “README.md” and click OK.
Go to: https://raw.githubusercontent.com/rbtl-dev/metadata-readme-template/main/README.md
Copy the content displayed in the browser and paste it into the README.md file.
Complete the template: Fill out all sections of the README template with information about your specific dataset (if you don’t have the information for some sections, state it):
- Title: Give your dataset a descriptive title
- Description: Explain what the data contains and why it was collected
- Data source: Where did the data come from? Include URLs, citations, or collection details
- Data collection methods: How was the data gathered?
- Variable descriptions: Describe each column/variable in your dataset
- Data files: List and describe each file in your data folder
- Contact information: Your information as the dataset maintainer
Save the file.
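For the “Variable descriptions” section, a starting list of column names and types can be produced from the data itself. The sketch below is one possible approach, not part of the template: list_variables() is a helper invented for this example, the small data frame is made up, and the filename in the final comment is hypothetical. The descriptions still have to be written by hand.

```r
# List each column of a data frame with its R type, one row per variable.
# The description column is left empty, to be filled in manually.
list_variables <- function(df) {
  data.frame(
    variable = names(df),
    type = unname(vapply(df, function(col) class(col)[1], character(1))),
    description = rep("", ncol(df)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Small made-up data frame standing in for a real dataset.
example <- data.frame(
  household_id = c(1L, 2L),
  district = c("north", "south"),
  water_use_l = c(120.5, 98.0)
)

list_variables(example)

# With a CSV file in data/raw/ (hypothetical filename):
# list_variables(read.csv("data/raw/household_survey_2023.csv"))
```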
Commit and push your changes
In the Git panel (usually in the top right), stage all new files by checking the boxes next to them.
Click “Commit” to open the commit window.
Write a meaningful commit message such as “Initial project setup with directory structure and data README”.
Click “Commit” in the commit window to record the commit.
Click “Push” to upload your changes to GitHub.
Final step
Open an issue titled “Questions & Answers for the Capstone Project report” on the GitHub issue tracker of your repository and tag the course instructor by writing the @ sign followed by the username larnsce (@larnsce).
Due Date
This assignment is due: 2025-06-27
Submission
No separate submission is required. Your instructor will check your GitHub repository to verify completion of the setup tasks.
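Before pushing, you may want to check the setup yourself. The sketch below is a self-check under stated assumptions: the assignment does not describe how the instructor verifies completion, so the items tested here are simply the folders and files named in the tasks above, and check_setup() is a helper invented for this example.

```r
# Return a named logical vector with one entry per setup item.
# root is the project folder; "." means the current working directory.
check_setup <- function(root = ".") {
  raw_files <- list.files(file.path(root, "data", "raw"))
  # TRUE if at least one file in data/raw/ has an accepted extension.
  has_data <- any(grepl("\\.(csv|xlsx|xls)$", raw_files, ignore.case = TRUE))
  c(
    data_dir       = dir.exists(file.path(root, "data")),
    raw_dir        = dir.exists(file.path(root, "data", "raw")),
    processed_dir  = dir.exists(file.path(root, "data", "processed")),
    docs_dir       = dir.exists(file.path(root, "docs")),
    data_readme    = file.exists(file.path(root, "data", "README.md")),
    dataset_in_raw = has_data
  )
}

check_setup()
```

Any FALSE entry points to a task that is still open. The check covers only the local files; whether the commit was pushed has to be confirmed on GitHub.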