HW: Capstone Project Setup

Set up your capstone project repository

Learning Objectives

By completing this assignment, you will:

  1. Set up your personal capstone project repository on GitHub
  2. Clone the repository to Posit Cloud
  3. Create a proper directory structure for your project with raw and processed data folders
  4. Upload your dataset to the appropriate folder
  5. Document your data using a comprehensive README template

Assignment Tasks

Create a new repository on GitHub & clone to Posit Cloud

  1. Open the GitHub Organisation for the course: https://github.com/cven5999-ss25

  2. To the right of the field “Find a repository”, click on the green “New” button.

  3. In the “Repository name” field, enter capstone-project-username, replacing username with your GitHub username. Do not use spaces. For example, the user rainbow-train would name the repository capstone-project-rainbow-train.

  4. Make sure the repository is set to Public.

  5. Scroll down on the same page, and click “Create repository”.

  6. In the “Quick setup” field, click the clipboard icon next to the HTTPS URL to copy it.

  7. Open the Posit Cloud workspace for the course: cven5999-ss25

  8. Open the Content page

  9. Click New Project, then select New Project from Git Repository.

  10. Paste the HTTPS URL from GitHub into the “URL of your Git Repository” field.

  11. Leave the box next to “Add packages from the base project” ticked.

  12. Click the OK button

  13. Wait until the project is deployed.

Set up initial directory structure

  1. In your capstone project on Posit Cloud, create the following directory structure:
    • data/ - for storing your datasets
      • data/raw/ - for original, unprocessed datasets
      • data/processed/ - for cleaned and processed datasets
    • docs/ - for documentation and supplementary materials
  2. You can create these directories by:
    • Using the Files panel in RStudio: Click “New Folder” and enter the folder names
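    • Or running these commands in the Console, in this order, because dir.create() does not create missing parent folders by default: dir.create("data"), dir.create("data/raw"), dir.create("data/processed"), and dir.create("docs")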
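
The Console route can also be written as one short script. The following is a minimal sketch, assuming the working directory is the root of your project (the usual state of a freshly opened Posit Cloud project). It relies on recursive = TRUE so that the parent data/ folder is created along with its subfolders.

```r
# Sketch: create the project folders from the Console in one pass.
# Assumption: the working directory is the project root.
folders <- c("data/raw", "data/processed", "docs")

for (folder in folders) {
  # recursive = TRUE also creates the parent "data/" folder;
  # showWarnings = FALSE keeps the call quiet if the folder already exists.
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every folder now exists (one TRUE per folder).
dir.exists(c("data", folders))
```

Because existing folders are left untouched, the script is safe to run more than once.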

Add your dataset

  1. Upload your data file: Add your capstone project dataset to the data/raw/ folder. Your data must be in CSV or Excel format (.csv, .xlsx, or .xls).

  2. Upload methods:

    • Option 1: Use the RStudio Files panel - click “Upload” and select your data file from your computer.
    • Option 2: If your data is available online, download it straight into data/raw/ from the Console, for example with download.file().
  3. File naming: Use descriptive filenames without spaces (use underscores or hyphens instead). For example: household_survey_2023.csv or water-quality-measurements.xlsx
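
For the online option, a download could look roughly like the sketch below. The helpers safe_filename() and download_raw() are hypothetical names introduced here for illustration, not part of the course materials, and the URL in the commented call is a placeholder. The sketch replaces spaces with underscores, in line with the naming advice above, and saves the file in data/raw/.

```r
# Hypothetical helper: turn a name into a filename without spaces.
safe_filename <- function(name) {
  name <- trimws(name)
  gsub("[[:space:]]+", "_", name)
}

# Hypothetical helper: download a file into data/raw/ under a safe name.
download_raw <- function(url, name, raw_dir = "data/raw") {
  dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(raw_dir, safe_filename(name))
  # mode = "wb" keeps binary files such as .xlsx intact on every platform.
  download.file(url, destfile = dest, mode = "wb")
  dest
}

safe_filename("household survey 2023.csv")
# returns "household_survey_2023.csv"

# Example call with a placeholder URL (replace it with the real source of your data):
# download_raw("https://example.org/data.csv", "household survey 2023.csv")
```

Note the original source URL in your README afterwards, so the origin of the raw file stays traceable.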

Create a README for your data folder

  1. Navigate to the Files tab in the bottom right window of RStudio.

  2. Open the data folder you just created.

  3. Click on the “Blank File” button to create a new file.

  4. Select “Text file”.

  5. Enter the name “README.md” and click OK.

  6. Go to: https://raw.githubusercontent.com/rbtl-dev/metadata-readme-template/main/README.md

  7. Copy the content displayed in the browser and paste it into the README.md file.

  8. Complete the template: Fill out every section of the README template with information about your specific dataset. If you do not have the information for a section, say so explicitly in that section rather than leaving it blank:

    • Title: Give your dataset a descriptive title
    • Description: Explain what the data contains and why it was collected
    • Data source: Where did the data come from? Include URLs, citations, or collection details
    • Data collection methods: How was the data gathered?
    • Variable descriptions: Describe each column/variable in your dataset
    • Data files: List and describe each file in your data folder
    • Contact information: Your information as the dataset maintainer
  9. Save the file.
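
As an alternative to copying and pasting in the browser, the template text could be fetched from the Console. This is only a sketch: fetch_readme_template() is a hypothetical helper, the URL is the one given in step 6, and the function refuses to overwrite an existing README unless asked to, so that a file you have already filled in is not lost.

```r
# URL of the template, taken from step 6 above.
template_url <- "https://raw.githubusercontent.com/rbtl-dev/metadata-readme-template/main/README.md"

# Hypothetical helper: copy the template text into a README file.
# It refuses to overwrite an existing file unless overwrite = TRUE.
fetch_readme_template <- function(source = template_url,
                                  dest = "data/README.md",
                                  overwrite = FALSE) {
  if (file.exists(dest) && !overwrite) {
    stop("'", dest, "' already exists; pass overwrite = TRUE to replace it.")
  }
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  lines <- readLines(source, warn = FALSE)  # accepts a URL or a local path
  writeLines(lines, dest)
  invisible(dest)
}

# Uncomment to run inside your project (requires internet access):
# fetch_readme_template()
```

The sections of the template still need to be completed by hand afterwards.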

Commit and push your changes

  1. In the Git panel (usually in the top right), stage all new files by checking the boxes next to them.

  2. Click “Commit” to open the commit window.

  3. Write a meaningful commit message such as “Initial project setup with directory structure and data README”.

  4. In the commit window, click “Commit” to record the commit.

  5. Click “Push” to upload your changes to GitHub.
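
It can be worth checking that nothing is missing from the repository, ideally just before you commit. The sketch below uses a hypothetical helper, check_setup(), that looks for the folders, the data README, and at least one data file in an accepted format. It reflects the tasks listed in this assignment, not an official grading script.

```r
# Hypothetical helper: report which parts of the setup are still missing.
check_setup <- function(root = ".") {
  required_dirs <- c("data/raw", "data/processed", "docs")
  missing <- required_dirs[!dir.exists(file.path(root, required_dirs))]

  if (!file.exists(file.path(root, "data", "README.md"))) {
    missing <- c(missing, "data/README.md")
  }

  # The assignment accepts .csv, .xlsx, or .xls files in data/raw/.
  data_files <- list.files(file.path(root, "data", "raw"),
                           pattern = "\\.(csv|xlsx|xls)$",
                           ignore.case = TRUE)
  if (length(data_files) == 0) {
    missing <- c(missing, "a .csv, .xlsx, or .xls file in data/raw/")
  }

  missing
}

# An empty result, character(0), means nothing is missing.
check_setup()
```

Keep in mind that Git does not track empty folders, so data/processed/ and docs/ will only appear on GitHub once they contain at least one file.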

Final step

Open an issue titled “Questions & Answers for the Capstone Project report” in the GitHub issue tracker of your repository, and tag the course instructor by writing @larnsce in the issue text.

Due Date

This assignment is due: 2025-06-27

Submission

No separate submission is required. Your instructor will check your GitHub repository to verify completion of the setup tasks.