Streamline Your Data Science Workflow with a Cookiecutter Template and Devcontainers on GPU

Nov 14, 2024 · Tomasz Juszczyszyn · 4 min read

Unlock the power of GPU-accelerated data science projects with a customized Cookiecutter template for VSCode.

Introduction

Ever tried running Docker with GPU support and felt like pulling your hair out? Or maybe you set everything up on one machine, tried the same on another, and bam, it just didn’t work? Trust me, I’ve been there. That’s why I created my own version of Cookiecutter Data Science: a template that makes setting up your data science projects (with GPU power!) ridiculously easy, no matter where you’re working. Say goodbye to the setup blues and hello to a smooth ride!

Ok enough chit-chat, nobody sponsors me anyway :D Let’s dive in!

What Is Cookiecutter Data Science?

Cookiecutter Data Science is a standardized project template for data science work. It helps maintain a consistent directory structure and coding practices, which makes collaboration and scaling more manageable.

My customized version builds upon this foundation by integrating:

  • Devcontainers with GPU Support: Leverage NVIDIA GPUs directly within your VSCode environment.
  • Zsh and Oh-My-Zsh with Agnoster Theme: Enhance your terminal experience for increased productivity.
  • Pre-commit Hooks: Automate code quality checks before every commit.
  • Python 3.10 and 3.11 Compatibility: Works with both Python 3.10 and 3.11.

Project Homepage

Find the template and contribute on GitHub: https://github.com/tomcioslav/cookiecutter-data-science

Before You Start

Requirements

  • Python 3.5+

  • Cookiecutter Package: Install via pip or conda.

    # Using pip
    pip install cookiecutter
    
    # Using conda
    conda config --add channels conda-forge
    conda install cookiecutter
    

Create a New Project

Run the following command to start a new project using the template:

cookiecutter https://github.com/tomcioslav/cookiecutter-data-science
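
For scripted or repeatable setups, the same template can be generated without the interactive prompts. The sketch below is a minimal example, assuming the template exposes a `project_name` variable (a guess based on common Cookiecutter templates; check the template's `cookiecutter.json` for the real names). The `new_project` helper and the `DRY_RUN` switch are hypothetical conveniences, not part of the template.

```shell
TEMPLATE_URL="https://github.com/tomcioslav/cookiecutter-data-science"

# new_project NAME [OUTPUT_DIR]
# Generates a project without prompts. With DRY_RUN=1 it only prints the
# command it would run. "project_name" is an assumed template variable.
new_project() {
  name="$1"
  out_dir="${2:-.}"
  if [ -z "$name" ]; then
    echo "usage: new_project NAME [OUTPUT_DIR]" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "cookiecutter --no-input -o $out_dir $TEMPLATE_URL project_name=$name"
    return 0
  fi
  # --no-input takes defaults from cookiecutter.json; key=value pairs override them.
  cookiecutter --no-input -o "$out_dir" "$TEMPLATE_URL" "project_name=$name"
}
```

Running `DRY_RUN=1 new_project my_analysis ~/projects` first is a cheap way to see the exact command before anything is created.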

Setting Up GPU Support

Prerequisites

To use GPU support within the devcontainer, you need NVIDIA Docker (the nvidia-docker2 package) installed on the host, along with a working NVIDIA driver.

Installing NVIDIA Docker

Follow these steps to install NVIDIA Docker on Ubuntu:

# Detect the distribution identifier (for example, ubuntu20.04)
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Update and install nvidia-docker2
sudo apt-get update
sudo apt-get install -y nvidia-docker2

# Restart Docker daemon
sudo systemctl restart docker
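
Once Docker has restarted, it is worth confirming that the NVIDIA runtime is actually registered before building anything. The sketch below assumes that `docker info` prints a `Runtimes:` line that lists `nvidia`, which is what a working nvidia-docker2 install normally produces. The helper names `has_nvidia_runtime` and `gpu_smoke_test` are made up for this example, and the default image is the `nvidia/cuda:11.0.3-base-ubuntu20.04` image that the devcontainer uses as its base.

```shell
# has_nvidia_runtime: reads `docker info` output on stdin and succeeds
# if the "Runtimes:" line mentions the nvidia runtime.
has_nvidia_runtime() {
  grep -i 'runtimes:' | grep -qw 'nvidia'
}

# gpu_smoke_test [IMAGE]: runs nvidia-smi in a throwaway container.
# Defaults to the CUDA base image used by the devcontainer.
gpu_smoke_test() {
  image="${1:-nvidia/cuda:11.0.3-base-ubuntu20.04}"
  docker run --rm --gpus all "$image" nvidia-smi
}

# Example (requires Docker and an NVIDIA driver on the host):
#   docker info 2>/dev/null | has_nvidia_runtime && gpu_smoke_test
```

If the smoke test prints the usual `nvidia-smi` table, the host side is ready and any remaining problems are likely in the devcontainer configuration itself.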

Building the Devcontainer

The devcontainer uses nvidia/cuda:11.0.3-base-ubuntu20.04 as the base image. After installing NVIDIA Docker, you can build the devcontainer directly in VSCode.

  1. Open your project in VSCode.
  2. When prompted, reopen the project in the devcontainer.
  3. The container will build automatically, including all specified dependencies.
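
After the container has built, a quick check from the integrated terminal confirms that the GPU is visible inside it. This is a minimal sketch: `count_gpus` is a hypothetical helper that counts the devices reported by `nvidia-smi -L`, and it assumes the NVIDIA runtime exposes `nvidia-smi` inside the container.

```shell
# count_gpus: reads `nvidia-smi -L` output on stdin and prints the number
# of listed devices (lines that start with "GPU <index>:").
count_gpus() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Inside the devcontainer terminal:
#   nvidia-smi -L | count_gpus
```

A result of `0` usually points back at the host setup (driver or NVIDIA Docker) rather than at the project code.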

After Building the Devcontainer

Setting Up the Environment

  • Python Interpreter: The devcontainer comes with Poetry installed for dependency management. To set up the Python interpreter:

    1. Click on the Python version displayed in the status bar at the bottom of VSCode, or run "Python: Select Interpreter" from the Command Palette.
    2. Select the interpreter located in your project’s .venv directory (e.g., /workspaces/your_project/.venv/bin/python).
    3. If you don’t see the interpreter, try restarting the devcontainer.
  • Adding Packages: Use Poetry to manage packages.

    poetry add package_name
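
If the interpreter does not show up in the picker, it helps to check from the terminal whether the virtual environment exists where VSCode expects it. The helper below is a sketch, assuming the environment lives in `.venv` at the project root as described above; `find_venv_python` is a hypothetical name.

```shell
# find_venv_python [PROJECT_DIR]
# Prints the path to the project's .venv interpreter if it exists and is
# executable; returns non-zero otherwise.
find_venv_python() {
  project_dir="${1:-.}"
  candidate="$project_dir/.venv/bin/python"
  if [ -x "$candidate" ]; then
    echo "$candidate"
  else
    echo "no interpreter at $candidate" >&2
    return 1
  fi
}

# Example, from the project root inside the devcontainer:
#   find_venv_python . || poetry install
```

`poetry env info --path` prints the environment Poetry itself is using, which should point at the same `.venv` directory.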
    

Configuring SSH Keys

To enable Git operations via SSH within the devcontainer:

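  1. Copy Your SSH Key: On the host, place your id_rsa file into the project’s .devcontainer directory.

    cp ~/.ssh/id_rsa /path_to_devcontainer/.devcontainer/id_rsa
    
  2. Set Permissions: Inside the container, where the key lives at /root/.ssh/id_rsa, restrict it to its owner.

    chmod 600 /root/.ssh/id_rsa
    

    Note: Be aware of the security implications of copying private keys into containers: anyone with access to the container can read the key, so never commit it to the repository.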
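
SSH refuses private keys that are readable by other users, so a permissions check after copying the key can save a confusing `git` failure. A minimal sketch, assuming GNU `stat` (as found in the Ubuntu-based container); `key_perms_ok` is a hypothetical helper.

```shell
# key_perms_ok KEY_FILE
# Succeeds only if the file exists and its mode is exactly 600.
key_perms_ok() {
  key_file="$1"
  [ -f "$key_file" ] || return 1
  mode="$(stat -c '%a' "$key_file")"   # GNU stat: octal permission bits
  [ "$mode" = "600" ]
}

# Example, inside the devcontainer:
#   key_perms_ok /root/.ssh/id_rsa || chmod 600 /root/.ssh/id_rsa
#   ssh -T git@github.com   # GitHub replies with a greeting if the key is accepted
```

As an alternative to copying the key at all, the VSCode Dev Containers extension can forward a running local SSH agent into the container, which keeps the private key on the host.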

Project Structure

Here’s what your project directory will look like:

├── LICENSE
├── README.md              # Project documentation.
├── data                   # Data storage.
├── models                 # Trained models and outputs.
├── notebooks              # Jupyter notebooks.
├── src
│   └── your_project       # Source code.
│       ├── __init__.py
│       └── config.py      # Configuration file.
├── .devcontainer          # Devcontainer configuration.
│   ├── devcontainer.json
│   ├── docker-compose.yml
│   └── Dockerfile
├── Dockerfile             # Base Dockerfile for deployment.
├── docker-compose.yml     # Docker Compose configuration.

  • data/: Place your raw and processed data here.
  • models/: Store trained models, predictions, and summaries.
  • notebooks/: Keep your exploratory and presentation notebooks organized.
  • src/: All project-specific source code.
  • .devcontainer/: Configuration files for VSCode’s devcontainer feature.

Additional Features

Zsh and Oh-My-Zsh

The template includes Zsh with Oh-My-Zsh and the Agnoster theme for an improved terminal experience.

  • Auto-suggestions and Syntax Highlighting: Enabled for better command-line productivity.
  • Customizable Prompt: Provides useful information at a glance.

Pre-commit Hooks

Maintain code quality with pre-configured hooks managed by pre-commit.

  • Automate Code Checks: Automatically format code and check for issues before every commit.
  • Easy Setup: Hooks are installed when you run the init-git Makefile target.
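
To confirm the hooks are in place after running the `init-git` target, one can look for the hook script that pre-commit writes into `.git/hooks`. This is a sketch under that assumption; `hooks_installed` is a hypothetical helper, not part of the template.

```shell
# hooks_installed [REPO_DIR]
# Succeeds if .git/hooks/pre-commit exists and mentions pre-commit,
# which is how the script written by `pre-commit install` identifies itself.
hooks_installed() {
  hook="${1:-.}/.git/hooks/pre-commit"
  [ -f "$hook" ] && grep -q 'pre-commit' "$hook"
}

# Example, from the project root:
#   hooks_installed . || pre-commit install
#   pre-commit run --all-files   # run every hook once against the whole repo
```

Running the hooks against all files once right after setup surfaces formatting issues up front instead of on the first real commit.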

Conclusion

This customized Cookiecutter template streamlines the setup of data science projects, allowing you to focus on analysis and modeling rather than environment configuration. By integrating devcontainers with GPU support, pre-configured tools, and enhanced terminal features, you can significantly boost your productivity.

We welcome contributions and feedback. Feel free to open issues or pull requests on the GitHub repository.


Boost your data science workflow with this powerful and customizable template. Happy coding!