Streamline Your Data Science Workflow with a Cookiecutter Template and Devcontainers on GPU


Unlock the power of GPU-accelerated data science projects with a customized Cookiecutter template for VSCode.
Introduction
Ever tried running Docker with GPU support and felt like pulling your hair out? Or maybe you set everything up on one machine, tried the same on another, and bam, it just didn’t work? Trust me, I’ve been there. That’s why I created my own take on Cookiecutter Data Science: a template that makes setting up your data science projects (with GPU power!) ridiculously easy, no matter where you’re working. Say goodbye to the setup blues and hello to a smooth ride!
OK, enough chit-chat, nobody sponsors me anyway :D Let’s dive in!
What Is Cookiecutter Data Science?
Cookiecutter Data Science is a standardized project template for data science work. It helps maintain a consistent directory structure and coding practices, which makes collaboration and scaling more manageable.
My customized version builds upon this foundation by integrating:
- Devcontainers with GPU Support: Leverage NVIDIA GPUs directly within your VSCode environment.
- Zsh and Oh-My-Zsh with Agnoster Theme: Enhance your terminal experience for increased productivity.
- Pre-commit Hooks: Automate code quality checks before every commit.
- Python 3.10 and 3.11 Compatibility: Ensuring smooth operation with the latest stable Python versions.
Project Homepage
Find the template and contribute on GitHub.
Before You Start
Requirements
- Python 3.5+
- Cookiecutter Package: Install via pip or conda.

# Using pip
pip install cookiecutter

# Using conda
conda config --add channels conda-forge
conda install cookiecutter
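If you want to confirm the requirements before going further, a quick check along these lines can help. This is only a sketch: the function name `check_prerequisites` is made up for illustration, and it simply looks at the running interpreter version and whether a package named `cookiecutter` can be found.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 5)):
    """Report whether the interpreter and the cookiecutter package look ready."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python_ok": sys.version_info[:2] >= min_version,
        # find_spec returns None when the package cannot be imported.
        "cookiecutter_installed": importlib.util.find_spec("cookiecutter") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run it with the same interpreter you used for the install, otherwise the result describes a different environment.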
Create a New Project
Run the following command to start a new project using the template:
cookiecutter https://github.com/tomcioslav/cookiecutter-data-science
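If you would rather script project creation (say, for a team onboarding script), the same command can be wrapped in Python. The sketch below is an assumption about how you might do that, not part of the template: the helper names are hypothetical, and it relies only on the `--output-dir` and `--no-input` options of the Cookiecutter CLI.

```python
import shutil
import subprocess

TEMPLATE_URL = "https://github.com/tomcioslav/cookiecutter-data-science"


def build_command(output_dir=None, no_input=False):
    """Assemble the cookiecutter CLI call as an argument list."""
    cmd = ["cookiecutter", TEMPLATE_URL]
    if output_dir is not None:
        # Generate the project somewhere other than the current directory.
        cmd += ["--output-dir", str(output_dir)]
    if no_input:
        # Accept the template defaults instead of prompting.
        cmd.append("--no-input")
    return cmd


def create_project(output_dir=None, no_input=False):
    """Run cookiecutter if it is on PATH; return True on success, False otherwise."""
    if shutil.which("cookiecutter") is None:
        return False
    return subprocess.run(build_command(output_dir, no_input)).returncode == 0
```

Calling `create_project()` with no arguments behaves like the interactive command above; passing `no_input=True` skips the prompts and takes whatever defaults the template defines.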
Setting Up GPU Support
Prerequisites
To use GPU support within the devcontainer, you need NVIDIA Docker installed on the host machine.
Installing NVIDIA Docker
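Follow these steps to install NVIDIA Docker on Ubuntu. NVIDIA has since superseded the nvidia-docker2 package with the NVIDIA Container Toolkit, so check NVIDIA’s current installation guide if the packages below are unavailable on your system.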
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Update and install nvidia-docker2
sudo apt-get update
sudo apt-get install -y nvidia-docker2
# Restart Docker daemon
sudo systemctl restart docker
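Once Docker has restarted, it is worth checking that containers can actually see the GPU before you build anything bigger. A minimal sketch, assuming the `docker` CLI is on your PATH and using the CUDA base image this article names for the devcontainer; the helper names are hypothetical, and the sketch does not check whether that image tag is still published.

```python
import shutil
import subprocess

IMAGE = "nvidia/cuda:11.0.3-base-ubuntu20.04"  # base image named in this article


def gpu_smoke_test_command(image=IMAGE):
    """Command that runs nvidia-smi inside a throwaway GPU-enabled container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def run_gpu_smoke_test(image=IMAGE):
    """Return True if nvidia-smi ran in the container, None if Docker is missing."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        gpu_smoke_test_command(image), capture_output=True, text=True
    )
    return result.returncode == 0
```

If `run_gpu_smoke_test()` returns `False`, the problem is most likely on the host (driver or container runtime) rather than in the template.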
Building the Devcontainer
The devcontainer uses nvidia/cuda:11.0.3-base-ubuntu20.04 as the base image. After installing NVIDIA Docker, you can build the devcontainer directly in VSCode.
- Open your project in VSCode.
- When prompted, reopen the project in the devcontainer.
- The container will build automatically, including all specified dependencies.
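Curious what a GPU request looks like in a Compose file? The sketch below builds one as a Python dictionary and prints it for inspection. Treat it as an illustration only: it follows the device reservation syntax from the Compose specification, the service name `dev` is hypothetical, and the template’s actual docker-compose.yml may be written differently.

```python
import json


def gpu_service(image="nvidia/cuda:11.0.3-base-ubuntu20.04", count="all"):
    """Build a Compose service definition that reserves NVIDIA GPUs."""
    return {
        "image": image,
        "deploy": {
            "resources": {
                "reservations": {
                    "devices": [
                        # Ask the NVIDIA runtime for `count` GPUs.
                        {"driver": "nvidia", "count": count, "capabilities": ["gpu"]}
                    ]
                }
            }
        },
    }


# "dev" is a hypothetical service name, not taken from the template.
compose = {"services": {"dev": gpu_service()}}
print(json.dumps(compose, indent=2))
```

Setting `count=1` instead of `"all"` limits the container to a single GPU, which can be handy on shared machines.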
After Building the Devcontainer
Setting Up the Environment
- Python Interpreter: The devcontainer comes with Poetry installed for dependency management. To set up the Python interpreter:
  - Click on the Python version displayed in the bottom-left corner of VSCode.
  - Select the interpreter located in your project’s .venv directory (e.g., /workspaces/your_project/.venv/bin/python).
  - If you don’t see the interpreter, try restarting the devcontainer.
- Adding Packages: Use Poetry to manage packages.

poetry add package_name
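Not sure whether VSCode really picked up the project’s .venv? You can ask Python itself. A small sketch, assuming the virtual environment lives in a `.venv` folder at the project root as described above; `uses_project_venv` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def uses_project_venv(project_root, prefix=None):
    """True if the interpreter prefix is exactly <project_root>/.venv."""
    # sys.prefix points at the active virtual environment, if there is one.
    active = Path(prefix if prefix is not None else sys.prefix).resolve()
    return active == (Path(project_root) / ".venv").resolve()


if __name__ == "__main__":
    print(uses_project_venv(Path.cwd()))
```

Run it from the project root in the VSCode terminal. If it prints `False`, the selected interpreter is not the one Poetry installs packages into.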
Configuring SSH Keys
To enable Git operations via SSH within the devcontainer:
- Copy Your SSH Key: Place your id_rsa file into the project’s .devcontainer directory.

cp ~/.ssh/id_rsa /path_to_devcontainer/.devcontainer/id_rsa

- Set Permissions (inside the container):

chmod 600 /root/.ssh/id_rsa
Note: Ensure you’re aware of the security implications of copying private keys into containers.
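SSH tends to refuse keys that other users can read, so it can be useful to check the permissions programmatically. The sketch below does that on a throwaway file rather than a real key; `key_mode` and `is_private` are hypothetical helper names, and it assumes a POSIX filesystem such as the one inside the container.

```python
import os
import stat
import tempfile
from pathlib import Path


def key_mode(path):
    """Return the permission bits of a file, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def is_private(path):
    """True when group and others have no access to the file."""
    return (key_mode(path) & 0o077) == 0


# Demonstrate on a dummy file; never print or log a real private key.
with tempfile.TemporaryDirectory() as tmp:
    demo_key = Path(tmp) / "id_rsa"
    demo_key.write_text("not a real key")
    os.chmod(demo_key, 0o644)
    before = is_private(demo_key)
    os.chmod(demo_key, 0o600)
    after = is_private(demo_key)

print(before, after)
```

Inside the devcontainer you could call `is_private("/root/.ssh/id_rsa")` to confirm the chmod step took effect.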
Project Structure
Here’s what your project directory will look like:
├── LICENSE
├── README.md               # Project documentation.
├── data                    # Data storage.
├── models                  # Trained models and outputs.
├── notebooks               # Jupyter notebooks.
├── src
│   └── your_project        # Source code.
│       ├── __init__.py
│       └── config.py       # Configuration file.
├── .devcontainer           # Devcontainer configuration.
│   ├── devcontainer.json
│   ├── docker-compose.yml
│   └── Dockerfile
├── Dockerfile              # Base Dockerfile for deployment.
└── docker-compose.yml      # Docker Compose configuration.
- data/: Place your raw and processed data here.
- models/: Store trained models, predictions, and summaries.
- notebooks/: Keep your exploratory and presentation notebooks organized.
- src/: All project-specific source code.
- .devcontainer/: Configuration files for VSCode’s devcontainer feature.
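If you want to make sure a generated project (or a teammate’s checkout) still has the layout described above, a tiny check like this can do the job. It is a sketch, not something shipped with the template: `missing_entries` is a hypothetical helper, and the list of expected folders is taken from the directory descriptions in this section.

```python
import tempfile
from pathlib import Path

# Top-level folders described in this section.
EXPECTED = ["data", "models", "notebooks", "src", ".devcontainer"]


def missing_entries(project_root, expected=EXPECTED):
    """Return the expected folders that are absent from project_root."""
    root = Path(project_root)
    return [name for name in expected if not (root / name).is_dir()]


# Demonstrate on a temporary, deliberately incomplete project.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "src").mkdir()
    demo_missing = missing_entries(tmp)

print(demo_missing)
```

An empty list means every expected folder is present; anything else tells you exactly what to recreate.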
Additional Features
Zsh and Oh-My-Zsh
The template includes Zsh with Oh-My-Zsh and the Agnoster theme for an improved terminal experience.
- Auto-suggestions and Syntax Highlighting: Enabled for better command-line productivity.
- Customizable Prompt: Provides useful information at a glance.
Pre-commit Hooks
Maintain code quality with pre-configured hooks managed by pre-commit.
- Automate Code Checks: Automatically format code and check for issues before every commit.
- Easy Setup: Hooks are installed when you run the init-git Makefile target.
Conclusion
This customized Cookiecutter template streamlines the setup of data science projects, allowing you to focus on analysis and modeling rather than environment configuration. By integrating devcontainers with GPU support, pre-configured tools, and enhanced terminal features, you can significantly boost your productivity.
We welcome contributions and feedback. Feel free to open issues or pull requests on the GitHub repository.
Additional Resources
- Cookiecutter Data Science: Original Template
- VSCode Devcontainers: Documentation
- Pre-commit: Official Website
- Oh-My-Zsh: Official Website
Boost your data science workflow with this powerful and customizable template. Happy coding!