Easydata

A Python framework and git template for data scientists, teams, and workshop organizers, aimed at making your data science work reproducible and shareable.

For most of us, data science is 5% science, 60% data cleaning, and 35% IT hell. Easydata focuses on the 95% by helping you deliver:

  • reproducible python environments,
  • reproducible datasets, and
  • reproducible workflows

In other words, Easydata is a template, library, and workflow that lets you get up and running with your data science analysis, quickly and reproducibly.

What is Easydata?

Easydata is a Python cookiecutter for building custom data science git repos. It provides:

  • An opinionated workflow for collaboration and storytelling,
  • A python framework to support this workflow,
  • A makefile wrapper for conda and pip environment management,
  • A catalog of prebuilt dataset recipes, and
  • A library of training materials and documentation around doing reproducible data science.
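This README does not describe Easydata's actual recipe format or API, so the following is only a minimal, hypothetical sketch of the idea behind a reproducible dataset recipe: record where the raw data comes from together with a checksum it must match, then refuse to trust any local copy whose hash differs. The names `DatasetRecipe`, `file_sha256`, and `verify`, and the example URL, are illustrative assumptions, not part of Easydata.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetRecipe:
    """Hypothetical recipe (not Easydata's API): raw-data source plus expected hash."""
    name: str
    url: str
    sha256: str  # expected hex digest of the raw file


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(recipe: DatasetRecipe, path: Path) -> bool:
    """Return True only if the local file matches the recipe's recorded hash."""
    return file_sha256(path) == recipe.sha256


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.csv"
        raw.write_bytes(b"a,b\n1,2\n")
        # Record the hash once, when the recipe is written (hypothetical URL).
        recipe = DatasetRecipe("toy", "https://example.com/raw.csv", file_sha256(raw))
        print(verify(recipe, raw))  # True: file unchanged
        raw.write_bytes(b"a,b\n1,3\n")
        print(verify(recipe, raw))  # False: file silently changed
```

Pinning a checksum in the recipe is what lets a collaborator tell whether they are working from the same raw data as you, rather than from a file that changed upstream.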

Easydata is not

  • an ETL toolkit,
  • a data analysis pipeline,
  • a containerization solution, or
  • a prescribed data format.

Contributing

The Easydata project is opinionated, but not afraid to be wrong. Best practices change, tools evolve, and lessons are learned. The goal of this project is to make it easier to start, structure, and share your data science work. Pull requests and issues are encouraged. We'd love to hear what works for you, and what doesn't.

If you use the Cookiecutter Easydata Project, link back to this page.

Easydata started life as an opinionated fork of the cookiecutter-datascience project. Easydata has evolved considerably since then with a specific focus on enabling overall team efficiency by improving collaboration and reproducibility. We owe the cookiecutter-datascience project a great debt for the work they have done in creating a flexible but highly useful project template.

Also, a huge thanks to the Cookiecutter project (GitHub), which is helping us all spend less time thinking about and writing boilerplate and more time getting things done.