Welcome to DataEngineeringUnboxed! This repository is a resource for data engineers, covering topics, tools, and best practices in data engineering. Our goal is to provide practical, hands-on guidance for both beginners and experienced professionals. For more information, see the accompanying blog articles.
Bonus tip: to boost your productivity, visit DataUnboxed.
This repository covers various aspects of data engineering, including but not limited to:
- AWS Services: Glue, ECS, CloudFormation
- Development Environments: Jupyter, VSCode
- Data Formats: Parquet
- Machine Learning: Project setup and best practices
- Infrastructure as Code: CloudFormation templates
- Local Development: Running cloud services locally
Our repository is organized into the following main directories:
- /aws: AWS-specific guides and resources
- /MLKickstart_repo: Machine Learning project template and best practices
- /parquet: Tutorials on the Parquet file format
- /llm_structured_outputs: Examples of structured outputs from language models
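As a taste of the kind of topic the Parquet tutorials deal with, here is a minimal plain-Python sketch of two ideas behind Parquet's column encodings: dictionary encoding and run-length encoding. It is an illustration only, not code from this repository and not how any Parquet library is implemented; the function names and the sample column are hypothetical.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> index in `dictionary`
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into (index, run_length) pairs."""
    return [(index, len(list(run))) for index, run in groupby(indices)]


def decode(dictionary, runs):
    """Invert both steps to recover the original column."""
    return [dictionary[index] for index, count in runs for _ in range(count)]


# Hypothetical low-cardinality column (for example, country codes).
column = ["DE", "DE", "DE", "FR", "FR", "DE", "US", "US", "US", "US"]
dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)

assert decode(dictionary, runs) == column  # the round trip is lossless
print(dictionary)
print(runs)
```

Columns with few distinct values and long runs of repeats tend to benefit most from this kind of encoding, which is one reason sorting data before writing can reduce file size.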
- Clone this repository
- Explore the directories that interest you most
- Follow the README files in each subdirectory for specific instructions
- Check out the CloudFormation sample templates in the aws/ecs_global infrastructure/ directory for ECS cluster and service setups.
- Explore the parquet/parquet_encoding_secrets.ipynb notebook for insights on Parquet file optimization. For more information, see the accompanying blog article.
- The MLKickstart_repo/ directory contains a template for setting up machine learning projects with best practices.
- The llm_structured_outputs/ directory contains a template for making the best use of LLMs. For more details, see the accompanying blog article.
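To give a rough idea of what "structured outputs" means in practice, the sketch below parses a model response as JSON and validates it against a small schema using only the standard library. It is an assumption-laden illustration, not the template in llm_structured_outputs/: the Invoice schema, the parse_invoice helper, and the sample response are all hypothetical, and no LLM is actually called.

```python
import json
from dataclasses import dataclass


@dataclass
class Invoice:
    # Hypothetical schema, invented for this illustration.
    vendor: str
    total: float
    currency: str


def parse_invoice(raw_text):
    """Parse model output as JSON and check it against the Invoice fields."""
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    expected = {"vendor": str, "total": (int, float), "currency": str}
    missing = expected.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    for name, types in expected.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(data[name], bool) or not isinstance(data[name], types):
            raise ValueError(f"field {name!r} has the wrong type")

    return Invoice(data["vendor"], float(data["total"]), data["currency"])


# Stand-in for a model response; a real pipeline would get this from an LLM.
raw = '{"vendor": "Acme", "total": 125.5, "currency": "EUR"}'
invoice = parse_invoice(raw)
print(invoice)
```

Validating at the boundary like this means malformed or incomplete responses fail loudly with a ValueError, so the caller can retry or log the response instead of passing bad data downstream.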
We welcome contributions from the community! If you have knowledge to share:
- Fork the repository
- Create a new branch (git checkout -b feature/new-content)
- Add your content or make your changes
- Commit your changes (git commit -am 'Add some new content')
- Push to the branch (git push origin feature/new-content)
- Open a Pull Request
Please ensure your contributions align with our Contribution Guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- All the amazing contributors to this project
- The broader data engineering community for continuous inspiration
Happy Data Engineering! DataEngineeringUnboxed Team