errajibadr/DataEngineeringUnboxed

DataEngineeringUnboxed

Welcome to DataEngineeringUnboxed! This repository is a comprehensive resource for data engineers, covering a wide range of topics, tools, and best practices in the field of data engineering. Our goal is to provide practical, hands-on guidance for both beginners and experienced professionals. For more info, you can visit my blog articles here

Bonus tip: to boost your productivity today, visit DataUnboxed.

🚀 What's Inside

This repository covers various aspects of data engineering, including but not limited to:

  1. AWS Services: Glue, ECS, CloudFormation
  2. Development Environments: Jupyter, VSCode
  3. Data Formats: Parquet
  4. Machine Learning: Project setup and best practices
  5. Infrastructure as Code: CloudFormation templates
  6. Local Development: Running cloud services locally

📚 Repository Structure

Our repository is organized into the following main directories:

  • /aws: AWS-specific guides and resources
  • /MLKickstart_repo: Machine Learning project template and best practices
  • /parquet: Tutorials on Parquet file format
  • /llm_structured_outputs: Examples of structured outputs from language models

🛠 Getting Started

  1. Clone this repository
  2. Explore the directories that interest you most
  3. Follow the README files in each subdirectory for specific instructions

📖 Key Tutorials

AWS Glue Local Development

AWS ECS and Infrastructure

  • Check out the CloudFormation sample templates in the aws/ecs_global infrastructure/ directory for ECS cluster and service setups.

Parquet Optimization

  • Explore the parquet/parquet_encoding_secrets.ipynb notebook for insights into Parquet file optimization. For more info, you can visit my blog article here
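
Much of Parquet's size advantage on repetitive columns comes from dictionary encoding followed by run-length encoding of the dictionary ids. The sketch below is a simplified, standard-library-only illustration of those two ideas; it is not the notebook's code, and real Parquet writers use a hybrid RLE/bit-packing scheme rather than the plain (value, run_length) pairs shown here. The helper names (dictionary_encode, run_length_encode, decode) are hypothetical.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer id, in first-seen order."""
    dictionary = {}
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    # Return the dictionary as a list so each id indexes straight into it.
    return list(dictionary), ids


def run_length_encode(ids):
    """Collapse consecutive repeated ids into (id, run_length) pairs."""
    return [(k, len(list(g))) for k, g in groupby(ids)]


def decode(dictionary, runs):
    """Expand the runs and look each id back up in the dictionary."""
    return [dictionary[i] for i, n in runs for _ in range(n)]


column = ["FR", "FR", "FR", "US", "US", "FR", "DE", "DE", "DE", "DE"]
dictionary, ids = dictionary_encode(column)
runs = run_length_encode(ids)
print(dictionary)  # ['FR', 'US', 'DE']
print(runs)        # [(0, 3), (1, 2), (0, 1), (2, 4)]
assert decode(dictionary, runs) == column
```

Ten strings shrink to three dictionary entries plus four runs. Sorting the column first (here into DE, FR, US order) reduces it to three runs, which is one reason sorting or clustering data before writing Parquet often yields smaller files.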

Machine Learning Project Setup

  • The MLKickstart_repo/ directory contains a template for setting up machine learning projects with best practices.

LLM and Gen AI

  • The llm_structured_outputs/ directory contains a template for making the best use of LLMs. For more details, you can visit my blog article here
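
A core step when working with structured outputs is checking that the model's reply actually matches the schema you asked for before using it downstream. The sketch below shows that idea using only the standard library (json and dataclasses). The Invoice schema and the parse_structured_output helper are hypothetical and are not taken from the llm_structured_outputs/ template, which may use a different library or approach.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    """Hypothetical schema the model was asked to fill in."""
    customer: str
    total: float
    paid: bool


def parse_structured_output(raw: str) -> Invoice:
    """Parse a model's JSON reply and validate it against Invoice's fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    expected = {f.name: f.type for f in fields(Invoice)}
    missing = expected.keys() - data.keys()
    extra = data.keys() - expected.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")

    for name, typ in expected.items():
        value = data[name]
        # JSON has one number type: accept integers for float fields (not bools).
        if typ is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, typ):
            raise TypeError(
                f"{name}: expected {typ.__name__}, got {type(value).__name__}"
            )
        data[name] = value
    return Invoice(**data)


reply = '{"customer": "ACME", "total": 120, "paid": false}'
print(parse_structured_output(reply))
# Invoice(customer='ACME', total=120.0, paid=False)
```

Failing loudly on missing, extra, or mistyped fields makes it straightforward to retry the model call or log the bad reply instead of letting malformed data flow into a pipeline.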

🤝 Contributing

We welcome contributions from the community! If you have knowledge to share:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/new-content)
  3. Add your content or make your changes
  4. Commit your changes (git commit -am 'Add some new content')
  5. Push to the branch (git push origin feature/new-content)
  6. Open a Pull Request

Please ensure your contributions align with our Contribution Guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

  • All the amazing contributors to this project
  • The broader data engineering community for continuous inspiration

Happy Data Engineering!

DataEngineeringUnboxed Team