- Data Architecture articles
- Data Ingestion / Data Onboarding / ETL / ELT
- Reverse ETL
- Data Collection / Product Analytics / Customer Data
- Transformation / Preparation / Cleaning / Wrangling
- SQL Tools / Editors
- SQL Engines
- BI / Reporting / Data Visualization
- Data Quality / Profiling / Observability
- Data Management / Lineage / Catalog / Governance
- DataOps / Data Fabric
- Orchestration / Workflow
- Storage / Database
- Data Privacy / Security / Identity
- Others
This list does not cover file-system storage or traditional databases, nor, for now, data science, virtualization, or streaming tools. It also leaves out the embedded tools and services offered by the three main public Cloud providers (Google Cloud, Microsoft Azure, and AWS).
Data Architecture articles
- Emerging Architectures for Modern Data Infrastructure
- The Modern Data Stack: Past, Present, and Future
- Data Mesh Principles and Logical Architecture and a Data Warehouse comparison
- The Building Blocks of a Modern Data Platform
- Two steps towards a modern data platform
- What your data team is using: The analytics stack
- The Top 20 Most Commonly Used Data Engineering Tools
- Data Stacks For Fun & Nonprofit
- The Future of Business Intelligence is Open Source
- What is Data Observability?
Data Ingestion / Data Onboarding / ETL / ELT
- Flatfile Data Onboarding platform
- Fivetran Cloud data integration platform
- Matillion Cloud data integration platform
- Apache Gobblin Open Source distributed data integration framework
- Singer "Open Source standard for writing scripts that move data"
- Meltano Open Source ELT for DataOps
- Airbyte Open Source data integration platform
- Stitch Simple, extensible Cloud ETL platform (Talend)
- Hevo No-code data pipeline as a service
- Apache Hop Open Source data integration platform project
- Meroxa Real-time data ingestion infrastructure
- Portable Cloud Hosted ELT Platform
- Talend, StreamSets, Alooma (Google), Xplenty, Striim, Panoply, Stambia, HVR
Reverse ETL
- Census Operational analytics platform, move data from data warehouse to apps
- Hightouch Sync customer data to SaaS business platforms
- Grouparoo Open Source framework to move data between databases and Cloud apps
Data Collection / Product Analytics / Customer Data
- Segment Customer data platform (CDP) (Twilio)
- RudderStack Customer data pipeline, event tracking
- Snowplow Data collection platform
- Freshpaint Collect, control, and deliver customer data
- PostHog Open Source Product Analytics platform
- Amplitude Product Analytics platform
- Iteratively Product Analytics platform "Capture customer data you trust"
- Avo Product Analytics platform
- Mixpanel Product analytics platform
- Indicative Product analytics platform
- Heap Product analytics platform
- Supermetrics Get marketing data for reporting, analytics and storage
Transformation / Preparation / Cleaning / Wrangling
- Trifacta Data Wrangling for Cloud (or Hadoop) platforms and storages
- dbt Transform with SQL from the command line (Open Source) or Cloud
- Dataform Collaboration on SQL pipelines in Cloud data warehouses (Google)
- Pano Open Source data preparation for Cloud data warehouses
- Rasgo Data preparation for Data Scientists
- Mito JupyterLab extension to generate pandas Python code from a spreadsheet
- DataPrep Prepare data in Python
- OpenRefine "A free, open source, powerful tool for working with messy data"
SQL Tools / Editors
- Count "The BI notebook built for analysts"
- PopSQL "Modern SQL editor"
- DataGrip IDE for SQL (JetBrains)
- DBeaver Free (or Enterprise and Cloud editions) universal database tool
- sq "swiss-army knife for data", SQL on the command line for relational data
- SqlDBM Develop Database Models
- Querybook Open Source SQL query and Big Data IDE via a notebook interface
- Soda SQL Data testing, monitoring, and profiling for SQL-accessible data
- SQLFluff SQL Linting and Auto-formatting for Humans
SQL Engines
- Trino Open Source high-performance, distributed SQL query engine (formerly PrestoSQL)
- Starburst Cloud or On-premises SQL engine (based on Trino)
- AWS Athena Interactive SQL query service for Amazon S3 (based on Presto)
- DataFusion Query execution engine using Apache Arrow as its in-memory format
BI / Reporting / Data Visualization
- Metabase Open Source business intelligence tool
- Apache Superset Open Source modern data exploration and visualization platform
- Apache ECharts Open Source JavaScript Visualization Library
- Cube.js Open Source Analytical API platform
- Grafana Open Source analytics & monitoring solution
- Looker BI and Analytics Platform (Google)
- Redash Data visualisation and Dashboarding with SQL (Databricks)
- Mode Collaborative data platform that combines SQL, R, Python, and visual analytics
- Sigma Cloud analytics solution
- Hex Collaborative SQL + Python-based notebooks
- Lux Python library and API for Intelligent Visual Discovery
- y42 "No-Code Business Intelligence" platform
- Knowage Open Source Business Analytics Suite
- Rakam Data platform for building analytics interface (dbt integration)
- Datawrapper Enrich stories and articles with data visualization
- D3 JavaScript library for visualizing data with HTML, SVG, and CSS
- Lightdash Open source BI tool fully integrated with dbt projects
- Tableau, Power BI, Sisense, Qlik, Spotfire, ThoughtSpot, Chartio (Atlassian), Domo, Toucan Toco
Data Quality / Profiling / Observability
- Monte Carlo "Data Reliability Delivered"
- Datafold Data Observability platform
- Great Expectations Open Source data quality, profiling & validation
- Bigeye Automatic data quality monitoring
- Anomalo Validate and document your data warehouse
- Trackplan "Schema Management for Behavioural Data Tracking"
- Lightup Cloud data quality indicators provider
Data Management / Lineage / Catalog / Governance
- Datakin DataOps solution, Data Lineage
- Marquez Open Source metadata and data governance project
- DataHub Open Source metadata search & discovery tool
- Amundsen Open Source data discovery and metadata engine
- DataGalaxy Data Governance platform with Data Catalog and Data Lineage
- Zeenea Cloud-native Data Catalog
- Alation Data Governance and Data Catalog platform
- Collibra Data Governance and Data Catalog platform
- Secoda Data Discovery and Data Catalog
- MANTA Data Lineage platform
- data.world Cloud-native Data Catalog
- Stemma SaaS managed version of Amundsen
- Egeria Open Metadata and Governance
- Atlan "the modern data workspace", Data Management & DataOps
DataOps / Data Fabric
- Nessie DataOps for Data Lakes, a "Git-Like Experience for your Data Lake"
- Nexla DataOps platform "to deliver data for Analytics, AI and Operations"
- Keboola DataOps platform
- Saagie DataOps platform
- DataKitchen DataOps platform
- DAGsHub GitHub for data
- Unravel DataOps platform
- Upsolver "Compute and pipeline layer between your data lake and the analytics tools"
- Cinchy "Autonomous Data Fabric" and Data Management platform
Orchestration / Workflow
- Apache Airflow Open Source workflow scheduler platform
- Dagster Open Source "Data orchestrator for machine learning, analytics, and ETL"
- Prefect Workflow management system and platform for dataflow automation
- Apache DolphinScheduler Distributed and visual workflow scheduler system
- Luigi Python package to build complex pipelines of batch jobs
Storage / Database
- DuckDB In-process SQL OLAP database (SQLite-like, column-oriented)
- ClickHouse Open-source OLAP database management system
- DoltHub "the true Git for data experience in a SQL database"
- DVC Data Version Control
- Materialize Event Streaming Database
- Warp 10 Advanced Time Series Platform
- Snowflake, Firebolt, BigQuery, Redshift, Apache Cassandra, MongoDB, InfluxDB, QuestDB, Neo4j, SingleStore (MemSQL)
Data Privacy / Security / Identity
- Immuta "Self-Service Data Access with Automated Privacy Control"
- Okera Cloud data security, "Universal Data Authorization"
- Privacera SaaS Access Governance Solution
- Apache Ranger Framework to enable, monitor and manage comprehensive data security
- Baffle Cloud security with a "transparent data security mesh"
- Privitar Enterprise Data Privacy Software
- ReachFive Identity & Access Management
- Okta Trusted platform to secure identities, from customers to workforce
Others
- Opendatasoft Data sharing platform
- Streamlit Turns data scripts into shareable data web apps
- Transform Data Shared data interface and metrics repository
- White Label Data Platform for building and deploying custom data applications
- Flat Data Bring working datasets into your GitHub repositories and version them
And finally, don't hesitate to:
- Star this GitHub repository
- Suggest interesting new data tools with a pull request, an issue, or a message
- Share this list with your network
- Enjoy and have fun!
Victor