[Modify/Add] Add Databases & Analytics, and Other Compute Service Doc.
kananinirav committed Oct 20, 2024
1 parent 4224f41 commit daad911
Showing 6 changed files with 510 additions and 6 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -21,6 +21,10 @@
- Scalability & High Availability, Vertical Scalability, Horizontal Scalability, High Availability, High Availability & Scalability For EC2, Scalability vs Elasticity (vs Agility), What is load balancing?, What’s an Auto Scaling Group?
- [Amazon S3](./sections/s3.md)
- S3 Use cases, Amazon S3 Overview - Buckets, Amazon S3 Overview - Objects, S3 Websites, S3 Storage Classes, S3 Object Lock & Glacier Vault Lock, Shared Responsibility Model for S3, AWS Snow Family, What is Edge Computing?, Snow Family - Edge Computing, AWS OpsHub, Hybrid Cloud for Storage, AWS Storage Gateway
- [Databases & Analytics](./sections/databases.md)
- Databases Intro, Relational Databases, NoSQL Databases, Databases & Shared Responsibility on AWS, AWS RDS Overview, Amazon Aurora, Amazon ElastiCache Overview, DynamoDB, Redshift Overview, Amazon EMR, Amazon Athena, Amazon QuickSight, DocumentDB, Amazon Neptune, Amazon QLDB
- [Other Compute Section](./sections/other_compute.md)
- What is Docker?, ECS, Fargate, ECR, What’s serverless?, Why AWS Lambda ?, Amazon API Gateway, AWS Batch, Batch vs Lambda, Amazon Lightsail, Lambda Summary

## Practice Exams ( dumps )

306 changes: 306 additions & 0 deletions sections/databases.md
@@ -0,0 +1,306 @@
# Databases & Analytics

- [Databases \& Analytics](#databases--analytics)
- [Databases Intro](#databases-intro)
- [Relational Databases (SQL)](#relational-databases-sql)
- [NoSQL Databases](#nosql-databases)
- [NoSQL data example: JSON](#nosql-data-example-json)
- [Databases \& Shared Responsibility on AWS](#databases--shared-responsibility-on-aws)
- [AWS RDS Overview](#aws-rds-overview)
- [Advantage over using RDS versus deploying DB on EC2](#advantage-over-using-rds-versus-deploying-db-on-ec2)
- [RDS Deployments](#rds-deployments)
- [RDS Deployments: Read Replicas, Multi-AZ](#rds-deployments-read-replicas-multi-az)
- [RDS Deployments: Multi-Region](#rds-deployments-multi-region)
- [Amazon Aurora](#amazon-aurora)
- [Amazon ElastiCache Overview](#amazon-elasticache-overview)
- [DynamoDB](#dynamodb)
- [DynamoDB Accelerator (DAX)](#dynamodb-accelerator-dax)
- [DynamoDB Global Tables](#dynamodb-global-tables)
- [Redshift Overview](#redshift-overview)
- [Amazon EMR (Elastic MapReduce)](#amazon-emr-elastic-mapreduce)
- [Amazon Athena](#amazon-athena)
- [Amazon QuickSight](#amazon-quicksight)
- [DocumentDB (with MongoDB Compatibility)](#documentdb-with-mongodb-compatibility)
- [Amazon Neptune](#amazon-neptune)
- [Amazon QLDB](#amazon-qldb)
- [Amazon Managed Blockchain](#amazon-managed-blockchain)
- [AWS Glue](#aws-glue)
- [DMS - Database Migration Service](#dms---database-migration-service)
- [Databases \& Analytics Summary](#databases--analytics-summary)

## Databases Intro

- Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits
- Sometimes, you want to store data in a database…
  - You can structure the data
  - You build indexes to efficiently query / search through the data
  - You define relationships between your datasets
- Databases are optimized for a purpose and come with different features, shapes, and constraints
- **Managed Databases**: AWS takes care of maintenance, backups, and security for databases.
- **Benefits**: Reduced operational complexity, built-in high availability, disaster recovery, scalability, and enhanced security.
- **Types**:
  - **Relational Databases** (SQL)
  - **NoSQL Databases**
  - **Data Warehousing**
  - **In-memory Caching**

## Relational Databases (SQL)

- **Structured Data**: Stored in predefined schema tables, managed with SQL.
- **Use Cases**: Transactional applications, financial systems.
- **Examples**: MySQL, PostgreSQL, Oracle, SQL Server, MariaDB.
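
To make the ideas of a predefined schema, relationships, and SQL queries concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite only stands in for the engines listed above, and the `customers` / `orders` tables and their columns are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory SQLite database; a stand-in for any relational engine.
conn = sqlite3.connect(":memory:")

# Predefined schema: every row must fit these columns and types.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount_cents INTEGER NOT NULL
    )
    """
)
# An index speeds up lookups of orders by customer.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount_cents) VALUES (?, ?)",
    [(1, 1250), (1, 800), (2, 4300)],
)
conn.commit()

# The relationship between the two tables is expressed with a JOIN.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount_cents)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)
```

Amounts are stored as integer cents so the totals stay exact, a common choice for transactional and financial data.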

## NoSQL Databases

- **Flexible Schema**: No predefined schema, designed for fast and scalable data storage.
- **Use Cases**: Real-time applications, IoT, mobile apps.
- **Benefits**:
  - Flexibility: easy to evolve the data model
  - Scalability: designed to scale out by using distributed clusters
  - High performance: optimized for a specific data model
  - Highly functional: types optimized for the data model
- **Examples**: DynamoDB, MongoDB (DocumentDB). Common types are key-value, document, graph, in-memory, and search databases.

### NoSQL data example: JSON

- JSON is a common form of data that fits into a NoSQL model
- Data can be nested
- Fields can change over time
- Support for new types: arrays, etc…

```json
{
  "name": "Abc",
  "age": 30,
  "cars": [
    "Ford",
    "BMW",
    "Fiat"
  ],
  "address": {
    "type": "house",
    "number": 23,
    "street": "Abc Road"
  }
}
```
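
The three properties listed above (nesting, fields that change over time, arrays) can be sketched with Python's built-in `json` module. This only illustrates the data model, not any particular NoSQL service; the `email` field is a hypothetical addition.

```python
import json

raw = """
{
  "name": "Abc",
  "age": 30,
  "cars": ["Ford", "BMW", "Fiat"],
  "address": {"type": "house", "number": 23, "street": "Abc Road"}
}
"""

item = json.loads(raw)

# Nested data is reached by walking the structure.
street = item["address"]["street"]

# Fields can change over time: a new attribute needs no schema migration.
item["email"] = "abc@example.com"  # hypothetical new field

# Arrays are ordinary values that can grow.
item["cars"].append("Tesla")

print(street, item["cars"], sorted(item))
```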

## Databases & Shared Responsibility on AWS

| **AWS Responsibility** | **Customer Responsibility** |
| ------------------------------------------- | ------------------------------------------------ |
| Infrastructure management, backups, patches | Data security, encryption, access controls (IAM) |
| Availability and failover | Data management, monitoring, performance tuning |

## AWS RDS Overview

- **RDS (Relational Database Service)**: Fully managed service for relational databases.
- It’s a managed DB service for databases that use SQL as a query language.
- Supports **MySQL**, **PostgreSQL**, **MariaDB**, **Oracle**, **SQL Server**.
- Handles **backup**, **patching**, **high availability** (Multi-AZ), and **scaling**.

### Advantage over using RDS versus deploying DB on EC2

- RDS is a managed service:
  - Automated provisioning, OS patching
  - Continuous backups and restore to specific timestamp (Point in Time Restore)!
  - Monitoring dashboards
  - Read replicas for improved read performance
  - Multi AZ setup for DR (Disaster Recovery)
  - Maintenance windows for upgrades
  - Scaling capability (vertical and horizontal)
  - Storage backed by EBS (gp2 or io1)
- BUT you can’t SSH into your instances

### RDS Deployments

- **Read Replicas**: Improves read performance, **asynchronous** replication.
- **Multi-AZ**: Automatic failover, high availability for production environments.
- **Multi-Region**: Disaster recovery across regions, global availability.

### RDS Deployments: Read Replicas, Multi-AZ

| Read Replicas | Multi-AZ |
| ----------------------------------- | ------------------------------------------------- |
| Scale the read workload of your DB | Failover in case of AZ outage (high availability) |
| Can create up to 5 Read Replicas | Data is only read/written to the main database |
| Data is only written to the main DB | Can only have 1 other AZ as failover |

![Read Replicas Multi-AZ](../images/read_replicas_multi_AZ.png)
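
The read replica column of the table can be pictured with a toy model: writes go only to the main database, reads are spread over replicas, and because replication is asynchronous a replica may briefly serve stale data. The `RoutedDatabase` class below is purely hypothetical and uses plain dictionaries; it is not how RDS is implemented or accessed.

```python
from itertools import cycle


class RoutedDatabase:
    """Toy model: one writable primary plus read-only replicas."""

    def __init__(self, replica_names):
        self.primary = {}  # the only writable copy
        self.replicas = {name: {} for name in replica_names}
        self._next_replica = cycle(replica_names)  # round-robin read routing

    def write(self, key, value):
        # All writes go to the primary.
        self.primary[key] = value

    def replicate(self):
        # Asynchronous replication, modelled as an explicit step:
        # until this runs, replicas may return stale data.
        for data in self.replicas.values():
            data.clear()
            data.update(self.primary)

    def read(self, key):
        # Reads are spread over the replicas to offload the primary.
        name = next(self._next_replica)
        return name, self.replicas[name].get(key)


db = RoutedDatabase(["replica-1", "replica-2"])
db.write("user:1", "Alice")
stale = db.read("user:1")  # replication has not happened yet
db.replicate()
fresh = db.read("user:1")
print(stale, fresh)
```

Multi-AZ is a different mechanism: the standby is not used for reads or writes in this model and only takes over on failover.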

### RDS Deployments: Multi-Region

- Multi-Region (Read Replicas)
- Disaster recovery in case of region issue
- Local performance for global reads
- Replication cost

![Multi-Region](../images/multi_region.png)

## Amazon Aurora

- **Amazon Aurora**: High-performance RDS database.
- Compatible with **MySQL** and **PostgreSQL**.
- AWS claims up to **5x** the performance of MySQL on RDS and up to **3x** that of PostgreSQL on RDS.
- **Auto-scaling** storage up to **64 TB**.
- Supports **Multi-AZ** and up to **15 read replicas**.
- Great for **enterprise-grade** applications requiring high availability and performance.
- Aurora costs more than RDS (20% more) – but is more efficient

## Amazon ElastiCache Overview

- **ElastiCache**: In-memory data caching service.
  - **Redis**: Advanced key-value store with replication and persistence.
  - **Memcached**: Simple, memory-only caching service.
- Reduces database load and speeds up applications by **caching frequent queries**.
- Caches are in-memory databases with high performance, low latency
- AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backup
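
How a cache reduces database load can be sketched with the cache-aside pattern: look in the cache first, and only query the database on a miss. The `TTLCache` class and `slow_query` function are hypothetical stand-ins for a Redis/Memcached client and a real database call; this is a sketch of the pattern, not of the ElastiCache API.

```python
import time


class TTLCache:
    """Minimal in-memory cache with expiry; a stand-in for Redis or Memcached."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)


db_calls = 0


def slow_query(user_id):
    """Hypothetical expensive database query."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(ttl_seconds=60)


def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: the database is not touched
    row = slow_query(user_id)  # cache miss: query the database...
    cache.set(key, row)  # ...and store the result for next time
    return row


first = get_user(42)
second = get_user(42)
print(db_calls)
```

The expiry time is the main design choice here: a longer TTL offloads the database more but increases the chance of serving outdated data.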

## DynamoDB

- Fully managed, serverless NoSQL database.
- Supports key-value and document data models.
- Automatically scales based on demand.
- Provides high availability and durability with replication across 3 AZs
- Millions of requests per second, trillions of rows, hundreds of TB of storage
- Fast and consistent in performance
- Single-digit millisecond latency – low latency retrieval
- Integrated with IAM for security, authorization and administration
- Low cost and auto scaling capabilities
- Standard & Infrequent Access (IA) Table Class

### DynamoDB Accelerator (DAX)

- In-memory caching for DynamoDB.
- Up to **10x faster** read performance: from single-digit millisecond latency to microsecond latency when accessing your DynamoDB tables
- Secure, highly scalable & highly available
- Ideal for use cases where **low-latency reads** are critical.

### DynamoDB Global Tables

- Multi-region replication for **global** applications.
- **Low-latency** reads and writes across multiple regions.
- Ensures data availability globally with **multi-master replication**.

## Redshift Overview

- Managed data warehousing service.
- Optimized for **online analytical processing (OLAP)** and big data analytics.
- Uses **columnar storage** (instead of row-based) for fast query performance.
- AWS claims up to 10x better performance than other data warehouses; scales to PBs of data.
- Supports integration with **BI tools** (QuickSight, Tableau).
- Massively Parallel Query Execution (MPP), highly available.
- Has a SQL interface for performing the queries.
- Pay as you go based on the instances provisioned, with **reserved instances** for cost savings.
- Designed for **massive datasets**.
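
Why columnar storage helps analytical queries can be shown with a small sketch: an aggregate over one column only needs that column's values, while a row-based layout drags every field of every record along. The `orders` data below is hypothetical, and counting "values read" is a simplification of real disk I/O.

```python
# The same three records, stored two ways.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 200},
]

# Row-based layout: values of one record sit together (good for OLTP lookups).
row_store = [tuple(r.values()) for r in rows]

# Columnar layout: values of one column sit together (good for OLAP scans).
column_store = {name: [r[name] for r in rows] for name in rows[0]}

# SELECT SUM(amount): the row store touches every field of every record...
values_read_row = sum(len(record) for record in row_store)
total_from_rows = sum(record[2] for record in row_store)

# ...while the column store reads only the "amount" column.
values_read_col = len(column_store["amount"])
total_from_cols = sum(column_store["amount"])

print(total_from_rows, total_from_cols, values_read_row, values_read_col)
```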

## Amazon EMR (Elastic MapReduce)

- Managed big data processing service.
- Uses **Hadoop**, **Apache Spark**, and **Hive** for processing large data sets.
- Ideal for **data transformation**, **machine learning**, and **ETL** (Extract, Transform, Load).
- Integration with **S3**, **DynamoDB**, and **Redshift**.
- The clusters can be made of hundreds of EC2 instances
- EMR takes care of all the provisioning and configuration
- Auto-scaling and integrated with Spot instances
- Use cases: data processing, machine learning, web indexing, big data
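
The "MapReduce" in the service name refers to a processing model that Hadoop popularized: map each input to key/value pairs, group them by key, then reduce each group. The word count below is a single-process sketch of that model only; on a real cluster the phases run in parallel across many machines, and none of these function names come from EMR or Hadoop.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


lines = ["big data on AWS", "big clusters process big data"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```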

## Amazon Athena

- Serverless query service
- Use **SQL** to query structured and unstructured data stored in **S3**.
- No infrastructure to manage, pay-per-query.
- Supports various formats like **CSV**, **JSON**, **Parquet**, and **ORC**.
- Pricing: $5.00 per TB of data scanned
- Use compressed or columnar data for cost-savings (less scan)
- Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc...
- To analyze data in S3 using serverless SQL, use Athena
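
The pricing bullet translates into simple arithmetic. The sketch below uses the $5.00 per TB figure quoted above (check current pricing for your Region), assumes 1 TB means 2^40 bytes, ignores any per-query minimum, and uses a hypothetical dataset where a compressed, columnar copy lets the query scan one tenth of the bytes.

```python
PRICE_PER_TB_USD = 5.00  # figure quoted in the notes above
BYTES_PER_TB = 1024 ** 4  # assumption: 1 TB taken as 2**40 bytes


def query_cost_usd(bytes_scanned):
    """Estimated cost of one query from the amount of data it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = 2 * BYTES_PER_TB  # hypothetical: 2 TB of raw CSV scanned
columnar_scan = full_scan // 10  # hypothetical: columnar + compressed copy

cost_full = query_cost_usd(full_scan)
cost_columnar = query_cost_usd(columnar_scan)
print(round(cost_full, 2), round(cost_columnar, 2))
```

Because the bill follows bytes scanned rather than rows returned, reducing what a query has to read is the main cost lever.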

## Amazon QuickSight

- Business Intelligence (BI) tool for data visualization.
- Serverless machine learning-powered business intelligence service to create interactive dashboards
- Fast, automatically scalable, embeddable, with per-session pricing
- Supports data from S3, Redshift, RDS, and other AWS data sources.
- **Pay-per-session** pricing model for cost efficiency.
- Use cases:
  - Business analytics
  - Building visualizations
  - Performing ad-hoc analysis
  - Getting business insights using data

## DocumentDB (with MongoDB Compatibility)

- Managed document database, **MongoDB-compatible**.
- DocumentDB is to MongoDB (a NoSQL database) what Aurora is to MySQL / PostgreSQL: an AWS implementation of the engine.
- Highly scalable and durable with **Multi-AZ**.
- Built for **JSON** document storage.
- DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB.
- Automatically scales to workloads with millions of requests per second
- Use cases: Content management, cataloging, and mobile backends.

## Amazon Neptune

- Fully managed graph database
- A popular graph dataset would be a social network
  - Users have friends
  - Posts have comments
  - Comments have likes from users
  - Users share and like posts…
- Highly available, with replication across 3 AZs and up to 15 read replicas
- Build and run applications working with highly connected datasets; Neptune is optimized for these complex and hard queries
- Can store up to billions of relations and query the graph with millisecond latency
- Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
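
The social network example can be sketched as a graph of adjacency sets, with a "friends of friends" traversal of the kind a recommendation engine might run. The data and the `friends_of_friends` helper are hypothetical; a real graph database would express this in a graph query language rather than Python.

```python
# Adjacency sets: each user maps to the set of users they are friends with.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, user):
    """Users two hops away who are not already friends: 'people you may know'."""
    direct = graph[user]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {user}


suggestions = sorted(friends_of_friends(friends, "alice"))
print(suggestions)
```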

## Amazon QLDB

- QLDB stands for “Quantum Ledger Database”
- A ledger is a book **recording financial transactions**
- Fully managed, serverless, highly available, with replication across 3 AZs
- Used to **review history of all the changes made to your application data** over time
- **Immutable** system: no entry can be removed or modified, cryptographically verifiable
- 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
- Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
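
One common way to make a journal "cryptographically verifiable" is a hash chain, where each entry's hash covers its data and the previous entry's hash. The sketch below illustrates that general idea with SHA-256 from Python's `hashlib`; it is an assumption-laden teaching example, not a description of QLDB's internal format, and the `append` / `verify` helpers are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(previous_hash, data):
    """Hash of an entry covers its data and the hash of the entry before it."""
    payload = json.dumps({"prev": previous_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(journal, data):
    previous_hash = journal[-1]["hash"] if journal else GENESIS
    journal.append(
        {"data": data, "prev": previous_hash, "hash": entry_hash(previous_hash, data)}
    )


def verify(journal):
    """Recompute every hash; an edited entry no longer matches its stored hash."""
    previous_hash = GENESIS
    for entry in journal:
        if entry["prev"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(previous_hash, entry["data"]):
            return False
        previous_hash = entry["hash"]
    return True


journal = []
append(journal, {"account": "A", "change": 100})
append(journal, {"account": "A", "change": -30})
ok_before = verify(journal)

journal[0]["data"]["change"] = 1000  # tamper with history
ok_after = verify(journal)
print(ok_before, ok_after)
```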

## Amazon Managed Blockchain

- Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
- Amazon Managed Blockchain is a managed service to:
  - Join public blockchain networks
  - Or create your own scalable private network
- Compatible with the frameworks Hyperledger Fabric & Ethereum

## AWS Glue

- Managed extract, transform, and load (ETL) service
- Useful to prepare and transform data for analytics
- Fully serverless service
- Glue Data Catalog: catalog of datasets
  - Can be used by Athena, Redshift, EMR
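
The extract, transform, load steps can be sketched in a few lines of standard-library Python. This only illustrates what an ETL job does; it does not use the Glue API, and the CSV content, column names, and the `transform` helper are hypothetical.

```python
import csv
import io
import json

# Extract: hypothetical raw CSV export.
raw_csv = """order_id,region,amount
1,eu,120.50
2,us,80.00
3,eu,not-a-number
"""


def transform(record):
    """Clean one record; return None when it cannot be repaired."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return {
        "order_id": int(record["order_id"]),
        "region": record["region"].upper(),
        "amount": amount,
    }


extracted = list(csv.DictReader(io.StringIO(raw_csv)))
transformed = [t for t in map(transform, extracted) if t is not None]

# Load: written here as JSON Lines, one cleaned record per line.
loaded = "\n".join(json.dumps(t) for t in transformed)
print(loaded)
```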

## DMS - Database Migration Service

- Quickly and securely migrate databases to AWS, resilient, self healing
- The source database remains available during the migration
- Supports:
  - Homogeneous migrations: e.g., Oracle to Oracle
  - Heterogeneous migrations: e.g., Microsoft SQL Server to Aurora

## Databases & Analytics Summary

- Relational Databases - OLTP: RDS & Aurora (SQL)
- Differences between Multi-AZ, Read Replicas, Multi-Region
- In-memory Database: ElastiCache
- Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
- Warehouse - OLAP: Redshift (SQL)
- Hadoop Cluster: EMR
- Athena: query data on Amazon S3 (serverless & SQL)
- QuickSight: dashboards on your data (serverless)
- DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
- Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
- Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
- Glue: Managed ETL (Extract Transform Load) and Data Catalog service
- Database Migration: DMS
- Neptune: graph database
10 changes: 5 additions & 5 deletions sections/elb_asg.md
@@ -57,11 +57,11 @@

## Scalability vs Elasticity (vs Agility)

| **Term** | **Definition** |
|--------------------|--------------------------------------------------------------------------------------------------|
| **Scalability** | Ability to increase or decrease the capacity to handle varying levels of traffic or load. |
| **Elasticity** | Automatically adjusts resources up or down based on the load in real-time, preventing under or over-provisioning. |
| **Agility** | The ability to deploy and manage resources quickly and efficiently in response to changing demands. |
| **Term** | **Definition** |
| --------------- | ----------------------------------------------------------------------------------------------------------------- |
| **Scalability** | Ability to increase or decrease the capacity to handle varying levels of traffic or load. |
| **Elasticity** | Automatically adjusts resources up or down based on the load in real-time, preventing under or over-provisioning. |
| **Agility** | The ability to deploy and manage resources quickly and efficiently in response to changing demands. |

## What is Load Balancing?

3 changes: 3 additions & 0 deletions sections/iam.md
@@ -38,6 +38,7 @@
- **Users**: Represent individual identities that interact with AWS services. Users have unique credentials (username, password, access keys).
- **Groups**: Logical grouping of users to simplify permission management.
- Permissions assigned to a group are automatically inherited by its users.
- Flexibility in user management: in IAM, users do not have to belong to a group, and a user can belong to multiple groups. This lets you manage access permissions in a granular and efficient manner. For example, a user could belong to both the “QAs” group and the “Developers” group, inheriting permissions from both.

| **IAM Users** | **IAM Groups** |
|------------------------------------------------------------|----------------------------------------------------------|
@@ -55,6 +56,8 @@

### IAM Policies Inheritance

![IAM Policies Inheritance](../images/IAM_Policies_inheritance.png)

- Policies are evaluated together for a user, including:
- **Directly attached policies**.
- **Group policies**.