forked from stevemcghee/r9y.dev
topics.txt
Local Development
Monolith
Code Review
Pre Merge Hooks
Active Passive Clusters
Microservices
Left Shift Reliability Design
Graceful Service Degradation (Individual CUJs)
Left Shift Performance Testing
Graceful Service Degradation (Universal)
Bounded Context
Protobufs
Smoke Tests
Automated Unit Testing
Multi Service Development
Distributed Systems Awareness
Deployments in Place
Feature Flags
Active Active Multi Cluster
Basic Chaos Testing
Serious Design/Domain Driven Design
Design Around Universal Failure Domains
Sharded Data
Manual Tests
Code Version Control
functional tests
semi automated integration
Data versioning
traffic shifting
instrumentation for in process traces
Backwards Version Compatibility by default
Canary Deployments
Left Shift QA testing (SDET)
E2E testing
Multi Cluster Rollout Policy
Universal Smart Retries
Sharded Serving
manual integration tests
regular release cadence
containers
Blue Green Deployments
Fuzz Testing
Distributed systems (no active/passive)
Automatic assured capacity and performance testing
andon cord/big red button
Code Quality Threshold (code reuse preferred)
Low Context Architecture
Language Readability
Only customize components needing customization
Design for Chaos
Formal methods (e.g. TLA+)
Local data storage
Single Zone
DNS / Simple LB
Basic linear capacity projection
Advanced Load Balancing
IaC
Understand Infrastructure Failure Domains
Auto Failover
Failure Testing in Prod
N+1 as standard
N+2 Thinking
N+2 Global Planning
Pet Host
1+ computer
distributed storage
alternate site replication
Cattle Infrastructure
Container Orchestrator
Auto Scaling
Eliminate SPOFs (hardware & software)
Service Discovery
Drain/Spill (N/S & E/W)
Basic Load Testing
Multi Zone
Holt-Winters capacity projections
Failure Injection
N+1 Regional Planning
L7 Global LB
High Water Mark Prediction
Assured Capacity Load Testing
Real World Traffic Load Testing
L4 Regional Load Balancing
Multi Region
Off-host backup
RPO/RTO defined
DR Plan
RPO/RTO refined
DR plan simulated/tabletop
DR plan tested periodically
Continuous Integration
Continuous Delivery
Regular BCP Testing (run from alternate site)
% Based Traffic Steering
Active Active Datastores
Internal Rate Limiting
Autonomous Response Systems
Automatic Rollbacks
Manually created machines
Manual VM Images
Custom VMs via semi-automation
ITIL style NOC
DR Site Exists
Manual remediation playbooks
Formal Incident Response Roles
Formal Incident Response Processes
Rollbacks/Rollforwards tested
Continuous Deployment
External Rate Limiting
Centralized Production Changelog
Proactive DDoS Countermeasures
Load Prediction
Manual Remediation
Scheduled Downtime
Basic Incident Management
Repeatable Deployments
Automation of Toil
Problem Management Function
Dedicated Operations Tooling
Automated Service Discovery
Data Collection Automation
Mostly Automated Remediation
Patching Windows
Gold Image Automation
Central Certificate Rotation
Breakglass Secret Access
Global Policy Enforcement
Vanilla DDoS Protection
DiRT Testing
Product Specific DDoS Protection (e.g. WAF)
Host Metrics and Logging
Per Host Alarms
Host Ping Tests
Synthetic Monitoring
APM Metrics and Traces
Internal SLAs
Error Budgets
Custom In Process Tracing
Cross Service Transaction Testing
Multi Machine Debugging
Anomaly Detection
Observability Integration Across Tools
On host log grep
SSH to Grep Logs
Centralized Log Collection
Realtime Centralized Log Analytics
Automated Topology View
Service Level Indicators (SLI)
Record and Replay Traffic
Advanced Visualizations (heatmaps)
Near Miss Detection
Service Level Objectives (SLO)
Event Correlation
High Context Behaviours
RCA/5 Whys
Incentivise trust/safety
Understand Business Impacts
Blameless Postmortems
Postmortem reviews/actions
Single Central CAB
Holistic View of R9y as high value
Reliability Executive/Sponsor exists
Reliability has a seat at the table
R9y is a product differentiator
R9y can stop feature launch
Proactive Risk and Scaling Analysis
Managing pet configuration drift
Measure Everything
Data Driven Decisions
Service Ownership
Incentivise cross silo collaboration
Dedicated R9y staffing
Change Freezes
Vertical Scale is an Antipattern
SRE SWE roles introduced
Empowered R9y staff
R9y Embedded in High Level Strategy and Operations
Advanced Cost Optimization
Focus on prevention and near misses instead of outages
TODO Lists
Waterfall Projects/PMO
SMART Goals
Goals -> Objectives (OKRs)
Architecture Reviews
High Performing Staff (Promotion and Hiring)
Reactive Risk Analysis
Basic Cost Optimisation
Introducing Dedicated SREs
Toil Budgets
Decreased Reliance on 3rd party SaaS