System Design Topics
-1. FOUR Building Blocks of Architecting Systems for Scale
0. Conversion Guide
1. Performance vs scalability
2a. CAP Theorem
2b. Consistency Patterns
2c. Availability Patterns
3. SQL vs NoSQL
4. Caching
4a. Best practices for Caching
4b. Strategies and How to Choose the Right One; Cache Policies
5. CDN
6. Load Balancer:
6a. What can we do when load balancer becomes the bottleneck?
6b. What are the various Load Balancing Methods
6c. Types of Load Balancers: Classic/Network, HTTP Based (Application),
7. Proxies and Sessions
7a. Sticky Sessions:
7b. Reverse Proxy vs Forward Proxy
7c. Load balancer vs reverse proxy
7d. Load Balancer vs API Gateway
8. Asynchronism:
9. Databases
9a1. Normalization
9a2. 1NF
9a3. 2NF
9a4. 3NF
9a5. Database-design-bad-practices
9a6. How Facebook scaled MySQL
9a. Pairing Master and Slave DB to make Webapps faster
9b. Polyglot Persistence
9c. Strategies for dealing with heavy writes to a DB
9d. What is CQRS?
9e. When would I use Amazon Redshift vs. Amazon RDS?
9f. What is Amazon Athena
10. Consistent Hashing
11. REST API
11a. GET, POST, PUT, DELETE
11b. How to design REST API
11c. API Query, Filter and Pagination
11d. API Versioning and Techniques/Best Practices
12. Designing Idempotent API (How to handle Retries)
13. Distributed Locks
14. Distributed Transactions
15. Pessimistic and Optimistic Locks
16. What are Websockets - C10K Challenge
16a. Long Polling vs Web sockets
16b. What is the overhead of using Websockets
16c. HTTP vs Websocket
16d. Scaling Websockets - C10K
17. PostgreSQL vs MySQL
18. What is Event Driven Architecture / Event Sourcing
19. What is Non-blocking or AsyncIO Asynchronous IO
20. InfoQ Videos - How Netflix sends Recommendation using Zuul Push
22. Technologies to browse: Airflow, Redshift vs Snowflake, Segment and Fivetran, Apache Hive, Mosquito
--------------------------------------------------------------------------------
Technology Related Notes; Distributed Systems - Key Concepts
0. Things to read about
1. Comparing Popular Databases
4. ActiveMQ or RabbitMQ or ZeroMQ
5. Running Java in Container
6. Protocol Buffers
7. Docker and Kubernetes
8. Dockers and Containerization:
9. Docker and Kubernetes
10. Memcache, Redis
11. Hadoop
12. Apache Spark
12b. Spark vs MapReduce
13. Hive
14. AWS
14a. Amazon EC2 - Elastic Compute Cloud
15. Kafka
16. RabbitMQ
17. Zookeeper
18. Get zeromq message data into std::vector<char>
--------------------------------------------------------------------------------
Key References:
https://roadtoarchitect.com/2018/09/04/useful-technology-and-company-architecture/ - Contains architecture of various companies
https://roadtoarchitect.com/category/system-design/
https://github.com/prasadgujar/low-level-design-primer/blob/master/solutions.md
https://igotanoffer.com/blogs/tech/system-design-interviews
https://www.algoexpert.io/systems/questions
https://www.educative.io/courses/grokking-the-system-design-interview
https://www.interviewbit.com/courses/system-design/
--------------------------------------------------------------------------------
Open Questions:
1. How do you pair Read-through and Write Through Cache
2. Why is state india of a session stored in NoSql 1.14
3. Sharding:
a. How to solve celebrity problem
b. Denormalize the db so that queries can be performed in a single table
--------------------------------------------------------------------------------
SYSTEM DESIGN PRIMER:
https://github.com/donnemartin/system-design-primer
Scalability Video
https://www.youtube.com/watch?v=-W9F__D3oY4
-1. FOUR Building Blocks of Architecting Systems for Scale
http://highscalability.com/blog/2012/9/19/the-4-building-blocks-of-architecting-systems-for-scale.html
1. Load Balancing: Scalability and Redundancy
- Horizontal scalability and redundancy are usually achieved via load balancing,
the spreading of requests across multiple resources.
1. Smart Clients.
The client has a list of hosts and load balances across that list of hosts.
Upside is simple for programmers. Downside is it's hard to update and change.
2. Hardware Load Balancers.
Targeted at larger companies, this is dedicated load balancing hardware.
Upside is performance.
Downside is cost and complexity.
3. Software Load Balancers.
The recommended approach, it's software that handles load balancing, health checks, etc
2. Caching.
Make better use of resources you already have. Precalculate results for later use.
Application Versus Database Caching. Database caching is simple because the programmer doesn't have to do it. Application caching requires explicit integration into the application code.
In Memory Caches. Performs best but you usually have more disk than RAM.
Content Distribution Networks. Moves the burden of serving static resources off your application and into a specialized distributed caching service.
Cache Invalidation. Caching is great but the problem is you have to practice safe cache invalidation.
3. Off-Line Processing.
Processing that doesn't happen in-line with a web request. Reduces latency and/or handles batch processing.
Message Queues. Work is queued to a cluster of agents to be processed in parallel.
Scheduling Periodic Tasks. Triggers daily, hourly, or other regular system tasks.
Map-Reduce. When your system becomes too large for ad hoc queries then move to using a specialized data processing infrastructure.
4. Platform Layer.
Disconnect application code from web servers, load balancers, and databases using a service level API.
This makes it easier to add new resources, reuse infrastructure between projects, and scale a growing organization.
0. Conversion Guide
2.5 million seconds per month
1 request per second = 2.5 million requests per month
40 requests per second = 100 million requests per month
400 requests per second = 1 billion requests per month
SECONDS IN A DAY = 86,400
SECONDS IN A MONTH = 2.5M
SECONDS IN A YEAR = 32M
MINUTES in a DAY = 1440 = 1.5k
MINUTES in a MONTH = 43,200 = 50k
MINUTES in a YEAR = 525,600 = 0.5M
RPS to DAY: 1 RPS = 86400 Requests per day = 86k
RPS to MONTH: 1 RPS = 2,592,000 Requests per month = 2.5 M
RPS to YEAR: 1 RPS = 31,536,000 Requests per year = 32 M
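The conversions above are plain arithmetic. A minimal sketch, assuming a 30-day month and a 365-day year (the constant and function names are illustrative, not from any library):

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000 (assumes a 30-day month)
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000 (assumes a 365-day year)


def requests_per_period(rps: float, period_seconds: int) -> float:
    """Total requests served in a period at a constant request rate."""
    return rps * period_seconds


def rps_for_monthly_volume(requests_per_month: float) -> float:
    """Average requests per second needed to serve a monthly volume."""
    return requests_per_month / SECONDS_PER_MONTH


if __name__ == "__main__":
    print(requests_per_period(1, SECONDS_PER_DAY))      # requests per day at 1 RPS
    print(requests_per_period(40, SECONDS_PER_MONTH))   # requests per month at 40 RPS
    print(round(rps_for_monthly_volume(1_000_000_000))) # RPS for 1 billion per month
```

The exact figures come out slightly above the rounded ones in the guide (40 RPS is about 104 million requests per month, and 1 billion per month is about 386 RPS), which is close enough for back-of-the-envelope estimates.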
Power of 2
Power   Approx. Value    Byte    Number of 0s
10      1 Thousand (K)   1 KB    3
20      1 Million        1 MB    6
30      1 Billion        1 GB    9
40      1 Trillion       1 TB    12
50      1 Quadrillion    1 PB    15
Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                        14x L1 cache
Mutex lock/unlock                          100   ns
Main memory reference                      100   ns                        20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy            10,000   ns       10 us
Send 1 KB bytes over 1 Gbps network     10,000   ns       10 us
Read 4 KB randomly from SSD*           150,000   ns      150 us            ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us     1 ms   ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us    10 ms   20x datacenter roundtrip
Read 1 MB sequentially from 1 Gbps  10,000,000   ns   10,000 us    10 ms   40x memory, 10X SSD
Read 1 MB sequentially from disk    30,000,000   ns   30,000 us    30 ms   120x memory, 30X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us   150 ms
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
Handy metrics based on numbers above:
Read sequentially from disk at 30 MB/s
Read sequentially from 1 Gbps Ethernet at 100 MB/s
Read sequentially from SSD at 1 GB/s
Read sequentially from main memory at 4 GB/s
6-7 world-wide round trips per second
2,000 round trips per second within a data center
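The handy metrics can be derived from the latency table by inverting the time per operation. A small sketch, using the table's "read 1 MB sequentially" and round-trip figures (the dictionary and function names are illustrative):

```python
# Time to read 1 MB sequentially, in seconds (values from the latency table).
READ_1MB_SECONDS = {
    "memory": 250e-6,
    "ssd": 1e-3,
    "1 Gbps network": 10e-3,
    "disk": 30e-3,
}


def throughput_mb_per_s(seconds_per_mb: float) -> float:
    """Sequential throughput in MB/s given the time to read 1 MB."""
    return 1.0 / seconds_per_mb


def round_trips_per_second(round_trip_seconds: float) -> float:
    """How many sequential round trips fit into one second."""
    return 1.0 / round_trip_seconds


if __name__ == "__main__":
    for medium, seconds in READ_1MB_SECONDS.items():
        print(medium, round(throughput_mb_per_s(seconds)), "MB/s")
    print(round(round_trips_per_second(500e-6)), "round trips/s in a datacenter")
    print(round(round_trips_per_second(150e-3), 1), "world-wide round trips/s")
```

Disk works out to roughly 33 MB/s, which the notes round down to 30 MB/s.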
1. Performance vs scalability
A way to look at performance vs scalability:
- If you have a performance problem, your system is slow for a single user.
- If you have a scalability problem, your system is fast for a single user but slow under heavy load.
2a. CAP Theorem
http://ksat.me/a-plain-english-introduction-to-cap-theorem
[C] Consistency - All nodes see the same data at the same time.
Simply put, performing a read operation will return the value of the most recent write operation causing all nodes to return the same data.
[A] Availability - Every request gets a response on success/failure.
Achieving availability in a distributed system requires that the system remains operational 100% of the time. Every client gets a response, regardless of the state of any individual node in the system.
[P] Partition Tolerance - System continues to work despite message loss or partial failure.
Most people think of their data store as a single node in the network. "This is our production SQL Server instance". Anyone who has run a production instance for more than four minutes, quickly realizes that this creates a single point of failure. A system that is partition-tolerant can sustain any amount of network failure that doesn't result in a failure of the entire network.
2b. Consistency Patterns
2c. Availability Patterns
3. SQL vs NoSQL
Reasons for SQL:
Structured data
Strict schema
Relational data
Need for complex JOINS
Transactions
Clear patterns for scaling
More established: developers, community, code, tools, etc
Lookups by index are very fast
Reasons for NoSQL:
Semi-structured data
Dynamic or flexible schema
Non-relational data
No need for complex joins
Store many TB (or PB) of data
Very data intensive workload
Very high throughput for IOPS
4. Caching
4a. Best practices for Caching
https://docs.microsoft.com/en-us/azure/architecture/best-practices/caching --> GREAT READ
1. Caching in distributed applications
Distributed applications typically implement either or both of the following strategies when caching data:
- Using a private cache, where data is held locally on the computer that's running an instance of an application or service.
- Using a shared cache, serving as a common source that can be accessed by multiple processes and machines.
a. Private Cache
- If you have multiple instances of an application that uses this model running concurrently, each application instance has its own independent cache holding its own copy of the data.
b. Shared Cache
- Shared caching ensures that different application instances see the same view of cached data. It does this by locating the cache in a separate location, typically hosted as part of a separate service
2. Decide when to cache data
- Read frequently but modified infrequently
- Caching typically works well with data that is immutable or that changes infrequently
- Don't use cache to store critical information
- Caching is less useful for dynamic data
Example:
If a data item represents a multivalued object such as a bank customer with a name, address, and account balance, some of these elements might remain static (such as the name and address), while others (such as the account balance) might be more dynamic. In these situations, it can be useful to cache the static portions of the data and retrieve (or calculate) only the remaining information when it is required.
3. Cache highly dynamic data
- Consider the benefits of storing the dynamic information directly in the cache instead of in the persistent data store.
If the data is noncritical and does not require auditing, then it doesn't matter if the occasional change is lost.
4. Managing concurrency in a cache
Depending on the nature of the data and the likelihood of collisions, you can adopt one of two approaches to concurrency:
1. Optimistic
- Immediately prior to updating the data, the application checks to see whether the data in the cache has changed since it was retrieved. If the data is still the same, the change can be made. Otherwise, the application has to decide whether to update it.
- This approach is suitable for situations where updates are infrequent, or where collisions are unlikely to occur.
2. Pessimistic
- When it retrieves the data, the application locks it in the cache to prevent another instance from changing it.
- This approach might be appropriate for situations where collisions are more likely, especially if an application updates multiple items in the cache and must ensure that these changes are applied consistently.
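A minimal sketch of the optimistic approach, assuming each cache entry carries a version number that is checked immediately before the write. `VersionedCache`, `compare_and_set` and `increment` are hypothetical names for illustration, and the toy in-process dict stands in for a real shared cache:

```python
import threading
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Versioned:
    value: Any
    version: int


class VersionedCache:
    """Toy in-process cache; a real shared cache would live in a separate service."""

    def __init__(self) -> None:
        self._data: Dict[str, Versioned] = {}
        self._lock = threading.Lock()  # protects the dict itself, not individual items

    def get(self, key: str) -> Optional[Versioned]:
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key: str, expected_version: int, new_value: Any) -> bool:
        """Write only if nobody changed the entry since it was read."""
        with self._lock:
            current = self._data.get(key)
            current_version = current.version if current else 0
            if current_version != expected_version:
                return False  # collision: the caller decides whether to retry
            self._data[key] = Versioned(new_value, current_version + 1)
            return True


def increment(cache: VersionedCache, key: str, max_retries: int = 5) -> bool:
    """Optimistic update: read, compute, then write if the version is unchanged."""
    for _ in range(max_retries):
        entry = cache.get(key)
        value, version = (entry.value, entry.version) if entry else (0, 0)
        if cache.compare_and_set(key, version, value + 1):
            return True
    return False
```

A pessimistic variant would instead hold a per-item lock from the read until the write, which costs more but avoids retries when collisions are common.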
4b. Strategies and How to Choose the Right One; Cache Policies
https://codeahoy.com/2017/08/11/caching-strategies-and-how-to-choose-the-right-one/
https://hazelcast.com/blog/a-hitchhikers-guide-to-caching-patterns/
1. Cache-Aside
The cache sits on the side and the application directly talks to both the cache and the database.
Pros:
- For read-heavy workloads.
- Another benefit is that the data model in cache can be different than the data model in database.
- Systems using cache-aside are resilient to cache failures
Cons:
- When cache-aside is used, the most common write strategy is to write data to the database directly. When this happens, cache may become inconsistent with the database.
- To deal with above, developers generally use time to live (TTL) and continue serving stale data until TTL expires.
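A minimal cache-aside sketch with a TTL, assuming plain dicts stand in for the cache and the database. `CacheAsideStore` and the injectable `clock` parameter are illustrative choices, not part of any library:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAsideStore:
    def __init__(self, database: Dict[str, str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._db = database                             # stand-in for the real database
        self._cache: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry time)
        self._ttl = ttl_seconds
        self._clock = clock

    def read(self, key: str) -> Optional[str]:
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]            # cache hit (possibly stale until the TTL expires)
        value = self._db.get(key)      # cache miss: the application queries the database
        if value is not None:
            self._cache[key] = (value, self._clock() + self._ttl)
        return value

    def write(self, key: str, value: str) -> None:
        self._db[key] = value          # the write goes to the database only
```

After a `write`, a `read` keeps returning the old cached value until the TTL expires, which is exactly the inconsistency window described above. Deleting the cached key on write is a common way to shorten that window.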
2. Read-Through Cache
Read-through cache sits in-line with the database. When there is a cache miss, it loads missing data from database, populates the cache and returns it to the application.
- The first request for any item always results in a cache miss and incurs the extra penalty of loading the data into the cache.
- Developers deal with this by ‘warming’ or ‘pre-heating’ the cache by issuing queries manually.
Pros:
-
Cons:
- the data model in read-through cache cannot be different than that of the database.
Cache-Aside :: Read-Through
- In cache-aside, the application is responsible for fetching data from the database and populating the cache. In read-through, this logic is usually supported by the library or stand-alone cache provider.
- Unlike cache-aside, the data model in read-through cache cannot be different than that of the database.
https://www.baeldung.com/cs/cache-write-policy
3. Write-Through Cache
- data is first written to the cache and then to the database.
- Only after both writes complete is the response returned to the caller
- The cache sits in-line with the database and writes always go through the cache to the main database.
Pros:
- When paired with read-through caches, we get all the benefits of read-through plus a data consistency guarantee, freeing us from using cache invalidation techniques.
- best consistency,
Cons:
- extra write latency because data is written to the cache first and then to the main database.
4. Write-Around
- data is written directly to permanent storage, bypassing the cache.
- This can reduce the cache being flooded with write operations that will not subsequently be re-read,
but has the disadvantage that a read request for recently written data will create a "cache miss" and must be read from slower back-end storage and experience higher latency.
5. Write-Back
- data is written to cache alone and completion is immediately confirmed to the client.
- the backing store update happens asynchronously in a separate sequence
Pros:
- Write back caches improve the write performance and are good for write-heavy workloads.
- When combined with read-through, it works well for mixed workloads, where the most recently updated and accessed data is always available in cache.
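The three write policies differ only in where a write lands and when it is acknowledged. A toy sketch, assuming dicts stand in for both the cache and the database (`WritePolicyCache` and `flush` are hypothetical names; a real write-back cache would flush asynchronously):

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WritePolicyCache:
    """Toy cache in front of a dict-backed 'database' with three write policies."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.db: Dict[str, str] = {}
        self._pending: Deque[Tuple[str, str]] = deque()  # queued write-back updates

    def write_through(self, key: str, value: str) -> None:
        self.cache[key] = value   # 1. write to the cache
        self.db[key] = value      # 2. write to the database before returning

    def write_around(self, key: str, value: str) -> None:
        self.db[key] = value          # write straight to the database
        self.cache.pop(key, None)     # drop any stale cached copy (an added safeguard)

    def write_back(self, key: str, value: str) -> None:
        self.cache[key] = value               # acknowledge after the cache write
        self._pending.append((key, value))    # the database update is deferred

    def flush(self) -> int:
        """Apply deferred write-back updates and return how many were written."""
        flushed = 0
        while self._pending:
            key, value = self._pending.popleft()
            self.db[key] = value
            flushed += 1
        return flushed
```

The sketch also makes the write-back risk visible: anything still in the pending queue is lost if the cache fails before `flush` runs.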
5. CDN
https://medium.com/@lee5187415/concepts-you-should-know-about-large-system-design-c0a823c33a96
- Content Delivery Network: a global network of servers storing static files (images, videos, code files).
- The CDN server closest to the user serves the data, having acquired it from the remote (origin) server
Push Based
Pull Based
6. Load Balancer:
- Typically comes in pairs
- Could be implemented as Active - Active or Active Passive mode
Active - Active:
- Load balancer can partition for various requests
- Frequent heartbeats are exchanged, so that if one dies the other automatically takes over
Advantages of Load Balancers:
1. SSL Termination
Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations.
Removes the need to install X.509 certificates on each server
2. Session Persistence
Issue cookies and route a specific client's requests to same instance if the web apps do not keep track of sessions
Load balancers can route traffic based on various metrics, including:
Random
Least loaded
Sticky Session/cookies
Round robin or weighted round robin
Layer 4 :
Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source, destination IP addresses, and ports in the header, but not the contents of the packet.
Layer 7 :
Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve contents of the header, message, and cookies.
6a. What can we do when load balancer becomes the bottleneck?
https://stackoverflow.com/questions/55201050/what-can-we-do-when-load-balancer-becomes-the-bottleneck
https://www.nginx.com/resources/glossary/dns-load-balancing/
https://www.linux.com/learn/intro-to-linux/2018/3/simple-load-balancing-dns-linux
The usual approach is to publish the load balancer IP addresses under the same domain name.
This is called DNS load balancing. Clients will ask for the IP resolution for your load balancer's domain name and they will get different IP addresses in a round-robin fashion.
To configure DNS load balancing you have to add multiple A records for your load balancer's domain name to your DNS configuration.
6b. What are the various Load Balancing Methods
https://www.dnsstuff.com/what-is-server-load-balancing
1. Round Robin
2. Hash the IP
3. Node with least connections
4. Node with least response time
5. Node which will consume the least bandwidth
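A rough sketch of the first four methods, assuming the load balancer already tracks per-node connection counts and response times (the function names and the SHA-256 based hash are illustrative choices):

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(nodes: List[str]) -> Iterator[str]:
    """Cycle through the nodes in order."""
    return itertools.cycle(nodes)


def ip_hash(nodes: List[str], client_ip: str) -> str:
    """Map a client IP to a node; the same IP always lands on the same node."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def least_connections(active: Dict[str, int]) -> str:
    """Pick the node with the fewest active connections."""
    return min(active, key=active.get)


def least_response_time(avg_ms: Dict[str, float]) -> str:
    """Pick the node with the lowest average response time."""
    return min(avg_ms, key=avg_ms.get)
```

Note that hashing modulo the node count remaps most clients whenever a node is added or removed; consistent hashing (topic 10) is the usual answer to that.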
6c. Types of Load Balancers: Classic/Network, HTTP Based (Application),
https://www.f5.com/company/blog/top-five-scalability-patterns
https://www.dnsstuff.com/what-is-server-load-balancing
HTTP(S) Load Balancing:
You can load balance requests based on anything HTTP – including the payload.
Most folks (smartly, in my opinion) restrict their load balancing rules to what can be found in the HTTP header.
That includes the host, the HTTP method, content-type, cookies, custom headers, and user-agent, among others.
This form of load balancing relies on layer 7, which means it operates in the application layer.
It allows you to form distribution decisions based on any information that comes with an HTTP address.
Network Load Balancing:
Network load balancing leverages network layer information to decide where to send network traffic.
This is accomplished through layer 4 load balancing, which is designed to handle all forms of TCP/UDP traffic.
Network load balancing is considered the fastest of all the load balancing solutions, but it tends to fall short when it comes to balancing the distribution of traffic across servers.
7. Proxies and Sessions
7a1. Sticky Sessions:
https://stackoverflow.com/questions/10494431/sticky-and-non-sticky-sessions
https://dev.to/gkoniaris/why-you-should-never-use-sticky-sessions-2pkj
https://stackoverflow.com/questions/1553645/pros-and-cons-of-sticky-session-session-affinity-load-blancing-strategy
- Say you have two web servers, WWW1 and WWW2.
- You get a request from Alice and Load Balancer sends it to WWW1.
- Next time when you get a request from Alice you would like to send to same WWW1
- ANS:
- When WWW1 sends response, it can send a cookie object to the client. So next time when Alice sends a request it will use the cookie object
- The cookie tells the load balancer to send the request to WWW1 instead of WWW2
Amazon ELB has built-in support to enable Sticky Sessions
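The Alice example can be sketched as below. This is a simplified model, assuming the load balancer itself issues the cookie (as in the Session Persistence note above); `StickyLoadBalancer` and the `SERVERID` cookie name are hypothetical:

```python
import itertools
from typing import Dict, List, Tuple

COOKIE_NAME = "SERVERID"  # hypothetical cookie name


class StickyLoadBalancer:
    def __init__(self, servers: List[str]) -> None:
        self._servers = set(servers)
        self._next = itertools.cycle(servers)  # round robin for first-time clients

    def route(self, cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
        """Return (chosen server, cookies to set on the response)."""
        pinned = cookies.get(COOKIE_NAME)
        if pinned in self._servers:
            return pinned, {}                   # returning client: honor the cookie
        server = next(self._next)               # new client: pick a server
        return server, {COOKIE_NAME: server}    # and pin the client to it
```

Usage: the first `route({})` call for Alice returns `WWW1` plus a cookie to set; sending that cookie back on later requests keeps her on `WWW1`, while a cookie naming an unknown server is ignored and the client is re-pinned.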
http://www.lecloud.net/post/9699762917/scalability-for-dummies-part-4-asynchronism
7a2. Types of Sticky Session
1. Duration-based stickiness
2. Application-based stickiness
Application-based stickiness gives you the flexibility to set your own criteria for client-target stickiness.
When you enable application-based stickiness, the load balancer routes the first request to a target within the target group based on the chosen algorithm. The target is expected to set a custom application cookie that matches the cookie configured on the load balancer to enable stickiness. This custom cookie can include any of the cookie attributes required by the application.
7a3. Drawback of storing sticky session on a node
https://aws.amazon.com/caching/session-management/
A drawback of storing sessions on an individual node is that in the event of a failure, you are likely to lose the sessions that were resident on the failed node. In addition, if the number of your web servers changes, for example in a scale-up scenario, it's possible that the traffic may be unequally spread across the web servers as active sessions may exist on particular servers. If not mitigated properly, this can hinder the scalability of your applications.
7a4. Session Replication and Sticky Session
https://stackoverflow.com/questions/6367812/sticky-sessions-and-session-replication/11045462#11045462
Imagine you have only one user using your web app, and you have 3 tomcat instances.
1. If you're using session replication without sticky session :
Session requests will be sent randomly to a Tomcat instance
2. If you're using sticky session without replication :
Session requests will be sent to the same Tomcat instance (say A). Later if A goes down, new sessions will be sent to B or C.
But B or C won't have a copy of the user's session.
The user will lose his session and is disconnected from the web app although the web app is still running.
3. If you're using sticky session WITH session replication :
Session will be preserved even if an instance goes down
7b. Reverse Proxy vs Forward Proxy
https://stackoverflow.com/questions/224664/difference-between-proxy-server-and-reverse-proxy-server
First of all, the word "proxy" describes someone or something acting on behalf of someone else.
In the computer realm, we are talking about one server acting on the behalf of another computer.
FORWARD proxy
The proxy event in this case is that the "forward proxy" retrieves data from another web site on behalf of the original requester.
A tale of 3 computers (part I)
For an example, I will list three computers connected to the internet.
X = your computer, or "client" computer on the internet
Y = the proxy web site, proxy.example.org
Z = the web site you want to visit, www.example.net
Normally, one would connect directly from X --> Z.
However, in some scenarios, it is better for Y --> Z on behalf of X, which chains as follows: X --> Y --> Z.
Reasons why X would want to use a forward proxy server:
Here is a (very) partial list of uses of a forward proxy server.
1) X is unable to access Z directly because
a) Someone with administration authority over X's internet connection has decided to block all access to site Z.
REVERSE proxy
A tale of 3 computers (part II)
For this example, I will list three computers connected to the internet.
X = your computer, or "client" computer on the internet
Y = the reverse proxy web site, proxy.example.com
Z = the web site you want to visit, www.example.net
Normally, one would connect directly from X --> Z.
However, in some scenarios, it is better for the administrator of Z to restrict or disallow direct access, and force visitors to go through Y first. So, as before, we have data being retrieved by Y --> Z on behalf of X, which chains as follows: X --> Y --> Z.
What is different this time compared to a "forward proxy," is that this time the user X does not know he is accessing Z, because the user X only sees he is communicating with Y. The server Z is invisible to clients and only the reverse proxy Y is visible externally. A reverse proxy requires no (proxy) configuration on the client side.
The client X thinks he is only communicating with Y (X --> Y), but the reality is that Y is forwarding all communication (X --> Y --> Z again).
Reasons why Z would want to set up a reverse proxy server:
1) Z wants to force all traffic to its web site to pass through Y first.
a) Z has a large web site that millions of people want to see, but a single web server cannot handle all the traffic. So Z sets up many servers, and puts a reverse proxy on the internet that will send users to the server closest to them when they try to visit Z. This is part of how the Content Distribution Network (CDN) concept works.
2) The administrator of Z is worried about retaliation for content hosted on the server and does not want to expose the main server directly to the public.
a) Owners of Spam brands such as "Canadian Pharmacy" appear to have thousands of servers, while in reality having most websites hosted on far fewer servers. Additionally, abuse complaints about the spam will only shut down the public servers, not the main server.
In the above scenarios, Z has the ability to choose Y
7c. Load balancer vs reverse proxy
https://stackoverflow.com/questions/65174175/how-do-websocket-connections-work-through-a-load-balancer
Load Balancer
The main use case of a load balancer is to distribute load among the nodes in a group of servers, to manage the resource utilisation of each node
Reverse Proxy
One of the use cases of a reverse proxy is to hide server meta information (IP, port, etc.) from the client. It's a form of security.
We can configure the reverse proxy with load balancer or we can configure the reverse proxy alone as well.
Configuring the load balancer for a single node doesn't make sense but we can configure the reverse proxy for a single node.
Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function.
Reverse proxies can be useful even with just one web server or application server, opening up the benefits described in the previous section.
Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.
Disadvantage(s): reverse proxy
Introducing a reverse proxy results in increased complexity.
A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e. a failover) further increases complexity.
7d. Load balancer vs API Gateway
https://stackoverflow.com/questions/61174839/load-balancer-and-api-gateway-confusion
Load Balancer ->
It is software that works at the protocol or socket level (e.g. TCP, HTTP, or port 3306). Its job is to balance incoming traffic by distributing it to the destinations using various strategies (e.g. round robin). It doesn't offer features such as authorisation checks or authentication of requests.
API Gateway ->
It is a managed service provided by various hosting companies to manage API operations and seamlessly scale the API infrastructure. It takes care of
access control,
Rate Limiting
Circuit Breakers
response caching,
response types,
authorisation,
authentication,
request throttling,
data handling,
identifying the right destinations based on custom rules, and seamlessly scaling the backend.
Managed API gateways generally come with scalable infrastructure by default, so putting them behind a load balancer might not make sense.
Q: Where are API gateways hosted? A DNS resolves domain name to a load balancer or api gateway?
A: As for resolving the domain, the DNS most likely resolves to the load balancer, which in turn fetches the response from the API gateway service.
DNS -> Load Balancer -> API gateway -> Backend service
8. Asynchronism:
- Pre-compute things ahead of time
- Callback mechanism
9. Databases
9a0. Denormalization
https://www.geeksforgeeks.org/denormalization-in-databases/
Denormalization is a database optimization technique in which we add redundant data to one or more tables.
This can help us avoid costly joins in a relational database.
Note that denormalization does not mean ‘reversing normalization’ or ‘not to normalize’.
It is an optimization technique that is applied after normalization.
The process of taking a normalized schema and making it non-normalized is called denormalization, and designers use it to tune the performance of systems to support time-critical operations
9a1. Normalization
https://www.youtube.com/watch?v=xoTyrdT9SZI
Avoiding / removing redundant data from a table to reduce
- Insertion Anomaly
- Update Anomaly
- Deletion Anomaly
9a2. 1NF
https://www.youtube.com/watch?v=mUtAPbb1ECM
Every table should at least follow 1NF ALWAYS
4 rules
1. Each column must hold a single value; it should not hold multiple values
2. All values in a column should be of same kind
3. Each column should have a unique name
4. Order in which the data is stored doesn't matter
9a3. 2NF
https://www.youtube.com/watch?v=R7UblSu4744
1. Table should be in 1NF
2. No partial dependencies in the table
- Dependency
- Eg: all fields depend on the Primary Key. This is a dependency
- A single column can uniquely identify a complete row or all the other columns in a row
- Partial Dependency
- In the below table primary key is a composite key : "student_id" + "subject_id"
- In the below example, "teacher" is dependent on "subject_id" not on both
- This is a partial dependency
- Many to Many Relationship
- Eg: Scores Table
student_id subject_id marks teacher
10 1 50 a
10 2 60 b
11 1 85 a
11 2 75 b
11 4 55 j
- In the table above, student_id + subject_id, forms a key
- With student_id alone we cannot get a single marks value for Student Id "10"; the subject_id is needed as well
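One way to remove the partial dependency is to move "teacher" into its own table keyed by subject_id. The sketch below uses sqlite3 with the rows from the Scores table above; the table name "subjects" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- teacher depends only on subject_id, so it lives in its own table.
    CREATE TABLE subjects (subject_id INTEGER PRIMARY KEY, teacher TEXT);
    -- scores keeps only what depends on the whole composite key.
    CREATE TABLE scores (
        student_id INTEGER,
        subject_id INTEGER REFERENCES subjects(subject_id),
        marks INTEGER,
        PRIMARY KEY (student_id, subject_id)
    );
    INSERT INTO subjects VALUES (1, 'a'), (2, 'b'), (4, 'j');
    INSERT INTO scores VALUES (10, 1, 50), (10, 2, 60),
                              (11, 1, 85), (11, 2, 75), (11, 4, 55);
""")

# The original wide rows can still be rebuilt with a join when needed.
rows = conn.execute(
    "SELECT s.student_id, s.subject_id, s.marks, t.teacher "
    "FROM scores s JOIN subjects t ON t.subject_id = s.subject_id "
    "ORDER BY s.student_id, s.subject_id"
).fetchall()
```

Each teacher is now stored once per subject, so changing the teacher of a subject is a one-row update.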
9a4. 3NF
https://www.youtube.com/watch?v=aAx_JoEDXQA
1. Table should be in 2NF
2. No transitive dependencies in the table
Transitive Dependency
- When an attribute in a table depends on some non-prime attribute and not on the prime attribute
- Eg: Scores Table
student_id subject_id marks exam_name total_marks
10 1 50 a
10 2 60 b
11 1 85 a
11 2 75 b
11 4 55 j
- In the table above, student_id + subject_id, forms a key
- total_marks depends on exam_name
- It does not depend on any part of the primary key
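A sketch of removing the transitive dependency by moving total_marks into a table keyed by exam_name. The exam names ("theory", "practical"), the totals and the table name "exams" are hypothetical values picked for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- total_marks depends on exam_name, so both move to their own table.
    CREATE TABLE exams (exam_name TEXT PRIMARY KEY, total_marks INTEGER);
    CREATE TABLE scores (
        student_id INTEGER,
        subject_id INTEGER,
        marks INTEGER,
        exam_name TEXT REFERENCES exams(exam_name),
        PRIMARY KEY (student_id, subject_id)
    );
    -- Hypothetical sample data.
    INSERT INTO exams VALUES ('theory', 100), ('practical', 50);
    INSERT INTO scores VALUES (10, 1, 50, 'theory'), (10, 2, 30, 'practical');
""")

# Changing the total for an exam is now a single-row update.
conn.execute("UPDATE exams SET total_marks = 60 WHERE exam_name = 'practical'")

totals = conn.execute(
    "SELECT s.student_id, s.subject_id, e.total_marks "
    "FROM scores s JOIN exams e ON e.exam_name = s.exam_name "
    "ORDER BY s.subject_id"
).fetchall()
```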
9a5. database-design-bad-practices
https://www.toptal.com/database/database-design-bad-practices
https://www.javatpoint.com/dbms-integrity-constraints
1. Poor Normalization
- At least 3NF
2. Redundancy
- Redundant fields and tables are a nightmare for developers
3. Bad Referential Integrity (Constraints)
- Referential integrity is one of the most valuable tools that database engines provide to keep data quality at its best.
- A referential integrity constraint is specified between two tables.
- In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.
4. Not Taking Advantage of DB Engine Features
- Good use of Indexes
- Views that provide a quick and efficient way to look at your data
- Aggregate functions that help analyze information without programming
- Transactions, i.e. blocks of data-altering statements that are either all executed and committed, or all cancelled (rolled back)
- Locks that keep data safe and correct while transactions are being executed.
5. Composite Primary Keys
Beware, though, if your table with a composite primary key is expected to have millions of rows, the index controlling the composite key can grow up to a point where CRUD operation performance is very degraded. In that case, it is a lot better to use a simple integer ID primary key
6. Poor Indexing
- If the table is big enough, you will think, logically, to create an index on each column that you use to access this table only to find almost immediately that the performance of SELECTs improves but INSERTs, UPDATEs, and DELETEs drop. This, of course, is due to the fact that indexes have to be kept synchronized with the table, which means massive overhead for the DBE. This is a typical case of over indexing that you can solve in many ways; for instance, having only one index on all the columns different from the primary key that you use to query the table, ordering these columns from the most used to the least may offer better performance in all CRUD operations than one index per column.
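The referential integrity rule from point 3 (a foreign key value must be null or exist in the referenced table) can be seen in a short sqlite3 sketch. The customers/orders tables are assumptions for illustration; note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1, 'john');
""")

conn.execute("INSERT INTO orders VALUES (100, 1)")     # parent row exists
conn.execute("INSERT INTO orders VALUES (101, NULL)")  # NULL foreign key is allowed

try:
    conn.execute("INSERT INTO orders VALUES (102, 999)")  # no such customer
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The third insert is refused by the engine itself, so no application code is needed to keep orphaned rows out.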
9a6. How Facebook scaled MySQL
https://www.facebook.com/watch?v=695491248045
https://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/
Database Monitoring
- Monitor Query performances
Online Schema Change: When a new Column is added
- Updates will be blocked for the entire duration
- All rows should be updated
So FB built a tool (called Online Schema Change)
- Create a new table,
- Copy data to the new table
- Set the new table as the target
- A SINGLE SELECT WON'T work
- Copy table in multiple steps
- Use ideas from Shlomi Noach
Adding an Edge to a graph
- This is limited by the rate at which Locks can be obtained on a row
start TRANSACTION
insert edge into the graph
update the "count" of edges coming out
Commit TRANSACTION
- Solution 1: Stored Procedure
- Same code written inside MySQL
- Solution 2: Triggers
- With solution 1 or 2, rates at which we can add notes are doubled.
- Solution 3: No Stored Procedure
- Use MySQL feature called Multi-Statement Query
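The "insert edge, then update the count" transaction above can be sketched as follows. This uses sqlite3 rather than MySQL so that it runs anywhere, and the table names (edges, edge_counts) are assumptions; it only illustrates the two statements sharing one transaction, not Facebook's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src INTEGER, dst INTEGER, PRIMARY KEY (src, dst));
    CREATE TABLE edge_counts (src INTEGER PRIMARY KEY, count INTEGER NOT NULL);
    INSERT INTO edge_counts VALUES (1, 0);
""")


def add_edge(conn, src, dst):
    # "with conn" commits if both statements succeed and rolls back otherwise,
    # so the edge and its counter never get out of step.
    with conn:
        conn.execute("INSERT INTO edges VALUES (?, ?)", (src, dst))
        conn.execute(
            "UPDATE edge_counts SET count = count + 1 WHERE src = ?", (src,)
        )


add_edge(conn, 1, 2)
add_edge(conn, 1, 3)
try:
    add_edge(conn, 1, 3)  # duplicate edge: the whole transaction is rejected
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT count FROM edge_counts WHERE src = 1").fetchone()[0]
edge_total = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
```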
9a. Pairing Master and Slave DB to make Webapps faster
https://www.quora.com/What-are-Master-and-Slave-databases-and-how-does-pairing-them-make-web-apps-faster
Master databases receive and store data from applications. Slave databases get copies of that data from the masters. Slaves are therefore read-only from the application's point of view while masters are read-write.
Writes to a database are more "expensive" than reads. Checking for data integrity and writing updates to physical disks, for example, consume system resources. Most web applications require a much higher ratio of reads to writes. For example a person may write an article once and then it’s read thousands of times. So setting up master-slave replication in the right scenario lets an application distribute its queries efficiently. While one database is busy storing information the others can be busy serving it without impacting each other.
Most often each master and slave database are run on separate servers or virtual environments. Each is then tailored and optimized for their needs. Master database servers may be optimized for writing to permanent storage. Slave database servers may have more RAM for query caching. Tuning the environments and database settings makes each more optimized for reading or writing, improving the overall efficiency of the application.
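A rough sketch of how an application might split traffic between the two roles: writes go to the master, reads rotate across the slaves. The class name and the connection labels are hypothetical, and the "first keyword is SELECT" check is a deliberate simplification.

```python
from itertools import cycle


class ReplicatedRouter:
    """Picks a database for each statement (illustrative only)."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql):
        # Simplification: anything that is not a SELECT counts as a write.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReplicatedRouter("master", ["slave-1", "slave-2"])
targets = [
    router.route("SELECT * FROM articles"),
    router.route("SELECT * FROM articles WHERE id = 1"),
    router.route("INSERT INTO articles VALUES (1, 'hello')"),
    router.route("SELECT * FROM articles"),
]
```

Because slaves receive copies asynchronously in most setups, a read routed this way may briefly return data that is older than the latest write.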
9b. Polyglot Persistence
https://martinfowler.com/bliki/PolyglotPersistence.html
Using multiple database technologies, each chosen for the kind of data it handles best, to meet the business requirements.
9c. Strategies for dealing with heavy writes to a DB
https://stackoverflow.com/questions/53037736/system-design-strategies-for-dealing-with-heavy-writes-to-a-db
Great Question:
Question:
What are some industry-standard strategies for dealing with a system that requires heavy writes to a particular table in a DB?
For simplicity's sake, let's say the table is an inventory table for products, with a column 'Product Name' and a column 'Count', and it simply increments by +1 each time a new product is bought into the system. There are millions of users buying different products every second, and we have to keep track of the latest count of each product, but it does not have to be strictly real-time; maybe a 5 min lag is acceptable.
My options are:
1) Master slave replication, where master DB handles all writes, and slaves handles reads. But this doesn't address the write-heavy problem
2) Sharding the DB based on product name range, or its hashed value. But what if there's a specific product (eg Apple) that receives large number of updates in a short time, it'll still hit the same DB.
3) Batched updates? Use some kind of caching and write to table every X number of seconds with a cumulative counts of whatever we've received in those X seconds? Is that a valid option, and what caching mechanism do I use? And what if there's a crash between the last read and next write? How do I recover the lost count?
Answer:
A solution to write thousands of records per second might be very different from incrementing a counter in the example you provided. More so, there could be no tables at all to handle such load. Consistency/availability requirements are also missing in your question and depending on them the entire architecture may be very different.
Anyway, back to your specific simplistic case and your options
Option 1 (Master slave replication)
The problem you’ll face here is database locking - every increment would require a record lock to avoid race conditions, and you’ll quickly get your processes writing to your db waiting in a queue and your system down, even under a moderate load.
Option 2 (Sharding the DB)
Your assumption is correct; this is not much different from option 1.
Option 3 (Batched updates)
Very close. Use a caching layer backed by a light-weight store that provides concurrent atomic increments/decrements, with persistence so that you don't lose your data. We’ve used redis for a similar purpose, although any other key-value database would do as well - there are literally dozens of such databases around.
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash table
The solution would look as follows:
incoming requests -> your backend server -> kv_storage (atomic increment(product_id))
And you'll have a "flushing" script running on a schedule (e.g. cron */5, every 5 minutes) that does the following (simplified):
1. for every product_id in kv_storage read its current value
2. update your db counter (+= value)
3. decrement the value in kv_storage by the amount just read
Further scaling
- if the script fails nothing bad would happen - the updates would arrive on next run
- if your backend boxes can't handle load - you can easily add more boxes
- if a single key-value db can't handle load - most of them support scaling over multiple boxes or a simple sharding strategy in your backend scripts would work fine
- if a single "flushing" script doesn't keep up with increments - you can scale them to multiple boxes and decide what key ranges are handled by each one
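The flushing steps above can be sketched like this. KVStore is an in-memory stand-in for a key-value store such as redis, and a plain dict stands in for the inventory table; both are assumptions made so the example runs on its own.

```python
import threading
from collections import defaultdict


class KVStore:
    """In-memory stand-in for a key-value store with atomic increments."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = defaultdict(int)

    def incr_by(self, key, amount=1):
        with self._lock:
            self._data[key] += amount
            return self._data[key]

    def items(self):
        with self._lock:
            return list(self._data.items())


def flush(kv, db_counts):
    """One run of the flushing script (db_counts stands in for the DB table)."""
    for product_id, value in kv.items():                          # step 1: read
        if value == 0:
            continue
        db_counts[product_id] = db_counts.get(product_id, 0) + value  # step 2
        # Step 3: subtract only what was read, so increments that arrived
        # in the meantime are kept for the next run.
        kv.incr_by(product_id, -value)


kv = KVStore()
for _ in range(3):
    kv.incr_by("apple")
kv.incr_by("pear")

db_counts = {"apple": 10}
flush(kv, db_counts)
remaining = dict(kv.items())
```

One caveat of this simplified version: a crash between step 2 and step 3 would apply the same batch again on the next run, so a real script would probably want those two steps to be atomic or idempotent.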
9d. What is CQRS?
https://garywoodfine.com/what-is-cqrs/
https://docs.microsoft.com/en-us/azure/architecture/patterns/cqrs
https://martinfowler.com/bliki/CQRS.html
- Command Query Responsibility Segregation (CQRS)
- Developers use different models for read and update processes: Commands perform writes (updates) and Queries perform reads.
- The main use of the CQRS pattern is in high-performance applications, to scale read and write operations.
- CQRS allows you to separate the load from reads and writes allowing you to scale each independently.
- Thus, every method should be either a Command that performs an action or a Query that returns data, but not both.
- CQRS is a natural fit with the following:
Task based UI systems
Event-based programming models
Event-Driven Microservices
Eventual Consistency
Domain Driven Design
https://docs.microsoft.com/en-us/azure/architecture/patterns/cqrs
- In traditional architectures, the same data model is used to query and update a database.
- This makes things unwieldy
- For example, on the read side, the application may perform many different queries, returning data transfer objects (DTOs) with different shapes. Object mapping can become complicated. On the write side, the model may implement complex validation and business logic.
- CQRS separates reads and writes into different models, using commands to update data, and queries to read data.
- Commands should be task-based, rather than data centric. ("Book hotel room", not "set ReservationStatus to Reserved").
- For greater isolation, you can physically separate the read data from the write data. In that case, the read database can use its own data schema that is optimized for queries.
- If separate read and write databases are used, they must be kept in sync. Typically this is accomplished by having the write model publish an event whenever it updates the database.
- Using multiple read-only replicas can increase query performance
- Separation of the read and write stores also allows each to be scaled appropriately to match the load.
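A minimal in-process sketch of the separation described above: a task-based command on the write model publishes an event, and a separate read model keeps a query-friendly view in sync. All class and method names here are hypothetical, and the "event bus" is just a direct function call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoomBooked:
    """Event published by the write side after a successful command."""
    room_id: int
    guest: str


class WriteModel:
    def __init__(self, publish):
        self._reservations = {}  # room_id -> guest
        self._publish = publish

    def book_hotel_room(self, room_id, guest):
        """Command: validates, changes state, returns nothing."""
        if room_id in self._reservations:
            raise ValueError(f"room {room_id} is already booked")
        self._reservations[room_id] = guest
        self._publish(RoomBooked(room_id, guest))


class ReadModel:
    def __init__(self):
        self._rooms_by_guest = {}  # shape optimized for the query below

    def apply(self, event):
        self._rooms_by_guest.setdefault(event.guest, []).append(event.room_id)

    def rooms_for(self, guest):
        """Query: returns data, changes nothing."""
        return list(self._rooms_by_guest.get(guest, []))


read_model = ReadModel()
write_model = WriteModel(publish=read_model.apply)
write_model.book_hotel_room(101, "john")
write_model.book_hotel_room(102, "john")
```

With separate stores the event would travel over a queue or log instead of a direct call, which is where the eventual consistency mentioned earlier comes from.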
9e. When would I use Amazon Redshift vs. Amazon RDS?
https://aws.amazon.com/redshift/faqs/
Both Amazon Redshift and Amazon Relational Database Service (RDS) let you run traditional relational databases in the cloud while off-loading database administration. Customers use Amazon RDS databases primarily for online-transaction processing (OLTP) workloads, while Amazon Redshift is used primarily for reporting and analytics. OLTP workloads require quickly querying specific information, and support for transactions such as insert, update, and delete are best handled by Amazon RDS. Amazon Redshift harnesses the scale and resources of multiple nodes and uses a variety of optimizations to provide order of magnitude improvements over traditional databases for analytic and reporting workloads against very large datasets. Amazon Redshift provides an excellent scale-out option as your data and query complexity grows if you want to prevent your reporting and analytic processing from interfering with the performance of your OLTP workload
9f. What is Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is easy to use. Simply point to your data in S3, define the schema, and start querying using standard SQL.
10. Consistent Hashing
https://www.adayinthelifeof.nl/2011/02/06/memcache-internals/
Consistent hashing uses a counter that acts like a clock. Once it reaches the "12", it wraps around to "1" again. Suppose this counter is 16 bits. This means it ranges from 0 to 65535. If we visualize this on a clock, the numbers 0 and 65535 would be on the "12", 32200 would be around 6 o'clock, 48000 on 9 o'clock and so on. We call this clock the continuum.
On this continuum, we place a (relatively) large number of "dots" for each server. These are placed randomly so we have a clock with a lot of dots.
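A sketch of such a continuum, under a few assumptions: the dots are placed by hashing "server#i" with SHA-256 rather than truly at random (so the layout is reproducible), a key is served by the first dot found clockwise from the key's own position, and the class and server names are made up.

```python
import bisect
import hashlib

RING_SIZE = 1 << 16  # 16-bit continuum: positions 0..65535


def ring_position(label):
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Continuum:
    def __init__(self, servers, dots_per_server=100):
        # Many dots per server, spread pseudo-randomly around the clock.
        self._dots = sorted(
            (ring_position(f"{server}#{i}"), server)
            for server in servers
            for i in range(dots_per_server)
        )
        self._positions = [position for position, _ in self._dots]

    def server_for(self, key):
        # Walk clockwise to the next dot, wrapping past "12" back to the start.
        index = bisect.bisect_left(self._positions, ring_position(key))
        if index == len(self._dots):
            index = 0
        return self._dots[index][1]


keys = [f"key-{i}" for i in range(200)]
ring = Continuum(["cache-1", "cache-2", "cache-3"])
before = {key: ring.server_for(key) for key in keys}

# Adding a server only claims keys for the new server; nothing else reshuffles.
bigger_ring = Continuum(["cache-1", "cache-2", "cache-3", "cache-4"])
after = {key: bigger_ring.server_for(key) for key in keys}
```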
11. REST API
Examples:
http://localhost:8080/myappstore/customer/1/orders?item=microwave
http://localhost:8080/myappstore/customer/1/orders?item=microwave&quantity=1
http://localhost:8080/myappstore/customers?name=john
http://localhost:8080/myappstore/customers?name=john&dobStart=1995-12-01T00:00:00&dateEnd=2020-12-31T23:59:59
http://localhost:8080/myappstore/customers?limit=20&offset=0
11b. How to design REST API
https://stackoverflow.blog/2020/03/02/best-practices-for-rest-api-design/
https://medium.com/hashmapinc/rest-good-practices-for-api-design-881439796dc9
1. Accept and Respond with JSON
2. Use "nouns" instead of "verbs" in endpoint paths
- This is because our HTTP request method already has the verb.
3. Group / Nest entities logically
- For example, if we want an endpoint to get the comments for a news article, we should append the /comments path to the end of the /articles path.
- /customer/orders
- For example, if a user has posts and we want to retrieve a specific post by user, API can be defined as
GET /users/123/posts/1 which will retrieve Post with id 1 by user with id 123
4. Handle errors gracefully and return standard error codes
400 Bad Request – This means that client-side input fails validation.
401 Unauthorized – This means the user isn’t authorized to access a resource. It is usually returned when the user isn’t authenticated.
403 Forbidden – This means the user is authenticated, but it’s not allowed to access a resource.
404 Not Found – This indicates that a resource is not found.
500 Internal server error – This is a generic server error. It probably shouldn’t be thrown explicitly.
502 Bad Gateway – This indicates an invalid response from an upstream server.
503 Service Unavailable – This indicates that something unexpected happened on server side (It can be anything like server overload, some parts of the system failed, etc.).
5. Allow filtering, sorting, and pagination
http://example.com/articles?sort=+author,-datepublished
Where + means ascending and - means descending. So we sort by author’s name in alphabetical order and datepublished from most recent to least recent.
6. Maintain good security practices
- using SSL/TLS for security is a must.
7. Cache data to improve performance
8. Versioning our APIs
Two main ways
a. Header
b. URL
9. HATEOAS: Hypermedia As The Engine Of Application State
- Instead of embedding everything in the response, link URLs for other resources.
{
"name": "John Doe",
"self": "http://localhost:8080/users/123",
"posts": "http://localhost:8080/users/123/posts",
"address": "http://localhost:8080/users/123/address"
}
- If a resource contains several fields that the user may not want to go through, it’s a good idea to expose navigation to sub-resources by implementing HATEOAS.
- It provides ease of navigation through a resource and its available actions.
Difference in Opinion:
- There are a lot of mixed opinions as to whether the API consumer should create links or whether links should be provided by the API.
- HATEOAS is useful when browsing the web, where we go to a website's front page and follow links based on what we see on the page.
- When browsing a website, decisions on what links will be clicked are made at run time.
- HATEOAS on APIs might not be that good.
- With an API, decisions as to what requests will be sent are made when the API integration code is written, not at run time.
- Could the decisions be deferred to run time? Sure, however, there isn't much to gain going down that route as code would still not be able to handle significant API changes without breaking.
10. Swagger for documentation
11c. API Query, Filter and Pagination
http://localhost:8080/myappstore/customer/1/orders?item=microwave
http://localhost:8080/myappstore/customer/1/orders?item=microwave&quantity=1
http://localhost:8080/myappstore/customer?name=john
http://localhost:8080/myappstore/customer?name=john&dobStart=1995-12-01T00:00:00&dateEnd=2020-12-31T23:59:59
Filter:
GET /users/123/posts?state=published
Searching
GET /users/123/posts?state=published&tag=scala
Pagination
http://localhost:8080/myappstore/customers?limit=20&offset=0
Sorting
http://example.com/articles?sort=+author,-datepublished
Where + means ascending and - means descending. So we sort by author’s name in alphabetical order and datepublished from most recent to least recent.
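A sketch of how a backend might interpret the sort, limit and offset parameters shown above. The function names and the sample articles are assumptions; a field without a prefix is treated as ascending.

```python
def parse_sort(sort_param):
    """Turn '+author,-datepublished' into (field, descending) pairs."""
    criteria = []
    for token in sort_param.split(","):
        token = token.strip()
        if not token:
            continue
        descending = token.startswith("-")
        criteria.append((token.lstrip("+-"), descending))
    return criteria


def sort_articles(articles, sort_param):
    result = list(articles)
    # Apply the least significant key first; sorted() is stable, so earlier
    # orderings survive as tie-breakers.
    for field, descending in reversed(parse_sort(sort_param)):
        result = sorted(result, key=lambda a: a[field], reverse=descending)
    return result


def paginate(items, limit, offset):
    return items[offset:offset + limit]


# Hypothetical sample data.
articles = [
    {"author": "bob", "datepublished": "2020-01-01"},
    {"author": "alice", "datepublished": "2019-05-01"},
    {"author": "alice", "datepublished": "2020-03-01"},
]
ordered = sort_articles(articles, "+author,-datepublished")
first_page = paginate(ordered, limit=2, offset=0)
```

One practical detail: in a URL query string a literal "+" is decoded as a space by most parsers, so clients usually need to send it as %2B for the server to see it.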
11d. API Versioning and Techniques/Best Practices
https://cloud.google.com/blog/products/api-management/api-design-which-version-of-versioning-is-right-for-you
https://medium.com/swlh/api-versioning-7f6f713c6b14
https://www.xmatters.com/blog/blog-four-rest-api-versioning-strategies/
https://www.akana.com/blog/api-versioning
https://stackoverflow.com/questions/389169/best-practices-for-api-versioning
One reason why many APIs never need versioning is that you can make many small enhancements to APIs in a backwards-compatible way, usually by adding new properties or new entities that older clients can safely ignore.
Your first thought should always be to try to find a backwards-compatible way of introducing an API change without versioning;
The more clients an API has, and the greater the independence of the clients from the API provider, the more careful the API provider has to be about API compatibility and versioning.
Providers of APIs sometimes make different choices if the consumers of the API are internal to the same company, or limited to a small number of partners. In that case they may be tempted to try to avoid versioning by coordinating with consumers of the API to introduce a breaking change. In our experience this approach has limited success; it typically causes disruption and a large coordination effort on both sides.
- It is usually much better for API providers to treat internal users and partners as if they were external consumers whose development process is independent.
Format Versioning VS Entity Versioning
1. Format Versioning
The important point in this example is that version 1 and version 2 of the API both allow access to the same bank accounts. The API change introduces no new entities; versions 1 and 2 simply provide two different "formats" [my word] for manipulating the same bank accounts.
Further, any change made using the version 2 API changes the underlying account entity in ways that are visible to clients of the version 1 API. In other words, each new API version defines a new format for viewing a common set of entities. It’s in this sense that I use the phrase "format versioning" in the rest of this post.
2. Entity Versioning
Extending the bank example, imagine that the bank wants to introduce checking accounts based on blockchain technology, which requires the underlying data for the account to be organized quite differently. If the API that was previously exposed for accounts made assumptions that are simply not compatible with the new technology, it's not going to be possible to read and manipulate the blockchain accounts using the old API. The bank’s solution is the same as the car company’s: introduce "version 2" checking accounts. Each account is either a conventional account or a blockchain account, but not both at the same time. Each version has its own API that are the same where possible but different where necessary.
While "entity versioning" is attractive for its flexibility and simplicity, it also is not free; you still have to maintain the old versions for as long as people use them.
1. Embedding API version in the URI?
GET https://www.sampleresource.com/v1/foo
A lot of companies use this (FB, Twitter, Airbnb etc)
a. How will API Consumers be notified of the API Version Change
The new API version will be sent to consumers through a PATCH
- Major patch
- In this approach, your URI would denote the breaking changes to the API. A new major version requires creating a new API. The version number is what you use to route to the correct host via your URI.
- Minor patch
- You update change logs to inform API consumers of new functionality or bug fixes. Or, you could correlate minor to a lifecycle coordinator iteration, in which that minor introduces a non-breaking functionality.
Pros
This looks like the easiest way forward.
Cons
It violates one of the dictums of good API design - That every URI should contain unique resources.
URI versioning can cause issues with HTTP caching. An HTTP cache would have to store each version.
However, this would be against the HATEOAS constraint [Hypermedia As The Engine Of Application State].
This is because having a resource address/URI would change over time.
I would conclude that API versions should not be kept in resource URIs for a long time meaning that resource URIs that API users can depend on should be permalinks.
With API versions clearly visible in the URI there's a caveat: one might also object to this approach since API history becomes visible/apparent in the URI design and is therefore prone to changes over time
Adding a version number to the API would mean that the client is making an assumption of how an API would behave and would thus mean that the API is no longer opaque.
If we go by the book, clients should be dynamic and only rely on the API responses [see. web browsers].
It might also mean that incrementing the API version would translate into branching the entire API resource.
2. Through content negotiation
GET /foo
Accept: application/ion+json;v2.0
curl -H "Accept: application/vnd.xm.device+json; version=1" http://www.example.com/api/products
This approach allows us to version a single resource representation instead of versioning the entire API which gives us a more granular control over versioning. It also creates a smaller footprint in the code base as we don’t have to fork the entire application when creating a new version. Another advantage of this approach is that it doesn’t require implementing URI routing rules introduced by versioning through the URI path.
Pros:
- Allows us to version a single resource representation instead of versioning the entire API
- More granular control over versioning
- Creates a smaller footprint
- Doesn’t require implementing URI routing rules.
Cons:
- Requiring HTTP headers with media types makes it more difficult to test and explore the API using a browser
- More often than not, content negotiation needs to be implemented from scratch as there are few libraries that offer that out of the box.
3. Query Parameters
www.sampleresource.com/api/foo?version=1
Include the version number as a query parameter.
Pros:
This approach is very straightforward
Easy to set defaults to the latest version in case of missing query parameters.
Cons:
Query parameters are more difficult to use for routing requests to the proper API version
4. Custom Headers
curl -H "accepts-version: 1.0" www.sampleresource.com/api/foo
REST APIs can also be versioned by providing custom headers with the version number included as an attribute.
Pros:
The main difference between this approach and the two previous ones is that it doesn’t clutter the URI with versioning information.
Cons:
It requires custom headers
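To compare the four approaches side by side, here is a sketch of a server-side helper that extracts the requested version from each of the places described above. The function name, the order in which the places are checked, and LATEST_VERSION are assumptions; a real API would normally pick one scheme rather than support all four.

```python
import re
from urllib.parse import parse_qs, urlsplit

LATEST_VERSION = "2"  # hypothetical default when the client names no version


def resolve_version(url, headers=None):
    headers = {name.lower(): value for name, value in (headers or {}).items()}
    parts = urlsplit(url)

    # 1. Version embedded in the URI path: /v1/foo
    match = re.match(r"^/v(\d+(?:\.\d+)*)(?:/|$)", parts.path)
    if match:
        return match.group(1)

    # 2. Content negotiation: Accept: application/...+json; version=1
    match = re.search(r"version=([\d.]+)", headers.get("accept", ""))
    if match:
        return match.group(1)

    # 3. Query parameter: /api/foo?version=1
    query = parse_qs(parts.query)
    if "version" in query:
        return query["version"][0]

    # 4. Custom header: accepts-version: 1.0
    if "accepts-version" in headers:
        return headers["accepts-version"]

    return LATEST_VERSION


from_path = resolve_version("https://www.sampleresource.com/v1/foo")
from_accept = resolve_version(
    "http://www.example.com/api/products",
    {"Accept": "application/vnd.xm.device+json; version=1"},
)
from_query = resolve_version("http://www.sampleresource.com/api/foo?version=1")
from_header = resolve_version(
    "http://www.sampleresource.com/api/foo", {"accepts-version": "1.0"}
)
fallback = resolve_version("http://www.sampleresource.com/api/foo")
```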
12. Designing Idempotent API (How to handle Retries)
https://stripe.com/blog/idempotency
https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/
https://8thlight.com/blog/colin-jones/2018/09/18/microservices-arent-magic-handling-timeouts.html
https://ieftimov.com/post/understand-how-why-add-idempotent-requests-api/
https://medium.com/@saurav200892/how-to-achieve-idempotency-in-post-method-d88d7b08fcdd
List of idempotent Rest methods:
HTTP Method
OPTIONS
GET
HEAD
PUT
DELETE
List of non-idempotent methods:
HTTP Method
POST
PATCH
Timeout Error
- Clients can retry, but a retry may not solve the problem.
- Good clients back off exponentially
- But we should avoid the thundering herd problem (https://en.wikipedia.org/wiki/Thundering_herd_problem) by introducing random JITTER
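A sketch of exponential backoff with random jitter. The variant shown picks each delay uniformly between 0 and the exponential ceiling (often called "full jitter"); the function names and the default base, cap and retry counts are assumptions.

```python
import random
import time


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random):
    """Yield one sleep duration per retry."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))  # 0.5, 1, 2, 4, ... up to cap
        yield rng.uniform(0, ceiling)              # jitter spreads clients out


def call_with_retries(operation, max_retries=4, sleep=time.sleep):
    delays = backoff_delays(max_retries)
    while True:
        try:
            return operation()
        except TimeoutError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last timeout
            sleep(delay)


# Hypothetical flaky operation: times out twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("remote call timed out")
    return "ok"


slept = []
result = call_with_retries(flaky, sleep=slept.append)  # record instead of sleeping
```

Retrying automatically like this is only safe when the operation is idempotent, which is what the rest of this section is about.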
Options at hand in case of a timeout (You got a timeout from a remote API)
1. When you hit a timeout, assume it succeeded and move on.
- Never do this
2. For read requests, use a cached or default value.
- If your request is a read request and isn’t intended to have any effects on the remote end, this could be a good bet.
3. Assume the remote operation failed, and try again automatically.
- Without the idempotent property, you could create duplicate data
4. Check and see if the request succeeded, and try again if it’s safe.
- This approach clearly requires the existence of an endpoint that can give us the information we want.
- Use Idempotent Keys (A unique ID) in the request
- but this Key should be stored so that on a new request, the backend can check if the Idempotent Key was processed earlier
- Response and Previous Status will also be stored so that it can be immediately returned to the client
- TTL for keys to expire after some time
- DON'T USE the DB for the keys
- Keeping all of these keys can be expensive, especially if you do it in a non-optimal way such as using the database.
- Whenever you are in doubt if data like this should be stored in a database, always think about how crucial to your business this data is.
- For the scope of our example, this data is not business critical, therefore we could offload it to a different type of storage.
- Use Redis. It supports TTLs and is very fast for lookups
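A sketch of the idempotency-key flow described above. IdempotencyStore is an in-memory stand-in for a TTL-capable store such as Redis, and handle_payment and the fake charge function are hypothetical names used only for illustration.

```python
import time


class IdempotencyStore:
    """Keeps the response for each key until its TTL expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, response)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as if never seen
            return None
        return response

    def put(self, key, response):
        self._entries[key] = (self._clock() + self._ttl, response)


def handle_payment(store, idempotency_key, charge):
    cached = store.get(idempotency_key)
    if cached is not None:
        return cached              # retry of a processed request: replay response
    response = charge()            # first time: perform the side effect
    store.put(idempotency_key, response)
    return response


# Fake clock and charge function so the example runs instantly.
now = [0.0]
store = IdempotencyStore(ttl_seconds=60, clock=lambda: now[0])
charges = []


def charge():
    charges.append("charged")
    return {"status": "charged", "charge_number": len(charges)}


first = handle_payment(store, "key-1", charge)
retry = handle_payment(store, "key-1", charge)
```

The check-then-store sequence here is not atomic, so two concurrent requests with the same key could both charge. With Redis, an atomic set-if-absent (the SET command with its NX and EX options) is one way to close that gap.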
13. Distributed Locks
Requirements:
- Mutual exclusion: two or more processes shouldn't hold the same lock at the same time
- Deadlock free
- Fault tolerant
Step 1:
A ----------> Lock Manager ----------> Cache
- A requests a lock
- Lock Manager records the entry in cache and provides a lock to A
B ---------->
Happy Path Flow:
1. A acquires a lock;
2. A releases the lock
3. B acquires the lock
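The happy path above can be sketched with a tiny lock manager. The dict stands in for the cache, and the class and method names are assumptions; this version deliberately has no expiry, so it only covers the happy path.

```python
class LockManager:
    """Records which owner holds each named lock (single-process sketch)."""

    def __init__(self):
        self._cache = {}  # lock name -> current owner

    def acquire(self, name, owner):
        if name in self._cache:
            return False          # someone else already holds it
        self._cache[name] = owner
        return True

    def release(self, name, owner):
        if self._cache.get(name) != owner:
            return False          # only the holder may release
        del self._cache[name]
        return True


manager = LockManager()
steps = [
    manager.acquire("resource", "A"),  # 1. A acquires the lock
    manager.acquire("resource", "B"),  #    B is refused while A holds it
    manager.release("resource", "A"),  # 2. A releases the lock
    manager.acquire("resource", "B"),  # 3. B acquires the lock
]
```

Across several machines, the "check, then record" step in acquire would have to be a single atomic operation in the shared cache for mutual exclusion to hold.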
Problem
1. What if A holds the lock for a long time?