
Ceph

Spark

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

Kafka

Kafka: a Distributed Messaging System for Log Processing

Kubernetes

Large-scale cluster management at Google with Borg

Mesos

Google

Large-scale cluster management at Google with Borg
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
Borg, Omega, and Kubernetes (This article describes some of the knowledge gained and lessons learned during Google’s journey from Borg to Kubernetes.)
The Google File System
Bigtable: A Distributed Storage System for Structured Data
MapReduce: Simplified Data Processing on Large Clusters
Dremel: Interactive Analysis of Web-Scale Datasets
Pregel: A System for Large-Scale Graph Processing
Large-scale Incremental Processing Using Distributed Transactions and Notifications (Percolator, one of the backend systems that underlie Caffeine, Google's web indexing system)
Similarity Estimation Techniques from Rounding Algorithms

Algorithms