Geo Pinning

Stu Arnett edited this page May 22, 2019 · 1 revision

Geo-pinning is a feature that focuses on maximizing XOR storage efficiency, while minimizing read-latency impact.

NOTE: For geo-pinning to work well, you must use a replication group with 3 or more sites, with low latency between sites. If application performance is more important than maximizing storage efficiency, do not use geo-pinning; instead, make sure your application and the ECS bucket are located at the same site.

To take advantage of the storage efficiencies gained on ECS by XOR, data must be written evenly across 3 or more sites. While writing data evenly across multiple sites increases storage efficiency, reading data in a similar fashion may increase WAN overhead and cause storage inefficiencies due to caching of remote data. This is because ECS, in order to serve data that is spread across multiple sites in a strongly consistent manner, maintains a record of each object’s owner. The object owner is the VDC in which the object was written, and it serves as the definitive source and ultimate authority for changes to that object. When an object is read from a non-owner site, ECS must communicate with the owner site across the WAN to determine the latest version of the object.

If you can direct applications to the site where an object was originally written, WAN traffic can be minimized and caching of ECS objects at non-owning sites eliminated or dramatically reduced. This results in higher performance for application workflows and minimal caching of remote data.

Globally balancing writes across ECS sites in a basic round-robin fashion leads to the highest XOR efficiency. However, applying the same round-robin algorithm to reads means that requests are most often sent to a different site than the one where the object was written. This is where a geo-affinity (or "geo-pinning") algorithm is beneficial. Geo-pinning ensures that all requests for a particular object (reads and writes alike) are sent to the same site. It is a feature built into the ECS Object Client for Java (as part of the smart-client load balancer). To use it, provide at least 3 VDCs and enable geo-pinning in the configuration. Here is an example:

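// classes from the ECS object client and smart-client libraries
import com.emc.object.Protocol;
import com.emc.object.s3.S3Client;
import com.emc.object.s3.S3Config;
import com.emc.object.s3.jersey.S3JerseyClient;
import com.emc.rest.smart.ecs.Vdc;

S3Config config;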

// client-side load balancing (direct to individual nodes) with 3+ VDCs
config = new S3Config(Protocol.HTTP, new Vdc("Boston", VDC1_NODE1, VDC1_NODE2),
        new Vdc("Seattle", VDC2_NODE1, VDC2_NODE2), new Vdc("Minneapolis", VDC3_NODE1, VDC3_NODE2));
// enable geo-pinning
config.setGeoPinningEnabled(true);

config.withIdentity(S3_ACCESS_KEY_ID).withSecretKey(S3_SECRET_KEY);

S3Client s3Client = new S3JerseyClient(config);

Enabling geo-pinning results in the following behavior:

  • A roughly equal amount of data is written to each site. This leads to lower data protection overhead with XOR.
  • Each object is always read from the site where it was originally written. This leads to lower WAN traffic, higher performance, and no need to cache remote data.

Geo-pinning works by performing a hash of the object’s key, as seen in the URL (but excluding the bucket name), and taking the modulus of that hash to determine which VDC should handle the request.
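The hash-and-modulus idea can be illustrated with a minimal, standalone sketch. This is not the client's actual implementation: the class `GeoPinSketch`, the helper `pickVdcIndex`, and the use of `String.hashCode()` are assumptions made purely for illustration, since this page does not say which hash function the smart client uses. The sketch only shows the two properties that matter: the same key always maps to the same VDC, and many distinct keys spread across all VDCs.

```java
// Illustrative sketch only; names and hash function are hypothetical,
// not taken from the ECS Object Client.
class GeoPinSketch {

    // Maps an object key (without the bucket name) to an index into the VDC list.
    static int pickVdcIndex(String objectKey, int vdcCount) {
        // floorMod keeps the result in [0, vdcCount) even when hashCode() is negative
        return Math.floorMod(objectKey.hashCode(), vdcCount);
    }

    public static void main(String[] args) {
        String[] vdcs = {"Boston", "Seattle", "Minneapolis"};

        // The same key is pinned to the same VDC on every call (read or write).
        String key = "photos/cat.jpg";
        System.out.println(key + " -> " + vdcs[pickVdcIndex(key, vdcs.length)]);
        System.out.println(key + " -> " + vdcs[pickVdcIndex(key, vdcs.length)]);

        // Tally how a batch of distinct keys is distributed across the VDCs.
        int[] counts = new int[vdcs.length];
        for (int i = 0; i < 9000; i++) {
            counts[pickVdcIndex("object-" + i, vdcs.length)]++;
        }
        for (int i = 0; i < vdcs.length; i++) {
            System.out.println(vdcs[i] + ": " + counts[i]);
        }
    }
}
```

Because the mapping depends only on the key and the number of VDCs, every client configured with the same VDC list in the same order would route a given object to the same site without any coordination. How even the spread is depends on the hash function and on the keys themselves.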
