
Homework 3: Architectural Changes and Merra Service

Madhavan K R edited this page Apr 6, 2022 · 12 revisions

Major Changes

The following are the key changes implemented in this iteration:

  1. Redesign of the registry database schema.
  2. Implementing a multi-level cache to store converted data.
  3. CI/CD implementation.

Redesign of the registry database schema

In the previous version, events and requests were coupled, which forced unrelated information into a single table. This made it difficult to distinguish user-generated events from requests. Further, a single request (requestId) goes through multiple states during its lifetime, whereas an eventId is unique per event. This design also made it hard to keep track of a user's search history.

As part of this redesign, the registry database was simplified. The following considerations were applied:

  1. Requests and events are separated into individual tables.
  2. The Requests table keeps track of the requestId, search parameters, and data source, along with status and result details.
  3. The Events table keeps track of only the eventId, the name of the event, and its time of occurrence.
  4. Both requests and events are tagged, via userId, to the user who generated them.
  5. Two new tables, NexradData and MerraData, have been added; they hold the current status of data availability in the local object store.

The following is the database schema:
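The table split above can also be sketched in code. This is a minimal illustration using SQLite, and the column names and types are hypothetical (the schema diagram is the authoritative definition); it only shows the separation of requests, events, and data-availability tables:

```python
import sqlite3

# Hypothetical column names/types; the diagram above is authoritative.
DDL = """
CREATE TABLE Requests (
    requestId     TEXT PRIMARY KEY,
    userId        TEXT NOT NULL,   -- consideration 4: tagged to the user
    searchParams  TEXT,            -- consideration 2
    dataSource    TEXT,            -- e.g. 'nexrad' or 'merra'
    status        TEXT,            -- a request moves through multiple states
    resultDetails TEXT
);
CREATE TABLE Events (
    eventId    TEXT PRIMARY KEY,   -- unique per event
    userId     TEXT NOT NULL,
    eventName  TEXT,
    occurredAt TEXT                -- time of occurrence
);
-- consideration 5: current data availability in the local object store
CREATE TABLE NexradData (
    fileKey      TEXT PRIMARY KEY,
    available    INTEGER,
    lastAccessed TEXT
);
CREATE TABLE MerraData (
    fileKey      TEXT PRIMARY KEY,
    available    INTEGER,
    lastAccessed TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
print(sorted(tables))
```

Keeping events append-only and requests mutable (status updates) is what the split enables: each table now has a single, consistent lifecycle.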

Implementing a multi-level cache to store converted data

We have implemented a two-level cache to enhance the response time of the requests. The cache is used to store the converted data files from both Nexrad and Merra.

The two levels of cache are:

  1. L1-Cache
  2. L2-Cache

L1-Cache (implemented by Amol Sangar)

L2-Cache (implemented by Madhavan Kalkunte Ramachandra)

L2-Cache is implemented using a combination of a local S3-like object store, the registry microservice, and Redis.

  • Local S3-like object store: We have deployed an S3-like object store that stores the converted files reliably for a longer duration and makes them available across all instances in the system.
  • Registry microservice: The registry microservice persists metadata about the files stored in the local object store. APIs are provided to query and update the current state of the cache.
  • Redis: A centrally hosted Redis instance is used to synchronize cache updates and cache invalidation across all instances.
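The interplay of the registry and the object store on a lookup can be sketched as follows. This is a minimal in-process sketch: the dicts stand in for the real registry microservice and object store (which are reached over their APIs), and the names `registry`, `object_store`, `put_in_l2`, and `get_from_l2` are hypothetical, not the actual service interfaces:

```python
import time

# In-memory stand-ins for the registry microservice and the S3-like
# object store; all names here are hypothetical illustrations.
registry = {}       # file_key -> {"available": bool, "last_accessed": float}
object_store = {}   # file_key -> converted file bytes

def put_in_l2(file_key, data):
    """Upload a converted file and record its metadata in the registry."""
    object_store[file_key] = data
    registry[file_key] = {"available": True, "last_accessed": time.time()}

def get_from_l2(file_key):
    """Return the converted file on a cache hit, else None (cache miss)."""
    meta = registry.get(file_key)
    if meta is None or not meta["available"]:
        return None  # miss: caller must download, convert, and upload
    meta["last_accessed"] = time.time()  # every access refreshes recency
    return object_store[file_key]

put_in_l2("merra/2022-04-06.nc", b"converted-data")
print(get_from_l2("merra/2022-04-06.nc"))   # hit
print(get_from_l2("nexrad/KIND_20220406"))  # miss
```

Recording the last access time in the registry on every hit is what later lets the eviction policy rank entries by recency.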

LRU Cache implementation

L2-Cache is implemented as an LRU cache. Each time a file is accessed (queried or downloaded), its last access time is updated in the registry. When the cache reaches capacity, the entries are sorted in ascending order of last access time and the least recently used files are deleted to make room for new entries.
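The eviction step described above can be sketched as below. This is an in-memory simplification: `registry` here is a plain dict mapping file keys to last-access timestamps, whereas in the real system this metadata lives in the registry microservice and the files themselves in the object store:

```python
def evict_lru(registry, capacity):
    """Delete the least recently used entries until size <= capacity."""
    if len(registry) <= capacity:
        return []
    # Sort ascending by last access time: oldest entries come first.
    by_age = sorted(registry, key=lambda k: registry[k])
    victims = by_age[:len(registry) - capacity]
    for key in victims:
        del registry[key]  # the real system also deletes the stored file
    return victims

registry = {"a.nc": 100.0, "b.nc": 300.0, "c.nc": 200.0}
evicted = evict_lru(registry, capacity=2)
print(evicted)           # ['a.nc'] -- oldest last-access time
print(sorted(registry))  # ['b.nc', 'c.nc']
```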

Synchronized Cache Access (for writes and deletes)

The L2-Cache is accessed by multiple instances of the ingestor microservice. When overlapping requests are received at the same time by different instances, work may be duplicated, i.e. the same data may be downloaded and converted multiple times. Concurrent updates may also corrupt the cache. It is therefore important to ensure synchronized access to the cache, especially when writing to or deleting from it.

Following are the considerations in designing the L2-Cache:

  1. When there are duplicate/overlapping requests, only one worker (consumer) must update the cache (i.e. download, convert, and upload to S3).
  2. If a similar request is in progress anywhere in the system, the worker (consumer) must wait for it to conclude.
  3. Every worker must specify an estimated time to complete the request (i.e. download, convert, and upload to S3), beyond which another worker must be allowed to process the data.

Synchronized access to the cache is achieved using Redlock, a distributed locking algorithm based on Redis. Updates to the cache (writes and deletes) are treated as a critical section. The Redlock algorithm ensures that only one worker (among multiple threads of the same node, or across multiple nodes) acquires the lock at a time. For a given piece of data, we generate a lock name that is unique to the data requested; this ensures that even when the same data is requested on multiple nodes, they contend for the same lock.
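The locking flow can be sketched as below. This is a single-instance simplification of Redlock (the full algorithm acquires the lock on a majority of independent Redis nodes); the dict-based `FakeRedis` and the `lock_name` scheme are hypothetical stand-ins, shown only to illustrate SET NX PX-style acquisition with a TTL matching the worker's estimated completion time (consideration 3 above):

```python
import hashlib
import time
import uuid

class FakeRedis:
    """Tiny in-process stand-in for Redis SET NX PX / compare-and-delete."""
    def __init__(self):
        self._store = {}  # key -> (owner_token, expiry_timestamp)

    def set_nx_px(self, key, token, ttl_ms):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return False  # lock is held and has not yet expired
        self._store[key] = (token, time.time() + ttl_ms / 1000.0)
        return True

    def release(self, key, token):
        entry = self._store.get(key)
        if entry is not None and entry[0] == token:  # only the owner releases
            del self._store[key]

def lock_name(source, params):
    """Deterministic lock name: same data request -> same lock on any node."""
    digest = hashlib.sha256(f"{source}:{params}".encode()).hexdigest()
    return f"lock:{source}:{digest}"

redis = FakeRedis()
key = lock_name("merra", "2022-04-06,temperature")
token = uuid.uuid4().hex  # unique per worker attempt
# TTL = estimated time to download, convert, and upload to S3.
acquired = redis.set_nx_px(key, token, ttl_ms=60_000)
blocked = redis.set_nx_px(key, uuid.uuid4().hex, ttl_ms=60_000)
print(acquired, blocked)  # True False -- the second worker must wait
redis.release(key, token)
```

The expiry check in `set_nx_px` is what implements consideration 3: if the owning worker exceeds its estimated time, the lock lapses and another worker may take over. The compare-and-delete in `release` prevents a slow worker from releasing a lock it no longer owns.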