System architecture

vogelsgesang edited this page Apr 9, 2014 · 10 revisions

Since most data will be retrieved from sources that are based on web technologies (XML, JSON, REST APIs, web scraping), our system will be based on web technologies as well. It is divided into a client and a server. The server is responsible for creating a consistent view of the data contained in the data sources and provides access to this data through a JSON API. Because of this API, our software can be reused and integrated into other systems. The client calls this API to retrieve the data and displays it to the end user in a more intuitive way.

Additional implementation details are provided on a separate wiki page.

The server

The server has four responsibilities:

  • Deliver the client's source code
  • Provide an API for modifying the meta data repository
  • Build a consistent view of the data and deliver this view through the API
  • Provide recommendations (no concrete plans so far; we may use random walks)

Delivering the client's source code is trivial: from the server's point of view, the client consists only of static files. The node-static library is used for serving them to the browser. The files are minified and pre-compressed in order to keep response times as low as possible.
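As a rough illustration of what serving pre-compressed files involves, the sketch below decides which on-disk variant to send for a request. It is plain JavaScript and does not use node-static itself. The function name `pickVariant`, the `.gz` suffix convention and the `exists` callback are assumptions made for this example, not part of our code base.

```javascript
// Hypothetical helper: decide which variant of a static file to send.
// Assumes the build step writes "<file>.gz" next to every minified file.
function pickVariant(path, acceptEncoding, exists) {
  // Split the Accept-Encoding header into encoding names.
  // Quality values (";q=...") are ignored in this sketch.
  const accepted = (acceptEncoding || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());

  const gzPath = path + ".gz";
  if (accepted.includes("gzip") && exists(gzPath)) {
    // Send the pre-compressed file and tell the browser how to decode it
    return {
      file: gzPath,
      headers: { "Content-Encoding": "gzip", "Vary": "Accept-Encoding" },
    };
  }
  // Fall back to the uncompressed (but still minified) file
  return { file: path, headers: { "Vary": "Accept-Encoding" } };
}
```

Because the compressed variant already exists on disk, the server only has to pick a file per request instead of compressing the response each time.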

The meta data is stored in a MongoDB instance. Access is provided through a simple REST API. More information on the format of the meta data can be found on the Meta data wiki page.

Consolidated data is constructed on the fly. For this purpose, all sources configured in the meta data repository are queried for relevant information. The returned information is filtered, and if it actually fits our initial query, we add it to the set of acknowledged information. Whenever new information is acknowledged, all other sources are checked again to see whether additional information can be found using the enlarged set of base information. This process is repeated until no source is able to contribute additional information.
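The loop described above is a fixed-point iteration. The following minimal sketch shows one way it could look, under simplifying assumptions: a source is modeled as a synchronous function that receives the facts acknowledged so far and returns candidate facts, and two facts count as equal if their JSON serializations match. The names `consolidate` and `isRelevant` are hypothetical.

```javascript
// Hypothetical sketch of the consolidation loop.
// sources:      array of functions (knownFacts) => candidateFacts
// initialFacts: facts taken from the initial query
// isRelevant:   filter deciding whether a candidate fits the query
function consolidate(sources, initialFacts, isRelevant) {
  // Facts are keyed by their JSON serialization (an assumption of this
  // sketch; it is sensitive to property order).
  const acknowledged = new Map(
    initialFacts.map((fact) => [JSON.stringify(fact), fact])
  );

  let changed = true;
  while (changed) {
    changed = false;
    const known = [...acknowledged.values()];
    for (const source of sources) {
      for (const candidate of source(known)) {
        const key = JSON.stringify(candidate);
        // Acknowledge only candidates that are new and pass the filter
        if (!acknowledged.has(key) && isRelevant(candidate, known)) {
          acknowledged.set(key, candidate);
          changed = true;
        }
      }
    }
  }
  // No source contributed anything new in the last round
  return [...acknowledged.values()];
}
```

The loop terminates as long as the sources can only ever produce a finite number of distinct facts, since every additional round requires at least one newly acknowledged fact.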

All sources are connected to our core through adapters (see Interface specification for source adapters). These modules are responsible for communicating with the relevant databases and web services. Every module delivers its data in a common format. This common format still contains the field names of the original source; the core of our data integration layer takes care of mapping and restructuring these fields.
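To make the split between adapters and core more concrete, here is a small sketch. It does not reproduce the actual adapter interface (which is defined in the interface specification); the record shape `{ source, fields }`, the adapter `libraryAdapter`, the mapping table `fieldMappings` and the function `mapRecord` are all invented for illustration.

```javascript
// Hypothetical adapter: wraps one source and returns records in a common
// envelope, still using the original field names of that source.
const libraryAdapter = {
  sourceId: "library",
  query(term) {
    // A real adapter would talk to a database or web service here
    const rows = [{ Titel: "Faust", Verfasser: "Goethe" }];
    return rows
      .filter((row) => row.Titel.includes(term))
      .map((row) => ({ source: "library", fields: row }));
  },
};

// Hypothetical mapping, as it might be configured in the meta data repository
const fieldMappings = {
  library: { Titel: "title", Verfasser: "author" },
};

// Core step: rename source-specific fields to the consolidated field names
function mapRecord(record, mappings) {
  const mapping = mappings[record.source] || {};
  const mapped = {};
  for (const [name, value] of Object.entries(record.fields)) {
    // Fields without a configured mapping are dropped in this sketch
    if (Object.prototype.hasOwnProperty.call(mapping, name)) {
      mapped[mapping[name]] = value;
    }
  }
  return mapped;
}
```

Keeping the original field names in the adapter output means that a changed mapping only affects the core, not the adapters.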

All sources are queried asynchronously, i.e. in parallel, so that the server does not sit idle while waiting for a single source and the consolidated data can be constructed faster. To further improve response time, consolidated data is cached using MemCache. In addition, the unaggregated data returned by each source is cached, so that the data does not have to be reloaded from all sources when the configuration of one source changes.
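The combination of parallel queries and per-source caching could be sketched as follows. This is only an illustration: an in-memory `Map` stands in for the real cache, and the adapter shape `{ sourceId, query(term) }` with a promise-returning `query` is an assumption of this example.

```javascript
// In-memory stand-in for the cache of unaggregated per-source results
const rawCache = new Map();

// Query every adapter in parallel, reusing cached raw results if present.
// `adapters` is a hypothetical array of
// { sourceId, query(term) -> Promise<Array> }.
async function queryAllSources(adapters, term) {
  const pending = adapters.map(async (adapter) => {
    const key = adapter.sourceId + ":" + term;
    if (!rawCache.has(key)) {
      // Store the unaggregated result separately for each source
      rawCache.set(key, await adapter.query(term));
    }
    return rawCache.get(key);
  });
  // All requests are already in flight before we wait for any of them
  const results = await Promise.all(pending);
  return results.flat();
}
```

Note that `Promise.all` rejects as soon as one source fails. If a failing source should not abort the whole request, `Promise.allSettled` would be an alternative.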

The client

The client is a single-page web app which provides a GUI for the JSON API of our server. It is served statically by our server, i.e. the server does not embed any information into the delivered HTML/JS files. The JS code of the client is responsible for querying the relevant information through the server's JSON API and for embedding the response into the GUI. It does not perform any data integration.
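A minimal sketch of this division of labor on the client side is shown below. The endpoint `/api/search`, the response shape (an array of `{ title, author }` objects) and all function names are assumptions for illustration; the real API may look different.

```javascript
// Escape text before inserting it into HTML
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a JSON response (assumed shape: array of { title, author }) into markup
function renderEntries(entries) {
  const items = entries.map(
    (e) => "<li>" + escapeHtml(e.title) + " (" + escapeHtml(e.author) + ")</li>"
  );
  return "<ul>" + items.join("") + "</ul>";
}

// Query the hypothetical endpoint and embed the response into the page.
// `fetchImpl` can be replaced for testing; it defaults to the browser's fetch.
async function showResults(container, term, fetchImpl = fetch) {
  const response = await fetchImpl("/api/search?q=" + encodeURIComponent(term));
  if (!response.ok) throw new Error("Request failed: " + response.status);
  container.innerHTML = renderEntries(await response.json());
}
```

In the browser this could be called as `showResults(document.getElementById("results"), "Faust")`, where the element id is again only an example.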