Multi-handler support #18

Closed
ibnesayeed opened this issue May 17, 2016 · 9 comments

Comments

@ibnesayeed
Contributor

ibnesayeed commented May 17, 2016

There are use cases where a single instance running multiple handlers would be useful. For example, a webmaster running several CMSs such as WordPress, Joomla, and MediaWiki alongside some custom applications may want to provide TimeGate access to some or all of those services. Currently, the webmaster needs to run a separate handler instance for each service. Additionally, all the example handlers in the package are something that anyone can run on their own servers (we don't expect these well-known services to run our code to provide native Memento support, though it would be awesome if they did). For instance, LANL's Memento aggregator uses all the example implementations as Memento proxies, and so does ODU's Memento aggregator.

In the current setup, each instance needs to run on a separate port (or in a separate container), and then a load balancer/reverse proxy needs to be set up to unify them under the same domain name and port while namespacing handlers based on the first path segment. I like the modular micro-instance approach, but when the architecture involves too much repetition, it is sometimes better to go monolithic and keep things easier.

I am proposing the multi-handler mode in which one can keep single entries for [server] and [cache] sections in the config.ini file while introducing subsections under the [handler] section to serve multiple proxies under the same instance like this:

[server]
host = http://localhost
strict_datetime = true
api_time_out = 6

[cache]
cache_activated = false
cache_refresh_time = 86400
cache_directory = cache
cache_max_values = 250

[handler][simple]
handler_class = timegate.examples.simple:ExampleHandler
use_timemap = true
is_vcs = true
base_uri = http://www.example.com/simple/

[handler][webcite]
handler_class = timegate.examples.webcite:WebCiteHandler
use_timemap = false
is_vcs = false
base_uri = http://www.example.com/webcite/

The code should check whether there are subsections under the [handler] section and, if so, honor the multi-handler mode; otherwise it should fall back to the existing single-instance mode. This way the new config style will not break backward compatibility. I would note that the ConfigParser package does not support subsections, but there are several alternatives available.

@jirikuncar
Contributor

I had a similar proposal in draft and I have already started working on it.

We can keep compatibility with handler sections defined in the following way.

[handler]
handler_class = simple
base_uri = http://www.example.com/simple/

[handler:webcite]
handler_class = timegate.examples.webcite:WebCiteHandler
use_timemap = false
is_vcs = false
base_uri = http://www.example.com/webcite/

@ibnesayeed
Contributor Author

@jirikuncar I liked your [handler:webcite] style name spacing idea. This will not require any special code for the backward compatibility and it will work with the existing config parser. All the logic to build the unified handlers config object will be custom and will reside in config.py file. Looking forward to seeing this implemented soon.
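
To illustrate how such sections could be collected with a stock INI parser, here is a minimal sketch. It is not the code from the actual implementation; the function name `collect_handlers` and the `"default"` key used for the legacy `[handler]` section are assumptions made for this example, and it uses Python 3's `configparser` rather than the Python 2 `ConfigParser` module mentioned above.

```python
import configparser

SAMPLE_INI = """
[server]
host = http://localhost

[handler]
handler_class = simple
base_uri = http://www.example.com/simple/

[handler:webcite]
handler_class = timegate.examples.webcite:WebCiteHandler
use_timemap = false
is_vcs = false
base_uri = http://www.example.com/webcite/
"""


def collect_handlers(parser, default_name="default"):
    """Return {name: options} for [handler] and every [handler:<name>] section.

    Hypothetical helper: both the function and `default_name` are
    illustrative and not part of the timegate code base.
    """
    handlers = {}
    for section in parser.sections():
        if section == "handler":
            # Legacy single-handler section, kept for backward compatibility.
            handlers[default_name] = dict(parser.items(section))
        elif section.startswith("handler:"):
            # Everything after the first colon is the handler's namespace.
            name = section.split(":", 1)[1].strip()
            handlers[name] = dict(parser.items(section))
    return handlers


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_INI)
handlers = collect_handlers(parser)
```

With the sample above, `handlers` ends up with one entry for the legacy section and one for `webcite`; all option values are still plain strings, so flags such as `use_timemap` would need to be converted separately.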

@ibnesayeed
Contributor Author

ibnesayeed commented May 17, 2016

Another approach I was thinking about is to embrace the convention-over-configuration philosophy: by default the system follows some conventions and infers configuration automatically, but it allows explicit configuration when needed. Let me explain what I mean in this context. Suppose we have a directory structure like this:

- timegate
    - conf
        - config.ini
    - handlers
        - arxiv
            - ArxivHandler.py
        - simple
            - ExampleHandler.py
        - website
            - WebCiteHandler.py

Then in the config.ini file, we can have one config option, autoload_handlers_path, that points to a directory where the various handlers are organized in directories with appropriate file names (in this example, timegate/handlers). Under that path, the directory names can serve as the namespace/entry point (perhaps even nested directories can be supported), and the Python file name, with correct casing, as the class name. If autoload_handlers_path is not set, the system can fall back to the configuration method described above. Additionally, if autoload_handlers_path is configured but explicit configurations are also present, the explicit ones should override the automatic ones. Some flags such as is_vcs are handler specific, so they can easily be moved inside the handler code itself: each handler implementation has an appropriate default value but honors the explicit value if one is present in the config file.
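
A rough sketch of the discovery and override steps might look like the following. Everything here is hypothetical: `discover_handlers` and `merge_handlers` are names invented for this example, the `namespace.Module:Class` string only mimics the `handler_class` format shown earlier, and the sketch handles one level of directories with one module each.

```python
import tempfile
from pathlib import Path


def discover_handlers(root):
    """Map each sub-directory name to the handler module file found in it."""
    discovered = {}
    for directory in sorted(Path(root).iterdir()):
        if not directory.is_dir():
            continue
        for module in sorted(directory.glob("*.py")):
            if module.name == "__init__.py":
                continue
            # Directory name -> namespace; file stem -> class name.
            discovered[directory.name] = {
                "handler_class": f"{directory.name}.{module.stem}:{module.stem}",
            }
            break  # This sketch takes only the first module per directory.
    return discovered


def merge_handlers(discovered, explicit):
    """Explicit configuration overrides whatever was inferred."""
    merged = {name: dict(options) for name, options in discovered.items()}
    for name, options in explicit.items():
        merged.setdefault(name, {}).update(options)
    return merged


# Build a throwaway directory tree resembling the layout above.
with tempfile.TemporaryDirectory() as tmp:
    for namespace, filename in [("arxiv", "ArxivHandler.py"),
                                ("simple", "ExampleHandler.py")]:
        (Path(tmp) / namespace).mkdir()
        (Path(tmp) / namespace / filename).touch()
    discovered = discover_handlers(tmp)

# An explicit option, as it might come from config.ini, wins over inference.
explicit = {"simple": {"is_vcs": "true"}}
merged = merge_handlers(discovered, explicit)
```

Actually importing the discovered classes (for instance with `importlib`) is left out, since it depends on how the handlers directory is placed on the Python path.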

jirikuncar added a commit to jirikuncar/timegate that referenced this issue May 18, 2016
* Adds configurable multi-handler support via `[handler:<name>]`
  sections in INI file or `HANDLER` key in config.  (closes mementoweb#18)

Signed-off-by: Jiri Kuncar <[email protected]>
@hariharshankar
Contributor

Sorry for the late response; Herbert, Shawn, and I had a chance to discuss this issue only today.

Thanks a lot for all your effort. Is there a use case where having this multi-handler setup will be useful, other than for aggregation purposes? We are worried if this is going to introduce unnecessary complexity for something that may not be useful for 99% of users.

We specifically designed this software to handle only one archive at a time for simplicity and easy adaptability. This software is intended to make existing versioning systems Memento compatible, and this requires only one handler to connect to the versioning system's API. Hence, this timegate is intended to handle one API at a time, which keeps the setup, documentation, and maintenance simple.

Our earlier version did have multi-handler support, and it made things unnecessarily complex. In that setup, when one archive/handler misbehaved, it would render all the other timegates unresponsive. So, that's another reason we made this timegate handle one archive only. We run an aggregation instance here at LANL with about 15 separate timegates without any issues. So, we are not going to go down the multi-handler path again. If you feel there is a specific use case that we may have overlooked, please let us know.

@jirikuncar
Contributor

> We are worried if this is going to introduce unnecessary complexity for something that may not be useful for 99% of users.

@hariharshankar can you please have a look at the implementation in #19? The configuration for multi handler support is completely optional.

> Is there a use case where having this multi-handler setup will be useful, other than for aggregation purposes?

We are planning to use it in @inveniosoftware for Memento support and it seems easier to integrate one WSGI middleware rather than "15 separate" ones. (cc @tiborsimko)
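
For readers unfamiliar with the idea, a single WSGI entry point that routes on the first path segment could be sketched as below. This is only an illustration under assumptions, not the middleware from #19: `make_dispatcher` and `make_stub` are hypothetical names, and the stub applications merely stand in for real per-handler TimeGate applications.

```python
def make_dispatcher(apps):
    """Return a WSGI app that routes on the first path segment."""

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        segment, sep, rest = path.lstrip("/").partition("/")
        app = apps.get(segment)
        if app is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Unknown handler\n"]
        # Move the matched segment from PATH_INFO to SCRIPT_NAME so the
        # selected app sees paths relative to its own mount point.
        environ = dict(environ)
        environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + segment
        environ["PATH_INFO"] = sep + rest
        return app(environ, start_response)

    return dispatcher


def make_stub(name):
    """Hypothetical stand-in for a per-handler TimeGate application."""

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"{name}:{environ['PATH_INFO']}".encode()]

    return app


dispatcher = make_dispatcher({
    "simple": make_stub("simple"),
    "webcite": make_stub("webcite"),
})
```

A request for `/webcite/timegate/x` would reach the `webcite` stub with `PATH_INFO` set to `/timegate/x`. Note that this sketch does nothing to isolate a misbehaving handler from the others, which is the concern raised earlier in the thread.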

@ibnesayeed
Contributor Author

@hariharshankar please don't consider this an argument war; I am just trying to offer my input to make this library better and more useful. I have no doubt that you have a better understanding of this code base and the goals it is designed to achieve. However, the first two paragraphs of the original post give enough motivation for why I made this request. A use case other than the aggregator proxy is described there, although I think aggregator proxies are also a significant use case of this library and should not be ignored.

I believe that the implementation in #19 is 100% backward compatible with the existing functionality and configuration while adding multi-handler support transparently. With about +105 -51 line changes (including other refinements), it is not a big maintenance overhead. The current single-handler implementation has no issues, but when multiple instances are to be run, one needs to run them on separate ports and introduce another layer, a load balancer/reverse proxy, to unify them as a single service, which might be a hassle for some.

I have a few questions, out of curiosity and for better understanding.

> We are worried if this is going to introduce unnecessary complexity for something that may not be useful for 99% of users.

Do we have any idea of how many people are using this library and in what scenarios? If the number of users is small (say < 10) then the percentage is not significant and a couple of aggregators would be enough to justify the overhead, but if the user base is significant or is expected to grow soon then we should perhaps understand their use cases and decide which direction the library should go.

> In this setup, when one archive/handler misbehaved, it would render all the other timegates unresponsive.

How exactly? Isn't it the case that each request is treated independently? I am just curious.

@ibnesayeed
Contributor Author

ibnesayeed commented May 20, 2016

An alternative approach would be to not implement multi-handler support in the core, but to create another repository that uses timegate as a module and implements multi-handler support there. I am not sure, though, how easy that would be, what changes would need to be made in the core to support it, and how much of the functionality and configuration could be reused.

@ibnesayeed
Contributor Author

Another point to note is that this feature does not affect in the slightest how a custom individual handler will be implemented, which is essentially what users of this library are expected to implement unless they want to run an existing example handler.

@jirikuncar
Contributor

I can remove the support for [handler:<name>] from config.py in #19 if the direct multi-handler support is not wanted. Then the other changes can be easily considered as code improvements/refactoring without direct impact to the library itself.
