[implemented in v.0.6.13] A possible way to add an analog of Express.js middleware to RESTinio #140

eao197 · 2020-12-12T13:16:16Z

eao197
Dec 12, 2020
Maintainer

The problem

RESTinio's express_router was added under the influence of Express.js router. But another important thing from Express.js, middleware, wasn't implemented in RESTinio. We didn't need such functionality because the RESTinio usage scenarios were rather simple at that time.

But time goes on, and RESTinio usage scenarios are becoming more and more complex. It seems to me that an analog of Express.js middleware would be a good addition to RESTinio.

For example, consider the following case: on an incoming request, we have to a) check the presence of some mandatory HTTP-fields and their values, b) perform authentication of a user, c) do the authorization of the user (e.g., check access rights for the URL), and only then do the actual request processing.

That case can be easily expressed using middlewares: one for checking the necessary HTTP-fields, one for the authentication, one for the authorization. But RESTinio lacks that feature at the moment.

I thought about adding such a feature, and it seems that there are several ways of adding it to RESTinio. I'll describe three of them here, and it would be great if someone shared opinions about those approaches.

Disclaimer

I haven't done any experiments to check whether the approaches described below can actually be implemented. But it seems that all of them are pretty doable even in C++14.

The general idea

The general idea is to define a chain of request-handlers. Something like:

class http_fields_checker {
  ...
public:
  some_return_type operator()(const restinio::request_handle_t & req) {...}
};

class user_authentificator {
  ...
public:
  some_return_type operator()(const restinio::request_handle_t & req) {...}
};

class user_authorization {
  ...
public:
  some_return_type operator()(const restinio::request_handle_t & req) {...}
};

class actual_request_processor {
  ...
public:
  some_return_type operator()(const restinio::request_handle_t & req) {...}
};

restinio::default_request_handler_t handlers_chain = restinio::make_handlers_chain(
    std::make_unique<http_fields_checker>(...),
    std::make_unique<user_authentificator>(...),
    std::make_unique<user_authorization>(...),
    std::make_unique<actual_request_processor>(...));

restinio::run( restinio::on_this_thread()
    .address(...).port(...).request_handler(std::move(handlers_chain))
  );

The type of the return value of request-handlers in a chain

The return value of a request-handler in a chain should indicate whether the next handler in the chain has to be called. That's not possible with the current request_handling_status_t, which is defined as follows:

enum class request_handling_status_t : std::uint8_t
{
  accepted, rejected
};

So there could be at least two solutions.

The first one is the addition of a new value to request_handling_status_t:

enum class request_handling_status_t : std::uint8_t
{
  accepted, rejected, try_next
};

A request-handler in a chain should return request_handling_status_t::try_next if the next handler in the chain has to be activated.

Another solution is to use a different type as the result of a request-handler:

struct try_next_t {};

using chain_handling_status_t = variant_t<try_next_t, request_handling_status_t>;

If a request-handler returns an instance of a new empty type try_next_t then the next handler in the chain should be activated. Otherwise (when request_handling_status_t is returned) the processing of the chain should be stopped.

Personally, I prefer the solution with the new try_next_t and chain_handling_status_t types (although the names of the new types may not be as good as I would like).
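To make the variant-based idea more concrete, here is a minimal sketch of how a chain could be driven by such a return type. It assumes C++17 (std::variant instead of variant_t), and fake_request_t, chain_link_t, run_chain, check_token and process are hypothetical names invented for this illustration; none of them is RESTinio code.

```cpp
#include <cstdint>
#include <functional>
#include <variant>
#include <vector>

// Stand-ins for the types discussed above; not RESTinio code.
enum class request_handling_status_t : std::uint8_t { accepted, rejected };
struct try_next_t {};
using chain_handling_status_t =
    std::variant<try_next_t, request_handling_status_t>;

// Hypothetical request type, used only to make the sketch compile.
struct fake_request_t { bool has_token = false; };

using chain_link_t =
    std::function<chain_handling_status_t(const fake_request_t &)>;

// Calls the links in order and stops at the first one that returns a
// final status. If every link asks for the next one, the request is
// treated as rejected (an assumption of this sketch).
inline request_handling_status_t run_chain(
    const std::vector<chain_link_t> & links,
    const fake_request_t & req)
{
    for (const auto & link : links) {
        const chain_handling_status_t result = link(req);
        if (const auto * final_status =
                std::get_if<request_handling_status_t>(&result))
            return *final_status;
        // Otherwise the link returned try_next_t: go to the next link.
    }
    return request_handling_status_t::rejected;
}

// An intermediate link: stops the chain or lets it continue.
inline chain_handling_status_t check_token(const fake_request_t & req) {
    if (!req.has_token)
        return request_handling_status_t::rejected;
    return try_next_t{};
}

// The last link: always produces a final status.
inline chain_handling_status_t process(const fake_request_t &) {
    return request_handling_status_t::accepted;
}
```

With links {check_token, process}, a request without a token is rejected by the first link and process is never called.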

Spreading request-related info from one chain link to another

The main problem with chained request-handlers is the spreading of some request-related info from previous handlers to the next handlers in the chain.

For example, http_fields_checker can extract some data from an HTTP-field, convert it into some internal form, and check it. Then that data has to be stored somewhere to be accessible in the further processing of the request. user_authentificator can detect an internal ID of the user and has to store that ID to be accessible by user_authorization and so on.

Such data sharing is not a problem for middleware in the original Express.js because of the dynamic nature of the JavaScript programming language. But we are in statically, if weakly, typed C++...

So I see several ways of organizing data transfer from one chain link to another.

The simplest solution: a map of std::any in request

The simplest way is to allow storing a map of named instances of std::any (or its analog for C++14) inside restinio::request_t. Something like:

class user_authentificator {
public:
  using user_id_t = uuid_t;
  static std::string user_id_key() { return "user_authentificator.user_id"; }

  restinio::chain_handling_status_t operator()(
    const restinio::request_handle_t & req) {
    ... // Do the authentication.
    user_id_t user_id = ...;

    // Store user_id inside request.
    req->user_data().store(user_id_key(), std::any{user_id});

    return restinio::try_next_t{};
  }
  ...
};

class user_authorization {
  ...
public:
  restinio::chain_handling_status_t operator()(
    const restinio::request_handle_t & req) {
    // Access the user_id that should be stored earlier.
    auto user_id = std::any_cast<user_authentificator::user_id_t>(
        req->user_data().access(user_authentificator::user_id_key()));

    ... // Handling the user_id.

    return restinio::try_next_t{};
  }
  ...
};
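A minimal sketch of such a storage may help to see where the run-time checks come from. It assumes C++17; user_data_storage_t and access_fails are hypothetical names, and store/access only mirror the calls used in the example above, they are not an existing RESTinio API.

```cpp
#include <any>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical per-request storage of named std::any values.
class user_data_storage_t {
    std::map<std::string, std::any> m_items;
public:
    void store(const std::string & key, std::any value) {
        m_items[key] = std::move(value);
    }

    // Throws std::out_of_range if the key is absent.
    const std::any & access(const std::string & key) const {
        return m_items.at(key);
    }
};

// Returns true if reading `key` as T fails. Both kinds of mistakes
// (wrong key, wrong type) are reported only at run-time.
template<typename T>
bool access_fails(
    const user_data_storage_t & storage,
    const std::string & key)
{
    try {
        (void)std::any_cast<T>(storage.access(key));
        return false;
    }
    catch (const std::bad_any_cast &) { return true; }  // wrong type
    catch (const std::out_of_range &) { return true; }  // wrong key
}
```

A misspelled key or a wrong type in std::any_cast compiles without any diagnostics and fails only when the request is actually processed.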

Pros

It's a very simple and understandable approach. I don't expect anyone to have a problem with it. A developer doesn't need a PhD to understand how to use it.

It doesn't require changing the format of a request-handler (even request_handler for express_router or easy_parser_router). So new functionality can easily be added to existing request_handlers without a big code refactoring.

Cons

This approach is fully dynamic. The compiler can't help us, and all errors (a wrong key or a wrong typecast via std::any_cast) will be detected only at run-time.

There is a performance penalty, especially if std::map/std::unordered_map with dynamic allocations is used as the storage and std::string as the key. Any memory allocation/deallocation during request processing can hurt performance.

We can try to avoid that performance hit, for example, by limiting the size of keys. Thus, we can use something like:

struct user_data_key_t {
  std::uint8_t m_length;
  std::array<char, 31> m_data;

  ... // Stuff for comparison and so on.
};

struct user_data_item_t {
  user_data_key_t m_key;
  std::any m_value;
};

class request_t {
  ...
  std::optional<std::vector<user_data_item_t>> m_user_data;
  ...
};

But that will make the use of user-data less convenient than I would like.
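The "stuff for comparison" can be sketched as follows, assuming C++17. The constructor, view(), operator== and find_item are hypothetical additions made for this illustration; rejecting keys that are too long (instead of truncating them) is also just an assumption.

```cpp
#include <algorithm>
#include <any>
#include <array>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

struct user_data_key_t {
    std::uint8_t m_length{0};
    std::array<char, 31> m_data{};

    // Keys longer than 31 characters are rejected, not truncated.
    explicit user_data_key_t(std::string_view name) {
        if (name.size() > m_data.size())
            throw std::length_error("user-data key is too long");
        m_length = static_cast<std::uint8_t>(name.size());
        std::copy(name.begin(), name.end(), m_data.begin());
    }

    std::string_view view() const noexcept {
        return {m_data.data(), m_length};
    }

    friend bool operator==(
        const user_data_key_t & a,
        const user_data_key_t & b) noexcept
    {
        return a.view() == b.view();
    }
};

struct user_data_item_t {
    user_data_key_t m_key;
    std::any m_value;
};

// Linear search over a small vector; the key itself never allocates.
// Returns nullptr if there is no item with that name.
inline const std::any * find_item(
    const std::vector<user_data_item_t> & items,
    std::string_view name)
{
    const user_data_key_t key{name};
    const auto it = std::find_if(items.begin(), items.end(),
        [&key](const user_data_item_t & item) {
            return item.m_key == key;
        });
    return it != items.end() ? &it->m_value : nullptr;
}
```

The inconvenience is visible right away: a descriptive name like "user_authentificator.user_id" still fits, but anything longer than 31 characters is refused at run-time.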

Functional-like solution: the result of the current chain link will be the input for the next link

This approach assumes that a chain link returns something like:

variant<try_next_t<Result>, request_handling_status_t>

where try_next_t<Result> contains an instance of some handler-related type Result. An rvalue reference to that instance will be passed as an additional argument to the next request-handler.

With this approach our sample chain can look like:

class http_fields_checker {
  ...
public:
  struct result_t { ... };

  restinio::chain_handling_status_t<result_t> operator()(
    const restinio::request_handle_t & req) {
    ...
    return restinio::try_next_t<result_t>{...};
  }
};

class user_authentificator {
  ...
public:
  struct result_t {
    http_fields_checker::result_t m_prev_result;
    user_id_t m_user_id;
  };

  restinio::chain_handling_status_t<result_t> operator()(
    const restinio::request_handle_t & req,
    http_fields_checker::result_t && incoming_user_data) {
    ...
    return restinio::try_next_t<result_t>{
        std::move(incoming_user_data),
        user_id
      };
  }
};

class user_authorization {
  ...
public:
  struct result_t {
    user_authentificator::result_t m_prev_result;
    ...
  };

  restinio::chain_handling_status_t<result_t> operator()(
    const restinio::request_handle_t & req,
    user_authentificator::result_t && incoming_user_data) {
    ...
    return restinio::try_next_t<result_t>{
        std::move(incoming_user_data),
        ...
      };
  }
};
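Below is a self-contained sketch of that data flow for a chain of exactly three links, assuming C++17. fake_request_t, fields_checker, authenticator, processor and run are hypothetical names, and the "authentication" is a placeholder; the point is only how the result of one link becomes the input of the next one.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <variant>

enum class request_handling_status_t : std::uint8_t { accepted, rejected };

template<typename Result>
struct try_next_t { Result m_result; };

template<typename Result>
using chain_handling_status_t =
    std::variant<try_next_t<Result>, request_handling_status_t>;

// Hypothetical request type for this sketch.
struct fake_request_t { std::string m_token; };

struct fields_checker {
    struct result_t { std::string m_token; };

    chain_handling_status_t<result_t> operator()(
        const fake_request_t & req) const {
        if (req.m_token.empty())
            return request_handling_status_t::rejected;
        return try_next_t<result_t>{result_t{req.m_token}};
    }
};

struct authenticator {
    struct result_t {
        fields_checker::result_t m_prev_result;
        int m_user_id;
    };

    chain_handling_status_t<result_t> operator()(
        const fake_request_t &,
        fields_checker::result_t && prev) const {
        // Placeholder logic: derive an ID from the token length.
        const int user_id = static_cast<int>(prev.m_token.size());
        return try_next_t<result_t>{result_t{std::move(prev), user_id}};
    }
};

struct processor {
    request_handling_status_t operator()(
        const fake_request_t &,
        authenticator::result_t && prev) const {
        return prev.m_user_id > 0
            ? request_handling_status_t::accepted
            : request_handling_status_t::rejected;
    }
};

// Hand-written glue for three links; a real make_handlers_chain
// would have to generate something like this for any number of links.
inline request_handling_status_t run(const fake_request_t & req) {
    auto r1 = fields_checker{}(req);
    if (const auto * s = std::get_if<request_handling_status_t>(&r1))
        return *s;
    auto r2 = authenticator{}(req, std::move(std::get<0>(r1).m_result));
    if (const auto * s = std::get_if<request_handling_status_t>(&r2))
        return *s;
    return processor{}(req, std::move(std::get<0>(r2).m_result));
}
```

If processor is changed to expect fields_checker::result_t while authenticator still precedes it, run() stops compiling. That is the compiler help this approach gives, and also the reason why reordering links requires touching their signatures.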

Pros

Here we have the full help of the compiler. Many mistakes will be detected at compile time.

This approach is expected to be much more efficient because the transition of user-data from handler to handler will be by value in most cases (and heavy data can be passed efficiently by using move semantics).

Cons

I'm afraid this approach won't be flexible enough and it will require a lot of refactoring if the order of request-handlers has to be changed.

This approach can be difficult for inexperienced developers.

"Preallocated tuple" for every chain invocation

The idea is: a user specifies types of the parts of the whole user-data for a request during the creation of a chain:

class http_fields_checker {
public:
  struct data_t {...};
  ...
};

class user_authentificator {
public:
  struct data_t {...};
  ...
};

class user_authorization {
public:
  struct data_t {...};
  ...
};

...
restinio::default_request_handler_t handlers_chain = restinio::make_handlers_chain<
  http_fields_checker::data_t,
  user_authentificator::data_t,
  user_authorization::data_t>(
    std::make_unique<http_fields_checker>(...),
    std::make_unique<user_authentificator>(...),
    std::make_unique<user_authorization>(...),
    std::make_unique<actual_request_processor>(...));

Then the chained handler creates an instance of std::tuple<http_fields_checker::data_t, user_authentificator::data_t, user_authorization::data_t> at the start of its work and passes a reference to that instance (or to a part of it) to every handler in the chain.

A request-handler receives an additional argument. That argument can have two forms.

The first form is for cases when the request-handler needs only a part of user-data. For example, http_fields_checker needs only a part of type http_fields_checker::data_t. So http_fields_checker can look like:

class http_fields_checker {
public:
  struct data_t {...};
  ...
  restinio::chain_handling_status_t operator()(
    restinio::user_data_fragment_t<data_t> & user_data,
    const restinio::request_handle_t & req) {...}
};

The user_authentificator can look similar:

class user_authentificator {
public:
  struct data_t {...};
  ...
  restinio::chain_handling_status_t operator()(
    restinio::user_data_fragment_t<data_t> & user_data,
    const restinio::request_handle_t & req) {...}
};

But user_authorization and actual_request_processor can require more than one part. In that case we have to use whole_user_data_t type:

class user_authorization {
public:
  struct data_t {...};
  ...
  template<typename User_Data>
  restinio::chain_handling_status_t operator()(
    restinio::whole_user_data_t<User_Data> & user_data,
    const restinio::request_handle_t & req) {
    auto user_id = std::get<user_authentificator::data_t>(user_data.get()).m_user_id;
    ...
  }
};

class actual_request_processor {
  ...
public:
  using user_data_t = std::tuple<
    http_fields_checker::data_t,
    user_authentificator::data_t,
    user_authorization::data_t>;

  restinio::chain_handling_status_t operator()(
    restinio::whole_user_data_t<user_data_t> & user_data,
    const restinio::request_handle_t & req) {
    auto user_id = std::get<user_authentificator::data_t>(user_data.get()).m_user_id;
    auto some_header_value = std::get<http_fields_checker::data_t>(user_data.get()).m_some_header;
    ...
  }
};
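Stripped of all RESTinio-related details, the mechanics can be sketched like this (C++17 assumed). The data types, check_fields, authenticate, process and run_chain_once are hypothetical names; the wrappers user_data_fragment_t and whole_user_data_t are replaced by plain references to keep the sketch short.

```cpp
#include <tuple>

struct fields_data_t { int m_content_length = 0; };
struct auth_data_t { int m_user_id = 0; };

// Links that see only their own fragment of the user-data.
inline void check_fields(fields_data_t & data) {
    data.m_content_length = 128;
}

inline void authenticate(auth_data_t & data) {
    data.m_user_id = 42;
}

// A link that needs the whole user-data. It is a template, so it
// doesn't depend on the exact set or order of the tuple's parts.
template<typename Whole>
int process(const Whole & whole) {
    return std::get<auth_data_t>(whole).m_user_id
        + std::get<fields_data_t>(whole).m_content_length;
}

// What a chained handler could do for every incoming request.
inline int run_chain_once() {
    // Created once, on the stack; no dynamic allocations.
    std::tuple<fields_data_t, auth_data_t> user_data;

    check_fields(std::get<fields_data_t>(user_data));
    authenticate(std::get<auth_data_t>(user_data));
    return process(user_data);
}
```

std::get by type works only if every part has a distinct type, which is the case when each handler defines its own data_t.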

Pros

This approach will have the smallest run-time overhead because the "preallocated tuple" instance will be created (on the stack) only once, and then only a reference to it will be passed to request-handlers.

Here we will have the full help of the compiler.

Here we will have a decoupling between the parts of the whole user-data. Thus, if http_fields_checker cares only about its own part, it doesn't need to know about the other parts of the actual user data. So we can easily mix different intermediate request-handlers and change their order.

Cons

This approach can be difficult for inexperienced developers.

This version can be sensitive to the order of handlers invocation. For example, user_authorization assumes that user_authentificator is already completed. But if we make a mistake and put user_authorization before user_authentificator in the chain then user_authorization will receive a reference to non-initialized data.

This flaw can be addressed somehow.

One way to do that is to hold tuple<optional<T1>, optional<T2>, ...> instead of tuple<T1, T2, ...>. In that case restinio::user_data_fragment_t<T1>::get can check the presence of a value of T1 and throw an exception.
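A sketch of that check, assuming C++17. This user_data_fragment_t is a hypothetical stand-in written for the illustration, and set(), fragment_of, auth_data_t and authz_data_t are invented names.

```cpp
#include <optional>
#include <stdexcept>
#include <tuple>
#include <utility>

// Hypothetical wrapper over one slot of
// tuple<optional<T1>, optional<T2>, ...>.
template<typename T>
class user_data_fragment_t {
    std::optional<T> & m_slot;
public:
    explicit user_data_fragment_t(std::optional<T> & slot)
        : m_slot{slot} {}

    // Used by the handler that produces the value.
    void set(T value) { m_slot = std::move(value); }

    // Used by later handlers. Fails loudly if the producer
    // hasn't been called yet.
    T & get() {
        if (!m_slot)
            throw std::logic_error(
                "user-data fragment is not initialized");
        return *m_slot;
    }
};

struct auth_data_t { int m_user_id; };
struct authz_data_t { bool m_allowed; };

using whole_data_t = std::tuple<
    std::optional<auth_data_t>,
    std::optional<authz_data_t>>;

// Makes a fragment for the slot that holds a value of type T.
template<typename T, typename Tuple>
user_data_fragment_t<T> fragment_of(Tuple & whole) {
    return user_data_fragment_t<T>{std::get<std::optional<T>>(whole)};
}
```

The wrong order of handlers is still detected only at run-time, but as an exception and not as a read of uninitialized data.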

Another way is to add new types like input_t<T> and output_t<T> to the signature of a request-handler, for example:

class user_authorization {
public:
  struct data_t {...};
  ...
  restinio::chain_handling_status_t operator()(
    restinio::user_data::input_t<user_authentificator::data_t> & in_user_id,
    restinio::user_data::output_t<data_t> & out_data,
    const restinio::request_handle_t & req) {
    auto user_id = in_user_id.get().m_user_id;
    ...
  }
};

But this will make the approach even more complex.

Support of user-data in express- and easy_parser_routers

If the "functional-like" or "preallocated tuple" approach is chosen, it will have to be married with express_router (and easy_parser_router). I think it's possible (and may not be that complex).

For example, the express_router_t is now defined as:

template<typename Regex_Engine = std_regex_engine_t>
class express_router_t
{
  ...
};

I think that definition can be changed this way:

struct no_user_data_t;

template<
  typename Regex_Engine = std_regex_engine_t,
  typename User_Data = no_user_data_t>
class express_router_t
{
  ...
};

And express_router_t will have two implementations. The first will be a specialization for a case when User_Data is no_user_data_t. In that case express_router_t will work as in the previous versions and a request-handler will have the format:

request_handling_status_t handler(
  const request_handle_t & req,
  const router::route_params_t & params);

The second implementation is for a case when User_Data isn't no_user_data_t. In that case a request-handler will have the format:

request_handling_status_t handler(
  whole_user_data_t<User_Data> & user_data,
  const request_handle_t & req,
  const router::route_params_t & params);

or something like that.
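The "two implementations" idea can be sketched with an explicit specialization (C++17 assumed). router_sketch_t, fake_request_t and hit_counter_t are hypothetical names; the regex engine and route parameters are left out, and handlers return bool only to keep the sketch short.

```cpp
#include <functional>
#include <string>
#include <utility>

struct no_user_data_t {};

// Hypothetical request type for this sketch.
struct fake_request_t { std::string m_path; };

// General case: handlers receive user-data as the first argument.
template<typename User_Data = no_user_data_t>
class router_sketch_t {
public:
    using handler_t =
        std::function<bool(User_Data &, const fake_request_t &)>;

    explicit router_sketch_t(handler_t handler)
        : m_handler{std::move(handler)} {}

    bool route(User_Data & user_data, const fake_request_t & req) const {
        return m_handler(user_data, req);
    }
private:
    handler_t m_handler;
};

// Specialization for no_user_data_t: the old handler format.
template<>
class router_sketch_t<no_user_data_t> {
public:
    using handler_t = std::function<bool(const fake_request_t &)>;

    explicit router_sketch_t(handler_t handler)
        : m_handler{std::move(handler)} {}

    bool route(const fake_request_t & req) const {
        return m_handler(req);
    }
private:
    handler_t m_handler;
};

// Example of user-data for the general case.
struct hit_counter_t { int m_hits = 0; };
```

Existing code that uses router_sketch_t<> gets the specialization and keeps compiling with old-style handlers.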

The solution I prefer most at this time

Unfortunately, there is no solution that I can mark as "the best one". Every one of the approaches described above has its strong and weak sides. But I prefer more efficient ones with strict compiler checking.

So I don't like the first one with the map of std::any.

Choosing between the other two, I probably prefer the last one, with the "preallocated tuple".

Feedback is encouraged

I will be glad to receive some feedback from those who use RESTinio or are interested in RESTinio.

Do you want to have chained request-handlers (aka middleware) in RESTinio? Maybe they are not necessary at all?

If you want to see that functionality, which of the approaches described above do you like most? Maybe dislike most?

Maybe you have seen similar functionality in other C++ web-frameworks and can point me to the things you like?

eao197 · 2020-12-14T12:24:06Z

eao197
Dec 14, 2020
Maintainer Author

I think there is another approach. Maybe it's better than the approaches described above. Let's call it a "user-data-factory".

The first part of that approach is the extension of restinio::request_t by a user's data.

Let's look at an example:

class http_fields_checker {
public:
  struct data_t {...};
  ...
};

class user_authentificator {
public:
  struct data_t {...};
  ...
};

class user_authorization {
public:
  struct data_t {...};
  ...
};

// The type to be used as the factory of user-data per request.
struct my_user_data_factory {
  // This name is required by RESTinio.
  using data_type = std::tuple<
      http_fields_checker::data_t,
      user_authentificator::data_t,
      user_authorization::data_t
    >;

  // This method will be called by RESTinio for every new request.
  void allocate_within(void * memory) {
    new(memory) data_type{};
  }
};

// The user-data factory should be specified in server traits.
struct my_traits : public restinio::default_traits_t {
  using user_data_factory_t = my_user_data_factory;
};

Classes restinio::default_traits_t and restinio::default_single_thread_traits_t will define user_data_factory_t as restinio::no_user_data_factory_t:

struct no_user_data_factory_t {
  struct data_type {};
};

The type restinio::no_user_data_factory_t will be specially handled by RESTinio.

The old type request_t becomes a template type:

template<typename User_Data>
class incoming_request_t {
public:
  ...
  User_Data & user_data() noexcept;
  const User_Data & user_data() const noexcept;
};

with the specialization for restinio::no_user_data_factory_t::data_type:

template<>
class incoming_request_t<no_user_data_factory_t::data_type> {
public:
  ...
  // Without `user_data` methods.
};

The old name restinio::request_t becomes an alias:

using request_t = incoming_request_t<no_user_data_factory_t::data_type>;

And restinio::request_handle_t becomes another alias:

template<typename UD>
using incoming_request_handle_t =
  std::shared_ptr<incoming_request_t<UD>>;

using request_handle_t =
  incoming_request_handle_t<no_user_data_factory_t::data_type>;

The second part of the "user-data-factory" approach is the format of a request-handler. It will have the following form:

request_handling_status_t handler(
  incoming_request_handle_t<typename traits::user_data_factory_t::data_type> req);

It means that if a user doesn't define his/her own user_data_factory_t in the server traits then the new format will be exactly the old one.
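That claim can be checked with a few lines of standalone C++17. The code below repeats the definitions from above in a simplified form; handle_for_t, my_factory and the trivial default_traits_t are hypothetical names used only for this check.

```cpp
#include <memory>
#include <tuple>
#include <type_traits>

struct no_user_data_factory_t {
    struct data_type {};
};

template<typename User_Data>
class incoming_request_t {
    User_Data m_user_data{};
public:
    User_Data & user_data() noexcept { return m_user_data; }
    const User_Data & user_data() const noexcept { return m_user_data; }
};

// Without `user_data` methods.
template<>
class incoming_request_t<no_user_data_factory_t::data_type> {};

using request_t = incoming_request_t<no_user_data_factory_t::data_type>;

template<typename UD>
using incoming_request_handle_t = std::shared_ptr<incoming_request_t<UD>>;

using request_handle_t =
    incoming_request_handle_t<no_user_data_factory_t::data_type>;

// Simplified traits: only the part that matters here.
struct default_traits_t {
    using user_data_factory_t = no_user_data_factory_t;
};

// The type of the request-handler's argument for given traits.
template<typename Traits>
using handle_for_t = incoming_request_handle_t<
    typename Traits::user_data_factory_t::data_type>;

struct my_factory {
    using data_type = std::tuple<int>;
};

struct my_traits : default_traits_t {
    using user_data_factory_t = my_factory;
};

// Default traits: the handler's argument is the old request_handle_t.
static_assert(
    std::is_same_v<handle_for_t<default_traits_t>, request_handle_t>);

// Custom factory: a different handle type with user_data() available.
static_assert(
    !std::is_same_v<handle_for_t<my_traits>, request_handle_t>);
```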

The enum request_handling_status_t will be expanded and a new value try_next will be added to it:

enum class request_handling_status_t : std::uint8_t
{
  accepted, rejected, try_next
};

The third part is the way of grouping request-handlers in a chain.

I think there will be at least two helpers. The first one for the case when the number of actual request-handlers is known at the compile-time:

struct my_traits : public restinio::default_traits_t {
  using user_data_factory_t = my_user_data_factory;

  using request_handler_t = restinio::fixed_size_chain_t<4>;
};
...
restinio::run(on_this_thread<my_traits>()
  ...
  .request_handler(my_traits::request_handler_t::make(
    std::make_unique<http_fields_checker>(...),
    std::make_unique<user_authentificator>(...),
    std::make_unique<user_authorization>(...),
    std::make_unique<actual_request_handler>(...)))
  ...);

The second one will be used when the number of handlers in the chain is known only at run-time:

struct my_traits : public restinio::default_traits_t {
  using user_data_factory_t = my_user_data_factory;

  using request_handler_t = restinio::dynamic_size_chain_t;
};
...
auto chain = my_traits::request_handler_t::make();
if(config.log_incoming_requests())
  chain->add(std::make_unique<request_logger>(...));
if(config.disable_head_requests())
  chain->add(std::make_unique<head_request_interceptor>(...));

chain->add(std::make_unique<http_fields_checker>(...));
chain->add(std::make_unique<user_authentificator>(...));
chain->add(std::make_unique<user_authorization>(...));
chain->add(std::make_unique<actual_request_handler>(...));

restinio::run(on_this_thread<my_traits>()
  ...
  .request_handler(std::move(chain))
  ...
);
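How such a run-time chain could dispatch a request is sketched below (C++17 assumed). dynamic_chain_sketch_t and fake_request_t are hypothetical, the two handlers are reduced to trivial logic, and rejecting a request that no handler has finished is an assumption of the sketch.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

enum class request_handling_status_t : std::uint8_t
{
  accepted, rejected, try_next
};

// Hypothetical request type for this sketch.
struct fake_request_t { bool m_is_head = false; };

class dynamic_chain_sketch_t {
    using link_t =
        std::function<request_handling_status_t(const fake_request_t &)>;
    std::vector<link_t> m_links;
public:
    template<typename Handler>
    void add(std::unique_ptr<Handler> handler) {
        // std::function requires a copyable callable, so the
        // ownership is converted to shared here.
        std::shared_ptr<Handler> shared{std::move(handler)};
        m_links.emplace_back(
            [shared](const fake_request_t & req) {
                return (*shared)(req);
            });
    }

    request_handling_status_t operator()(
        const fake_request_t & req) const {
        for (const auto & link : m_links) {
            const auto status = link(req);
            if (status != request_handling_status_t::try_next)
                return status;
        }
        // Nobody produced a final status.
        return request_handling_status_t::rejected;
    }
};

struct head_request_interceptor {
    request_handling_status_t operator()(const fake_request_t & req) {
        return req.m_is_head
            ? request_handling_status_t::rejected
            : request_handling_status_t::try_next;
    }
};

struct actual_request_handler {
    request_handling_status_t operator()(const fake_request_t &) {
        return request_handling_status_t::accepted;
    }
};
```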

Discussion

Pros

This approach is expected to be efficient enough because user-data can be held in a request object by value. Something like:

template<typename User_Data>
class incoming_request_t {
  ...
  alignas(User_Data) std::array<char, sizeof(User_Data)> m_user_data;
  ...
public:
  ...
  User_Data & user_data() noexcept {
    return *(reinterpret_cast<User_Data *>(m_user_data.data()));
  }
  ...
};

// Somewhere in RESTinio:
auto new_request = std::make_shared<incoming_request_t<User_Data>>(...);
m_user_data_factory->allocate_within(new_request->m_user_data.data());
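One detail is not shown above: an object created by placement new has to be destroyed explicitly. A more complete sketch of the storage, assuming C++17, could look like this. request_sketch_t is a hypothetical name, and the use of std::launder and std::destroy_at is my choice for the sketch, not a part of the proposal.

```cpp
#include <memory>
#include <new>
#include <string>
#include <tuple>

struct my_user_data_factory {
    using data_type = std::tuple<std::string, int>;

    // Constructs the user-data inside a buffer provided by the caller.
    void allocate_within(void * memory) {
        new(memory) data_type{};
    }
};

template<typename Factory>
class request_sketch_t {
    using data_t = typename Factory::data_type;

    // Raw storage inside the request object; no separate allocation.
    alignas(data_t) unsigned char m_storage[sizeof(data_t)];
public:
    explicit request_sketch_t(Factory & factory) {
        factory.allocate_within(m_storage);
    }

    // Copying raw storage would bypass data_t's copy constructor.
    request_sketch_t(const request_sketch_t &) = delete;
    request_sketch_t & operator=(const request_sketch_t &) = delete;

    // The object was created by placement new, so its destructor
    // has to be called manually.
    ~request_sketch_t() {
        std::destroy_at(&user_data());
    }

    data_t & user_data() noexcept {
        return *std::launder(reinterpret_cast<data_t *>(m_storage));
    }
};
```

The std::string inside the tuple is the reason why the destructor matters here: without the explicit call its memory could leak.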

This approach keeps compatibility with previous versions of RESTinio. Maybe somewhere in the user's code, some handling of request_handling_status_t::try_next has to be added.

This approach allows keeping the signature of request-handlers the same for cases when handler-chaining will be used and for cases where just one request-handler is necessary.

This approach allows writing intermediate request-handlers that care only for their own part of user data. For example, let's assume that we want to make user_authentificator a reusable handler. We can write it that way:

class user_authentificator {
  ...
public:
  struct data_t {...};

  template<typename User_Data>
  restinio::request_handling_status_t operator()(
    restinio::incoming_request_handle_t<User_Data> req)
  {
    auto & data = std::get<data_t>(req->user_data());
    ...
  }
};
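A standalone version of that handler shows that the same code works with different user-data types (C++17 assumed). request_sketch_t, handle_sketch_t, logger_data_t and the two tuple aliases are hypothetical, and the handler returns bool to keep the sketch short.

```cpp
#include <memory>
#include <tuple>

// Hypothetical replacement for incoming_request_t.
template<typename User_Data>
class request_sketch_t {
    User_Data m_user_data{};
public:
    User_Data & user_data() noexcept { return m_user_data; }
};

template<typename UD>
using handle_sketch_t = std::shared_ptr<request_sketch_t<UD>>;

class user_authentificator {
public:
    struct data_t { int m_user_id = 0; };

    // Works with any user-data tuple that contains data_t.
    template<typename User_Data>
    bool operator()(handle_sketch_t<User_Data> req) const {
        auto & data = std::get<data_t>(req->user_data());
        data.m_user_id = 42; // Placeholder for the real authentication.
        return true;
    }
};

// A part of user-data that belongs to some other handler.
struct logger_data_t { int m_lines = 0; };

// Two different applications reuse the same handler.
using small_data_t = std::tuple<user_authentificator::data_t>;
using big_data_t =
    std::tuple<logger_data_t, user_authentificator::data_t>;
```

If the user-data tuple doesn't contain user_authentificator::data_t, the call to std::get fails to compile.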

At the same time, we'll have strict control from the compiler.

The user-data is bundled with the request object, and that simplifies async processing of requests:

auto some_request_handler(
  restinio::incoming_request_handle_t<my_data> req) {
  // Update user-data.
  req->user_data().make_some_changes(...);

  // Delegating the request-processing to another thread.
  send_for_processing(target_thread, std::move(req));
}

In that case, all user-data will be transferred to the worker thread just inside the request object.

Cons

If user-data is used then the signatures of request-handlers will be bound to the type of user-data. It can make writing reusable code harder.

This approach can be difficult for inexperienced developers.


prince-chrismc · 2021-01-08T20:16:52Z

prince-chrismc
Jan 8, 2021

I did not see this discussion; I saw this new feature in the release notes, and today I took some time to try it out. I made a quick POC ripping out my auth handling route wrapper 🤢.

All in all it looks promising. It absolutely will help cut down on code duplication. It solves a very common web pattern too!

Little pain with all the indirection using the growable chain and many routers... but the new example helped a ton!

I'll try to report back with my final thoughts... the POC was successful, so I'll try to do a full implementation based on this work.

❤️ Thank you for now 😄

