This document gives you a bird-eye view of the architecture of Pavex.
This is an ideal starting point if you want to contribute or gain a deeper understanding of its inner workings.
A Pavex project goes through three stages in order to generate runnable application code:
- Define your request handlers, constructors and error handlers;
- Combine them together in a
Blueprint
, a representation of the desired API behaviour; - Generate the API server SDK via
pavex_cli
, using theBlueprint
as input.
In a diagram:
flowchart TB
app_rh[Request handlers] --> app_b
app_eh[Error handlers] --> app_b
app_c[Constructors] --> app_b
app_b[Blueprint] -->|Used by pavex_cli to generate| api_sdk[API server SDK]
api_sdk -->|Consumed by| api_binary[API binary]
api_sdk -->|Consumed by| tests[Black-box tests]
To accomplish these tasks, users interact with two crates:
pavex
, containing both the machinery to define aBlueprint
and the "typical" web framework utilities (request, response, extractors, etc.);pavex_cli
, the transpiler.
You can put most of the machinery in the pavex
crate in the same bucket of axum
or actix-web
:
the types and abstractions that are needed at runtime to handle incoming requests.
You will see pavex
in two contexts:
- in the signature and implementations of route handlers and type constructors, written by application developers;
- in the source code generated by
pavex_cli
.
use pavex::response::Response
// A request handler, returning a response as output.
// The response type is defined in the `pavex` crate.
pub fn stream_file(
inner: std::path::PathBuf,
http_client: reqwest::Client,
) -> Response { /* */ }
pavex::blueprint
is the module in the pavex
crate containing the interface used to craft a Blueprint
—a specification of
how the application is supposed to behave at runtime.
use pavex::blueprint::{Blueprint, Lifecycle};
use pavex::f;
/// The blueprint for our application.
/// It lists all its routes and provides constructors for all the types
/// that will be needed to invoke `stream_file`, our request handler.
///
/// This will be turned into a ready-to-run web server by `pavex_cli`.
pub fn blueprint() -> Blueprint {
Blueprint::new()
.constructor(f!(crate::load_configuration), Lifecycle::Singleton)
.constructor(f!(crate::http_client), Lifecycle::Singleton)
.constructor(f!(crate::extract_path), Lifecycle::RequestScoped)
.constructor(f!(crate::logger), Lifecycle::Transient)
.route(GET, "/home", f!(crate::stream_file))
}
A Blueprint
captures two types of information:
- route handlers (e.g. use
my_handler
for all incoming/home
requests); - type constructors (e.g. use
my_constructor
every time you need to build an instance of typeMyType
).
For each type constructor, the developer must specify the lifecycle of its output type:
- singleton - an instance is built once before, the application starts, and re-used for all incoming requests;
- request-scoped - a new instance is built for every incoming request and re-used throughout the handling of that specific request;
- transient - a new instance is built every time the type is needed, potentially multiple times for each incoming request.
All this information is encoded into the Blueprint
and passed as input to pavex_cli
to generate the API server SDK's
source code.
pavex_cli
is our transpiler, the component in charge of transforming a Blueprint
into a ready-to-run web
server.
It is packaged as a binary, a thin wrapper over the (internal) pavexc
crate.
The transpiler is where most of the complexity lives.
It must generate:
- a struct representing the application state;
- a function to build an instance of the application state, ahead of launching the web server;
- a function to build the HTTP router;
- a dispatch function (built on top of the HTTP router) to dispatch incoming requests to the correct handlers;
- for each route, a function that takes as input the server state and the incoming request while returning an HTTP response as output.
What is pavex_cli
getting as input?
Something that looks like this:
(
constructors: [
(
registered_at: "app",
import_path: "crate :: http_client",
),
(
registered_at: "app",
import_path: "crate :: extract_path",
),
(
registered_at: "app",
import_path: "crate :: logger",
),
],
handlers: [
(
registered_at: "app",
import_path: "crate :: stream_file",
),
],
component_lifecycles: {
(
registered_at: "app",
import_path: "crate :: http_client",
): Singleton,
(
registered_at: "app",
import_path: "crate :: extract_path",
): RequestScoped,
(
registered_at: "app",
import_path: "crate :: logger",
): Transient,
},
router: {
"/home": (
registered_at: "app",
import_path: "crate :: stream_file",
),
},
handler_locations: { /* */ },
constructor_locations: { /* */ }
)
We have the raw path of the functions and methods registered by the developer. We need to turn this into working source code!
To make this happen, we need to turn those strings into structured metadata.
For each of those functions and methods, we want to know:
- their input parameters;
- their output type.
But Rust does not have reflection, nor at compile-time nor at runtime!
Luckily enough, there is a feature currently baking in nightly
that, if you squint hard enough, looks like
reflection: rustdoc
's JSON output.
Using
cargo +nightly rustdoc -p library_name --lib -- -Zunstable-options -wjson
You can get a structured representation of all the types in library_name
.
This is what Pavex does: for each registered route handler and constructor, it builds the documentation for the crate
it belongs to and extracts the relevant bits of information from rustdoc
's output.
If you are going through the source code, this is the process that converts a RawCallableIdentifiers
into a Callable
, with ResolvedPath
as an intermediate step.
Callable
looks like this:
struct Callable {
pub output_fq_path: ResolvedType,
pub callable_fq_path: ResolvedPath,
pub inputs: Vec<ResolvedType>,
}
pub struct ResolvedType {
pub package_id: PackageId,
pub base_type: Vec<String>,
pub generic_arguments: Vec<ResolvedType>,
}
After this phase, we have a collection of Callable
instances representing our constructors and handlers.
It's a puzzle that we need to solve, starting from the handlers: how do we build instances of the types that they take
as inputs?
The framework machinery, as we discussed before, provides the request processing pipeline with two types out of the box:
the incoming request and the application state.
The constructors registered by the developer can then be used to transform those types and/or extract information
out of them.
For each handler, we try to build a dependency graph: we go through the input types of the request handler function and check if we have a corresponding constructor that returns an instance of that type; if we do, we then recursively look at the constructor signature to find out what types the constructor needs as inputs; we recurse further, until we have everything mapped out as a graph with graph edges used to keep track of the "is needed to build" relationship.
To put in an image, we want to build something like this for each route:
flowchart TB
handler["app::stream_file(std::path::Pathbuf, app::Logger, reqwest::Client)"]
client[reqwest::Client]
logger[app::Logger]
config[app::Config]
path[std::path::PathBuf]
request[pavex::request::RequestHead]
config --> client
client --> handler
logger --> handler
path --> handler
request --> path
This information is encoded in the CallableDependencyGraph
struct.
At this point, we are only looking at types and signatures: we are not taking into account the lifecycle of those
types.
E.g. is reqwest::Client
a singleton that needs to be built once and reused? Or a transient type, that must be build
from scratch every time it is needed?
By taking into account these additional pieces of information, we build a HandlerCallGraph
for each handler function,
starting from its respective CallableDependencyGraph
. It looks somewhat like this:
flowchart TB
handler["app::stream_file(std::path::Pathbuf, app::Logger, reqwest::Client)"]
client[reqwest::Client]
logger[app::Logger]
state[ServerState]
path[std::path::PathBuf]
request[pavex::request::RequestHead]
state --> client
client --> handler
logger --> handler
path --> handler
request --> path
You can spot how reqwest::Client
is now fetched from app::ServerState
instead of being built from scratch
from app::Config
.
Armed with this representation, Pavex can now generate the source code for the application library crate.
Using the same example, assuming the application has a single route, we get the following code:
use pavex::routing::Router;
use pavex::hyper::server::{Builder, conn::AddrIncoming};
use pavex::request::RequestHead;
use pavex::response::Response;
struct ServerState {
router: Router<u32>,
application_state: ApplicationState,
}
pub struct ApplicationState {
s0: app::HttpClient,
}
/// The entrypoint to build the application state, a pre-requisite to launching the web server.
pub fn build_application_state(v0: app::Config) -> crate::ApplicationState {
// [...]
}
/// The entrypoint to launch the web server.
pub async fn run(
server_builder: Builder<AddrIncoming>,
application_state: ApplicationState,
) -> Result<(), anyhow::Error> {
// [...]
}
fn route_request(
request: &RequestHead,
server_state: std::sync::Arc<ServerState>,
) -> Response {
let route_id = server_state
.router
.at(request.uri().path())
.expect("Failed to match incoming request path");
match route_id.value {
0u32 => route_handler_0(server_state.application_state.s0.clone(), request),
_ => panic!("This is a bug, no route registered for a route id"),
}
}
pub fn route_handler_0(
v0: app::HttpClient,
v1: RequestHead,
) -> Response {
let v2 = app::extract_path(v1);
let v3 = app::logger();
app::stream_file(v2, v3, v0)
}
This section focuses on issues, limitations and risks that sit outside the Pavex project itself: obstacles that we cannot remove on our own, but require coordination/collaboration with other projects.
Each risk is classified over two dimensions: impact and resolution likelihood.
For impact, we use the following emojis:
- 😭, severe impact on the developer experience/viability of the project;
- 😢, medium impact on the developer experience/viability of the project.
For resolution likelihood, we use the following emojis:
- 🔴, unlikely to be remediated on a medium time-horizon (>6 months, <2 years);
- 🟡, likely to be remediated on a medium time-horizon.
We do not care about the short term since Pavex itself still requires tons of work to be viable and it's unlikely to be ready for prime time in less than 6 months.
rustdoc
's JSON output requires the nightly
compiler.
This is not a showstopper for production usage of Pavex since nightly
is never used to compile
any code that is actually run at runtime, it is only used by the "reflection engine". Nonetheless, nightly
can cause
breakage and unnecessary disruption due to its instability. rustdoc
's JSON output itself is quickly evolving,
including breaking changes that we must keep up with.
Plan:
- Sit and wait.
rustdoc
's JSON output is likely to be stabilised, therefore we will be able to dropnightly
not too far into the future.
Generating the JSON representation of rustdoc
's output takes time, especially if we need to generate it for several
crates in the dependency tree.
Current remediations:
rustdoc
's JSON output for third-party dependencies is highly cacheable given the dependency version and the set of activated features.pavex_cli
uses SQLite to implement a per-user centralized cache of the JSON docs for all third-party crates that have been needed at least once.
Future avenues:
- The idea of hosting the JSON version of a crate's docs has been floated around. This would allow us to download the rendered JSON instead of having to build it every time from scratch.
Due to cargo
's very coarse locking scheme, it is not possible to invoke cargo
itself from a build.rs
script (
see tracking issue).
Pavex relies on cargo
commands to:
- build
rustdoc
's JSON output for local and third-party crates; - analyze the dependency tree (via
guppy
which in turn relies oncargo metadata
); - find the workspace root (via
guppy
which in turn relies oncargo metadata
).
There seems to be no active effort to remove this limitation.
Current remediations:
Pavex will rely on cargo-px
for code generation.