
Allow LAVA jobs to pass artifacts back to the runner #17

Status: Open; wants to merge 1 commit into base `main`.
Conversation

@eds-collabora (Contributor) commented Jan 22, 2023

We'd like to potentially add artifacts created by LAVA jobs to the archive stored by Gitlab. To achieve this, we run a cut-down web server on the lava-gitlab-runner that is able to respond to POST requests at

   http://<runner ip:ephemeral port>/<key>/artifacts/

The LAVA job is able to know the upload URL because we introduce a new namespace for templated variables, `runner`, in the lava-gitlab-runner, and add `ARTIFACT_UPLOAD_URL` to it.

It's still relatively complicated to get variables into LAVA tests; the pattern I used in my test repository

https://gitlab.collabora.com/eds/callback-tests

is to create a parameter called CALLBACK_URL in the test itself, and then in the job we can use a stanza like:

```yaml
   test: foo
     parameters:
       CALLBACK_URL: {{ '{{ runner.ARTIFACT_UPLOAD_URL }}' }}
```

to make it available to the test. Bear in mind, if you use this, that LAVA does not automatically export parameters as environment variables, so you will need to export it inside your `steps:` in your test if you want to use it in scripts.
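Putting the pieces together, a minimal LAVA test definition using this pattern might look like the following sketch. Everything here beyond `CALLBACK_URL` and the explicit `export` is an assumption for illustration (file names, the trailing upload path, and the scripts are hypothetical, not taken from the PR):

```yaml
# Hypothetical sketch of a LAVA test definition using this pattern.
metadata:
  format: Lava-Test Test Definition 1.0
  name: artifact-upload-example
params:
  # Overridden from the job via the runner.ARTIFACT_UPLOAD_URL template.
  CALLBACK_URL: ""
run:
  steps:
    # Parameters are visible as shell variables here but are not exported,
    # so export explicitly before calling any script that needs the URL.
    - export CALLBACK_URL
    - ./run-tests.sh
    # POST a file back to the runner's artifact endpoint (exact path
    # layout after the base URL is an assumption).
    - curl -X POST --data-binary @results.txt "$CALLBACK_URL"/results.txt
```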

The `key` part of the upload URL is a long random string. It's generated uniquely per Gitlab job, not per LAVA job, although this detail is not important unless you are performing multi-node tests.

Because of the dynamic nature of the key, and the runner's port and IP, artifact upload is only possible for `submit` jobs. For `monitor` jobs, there's simply no way to communicate the necessary URL to them.

Backing the web server, there is a shared `UploadServer` that stores the uploaded artifacts and bridges between the web server thread and the job thread. It stores a `JobArtifacts` for each active job, which the `ArtifactStore` can query when we come to upload files. I've elected to put the uploaded artifacts at:

`<job_id>_artifacts/`

in the archive, to match the other uploads which also lead with the job ID. Note however that it will require some significant reworking to support distinct directories for multi-node jobs. That's because we do not know how many nodes there are in the job until after we submit, at which point it's too late to create new keys for the other jobs. We could speculatively create a surplus, for example, but we couldn't then tie them to job IDs anyway.
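The shared-store design described above can be sketched roughly as follows. This is a hypothetical, simplified model, not the PR's actual code (the type and method names are assumptions): a map from each job's random key to its uploaded files, behind an `Arc<Mutex<..>>` so the web server thread and the job thread can share it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Assumed shape: the files one job has uploaded, keyed by relative path.
#[derive(Default)]
struct JobArtifacts {
    files: HashMap<String, Vec<u8>>,
}

// Assumed shape: shared between the web server thread (writes) and the
// job thread (reads), keyed by the per-job random string.
#[derive(Default, Clone)]
struct UploadServer {
    jobs: Arc<Mutex<HashMap<String, JobArtifacts>>>,
}

impl UploadServer {
    // Called when a Gitlab job starts and its key is generated.
    fn register(&self, key: &str) {
        self.jobs
            .lock()
            .unwrap()
            .insert(key.to_string(), JobArtifacts::default());
    }

    // Called from the web server on POST /<key>/artifacts/<path>.
    // An unknown key is rejected, so stale or guessed URLs do nothing.
    fn store(&self, key: &str, path: &str, body: Vec<u8>) -> bool {
        match self.jobs.lock().unwrap().get_mut(key) {
            Some(job) => {
                job.files.insert(path.to_string(), body);
                true
            }
            None => false,
        }
    }

    // Called from the job thread when the job finishes; removing the
    // entry also invalidates the key.
    fn take(&self, key: &str) -> Option<JobArtifacts> {
        self.jobs.lock().unwrap().remove(key)
    }
}

fn main() {
    let server = UploadServer::default();
    server.register("s3cret");
    assert!(server.store("s3cret", "report.txt", b"ok".to_vec()));
    assert!(!server.store("unknown", "x", vec![])); // bad key refused
    let job = server.take("s3cret").unwrap();
    println!("artifacts: {}", job.files.len());
}
```

The per-key lookup is what ties an anonymous POST back to a specific Gitlab job without any authentication beyond possession of the key.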

For the generated upload URL, note that you can use the environment variables `LAVA_GITLAB_RUNNER_ROUTABLE_HOST` and `LAVA_GITLAB_RUNNER_ROUTABLE_PORT` to specify an arbitrary external address, for example for an appropriate reverse proxy. Also, the upload URL will always be `https`, even though the service itself does not have TLS support.
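Since the upload URL is always emitted as `https` while the service itself speaks plain HTTP, a TLS-terminating reverse proxy in front of the runner is the natural deployment. The variables might then be set like this (the host and port values are purely illustrative):

```shell
# Hypothetical sketch: advertise the reverse proxy's externally routable
# address so it is embedded in the generated ARTIFACT_UPLOAD_URL, while
# the proxy forwards decrypted traffic to the runner's ephemeral port.
export LAVA_GITLAB_RUNNER_ROUTABLE_HOST=runner.example.com
export LAVA_GITLAB_RUNNER_ROUTABLE_PORT=443
```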

Signed-off-by: Ed Smith [email protected]

```rust
))
.expect("failed to bind listener");

let routable_host = match env::var("LAVA_GITLAB_RUNNER_ROUTABLE_HOST") {
```

Review comment (Contributor): This should really just be one argument/environment variable for the base URL, rather than specifying a separate hostname and port, so one can also do `https://my-upload-host:2456/badger/snakes` if this endpoint is not mounted at the root.

```rust
LavaUploadableFileType::Log { id } => format!("{}_log.yaml", id).into(),
LavaUploadableFileType::Junit { id } => format!("{}_junit.xml", id).into(),
LavaUploadableFileType::Artifact { id, path } => {
    format!("{}_artifacts/{}", id, path).into()
}
};
```

Review comment (Contributor): I'd just make that `path`, as artifacts can only come out of `submit` jobs; the `<job_id>_bla.*` naming mostly came about because those files also come from `monitor` jobs.

```rust
let listener = std::net::TcpListener::bind(std::net::SocketAddr::new(
```

Review comment (Contributor): Why bother with a TCP listener rather than just asking `axum::Server` to bind itself?

```rust
.to_string();

warn!(
    "No LAVA_GITLAB_RUNNER_ROUTABLE_HOST set, using best guess of local IP {}.",
```

Review comment (Contributor): If there is no base URL set, then simply don't enable the listen functionality; `local_ip()` is of quite questionable value.

@@ -921,6 +1005,10 @@ async fn new_job(job: Job) -> Result<impl CancellableJobHandler<LavaUploadableFi

```rust
Ok(Run::new(lava, url, job, cancel_behaviour))
}

async fn upload_artifact(Path((job, path)): Path<(String, String)>, body: Bytes) {
```

Review comment (Contributor): This really needs to be a chunked/streaming transfer, as LAVA jobs might produce quite big files, not stuff we want to store in memory per se.

```rust
// Wipe any dead jobs as the new one starts. It's not hugely
// important when this happens, so long as it happens
// periodically.
self.cleanup();
```

Review comment (Contributor): Why is a periodic cleanup required? Or rather, can't the registration be dropped whenever the relevant jobs get dropped? Also, I'm not a big fan of scattering `Arc<Mutex<T>>` around, as it tends to make things harder to use from the user's point of view :)

@@ -844,6 +927,7 @@ type LavaMap = Arc<Mutex<BTreeMap<(String, String), Arc<ThrottledLava>>>>;

```rust
lazy_static! {
    static ref LAVA_MAP: LavaMap = Arc::new(Mutex::new(BTreeMap::new()));
    static ref MAX_CONCURRENT_REQUESTS: Arc<Mutex<usize>> = Arc::new(Mutex::new(20));
    static ref UPLOAD_SERVER: Arc<Mutex<UploadServer>> = Default::default();
```

Review comment (Contributor): Rather than having a global static variable, why not pass a reference to the `Run` struct? That avoids the global variable and potentially a lot of the `lock().unwrap()` dances (though that could also be sorted out with interior mutability). Tbh it would probably be nicer if the `Run`s each simply had a copy, or a registration object, from the factory.
