
Allow LAVA jobs to pass artifacts back to the runner #17

Status: Open; wants to merge 1 commit into base `main`.
Conversation

@eds-collabora (Contributor) commented Jan 22, 2023

We'd like to potentially add artifacts created by LAVA jobs to the archive stored by Gitlab. To achieve this, we run a cut-down web server on the lava-gitlab-runner that is able to respond to POST requests at

   http://<runner ip:ephemeral port>/<key>/artifacts/

The LAVA job is able to know the upload URL because we introduce a new namespace for templated variables, `runner`, in the lava-gitlab-runner, and add `ARTIFACT_UPLOAD_URL` to it.

It's still relatively complicated to get variables into LAVA tests; the pattern I used in my test repository

https://gitlab.collabora.com/eds/callback-tests

is to create a parameter called CALLBACK_URL in the test itself, and then in the job we can use a stanza like:

```yaml
   test: foo
     parameters:
       CALLBACK_URL: {{ '{{ runner.ARTIFACT_UPLOAD_URL }}' }}
```

to make it available to the test. Bear in mind, if you use this, that LAVA does not automatically export parameters as environment variables, so you will need to export it inside your `steps:` in your test if you want to use it in scripts.
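Putting the pieces together, a minimal LAVA test definition using this pattern might look like the following sketch. Everything here beyond `CALLBACK_URL` and the explicit `export` is an assumption for illustration (file names, the trailing upload path, and the scripts are hypothetical, not taken from the PR):

```yaml
# Hypothetical sketch of a LAVA test definition using this pattern.
metadata:
  format: Lava-Test Test Definition 1.0
  name: artifact-upload-example
params:
  # Overridden from the job via the runner.ARTIFACT_UPLOAD_URL template.
  CALLBACK_URL: ""
run:
  steps:
    # Parameters are visible as shell variables here but are not exported,
    # so export explicitly before calling any script that needs the URL.
    - export CALLBACK_URL
    - ./run-tests.sh
    # POST a file back to the runner's artifact endpoint (exact path
    # layout after the base URL is an assumption).
    - curl -X POST --data-binary @results.txt "$CALLBACK_URL"/results.txt
```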

The `key` part of the upload URL is a long random string. It's generated uniquely per Gitlab job, not per LAVA job, although this detail is not important unless you are performing multi-node tests.

Because of the dynamic nature of the key, and the runner's port and IP, artifact upload is only possible for `submit` jobs. For `monitor` jobs, there's simply no way to communicate the necessary URL to them.

Backing the web server, there is a shared `UploadServer` that stores the uploaded artifacts and bridges between the web server thread and the job thread. It stores a `JobArtifacts` for each active job, which the `ArtifactStore` can query when we come to upload files. I've elected to put the uploaded artifacts at:

`<job_id>_artifacts/`

in the archive, to match the other uploads which also lead with the job ID. Note however that it will require some significant reworking to support distinct directories for multi-node jobs. That's because we do not know how many nodes there are in the job until after we submit, at which point it's too late to create new keys for the other jobs. We could speculatively create a surplus, for example, but we couldn't then tie them to job IDs anyway.
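The shared-store design described above can be sketched roughly as follows. This is a hypothetical, simplified model, not the PR's actual code (the type and method names are assumptions): a map from each job's random key to its uploaded files, behind an `Arc<Mutex<..>>` so the web server thread and the job thread can share it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Assumed shape: the files one job has uploaded, keyed by relative path.
#[derive(Default)]
struct JobArtifacts {
    files: HashMap<String, Vec<u8>>,
}

// Assumed shape: shared between the web server thread (writes) and the
// job thread (reads), keyed by the per-job random string.
#[derive(Default, Clone)]
struct UploadServer {
    jobs: Arc<Mutex<HashMap<String, JobArtifacts>>>,
}

impl UploadServer {
    // Called when a Gitlab job starts and its key is generated.
    fn register(&self, key: &str) {
        self.jobs
            .lock()
            .unwrap()
            .insert(key.to_string(), JobArtifacts::default());
    }

    // Called from the web server on POST /<key>/artifacts/<path>.
    // An unknown key is rejected, so stale or guessed URLs do nothing.
    fn store(&self, key: &str, path: &str, body: Vec<u8>) -> bool {
        match self.jobs.lock().unwrap().get_mut(key) {
            Some(job) => {
                job.files.insert(path.to_string(), body);
                true
            }
            None => false,
        }
    }

    // Called from the job thread when the job finishes; removing the
    // entry also invalidates the key.
    fn take(&self, key: &str) -> Option<JobArtifacts> {
        self.jobs.lock().unwrap().remove(key)
    }
}

fn main() {
    let server = UploadServer::default();
    server.register("s3cret");
    assert!(server.store("s3cret", "report.txt", b"ok".to_vec()));
    assert!(!server.store("unknown", "x", vec![])); // bad key refused
    let job = server.take("s3cret").unwrap();
    println!("artifacts: {}", job.files.len());
}
```

The per-key lookup is what ties an anonymous POST back to a specific Gitlab job without any authentication beyond possession of the key.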

For the generated upload URL, note that you can use the environment variables `LAVA_GITLAB_RUNNER_ROUTABLE_HOST` and `LAVA_GITLAB_RUNNER_ROUTABLE_PORT` to specify an arbitrary external address, for example for an appropriate reverse proxy. Also, the upload URL will always be `https`, even though the service itself does not have TLS support.
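Since the upload URL is always emitted as `https` while the service itself speaks plain HTTP, a TLS-terminating reverse proxy in front of the runner is the natural deployment. The variables might then be set like this (the host and port values are purely illustrative):

```shell
# Hypothetical sketch: advertise the reverse proxy's externally routable
# address so it is embedded in the generated ARTIFACT_UPLOAD_URL, while
# the proxy forwards decrypted traffic to the runner's ephemeral port.
export LAVA_GITLAB_RUNNER_ROUTABLE_HOST=runner.example.com
export LAVA_GITLAB_RUNNER_ROUTABLE_PORT=443
```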

Signed-off-by: Ed Smith [email protected]

```rust
))
.expect("failed to bind listener");

let routable_host = match env::var("LAVA_GITLAB_RUNNER_ROUTABLE_HOST") {
```

Review comment (Contributor): This should really just be one argument/environment variable for the base URL, rather than specifying a separate hostname and port, so one can also do `https://my-upload-host:2456/badger/snakes` if this endpoint is not mounted at the root.

```rust
LavaUploadableFileType::Log { id } => format!("{}_log.yaml", id).into(),
LavaUploadableFileType::Junit { id } => format!("{}_junit.xml", id).into(),
LavaUploadableFileType::Artifact { id, path } => {
    format!("{}_artifacts/{}", id, path).into()
}
};
```

Review comment (Contributor): I'd just make that `path`, as artifacts can only come out of `submit` jobs; the `<job_id>_bla.*` naming mostly came about because those files also come from `monitor` jobs.

```rust
let listener = std::net::TcpListener::bind(std::net::SocketAddr::new(
```

Review comment (Contributor): Why bother with a TCP listener rather than just asking `axum::Server` to bind itself?

```rust
.to_string();

warn!(
    "No LAVA_GITLAB_RUNNER_ROUTABLE_HOST set, using best guess of local IP {}.",
```

Review comment (Contributor): If there is no base URL set, then simply don't enable the listen functionality; `local_ip()` is of quite questionable value.

@@ -921,6 +1005,10 @@ async fn new_job(job: Job) -> Result<impl CancellableJobHandler<LavaUploadableFi

```rust
Ok(Run::new(lava, url, job, cancel_behaviour))
}

async fn upload_artifact(Path((job, path)): Path<(String, String)>, body: Bytes) {
```

Review comment (Contributor): This really needs to be a chunked/streaming transfer, as LAVA jobs might produce quite big files, not stuff we want to store in memory per se.

```rust
// Wipe any dead jobs as the new one starts. It's not hugely
// important when this happens, so long as it happens
// periodically.
self.cleanup();
```

Review comment (Contributor): Why is a periodic cleanup required? Or rather, can't the registration be dropped whenever the relevant jobs get dropped? Also, I'm not a big fan of scattering `Arc<Mutex<T>>` around, as it tends to make things harder to use from the user's point of view :)

@@ -844,6 +927,7 @@ type LavaMap = Arc<Mutex<BTreeMap<(String, String), Arc<ThrottledLava>>>>;

```rust
lazy_static! {
    static ref LAVA_MAP: LavaMap = Arc::new(Mutex::new(BTreeMap::new()));
    static ref MAX_CONCURRENT_REQUESTS: Arc<Mutex<usize>> = Arc::new(Mutex::new(20));
    static ref UPLOAD_SERVER: Arc<Mutex<UploadServer>> = Default::default();
```

Review comment (Contributor): Rather than having a global static variable, why not pass a reference to the `Run` struct? That avoids the global variable and potentially a lot of the `lock().unwrap()` dances (though that could also be sorted out with interior mutability). Tbh it would probably be nicer if the `Run`s each simply had a copy, or a registration object, from the factory.
