
add compute block config validation #5

Status: Open. Wants to merge 23 commits into main.


@PaulKalho (Member) commented Oct 30, 2024

Closes #9
Closes #4

Explanation of the config:

name: "NLP toolbox"
description: "Contains NLP algorithms..."
author: "John Doe"
docker_image: "https://ghcr.io/nlp-toolbox"

entrypoints:
  topic_modelling:
    description: "Run topic modelling"
    envs:
      LANGUAGE: "de"
    inputs:
      text_data:
        description: "Text file. Can be uploaded by the user."
        type: "file"
        config:
          TXT_SRC_PATH: null
      db_data:
        description: "Information in a database"
        type: "db_table"
        config:
          DATA_TABLE_NAME: "nlp_information"
          DB_HOST: "time.rwth-aachen.de"
          DB_PORT: 1234
    outputs:
      topic_model:
        type: "file"
        description: "Topic model file"
        config:
          OUTPUT_PATH_TOPIC_MODEL: null
      run_durations:
        type: "db_table"
        description: "Table that contains the run durations per day."
        config:
          RUN_DURATIONS_TABLE_NAME: "run_durations_nlp"

  analyze_runtime:
    description: "Analyze the runtimes"
    inputs:
      run_durations:
        description: "Table that contains all runtimes and dates"
        type: "db_table"
        config:
          RUN_DURATIONS_TABLE_NAME: "run_durations_nlp"
    outputs:
      csv_output:
        type: "file"
        description: "A csv containing statistical information"
        config:
          CSV_OUTPUT_PATH: "outputs/statistics.csv"

Each compute block has some metadata (e.g. name, author) and also contains a link to its Docker image.
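As a rough illustration of the metadata part, the sketch below checks the top-level keys of a config that has already been loaded into a Python dict (e.g. from the YAML above). Which keys are required, and the names `ComputeBlockMetadata` and `parse_metadata`, are assumptions made for illustration, not the SDK's actual schema.

```python
from dataclasses import dataclass

# Assumption: these four top-level keys are required, non-empty strings.
# The real scystream-sdk schema may define them differently.
REQUIRED_METADATA_KEYS = ("name", "description", "author", "docker_image")


@dataclass
class ComputeBlockMetadata:
    """Hypothetical container for the top-level compute block metadata."""
    name: str
    description: str
    author: str
    docker_image: str


def parse_metadata(raw: dict) -> ComputeBlockMetadata:
    """Validate the metadata keys of an already-loaded config dict."""
    missing = [key for key in REQUIRED_METADATA_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing metadata keys: {', '.join(missing)}")
    for key in REQUIRED_METADATA_KEYS:
        value = raw[key]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"metadata key {key!r} must be a non-empty string")
    # Only the metadata keys are picked out; 'entrypoints' is validated separately.
    return ComputeBlockMetadata(**{key: raw[key] for key in REQUIRED_METADATA_KEYS})


config = {
    "name": "NLP toolbox",
    "description": "Contains NLP algorithms...",
    "author": "John Doe",
    "docker_image": "https://ghcr.io/nlp-toolbox",
    "entrypoints": {},
}
print(parse_metadata(config).name)  # NLP toolbox
```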

It lists the entrypoints, which should have the same names as the functions decorated with the @entrypoint decorator.

  • Each entrypoint can have so-called envs. These configure the functionality of an entrypoint and should describe variables that are shared across the entrypoint's functionality.

  • Each entrypoint can have any number of inputs and outputs, which, in the first version, can be of the following types: db_table, file.

  • These inputs and outputs can also be configured using the config block. Strictly speaking, these are also environment variables; however, for semantic reasons they are grouped under the input, since they configure functionality for that input (e.g. DB_HOST, DB_PORT).

    • The goal is that the user can set these variables as well, with the semantic context of the input they belong to.
  • The values of the config keys should be the default values provided by the ComputeBlock (TODO: validate).

    • If a value is explicitly set to null in the config, the user of the ComputeBlock must set it manually; otherwise, startup of the ComputeBlock will fail.
  • The ComputeBlock developer can also define private environment variables that are not reflected in the config YAML (e.g. a shared DB_HOST that is not configurable, or DB_PASS).
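The rules above (envs and the per-input/output config blocks are all environment variables, config values act as defaults, and null means the user must supply a value) could be resolved at startup roughly as follows. This sketch assumes the entrypoint section has already been loaded into a dict; `collect_defaults` and `resolve_settings` are hypothetical names, and the SDK's actual loader may behave differently (for example, in how it handles name clashes). Private variables such as DB_PASS never show up here because they are not part of the config.

```python
import os
from typing import Dict, Mapping


def collect_defaults(entrypoint_cfg: dict) -> Dict[str, object]:
    """Merge envs and every input/output config block into one mapping.

    A key defined in several places keeps the last value seen (outputs win
    over inputs, inputs over envs); a real loader might reject clashes instead.
    """
    defaults: Dict[str, object] = dict(entrypoint_cfg.get("envs") or {})
    for section in ("inputs", "outputs"):
        for io_cfg in (entrypoint_cfg.get(section) or {}).values():
            defaults.update(io_cfg.get("config") or {})
    return defaults


def resolve_settings(
    entrypoint_cfg: dict, environ: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Apply user-set variables over defaults; fail if a null default is unset."""
    resolved: Dict[str, str] = {}
    missing = []
    for key, default in collect_defaults(entrypoint_cfg).items():
        if key in environ:
            resolved[key] = environ[key]
        elif default is None:  # YAML null: the user must provide this value
            missing.append(key)
        else:
            resolved[key] = str(default)  # env vars are strings, e.g. 1234 -> "1234"
    if missing:
        raise RuntimeError(f"required settings not set: {', '.join(sorted(missing))}")
    return resolved


topic_modelling = {
    "envs": {"LANGUAGE": "de"},
    "inputs": {
        "text_data": {"type": "file", "config": {"TXT_SRC_PATH": None}},
        "db_data": {
            "type": "db_table",
            "config": {
                "DATA_TABLE_NAME": "nlp_information",
                "DB_HOST": "time.rwth-aachen.de",
                "DB_PORT": 1234,
            },
        },
    },
    "outputs": {
        "topic_model": {"type": "file", "config": {"OUTPUT_PATH_TOPIC_MODEL": None}},
        "run_durations": {
            "type": "db_table",
            "config": {"RUN_DURATIONS_TABLE_NAME": "run_durations_nlp"},
        },
    },
}

user_env = {"TXT_SRC_PATH": "data/input.txt", "OUTPUT_PATH_TOPIC_MODEL": "out/model.bin"}
settings = resolve_settings(topic_modelling, user_env)
print(settings["DB_PORT"])  # 1234
```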


Outputs:

There are currently two types of outputs: db_table and file.

They have the same characteristics as the corresponding input types; however, they cannot be optional.
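Because outputs share the input types, a single check can cover both sections. The sketch below assumes the two supported types are the only valid values, and it introduces a hypothetical `optional` key (not shown in the example config) purely to illustrate that only inputs may be optional; `validate_io` is likewise an illustrative name.

```python
from typing import List

# Types supported in the first version, per the description above.
ALLOWED_IO_TYPES = {"db_table", "file"}


def validate_io(entrypoint_name: str, entrypoint_cfg: dict) -> List[str]:
    """Collect error messages for one entrypoint; an empty list means valid."""
    errors: List[str] = []
    for section in ("inputs", "outputs"):
        for io_name, io_cfg in (entrypoint_cfg.get(section) or {}).items():
            where = f"{entrypoint_name}.{section}.{io_name}"
            io_type = io_cfg.get("type")
            if io_type not in ALLOWED_IO_TYPES:
                errors.append(f"{where}: unsupported type {io_type!r}")
            # Hypothetical 'optional' flag: inputs may set it, outputs may not.
            if section == "outputs" and io_cfg.get("optional", False):
                errors.append(f"{where}: outputs cannot be optional")
    return errors


analyze_runtime = {
    "inputs": {"run_durations": {"type": "db_table"}},
    "outputs": {"csv_output": {"type": "file", "optional": True}},
}
print(validate_io("analyze_runtime", analyze_runtime))
# ['analyze_runtime.outputs.csv_output: outputs cannot be optional']
```

Collecting every error instead of raising on the first one lets a compute block author fix all problems in a single pass.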


@PaulKalho changed the title from "wip: add config" to "add compute block config validation" on Oct 30, 2024
@PaulKalho self-assigned this on Oct 30, 2024
@PaulKalho mentioned this pull request on Nov 4, 2024
@PaulKalho marked this pull request as ready for review on November 7, 2024 06:50
@PaulKalho marked this pull request as draft on November 7, 2024 11:40
Resolved review comments (outdated) on:
  • README.md
  • scystream/sdk/config/config_loader.py
  • tests/test_config_files/valid_config.yaml
@PaulKalho (Member, Author) commented:

@mottegk thanks for taking a look!
I converted this back to a draft, as I noticed that I forgot some things. I will re-request your review as soon as I have pushed these changes :)

@PaulKalho marked this pull request as ready for review on November 12, 2024 18:57
@PaulKalho mentioned this pull request on Nov 13, 2024

Successfully merging this pull request may close these issues:

  • Reevaluate internal validation of config file contents
  • Schema for validating input and output data