Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(validator): add support to validate essential metrics produced by Kepler #1834

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vprashar2929
Copy link
Collaborator

@vprashar2929 vprashar2929 commented Nov 4, 2024

This commit introduces functionality to validate essential metrics produced by Kepler
The following comparisons are included:

  • Node Exporter Comparison

    • Validates node_rapl_<package|core|dram> metrics against kepler_node_<package|core|dram>{dev}
  • Kepler Process Comparison

    • Compares kepler_process_<package|core|dram|platform|other|uncore>{latest} metrics to
      kepler_process_<package|core|dram|platform|other|uncore>{dev}
  • Kepler Node Comparison

    • Validates kepler_node_<package|core|dram|platform|other|uncore>{latest} against
      kepler_node_<package|core|dram|platform|other|uncore>{dev}

Additionally, a stressor script has been added to include system load, allowing for real-time validation of Kepler under stress conditions.

@vprashar2929 vprashar2929 marked this pull request as draft November 4, 2024 11:21
Copy link

github-actions bot commented Nov 4, 2024

🤖 SeineSailor

Here is a concise summary of the pull request changes:

Summary: This pull request introduces a new feature to validate Kepler metrics in the validator module, enhancing its capabilities for handling and validating metrics. The changes include:

  • Adding a new validate_metrics command to the validator CLI with --duration and --report-dir options.
  • Modifying the PrometheusJob named tuple and the load function to include new fields and initialize them when loading configuration from a file.
  • Introducing new functions, such as validate_metrics, ScriptResult, and write_md_report, and adding an optional dependency on click.exceptions.Exit.
  • Creating a new bash script, regression-stressor.sh, to simulate CPU load and measure its impact on the system using the stress-ng tool.

Impact: These changes expand the validator module's capabilities for handling and validating Kepler metrics, but do not affect the signatures of exported functions or global data structures. However, the introduction of the new script might affect the overall system behavior when executed.

Observations and Suggestions:

  • The changes seem to be well-structured and follow a logical flow, making it easier to understand the new feature's implementation.
  • It would be beneficial to include additional tests to ensure the new validate_metrics command and the regression-stressor.sh script work as expected.
  • Consider adding documentation or comments to explain the purpose and usage of the new functions and script, especially for users who may not be familiar with the validator module or Kepler metrics.
  • Review the error handling and logging mechanisms to ensure they are robust and provide useful information in case of failures or errors.

@vprashar2929 vprashar2929 force-pushed the add-kep-reg branch 4 times, most recently from 827aefd to 564fa4c Compare November 5, 2024 17:49

validations:
# absolute power comparison
- name: Total - absolute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also validate these invariants in the same version of dev

  • kepler_node_<pkg|core|uncore|dram|other>{dev} = sum of ( process_<pkg|core|uncore|dram|other>{dev} )
  • kepler_node_<pkg|core|..> = node_exporter_rapl_<pkg|core...>
    *sum( kepler_process_bpf_cpu ) = node_exporter_cpu_time

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for kepler_node_<pkg|core|dram...>{dev} = sum of (process_<pkg|core|dram....>){dev} do you mean
MAE of sum(rate(kepler_node<pkg|core|dram>){dev}[20s]) and sum(rate(process_<pkg|core|dram>{dev}[20s]))?

click.secho(" * Generating validate metrics report file and dir", fg="green")
write_md_report(results_dir, res)

raise Exit(1) if not res.validations.passed else Exit(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just return 0 or 1 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default when a Click script is invoked as command line application (through BaseCommand.main()) the return value is ignored unless the standalone_mode is disabled in which case it’s bubbled through.
https://click.palletsprojects.com/en/stable/commands/#command-return-values

@vprashar2929 vprashar2929 force-pushed the add-kep-reg branch 6 times, most recently from 33d2963 to de1649f Compare November 9, 2024 14:18
@vprashar2929 vprashar2929 changed the title feat(validator): add support to validate kepler metrics feat(validator): add support to validate essential metrics produced by Kepler Nov 9, 2024
…y Kepler

This commit introduces functionality to validate essential metrics produced by Kepler
The following comparisons are included:

- Node Exporter Comparison
  - Validates `node_rapl_<package|core|dram>` metrics against `kepler_node_<package|core|dram>{dev}`

- Kepler Process Comparison
  - Compares `kepler_process_<package|core|dram|platform|other|uncore>{latest}` metrics to
  `kepler_process_<package|core|dram|platform|other|uncore>{dev}`

- Kepler Node Comparison
  - Validates `kepler_node_<package|core|dram|platform|other|uncore>{latest}` against
  `kepler_node_<package|core|dram|platform|other|uncore>{dev}`

Additionally, a stressor script has been added to include system load,
allowing for real-time validation of Kepler under stress conditions.

Signed-off-by: vprashar2929 <[email protected]>
@vprashar2929 vprashar2929 marked this pull request as ready for review November 10, 2024 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants