rfc43: add new RFC for job list service
Problem: job-list services are not documented. Add an RFC to document current RPCs.
Showing 4 changed files with 375 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,362 @@ | ||
.. github display
   GitHub is NOT the preferred viewer for this file. Please visit
   https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_43.html
43/Job List Service | ||
################### | ||
|
||
The Flux Job List Service provides read-only access to summary
information for jobs in the system. Several ways to find and filter
jobs are also supported.
|
||
.. list-table:: | ||
:widths: 25 75 | ||
|
||
* - **Name** | ||
- github.com/flux-framework/rfc/spec_43.rst | ||
* - **Editor** | ||
- Albert Chu <[email protected]> | ||
* - **State** | ||
- raw | ||
|
||
Language | ||
******** | ||
|
||
.. include:: common/language.rst | ||
|
||
Related Standards | ||
***************** | ||
|
||
- :doc:`spec_18` | ||
- :doc:`spec_20` | ||
- :doc:`spec_21` | ||
- :doc:`spec_25` | ||
- :doc:`spec_26` | ||
- :doc:`spec_27` | ||
- :doc:`spec_29` | ||
- :doc:`spec_31` | ||
- :doc:`spec_41` | ||
|
||
Background | ||
********** | ||
|
||
Users are interested in seeing jobs that have been submitted to the
scheduler. Some reasons may include:
|
||
- See which jobs are pending, running, or inactive | ||
- See what jobs are running on specific nodes | ||
- Get general information about a job, such as a job's exit code | ||
- See the order in which jobs were submitted | ||
- See how many jobs are pending in the queue before a specific one | ||
|
||
While the job info service described in RFC41 is capable of providing job owners information about their | ||
own jobs, it has several limitations: | ||
|
||
- job information is spread across multiple sources and is not collated into one easily parsable format
- information from multiple jobs is not collated into a simple to parse list | ||
- information about non-owned jobs is not available | ||
|
||
Goals | ||
***** | ||
|
||
- Provide read-only access to non-sensitive information for all jobs. | ||
|
||
- Hide the complexity of parsing or collating data from multiple sources for commonly accessed information. | ||
|
||
- Provide ways to find and/or filter jobs callers are interested in. | ||
|
||
Implementation | ||
************** | ||
|
||
The job list service SHALL provide callers the ability to read job information via identifier keys, which will be called *attributes*. See `Job Attributes` below for details. | ||
|
||
The job list service SHALL provide an RFC31 constraint syntax for filtering jobs. See `Constraint Operators` below for details.
|
||
Job Attributes | ||
============== | ||
|
||
Job information is defined by the *attribute* keys listed below.
|
||
.. list-table:: | ||
:header-rows: 1 | ||
|
||
* - Attribute | ||
- Description | ||
- Value Encoding | ||
* - id | ||
- job id | ||
- integer | ||
* - userid | ||
- userid of job submitter | ||
- integer | ||
* - urgency | ||
- job urgency | ||
- integer | ||
* - priority | ||
- job priority | ||
- integer | ||
* - t_submit | ||
- time job was submitted | ||
- real | ||
* - t_depend | ||
- time job entered depend state | ||
- real | ||
* - t_run | ||
- time job entered run state | ||
- real | ||
* - t_cleanup | ||
- time job entered cleanup state | ||
- real | ||
* - t_inactive | ||
- time job entered inactive state | ||
- real | ||
* - state | ||
- current job state | ||
- integer | ||
* - name | ||
- job name | ||
- string | ||
* - cwd | ||
- job current working directory | ||
- string | ||
* - queue | ||
- job queue | ||
- string | ||
* - project | ||
- job project | ||
- string | ||
* - bank | ||
- job bank | ||
- string | ||
* - ntasks | ||
- job task count | ||
- integer | ||
* - ncores | ||
- job core count | ||
- integer | ||
* - nnodes | ||
- job node count | ||
- integer | ||
* - ranks | ||
- ranks a job ran on | ||
- integer | ||
* - nodelist | ||
- nodes a job ran on, may accept RFC29 Hostlist | ||
- string | ||
* - duration | ||
- job duration in seconds | ||
- real | ||
* - expiration | ||
- time job was marked to expire | ||
- real | ||
* - success | ||
- true if job was successful | ||
- boolean | ||
* - result | ||
- integer indicating job success or failure type | ||
- integer | ||
* - waitstatus | ||
- status of job as returned by waitpid(2) | ||
- integer | ||
* - exception_occurred | ||
- true if exception occurred | ||
- boolean | ||
* - exception_type | ||
- if exception occurred, exception type | ||
- string | ||
* - exception_severity | ||
- if exception occurred, exception severity | ||
- integer | ||
* - exception_note | ||
- if exception occurred, exception note | ||
- string | ||
* - annotations | ||
- annotations as described in RFC27 | ||
- object | ||
* - dependencies | ||
- current job dependencies | ||
- array of string | ||
|
||
Job attributes SHALL be returned via an object where the keys are the requested job attributes. The values are the attribute values, each encoded as described in the above table. | ||
|
||
The *attribute* *id* SHALL always be returned for each job. Every other attribute is optional. | ||
|
||
Not all job attributes are available for a job. Many attributes are dependent on job state, job submission information, system configuration, or other conditions. For example: | ||
|
||
- a job that is pending (i.e. not yet running) does not yet have any resources to run on. Therefore, *ranks* or *nodelist* cannot yet be set. Similarly, attributes such as *success* or *result* cannot yet be determined. A timestamp like *t_run* does not yet have a value. | ||
- a job submitted without dependencies will never have *dependencies* set | ||
- a job cannot belong to a *queue* on a system without job queues
- *exception_type* will only exist if *exception_occurred* is true | ||
|
||
If an *attribute* has not been set for a job, it SHALL NOT be returned in the returned data object. | ||
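
Because unset attributes are omitted rather than returned with a null value, a consumer has to treat every key except *id* as optional. The following is a minimal Python sketch of that handling, assuming the job object has already been decoded from JSON into a dict. The helper name ``describe_job`` and the sample values are hypothetical and are not part of this RFC.

```python
def describe_job(job):
    """Summarize one job object, tolerating attributes that are not set."""
    # "id" is the only attribute guaranteed to be present.
    parts = [f"id={job['id']}"]
    # Every other attribute may be absent, so use .get() instead of indexing.
    name = job.get("name")
    if name is not None:
        parts.append(f"name={name}")
    nodelist = job.get("nodelist")
    if nodelist is not None:
        parts.append(f"nodelist={nodelist}")
    else:
        parts.append("nodelist=(not assigned)")
    # exception_type only exists if exception_occurred is true.
    if job.get("exception_occurred", False):
        parts.append(f"exception={job.get('exception_type', 'unknown')}")
    return " ".join(parts)


# Hypothetical job objects: one pending, one that raised an exception.
pending = {"id": 1234, "name": "sim", "t_submit": 1700000000.0}
failed = {
    "id": 1235,
    "nodelist": "node[1-2]",
    "exception_occurred": True,
    "exception_type": "cancel",
}
print(describe_job(pending))
print(describe_job(failed))
```

Testing for key presence, rather than comparing against a sentinel value, keeps the consumer correct for attributes whose legal values include 0, false, or the empty string.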
|
||
|
||
Constraint Operators | ||
==================== | ||
|
||
Using the constraint syntax described by RFC31, jobs can be filtered | ||
based on the following constraint operators. | ||
|
||
``userid`` | ||
Designate one or more userids (*integer*) and match jobs submitted by those userids. | ||
|
||
``name`` | ||
Designate one or more job names (*string*) and match jobs with those job names. | ||
|
||
``queue`` | ||
Designate one or more queues (*string*) and match jobs submitted to those job queues. | ||
|
||
``states`` | ||
Designate one or more job states (*string* or *integer*) and match jobs in those job states. Both bitmasks (including multiple states) and string names of the states SHALL be accepted. | ||
|
||
``results`` | ||
Designate one or more job results (*string* or *integer*) and match jobs with those results. Both bitmasks (including multiple results) and string names of the results SHALL be accepted. | ||
|
||
``t_submit``, ``t_depend``, ``t_run``, ``t_cleanup``, ``t_inactive`` | ||
Designate one timestamp with a prefixed comparison operator (*string*). The accepted comparison operators SHALL be ``>``, ``<``, ``>=``, and ``<=``, for greater than, less than, greater than or equal, and less than or equal, respectively. Match jobs where the respective timestamp satisfies the comparison against the submitted timestamp.
|
||
``not`` | ||
Logical negation of one constraint object. | ||
|
||
``or`` | ||
Logical or of one or more constraint objects. | ||
|
||
``and`` | ||
Logical and of one or more constraint objects.
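
To make the matching semantics concrete, the following Python sketch evaluates a constraint object against a single job attribute object. It is an illustration only, not the job list service's implementation. The function names are hypothetical, the ``states`` and ``results`` operators are left out because their encodings are not defined in this section, and two behaviors are assumptions: multiple operators in one object must all match, and a job lacking the tested attribute does not match.

```python
import operator

# Comparison prefixes, longest first so ">=" is not mistaken for ">".
_COMPARATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
]
_TIMESTAMPS = {"t_submit", "t_depend", "t_run", "t_cleanup", "t_inactive"}


def compare_timestamp(value, spec):
    """Apply a spec such as ">=946713600.0" to a job timestamp."""
    for prefix, op in _COMPARATORS:
        if spec.startswith(prefix):
            return op(value, float(spec[len(prefix):]))
    raise ValueError(f"missing comparison operator in {spec!r}")


def match(job, constraint):
    """Return True if the job dict satisfies the constraint object."""
    # Assumption: every operator key in the object must match.
    for op, values in constraint.items():
        if op == "and":
            ok = all(match(job, c) for c in values)
        elif op == "or":
            ok = any(match(job, c) for c in values)
        elif op == "not":
            # Negation of one constraint object.
            ok = not match(job, values[0])
        elif op in ("userid", "name", "queue"):
            # Assumption: a job without the attribute does not match.
            ok = op in job and job[op] in values
        elif op in _TIMESTAMPS:
            ok = op in job and compare_timestamp(job[op], values[0])
        else:
            raise ValueError(f"unsupported operator {op!r}")
        if not ok:
            return False
    return True


# Hypothetical job object used to exercise the matcher.
job = {"id": 1, "userid": 42, "queue": "batch", "t_submit": 946800000.0}
print(match(job, {"userid": [42, 43]}))
print(match(job, {"not": [{"queue": ["foobar"]}]}))
```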
|
||
The following are several constraint examples.

Filter jobs that belong to userid 42 or 43.

.. code:: json

   { "userid": [ 42, 43 ] }

Filter jobs that were not submitted to job queue "foobar".

.. code:: json

   { "not": [ { "queue": [ "foobar" ] } ] }

Filter jobs that are pending.

.. code:: json

   { "states": [ "depend", "priority", "sched" ] }

Filter jobs that belong to userid 42 and were submitted after January 1, 2000.

.. code:: json

   { "and": [ { "userid": [ 42 ] }, { "t_submit": [ ">946713600.0" ] } ] }

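
Timestamp constraints are strings, so a caller typically converts a calendar date to seconds since the epoch and prepends the comparison operator. The Python sketch below rebuilds the last example that way. The helper name ``after`` is hypothetical. Note that 946713600.0 corresponds to midnight on January 1, 2000 at UTC-8. Midnight UTC on that date would be 946684800.0.

```python
import json
from datetime import datetime, timedelta, timezone


def after(attr, when):
    """Build a constraint matching jobs whose timestamp `attr` is later than `when`."""
    # `when` must be timezone-aware so the epoch conversion is unambiguous.
    return {attr: [f">{when.timestamp()}"]}


# Midnight on January 1, 2000 in a UTC-8 timezone.
cutoff = datetime(2000, 1, 1, tzinfo=timezone(timedelta(hours=-8)))
constraint = {"and": [{"userid": [42]}, after("t_submit", cutoff)]}
print(json.dumps(constraint))
```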
List | ||
==== | ||
|
||
The :program:`job-list.list` RPC fetches a list of jobs. | ||
|
||
The list of jobs SHALL be filtered in the following order.
|
||
- pending jobs | ||
- running jobs | ||
- inactive jobs | ||
|
||
Pending jobs are returned ordered by priority (higher priority first),
running jobs ordered by start time (most recent first), and inactive
jobs ordered by completion (most recently finished first).
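
The ordering rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's implementation: the function name ``order_jobs`` is hypothetical, a job's group is inferred from which timestamps are present instead of from the *state* attribute, and completion time is taken to be *t_inactive*.

```python
def order_jobs(jobs, max_entries=0):
    """Order job objects: pending first, then running, then inactive."""
    # Assumption: group membership is inferred from the timestamps that are set.
    inactive = [j for j in jobs if "t_inactive" in j]
    running = [j for j in jobs if "t_run" in j and "t_inactive" not in j]
    pending = [j for j in jobs if "t_run" not in j and "t_inactive" not in j]

    # Higher priority first, most recently started first,
    # most recently finished first.
    pending.sort(key=lambda j: j.get("priority", 0), reverse=True)
    running.sort(key=lambda j: j["t_run"], reverse=True)
    inactive.sort(key=lambda j: j["t_inactive"], reverse=True)

    ordered = pending + running + inactive
    # A max_entries of 0 means no limit.
    return ordered[:max_entries] if max_entries > 0 else ordered


# Hypothetical job objects covering all three groups.
jobs = [
    {"id": 1, "priority": 10},
    {"id": 2, "priority": 20},
    {"id": 3, "t_run": 100.0},
    {"id": 4, "t_run": 200.0},
    {"id": 5, "t_run": 50.0, "t_inactive": 300.0},
    {"id": 6, "t_inactive": 400.0},
]
print([j["id"] for j in order_jobs(jobs)])
```

Because truncation happens after the groups are concatenated, a small limit returns pending jobs before any running or inactive job is considered.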
|
||
The RPC payloads are defined as follows: | ||
|
||
.. object:: job-list.list request
|
||
The request SHALL consist of a JSON object with the following keys: | ||
|
||
.. object:: max_entries | ||
|
||
(*integer*, REQUIRED) Indicate the maximum number of entries to be | ||
returned. Specify 0 for no limit. | ||
|
||
.. object:: attrs | ||
|
||
(*array of string*, REQUIRED) List of attributes to return. The | ||
special job attribute *all* SHALL allow a caller to request all job | ||
attributes for a job. | ||
|
||
.. object:: since | ||
|
||
(*real*, OPTIONAL) Limit output to jobs that have been active | ||
since a given time. If not specified, all jobs are considered. | ||
|
||
.. object:: constraint | ||
|
||
(*object*, OPTIONAL) Limit output to jobs that match a constraint | ||
object as described in RFC31. See `Constraint Operators` for | ||
legal job list constraint operators. If not specified, match all | ||
jobs. | ||
|
||
.. object:: job-list.list response
|
||
The response SHALL consist of a JSON object with the following keys: | ||
|
||
.. object:: jobs | ||
|
||
(*array of objects*, REQUIRED) A list of the jobs returned from | ||
the request. Each object will contain the requested attributes in | ||
an object described in `Job Attributes`. | ||
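
As an illustration of the payload shapes, the Python sketch below assembles a list request and reads job ids out of a decoded response. The helper names and the sample response values are hypothetical, and no transport is shown. How the payload is sent depends on the client library in use.

```python
import json


def build_list_request(attrs, max_entries=0, since=None, constraint=None):
    """Assemble a list request payload; optional keys are added only when given."""
    payload = {"max_entries": int(max_entries), "attrs": list(attrs)}
    if since is not None:
        payload["since"] = float(since)
    if constraint is not None:
        payload["constraint"] = constraint
    return payload


def job_ids(response):
    """Extract the job ids from a decoded list response."""
    # "id" is always returned for each job, so indexing is safe here.
    return [job["id"] for job in response["jobs"]]


request = build_list_request(
    ["name", "state"], max_entries=10, constraint={"userid": [42]}
)
print(json.dumps(request))

# Hypothetical response; the second job has no name attribute set.
response = {"jobs": [{"id": 1234, "name": "sim", "state": 8}, {"id": 1235, "state": 16}]}
print(job_ids(response))
```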
|
||
List ID | ||
======= | ||
|
||
The :program:`job-list.list-id` RPC fetches job attributes for a specific job ID. | ||
|
||
The RPC payloads are defined as follows: | ||
|
||
.. object:: job-list.list-id request | ||
|
||
The request SHALL consist of a JSON object with the following keys: | ||
|
||
.. object:: id | ||
|
||
(*integer*, REQUIRED) The job id. | ||
|
||
.. object:: attrs | ||
|
||
(*array of string*, REQUIRED) List of attributes to return. The | ||
special job attribute *all* SHALL allow a caller to request all job | ||
attributes for a job. | ||
|
||
.. object:: state | ||
|
||
(*integer*, OPTIONAL) A job state that the job must reach before
job data is returned. This may be useful so that state-specific
job attributes are available in the response.
|
||
.. object:: job-list.list-id response | ||
|
||
The response SHALL consist of a JSON object with the following keys: | ||
|
||
.. object:: job | ||
|
||
(*object*, REQUIRED) The job information from the request. The | ||
returned object will contain the requested attributes in an object | ||
described in `Job Attributes`. | ||
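
A short Python sketch of the list-id payloads follows. The helper names are hypothetical, and the value used for ``state`` is a placeholder integer because the encoding of job states is not defined in this section.

```python
def build_list_id_request(jobid, attrs, state=None):
    """Assemble a list-id request; `state` is included only when a wait is wanted."""
    payload = {"id": int(jobid), "attrs": list(attrs)}
    if state is not None:
        payload["state"] = int(state)
    return payload


def unpack_list_id_response(response):
    """Return the single job attribute object from a decoded list-id response."""
    return response["job"]


# Placeholder state value, not taken from this RFC.
WAIT_STATE = 16
request = build_list_id_request(1234, ["nodelist", "t_run"], state=WAIT_STATE)
print(request)

# Hypothetical response for the request above.
job = unpack_list_id_response({"job": {"id": 1234, "nodelist": "node1", "t_run": 1700000100.0}})
print(job["nodelist"])
```

Waiting on a state in the request avoids polling for attributes such as *nodelist* or *t_run*, which are not set until the job starts running.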
|
||
List Attributes | ||
=============== | ||
|
||
The :program:`job-list.list-attrs` RPC returns a list of all job attributes | ||
that can be returned. | ||
|
||
The RPC payloads are defined as follows: | ||
|
||
.. object:: job-list.list-attrs request | ||
|
||
No keys are recognized for the request. | ||
|
||
.. object:: job-list.list-attrs response | ||
|
||
The response SHALL consist of a JSON object with the following keys: | ||
|
||
.. object:: attrs | ||
|
||
(*array of string*, REQUIRED) List of attributes |
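
One use of this RPC is to check attribute names before issuing a list request. The Python sketch below does so against a decoded list-attrs response. The function name and sample data are hypothetical, and it assumes the special attribute *all* may not itself appear in the returned list.

```python
def unknown_attrs(requested, list_attrs_response):
    """Return the requested attribute names that the service did not advertise."""
    known = set(list_attrs_response["attrs"])
    # "all" is the special attribute accepted by list and list-id requests.
    # Assumption: it is always valid, even if absent from the response.
    return [a for a in requested if a != "all" and a not in known]


# Hypothetical, abbreviated list-attrs response.
advertised = {"attrs": ["id", "name", "nnodes", "state"]}
print(unknown_attrs(["name", "nodes", "all"], advertised))
```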
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -483,3 +483,8 @@ sdexec | |
socketpair | ||
subprocess | ||
perilog | ||
nodelist | ||
waitstatus | ||
userids | ||
parsable | ||
bitmasks |