iRODS Rule Engine Plugin - Logical Quotas

This plugin allows administrators to track and enforce limits on the number of bytes and data objects in a collection.

The following example demonstrates monitoring a collection, setting a quota on the maximum number of data objects, and then violating that quota.

$ ils
/tempZone/home/rods:
  foo
  bar
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance 'logical_quotas_start_monitoring_collection("/tempZone/home/rods")' null ruleExecOut
$ imeta ls -C .
AVUs defined for collection /tempZone/home/rods:
attribute: irods::logical_quotas::total_number_of_data_objects
value: 2
units: 
----
attribute: irods::logical_quotas::total_size_in_bytes
value: 1014
units: 
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance 'logical_quotas_set_maximum_number_of_data_objects("/tempZone/home/rods", "2")' null ruleExecOut
$ imeta ls -C .
AVUs defined for collection /tempZone/home/rods:
attribute: irods::logical_quotas::maximum_number_of_data_objects
value: 2
units: 
----
attribute: irods::logical_quotas::total_number_of_data_objects
value: 2
units: 
----
attribute: irods::logical_quotas::total_size_in_bytes
value: 1014
units: 
$ iput baz
remote addresses: 152.54.8.75 ERROR: putUtil: put error for /tempZone/home/rods/baz, status = -130000 status = -130000 SYS_INVALID_INPUT_PARAM
Level 0: Logical Quotas Policy Violation: Adding object exceeds maximum number of objects limit
$ ils
/tempZone/home/rods:
  foo
  bar

Build Dependencies

  • iRODS development package
  • iRODS runtime package
  • iRODS externals package for boost
  • iRODS externals package for fmt
  • iRODS externals package for nlohmann-json
  • iRODS externals package for spdlog
  • OpenSSL development package

Building

To build, follow the normal CMake steps.

mkdir build # Preferably outside of the repository.
cd build
cmake /path/to/repository
make package # Pass -j to use more parallelism.

Configuration

To enable, prepend the following plugin configuration to the list of rule engines in /etc/irods/server_config.json.

"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-logical_quotas-instance",
        "plugin_name": "irods_rule_engine_plugin-logical_quotas",
        "plugin_specific_configuration": {
            "namespace": "irods::logical_quotas",
            "metadata_attribute_names": {
                "maximum_number_of_data_objects": "maximum_number_of_data_objects",
                "maximum_size_in_bytes": "maximum_size_in_bytes",
                "total_number_of_data_objects": "total_number_of_data_objects",
                "total_size_in_bytes": "total_size_in_bytes"
            }
        }
    },
    
    // ... Previously installed rule engine plugin configs ...
]

The plugin configuration must be placed ahead of all plugins that define any of the following PEPs:

  • pep_api_data_obj_close_post
  • pep_api_data_obj_close_pre
  • pep_api_data_obj_copy_post
  • pep_api_data_obj_copy_pre
  • pep_api_data_obj_create_and_stat_post
  • pep_api_data_obj_create_and_stat_pre
  • pep_api_data_obj_create_post
  • pep_api_data_obj_create_pre
  • pep_api_data_obj_open_and_stat_pre
  • pep_api_data_obj_open_pre
  • pep_api_data_obj_put_post
  • pep_api_data_obj_put_pre
  • pep_api_data_obj_rename_post
  • pep_api_data_obj_rename_pre
  • pep_api_data_obj_unlink_post
  • pep_api_data_obj_unlink_pre
  • pep_api_mod_avu_metadata_pre
  • pep_api_replica_close_post
  • pep_api_replica_close_pre
  • pep_api_replica_open_pre
  • pep_api_rm_coll_post
  • pep_api_rm_coll_pre
  • pep_api_touch_post
  • pep_api_touch_pre

Even though this plugin processes these PEPs first due to its position in the list, subsequent Rule Engine Plugins (REPs) are still allowed to process the same PEPs without any issues.
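The placement requirement can be sketched in a few lines of Python. This is a minimal illustration, not a tool shipped with the plugin: prepend_logical_quotas is a hypothetical helper that operates on an already-loaded rule_engines list, and the entry is copied from the configuration above. How you load and save server_config.json is left out on purpose.

```python
# Plugin entry copied from the configuration shown above.
LOGICAL_QUOTAS_ENTRY = {
    "instance_name": "irods_rule_engine_plugin-logical_quotas-instance",
    "plugin_name": "irods_rule_engine_plugin-logical_quotas",
    "plugin_specific_configuration": {
        "namespace": "irods::logical_quotas",
        "metadata_attribute_names": {
            "maximum_number_of_data_objects": "maximum_number_of_data_objects",
            "maximum_size_in_bytes": "maximum_size_in_bytes",
            "total_number_of_data_objects": "total_number_of_data_objects",
            "total_size_in_bytes": "total_size_in_bytes",
        },
    },
}


def prepend_logical_quotas(rule_engines):
    """Return a new list with the Logical Quotas entry first.

    Hypothetical helper: any existing Logical Quotas entry is dropped so the
    plugin appears exactly once, ahead of every other rule engine plugin.
    """
    name = LOGICAL_QUOTAS_ENTRY["instance_name"]
    others = [e for e in rule_engines if e.get("instance_name") != name]
    return [LOGICAL_QUOTAS_ENTRY] + others


if __name__ == "__main__":
    existing = [
        {"instance_name": "irods_rule_engine_plugin-irods_rule_language-instance"},
    ]
    for entry in prepend_logical_quotas(existing):
        print(entry["instance_name"])
```

Returning a new list rather than mutating the input keeps the helper safe to call more than once; a second call leaves the order unchanged.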

Before you can start monitoring collections, you'll also need to add the following specific queries to your zone:

iadmin asq "select count(distinct data_id) from R_DATA_MAIN d inner join R_COLL_MAIN c on d.coll_id = c.coll_id where coll_name like ?" logical_quotas_count_data_objects_recursive
iadmin asq "select sum(t.data_size) from (select data_id, data_size from R_DATA_MAIN d inner join R_COLL_MAIN c on d.coll_id = c.coll_id where coll_name like ? and data_is_dirty in ('1', '4') group by data_id, data_size) as t" logical_quotas_sum_data_object_sizes_recursive

These queries are required due to a limitation in GenQuery's ability to distinguish between multiple replicas of the same data object.

On an actively used zone, the data_size specific query may overcount bytes because write-locked replicas of the same data object can have different sizes. To account for this, consider using slightly larger quota limits.
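To see why the two specific queries are written the way they are, the following Python sketch runs them unchanged against a toy in-memory SQLite database. Everything about the setup is an assumption made for illustration: the real catalog tables have many more columns, the catalog is not SQLite, and the pattern bound to the `like ?` placeholder is a guess (the value the plugin actually binds is not shown here).

```python
import sqlite3

# The two specific queries from above, unchanged.
COUNT_SQL = (
    "select count(distinct data_id) from R_DATA_MAIN d "
    "inner join R_COLL_MAIN c on d.coll_id = c.coll_id where coll_name like ?"
)
SUM_SQL = (
    "select sum(t.data_size) from (select data_id, data_size from R_DATA_MAIN d "
    "inner join R_COLL_MAIN c on d.coll_id = c.coll_id where coll_name like ? "
    "and data_is_dirty in ('1', '4') group by data_id, data_size) as t"
)


def build_toy_catalog():
    """Create an in-memory stand-in for the catalog (heavily simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table R_COLL_MAIN (coll_id integer, coll_name text);
        create table R_DATA_MAIN (
            data_id integer, coll_id integer, data_repl_num integer,
            data_size integer, data_is_dirty text
        );
        """
    )
    conn.execute("insert into R_COLL_MAIN values (1, '/tempZone/home/rods')")
    conn.executemany(
        "insert into R_DATA_MAIN values (?, ?, ?, ?, ?)",
        [
            (10, 1, 0, 500, "1"),  # foo, replica 0
            (10, 1, 1, 500, "1"),  # foo, replica 1 (same object, second row)
            (11, 1, 0, 514, "1"),  # bar, replica 0
            (11, 1, 1, 100, "0"),  # bar, replica 1, status not in ('1', '4')
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_toy_catalog()
    pattern = "/tempZone/home/rods%"  # assumed LIKE pattern, for illustration only
    rows = conn.execute("select count(*) from R_DATA_MAIN").fetchone()[0]
    objects = conn.execute(COUNT_SQL, (pattern,)).fetchone()[0]
    size = conn.execute(SUM_SQL, (pattern,)).fetchone()[0]
    print(rows, objects, size)  # 4 2 1014
```

Four replica rows collapse to two data objects because the count uses distinct data_id, and the sum collapses identical (data_id, data_size) pairs. If two rows for the same data_id carry status '4' with different sizes, the grouping keeps both rows and both sizes are summed, which is the kind of overcount described above.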

How To Use

IMPORTANT NOTE: To invoke rules provided by the plugin, the only requirement is that the user be a rodsadmin. The rodsadmin user does not need permissions set on the target collection.

The following operations are supported:

  • logical_quotas_count_total_number_of_data_objects
  • logical_quotas_count_total_size_in_bytes
  • logical_quotas_get_collection_status
  • logical_quotas_recalculate_totals
  • logical_quotas_set_maximum_number_of_data_objects
  • logical_quotas_set_maximum_size_in_bytes
  • logical_quotas_start_monitoring_collection
  • logical_quotas_stop_monitoring_collection
  • logical_quotas_unset_maximum_number_of_data_objects
  • logical_quotas_unset_maximum_size_in_bytes
  • logical_quotas_unset_total_number_of_data_objects
  • logical_quotas_unset_total_size_in_bytes

Invoking operations via the Plugin

To invoke an operation through the plugin, JSON must be passed using the following structure:

{
    // One of the operations listed above.
    "operation": "<value>",

    // The absolute logical path of an existing collection.
    "collection": "<value>",

    // This value is only used by "logical_quotas_set_maximum_number_of_data_objects" and
    // "logical_quotas_set_maximum_size_in_bytes". This is expected to be an integer
    // passed in as a string.
    "value": "<value>"
}

Use irule to execute an operation. For example, we can start monitoring a collection by running the following:

irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_start_monitoring_collection", "collection": "/tempZone/home/rods"}' null ruleExecOut

We can set a maximum limit on the number of data objects by running the following:

irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_set_maximum_number_of_data_objects", "collection": "/tempZone/home/rods", "value": "100"}' null ruleExecOut

If no errors occurred, then /tempZone/home/rods will only be allowed to contain 100 data objects. However, Logical Quotas does not guarantee that the tracked totals exactly match the number of data objects under a collection. When many clients access the system simultaneously, the totals should be treated as approximate.

To help with this situation, logical_quotas_recalculate_totals is provided. This operation can be scheduled to run periodically to keep the numbers as accurate as possible.
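As a rough sketch of what a scheduled job might run, the Python below builds the irule command lines for the JSON structure described above. The helper names (build_irule_command, recalculate_commands) are hypothetical, and the script only constructs argument lists; actually executing them, for example with subprocess.run from a cron job, is left to the administrator.

```python
import json

PLUGIN_INSTANCE = "irods_rule_engine_plugin-logical_quotas-instance"

# Only these two operations read the "value" field (see the structure above).
OPERATIONS_WITH_VALUE = {
    "logical_quotas_set_maximum_number_of_data_objects",
    "logical_quotas_set_maximum_size_in_bytes",
}


def build_irule_command(operation, collection, value=None):
    """Return the irule argument list for one Logical Quotas operation."""
    payload = {"operation": operation, "collection": collection}
    if operation in OPERATIONS_WITH_VALUE:
        if value is None:
            raise ValueError(f"{operation} requires a value")
        # The plugin expects an integer passed in as a string.
        payload["value"] = str(int(value))
    return ["irule", "-r", PLUGIN_INSTANCE, json.dumps(payload), "null", "ruleExecOut"]


def recalculate_commands(collections):
    """Build one recalculation command per monitored collection."""
    return [
        build_irule_command("logical_quotas_recalculate_totals", c)
        for c in collections
    ]


if __name__ == "__main__":
    for cmd in recalculate_commands(["/tempZone/home/rods"]):
        print(cmd)
        # To execute: subprocess.run(cmd, check=True)
```

Passing the command as an argument list avoids shell quoting problems with the embedded JSON.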

You can also retrieve the quota status for a collection as JSON by invoking logical_quotas_get_collection_status, for example:

irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_get_collection_status", "collection": "/tempZone/home/rods"}' null ruleExecOut

The JSON output will be printed to the terminal and have the following structure:

{
    <maximum_number_of_data_objects_key>: "#",
    <maximum_size_in_bytes_key>: "#",
    <total_number_of_data_objects_key>: "#",
    <total_size_in_bytes_key>: "#"
}

The keys are derived from the namespace and metadata_attribute_names defined by the plugin configuration.
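The derivation can be sketched as follows, assuming the key is the namespace and the attribute name joined by "::", which matches the AVU names in the imeta output at the top of this document. The functions status_keys and parse_status are hypothetical helpers, and skipping keys that are missing from the output is an assumption about how unset values are reported.

```python
import json


def status_keys(plugin_specific_configuration):
    """Map each configuration entry to its full key, e.g.
    total_size_in_bytes -> irods::logical_quotas::total_size_in_bytes."""
    namespace = plugin_specific_configuration["namespace"]
    names = plugin_specific_configuration["metadata_attribute_names"]
    return {entry: f"{namespace}::{attr}" for entry, attr in names.items()}


def parse_status(raw_json, plugin_specific_configuration):
    """Convert the status JSON into {configuration entry: int}.

    Keys absent from the output are skipped (assumption: an unset quota
    may simply not be reported).
    """
    status = json.loads(raw_json)
    keys = status_keys(plugin_specific_configuration)
    return {entry: int(status[key]) for entry, key in keys.items() if key in status}


if __name__ == "__main__":
    config = {
        "namespace": "irods::logical_quotas",
        "metadata_attribute_names": {
            "maximum_number_of_data_objects": "maximum_number_of_data_objects",
            "maximum_size_in_bytes": "maximum_size_in_bytes",
            "total_number_of_data_objects": "total_number_of_data_objects",
            "total_size_in_bytes": "total_size_in_bytes",
        },
    }
    print(status_keys(config)["total_size_in_bytes"])
```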

Invoking operations via the Native Rule Language

Here, we demonstrate how to start monitoring a collection just like in the section above.

irule -r irods_rule_engine_plugin-irods_rule_language-instance 'logical_quotas_start_monitoring_collection(*col)' '*col=/tempZone/home/rods' ruleExecOut

Stream Operations

In previous iterations of this plugin, changes in data were tracked and checked for violations across all stream-based operations in real time. However, the introduction of intermediate replicas and logical locking in iRODS v4.2.9 made this behavior complex to maintain, so the handling of quotas has been relaxed. The most important changes are as follows:

  • Quotas are no longer checked, enforced, or updated during write and seek operations.
  • Once a quota has been violated, opening a data object for writing will fail.
  • Only data objects with replicas marked as good in the catalog are counted towards quota totals.

These changes have the following effects:

  • The plugin allows stream-based writes to violate the maximum bytes quota once.
  • Subsequent stream-based creates and writes will be denied until the quotas are out of violation.
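The effects listed above can be illustrated with a small toy model in Python. It is not the plugin's implementation: the class names are invented, only the byte quota is modeled, and the model assumes totals are updated when a stream is closed and that "violated" means the total exceeds the maximum.

```python
class QuotaViolation(Exception):
    """Raised when a collection that is already over quota is opened for writing."""


class ToyMonitoredCollection:
    """Toy stand-in for a monitored collection with a byte quota."""

    def __init__(self, maximum_size_in_bytes):
        self.maximum_size_in_bytes = maximum_size_in_bytes
        self.total_size_in_bytes = 0

    def open_for_write(self):
        # The quota is checked only here, at open time.
        if self.total_size_in_bytes > self.maximum_size_in_bytes:
            raise QuotaViolation("maximum size in bytes already exceeded")
        return ToyStream(self)


class ToyStream:
    def __init__(self, collection):
        self._collection = collection
        self._bytes_written = 0

    def write(self, data):
        # No quota check during writes: this is the relaxed behavior.
        self._bytes_written += len(data)

    def close(self):
        # Assumption: totals are updated once the stream is closed.
        self._collection.total_size_in_bytes += self._bytes_written


if __name__ == "__main__":
    coll = ToyMonitoredCollection(maximum_size_in_bytes=100)
    stream = coll.open_for_write()
    stream.write(b"x" * 150)  # exceeds the quota, but is not rejected
    stream.close()
    try:
        coll.open_for_write()
    except QuotaViolation as e:
        print("denied:", e)
```

In this model the first oversized write succeeds, and every later open fails until the total drops back to or below the maximum.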

Questions and Answers

Sometimes, the total number of bytes for my collection doesn't change when I remove a data object. Why?

When tracking the total number of bytes in use, only good replicas are considered. If the data object being removed has no good replicas, the plugin leaves the total number of bytes as is, because there is no clear way to determine which replica's data size should be used for the update. Administrators are therefore advised to recalculate the quota totals periodically.

Remember, the plugin is designed to track the totals of good replicas only.

What are the rules around shared monitored nested collections?

Any time a user performs an operation that results in a quota update, that user MUST have modify_object permissions on ALL monitored parent collections. To reduce the management complexity, consider the following:

  • Avoid monitoring collections that have parent collections which are already being monitored.
  • Use groups to simplify permission management.
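To find out which collections a user would need permissions on, or to spot nesting before it becomes a problem, something like the following Python sketch may help. Both functions are hypothetical helpers working on plain path strings; obtaining the list of monitored collections from the zone is not covered here.

```python
from pathlib import PurePosixPath


def monitored_ancestors(logical_path, monitored_collections):
    """Return the monitored collections that contain logical_path, nearest first.

    These are the collections on which the acting user would need
    modify_object permissions for a quota update to succeed.
    """
    monitored = {str(PurePosixPath(c)) for c in monitored_collections}
    return [str(p) for p in PurePosixPath(logical_path).parents if str(p) in monitored]


def nested_monitored_pairs(monitored_collections):
    """Return (parent, child) pairs where a monitored collection sits inside another."""
    paths = sorted(PurePosixPath(c) for c in monitored_collections)
    return [
        (str(parent), str(child))
        for parent in paths
        for child in paths
        if parent in child.parents
    ]


if __name__ == "__main__":
    monitored = [
        "/tempZone/home/rods",
        "/tempZone/home/rods/project",
        "/tempZone/home/alice",
    ]
    print(monitored_ancestors("/tempZone/home/rods/project/data.bin", monitored))
    print(nested_monitored_pairs(monitored))
```

An empty result from nested_monitored_pairs means no monitored collection has a monitored parent, which is the simpler layout recommended above.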