[META]Add fillnull command to PPL #3031

Open
YANG-DB opened this issue Sep 16, 2024 · 1 comment
Labels
enhancement New feature or request PPL Piped processing language

YANG-DB commented Sep 16, 2024

Description:
We propose adding a fillnull command to OpenSearch's Piped Processing Language (PPL) to provide a convenient way to handle null or missing values in query results. This feature would be similar to the fillnull command in Splunk's SPL, enhancing PPL's data cleaning and preparation capabilities.

Proposed Functionality:

  1. The 'fillnull' command should allow users to replace null values with a specified value.
  2. It should support filling nulls for specific fields or all fields.
  3. The command should allow different fill values for different fields.
  4. It should support conditional filling based on other field values or expressions.

Example Usage:

... | fillnull value=0

This would replace all null values in all fields with 0.

... | fillnull value="N/A" field1, field2

This would replace null values in field1 and field2 with "N/A".

... | fillnull field1=0 field2="Unknown" field3=false

This would fill null values in different fields with different values.
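
The three forms above can be modeled in a few lines of Python. This is only a sketch of the intended semantics, not the PPL engine implementation: the function name `fillnull`, its parameters (`value`, `fields`, `per_field`), and the row-as-dict representation are all assumptions made for illustration. It also assumes that a field missing from a row is treated the same as a null.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def fillnull(
    rows: List[Row],
    value: Any = None,
    fields: Optional[List[str]] = None,
    per_field: Optional[Dict[str, Any]] = None,
) -> List[Row]:
    """Hypothetical model of the proposed command; returns new rows."""
    result = []
    for row in rows:
        filled = dict(row)  # copy so the input rows are not mutated
        if per_field is not None:
            # Form 3: fillnull field1=0 field2="Unknown" ...
            for name, fill in per_field.items():
                if filled.get(name) is None:
                    filled[name] = fill
        else:
            # Form 1 (no field list: every field present in the row)
            # or form 2 (only the listed fields).
            targets = fields if fields is not None else list(filled.keys())
            for name in targets:
                if filled.get(name) is None:
                    filled[name] = value
        result.append(filled)
    return result


rows = [
    {"field1": None, "field2": "a", "field3": None},
    {"field1": 5, "field2": None, "field3": True},
]

all_filled = fillnull(rows, value=0)
some_filled = fillnull(rows, value="N/A", fields=["field1", "field2"])
mixed_filled = fillnull(
    rows, per_field={"field1": 0, "field2": "Unknown", "field3": False}
)
```

Note that in the first call `field2` in the second row becomes `0` even though the other value in that column is a string, which is one reason the type-handling question matters.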

... | eval new_field = if(field1 == "category1", field2, null) | fillnull value=0 new_field

This example uses eval to create a new field (or overwrite an existing one) based on a condition, and then uses fillnull to replace the resulting null values.

...
| eval field1 = if(field1 == "category1", field1, null), field2 = if(field2 == "category2", field2, null)
| fillnull field1=0 field2="Unknown"

This example uses multiple eval expressions to apply a different condition to each field, followed by fillnull with a separate fill value per field.
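
To make the data flow of this last pipeline concrete, here is a minimal Python sketch of the two stages applied to in-memory rows. The function name `eval_then_fillnull` and the sample data are hypothetical; the sketch mirrors the conditions and fill values of the example and does not reflect how either PPL engine would execute it.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def eval_then_fillnull(rows: List[Row]) -> List[Row]:
    """Hypothetical model of: eval (conditional nulling) | fillnull."""
    out = []
    for row in rows:
        new_row = dict(row)
        f1 = row.get("field1")
        f2 = row.get("field2")

        # eval stage: keep the value only when the condition holds,
        # otherwise set the field to null (None).
        new_row["field1"] = f1 if f1 == "category1" else None
        new_row["field2"] = f2 if f2 == "category2" else None

        # fillnull stage: field1=0 field2="Unknown"
        if new_row["field1"] is None:
            new_row["field1"] = 0
        if new_row["field2"] is None:
            new_row["field2"] = "Unknown"

        out.append(new_row)
    return out


sample = [
    {"field1": "category1", "field2": "other"},
    {"field1": "x", "field2": "category2"},
    {"field1": None, "field2": None},
]
result = eval_then_fillnull(sample)
```

In this model `field1` ends up holding either the string `"category1"` or the integer `0`, so the filled column is of mixed type.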


Implementation Considerations:

  1. Ensure compatibility with existing PPL commands and syntax
  2. Optimize performance for large datasets with many null values
  3. Provide clear documentation and examples for users
  4. Consider type-checking or type-conversion for filled values
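
As one possible reading of the type-checking consideration, the sketch below validates a fill value against a field's declared type before it is applied. Everything here is an assumption for illustration: the type names (`integer`, `double`, `string`, `boolean`), the `_ACCEPTED` table, and the `coerce_fill_value` helper are hypothetical and are not taken from the OpenSearch or Spark type systems.

```python
from typing import Any

# Hypothetical mapping from field type names to accepted Python types.
_ACCEPTED = {
    "integer": (int,),
    "double": (int, float),
    "string": (str,),
    "boolean": (bool,),
}


def coerce_fill_value(field_type: str, value: Any) -> Any:
    """Return a fill value compatible with field_type, or raise."""
    if field_type not in _ACCEPTED:
        raise ValueError(f"unknown field type: {field_type}")
    # bool is a subclass of int in Python, so reject it explicitly
    # for every non-boolean field.
    if isinstance(value, bool) and field_type != "boolean":
        raise TypeError(f"cannot fill {field_type} field with a boolean")
    if not isinstance(value, _ACCEPTED[field_type]):
        raise TypeError(
            f"cannot fill {field_type} field with {type(value).__name__}"
        )
    # Widen integers to float for double fields (type conversion).
    if field_type == "double":
        return float(value)
    return value
```

Under these assumptions, `coerce_fill_value("double", 0)` widens the value to a float, while `coerce_fill_value("integer", "N/A")` raises a `TypeError`. A real implementation could instead choose to cast the whole column to string, which is a design decision the proposal leaves open.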

Support for the PPL fillnull command is required for both:

  1. The OpenSearch-based PPL engine
  2. The Spark-based PPL engine

@YANG-DB YANG-DB added enhancement New feature or request untriaged PPL Piped processing language labels Sep 16, 2024
@dblock dblock removed the untriaged label Oct 7, 2024
dblock commented Oct 7, 2024

[Catch All Triage - 1, 2, 3, 4]
