[BUG] Unable to create a new RollUp Job in OpenSearch 2.12 #1161

Closed
karthikeyan21 opened this issue Apr 22, 2024 · 12 comments
Labels: breaking change (flags issues/PRs as breaking changes), bug (something isn't working), v2.15.0 (issues targeting release v2.15.0)


@karthikeyan21

karthikeyan21 commented Apr 22, 2024

What is the bug?
Rollup job creation fails with a 500 error in OpenSearch 2.12.

Error Message :
{"error":{"root_cause":[{"type":"null_pointer_exception","reason":"Cannot invoke \"java.time.Instant.plusMillis(long)\" because \"startTime\" is null"}],"type":"null_pointer_exception","reason":"Cannot invoke \"java.time.Instant.plusMillis(long)\" because \"startTime\" is null"},"status":500}

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Use the API below to create a new rollup job (Create RollUp Job).
    Sample:
    curl -X PUT localhost:9200/_plugins/_rollup/jobs/test -H 'Content-Type:application/json' -d '{"rollup":{"target_index":"rollup_hourly_fmstats_test","description":"Hourly Stats Rollup","source_index":"test_*","enabled":true,"schedule":{"interval":{"period":60,"unit":"Minutes"}},"delay":0,"continuous":"true","metrics":[{"source_field":"abc.accepted","metrics":[{"max":{}}]},{"source_field":"abc.rejected","metrics":[{"max":{}}]},{"source_field":"abc.matched","metrics":[{"max":{}}]}],"page_size":5000,"dimensions":[{"date_histogram":{"fixed_interval":"60m","source_field":"timestamp"}},{"terms":{"source_field":"name"}}]}}'

  2. Rollup job creation fails with a 500 error.

What is the expected behavior?
RollUp Job to be created and data to be rolled up

What is your host/environment?

  • OS: Ubuntu
  • Version 22
  • Plugins - index-state-management (ISM)

Do you have any screenshots?
NA

Do you have any additional context?
While debugging the code, I noticed that the schedule's startTime is not initialised.
Changing the code to use Instant.now() instead of schedule.startTime fixed the issue.

Update: This doesn't affect existing rollup jobs. Any job created with an earlier version (2.10) seems to keep working because its start time is already initialised.

karthikeyan21 added the bug and untriaged labels Apr 22, 2024
@mgodwan
Member

mgodwan commented Apr 25, 2024

Related to #1040

@bowenlan-amzn @ikibo Could you please check?

mgodwan added the breaking change and v2.14.0 labels and removed the untriaged label Apr 25, 2024
@bowenlan-amzn
Member

Yes, I think this is a miss and it causes a breaking change.
Regarding #1040 (comment): if the user doesn't pass in start_time, schedule.startTime will be null, which causes the exception when instantiating the IntervalSchedule.

The solution is to add schedule.startTime ?: Instant.now() back
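
For readers who don't use Kotlin: `?:` is the Elvis operator, which falls back to the right-hand value when the left-hand one is null. A minimal Java sketch of the same idea follows. The class and method names are hypothetical and are not taken from the plugin's code; only `java.time.Instant` and the `plusMillis` call come from the error message above.

```java
import java.time.Instant;

// Hypothetical stand-in for the schedule handling; names are illustrative only.
final class StartTimeFallback {

    // Mirrors the failing path: plusMillis on a null startTime throws NullPointerException.
    static Instant nextRunUnsafe(Instant startTime, long intervalMillis) {
        return startTime.plusMillis(intervalMillis);
    }

    // Java equivalent of `schedule.startTime ?: Instant.now()`.
    static Instant resolveStartTime(Instant startTime) {
        return startTime != null ? startTime : Instant.now();
    }

    // Same computation as above, but with the null fallback applied first.
    static Instant nextRunSafe(Instant startTime, long intervalMillis) {
        return resolveStartTime(startTime).plusMillis(intervalMillis);
    }

    public static void main(String[] args) {
        long hourMillis = 60L * 60L * 1000L;
        try {
            nextRunUnsafe(null, hourMillis);
        } catch (NullPointerException e) {
            System.out.println("unsafe path failed: " + e.getMessage());
        }
        System.out.println("safe path: " + nextRunSafe(null, hourMillis));
    }
}
```

An explicitly supplied start time is kept as-is; only a missing one is replaced by the current time, which matches the pre-2.12 behaviour described in this thread.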

@sarthakaggarwal97
Contributor

@bowenlan-amzn so looks like schedule.startTime is not a required field. What do you think, should it be a required field?

@ikibo
Contributor

ikibo commented Apr 29, 2024

@mgodwan, thank you for this finding.

Good point, @bowenlan-amzn: the case where start_time is not defined in the request should have been handled.
But the question is how?

According to the official rollup-api-doc, schedule.interval.start_time is a required field (@sarthakaggarwal97 FYI).

@bowenlan-amzn please help me understand which would be the best way to handle this issue:

  • bad-request: reject the request because start_time is not defined (according to the doc)
  • handle the null check as you suggest (in this case, I would suggest changing the doc to state that start_time is set to the current time if not set explicitly in the request, making it 'kind-of' not mandatory)

@bowenlan-amzn the same issue exists for the Transform job (the fix should be pretty much the same as for the rollup). I think we can handle both under this ticket. Please assign this issue to me.

@bowenlan-amzn
Member

bowenlan-amzn commented Apr 29, 2024

@ikibo Thanks!
The goal here is to not introduce a breaking change. I think the documentation is wrong: start_time is clearly not a required field. As the example in this issue shows, with a schedule like this

"schedule": {
            "interval": {
                "period": 60,
                "unit": "Minutes"
            }
        },

a rollup could be created before, and the start time defaulted to the current time.
So please go with the second path:

handling null check as U suggest (in this case, I would suggest changing the doc to determine that start-time is set to current time if not set explicitly in the request, making it 'kind-of' not mandatory)

also link the transform change #1040
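
Since the null only occurs when the request omits start_time, a client on an affected version could presumably avoid the exception by sending start_time explicitly. The Java sketch below builds such a schedule fragment. It is untested against a live cluster, the helper name is hypothetical, and it assumes start_time is expressed in epoch milliseconds, which this thread does not confirm.

```java
import java.time.Instant;
import java.util.Locale;

// Hypothetical client-side helper; not part of any OpenSearch client library.
final class RollupScheduleBody {

    // Builds the "schedule" value with an explicit start_time.
    // Assumption: start_time is sent as epoch milliseconds.
    static String intervalSchedule(Instant startTime, int period, String unit) {
        return String.format(
            Locale.ROOT,
            "{\"interval\":{\"start_time\":%d,\"period\":%d,\"unit\":\"%s\"}}",
            startTime.toEpochMilli(), period, unit);
    }

    public static void main(String[] args) {
        // Prints a fragment that can replace the "schedule" value in the rollup request body.
        System.out.println(intervalSchedule(Instant.now(), 60, "Minutes"));
    }
}
```

The printed fragment takes the place of the `"schedule"` value in the request shown in the reproduction steps; the rest of the rollup definition stays unchanged.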

@louzadod

louzadod commented Jul 1, 2024

The workaround I used was replacing schedule.interval with schedule.cron. But I miss schedule.interval a lot.

@bowenlan-amzn
Member

@louzadod this has been fixed in 2.14

@louzadod

louzadod commented Jul 1, 2024

Hi @bowenlan-amzn. Right after migrating from 2.11 to 2.14, my rollup jobs configured with schedule.interval stopped running. After I replaced schedule.interval with schedule.cron, they started running again.

@bowenlan-amzn
Member

@louzadod it's probably not the same issue. Do you want to report a bug with the error you saw and some reproduction steps, maybe?

@louzadod

louzadod commented Jul 1, 2024

@bowenlan-amzn I'm getting the same error as reported in this bug and I'm running version 2.14.0.

GET /

{
  "name": "logs-corporativos-client-2",
  "cluster_name": "logs-corporativos",
  "cluster_uuid": "rgFOp61cTRKts3oqa4dAwA",
  "version": {
    "distribution": "opensearch",
    "number": "2.14.0",
    "build_type": "tar",
    "build_hash": "aaa555453f4713d652b52436874e11ba258d8f03",
    "build_date": "2024-05-09T18:51:00.973564994Z",
    "build_snapshot": false,
    "lucene_version": "9.10.0",
    "minimum_wire_compatibility_version": "7.10.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "The OpenSearch Project: https://opensearch.org/"
}

Here is my rollup definition:

{
    "rollup": {
        "rollup_id": "vulner-history-job",
        "enabled": true,
        "schedule": {
            "interval": {
                "period": 1,
                "unit": "Minutes"
            }
        },
        "enabled_time": null,
        "description": "Rollup job para sumarizar diariamente as vulnerabilidades",
        "schema_version": 16,
        "source_index": "vulnerabilities",
        "target_index": "vulner-history",
        "page_size": 1000,
        "delay": 0,
        "continuous": false,
        "dimensions": [
            {
                "date_histogram": {
                    "fixed_interval": "1d",
                    "source_field": "timestamp",
                    "target_field": "timestamp",
                    "timezone": "America/Sao_Paulo"
                }
            },
            {
                "terms": {
                    "source_field": "severity",
                    "target_field": "severity"
                }
            },
            {
                "terms": {
                    "source_field": "stack_prefix",
                    "target_field": "stack_prefix"
                }
            },
            {
                "terms": {
                    "source_field": "stack",
                    "target_field": "stack"
                }
            },
            {
                "terms": {
                    "source_field": "service",
                    "target_field": "service"
                }
            }
        ],
        "metrics": [
            {
                "source_field": "event_count",
                "metrics": [
                    {
                        "sum": {}
                    }
                ]
            }
        ]
    }
}

After invoking the API for creating the rollup, here is the message:

{"error":{"root_cause":[{"type":"null_pointer_exception","reason":"Cannot invoke \"java.time.Instant.plusMillis(long)\" because \"startTime\" is null"}],"type":"null_pointer_exception","reason":"Cannot invoke \"java.time.Instant.plusMillis(long)\" because \"startTime\" is null"},"status":500}

@bowenlan-amzn
Member

@louzadod just did a quick check. 2.14 didn't pick up this fix, it's in 2.15

bowenlan-amzn added the v2.15.0 label and removed the v2.14.0 label Jul 2, 2024
@louzadod

louzadod commented Jul 2, 2024

ok. thanks for the confirmation, @bowenlan-amzn .

6 participants