[BUG] Lock acquire/release for snapshot jobs emit warning and error logs #1199

Open
spapadop opened this issue Jul 3, 2024 · 1 comment
Labels
bug Something isn't working

spapadop commented Jul 3, 2024

Describe the bug

I have a simple snapshot management job running daily:

{
	"name": "daily-monit-qa-backup_v2",
	"description": "Daily snapshot policy",
	"schema_version": 19,
	"creation": {
	  "schedule": {
	    "cron": {
	      "expression": "0 12 * * *",
	      "timezone": "Europe/Zurich"
	    }
	  }
	},
	"deletion": {
	  "schedule": {
	    "cron": {
	      "expression": "0 12 * * *",
	      "timezone": "Europe/Zurich"
	    }
	  },
	  "condition": {
	    "max_age": "3d",
	    "min_count": 1
	  }
	},
	"snapshot_config": {
	  "indices": "monit_qa*",
	  "ignore_unavailable": true,
	  "repository": "s3-monitqa1-bucket",
	  "partial": true
	},
	"schedule": {
	  "interval": {
	    "start_time": 1718098008729,
	    "period": 1,
	    "unit": "Minutes"
	  }
	},
	"enabled": true,
	"last_updated_time": 1718709061300,
	"enabled_time": 1718098008729
}

Every day it produces around 70 warning logs like:

Cannot acquire lock for snapshot management job daily-monit-qa-backup_v2

followed by 2 error logs:

Could not release lock [.opendistro-ism-config-daily-monit-qa-backup_v2-sm-policy] for daily-monit-qa-backup_v2-sm-policy.

These two error logs trigger two failure notifications if a notification channel for failures is configured.

However, the snapshot itself ends with a "SUCCESS" status, so these logs appear insignificant, or at least not worth investigating further. I am not sure what the right fix is. Perhaps these logs should be downgraded to "INFO" or "DEBUG"; they should definitely not be "ERROR", because that makes the failure notification functionality unreliable.
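
To confirm the daily counts described above, the lock messages can be tallied from an exported log. This is a minimal Python sketch, assuming the log is available as plain text lines; the function name `count_lock_messages` and the counter keys are hypothetical, and only the two message fragments come from the log lines quoted in this issue.

```python
from collections import Counter

# Message fragments copied from the log lines quoted in this issue.
WARN_FRAGMENT = "Cannot acquire lock for snapshot management job"
ERROR_FRAGMENT = "Could not release lock"


def count_lock_messages(lines, job_name):
    """Count lock-related log lines that mention the given job name.

    `lines` is any iterable of strings (for example, an open log file).
    The keys "acquire_warning" and "release_error" are illustrative names.
    """
    counts = Counter()
    for line in lines:
        if job_name not in line:
            continue  # skip lines about other jobs
        if WARN_FRAGMENT in line:
            counts["acquire_warning"] += 1
        elif ERROR_FRAGMENT in line:
            counts["release_error"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Cannot acquire lock for snapshot management job daily-monit-qa-backup_v2",
        "Could not release lock [.opendistro-ism-config-daily-monit-qa-backup_v2-sm-policy] "
        "for daily-monit-qa-backup_v2-sm-policy.",
    ]
    print(dict(count_lock_messages(sample, "daily-monit-qa-backup_v2")))
```

Matching on plain substrings rather than on a full log pattern keeps the sketch independent of the exact log layout, which is not shown in this report.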

Related component

Storage:Snapshots

To Reproduce

  1. Create a daily snapshot policy targeting an S3 bucket, like the one specified above.
  2. Observe the WARN/ERROR logs emitted when the snapshot is created.

Expected behavior

If the snapshot is successful, it should produce no ERROR logs.
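
The expectation can be expressed as a small check: if every snapshot is reported as successful, any lock-release error is a false alarm. This is a rough Python sketch, assuming the snapshot listing has already been fetched and parsed into a dict with a top-level `snapshots` list whose entries carry `snapshot` and `state` fields. The helper names and the sample snapshot names are hypothetical.

```python
def snapshots_by_state(response):
    """Group snapshot names by their reported state.

    `response` is assumed to look like
    {"snapshots": [{"snapshot": "<name>", "state": "SUCCESS"}, ...]}.
    """
    grouped = {}
    for snap in response.get("snapshots", []):
        state = snap.get("state", "UNKNOWN")
        grouped.setdefault(state, []).append(snap.get("snapshot"))
    return grouped


def is_false_alarm(response, release_error_count):
    """True when ERROR logs were emitted although every snapshot succeeded."""
    grouped = snapshots_by_state(response)
    all_succeeded = bool(grouped) and set(grouped) == {"SUCCESS"}
    return all_succeeded and release_error_count > 0


if __name__ == "__main__":
    # Hypothetical response body with made-up snapshot names.
    sample = {"snapshots": [{"snapshot": "example-snapshot-1", "state": "SUCCESS"}]}
    print(is_false_alarm(sample, release_error_count=2))
```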

Additional Details

Plugins
All default ones + repository-s3

Host/Environment (please complete the following information):

  • OS: AlmaLinux
  • Version 9.4

Additional context
Tested on OpenSearch v2.11.1

spapadop added the bug and untriaged labels Jul 3, 2024

dblock commented Jul 22, 2024

Thanks for opening this @spapadop.

[Catch All Triage w/ 1, 2, 3]

dblock removed the untriaged label Jul 22, 2024