Logstash fails to acquire lock on PQ lockfile (reentrant?) #10572
This might be related to #10715, which I will investigate.
Experiencing this problem with Logstash 7.5.1.
Tried to stop the Logstash process, delete the .lock file manually, and start Logstash again, but it did not help; still seeing the error message.
My queue/beats pipeline failed for whatever reason roughly 11 hours earlier and started queuing everything to disk. Elasticsearch cluster health is GREEN... still looking into that pipeline issue.
@Aqualie this is likely not a PQ issue but rather the result of a pipeline exception/error that does not correctly close the PQ (and thus does not release the lock); upon restarting, the failed pipeline then complains about failing to acquire the lock. Can you check the logs prior to this problem to see if you can trace any errors which might help explain why the lock was not released? (Look for pipeline reload failures, pipeline errors/exceptions, or any warning/error logs for that matter.) I have also seen cases where this problem occurred because of permission problems on the PQ data files. Can you look for IO/File/Permission errors? Also double-check that the file permissions in the PQ data dir are ok.
There is a related issue specific to the permission problem: #10715
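To make the permissions check concrete, here is a minimal standalone probe. It is a sketch, not Logstash's own code; the queue path is an assumption and should be replaced with your actual `path.queue` directory:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class QueueDirProbe {
    public static void main(String[] args) {
        // Assumed default path -- substitute your own path.queue directory.
        Path queueDir = Path.of("/usr/share/logstash/data/queue");
        Path lockFile = queueDir.resolve(".lock");

        System.out.println("dir exists:   " + Files.isDirectory(queueDir));
        System.out.println("dir readable: " + Files.isReadable(queueDir));
        System.out.println("dir writable: " + Files.isWritable(queueDir));

        // Try to take an exclusive OS-level lock on the lockfile, the same
        // kind of lock the PQ needs. A permissions problem shows up here as
        // an AccessDeniedException rather than as a held lock.
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock();
            if (lock == null) {
                System.out.println("lockfile is held by another process");
            } else {
                System.out.println("lock acquired; releasing again");
                lock.release();
            }
        } catch (IOException e) {
            System.out.println("could not open/lock " + lockFile + ": " + e);
        }
    }
}
```

If the probe reports the directory as non-writable or fails with an AccessDeniedException, fixing ownership/permissions on the data dir is the direction suggested in #10715.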
Facing this error on 7.6.2.
It is observed when running a configuration check while the Logstash process is already running.
Does somebody have a solution for this error? I am getting the same one. I am running dockerized Logstash, version 7.4.2.
There are two ways that the PQ can fail to acquire a lock: either another process holds the OS-level lock on the queue directory's lockfile, or the current JVM process already holds it (for example, because a pipeline failed or was reloaded without releasing its lock).
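For the in-process case, it helps to know that `java.nio` file locks are not reentrant within a JVM: a second acquisition attempt does not block or return null, it throws. A minimal standalone demo (not Logstash code; `demo.lock` stands in for the queue's `.lock` file):

```java
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InJvmLockDemo {
    public static void main(String[] args) throws Exception {
        Path lockFile = Path.of("demo.lock"); // stand-in for the PQ lockfile

        FileChannel first = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock held = first.tryLock();
        System.out.println("first acquisition ok: " + (held != null));

        // Same JVM, second channel, same file: this throws instead of
        // waiting, which is why a pipeline that died without releasing
        // its lock makes every in-process restart fail immediately.
        try (FileChannel second = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            second.tryLock();
        } catch (OverlappingFileLockException e) {
            System.out.println("second acquisition in same JVM threw: " + e);
        } finally {
            held.release();
            first.close();
        }
    }
}
```

This matches the original report below, where the error message indicates the lock is already held in the current JVM process.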
@yaauie Thanks for your reply. To elaborate further: I am running dockerized Logstash to ingest logs from different sources, parse/filter them, and then store them in Elasticsearch. The Docker container runs in a Linux virtual machine. Recently all the Logstash pipelines have been continuously failing with this error. "usr/share/logstash/data/queue" is not a directory present on the virtual machine; it only exists inside the Docker container, so we cannot change the permissions or remove files from this directory from the host. The Docker image layers look like this:
I have deleted all the Docker images, containers, and volumes using `sudo docker system prune --all` and then pulled the Logstash image again. After rebuilding the images and starting the containers, the lock error still persists. I read in a thread that this might happen due to an orphaned process running on the virtual machine, so I rebooted the virtual machine to get rid of any orphaned processes, but the error still persists. Can somebody explain what is going wrong here? We have been trying to solve this error for the last 10 days without success.
More details: I am running multiple pipelines, which are defined in pipelines.yml, and the queue type is set to "persisted".
We observed a LockException; please find more details in the topic below.
@yaauie For some time now we have been observing that, roughly once a week, our Logstash processing comes to a complete stop and we have to restart our Logstash instances. We have tried to make sense of the logs, but so far we have not found the root cause of the issue. That said, in some cases we observed the following log entries around the time the blocking happened:
I looked into these cases a little more and found the following log entries, which seem to mark the start of the locking issue:
It did not always affect the same pipeline, but the common denominator is that there was some sort of issue while initially creating the pipeline; in my case, the reported error is always the same one.
As of now we use Logstash 8.12.1 on Linux (rpm installation, no containers). That said, we already observed the issue with Logstash 8.7.x and performed the update to 8.12.1 in the hope that it would resolve the issue; unfortunately, it did not.
Our Logstash configuration makes use of multiple pipelines as well as pipeline-to-pipeline communication. In total there are 14 pipelines, and most of them are part of a bigger processing system where we use the distributor as well as the collector pattern.
I hope the information provided helps in debugging this issue.
I made an additional observation regarding the log entries I mentioned in the previous post: they appear while Logstash is in the above-mentioned error state, where a pipeline cannot be created because the lock is not available. I created the cronjob to get better insight into what is happening before Logstash stops processing events.
On Logstash versions 6.2.4 and 6.5.4, I have seen reports of the Logstash Persistent Queue failing to acquire a lock on a lockfile, with the message indicating that the lock is already held in the current JVM process.
I do not presently have sufficient steps for consistent reproduction, but am working to chase those down.
In at least one case, the user was using multiple pipelines via pipelines.yml, the issue was occurring at startup, and the pipelines did not have any pipeline-specific queue settings.
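Hypothetically, the pattern that would produce exactly this symptom is a queue-open path that takes the lock and then throws before the lock can be released later. A generic sketch of the needed discipline (not Logstash's actual FileLockFactory; `openQueue` is a made-up helper):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GuardedQueueOpen {
    // Hypothetical helper: take the lock, then do the rest of the queue
    // setup; if that setup throws, release the lock on the error path so
    // a retried pipeline start in the same JVM can acquire it again.
    static FileLock openQueue(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = ch.tryLock();
        if (lock == null) {
            ch.close();
            throw new IOException("queue locked by another process");
        }
        try {
            // ... open page files, replay checkpoints, etc. (may throw) ...
            return lock; // channel must stay open while the lock is held
        } catch (RuntimeException e) {
            lock.release(); // without this, the lock stays held by the JVM
            ch.close();     // and later restarts of the pipeline fail
            throw e;
        }
    }
}
```

Whether or not the real defect sits in this exact spot, the reports above (a pipeline error followed by lock failures on every in-process restart) are consistent with a missing release on some error path.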