-
Notifications
You must be signed in to change notification settings - Fork 473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unhandled Exception Error when server goes away #338
Comments
If you shut RabbitMQ down in such a way that it can close its end of client TCP connections first, the clients will eventually get an Some clients provide automatic connection recovery but I don't think this one does. So it's up to application developer to handle connection failures. |
A RabbitMQ server unexpectedly shutting down SHOULD NOT cause all its clients to crash! The socket should implement an 'error' handler and deal with this. NOT take the whole application down.
|
@edgework there's more to an AMQP 0-9-1 connection than just a socket. I cannot comment on what may be taking your "entire application" down but if a library can take your application down, perhaps the application should be more defensive. Clients that offer connection recovery have to worry about way more than reopening their TCP connection. |
Unhandled Exception will take anything down. Your library is not handling an exception in the socket. Its that simple. In any case https://www.npmjs.com/package/amqp handles this failure correctly so I'll just use that. |
@edgework I think it's reasonable for amqplib to emit an error event. It doesn't have sufficient context to make a universally acceptable decision for a fault of this severity, and as such I consider it good design for a low level library to bring this to your attention in the bluntest way possible. The alternative, which is to enforce a set of opinions onto users would make the amqplib less flexible. e.g.
Unfortunately there isn't a "correct" answer. My approach was to write a library that sat on top of amqplib and did some of these things in a highly opinionated way which suited my exact situation. I'm very thankful that amqplib was flexible enough to enable me to do this. Anyway good luck with amqp |
Hi. @michaelklishin I too am running into this problem. I deal with it somewhat by grabbing the uncaught exception and if its ECONNREST I'm guessing that its ampq and closing all the ampq connections and then go into a loop retrying and that works until about a minute later when the ampq library throws another uncaught exception
This one I can't do anything about. Ignoring it will HARD crash the application. Given that I its a not trivial for me to restart my entire app because a connection to one server closed unexpectedly. How would you suggest I deal with this? |
@cressie176 @michaelklishin Anyone? Is this project active? |
Hi @rick-rheo. Yes this project is still active. WRT your issue I'm probably not going to be much use. I'm just a user of amqplib and don't know the internals very well. I suggest putting some debug in the source here, here, here and here. Maybe you'll see your dead connections trying to heartbeat for some reason. You could also set the heartbeat option to 0 to disable heartbeats on both the client and server. I'm not sure what undesirable side effects that will have. |
Thought I'd setup a test case using a rabbitmq container. @rick-rheo, @mateodelnorte #334 does this correlate to what you're seeing? The biggest problem I see with the following is the uncaughtException if the consumer code throws an asynchronous error, but I'm not really sure what amqplib could do about that. var amqp = require('./channel_api')
var Promise = require('bluebird')
process.on('error', err => {
console.log('#338: process error', err)
})
process.on('uncaughtException', function (err) {
console.log('#338: process uncaught exception', err)
})
process.on('unhandledRejection', (reason, p) => {
console.log('#338: process unhandled rejection', p, 'reason', reason)
})
amqp.connect('amqp://localhost').then(conn => {
console.log('#338: Connected')
conn.on('error', (err) => {
console.log('#338: TODO Handle connection error', err)
})
return conn.createChannel().then(ch => {
return new Promise(resolve => {
ch.assertQueue('test_q')
.then(qok => qok.queue)
.then(queue => {
console.log('#338: Consuming')
return ch.consume(queue, (msg) => {
console.log('#338: Received msg')
if (msg == null) return
if (msg.content.toString() == 'crash now') throw new Error('Crashing Now!')
if (msg.content.toString() == 'crash later') {
setImmediate(() => {
throw new Error('Crashing Later!')
})
}
}, { noAck: true }).then(function(_consumeOk) {
console.log(' [*] Waiting for messages. To exit press CTRL+C');
})
})
})
}).finally(() => {
console.log('#338: Closing connection')
conn.close()
})
}).catch(err => {
console.log('#338: Caught Error from connect', err)
})
|
It looks like amqplib doesn't try to catch errors from the socket after the opening handshake is complete: https://github.com/squaremo/amqp.node/blob/master/lib/connect.js#L186. If RabbitMQ is killed and drops connections on the floor, that'll mean amqplib does not respond to those errors, so it'll keep heartbeating. It's possible that just re-emitting the error on the connection (or doing some variety of connection shutdown-with-error) will fix this. It may be a short while before I get to try this. |
@squaremo in case you are looking for naming suggestions, Java client refers to such connection errors as "connection shutdown." Server-sent |
@squaremo Any chance you got to work on this fix? This still a constant problem in otherwise stable codebase. |
That's not true; it passes the socket to the Connection constructor, and that registers a handler for If I run @cressie176's test program above, and
I.e., the handler for the connection |
@squaremo, this is what I get when I do a kill -9 to rabbit server.
In my code I have
I was hoping that the error would be caught and we will be able to write some logic to reconnect after a timeout. However this does not seem to work. As soon as I get this error, my app crashes. I want to prevent app crash and retry after a minute. Also I have 2 more instances of this client running, and they dont seem to get this error. It could be that, before the error is delivered to other client, the app crashes. My thought is, if you modified @cressie176 code and added one more connection obj say conn1 and conn2 and then do a kill -9 you might see this issue. Just my guess. A sample code that can be used is listed in another bug #445 |
I adapted the code above as https://gist.github.com/squaremo/76246899d5b68302eaeaf76ac917d33c#file-338-js This makes two changes:
This is what the output looks like when I run it, kill RabbitMQ, then restart RabbitMQ:
Is there an element of your suggestion that I've missed, @blazerguns? Perhaps it's just the version of NodeJS being used. Or the way RabbitMQ disconnects; I'm killing it with -9, but actual connection drops rather than the process terminating could provoke a different behaviour. Doing a bit of archaeology, I see that the version released at the time of the original report did not have the socket error handling I remarked on above, so perhaps there's nothing mysterious here after all: it didn't work then, it does work now, and #445 is a different problem. |
Ugh, no, I'm wrong again (looking in the wrong file) -- it's been there for ages. Scratch that idea. |
Let me work on a sample code for you to test. I am running on node 10.5.0. May be that? |
also, RabbitMQ 3.5.7, Erlang 18.3 |
Hi, were having the same problem here, using version 0.5.2 of amqplib, the error received : it is important to notice that this error is caught from 'uncaughtException' event, and not the error event from amqplib's connection or channel, which means we cannot implement a retry logic. Is there a possible fix or workaround? |
Did this issue ever get a solution, i am facing it as of 8th March 2019?
|
For me the problem was socket availability of thw node process I was running, on an azure cloud virtual machine. It wasnt a problem with amqpnode itself. |
@Lawrence-Windsor Are you adding an error handler to the connection? If so, would you provide a (small as possible) test case that fails reliably, please. |
Seems the problem is the version of node. With node 10 works and node 8 dont in my tests |
Closing, as the reports are quite old (most recent 2019) and none provided a repeatable test case. amqplib no longer supports node 8, so if that was the issue as suggested by @luissajunior (thanks), then the solution is to upgrade. |
Opening this again as i'm having the same issue. I'm handling private async connect(): Promise<void> {
if (this.connected) {
return;
}
await this.connectionLock.runExclusive(async () => {
if (this.connected) {
return;
}
await pRetry(
async () => {
logger.info('Attempting rabbitMQ connection');
this.connection = await client.connect(this.connectionSettings.getConnectionString());
},
{
onFailedAttempt: (error) => {
logger.error(`RabbitMq connection failed. Attempt ${error.attemptNumber}. Error: ${error.message}`);
if (error.retriesLeft === 0) {
this.connected = false;
throw Error(`Failed to establish rabbitMq connection after ${this.maxConnectionRetry} attempts`);
}
},
factor: 2,
minTimeout: 1000,
maxTimeout: 5000,
retries: this.maxConnectionRetry,
},
);
logger.info('RabbitMq connection successful');
this.setEventListeners();
this.eventEmitter.emit('connected');
this.connected = true;
});
}
private setEventListeners(): void {
this.connection.on('close', async (message) => {
logger.warn(`RabbitMq connection lost: ${message}`);
this.eventEmitter.emit('disconnected');
await this.handleErrorEvents();
});
this.connection.on('error', async (error) => {
logger.error(`RabbitMq connection error: ${error}`);
this.eventEmitter.emit('error');
await this.handleErrorEvents();
});
this.connection.on('blocked', async (reason) => {
logger.warn(`RabbitMq connection blocked: ${reason}`);
this.eventEmitter.emit('blocked');
await this.disconnect();
await this.handleErrorEvents();
});
}
private async handleErrorEvents(): Promise<void> {
this.connected = false;
await this.connect();
}
async disconnect(): Promise<void> {
await this.connection.close();
this.eventEmitter.emit('disconnected');
this.connected = false;
} Exception looks something like this:
node -v -> v16.15.0 |
cc: #690 (comment)
|
Just an FYI for those looking to solve all of their connection recovery problems "by using that other lib". That package is abandonware and is known to cause reconnection storms (and leaks) so severe that it is on the black list of client libraries are at least two RabbitMQ-as-a-Service providers. |
If the rabbitMq server goes down the client will crash with an unhandled exception ECONNRESET
The text was updated successfully, but these errors were encountered: