Failure undetected, resulting in 6 hour timeout #86

Open
melvyn-apryl opened this issue Jun 29, 2022 · 3 comments

Comments

@melvyn-apryl

melvyn-apryl commented Jun 29, 2022

This is v16, but looking at the code, this can still happen with v20. The timeout is only checked if the status is Ready and the version labels match; the other branches have no timeout. In this case the status is Ready, but either the version labels do not match or, more likely, the deployment-failed message was not caught. Log from the action:

Deployment started, "wait_for_deployment" was true...
16:16:40 INFO: Environment update is starting.
16:17:20 INFO: Deploying new version to instance(s).
16:17:34 INFO: Still updating, status is "Updating", health is "Green", health status is "Info"
16:17:38 INFO: Environment health has transitioned from Ok to Info. Application update in progress (running for 13 seconds).
16:17:43 INFO: Instance deployment successfully generated a 'Procfile'.
16:18:01 ERROR: Instance deployment failed. For details, see 'eb-engine.log'.
16:18:05 ERROR: [Instance: i-02e54e6924cc0c92a] Command failed on instance. Return code: 1 Output: Engine execution has encountered an error..
16:18:05 INFO: Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
16:18:05 ERROR: Unsuccessful command execution on instance id(s) 'i-02e54e6924cc0c92a'. Aborting the operation.
16:18:37 INFO: Still updating, status is "Updating", health is "Red", health status is "Degraded"
16:18:38 WARN: Environment health has transitioned from Info to Degraded. Command failed on all instances. Incorrect application version found on all instances. Expected version "staging-533c634" (deployment 277). Application update is aborting. 1 out of 1 instance completed (running for 2 minutes). Impaired services on all instances.
16:19:40 INFO: Still updating, status is "Ready", health is "Red", health status is "Degraded"
16:20:44 INFO: Still updating, status is "Ready", health is "Red", health status is "Degraded"
16:21:47 INFO: Still updating, status is "Ready", health is "Red", health status is "Degraded"
... # etc

And the Elastic Beanstalk environment events:

2022-06-28 18:18:05 UTC+0200	ERROR Failed to deploy application.
2022-06-28 18:18:05 UTC+0200	ERROR Unsuccessful command execution on instance id(s) 'i-02e54e6924cc0c92a'. Aborting the operation.
2022-06-28 18:18:05 UTC+0200	INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2022-06-28 18:18:05 UTC+0200	ERROR [Instance: i-02e54e6924cc0c92a] Command failed on instance. Return code: 1 Output: Engine execution has encountered an error..

So the last of the four events logged in the same second was missed, and the loop then never ends. The setting wait_for_environment_recovery: 120 is set in the job:

 Version description: 
          AWS Region: eu-central-1
                File: Deploy.zip
      AWS Access Key: 20 characters long, starts with H
      AWS Secret Key: 40 characters long, starts with D
 Wait for deployment: true
  Recovery wait time: 120
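
One way to close the gap described above is to check a deadline on every path of the polling loop, not only on the "Ready and version labels match" path. The following is a minimal sketch under assumptions, not the action's actual code: `nextStep`, the `env` snapshot shape, and the `sawFailureEvent` flag are hypothetical names, and the running version label `previous-version` is a placeholder. When the deadline clock starts (at deploy start or when the status first becomes Ready) is a design choice left open here.

```javascript
// Hypothetical sketch, not the action's actual code. All names are illustrative.
// Decide what a polling loop should do next, given the latest snapshot.
function nextStep(env, expectedVersion, nowMs, deadlineMs) {
  // A detected failure event ends the wait immediately.
  if (env.sawFailureEvent) return 'fail';

  const deployed = env.status === 'Ready' && env.versionLabel === expectedVersion;
  if (deployed && env.health === 'Green') return 'success';

  // The deadline is checked on every remaining path, including
  // "Ready but wrong version", the state assumed from the log above.
  if (nowMs >= deadlineMs) return 'timeout';

  return 'wait';
}

// Example: status Ready, health Red, an old version still running.
const stuck = {
  status: 'Ready',
  health: 'Red',
  versionLabel: 'previous-version', // placeholder, not from the log
  sawFailureEvent: false,
};
const start = 0;
const deadline = start + 120 * 1000; // e.g. a 120 s wait

console.log(nextStep(stuck, 'staging-533c634', start + 60 * 1000, deadline));  // wait
console.log(nextStep(stuck, 'staging-533c634', start + 121 * 1000, deadline)); // timeout
```

With a check like this, a missed failure event would cost at most the configured wait instead of running until the 6 hour job limit.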
@dantehemerson

I suggest you check /var/log/eb-engine.log on the associated EC2 instance to see the error in more detail. You probably included a change that is breaking your application.

@melvyn-apryl
Author

I know the cause. But the problem is that this message was sent by AWS:
2022-06-28 18:18:05 UTC+0200 ERROR Failed to deploy application.

But it was not processed at this line:
if (ev.Message.match(/Failed to deploy application/)) {

And so the action never terminates, which means the 120-second setting meant to prevent this kind of thing cannot be trusted.
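
Why the event was missed is not confirmed in this thread. One plausible cause, stated here as an assumption, is that events are fetched with a timestamp cursor, so an event sharing a second with ones already seen is skipped. A sketch of an alternative: re-request an overlapping window and de-duplicate by a composite key. `findNewEvents` and `deploymentFailed` are hypothetical helpers; the `EventDate`, `Severity`, and `Message` fields follow the Elastic Beanstalk event shape.

```javascript
// Hypothetical sketch, not the action's actual code.
// Return only events not seen in earlier polls, remembering them in seenKeys.
function findNewEvents(events, seenKeys) {
  const fresh = [];
  for (const ev of events) {
    // Composite key: events in the same second stay distinct
    // as long as severity or message differs.
    const key = `${ev.EventDate}|${ev.Severity}|${ev.Message}`;
    if (!seenKeys.has(key)) {
      seenKeys.add(key);
      fresh.push(ev);
    }
  }
  return fresh;
}

// Same pattern as the line quoted above, applied to a batch of events.
function deploymentFailed(events) {
  return events.some((ev) => /Failed to deploy application/.test(ev.Message));
}

// Example: the fourth event of the same second only shows up on the second poll.
const seen = new Set();
const sameSecond = '2022-06-28T16:18:05Z';
const firstPoll = [
  { EventDate: sameSecond, Severity: 'ERROR', Message: 'Command failed on instance.' },
  { EventDate: sameSecond, Severity: 'INFO', Message: 'Command execution completed on all instances.' },
  { EventDate: sameSecond, Severity: 'ERROR', Message: 'Unsuccessful command execution on instance id(s).' },
];
const secondPoll = [
  ...firstPoll,
  { EventDate: sameSecond, Severity: 'ERROR', Message: 'Failed to deploy application.' },
];

console.log(deploymentFailed(findNewEvents(firstPoll, seen)));  // false
console.log(deploymentFailed(findNewEvents(secondPoll, seen))); // true
```

The trade-off is that two genuinely identical events in the same second collapse into one, which does not matter for detecting a failure message.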

@tomgrowflow

tomgrowflow commented Nov 15, 2022

This happened to me. I set a GitHub Actions job timeout to prevent it; not great, though.

https://stackoverflow.com/a/59076067/1869299

my-job:
  runs-on: ubuntu-latest
  timeout-minutes: 30
