Failure after TLS issue #168

Open
Ottergoose opened this issue Feb 4, 2024 · 12 comments

Comments

@Ottergoose

Currently running ADS-B Feeder Image v1.1.9-beta.1(beta) with containers from 2024-01-25T18:40:21-06:00, on a Raspberry Pi 3 Model B Plus Rev 1.3 (adsb-im-raspberrypi64-v1.1.4.img); I assume it has the latest version of this software.

I noticed FlightAware said I wasn't feeding data for a few hours, and checked my logs; it looks like a TLS issue tripped things up, and it was unable to recover until I rebooted. After this happened, no data was going to FlightAware, and my CPU usage bumped way up.

07:29:43 AM [piaware] 2024/02/04 07:29:43 TLS alert (read): bad record mac
07:29:43 AM [piaware] 2024/02/04 07:29:43 TLS error: 0
07:29:43 AM [piaware] 2024/02/04 07:29:43 Lost connection to adept server at piaware.flightaware.com/1200: error reading "sock559fd35b60": software caused connection abort

Hope this helps, thank you for your work!

@the-jeffski

I am seeing the same - have enabled the autoheal container to trigger restarts for now

@wiedehopf
Contributor

flightaware/piaware#87

OK, so ... I'm not sure, but possibly we could just make a watchdog for this container that watches the piaware process.
Apparently there is some bug they can't fix, so they added a watchdog ...
The container doesn't run systemd, so the watchdog would just have to check the piaware log output, I suppose.

How regular is the log output?

@the-jeffski

I'll aim to grab a log output - just upgraded the container a couple of hours back so the current log is gone.
I'm running autoheal at the moment to restart the container, as that is the only fix, but it occasionally doesn't work and then I find I'm offline for 6 hrs. The autoheal log gives an idea of how often:

2024-06-11 08:50:15+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 08:50:19+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
2024-06-11 11:50:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 11:50:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container last output: [1] No connection to Flightaware, NOT OK.
54227 dump1090 messages sent in past hour, OK.
Webserver listening on port 80, OK.
Webserver listening on port 8080, OK.
2024-06-11 11:50:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 11:50:28+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
2024-06-11 13:20:35+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 13:20:35+0100 [WARNING] [piaware (8fb95a0ab823)] Container last output: [1] No connection to Flightaware, NOT OK.
57745 dump1090 messages sent in past hour, OK.
Webserver listening on port 80, OK.
Webserver listening on port 8080, OK.
2024-06-11 13:20:35+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 13:20:38+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
2024-06-11 14:20:45+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 14:20:45+0100 [WARNING] [piaware (8fb95a0ab823)] Container last output: [1] No connection to Flightaware, NOT OK.
28553 dump1090 messages sent in past hour, OK.
Webserver listening on port 80, OK.
Webserver listening on port 8080, OK.
2024-06-11 14:20:45+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 14:20:48+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
2024-06-11 15:50:55+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 15:50:55+0100 [WARNING] [piaware (8fb95a0ab823)] Container last output: [1] No connection to Flightaware, NOT OK.
56664 dump1090 messages sent in past hour, OK.
Webserver listening on port 80, OK.
Webserver listening on port 8080, OK.
2024-06-11 15:50:55+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 15:50:58+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
2024-06-11 17:11:05+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 17:11:05+0100 [WARNING] [piaware (8fb95a0ab823)] Container last output: [1] No connection to Flightaware, NOT OK.
50810 dump1090 messages sent in past hour, OK.
Webserver listening on port 80, OK.
Webserver listening on port 8080, OK.
2024-06-11 17:11:05+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 17:11:09+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
2024-06-11 17:51:15+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 17:51:15+0100 [WARNING] [piaware (8fb95a0ab823)] Container last output: [1] No connection to Flightaware, NOT OK.
9632 dump1090 messages sent in past hour, OK.
Webserver listening on port 80, OK.
Webserver listening on port 8080, OK.
2024-06-11 17:51:15+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 17:51:18+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
2024-06-11 19:11:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 19:11:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container last output: [1] No connection to Flightaware, NOT OK.
52173 dump1090 messages sent in past hour, OK.
Webserver listening on port 80, OK.
Webserver listening on port 8080, OK.
2024-06-11 19:11:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container restarting with 10s timeout
2024-06-11 19:11:34+0100 [ INFO] [piaware (8fb95a0ab823)] Container restart was successful
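
To put rough numbers on "how often", the gaps between the "Container is unhealthy" entries in the autoheal log above can be computed directly. This is only a sketch: it assumes the timestamp format shown in that log, and the name `unhealthy_gaps_seconds` is made up for illustration.

```python
from datetime import datetime

# The "unhealthy" entries copied from the autoheal log above.
AUTOHEAL_LOG = """\
2024-06-11 11:50:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 13:20:35+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 14:20:45+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 15:50:55+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 17:11:05+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 17:51:15+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
2024-06-11 19:11:25+0100 [WARNING] [piaware (8fb95a0ab823)] Container is unhealthy with 3 failures
"""


def unhealthy_gaps_seconds(log_text):
    """Return the gaps (whole seconds) between consecutive 'unhealthy' entries."""
    stamps = []
    for line in log_text.splitlines():
        if "Container is unhealthy" not in line:
            continue
        # The timestamp is the first 24 characters, e.g. 2024-06-11 11:50:25+0100
        stamps.append(datetime.strptime(line[:24], "%Y-%m-%d %H:%M:%S%z"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


gaps = unhealthy_gaps_seconds(AUTOHEAL_LOG)
for gap in gaps:
    print(f"{gap // 60} min {gap % 60} s")
```

For this excerpt the gaps come out between roughly 40 and 90 minutes. The first restart at 08:50 is not counted because its matching "unhealthy" line is not in the paste.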

@the-jeffski

Grabbed the error in the log from today:

[piaware] 2024/06/12 16:59:00 piaware has successfully sent several msgs to FlightAware!
[piaware] 2024/06/12 17:03:34 4866 msgs recv'd from dump1090 (4866 in last 5m); 4866 msgs sent to FlightAware
[piaware] 2024/06/12 17:05:33 TLS alert (read): bad record mac
[piaware] 2024/06/12 17:05:33 TLS error: 0
[piaware] 2024/06/12 17:05:33 Lost connection to adept server at piaware.flightaware.com/1200: error reading "sock55564e94af50": software caused connection abort

You only get the one error output, then it dies. Restarting the container starts a fresh log.

@wiedehopf
Contributor

wiedehopf commented Oct 20, 2024

So using the log is suboptimal; possibly this file is an option for a watchdog:

docker exec -it piaware watch grep time /run/piaware/status.json

Can you check if this file changes after the error happens?
(uptime usually increases every 5 seconds)

Edit: According to the patch comments on the FA repo, the whole process hangs.
So I'll just assume such a watchdog will work and I'll add one.

Edit2: watchdog added, image building now.
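
For readers wondering what such a check could look like, here is a minimal sketch of a staleness test on the status file. It is not the watchdog that was added to the image: it assumes piaware rewrites `/run/piaware/status.json` regularly so that the file's modification time can be used, and the 30-second threshold, the function names and the `pid` argument are all placeholders chosen for illustration.

```python
import os
import signal
import time

STATUS_FILE = "/run/piaware/status.json"  # path mentioned in the comment above
MAX_AGE_S = 30  # assumed threshold, not taken from the real watchdog


def is_stale(path, max_age_s, now=None):
    """True if the file is missing or was last modified more than max_age_s ago."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # Assumption: a missing status file counts as "not updating".
        return True
    return now - mtime > max_age_s


def watchdog_step(pid, path=STATUS_FILE, max_age_s=MAX_AGE_S):
    """Hypothetical single watchdog pass: kill `pid` if the status file is stale."""
    if is_stale(path, max_age_s):
        print(f"piaware is not updating {path}, sending SIGKILL")
        os.kill(pid, signal.SIGKILL)
        return True
    return False
```

A real watchdog would call something like `watchdog_step` in a loop with a sleep in between, and would need a grace period at startup before the file exists. Comparing the file contents between passes, rather than the modification time, would be an equally plausible design.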

wiedehopf added a commit to wiedehopf/docker-piaware that referenced this issue Oct 20, 2024
possible workaround for sdr-enthusiasts#168

fredclausen pushed a commit that referenced this issue Oct 20, 2024
* add watchdog
  possible workaround for #168
* add PIAWARE_MINIMAL mode without maps

@wiedehopf
Contributor

@Ottergoose so this watchdog is in adsb.im beta now.

How often does this issue happen?

piaware is not updating /run/piaware/status.json, sending SIGKILL

This would be in the log when the watchdog restarts piaware.
So I suppose if you see a log line like that, you know the watchdog is working.
@the-jeffski this goes for you as well.

Anyhow please report back and / or close this issue :)

@Ottergoose
Author

I will update and report back if the issue happens again, thank you!

@the-jeffski

I will keep an eye on the logs and see - turning off auto heal for now too.

@the-jeffski

Looking good - it detected a drop and restarted.

@wiedehopf
Contributor

Do you have a log? Was it the TLS error?

Interesting that you got the error so quickly, I've never seen it :)

@the-jeffski

Yes:

[2024-10-22 20:46:16.228][piaware] mlat-client(4925): Aircraft: 10 of 26 Mode S, 54 of 112 ADS-B used
[2024-10-22 20:46:47.353][piaware] 13267 msgs recv'd from dump1090 (4367 in last 5m); 13267 msgs sent to FlightAware
[2024-10-22 20:46:57.233][piaware] TLS alert (read): bad record mac
[2024-10-22 20:46:57.233][piaware] TLS error: 0
[2024-10-22 20:46:57.233][piaware] Lost connection to adept server at piaware.flightaware.com/1200: error reading "sock5555efa45d60": software caused connection abort
[2024-10-22 20:47:27.972][watchdog] piaware is not updating /run/piaware/status.json, sending SIGKILL
[2024-10-22 20:47:29.058][piaware] ****************************************************
[2024-10-22 20:47:29.058][piaware] piaware version 9.0.1 is running, process ID 5273
[2024-10-22 20:47:29.061][piaware] your system info is: Linux b21db7359e36 6.6.47+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.47-1+rpt1 (2024-09-02) aarch64 GNU/Linux
[2024-10-22 20:47:30.562][piaware] Connecting to FlightAware adept server at piaware.flightaware.com/1200
[2024-10-22 20:47:30.745][piaware] Connection with adept server at piaware.flightaware.com/1200 established
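
The timestamps in the log above also show how long the outage lasted with the watchdog in place. A small sketch that pulls the three relevant lines apart, assuming the `[timestamp][tag] message` layout shown in that log; `first_timestamp` is a helper name invented here.

```python
import re
from datetime import datetime

# Three lines copied from the log above.
LOG = """\
[2024-10-22 20:46:57.233][piaware] Lost connection to adept server at piaware.flightaware.com/1200: error reading "sock5555efa45d60": software caused connection abort
[2024-10-22 20:47:27.972][watchdog] piaware is not updating /run/piaware/status.json, sending SIGKILL
[2024-10-22 20:47:30.745][piaware] Connection with adept server at piaware.flightaware.com/1200 established
"""

# [YYYY-MM-DD HH:MM:SS.mmm][tag] message
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\[(\w+)\] (.*)$")


def first_timestamp(log_text, needle):
    """Return the timestamp of the first line whose message contains `needle`."""
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and needle in match.group(3):
            return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return None


lost = first_timestamp(LOG, "Lost connection")
killed = first_timestamp(LOG, "sending SIGKILL")
back = first_timestamp(LOG, "established")

print(f"loss -> SIGKILL:     {(killed - lost).total_seconds():.3f} s")
print(f"loss -> reconnected: {(back - lost).total_seconds():.3f} s")
```

In this one instance the watchdog fired about 31 seconds after the connection was lost, and the feed was back roughly 3 seconds after that, compared with the multi-hour gaps described earlier in the thread.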

@wiedehopf
Contributor

@Ottergoose
good enough for me, please close (no reason to assume it doesn't work for you :) )
