Add downlink frame counter and global health packet transmission #75

Merged 11 commits on Jan 4, 2025

thanasipantazides
Contributor

Overview

Frame counting

This is a draft implementation of frame counters for downlink packets, to address #74. This implementation does not require changes to the v3.0.0 GSE logger for downlink data to be stored accurately (the header bytes used to implement this frame counter are ignored by the v3.0.0 Listener).

This change enables a more robust logger design in the ground segment by adding a frame-counting field in the last two bytes of the downlink packet header. The ground logger can inspect this field and reassemble packets with a matching frame_counter into a single frame, rather than assuming that packet loss occurs only when packets collide during frame assembly on the ground.
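As a rough illustration of the idea (not the code in this PR), the sketch below writes a 16-bit counter into the last two bytes of a packet header and groups received packets by that counter. The header length (`HEADER_SIZE = 8`), the big-endian byte order, and the function names are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Assumption: fixed-size header; the real header length may differ.
constexpr std::size_t HEADER_SIZE = 8;

using Packet = std::vector<uint8_t>;

// Store the counter in the last two header bytes, most significant byte first
// (byte order is an assumption for this sketch).
inline void write_frame_counter(Packet& packet, uint16_t counter) {
    packet.at(HEADER_SIZE - 2) = static_cast<uint8_t>(counter >> 8);
    packet.at(HEADER_SIZE - 1) = static_cast<uint8_t>(counter & 0xFF);
}

// Read the counter back from the last two header bytes.
inline uint16_t read_frame_counter(const Packet& packet) {
    return static_cast<uint16_t>((packet.at(HEADER_SIZE - 2) << 8) |
                                 packet.at(HEADER_SIZE - 1));
}

// Ground-side grouping: packets sharing a counter belong to the same frame,
// regardless of the order in which they arrived.
inline std::map<uint16_t, std::vector<Packet>>
group_by_frame(const std::vector<Packet>& packets) {
    std::map<uint16_t, std::vector<Packet>> frames;
    for (const auto& p : packets) {
        if (p.size() < HEADER_SIZE) continue;  // too short to carry a header
        frames[read_frame_counter(p)].push_back(p);
    }
    return frames;
}
```

With this grouping, a lost packet leaves a visible hole in exactly one frame instead of being silently filled by a packet from a later frame.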

Global status message

This PR also adds system-specific error tracking and status transmission. The errors tracked here are software related: the Formatter can now report, per subsystem, errors in reading individual packets, reading complete frames, issuing SpaceWire commands, processing and dispatching uplink commands, and packetizing and preparing data for downlink. These errors are tracked separately for each SystemManager in the main loop. Each system's error state is a set of flags stored in a uint16_t.

These uint16_t error words are concatenated and sent to the GSE as a global health message once per Circle::manage_systems() call in the main loop. The downlink message is addressed to the housekeeping system (0x02) with the type code PING (0x20).
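For readers unfamiliar with flag words, here is a minimal sketch of per-system error flags held in a `uint16_t`. The flag names and bit positions are hypothetical, chosen to mirror the error categories listed above; they are not the names used in the Formatter.

```cpp
#include <cstdint>

// Hypothetical flag names and bit assignments, one bit per error category.
enum ErrorFlag : uint16_t {
    ERR_PACKET_READ   = 1u << 0,  // reading an individual packet
    ERR_FRAME_READ    = 1u << 1,  // reading a complete frame
    ERR_SPW_COMMAND   = 1u << 2,  // SpaceWire command error
    ERR_UPLINK        = 1u << 3,  // processing/dispatching an uplink command
    ERR_DOWNLINK_PREP = 1u << 4   // packetizing/preparing downlink data
};

// One error word per system, with helpers to set, clear, and test flags.
struct SystemErrors {
    uint16_t flags = 0;

    void set(ErrorFlag f)   { flags = static_cast<uint16_t>(flags | f); }
    void clear(ErrorFlag f) { flags = static_cast<uint16_t>(flags & ~f); }
    bool test(ErrorFlag f) const { return (flags & f) != 0; }
};
```

Keeping one such word per system means a flag raised by one subsystem cannot mask or clear a flag belonging to another.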

Problems addressed

The prior downlink and logger design was vulnerable to the following type of packet-loss event. Suppose a frame loses a packet during downlink, leaving a gap at packet index $i$. If the next frame fills that gap at index $i$ before overwriting any other packet from the first frame, every subsequent frame incurs a packet shift at that position: each is logged with its packet at index $i$ taken from the next frame in the sequence.
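The failure mode can be reproduced with a toy model. The `SlotLogger` below is a hypothetical stand-in for a counter-less logger, not the actual v3.0.0 Listener: it places each packet in the slot given by its index, and writes the frame out when every slot is full or when an incoming packet collides with an occupied slot. Frames are labeled `A` to `D`, with three packets each, sent in index order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr char EMPTY = '_';  // marks an unfilled slot

// Hypothetical slot-based logger with no frame counter.
class SlotLogger {
public:
    explicit SlotLogger(std::size_t packets_per_frame)
        : slots_(packets_per_frame, EMPTY) {}

    // Each slot records the label of the frame its packet came from.
    void receive(char frame_label, std::size_t index) {
        if (slots_.at(index) != EMPTY) flush();  // collision: close the frame
        slots_.at(index) = frame_label;
        if (slots_.find(EMPTY) == std::string::npos) flush();  // frame complete
    }

    const std::vector<std::string>& logged() const { return logged_; }

private:
    void flush() {
        logged_.push_back(slots_);
        slots_.assign(slots_.size(), EMPTY);
    }

    std::string slots_;
    std::vector<std::string> logged_;
};

// Send frames A..D; optionally drop packet 0 of frame A.
inline std::vector<std::string> run_downlink(bool drop_first_packet) {
    SlotLogger logger(3);
    const std::string frames = "ABCD";
    for (char frame : frames) {
        for (std::size_t i = 0; i < 3; ++i) {
            if (drop_first_packet && frame == 'A' && i == 0) continue;
            logger.receive(frame, i);
        }
    }
    return logger.logged();
}
```

In this model, `run_downlink(false)` logs `AAA`, `BBB`, `CCC`, `DDD`, while `run_downlink(true)` logs `BAA`, `CBB`, `DCC`: a single lost packet shifts index 0 in every later frame. Grouping by frame counter removes the ambiguity, because a packet from frame B can never be filed under frame A.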

Testing

I have tested all this code in the foxsimile framework on a single machine, and updated a local copy of the GSE Listener to utilize the frame_counter field in the new header. All works well on the formatter side. I will put in a PR for the GSE changes so that we can track validation there separately.

Global health messages/errors have not introduced new problems in running the formatter, and downlink messages appear correct. But more stress-testing is needed to actually trigger all error flags and validate clearing of errors on board.

@thanasipantazides
Contributor Author

This should become v1.3.0 if accepted. Substantial non-breaking changes to functionality.

thanasipantazides linked an issue on Nov 25, 2024 that may be closed by this pull request
@thanasipantazides
Contributor Author

thanasipantazides commented Dec 28, 2024

BTW, here is what the new downlink health packets look like:

[Image: formatter_health_ping]

The order in which systems appear in the final, repeating 4-byte chunks of the packet is defined by the system order set in apps/main.cpp. That is:

  1. cdte1
  2. cdte2
  3. cmos1
  4. cdtede
  5. housekeeping
  6. cdte3
  7. cdte4
  8. cmos2
  9. timepix
  10. uplink

But you can also just identify each system's errors on the ground using the 1-byte System ID field (see systems.json).
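A ground-side parser keyed on the System ID might look like the sketch below. Only the 4-byte chunk size, the 1-byte System ID, and the `uint16_t` error word come from this PR; the byte order within a chunk (ID, state, error high byte, error low byte), the `chunks_offset` parameter, and all names are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Per-system fields recovered from one 4-byte chunk.
struct SystemHealth {
    uint8_t state;    // assumed: one byte of system state
    uint16_t errors;  // error flag word
};

// Walk the repeating 4-byte chunks that start at chunks_offset and index the
// results by System ID. Assumed chunk layout: [id, state, err_hi, err_lo].
inline std::map<uint8_t, SystemHealth>
parse_health_chunks(const std::vector<uint8_t>& packet,
                    std::size_t chunks_offset) {
    std::map<uint8_t, SystemHealth> by_id;
    for (std::size_t i = chunks_offset; i + 4 <= packet.size(); i += 4) {
        SystemHealth h;
        h.state = packet[i + 1];
        h.errors = static_cast<uint16_t>((packet[i + 2] << 8) | packet[i + 3]);
        by_id[packet[i]] = h;
    }
    return by_id;  // trailing bytes that do not fill a whole chunk are ignored
}
```

Indexing by ID rather than by position keeps the parser valid even if the system order in apps/main.cpp changes later.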

@thanasipantazides
Contributor Author

I'm finding some improvements to make in the Formatter-side implementation of these ping/health packets. The overall packet structure seems fine, and none of this should affect parsing (the GSE-side implementation). But some flags are unused or inconvenient in their current form.

  1. Several systems do not have a .system_state value initialized in Circle.cpp, so they default to SYSTEM_STATE::OFF. That default is meaningful for, e.g., the CdTe detectors, which change state to ::LOOP once they ping back, but not for, e.g., CdTe DE and Housekeeping, whose state is never properly initialized. Adding a few statements to do that setup would make this field more useful.
  2. I need to scrub the error setting and clearing logic in the Circle and TransportLayerMachine and make sure errors will actually reach the ground. Currently, testing with only good packets is not very informative.

@thanasipantazides
Contributor Author

I tested this today with one canister and added new code to address (1) and (2) in my last comment. I'm happy with how it behaves, and added a parser to telemetry_tools to ingest the data.

I have not exhaustively tested setting and clearing of all errors in the health packet, but I want to:

  1. Merge this so other work can proceed,
  2. Version the merged code,
  3. Open an issue regarding exhaustive testing of flag behavior.

So that's what will happen next.

thanasipantazides merged commit 7aac4f8 into main on Jan 4, 2025
1 check passed
thanasipantazides deleted the log branch on January 4, 2025, 04:32