Support version skew between Antrea Agent and Flow Aggregator #6912

Merged
merged 2 commits into from
Jan 10, 2025


antoninbas
Contributor

When a new IPFIX Information Element (IE) is introduced, a version mismatch between the Agent and the Flow Aggregator can be problematic. A "new" Agent can send an IE which is unknown to the "old" Flow Aggregator, or the "new" Flow Aggregator may expect an IE which is not sent by an "old" Agent.

Prior to this change, we required the list of IEs sent by the Agent to be the same as the list of IEs expected by the Flow Aggregator. This is impossible to ensure during an upgrade, as it may take a long time for all Agents in the cluster to be upgraded.

After this change, Agents and the Flow Aggregator can be upgraded in any order (although we recommend upgrading the Flow Aggregator last). To achieve this, we introduce a new "process" between IPFIX collection and aggregation in the Flow Aggregator: the "preprocessor". The preprocessor is in charge of processing messages received from the IPFIX collector, prior to handing records over to the aggregation process. At the moment, its only task is to ensure that all records have the expected fields. If a record has extra fields, they are discarded. If some fields are missing, they are "appended" to the record with a "zero" value: for example, 0 for integral types, "" for strings, and 0.0.0.0 for IPv4 addresses. Note that we are able to keep the implementation simple by assuming that a record has either missing fields or extra fields (not a combination of both), and that such fields are always at the tail of the field list. This assumption is based on implementation knowledge of the FlowExporter and the FlowAggregator: when we introduce a new IE, it always comes after all existing IEs, and we never deprecate / remove an existing IE across versions.

Note that when the preprocessor adds a missing field, it is no longer possible to determine whether the field was originally missing, or was sent by the Agent with a zero value. This is why we recommend upgrading the Flow Aggregator last (to avoid this situation altogether). However, we do not believe that it is a significant drawback based on current usage.

Fixes #6777

@antoninbas antoninbas requested review from tnqn and yuntanghsu January 8, 2025 23:11
@antoninbas
Contributor Author

Tested manually with increased logging verbosity:

Agent upgraded first:

# FA logs
I0108 22:42:17.376787       1 process.go:304] "Template includes an information element that is not present in registry" obsDomainID=801257890 templateID=256 enterpriseID=56506 elementID=159
I0108 22:42:22.377982       1 preprocessor.go:159] "Record received from exporter includes unexpected elements, truncating" expectedElements=42 receivedElements=43
I0108 22:42:22.378155       1 preprocessor.go:159] "Record received from exporter includes unexpected elements, truncating" expectedElements=42 receivedElements=43
I0108 22:42:22.378743       1 preprocessor.go:159] "Record received from exporter includes unexpected elements, truncating" expectedElements=42 receivedElements=43
I0108 22:42:22.378872       1 preprocessor.go:159] "Record received from exporter includes unexpected elements, truncating" expectedElements=42 receivedElements=43

FA upgraded first:

# FA logs
I0108 22:46:33.522289       1 preprocessor.go:166] "Record received from exporter is missing information elements, adding fields with zero values" expectedElements=43 receivedElements=42
I0108 22:46:33.522399       1 preprocessor.go:166] "Record received from exporter is missing information elements, adding fields with zero values" expectedElements=43 receivedElements=42
I0108 22:46:33.523095       1 preprocessor.go:166] "Record received from exporter is missing information elements, adding fields with zero values" expectedElements=43 receivedElements=42
I0108 22:46:33.523313       1 preprocessor.go:166] "Record received from exporter is missing information elements, adding fields with zero values" expectedElements=43 receivedElements=42
I0108 22:46:33.523471       1 preprocessor.go:166] "Record received from exporter is missing information elements, adding fields with zero values" expectedElements=43 receivedElements=42
I0108 22:46:38.524684       1 preprocessor.go:166] "Record received from exporter is missing information elements, adding fields with zero values" expectedElements=43 receivedElements=42

@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Jan 8, 2025
@antoninbas antoninbas added this to the Antrea v2.3 release milestone Jan 8, 2025
err = fa.InitAggregationProcess()
if err != nil {
return nil, fmt.Errorf("error when creating aggregation process: %v", err)
recordCh := make(chan ipfixentities.Record)
Member

It seems a message can generate a collection of records. As the record channel has no buffer, the preprocessor and the AggregationProcess would be synchronized to some extent, which may be less efficient than the original implementation that passes messages. Should it use a buffered channel?

Contributor Author

Good point. At the moment, the go-ipfix library only includes a single record per message, but this won't always be the case and I am trying to change it actually (in particular when UDP is used). I will make it a buffered channel.
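
For readers following the thread, a minimal sketch of what the buffered channel changes. This is not the Flow Aggregator code: `record`, `preprocess`, and the buffer size are hypothetical placeholders chosen for illustration.

```go
package main

import "fmt"

// record is a hypothetical placeholder for an IPFIX record.
type record struct{ id int }

// preprocess stands in for the preprocessor: it emits every record contained
// in one message on the output channel.
func preprocess(message []record, out chan<- record) {
	for _, r := range message {
		out <- r
	}
}

func main() {
	// Assumed buffer size, for illustration only.
	const bufferSize = 8
	recordCh := make(chan record, bufferSize)

	// A message carrying several records. With a buffered channel the sends
	// below complete without waiting for the consumer, as long as the buffer
	// has room. With an unbuffered channel each send would block until the
	// aggregation side receives the record.
	message := []record{{id: 1}, {id: 2}, {id: 3}}
	preprocess(message, recordCh)
	close(recordCh)

	// Stand-in for the aggregation process draining the channel.
	for r := range recordCh {
		fmt.Println("aggregating record", r.id)
	}
}
```

The buffer only decouples the two sides up to its capacity; once it is full, the producer blocks again, so the size is a tuning choice rather than a correctness requirement.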

Signed-off-by: Antonin Bas <[email protected]>
Member

@tnqn tnqn left a comment

LGTM

@antoninbas
Contributor Author

/test-all

@antoninbas antoninbas merged commit 0efb397 into antrea-io:main Jan 10, 2025
56 of 62 checks passed
@antoninbas antoninbas deleted the fa-version-skew branch January 10, 2025 19:57