Support version skew between Antrea Agent and Flow Aggregator #6912
Tested manually with increased logging verbosity: Agent upgraded first:
FA upgraded first:
pkg/flowaggregator/flowaggregator.go
err = fa.InitAggregationProcess()
if err != nil {
	return nil, fmt.Errorf("error when creating aggregation process: %v", err)
recordCh := make(chan ipfixentities.Record)
It seems a single message can produce multiple records. Because the record channel is unbuffered, the preprocessor and the AggregationProcess would be synchronized to some extent, which may be less efficient than the original implementation, which passed messages. Should the channel have a buffer?
Good point. At the moment, the go-ipfix library only includes a single record per message, but this won't always be the case, and I am actually trying to change it (in particular when UDP is used). I will make it a buffered channel.
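To illustrate the point discussed in this thread, here is a minimal sketch of a producer fanning one message out into several records over a buffered channel. The `record` type, the `preprocess` function, and the buffer size are hypothetical stand-ins chosen for illustration; they are not the types or values used in the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical stand-in for ipfixentities.Record.
type record struct{ id int }

// preprocess sends every record of one message to the output channel.
// With an unbuffered channel, each send blocks until the consumer receives;
// with a buffered channel, sends only block once the buffer is full.
func preprocess(msg []record, out chan<- record) {
	for _, r := range msg {
		out <- r
	}
}

func main() {
	const bufSize = 8 // illustrative value, not taken from the PR
	recordCh := make(chan record, bufSize)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		// Consumer, standing in for the aggregation process.
		defer wg.Done()
		for r := range recordCh {
			fmt.Println("aggregating record", r.id)
		}
	}()

	// One message carrying three records.
	preprocess([]record{{id: 1}, {id: 2}, {id: 3}}, recordCh)
	close(recordCh)
	wg.Wait()
}
```

The buffer lets the producer finish a small burst without waiting on the consumer for every record; a suitable size would depend on how many records a message can carry.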
Signed-off-by: Antonin Bas <[email protected]>
0a3a446 to 31d2e6c
LGTM
/test-all
When a new IPFIX Information Element (IE) is introduced, a version mismatch between the Agent and the Flow Aggregator can be problematic. A "new" Agent can send an IE which is unknown to the "old" Flow Aggregator, or the "new" Flow Aggregator may expect an IE which is not sent by an "old" Agent.
Prior to this change, we required the list of IEs sent by the Agent to be the same as the list of IEs expected by the Flow Aggregator. This is impossible to ensure during upgrade, as it may take a long time for all Agents in the cluster to be upgraded.
After this change, Agents and the Flow Aggregator can be upgraded in any order (although we recommend upgrading the Flow Aggregator last). To achieve this, we introduce a new "process" between IPFIX collection and aggregation in the Flow Aggregator: the "preprocessor". The preprocessor is in charge of processing messages received from the IPFIX collector, prior to handing records over to the aggregation process. At the moment, its only task is to ensure that all records have the expected fields. If a record has extra fields, they will be discarded. If some fields are missing, they will be "appended" to the record with a "zero" value. For example, we will use 0 for integral types, "" for strings, 0.0.0.0 for IPv4 addresses, etc.
Note that we are able to keep the implementation simple by assuming that a record either has missing fields or extra fields (not a combination of both), and that such fields are always at the tail of the field list. This assumption is based on implementation knowledge of the FlowExporter and the FlowAggregator. When we introduce a new IE, it always comes after all existing IEs, and we never deprecate / remove an existing IE across versions.
Note that when the preprocessor adds a missing field, it is no longer possible to determine whether the field was originally missing, or was sent by the Agent with a zero value. This is why we recommend upgrading the Flow Aggregator last (to avoid this situation altogether). However, we do not believe that it is a significant drawback based on current usage.
Fixes #6777