diff --git a/documentation/assemblies/cruise-control/assembly-cruise-control-concepts.adoc b/documentation/assemblies/cruise-control/assembly-cruise-control-concepts.adoc
index 85ae2f32dcf..984f84cae7a 100644
--- a/documentation/assemblies/cruise-control/assembly-cruise-control-concepts.adoc
+++ b/documentation/assemblies/cruise-control/assembly-cruise-control-concepts.adoc
@@ -5,17 +5,25 @@
 [id='cruise-control-concepts-{context}']
 = Using Cruise Control for cluster rebalancing
 
-include::../../modules/cruise-control/con-cruise-control-description.adoc[leveloffset=+1]
+[role="_abstract"]
+Cruise Control is an open-source application designed to run alongside Kafka and help optimize the use of cluster resources by doing the following:
 
-NOTE: Strimzi provides xref:config-examples-{context}[example configuration files for Cruise Control].
+* Monitoring cluster workload
+* Rebalancing partitions based on predefined constraints
 
-include::../../modules/cruise-control/con-cruise-control-overview.adoc[leveloffset=+1]
+Cruise Control operations help you run a more balanced Kafka cluster that uses brokers more efficiently.
 
-include::../../modules/cruise-control/con-optimization-goals.adoc[leveloffset=+1]
+As Kafka clusters evolve, some brokers may become overloaded while others remain underutilized.
+Cruise Control addresses this imbalance by modeling CPU, disk, and network utilization at the replica level and generating optimization proposals (which you can approve or reject) for balanced partition assignments based on configurable optimization goals.
 
-include::../../modules/cruise-control/con-optimization-proposals.adoc[leveloffset=+1]
+Optimization proposals are configured and generated using a `KafkaRebalance` resource.
+You can use an annotation on the resource to specify whether optimization proposals are approved automatically or manually.
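+
+The following `KafkaRebalance` resource is a minimal sketch, assuming a Kafka cluster named `my-cluster` with Cruise Control enabled; the resource name `my-rebalance` is illustrative.
+With no mode or goals specified, the proposal is generated in the default `full` mode using the configured default goals.
+With no `strimzi.io/rebalance-auto-approval` annotation, the proposal waits for manual approval.
+
+.Example minimal `KafkaRebalance` resource
+[source,yaml,subs="attributes+"]
+----
+apiVersion: {KafkaRebalanceApiVersion}
+kind: KafkaRebalance
+metadata:
+  # Illustrative resource name
+  name: my-rebalance
+  labels:
+    # Must match the name of the target Kafka cluster (assumed to be my-cluster)
+    strimzi.io/cluster: my-cluster
+  # No strimzi.io/rebalance-auto-approval annotation, so approval is manual
+# No mode or goals: full mode with the default goals from the Cruise Control configuration
+spec: {}
+----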
 
-include::../../modules/cruise-control/con-rebalance-performance.adoc[leveloffset=+1]
+NOTE: Strimzi provides xref:config-examples-{context}[example configuration files for Cruise Control].
+
+//overview and concepts
+include::../../modules/cruise-control/con-cruise-control-overview.adoc[leveloffset=+1]
+include::../../modules/cruise-control/con-rebalance-performance.adoc[leveloffset=+2]
 
 include::../../modules/cruise-control/proc-configuring-deploying-cruise-control.adoc[leveloffset=+1]
 
diff --git a/documentation/modules/cruise-control/con-cruise-control-description.adoc b/documentation/modules/cruise-control/con-cruise-control-description.adoc
deleted file mode 100644
index 7aa0d575e77..00000000000
--- a/documentation/modules/cruise-control/con-cruise-control-description.adoc
+++ /dev/null
@@ -1,24 +0,0 @@
-//standard description for cruise control
-[role="_abstract"]
-Cruise Control is an open source system that supports the following Kafka operations:
-
-* Monitoring cluster workload
-* Rebalancing a cluster based on predefined constraints
-
-The operations help with running a more balanced Kafka cluster that uses broker pods more efficiently.
-
-A typical cluster can become unevenly loaded over time.
-Partitions that handle large amounts of message traffic might not be evenly distributed across the available brokers.
-To rebalance the cluster, administrators must monitor the load on brokers and manually reassign busy partitions to brokers with spare capacity.
-
-Cruise Control automates the cluster rebalancing process.
-It constructs a _workload model_ of resource utilization for the cluster--based on CPU, disk, and network load--and generates optimization proposals (that you can approve or reject) for more balanced partition assignments.
-A set of configurable optimization goals is used to calculate these proposals.
-
-You can generate optimization proposals in specific modes.
-The default `full` mode rebalances partitions across all brokers.
-You can also use the `add-brokers` and `remove-brokers` modes to accommodate changes when scaling a cluster up or down.
-
-When you approve an optimization proposal, Cruise Control applies it to your Kafka cluster.
-You configure and generate optimization proposals using a `KafkaRebalance` resource.
-You can configure the resource using an annotation so that optimization proposals are approved automatically or manually.
\ No newline at end of file
diff --git a/documentation/modules/cruise-control/con-cruise-control-overview.adoc b/documentation/modules/cruise-control/con-cruise-control-overview.adoc
index 150b1f30b61..e55e4a2be54 100644
--- a/documentation/modules/cruise-control/con-cruise-control-overview.adoc
+++ b/documentation/modules/cruise-control/con-cruise-control-overview.adoc
@@ -2,54 +2,505 @@
 //
 // assembly-cruise-control-concepts.adoc
 
-// Save the context of the assembly that is including this one.
-// This is necessary for including assemblies in assemblies.
-// See also the complementary step on the last line of this file.
-
 [id='con-cruise-control-overview-{context}']
 = Cruise Control components and features
 
 [role="_abstract"]
-Cruise Control consists of four main components--the Load Monitor, the Analyzer, the Anomaly Detector, and the Executor--and a REST API for client interactions.
-Strimzi utilizes the REST API to support the following Cruise Control features:
---
-* Generating optimization proposals from optimization goals.
-* Rebalancing a Kafka cluster based on an optimization proposal.
---
-
-Optimization goals:: An optimization goal describes a specific objective to achieve from a rebalance.
-For example, a goal might be to distribute topic replicas across brokers more evenly.
-You can change what goals to include through configuration.
-A goal is defined as a hard goal or soft goal.
-You can add hard goals through Cruise Control deployment configuration.
-You also have main, default, and user-provided goals that fit into each of these categories.
-+
---
-* *Hard goals* are preset and must be satisfied for an optimization proposal to be successful.
-* *Soft goals* do not need to be satisfied for an optimization proposal to be successful.
-They can be set aside if it means that all hard goals are met.
-* *Main goals* are inherited from Cruise Control. Some are preset as hard goals.
-Main goals are used in optimization proposals by default.
-* *Default goals* are the same as the main goals by default.
-You can specify your own set of default goals.
-* *User-provided goals* are a subset of default goals that are configured for generating a specific optimization proposal.
---
-
-Optimization proposals:: Optimization proposals comprise the goals you want to achieve from a rebalance.
-You generate an optimization proposal to create a summary of proposed changes and the results that are possible with the rebalance.
-The goals are assessed in a specific order of priority.
-You can then choose to approve or reject the proposal.
-You can reject the proposal to run it again with an adjusted set of goals.
-+
-You can generate an optimization proposal in one of three modes.
-+
---
-* *`full`* is the default mode and runs a full rebalance.
-* *`add-brokers`* is the mode you use after adding brokers when scaling up a Kafka cluster.
-* *`remove-brokers`* is the mode you use before removing brokers when scaling down a Kafka cluster.
---
-
-Other Cruise Control features are not currently supported, including self healing, notifications, and write-your-own goals.
+Cruise Control comprises four main components:
+
+Load Monitor:: Load Monitor collects metrics and analyzes cluster workload data.
+Analyzer:: Analyzer generates optimization proposals based on collected data and configured goals.
+Anomaly Detector:: Anomaly Detector identifies and reports irregularities in cluster behavior.
+Executor:: Executor applies approved optimization proposals to the cluster.
+
+Cruise Control also provides a REST API for client interactions, which Strimzi uses to support these features:
+
+* Generating optimization proposals from optimization goals
+* Rebalancing a Kafka cluster based on an optimization proposal
+* Changing the replication factor of topics
+
+NOTE: Other Cruise Control features are not currently supported, including self-healing, notifications, and write-your-own goals.
+
+== Optimization goals
+
+Optimization goals define objectives for rebalancing, such as distributing topic replicas evenly across brokers.
+
+They are categorized as follows:
+
+* *Hard goals* are preset and must be satisfied for a proposal to succeed.
+* *Soft goals* are best-effort objectives that Cruise Control satisfies as far as possible; a proposal can still be created without meeting them, provided all hard goals are satisfied.
+* *Main goals* are the goals that Cruise Control operations can use (all built-in goals by default).
+You can customize this list to specify a subset of the available goals.
+* *Default goals* are the goals used by default when generating proposals.
+They match the main goals unless you set them explicitly.
+* *User-provided goals* are a subset of default goals configured for specific proposals.
+
+Configure optimization goals in the `Kafka` and `KafkaRebalance` custom resources:
+
+* `Kafka` resource for hard, main, and default goals.
+** Hard goals: `Kafka.spec.cruiseControl.config.hard.goals`
+** Main goals: `Kafka.spec.cruiseControl.config.goals`
+** Default goals: `Kafka.spec.cruiseControl.config.default.goals`
+* `KafkaRebalance` resource for user-provided goals.
+** User-provided goals: `KafkaRebalance.spec.goals`
+
+=== Hard and soft goals
+
+Hard goals are mandatory and must be satisfied for optimization proposals to be generated.
+Soft goals are best-effort objectives that Cruise Control tries to meet after all hard goals are satisfied.
+The classification of hard and soft goals is fixed in Cruise Control code and cannot be changed.
+
+Cruise Control first satisfies hard goals and then addresses soft goals in the order they are listed.
+A proposal that meets all hard goals is valid, even if it violates some soft goals.
+
+For example, a soft goal might be to evenly distribute a topic's replicas.
+Cruise Control ignores this goal if it conflicts with a hard goal.
+
+Configure hard goals in your Cruise Control deployment using `Kafka.spec.cruiseControl.config.hard.goals`:
+
+* To enforce all hard goals, omit the `hard.goals` property.
+* To specify hard goals, list them in `hard.goals`.
+* To exclude a hard goal, ensure that it is listed in neither `default.goals` nor `hard.goals`.
+
+The more hard goals you configure, the less likely it is that Cruise Control can generate an optimization proposal.
+
+=== Main goals
+
+Main goals are predefined and available to all users in Cruise Control.
+Goals not listed as main goals cannot be used in Cruise Control operations.
+Some main goals are preset as hard goals.
+
+To simplify configuration, use the inherited main goals unless you need to exclude specific goals from `KafkaRebalance` resources.
+You can adjust the priority order in the default optimization goals configuration.
+
+Configure main goals in `Kafka.spec.cruiseControl.config.goals`:
+
+* To accept inherited main goals, omit the `goals` property.
+* To modify main goals, specify the goals in descending priority order in the `goals` property.
+
+=== Default goals
+
+Cruise Control uses default goals to generate optimization proposals.
+You can override default goals by setting user-provided optimization goals in a `KafkaRebalance` resource.
+
+If `default.goals` is not specified in the Cruise Control deployment configuration, main goals are used as default goals.
+The optimization proposal is then generated using these main goals.
+
+Configure default goals in `Kafka.spec.cruiseControl.config.default.goals`:
+
+* To use main goals as default goals, omit the `default.goals` property.
+* To modify default goals, specify a subset of main goals in the `default.goals` property.
+
+=== User-provided goals
+
+User-provided optimization goals narrow down the configured default goals for specific optimization proposals.
+
+Configure user-provided goals in `KafkaRebalance.spec.goals`:
+
+* Specify a subset of the main optimization goals.
+
+For example, by defining a single user-provided goal, you can optimize the distribution of topic leader replicas across the Kafka cluster without considering disk capacity or utilization.
+
+=== Goals order of priority
+
+Unless you change the Cruise Control xref:proc-configuring-deploying-cruise-control-{context}[deployment configuration], Strimzi inherits the main goals from Cruise Control.
+
+The following list shows these inherited main goals in descending priority order.
+Goals labeled as hard are mandatory constraints that optimization proposals must satisfy.
+
+* `RackAwareGoal` (hard)
+* `MinTopicLeadersPerBrokerGoal`
+* `ReplicaCapacityGoal` (hard)
+* `DiskCapacityGoal` (hard)
+* `NetworkInboundCapacityGoal` (hard)
+* `NetworkOutboundCapacityGoal` (hard)
+* `CpuCapacityGoal` (hard)
+* `ReplicaDistributionGoal`
+* `PotentialNwOutGoal`
+* `DiskUsageDistributionGoal`
+* `NetworkInboundUsageDistributionGoal`
+* `NetworkOutboundUsageDistributionGoal`
+* `CpuUsageDistributionGoal`
+* `TopicReplicaDistributionGoal`
+* `LeaderReplicaDistributionGoal`
+* `LeaderBytesInDistributionGoal`
+* `PreferredLeaderElectionGoal`
+* `IntraBrokerDiskCapacityGoal`
+* `IntraBrokerDiskUsageDistributionGoal`
+
+Resource distribution goals are subject to link:{BookURLConfiguring}#property-cruise-control-broker-capacity-reference[capacity limits^] on broker resources.
+
+For more information on each optimization goal, see link:https://github.com/linkedin/cruise-control/wiki/Pluggable-Components#goals[Goals^] in the Cruise Control Wiki.
+
+NOTE: "Write your own" goals and Kafka assigner goals are not supported.
+
+.Example `Kafka` configuration for default and hard goals
+[source,yaml,subs="attributes+"]
+----
+apiVersion: {KafkaApiVersion}
+kind: Kafka
+metadata:
+  name: my-cluster
+spec:
+  kafka:
+    # ...
+  zookeeper:
+    # ...
+  entityOperator:
+    topicOperator: {}
+    userOperator: {}
+  cruiseControl:
+    brokerCapacity:
+      inboundNetwork: 10000KB/s
+      outboundNetwork: 10000KB/s
+    config:
+      # `default.goals` (superset) must also include all `hard.goals` (subset)
+      default.goals: >
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal
+      hard.goals: >
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
+        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal
+      # ...
+----
+
+IMPORTANT: Ensure that the main `goals`, `default.goals`, and (unless `skipHardGoalCheck` is set to `true`) user-provided `spec.goals` include all hard goals specified in `hard.goals` to avoid errors when generating optimization proposals.
+
+.Example `KafkaRebalance` configuration for user-provided goals
+[source,yaml,subs="attributes+"]
+----
+apiVersion: {KafkaRebalanceApiVersion}
+kind: KafkaRebalance
+metadata:
+  name: my-rebalance
+  labels:
+    strimzi.io/cluster: my-cluster
+spec:
+  goals:
+    - RackAwareGoal
+    - TopicReplicaDistributionGoal
+  skipHardGoalCheck: true
+----
+
+=== Skipping hard goal checks
+
+If `skipHardGoalCheck: true` is specified in the `KafkaRebalance` custom resource, Cruise Control does not verify that the user-provided goals include all the configured hard goals.
+This allows more flexibility in generating optimization proposals, but can lead to proposals that do not satisfy all hard goals.
+
+However, any hard goals included in the user-provided goals are still treated as hard goals by Cruise Control, even with `skipHardGoalCheck: true`.
+
+[id='con-optimization-proposals-{context}']
+== Optimization proposals
+
+Optimization proposals are summaries of proposed changes based on the defined optimization goals, assessed in a specific order of priority.
+You can approve or reject proposals and rerun them with adjusted goals if needed.
+
+With Cruise Control deployed for use in Strimzi, the process to generate and approve an optimization proposal is as follows:
+
+. Create a `KafkaRebalance` resource, optionally specifying optimization goals and other configuration.
+This resource triggers Cruise Control to initiate the optimization process.
+. The Cruise Control Metrics Reporter runs in the Kafka brokers, collecting raw metrics and publishing them to a dedicated Kafka topic (`strimzi.cruisecontrol.metrics`).
+Partition metric samples and the samples used to model the impact of rebalances are stored in other xref:proc-cruise-control-auto-created-topics-{context}[topics automatically created when Cruise Control is deployed].
+. Load Monitor collects these metrics, including CPU, disk, and network utilization data.
+. Anomaly Detector continuously monitors the collected metrics to identify anomalies, such as broker failures or disk capacity issues, that could impact cluster stability.
+. Analyzer processes the collected metrics and constructs a _workload model_ of the current state of the Kafka cluster.
+Based on configured goals and capacities, it generates an optimization proposal for balancing partition distribution across brokers, which is reflected in the status of the `KafkaRebalance` resource.
+. The optimization proposal is approved or rejected (manually or automatically) based on its alignment with cluster management goals.
+. If approved, the Executor applies the optimization proposal to rebalance the Kafka cluster.
+This involves reassigning partitions and redistributing workload across brokers according to the approved proposal.
+
+.Cruise Control optimization process
+image::kafka-concepts-cruise-control.png[Cruise Control process]
+
+Optimization proposals comprise separate partition reassignment commands.
+When you approve a proposal, the Cruise Control server applies these commands to the Kafka cluster.
+
+A partition reassignment command consists of either of the following types of operations:
+
+* Partition movement: Involves transferring the partition replica and its data to a new location. Partition movements can take one of two forms:
+** Inter-broker movement: The partition replica is moved to a log directory on a different broker.
+** Intra-broker movement: The partition replica is moved to a different log directory on the same broker.
+
+* Leadership movement: Involves changing which of the partition's replicas is the leader.
+
+Cruise Control issues partition reassignment commands to the Kafka cluster in batches.
+The performance of the cluster during the rebalance is affected by the number of each type of movement in each batch.
+
+[id='con-optimization-proposals-modes-{context}']
+=== Rebalancing modes
+
+You can generate rebalance proposals in three modes, specified using the `spec.mode` property of the `KafkaRebalance` custom resource.
+
+`full` mode:: The `full` mode runs a full rebalance by moving replicas across all the brokers in the cluster.
+This is the default mode if the `spec.mode` property is not defined in the `KafkaRebalance` custom resource.
+
+`add-brokers` mode:: The `add-brokers` mode is used after scaling up a Kafka cluster by adding one or more brokers.
+Normally, after scaling up a Kafka cluster, new brokers are used to host only the partitions of newly created topics.
+If no new topics are created, the newly added brokers are not used and the existing brokers remain under the same load.
+If you use the `add-brokers` mode immediately after adding brokers to the cluster, the rebalancing operation moves replicas from existing brokers to the newly added brokers.
+You specify the new brokers as a list using the `spec.brokers` property of the `KafkaRebalance` custom resource.
+
+`remove-brokers` mode:: The `remove-brokers` mode is used before scaling down a Kafka cluster by removing one or more brokers.
+If you scale down a Kafka cluster, brokers are shut down even if they host replicas.
+This can lead to under-replicated partitions and possibly result in some partitions being under their minimum ISR (in-sync replicas).
+To avoid this potential problem, the `remove-brokers` mode moves replicas off the brokers that are going to be removed.
+When these brokers no longer host replicas, you can safely run the scaling down operation.
+You specify the brokers you're removing as a list in the `spec.brokers` property in the `KafkaRebalance` custom resource.
+
+In general, use the `full` rebalance mode to rebalance a Kafka cluster by spreading the load across brokers.
+Use the `add-brokers` and `remove-brokers` modes only if you want to scale your cluster up or down and rebalance the replicas accordingly.
+
+The procedure to run a rebalance is the same for all three modes.
+The only differences are specifying a mode through the `spec.mode` property and, if needed, listing the brokers that have been added or will be removed through the `spec.brokers` property.
+
+[[contents-optimization-proposals]]
+=== The results of an optimization proposal
+
+When an optimization proposal is generated, a summary and broker load data are returned.
+
+Summary:: The summary provides an overview of the proposed cluster rebalance and indicates the scale of the changes involved.
+A summary of a successfully generated optimization proposal is contained in the `status.optimizationResult` property of the `KafkaRebalance` resource.
+Broker load:: The broker load is stored as a JSON string in a `ConfigMap`. The broker load shows before and after values for the proposed rebalance, so you can see the impact on each of the brokers in the cluster.
+
+=== Manually approving or rejecting an optimization proposal
+
+An optimization proposal summary shows the proposed scope of changes.
+
+You can use the name of the `KafkaRebalance` resource to return a summary from the command line.
+
+.Returning an optimization proposal summary
+[source,shell]
+----
+kubectl describe kafkarebalance <kafka_rebalance_resource_name> -n <namespace>
+----
+
+You can also use the `jq` {JQTool}.
+
+.Returning an optimization proposal result using jq
+[source,shell]
+----
+kubectl get kafkarebalance <kafka_rebalance_resource_name> -n <namespace> -o json | jq '.status.optimizationResult'
+----
+
+Use the summary to decide whether to approve or reject an optimization proposal.
+
+Approving an optimization proposal:: You approve the optimization proposal by setting the `strimzi.io/rebalance` annotation of the `KafkaRebalance` resource to `approve`.
+Cruise Control applies the proposal to the Kafka cluster and starts a cluster rebalance operation.
+Rejecting an optimization proposal:: If you choose not to approve an optimization proposal, you can xref:proc-generating-optimization-proposals-str[change the optimization goals] or xref:con-rebalance-{context}[update any of the rebalance performance tuning options], and then generate another proposal.
+You can generate a new optimization proposal for a `KafkaRebalance` resource by setting the `strimzi.io/rebalance` annotation to `refresh`.
+
+Use optimization proposals to assess the movements required for a rebalance.
+For example, a summary describes inter-broker and intra-broker movements.
+Inter-broker rebalancing moves data between separate brokers.
+Intra-broker rebalancing moves data between disks on the same broker when you are using a JBOD storage configuration.
+This information can be useful even if you don't go ahead and approve the proposal.
+
+You might reject an optimization proposal, or delay its approval, because of the additional load that rebalancing places on a Kafka cluster.
+If approval is delayed for too long, the cluster load may change significantly, so it may be better to request a new proposal.
+
+In the following example, the proposal suggests rebalancing data between separate brokers.
+The rebalance involves the movement of 55 partition replicas, totaling 12MB of data, across the brokers.
+Though the inter-broker movement of partition replicas has a high impact on performance, the total amount of data is not large.
+If the total data were much larger, you could reject the proposal, or choose when to approve the rebalance to limit the impact on the performance of the Kafka cluster.
+
+Rebalance performance tuning options can help reduce the impact of data movement.
+If you can extend the rebalance period, you can divide the rebalance into smaller batches.
+Moving less data at a time reduces the load on the cluster.
+
+.Example optimization proposal summary
+[source,yaml]
+----
+Name: my-rebalance
+Namespace: myproject
+Labels: strimzi.io/cluster=my-cluster
+Annotations: <none>
+API Version: kafka.strimzi.io/v1alpha1
+Kind: KafkaRebalance
+Metadata:
+# ...
+Status:
+  Conditions:
+    Last Transition Time: 2022-04-05T14:36:11.900Z
+    Status: ProposalReady
+    Type: State
+  Observed Generation: 1
+  Optimization Result:
+    Data To Move MB: 12
+    Excluded Brokers For Leadership:
+    Excluded Brokers For Replica Move:
+    Excluded Topics:
+    Intra Broker Data To Move MB: 0
+    Monitored Partitions Percentage: 100
+    Num Intra Broker Replica Movements: 0
+    Num Leader Movements: 24
+    Num Replica Movements: 55
+    On Demand Balancedness Score After: 82.91290759174306
+    On Demand Balancedness Score Before: 78.01176356230222
+    Recent Windows: 5
+  Session Id: a4f833bd-2055-4213-bfdd-ad21f95bf184
+----
+
+The proposal will also move 24 partition leaders to different brokers.
+This requires a change to the cluster metadata, which has a low impact on performance.
+
+The balancedness scores are measurements of the overall balance of the Kafka cluster before and after the optimization proposal is approved.
+A balancedness score is based on optimization goals.
+If all goals are satisfied, the score is 100.
+The score is reduced for each goal that will not be met.
+Compare the balancedness scores to see whether the Kafka cluster is less balanced than it could be following a rebalance.
+
+=== Automatically approving an optimization proposal
+
+To save time, you can automate the process of approving optimization proposals.
+With auto-approval enabled, a generated optimization proposal goes straight into a cluster rebalance.
+
+To enable the optimization proposal auto-approval mechanism, create the `KafkaRebalance` resource with the `strimzi.io/rebalance-auto-approval` annotation set to `true`.
+If the annotation is not set or is set to `false`, the optimization proposal requires manual approval.
+
+.Example rebalance request with auto-approval mechanism enabled
+[source,yaml,subs="+attributes"]
+----
+apiVersion: {KafkaRebalanceApiVersion}
+kind: KafkaRebalance
+metadata:
+  name: my-rebalance
+  labels:
+    strimzi.io/cluster: my-cluster
+  annotations:
+    strimzi.io/rebalance-auto-approval: "true"
+spec:
+  mode: # any mode
+  # ...
+----
+
+You can still check the status of an automatically approved optimization proposal.
+The status of the `KafkaRebalance` resource moves to `Ready` when the rebalance is complete.
+
+=== Optimization proposal summary properties
+
+The following table explains the properties contained in the optimization proposal's summary.
+
+.Properties contained in an optimization proposal summary
+[cols="1m,1",options="header"]
+|===
+| JSON property
+| Description
+
+| numIntraBrokerReplicaMovements
+| The total number of partition replicas that will be transferred between the disks of the cluster's brokers.
+
+*Performance impact during rebalance operation*: Relatively high, but lower than `numReplicaMovements`.
+
+| excludedBrokersForLeadership
+| Not yet supported. An empty list is returned.
+
+| numReplicaMovements
+| The number of partition replicas that will be moved between separate brokers.
+
+*Performance impact during rebalance operation*: Relatively high.
+
+| onDemandBalancednessScoreBefore
+
+onDemandBalancednessScoreAfter
+| A measurement of the overall _balancedness_ of a Kafka cluster, before and after the optimization proposal was generated.
+
+The score is calculated by subtracting the sum of the `BalancednessScore` of each violated soft goal from 100. Cruise Control assigns a `BalancednessScore` to every optimization goal based on several factors, including priority (the goal's position in the list of `default.goals` or user-provided goals).
+
+The `Before` score is based on the current configuration of the Kafka cluster.
+The `After` score is based on the generated optimization proposal.
+
+| intraBrokerDataToMoveMB
+| The total size of the partition replicas that will be moved between disks on the same broker (see also `numIntraBrokerReplicaMovements`).
+
+*Performance impact during rebalance operation*: Variable. The larger the number, the longer the cluster rebalance will take to complete. Moving a large amount of data between disks on the same broker has less impact than moving it between separate brokers (see `dataToMoveMB`).
+
+| recentWindows
+| The number of metrics windows upon which the optimization proposal is based.
+
+| dataToMoveMB
+| The total size of the partition replicas that will be moved to a separate broker (see also `numReplicaMovements`).
+
+*Performance impact during rebalance operation*: Variable. The larger the number, the longer the cluster rebalance will take to complete.
+
+| monitoredPartitionsPercentage
+| The percentage of partitions in the Kafka cluster covered by the optimization proposal. Affected by the number of `excludedTopics`.
+
+| excludedTopics
+| If you specified a regular expression in the `spec.excludedTopicsRegex` property in the `KafkaRebalance` resource, all topic names matching that expression are listed here.
+These topics are excluded from the calculation of partition replica/leader movements in the optimization proposal.
+
+| numLeaderMovements
+| The number of partitions whose leaders will be switched to different replicas. This involves a change to the cluster metadata.
+
+*Performance impact during rebalance operation*: Relatively low.
+
+| excludedBrokersForReplicaMove
+| Not yet supported. An empty list is returned.
+
+|===
+
+=== Comparing broker load data
+
+Broker load data provides insights into current and anticipated usage of resources following a rebalance.
+The data is stored as a JSON-formatted string in a `ConfigMap` with the same name as the `KafkaRebalance` resource.
+
+The `ConfigMap` is created when the rebalance proposal reaches the `ProposalReady` state.
+For each broker, it contains a set of key metrics, each represented by three values:
+
+* The current metric value before the optimization proposal is applied
+* The expected metric value after applying the proposal
+* The difference between the two values (after minus before)
+
+This `ConfigMap` remains accessible even after the rebalance completes.
+
+To view this data from the command line, use the `ConfigMap` name.
+
+.Returning ConfigMap data
+[source,shell]
+----
+kubectl describe configmaps <my_rebalance_configmap_name> -n <namespace>
+----
+
+You can also use the `jq` {JQTool} to extract the JSON string.
+
+.Extracting the JSON string from the ConfigMap using jq
+[source,shell]
+----
+kubectl get configmaps <my_rebalance_configmap_name> -o json | jq '.["data"]["brokerLoad.json"]|fromjson|.'
+----
+
+.Properties captured in the ConfigMap
+[cols="35m,65",options="header"]
+|===
+
+| JSON property | Description
+| leaders | The number of replicas on this broker that are partition leaders.
+| replicas | The number of replicas on this broker.
+| cpuPercentage | The CPU utilization as a percentage of the defined capacity.
+| diskUsedPercentage | The disk utilization as a percentage of the defined capacity.
+| diskUsedMB | The absolute disk usage in MB.
+| networkOutRate | The total network output rate for the broker.
+| leaderNetworkInRate | The network input rate for all partition leader replicas on this broker.
+| followerNetworkInRate | The network input rate for all follower replicas on this broker.
+| potentialMaxNetworkOutRate | The hypothetical maximum network output rate that would be realized if this broker became the leader of all the replicas it currently hosts.
+ +|=== + +=== Adjusting the cached proposal refresh rate + +Cruise Control maintains a _cached optimization proposal_ based on the configured default optimization goals. +This proposal is generated from the workload model and updated every 15 minutes to reflect the current state of the Kafka cluster. +When you generate an optimization proposal using the default goals, Cruise Control returns the latest cached version. + +For clusters with rapidly changing workloads, you may want to shorten the refresh interval to ensure the optimization proposal reflects the most recent state. +However, reducing the interval increases the load on the Cruise Control server. +To adjust the refresh rate, modify the `proposal.expiration.ms` setting in the Cruise Control deployment configuration. [role="_additional-resources"] .Additional resources diff --git a/documentation/modules/cruise-control/con-optimization-goals.adoc b/documentation/modules/cruise-control/con-optimization-goals.adoc deleted file mode 100644 index 1f826f7c8f7..00000000000 --- a/documentation/modules/cruise-control/con-optimization-goals.adoc +++ /dev/null @@ -1,222 +0,0 @@ -// Module included in the following assemblies: -// -// assembly-cruise-control-concepts.adoc - -[id='con-optimization-goals-{context}'] -= Optimization goals overview - -[role="_abstract"] -Optimization goals are constraints on workload redistribution and resource utilization across a Kafka cluster. -To rebalance a Kafka cluster, Cruise Control uses optimization goals to generate xref:con-optimization-proposals-{context}[optimization proposals], which you can approve or reject. - -== Goals order of priority - -Strimzi supports most of the optimization goals developed in the Cruise Control project. -The supported goals, in the default descending order of priority, are as follows: - -. Rack-awareness -* Rack-awareness distribution -. Minimum number of leader replicas per broker for a set of topics -. Replica capacity -. 
Capacity goals -** Disk capacity -** Network inbound capacity -** Network outbound capacity -** CPU capacity -. Replica distribution -. Potential network output -. Resource distribution goals -** Disk utilization distribution -** Network inbound utilization distribution -** Network outbound utilization distribution -** CPU utilization distribution -. Leader bytes-in rate distribution -. Topic replica distribution -. Leader replica distribution -. Preferred leader election -. Intra-broker disk capacity -. Intra-broker disk usage distribution - -For more information on each optimization goal, see link:https://github.com/linkedin/cruise-control/wiki/Pluggable-Components#goals[Goals^] in the Cruise Control Wiki. - -NOTE: "Write your own" goals and Kafka assigner goals are not yet supported. - -== Goals configuration in Strimzi custom resources - -You configure optimization goals in `Kafka` and `KafkaRebalance` custom resources. -Cruise Control has configurations for hard optimization goals that must be satisfied, as well as main, default, and user-provided optimization goals. - -You can specify optimization goals in the following configuration: - -* *Main goals* -- `Kafka.spec.cruiseControl.config.goals` -* *Hard goals* -- `Kafka.spec.cruiseControl.config.hard.goals` -* *Default goals* -- `Kafka.spec.cruiseControl.config.default.goals` -* *User-provided goals* -- `KafkaRebalance.spec.goals` - -[NOTE] -==== -Resource distribution goals are subject to link:{BookURLConfiguring}#property-cruise-control-broker-capacity-reference[capacity limits^] on broker resources. -==== - -[[hard-soft-goals]] -== Hard and soft optimization goals - -Hard goals are goals that _must_ be satisfied in optimization proposals. -Goals that are not defined as _hard goals_ in the Cruise Control code are known as _soft goals_. -You can think of soft goals as _best effort_ goals: they do _not_ need to be satisfied in optimization proposals, but are included in optimization calculations. 
-An optimization proposal that violates one or more soft goals, but satisfies all hard goals, is valid. - -Cruise Control will calculate optimization proposals that satisfy all the hard goals and as many soft goals as possible (in their priority order). -An optimization proposal that does _not_ satisfy all the hard goals is rejected by Cruise Control and not sent to the user for approval. - -NOTE: For example, you might have a soft goal to distribute a topic's replicas evenly across the cluster (the topic replica distribution goal). -Cruise Control will ignore this goal if doing so enables all the configured hard goals to be met. - -In Cruise Control, the following xref:main-goals[main optimization goals] are hard goals: - -[source] -RackAwareGoal; ReplicaCapacityGoal; DiskCapacityGoal; NetworkInboundCapacityGoal; NetworkOutboundCapacityGoal; CpuCapacityGoal - -In your Cruise Control deployment configuration, you can specify which hard goals to enforce using the `hard.goals` property in `Kafka.spec.cruiseControl.config`. - -* To enforce execution of all hard goals, simply omit the `hard.goals` property. - -* To change which hard goals Cruise Control enforces, specify the required goals in the `hard.goals` property using their fully-qualified domain names. -* To prevent execution of a specific hard goal, ensure that the goal is not included in both the `default.goals` and `hard.goals` list configurations. - -NOTE: It is not possible to configure which goals are considered soft or hard goals. -This distinction is determined by the Cruise Control code. - -.Example `Kafka` configuration for hard optimization goals -[source,yaml,subs="attributes+"] ----- -apiVersion: {KafkaApiVersion} -kind: Kafka -metadata: - name: my-cluster -spec: - kafka: - # ... - zookeeper: - # ... 
- entityOperator: - topicOperator: {} - userOperator: {} - cruiseControl: - brokerCapacity: - inboundNetwork: 10000KB/s - outboundNetwork: 10000KB/s - config: - # Note that `default.goals` (superset) must also include all `hard.goals` (subset) - default.goals: > - com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal, - com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal - hard.goals: > - com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal, - com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal - # ... ----- - -Increasing the number of configured hard goals will reduce the likelihood of Cruise Control generating valid optimization proposals. - -If `skipHardGoalCheck: true` is specified in the `KafkaRebalance` custom resource, Cruise Control does _not_ check that the list of user-provided optimization goals (in `KafkaRebalance.spec.goals`) contains _all_ the configured hard goals (`hard.goals`). -Therefore, if some, but not all, of the user-provided optimization goals are in the `hard.goals` list, Cruise Control will still treat them as hard goals even if `skipHardGoalCheck: true` is specified. - -[[main-goals]] -== Main optimization goals - -The _main optimization goals_ are available to all users. -Goals that are not listed in the main optimization goals are not available for use in Cruise Control operations. 
- -Unless you change the Cruise Control xref:proc-configuring-deploying-cruise-control-{context}[deployment configuration], Strimzi will inherit the following main optimization goals from Cruise Control, in descending priority order: - -[source] -RackAwareGoal; MinTopicLeadersPerBrokerGoal; ReplicaCapacityGoal; DiskCapacityGoal; NetworkInboundCapacityGoal; NetworkOutboundCapacityGoal; CpuCapacityGoal; ReplicaDistributionGoal; PotentialNwOutGoal; DiskUsageDistributionGoal; NetworkInboundUsageDistributionGoal; NetworkOutboundUsageDistributionGoal; CpuUsageDistributionGoal; TopicReplicaDistributionGoal; LeaderReplicaDistributionGoal; LeaderBytesInDistributionGoal; PreferredLeaderElectionGoal - -Some of these goals are preset as xref:hard-soft-goals[hard goals]. - -To reduce complexity, we recommend that you use the inherited main optimization goals, unless you need to _completely_ exclude one or more goals from use in `KafkaRebalance` resources. The priority order of the main optimization goals can be modified, if desired, in the configuration for xref:default-goals[default optimization goals]. - -You configure main optimization goals, if necessary, in the Cruise Control deployment configuration: `Kafka.spec.cruiseControl.config.goals` - -* To accept the inherited main optimization goals, do not specify the `goals` property in `Kafka.spec.cruiseControl.config`. - -* If you need to modify the inherited main optimization goals, specify a list of goals, in descending priority order, in the `goals` configuration option. - -NOTE: To avoid errors when generating optimization proposals, make sure that any changes you make to the `goals` or `default.goals` in `Kafka.spec.cruiseControl.config` include all of the hard goals specified for the `hard.goals` property. To clarify, the hard goals must also be specified (as a subset) for the main optimization goals and default goals. 
- -[[default-goals]] -== Default optimization goals - -Cruise Control uses the _default optimization goals_ to generate the _cached optimization proposal_. -For more information about the cached optimization proposal, see xref:con-optimization-proposals-{context}[]. - -You can override the default optimization goals by setting xref:user-provided-goals[user-provided optimization goals] in a `KafkaRebalance` custom resource. - -Unless you specify `default.goals` in the Cruise Control xref:proc-configuring-deploying-cruise-control-{context}[deployment configuration], the main optimization goals are used as the default optimization goals. -In this case, the cached optimization proposal is generated using the main optimization goals. - -* To use the main optimization goals as the default goals, do not specify the `default.goals` property in `Kafka.spec.cruiseControl.config`. - -* To modify the default optimization goals, edit the `default.goals` property in `Kafka.spec.cruiseControl.config`. -You must use a subset of the main optimization goals. - -.Example `Kafka` configuration for default optimization goals - -[source,yaml,subs="attributes+"] ----- -apiVersion: {KafkaApiVersion} -kind: Kafka -metadata: - name: my-cluster -spec: - kafka: - # ... - zookeeper: - # ... - entityOperator: - topicOperator: {} - userOperator: {} - cruiseControl: - brokerCapacity: - inboundNetwork: 10000KB/s - outboundNetwork: 10000KB/s - config: - # Note that `default.goals` (superset) must also include all `hard.goals` (subset) - default.goals: > - com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal, - com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal, - com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal - hard.goals: > - com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal - # ... ----- - -If no default optimization goals are specified, the cached proposal is generated using the main optimization goals. 
- -[[user-provided-goals]] -== User-provided optimization goals - -_User-provided optimization goals_ narrow down the configured default goals for a particular optimization proposal. -You can set them, as required, in `spec.goals` in a `KafkaRebalance` custom resource: - ----- -KafkaRebalance.spec.goals ----- - -User-provided optimization goals can generate optimization proposals for different scenarios. -For example, you might want to optimize leader replica distribution across the Kafka cluster without considering disk capacity or disk utilization. -So, you create a `KafkaRebalance` custom resource containing a single user-provided goal for leader replica distribution. - -User-provided optimization goals must: - -* Include all configured xref:hard-soft-goals[hard goals], or an error occurs -* Be a subset of the main optimization goals - -To ignore the configured hard goals when generating an optimization proposal, add the `skipHardGoalCheck: true` property to the `KafkaRebalance` custom resource. See xref:proc-generating-optimization-proposals-{context}[]. - -[role="_additional-resources"] -.Additional resources - -* xref:proc-configuring-deploying-cruise-control-{context}[Configuring and deploying Cruise Control with Kafka] -* link:https://github.com/linkedin/cruise-control/wiki/Configurations[Configurations^] in the Cruise Control Wiki. diff --git a/documentation/modules/cruise-control/con-optimization-proposals.adoc b/documentation/modules/cruise-control/con-optimization-proposals.adoc deleted file mode 100644 index 34be99f3631..00000000000 --- a/documentation/modules/cruise-control/con-optimization-proposals.adoc +++ /dev/null @@ -1,303 +0,0 @@ -// Module included in the following assemblies: -// -// assembly-cruise-control-concepts.adoc - -[id='con-optimization-proposals-{context}'] - -= Optimization proposals overview - -[role="_abstract"] -Configure a `KafkaRebalance` resource to generate optimization proposals and apply the suggested changes. 
-An _optimization proposal_ is a summary of proposed changes that would produce a more balanced Kafka cluster, with partition workloads distributed more evenly among the brokers. - -Each optimization proposal is based on the set of xref:con-optimization-goals-{context}[optimization goals] that was used to generate it, subject to any configured link:{BookURLConfiguring}#property-cruise-control-broker-capacity-reference[capacity limits on broker resources]. - -All optimization proposals are _estimates_ of the impact of a proposed rebalance. -You can approve or reject a proposal. -You cannot approve a cluster rebalance without first generating the optimization proposal. - -You can run optimization proposals in one of the following rebalancing modes: - -* `full` -* `add-brokers` -* `remove-brokers` - -[id='con-optimization-proposals-modes-{context}'] -== Rebalancing modes - -You specify a rebalancing mode using the `spec.mode` property of the `KafkaRebalance` custom resource. - -`full`:: The `full` mode runs a full rebalance by moving replicas across all the brokers in the cluster. -This is the default mode if the `spec.mode` property is not defined in the `KafkaRebalance` custom resource. - -`add-brokers`:: The `add-brokers` mode is used after scaling up a Kafka cluster by adding one or more brokers. -Normally, after scaling up a Kafka cluster, new brokers are used to host only the partitions of newly created topics. -If no new topics are created, the newly added brokers are not used and the existing brokers remain under the same load. -By using the `add-brokers` mode immediately after adding brokers to the cluster, the rebalancing operation moves replicas from existing brokers to the newly added brokers. -You specify the new brokers as a list using the `spec.brokers` property of the `KafkaRebalance` custom resource. - -`remove-brokers`:: The `remove-brokers` mode is used before scaling down a Kafka cluster by removing one or more brokers. 
-If you scale down a Kafka cluster, brokers are shut down even if they host replicas. -This can lead to under-replicated partitions and possibly result in some partitions being under their minimum ISR (in-sync replicas). -To avoid this potential problem, the `remove-brokers` mode moves replicas off the brokers that are going to be removed. -When these brokers are not hosting replicas anymore, you can safely run the scaling down operation. -You specify the brokers you're removing as a list in the `spec.brokers` property in the `KafkaRebalance` custom resource. - -In general, use the `full` rebalance mode to rebalance a Kafka cluster by spreading the load across brokers. -Use the `add-brokers` and `remove-brokers` modes only if you want to scale your cluster up or down and rebalance the replicas accordingly. - -The procedure to run a rebalance is actually the same across the three different modes. -The only difference is with specifying a mode through the `spec.mode` property and, if needed, listing brokers that have been added or will be removed through the `spec.brokers` property. - - -[[contents-optimization-proposals]] -== The results of an optimization proposal - -When an optimization proposal is generated, a summary and broker load is returned. - -Summary:: The summary is contained in the `KafkaRebalance` resource. The summary provides an overview of the proposed cluster rebalance and indicates the scale of the changes involved. -A summary of a successfully generated optimization proposal is contained in the `Status.OptimizationResult` property of the `KafkaRebalance` resource. -The information provided is a summary of the full optimization proposal. -Broker load:: The broker load is stored in a ConfigMap that contains data as a JSON string. The broker load shows before and after values for the proposed rebalance, so you can see the impact on each of the brokers in the cluster. 
- -== Manually approving or rejecting an optimization proposal - -An optimization proposal summary shows the proposed scope of changes. - -You can use the name of the `KafkaRebalance` resource to return a summary from the command line. - -.Returning an optimization proposal summary -[source,shell,subs=+quotes] ----- -kubectl describe kafkarebalance __ -n __ ----- - -You can also use the `jq` {JQTool}. - -.Returning an optimization proposal summary using jq -[source,shell,subs=+quotes] ----- -`kubectl get kafkarebalance -o json | jq __`. ----- - -Use the summary to decide whether to approve or reject an optimization proposal. - -Approving an optimization proposal:: You approve the optimization proposal by setting the `strimzi.io/rebalance` annotation of the `KafkaRebalance` resource to `approve`. -Cruise Control applies the proposal to the Kafka cluster and starts a cluster rebalance operation. -Rejecting an optimization proposal:: If you choose not to approve an optimization proposal, -you can xref:proc-generating-optimization-proposals-str[change the optimization goals] or xref:rebalance_tuning_options[update any of the rebalance performance tuning options], and then generate another proposal. -You can generate a new optimization proposal for a `KafkaRebalance` resource by setting the `strimzi.io/rebalance` annotation to `refresh`. - -Use optimization proposals to assess the movements required for a rebalance. -For example, a summary describes inter-broker and intra-broker movements. -Inter-broker rebalancing moves data between separate brokers. -Intra-broker rebalancing moves data between disks on the same broker when you are using a JBOD storage configuration. -Such information can be useful even if you don't go ahead and approve the proposal. - -You might reject an optimization proposal, or delay its approval, because of the additional load on a Kafka cluster when rebalancing. 
- -In the following example, the proposal suggests the rebalancing of data between separate brokers. -The rebalance involves the movement of 55 partition replicas, totaling 12MB of data, across the brokers. -Though the inter-broker movement of partition replicas has a high impact on performance, the total amount of data is not large. -If the total data was much larger, you could reject the proposal, or time when to approve the rebalance to limit the impact on the performance of the Kafka cluster. - -Rebalance performance tuning options can help reduce the impact of data movement. -If you can extend the rebalance period, you can divide the rebalance into smaller batches. -Fewer data movements at a single time reduces the load on the cluster. - -.Example optimization proposal summary -[source,yaml] ----- -Name: my-rebalance -Namespace: myproject -Labels: strimzi.io/cluster=my-cluster -Annotations: API Version: kafka.strimzi.io/v1alpha1 -Kind: KafkaRebalance -Metadata: -# ... -Status: - Conditions: - Last Transition Time: 2022-04-05T14:36:11.900Z - Status: ProposalReady - Type: State - Observed Generation: 1 - Optimization Result: - Data To Move MB: 0 - Excluded Brokers For Leadership: - Excluded Brokers For Replica Move: - Excluded Topics: - Intra Broker Data To Move MB: 12 - Monitored Partitions Percentage: 100 - Num Intra Broker Replica Movements: 0 - Num Leader Movements: 24 - Num Replica Movements: 55 - On Demand Balancedness Score After: 82.91290759174306 - On Demand Balancedness Score Before: 78.01176356230222 - Recent Windows: 5 - Session Id: a4f833bd-2055-4213-bfdd-ad21f95bf184 ----- - -The proposal will also move 24 partition leaders to different brokers. -This requires a change to the ZooKeeper configuration, which has a low impact on performance. - -The balancedness scores are measurements of the overall balance of the Kafka cluster before and after the optimization proposal is approved. -A balancedness score is based on optimization goals. 
-If all goals are satisfied, the score is 100. -The score is reduced for each goal that will not be met. -Compare the balancedness scores to see whether the Kafka cluster is less balanced than it could be following a rebalance. - -== Automatically approving an optimization proposal - -To save time, you can automate the process of approving optimization proposals. -With automation, when you generate an optimization proposal it goes straight into a cluster rebalance. - -To enable the optimization proposal auto-approval mechanism, create the `KafkaRebalance` resource with the `strimzi.io/rebalance-auto-approval` annotation set to `true`. -If the annotation is not set or set to `false`, the optimization proposal requires manual approval. - -.Example rebalance request with auto-approval mechanism enabled -[source,yaml,subs="+attributes"] ----- -apiVersion: {KafkaRebalanceApiVersion} -kind: KafkaRebalance -metadata: - name: my-rebalance - labels: - strimzi.io/cluster: my-cluster - annotations: - strimzi.io/rebalance-auto-approval: "true" -spec: - mode: # any mode - # ... ----- - -You can still check the status when automatically approving an optimization proposal. -The status of the `KafkaRebalance` resource moves to `Ready` when the rebalance is complete. - -== Optimization proposal summary properties - -The following table explains the properties contained in the optimization proposal's summary section. - -.Properties contained in an optimization proposal summary -[cols="35,65",options="header",stripes="none",separator=¦] -|=== - -m¦JSON property -¦Description - -m¦numIntraBrokerReplicaMovements -¦The total number of partition replicas that will be transferred between the disks of the cluster's brokers. - -*Performance impact during rebalance operation*: Relatively high, but lower than `numReplicaMovements`. - -m¦excludedBrokersForLeadership -¦Not yet supported. An empty list is returned. 
- -m¦numReplicaMovements -¦The number of partition replicas that will be moved between separate brokers. - -*Performance impact during rebalance operation*: Relatively high. - -m¦onDemandBalancednessScoreBefore, onDemandBalancednessScoreAfter -¦A measurement of the overall _balancedness_ of a Kafka Cluster, before and after the optimization proposal was generated. - -The score is calculated by subtracting the sum of the `BalancednessScore` of each violated soft goal from 100. Cruise Control assigns a `BalancednessScore` to every optimization goal based on several factors, including priority--the goal's position in the list of `default.goals` or user-provided goals. - -The `Before` score is based on the current configuration of the Kafka cluster. -The `After` score is based on the generated optimization proposal. - -m¦intraBrokerDataToMoveMB -¦The sum of the size of each partition replica that will be moved between disks on the same broker (see also `numIntraBrokerReplicaMovements`). - -*Performance impact during rebalance operation*: Variable. The larger the number, the longer the cluster rebalance will take to complete. Moving a large amount of data between disks on the same broker has less impact than between separate brokers (see `dataToMoveMB`). - -m¦recentWindows -¦The number of metrics windows upon which the optimization proposal is based. - -m¦dataToMoveMB -¦The sum of the size of each partition replica that will be moved to a separate broker (see also `numReplicaMovements`). - -*Performance impact during rebalance operation*: Variable. The larger the number, the longer the cluster rebalance will take to complete. - -m¦monitoredPartitionsPercentage -¦The percentage of partitions in the Kafka cluster covered by the optimization proposal. Affected by the number of `excludedTopics`. 
- -m¦excludedTopics -¦If you specified a regular expression in the `spec.excludedTopicsRegex` property in the `KafkaRebalance` resource, all topic names matching that expression are listed here. -These topics are excluded from the calculation of partition replica/leader movements in the optimization proposal. - -m¦numLeaderMovements -¦The number of partitions whose leaders will be switched to different replicas. This involves a change to ZooKeeper configuration. - -*Performance impact during rebalance operation*: Relatively low. - -m¦excludedBrokersForReplicaMove -¦Not yet supported. An empty list is returned. - -|=== - -== Broker load properties - -The broker load is stored in a ConfigMap (with the same name as the KafkaRebalance custom resource) as a JSON formatted string. This JSON string consists of a JSON object with keys for each broker IDs linking to a number of metrics for each broker. -Each metric consist of three values. -The first is the metric value before the optimization proposal is applied, the second is the expected value of the metric after the proposal is applied, and the third is the difference between the first two values (after minus before). - -NOTE: The ConfigMap appears when the KafkaRebalance resource is in the `ProposalReady` state and remains after the rebalance is complete. - -You can use the name of the ConfigMap to view its data from the command line. - -.Returning ConfigMap data -[source,shell,subs=+quotes] ----- -kubectl describe configmaps __ -n __ ----- - -You can also use the `jq` {JQTool} to extract the JSON string from the ConfigMap. - -.Extracting the JSON string from the ConfigMap using jq -[source,shell,subs=+quotes] ----- -kubectl get configmaps __ -o json | jq '.["data"]["brokerLoad.json"]|fromjson|.' 
----- - -The following table explains the properties contained in the optimization proposal's broker load ConfigMap: - -[cols="35,65",options="header",stripes="none"] -|====================================================================================================== - -| JSON property | Description - -m| leaders | The number of replicas on this broker that are partition leaders. - -m| replicas | The number of replicas on this broker. - -m| cpuPercentage | The CPU utilization as a percentage of the defined capacity. - -m| diskUsedPercentage | The disk utilization as a percentage of the defined capacity. - -m| diskUsedMB | The absolute disk usage in MB. - -m| networkOutRate | The total network output rate for the broker. - -m| leaderNetworkInRate | The network input rate for all partition leader replicas on this broker. - -m| followerNetworkInRate | The network input rate for all follower replicas on this broker. - -m| potentialMaxNetworkOutRate | The hypothetical maximum network output rate that would be realized if this broker became the leader of all the replicas it currently hosts. - -|====================================================================================================== - -== Cached optimization proposal - -Cruise Control maintains a _cached optimization proposal_ based on the configured default optimization goals. -Generated from the workload model, the cached optimization proposal is updated every 15 minutes to reflect the current state of the Kafka cluster. -If you generate an optimization proposal using the default optimization goals, Cruise Control returns the most recent cached proposal. - -To change the cached optimization proposal refresh interval, edit the `proposal.expiration.ms` setting in the Cruise Control deployment configuration. -Consider a shorter interval for fast changing clusters, although this increases the load on the Cruise Control server. 
- -[role="_additional-resources"] -.Additional resources - -* xref:con-optimization-goals-{context}[] -* xref:proc-generating-optimization-proposals-{context}[] -* xref:proc-approving-optimization-proposal-{context}[] diff --git a/documentation/modules/cruise-control/con-rebalance-performance.adoc b/documentation/modules/cruise-control/con-rebalance-performance.adoc index 2e3e4c2e005..51d6b0e83a0 100644 --- a/documentation/modules/cruise-control/con-rebalance-performance.adoc +++ b/documentation/modules/cruise-control/con-rebalance-performance.adoc @@ -3,40 +3,24 @@ // assembly-cruise-control-concepts.adoc [id='con-rebalance-{context}'] += Tuning options for rebalances -= Rebalance performance tuning overview +Configuration options allow you to fine-tune cluster rebalance performance. +These settings control the movement of partition replicas and leadership, as well as the bandwidth allocated for rebalances. -You can adjust several performance tuning options for cluster rebalances. -These options control how partition replicas and leadership movements in a rebalance are executed, as well as the bandwidth that is allocated to a rebalance operation. - -== Partition reassignment commands - -xref:con-optimization-proposals-{context}[Optimization proposals] are comprised of separate partition reassignment commands. -When you xref:proc-approving-optimization-proposal-{context}[approve] a proposal, the Cruise Control server applies these commands to the Kafka cluster. - -A partition reassignment command consists of either of the following types of operations: - -* Partition movement: Involves transferring the partition replica and its data to a new location. Partition movements can take one of two forms: - ** Inter-broker movement: The partition replica is moved to a log directory on a different broker. - ** Intra-broker movement: The partition replica is moved to a different log directory on the same broker. 
- -* Leadership movement: This involves switching the leader of the partition's replicas. - -Cruise Control issues partition reassignment commands to the Kafka cluster in batches. -The performance of the cluster during the rebalance is affected by the number of each type of movement contained in each batch. - -== Replica movement strategies +== Selecting replica movement strategies Cluster rebalance performance is also influenced by the _replica movement strategy_ that is applied to the batches of partition reassignment commands. -By default, Cruise Control uses the `BaseReplicaMovementStrategy`, which simply applies the commands in the order they were generated. -However, if there are some very large partition reassignments early in the proposal, this strategy can slow down the application of the other reassignments. +By default, Cruise Control uses the `BaseReplicaMovementStrategy`, which applies the commands in the order they were generated. +However, if large partition reassignments are handled early, this strategy may delay other reassignments. Cruise Control provides four alternative replica movement strategies that can be applied to optimization proposals: -* `PrioritizeSmallReplicaMovementStrategy`: Order reassignments in order of ascending size. -* `PrioritizeLargeReplicaMovementStrategy`: Order reassignments in order of descending size. -* `PostponeUrpReplicaMovementStrategy`: Prioritize reassignments for replicas of partitions which have no out-of-sync replicas. -* `PrioritizeMinIsrWithOfflineReplicasStrategy`: Prioritize reassignments with (At/Under)MinISR partitions with offline replicas. This strategy will only work if `cruiseControl.config.concurrency.adjuster.min.isr.check.enabled` is set to `true` in the `Kafka` custom resource's spec. +* `PrioritizeSmallReplicaMovementStrategy`: Reassign smaller partitions first. +* `PrioritizeLargeReplicaMovementStrategy`: Reassign larger partitions first. 
+* `PostponeUrpReplicaMovementStrategy`: Prioritize partitions without out-of-sync replicas. +* `PrioritizeMinIsrWithOfflineReplicasStrategy`: Prioritize reassignments for partitions that have offline replicas and are at or below their minimum in-sync replicas (MinISR). + +To use the `PrioritizeMinIsrWithOfflineReplicasStrategy`, set `cruiseControl.config.concurrency.adjuster.min.isr.check.enabled` to `true` in the `Kafka` resource. These strategies can be configured as a sequence. The first strategy attempts to compare two partition reassignments using its internal logic. @@ -44,26 +28,23 @@ If the reassignments are equivalent, then it passes them to the next strategy in == Intra-broker disk balancing -Moving a large amount of data between disks on the same broker has less impact than between separate brokers. -If you are running a Kafka deployment that uses JBOD storage with multiple disks on the same broker, Cruise Control can balance partitions between the disks. +Intra-broker balancing moves data between disks on the same broker, which is useful for deployments that use JBOD storage with multiple disks. +This type of balancing incurs less network overhead than inter-broker balancing. -NOTE: If you are using JBOD storage with a single disk, intra-broker disk balancing will result in a proposal with 0 partition movements since there are no disks to balance between. +NOTE: If you are using JBOD storage with a single disk, intra-broker disk balancing will result in a proposal with 0 partition movements since there are no other disks to balance between. -To perform an intra-broker disk balance, set `rebalanceDisk` to `true` under the `KafkaRebalance.spec`. -When setting `rebalanceDisk` to `true`, do not set a `goals` field in the `KafkaRebalance.spec`, as Cruise Control will automatically set the intra-broker goals and ignore the inter-broker goals. +To enable intra-broker balancing, set `rebalanceDisk` to `true` in `KafkaRebalance.spec`.
+When `rebalanceDisk` is enabled, do not specify a `goals` field, as Cruise Control automatically sets the intra-broker goals and ignores the inter-broker goals. Cruise Control does not perform inter-broker and intra-broker balancing at the same time. -== Rebalance tuning options - -Cruise Control provides several configuration options for tuning the rebalance parameters discussed above. -You can set these tuning options when xref:proc-configuring-deploying-cruise-control-{context}[configuring and deploying Cruise Control with Kafka] or xref:proc-generating-optimization-proposals-{context}[optimization proposal] levels: +== Rebalance tuning -* The Cruise Control server setting can be set in the Kafka custom resource under `Kafka.spec.cruiseControl.config`. -* The individual rebalance performance configurations can be set under `KafkaRebalance.spec`. +You can set the rebalance tuning options in the following table when configuring Cruise Control or for individual rebalances: -The relevant configurations are summarized in the following table. +* Set Cruise Control server configurations in `Kafka.spec.cruiseControl.config` in the `Kafka` resource. +* Set configurations for individual rebalances in `KafkaRebalance.spec` in the `KafkaRebalance` resource.
-.Rebalance performance tuning configuration +.Rebalance configuration tuning properties [cols="4m,4m,1,2",options="header"] |=== | Cruise Control properties diff --git a/documentation/modules/cruise-control/proc-approving-optimization-proposal.adoc b/documentation/modules/cruise-control/proc-approving-optimization-proposal.adoc index a044db1c62e..e5ab93c8de6 100644 --- a/documentation/modules/cruise-control/proc-approving-optimization-proposal.adoc +++ b/documentation/modules/cruise-control/proc-approving-optimization-proposal.adoc @@ -4,7 +4,7 @@ [id='proc-approving-optimization-proposal-{context}'] -= Approving an optimization proposal += Approving optimization proposals You can approve an xref:con-optimization-proposals-{context}[optimization proposal] generated by Cruise Control, if its status is `ProposalReady`. Cruise Control will then apply the optimization proposal to the Kafka cluster, reassigning partitions to brokers and changing partition leadership. diff --git a/documentation/modules/cruise-control/proc-configuring-deploying-cruise-control.adoc b/documentation/modules/cruise-control/proc-configuring-deploying-cruise-control.adoc index 4bb41eb504b..4b14ae41e59 100644 --- a/documentation/modules/cruise-control/proc-configuring-deploying-cruise-control.adoc +++ b/documentation/modules/cruise-control/proc-configuring-deploying-cruise-control.adoc @@ -3,7 +3,7 @@ // assembly-cruise-control-concepts.adoc [id='proc-configuring-deploying-cruise-control-{context}'] -= Configuring and deploying Cruise Control with Kafka += Deploying Cruise Control with Kafka [role="_abstract"] Configure a `Kafka` resource to deploy Cruise Control alongside a Kafka cluster. @@ -137,12 +137,12 @@ my-cluster-cruise-control 1/1 1 1 `READY` shows the number of replicas that are ready/expected. The deployment is successful when the `AVAILABLE` output shows `1`. 
-[discrete] -== Auto-created topics +[id='proc-cruise-control-auto-created-topics-{context}'] +== Auto-created Cruise Control topics The following table shows the three topics that are automatically created when Cruise Control is deployed. These topics are required for Cruise Control to work properly and must not be deleted or changed. You can change the name of the topic using the specified configuration option. -.Auto-created topics +.Topics created when Cruise Control is deployed [cols="1m,1m,1,3",options="header",stripes="none",separator=¦] |=== @@ -174,7 +174,3 @@ NOTE: If the names of the auto-created topics are changed in a Kafka cluster tha .What to do next After configuring and deploying Cruise Control, you can xref:proc-generating-optimization-proposals-{context}[generate optimization proposals]. - -[role="_additional-resources"] -.Additional resources -* xref:con-optimization-goals-{context}[Optimization goals overview] diff --git a/documentation/modules/cruise-control/proc-fixing-problems-with-kafkarebalance.adoc b/documentation/modules/cruise-control/proc-fixing-problems-with-kafkarebalance.adoc index 4d677c98ae2..668a32e37de 100644 --- a/documentation/modules/cruise-control/proc-fixing-problems-with-kafkarebalance.adoc +++ b/documentation/modules/cruise-control/proc-fixing-problems-with-kafkarebalance.adoc @@ -3,21 +3,22 @@ // assembly-cruise-control-concepts.adoc [id='proc-fixing-problems-with-kafkarebalance-{context}'] += Troubleshooting and refreshing rebalances -= Fixing problems with a `KafkaRebalance` resource +[role="_abstract"] +If errors occur when you create a `KafkaRebalance` resource or interact with Cruise Control, they are reported in the resource status, along with guidance on how to fix them. +In such cases, the resource transitions to the `NotReady` state. -If an issue occurs when creating a `KafkaRebalance` resource or interacting with Cruise Control, the error is reported in the resource status, along with details of how to fix it.
-The resource also moves to the `NotReady` state. +To continue with a cluster rebalance operation, you must rectify any configuration issues in the `KafkaRebalance` resource or address any problems with the Cruise Control deployment. -To continue with the cluster rebalance operation, you must fix the problem in the `KafkaRebalance` resource itself or with the overall Cruise Control deployment. -Problems might include the following: +Common issues include the following: -* A misconfigured parameter in the `KafkaRebalance` resource. +* Misconfigured parameters in the `KafkaRebalance` resource. * The `strimzi.io/cluster` label for specifying the Kafka cluster in the `KafkaRebalance` resource is missing. * The Cruise Control server is not deployed as the `cruiseControl` property in the `Kafka` resource is missing. * The Cruise Control server is not reachable. -After fixing the issue, you need to add the `refresh` annotation to the `KafkaRebalance` resource. +After fixing any issues, you need to add the `refresh` annotation to the `KafkaRebalance` resource. During a “refresh”, a new optimization proposal is requested from the Cruise Control server. .Prerequisites @@ -52,8 +53,3 @@ kubectl describe kafkarebalance _rebalance-cr-name_ ---- . Wait until the status changes to `PendingProposal`, or directly to `ProposalReady`. 
- -[role="_additional-resources"] -.Additional resources - -* xref:con-optimization-proposals-{context}[] diff --git a/documentation/modules/cruise-control/proc-generating-optimization-proposals.adoc b/documentation/modules/cruise-control/proc-generating-optimization-proposals.adoc index 98abb4af094..1b7f3f22480 100644 --- a/documentation/modules/cruise-control/proc-generating-optimization-proposals.adoc +++ b/documentation/modules/cruise-control/proc-generating-optimization-proposals.adoc @@ -6,7 +6,7 @@ = Generating optimization proposals [role="_abstract"] -When you create or update a `KafkaRebalance` resource, Cruise Control generates an xref:con-optimization-proposals-{context}[optimization proposal] for the Kafka cluster based on the configured xref:con-optimization-goals-{context}[optimization goals]. +When you create or update a `KafkaRebalance` resource, Cruise Control generates an optimization proposal for the Kafka cluster based on a set of optimization goals. Analyze the information in the optimization proposal and decide whether to approve it. You can use the results of the optimization proposal to rebalance your Kafka cluster. diff --git a/documentation/modules/cruise-control/proc-stopping-cluster-rebalance.adoc b/documentation/modules/cruise-control/proc-stopping-cluster-rebalance.adoc index 43603d4ac0a..14298a6a1f9 100644 --- a/documentation/modules/cruise-control/proc-stopping-cluster-rebalance.adoc +++ b/documentation/modules/cruise-control/proc-stopping-cluster-rebalance.adoc @@ -4,7 +4,7 @@ [id='proc-stopping-cluster-rebalance-{context}'] -= Stopping a cluster rebalance += Stopping rebalances Once started, a cluster rebalance operation might take some time to complete and affect the overall performance of the Kafka cluster. 
@@ -17,8 +17,6 @@ NOTE: The performance of the Kafka cluster in the intermediate (stopped) state m .Prerequisites -* You have xref:proc-approving-optimization-proposal-{context}[approved the optimization proposal] by annotating the `KafkaRebalance` custom resource with `approve`. - * The status of the `KafkaRebalance` custom resource is `Rebalancing`. .Procedure diff --git a/documentation/modules/managing/proc-cluster-recovery-volume-zk.adoc b/documentation/modules/managing/proc-cluster-recovery-volume-zk.adoc index 8f32c351fce..e26c18205c7 100644 --- a/documentation/modules/managing/proc-cluster-recovery-volume-zk.adoc +++ b/documentation/modules/managing/proc-cluster-recovery-volume-zk.adoc @@ -19,7 +19,7 @@ WARNING: If the User Operator is enabled and Kafka users are not recreated, user In this procedure, it is essential that PVs are mounted into the correct PVC to avoid data corruption. A `volumeName` is specified for the PVC and this must match the name of the PV. -For more information, see xref:ref-persistent-storage-{context}[Persistent storage]. +For more information, see xref:assembly-storage-{context}[]. .Procedure diff --git a/documentation/shared/images/kafka-concepts-cruise-control.png b/documentation/shared/images/kafka-concepts-cruise-control.png new file mode 100644 index 00000000000..deb17050235 Binary files /dev/null and b/documentation/shared/images/kafka-concepts-cruise-control.png differ
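The intra-broker disk balancing described in the `con-rebalance-performance.adoc` changes can be illustrated with a minimal `KafkaRebalance` resource. This is a sketch under stated assumptions: the names `my-rebalance` and `my-cluster` are hypothetical placeholders, and the `kafka.strimzi.io/v1beta2` API version is assumed to match the Strimzi release in use. It sets `rebalanceDisk: true` and leaves out the `goals` field, as the text requires.

```yaml
# Hypothetical example: request an intra-broker (JBOD disk) rebalance.
apiVersion: kafka.strimzi.io/v1beta2   # assumed API version
kind: KafkaRebalance
metadata:
  name: my-rebalance                   # hypothetical resource name
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical name of the target Kafka cluster
spec:
  # Balance partitions between disks on the same broker.
  # No `goals` field: Cruise Control sets the intra-broker goals itself.
  rebalanceDisk: true
```

The `strimzi.io/cluster` label is included because a missing label is one of the common issues listed for `KafkaRebalance` resources.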