Support backup precision option for WC #2978
Conversation
With introducing
Data type looks better to me.
IMHO: the suffixes "asym" and "sym" do not look consistent for a column named "data type". I would expect to see int8, int4, uint4, fp8_*, fp16, fp32 in a data type column. I would note that the quantization scheme can be understood from the data type as well. If you want to keep int4_asym, int4_sym, int8_sym, etc., I would suggest naming the column "weight compression mode".
nncf/quantization/quantize_model.py
Outdated
```diff
@@ -394,6 +395,7 @@ def compress_weights(
     scale_estimation: Optional[bool] = None,
     gptq: Optional[bool] = None,
     lora_correction: Optional[bool] = None,
+    backup_precision: Optional[BackupPrecision] = BackupPrecision.INT8_ASYM,
```
```diff
- backup_precision: Optional[BackupPrecision] = BackupPrecision.INT8_ASYM,
+ backup_mode: Optional[CompressWeightsMode] = CompressWeightsMode.INT8_ASYM,
```
I would like to propose this change. It is more general and consistent with what compression mode will be used for backup operations.
cc' @AlexKoff88, @MaximProshin
In this case we should add CompressWeightsMode.FP, which doesn't make sense.
If CompressWeightsMode.FP means that NNCF does not compress backup weights, then None can be used.
@l-bat what do you think about renaming backup_precision -> backup_mode and introducing a BackupMode or BackupCompressWeightsMode class:

```python
class BackupMode(StrEnum):
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"
```

cc' @AlexKoff88
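A runnable sketch of the proposed enum (using a `str` + `Enum` mixin for portability; NNCF's actual `StrEnum` base class may differ):

```python
from enum import Enum

class BackupMode(str, Enum):
    # Sketch of the options proposed in the thread.
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"

# String-valued members compare equal to plain strings and round-trip
# through their values, which is convenient for configs and CLI flags.
assert BackupMode("int8_sym") is BackupMode.INT8_SYM
assert BackupMode.INT8_ASYM == "int8_asym"
```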
docs/usage/post_training_compression/weights_compression/Usage.md
Outdated
```diff
         )

-        if node.node_name in ignored_names or self._backup_precision == BackupPrecision.FP:
+        if node.node_name in ignored_names or self._backup_mode == BackupMode.NONE:
```
This code leads to different numbers of nodes in nodes_to_compress and all_weight_params, which causes an error in the AWQ algorithm. The bug can also be reproduced by adding Embeddings to the ignored_scope.
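A hypothetical minimal reproduction of the mismatch (toy names, not NNCF's real data structures): filtering only one of the two parallel lists by the ignored scope makes their lengths diverge, which breaks algorithms like AWQ that expect a one-to-one correspondence.

```python
nodes_to_compress = ["embedding", "linear_1", "linear_2"]
ignored_names = {"embedding"}

# Only the weight-parameter list is filtered by the ignored scope...
all_weight_params = [name for name in nodes_to_compress if name not in ignored_names]
# ...so the two lists no longer have the same number of entries:
assert len(all_weight_params) != len(nodes_to_compress)

# Filtering both lists the same way restores the invariant:
nodes_to_compress = [name for name in nodes_to_compress if name not in ignored_names]
assert all_weight_params == nodes_to_compress
```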
Aligned all_weight_params and nodes_to_compress below.
```diff
@@ -310,6 +313,7 @@ def apply(
     dataset: Optional[Dataset] = None,
 ) -> TModel:
     self._set_backend_entity(model)
+    # nodes_to_compress includes nodes from the ignored scope to be added to bitwidth_distribution_str
     nodes_to_compress = self._get_nodes_to_compress(graph)
```
I'd rename it to avoid confusion with the actual list of nodes to compress:

```diff
- nodes_to_compress = self._get_nodes_to_compress(graph)
+ candidates_to_compress = self._get_nodes_to_compress(graph)
```
```python
# Filter the weight parameters that should remain in their original floating-point precision
all_weight_params = [w_params for w_params in all_weight_params if w_params.compression_config is not None]
# Remove nodes in the ignored scope from nodes_to_compress
nodes_to_compress = [node for node in nodes_to_compress if node.node_name not in ignored_names]
```
Then it makes sense to collect statistics for this limited list of nodes:

```python
if dataset is not None and self._sensitivity_metric != SensitivityMetric.WEIGHT_QUANTIZATION_ERROR:
    activations = self._get_activations(dataset, self._subset_size, nodes_to_compress, graph, model)
```
To be more consistent, I'd extract the list of nodes to compress from all_weight_params. It would guarantee the same content in both lists.
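The reviewer's suggestion can be sketched like this (the `WeightParams` class below is a simplified, hypothetical stand-in for NNCF's weight-parameter structure): derive the node list from the already-filtered `all_weight_params`, so the two collections cannot diverge.

```python
from dataclasses import dataclass

@dataclass
class WeightParams:
    # Hypothetical minimal fields for the sketch.
    node_name: str
    compression_config: object

all_weight_params = [
    WeightParams("linear_1", "int4_asym"),
    WeightParams("linear_2", "int8_asym"),
    WeightParams("embedding", None),  # stays in floating point
]

# Keep only parameters that will actually be compressed...
all_weight_params = [wp for wp in all_weight_params if wp.compression_config is not None]
# ...and extract the node list from the filtered parameters themselves.
nodes_to_compress = [wp.node_name for wp in all_weight_params]

assert nodes_to_compress == ["linear_1", "linear_2"]
```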
```yaml
  num_int8: 290
tinyllama_awq_backup_mode_none_backend_OV:
  metric_value: 0.85679
```
nncf/quantization/quantize_model.py
Outdated
```diff
@@ -394,6 +395,7 @@ def compress_weights(
     scale_estimation: Optional[bool] = None,
     gptq: Optional[bool] = None,
     lora_correction: Optional[bool] = None,
+    backup_mode: Optional[BackupMode] = BackupMode.INT8_ASYM,
```
Optional[BackupMode] is the same as Union[BackupMode, None]. What is the behavior if the user passes None to backup_mode? Perhaps the same behavior should be used here as for ratio.
I aligned behavior with subset_size: Optional[int] = 128.

```python
if backup_mode is None:
    raise ValueError("Invalid backup_mode specified. Please choose from the available BackupMode options.")
```
If you still throw an error, then there is no point in indicating in the type hints that such a value can be used. The same applies to subset_size.
```diff
- backup_mode: Optional[BackupMode] = BackupMode.INT8_ASYM,
+ backup_mode: BackupMode = BackupMode.INT8_ASYM,
```
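The suggested non-Optional signature can be sketched as follows (the stub function and enum are hypothetical, not NNCF's actual API): when `None` is rejected at runtime anyway, the type hint should not advertise it, and the runtime check disappears.

```python
from enum import Enum

class BackupMode(str, Enum):
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"

def compress_weights_stub(backup_mode: BackupMode = BackupMode.INT8_ASYM) -> str:
    # No Optional[...] in the hint, so no `if backup_mode is None` guard is needed.
    return backup_mode.value

assert compress_weights_stub() == "int8_asym"
assert compress_weights_stub(BackupMode.NONE) == "none"
```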
nncf/quantization/quantize_model.py
Outdated
```python
            "Default values of `ratio` (1) and `group_size` (-1) parameters can not be overridden"
        )

    if backup_mode != BackupMode.INT8_ASYM:
```
As far as I understand, if the user calls compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_ASYM), then no error will occur, am I right?
But an error will occur if the user calls compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_SYM). Why do we have this restriction?
An error occurs for all backup_mode values except for the default value (BackupMode.INT8_ASYM). This indicates that the other options are not supported for 8-bit compression.
Is the default value (BackupMode.INT8_ASYM) supported for 8-bit compression? If not, you should raise an error for compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_ASYM) as well.
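A hedged sketch of the validation being discussed (the helper name and check are assumptions; only the behavior described in the thread is reproduced): for 8-bit primary modes, any backup_mode other than the default raises, since there are no lower-precision layers that would need a backup.

```python
from enum import Enum

class CompressWeightsMode(str, Enum):
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"
    INT4_ASYM = "int4_asym"

class BackupMode(str, Enum):
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"

def check_backup_mode(mode: CompressWeightsMode, backup_mode: BackupMode) -> None:
    # Hypothetical validation: 8-bit primary modes accept only the default backup_mode.
    int8_modes = (CompressWeightsMode.INT8_SYM, CompressWeightsMode.INT8_ASYM)
    if mode in int8_modes and backup_mode != BackupMode.INT8_ASYM:
        raise ValueError("backup_mode can not be overridden for 8-bit compression modes")

check_backup_mode(CompressWeightsMode.INT4_ASYM, BackupMode.NONE)  # accepted
try:
    check_backup_mode(CompressWeightsMode.INT8_SYM, BackupMode.INT8_SYM)
except ValueError:
    pass  # raises, matching the behavior described in the thread
```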
Changes
Add functionality to determine the backup precision used for layers that are not quantized to the primary precision; it defaults to INT8_ASYM.
Example: compress weights to INT4_ASYM channel-wise (group_size=-1), except embeddings, convolutions, and last linear layers, which remain in their original floating-point precision.
statistics:
For mode=CompressWeightsMode.INT4_ASYM, backup_mode=BackupMode.INT8_ASYM, and a non-empty ignored_scope, the statistics string contains three different precisions:
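As an illustration only (hypothetical layer counts and formatting, not NNCF's actual statistics output), a mixed distribution over primary, backup, and ignored floating-point layers could be summarized like this:

```python
from collections import Counter

# Toy per-layer precision assignments: 20 primary, 5 backup, 3 ignored (fp).
layer_precisions = ["int4_asym"] * 20 + ["int8_asym"] * 5 + ["fp16"] * 3

counts = Counter(layer_precisions)
total = sum(counts.values())
# Build a percentage breakdown string, one entry per precision.
stats = ", ".join(f"{p}: {100 * n / total:.0f}%" for p, n in counts.items())
print(stats)
```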
Reason for changes
To define the backup mode for compress_weights.
Related tickets
Tests