Support backup precision option for WC #2978
Conversation
With introducing
Data type looks better to me.
IMHO: the suffixes "asym" and "sym" do not look consistent for a column named "data type". I would expect to see int8, int4, uint4, fp8_*, fp16, fp32 in a data type column. I would note that the quantization scheme can be understood from the data type as well. If you want to keep int4_asym, int4_sym, int8_sym, etc., I would suggest naming the column "weight compression mode".
nncf/quantization/quantize_model.py
Outdated
```diff
@@ -394,6 +395,7 @@ def compress_weights(
     scale_estimation: Optional[bool] = None,
     gptq: Optional[bool] = None,
     lora_correction: Optional[bool] = None,
+    backup_precision: Optional[BackupPrecision] = BackupPrecision.INT8_ASYM,
```
```diff
- backup_precision: Optional[BackupPrecision] = BackupPrecision.INT8_ASYM,
+ backup_mode: Optional[CompressWeightsMode] = CompressWeightsMode.INT8_ASYM,
```
I would like to propose this change. It is more general and consistent with what compression mode will be used for backup operations.
cc' @AlexKoff88, @MaximProshin
In this case we should add CompressWeightsMode.FP, which doesn't make sense.
If CompressWeightsMode.FP means that NNCF does not compress backup weights, then None can be used.
@l-bat what do you think about renaming backup_precision -> backup_mode and introducing a BackupMode or BackupCompressWeightsMode class:

```python
class BackupMode(StrEnum):
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"
```

cc' @AlexKoff88
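A runnable sketch of the proposed enum (using a `str` + `Enum` mixin for portability; NNCF's actual `StrEnum` base class may differ):

```python
from enum import Enum

class BackupMode(str, Enum):
    # Sketch of the options proposed in the thread.
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"

# String-valued members compare equal to plain strings and round-trip
# through their values, which is convenient for configs and CLI flags.
assert BackupMode("int8_sym") is BackupMode.INT8_SYM
assert BackupMode.INT8_ASYM == "int8_asym"
```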
docs/usage/post_training_compression/weights_compression/Usage.md
Outdated
```diff
         )

-        if node.node_name in ignored_names or self._backup_precision == BackupPrecision.FP:
+        if node.node_name in ignored_names or self._backup_mode == BackupMode.NONE:
```
This code leads to different numbers of nodes in nodes_to_compress and all_weight_params, which causes an error in the AWQ algorithm. The bug can also be reproduced by adding Embeddings to the ignored_scope.
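A hypothetical minimal reproduction of the mismatch (toy names, not NNCF's real data structures): filtering only one of the two parallel lists by the ignored scope makes their lengths diverge, which breaks algorithms like AWQ that expect a one-to-one correspondence.

```python
nodes_to_compress = ["embedding", "linear_1", "linear_2"]
ignored_names = {"embedding"}

# Only the weight-parameter list is filtered by the ignored scope...
all_weight_params = [name for name in nodes_to_compress if name not in ignored_names]
# ...so the two lists no longer have the same number of entries:
assert len(all_weight_params) != len(nodes_to_compress)

# Filtering both lists the same way restores the invariant:
nodes_to_compress = [name for name in nodes_to_compress if name not in ignored_names]
assert all_weight_params == nodes_to_compress
```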
Aligned all_weight_params and nodes_to_compress below.
```diff
@@ -310,6 +313,7 @@ def apply(
     dataset: Optional[Dataset] = None,
 ) -> TModel:
     self._set_backend_entity(model)
+    # nodes_to_compress includes nodes from the ignored scope to be added to bitwidth_distribution_str
     nodes_to_compress = self._get_nodes_to_compress(graph)
```
I'd rename it to avoid confusion with the actual list of nodes to compress:

```diff
- nodes_to_compress = self._get_nodes_to_compress(graph)
+ candidates_to_compress = self._get_nodes_to_compress(graph)
```
```python
# Filter the weight parameters that should remain in their original floating-point precision
all_weight_params = [w_params for w_params in all_weight_params if w_params.compression_config is not None]
# Remove nodes in the ignored scope from nodes_to_compress
nodes_to_compress = [node for node in nodes_to_compress if node.node_name not in ignored_names]
```
Then it makes sense to collect statistics for this limited list of nodes:

```python
if dataset is not None and self._sensitivity_metric != SensitivityMetric.WEIGHT_QUANTIZATION_ERROR:
    activations = self._get_activations(dataset, self._subset_size, nodes_to_compress, graph, model)
```
To be more consistent, I'd extract the list of nodes to compress from all_weight_params. It would guarantee the same content in both lists.
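The reviewer's suggestion can be sketched like this (the `WeightParams` class below is a simplified, hypothetical stand-in for NNCF's weight-parameter structure): derive the node list from the already-filtered `all_weight_params`, so the two collections cannot diverge.

```python
from dataclasses import dataclass

@dataclass
class WeightParams:
    # Hypothetical minimal fields for the sketch.
    node_name: str
    compression_config: object

all_weight_params = [
    WeightParams("linear_1", "int4_asym"),
    WeightParams("linear_2", "int8_asym"),
    WeightParams("embedding", None),  # stays in floating point
]

# Keep only parameters that will actually be compressed...
all_weight_params = [wp for wp in all_weight_params if wp.compression_config is not None]
# ...and extract the node list from the filtered parameters themselves.
nodes_to_compress = [wp.node_name for wp in all_weight_params]

assert nodes_to_compress == ["linear_1", "linear_2"]
```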
```yaml
  num_int8: 290
tinyllama_awq_backup_mode_none_backend_OV:
  metric_value: 0.85679
```
nncf/quantization/quantize_model.py
Outdated
```diff
@@ -394,6 +395,7 @@ def compress_weights(
     scale_estimation: Optional[bool] = None,
     gptq: Optional[bool] = None,
     lora_correction: Optional[bool] = None,
+    backup_mode: Optional[BackupMode] = BackupMode.INT8_ASYM,
```
Optional[BackupMode] is the same as Union[BackupMode, None]. What is the behavior if the user passes None to backup_mode? Perhaps the same behavior should be used here as for ratio.
I aligned behavior with subset_size: Optional[int] = 128.

```python
if backup_mode is None:
    raise ValueError("Invalid backup_mode specified. Please choose from the available BackupMode options.")
```
If you still throw an error, then there is no point in indicating in the type hints that such a value can be used. The same applies to subset_size.
```diff
- backup_mode: Optional[BackupMode] = BackupMode.INT8_ASYM,
+ backup_mode: BackupMode = BackupMode.INT8_ASYM,
```
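The suggested non-Optional signature can be sketched as follows (the stub function and enum are hypothetical, not NNCF's actual API): when `None` is rejected at runtime anyway, the type hint should not advertise it, and the runtime check disappears.

```python
from enum import Enum

class BackupMode(str, Enum):
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"

def compress_weights_stub(backup_mode: BackupMode = BackupMode.INT8_ASYM) -> str:
    # No Optional[...] in the hint, so no `if backup_mode is None` guard is needed.
    return backup_mode.value

assert compress_weights_stub() == "int8_asym"
assert compress_weights_stub(BackupMode.NONE) == "none"
```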
nncf/quantization/quantize_model.py
Outdated
```python
            "Default values of `ratio` (1) and `group_size` (-1) parameters can not be overridden"
        )

    if backup_mode != BackupMode.INT8_ASYM:
```
As far as I understand, if the user calls compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_ASYM), then no error will occur, am I right?
But an error will occur if the user calls compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_SYM). Why do we have this restriction?
An error occurs for all backup_mode values except for the default value (BackupMode.INT8_ASYM). This indicates that the other options are not supported for 8-bit compression.
Is the default value (BackupMode.INT8_ASYM) supported for 8-bit compression? If not, you should raise an error for compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_ASYM) as well.
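A hedged sketch of the validation being discussed (the helper name and check are assumptions; only the behavior described in the thread is reproduced): for 8-bit primary modes, any backup_mode other than the default raises, since there are no lower-precision layers that would need a backup.

```python
from enum import Enum

class CompressWeightsMode(str, Enum):
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"
    INT4_ASYM = "int4_asym"

class BackupMode(str, Enum):
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"

def check_backup_mode(mode: CompressWeightsMode, backup_mode: BackupMode) -> None:
    # Hypothetical validation: 8-bit primary modes accept only the default backup_mode.
    int8_modes = (CompressWeightsMode.INT8_SYM, CompressWeightsMode.INT8_ASYM)
    if mode in int8_modes and backup_mode != BackupMode.INT8_ASYM:
        raise ValueError("backup_mode can not be overridden for 8-bit compression modes")

check_backup_mode(CompressWeightsMode.INT4_ASYM, BackupMode.NONE)  # accepted
try:
    check_backup_mode(CompressWeightsMode.INT8_SYM, BackupMode.INT8_SYM)
except ValueError:
    pass  # raises, matching the behavior described in the thread
```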
Changes
Add functionality to determine the backup precision used for layers that are not quantized to the primary precision; it defaults to INT8_ASYM.
Example: compress weights to INT4_ASYM channel-wise (group_size=-1), except embeddings, convolutions, and last linear layers, which remain in their original floating-point precision.
statistics:
For mode=CompressWeightsMode.INT4_ASYM, backup_mode=BackupMode.INT8_ASYM, and a non-empty ignored_scope, the statistics string contains three different precisions:
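As an illustration only (hypothetical layer counts and formatting, not NNCF's actual statistics output), a mixed distribution over primary, backup, and ignored floating-point layers could be summarized like this:

```python
from collections import Counter

# Toy per-layer precision assignments: 20 primary, 5 backup, 3 ignored (fp).
layer_precisions = ["int4_asym"] * 20 + ["int8_asym"] * 5 + ["fp16"] * 3

counts = Counter(layer_precisions)
total = sum(counts.values())
# Build a percentage breakdown string, one entry per precision.
stats = ", ".join(f"{p}: {100 * n / total:.0f}%" for p, n in counts.items())
print(stats)
```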
Reason for changes
To define the backup mode for compress_weights.
Related tickets
Tests