
Int8 quantization for microcontroller #736

Closed
aqibsaeed opened this issue May 4, 2022 · 13 comments
Labels
bug (Something isn't working)

Comments

@aqibsaeed

Hi,

Is there a way to get an int8 quantized model that can eventually run on a microcontroller? I am converting a binary DenseNet Keras model (https://docs.larq.dev/zoo/api/literature/#binarydensenet28) as follows, but it does not result in a quantized model. Am I missing something here?

import larq_compute_engine as lce
import tensorflow as tf

# keras_model: the BinaryDenseNet28 Keras model from larq_zoo
tflite_model = lce.convert_keras_model(keras_model,
    inference_input_type=tf.int8,
    inference_output_type=tf.int8,
    experimental_default_int8_range=(-3,3))

Thanks in advance.

@aqibsaeed
Author

Any pointers on how to solve this?

@CNugteren
Contributor

How did you conclude that it does not result in a quantized model? I got an INT8/BNN model by running the code snippet below:

from pathlib import Path
import larq_zoo
import larq_compute_engine as lce
import tensorflow as tf

keras_model = larq_zoo.literature.BinaryDenseNet28(
    input_shape=None,
    input_tensor=None,
    weights="imagenet",
    include_top=True,
    num_classes=1000
)

tflite_model = lce.convert_keras_model(keras_model,
    inference_input_type=tf.int8, 
    inference_output_type=tf.int8,
    experimental_default_int8_range=(-3,3)
)

Path("model.tflite").write_bytes(tflite_model)

I inspected the resulting .tflite file in Netron.
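
As an alternative to Netron, a quick programmatic check is also possible. This is a sketch, assuming the stock TFLite Python Interpreter can read tensor metadata from an LCE flatbuffer without registering the custom ops:

import numpy as np
import tensorflow as tf

# List any tensors in model.tflite that are still floating point. Only tensor
# metadata is read; allocate_tensors() is deliberately not called, so the LCE
# custom ops should not need to be resolved (assumption, not verified).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
float_tensors = [
    d["name"]
    for d in interpreter.get_tensor_details()
    if d["dtype"] in (np.float32, np.float16)
]
print("float tensors:", float_tensors or "none")

If this prints "none", all tensors are integer-typed and the model should pass an int8-only check.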

@aqibsaeed
Author

aqibsaeed commented May 9, 2022

I tried the following:

from pathlib import Path
import larq as lq
import larq_zoo as lqz
import larq_compute_engine as lqce
import tensorflow as tf

model_a = lqz.literature.BinaryDenseNet28(
    input_shape=(32,32,3),
    weights=None,
    include_top=True,
    num_classes=10
)

with lq.context.quantized_scope(True):
  weights = model_a.get_weights()
  model_a.set_weights(weights)

tflite_model = lqce.convert_keras_model(model_a,
    inference_input_type=tf.int8,
    inference_output_type=tf.int8,
    experimental_default_int8_range=(-3,3)
)

Path("model.tflite").write_bytes(tflite_model)

Now if I submit it to https://plumerai.com/benchmark, I get the following response back: "Your model contains layers that aren't INT8 quantized but instead use floating-point values. This is not suitable for microcontrollers. You can find information on how to quantize your model in the TensorFlow documentation."

@aqibsaeed
Author

I do not get the above-mentioned error when I use your snippet. Maybe I am missing something here with experimental_default_int8_range.

@CNugteren
Contributor

CNugteren commented May 9, 2022

Indeed, if I run your snippet I get a model that has a float layer somewhere in the middle. You can see this in Netron:
[Netron screenshot: a float layer appears in the middle of the model]

@Tombana , do you perhaps have any idea why this might happen?

@aqibsaeed
Author

There's also a difference at the start.

Custom CIFAR-10 model:

[Netron screenshot: start of the custom CIFAR-10 model]

IN-1K model:

[Netron screenshot: start of the ImageNet-1K model]

@Tombana
Collaborator

Tombana commented May 9, 2022

I'm not sure; this sounds like a bug in the converter, and it might be in the TensorFlow converter itself. From what I see in the two code snippets, the only difference is in

with lq.context.quantized_scope(True):
  weights = model_a.get_weights()
  model_a.set_weights(weights)

That shouldn't affect the outcome of the converter though.

Which version of larq-compute-engine was used for this?

@aqibsaeed
Author

I tested converting after removing this:

with lq.context.quantized_scope(True):
  weights = model_a.get_weights()
  model_a.set_weights(weights)

but it still does not work.

Versions: flatbuffers 1.12, larq-compute-engine 0.7.0.

@aqibsaeed
Author

aqibsaeed commented May 9, 2022

I think the problem occurs if we specify input_shape or input_tensor when creating the model object with lqz.literature.BinaryDenseNet28.
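
A quick way to check this hypothesis is to convert the same architecture twice, with and without an explicit input shape, and compare the two results in Netron. A sketch (not verified here, and assuming BinaryDenseNet28 falls back to its default input shape when input_shape=None):

from pathlib import Path
import larq_zoo as lqz
import larq_compute_engine as lqce
import tensorflow as tf

# Convert the same architecture with the default input shape and with an
# explicit (32, 32, 3) input shape, then compare both models in Netron.
for name, shape in [("default_shape", None), ("cifar_shape", (32, 32, 3))]:
    model = lqz.literature.BinaryDenseNet28(
        input_shape=shape, weights=None, include_top=True, num_classes=10
    )
    tflite = lqce.convert_keras_model(
        model,
        inference_input_type=tf.int8,
        inference_output_type=tf.int8,
        experimental_default_int8_range=(-3, 3),
    )
    Path(f"model_{name}.tflite").write_bytes(tflite)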

@CNugteren
Contributor

CNugteren commented May 11, 2022

I managed to reproduce the issue with a minimal example, without the zoo or binary layers, using just 3 layers:

from pathlib import Path
import larq_compute_engine as lce
import tensorflow as tf

image_input = tf.keras.layers.Input(shape=(32, 32, 3))
x = image_input
x = tf.keras.layers.Conv2D(64, kernel_size=3)(x)
x = tf.keras.layers.MaxPool2D(3)(x)
x = tf.keras.layers.BatchNormalization()(x)
keras_model = tf.keras.Model(inputs=image_input, outputs=x)

tflite = lce.convert_keras_model(
    keras_model,
    inference_input_type=tf.int8,
    inference_output_type=tf.int8,
    experimental_default_int8_range=(-3, 3),
)
Path("model.tflite").write_bytes(tflite)

With larq-compute-engine==0.7.0, tensorflow==2.8.0, and keras==2.8.0 this gives:

[Netron screenshot: the converted model still contains float layers]

If I remove any of the three layers (Conv2D, MaxPool2D, or BatchNormalization) then it does produce the proper int8 model.

If I downgrade to larq-compute-engine==0.6.2, tensorflow==2.6.1, and keras==2.6.0 then it looks better, but the first Conv2D layer is still in float:

[Netron screenshot: the first Conv2D layer remains in float]

If, instead of LCE, I use the standard TFLite post-training quantizer, then everything becomes INT8 as expected:

import numpy as np
import tensorflow as tf
from pathlib import Path

# The 3-layer keras_model from above, exported as a SavedModel first.
keras_model.save("test_keras_model")

def representative_dataset():
    for _ in range(10):
        data = np.random.rand(1, 32, 32, 3)
        yield [data.astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("test_keras_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite = converter.convert()
Path("model.tflite").write_bytes(tflite)

So in conclusion this seems to be an LCE issue, and we will investigate further.

@aqibsaeed
Author

Interesting. Thank you for looking into it.

@CNugteren
Contributor

My colleague @lgeiger pointed me to a similar LCE issue: #421, which in turn points to an unresolved TensorFlow issue: tensorflow/tensorflow#40055. I believe your issue might be the same.

Let's look at the model produced by the following code in Netron:

x = tf.keras.layers.Conv2D(64, kernel_size=3)(x)
x = tf.keras.layers.MaxPool2D(3)(x)
x = tf.keras.layers.BatchNormalization()(x)

I see two tensors of the same size and dimensions: one for the bias of the Conv2D layer, of size 64, and one for the batch-normalisation mean values, also of size 64. They also have the same values (all zeros), which triggers tensorflow/tensorflow#40055.
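
To make this concrete, here is a sketch (rebuilding the 3-layer model from the minimal example above) that shows the two constants really are identical:

import numpy as np
import tensorflow as tf

# Rebuild the minimal 3-layer model from the example above.
image_input = tf.keras.layers.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(64, kernel_size=3)(image_input)
x = tf.keras.layers.MaxPool2D(3)(x)
x = tf.keras.layers.BatchNormalization()(x)
keras_model = tf.keras.Model(inputs=image_input, outputs=x)

# The Conv2D bias and the BatchNormalization moving mean are both all-zero
# tensors of shape (64,), so the converter sees two identical constants.
conv_bias = keras_model.layers[1].weights[1].numpy()       # Conv2D bias
bn_moving_mean = keras_model.layers[3].weights[2].numpy()  # BN moving_mean
print(conv_bias.shape, bn_moving_mean.shape)               # (64,) (64,)
print(np.array_equal(conv_bias, bn_moving_mean))           # True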

Until this is solved in TensorFlow (which might take a very long time, if ever), the work-around is to make sure this situation doesn't occur. There are several options. One option is to train the model for at least one step (see the sketch after the snippet below), since that will already change both tensors to non-zero values, and the chance that they remain equal is minimal; alternatively, do a full training session or load pre-trained weights. Another option is to initialize your model such that the two tensors differ from the start, e.g. as follows:

x = tf.keras.layers.Conv2D(64, kernel_size=3, bias_initializer=tf.keras.initializers.Constant(0.1))(x)
x = tf.keras.layers.MaxPool2D(3)(x)
x = tf.keras.layers.BatchNormalization()(x)
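
For the "train for one step" option, a minimal sketch with random dummy data (again using the 3-layer example; the data and loss here are arbitrary placeholders) could look like this:

import numpy as np
import tensorflow as tf

# Minimal 3-layer model, as in the example above.
image_input = tf.keras.layers.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(64, kernel_size=3)(image_input)
x = tf.keras.layers.MaxPool2D(3)(x)
x = tf.keras.layers.BatchNormalization()(x)
keras_model = tf.keras.Model(inputs=image_input, outputs=x)

# One training step on random data: the gradient update changes the Conv2D
# bias and the batch statistics change the BN moving mean, so the two
# constants should no longer be byte-identical afterwards.
keras_model.compile(optimizer="adam", loss="mse")
dummy_x = np.random.rand(8, 32, 32, 3).astype(np.float32)
dummy_y = np.random.rand(8, 10, 10, 64).astype(np.float32)  # the model's output shape
keras_model.fit(dummy_x, dummy_y, epochs=1, batch_size=8, verbose=0)

After this single step, converting with lce.convert_keras_model as before should give a fully int8 model.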

So I propose to close this issue if @aqibsaeed agrees, and keep the other LCE issue open to track the bug in TensorFlow.

CNugteren added the bug label on May 11, 2022
@aqibsaeed
Author

Got it! I just double-checked: if I load model weights, the conversion works fine. Thanks again for looking into this. Closing this now.
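
For reference, the flow that works for me looks roughly like this (a sketch; the weights file name is just a placeholder for my trained CIFAR-10 weights):

import larq_zoo as lqz
import larq_compute_engine as lqce
import tensorflow as tf

model_a = lqz.literature.BinaryDenseNet28(
    input_shape=(32, 32, 3), weights=None, include_top=True, num_classes=10
)
model_a.load_weights("binary_densenet28_cifar10.h5")  # placeholder path to trained weights

tflite_model = lqce.convert_keras_model(
    model_a,
    inference_input_type=tf.int8,
    inference_output_type=tf.int8,
    experimental_default_int8_range=(-3, 3),
)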
