Add StableDiffusion3 (#1820)
* Add VGG16 backbone (#1737)

* Add VGG16 backbone

* update names

* update tests

* update test

* add image classifier

* incorporate review comments

* Update test case

* update backbone test

* add image classifier

* classifier cleanup

* code reformat

* add vgg16 image classifier

* make vgg generic

* update doc string

* update docstring

* add classifier test

* update tests

* update docstring

* address review comments

* code reformat

* update the configs

* address review comments

* fix task saved model test

* update init

* code reformatted

* Add `ResNetBackbone` and `ResNetImageClassifier` (#1765)

* Add ResNetV1 and ResNetV2

* Address comments

* Add CSP DarkNet backbone and classifier (#1774)

* Add CSP DarkNet

* Add CSP DarkNet

* snake_case function names

* change use_depthwise to block_type

* Add `FeaturePyramidBackbone` and port weights from `timm` for `ResNetBackbone` (#1769)

* Add FeaturePyramidBackbone and update ResNetBackbone

* Simplify the implementation

* Fix CI

* Make ResNetBackbone compatible with timm and add FeaturePyramidBackbone

* Add conversion implementation

* Update docstrings

* Address comments

* Add DenseNet (#1775)

* Add DenseNet

* fix testcase

* address comments

* nit

* fix lint errors

* move description

* Add ViTDetBackbone (#1776)

* add vit_det_backbone

* update docstring

* code reformat

* fix tests

* address review comments

* bump year on all files

* address review comments

* rename backbone

* fix tests

* change back to ViT

* address review comments

* update image shape

* Add Mix transformer (#1780)

* Add MixTransformer

* fix testcase

* test changes and comments

* lint fix

* update config list

* modify testcase for 2 layers

* update input_image_shape -> image_shape (#1785)

* update input_image_shape -> image_shape

* update docstring example

* code reformat

* update tests

* Create __init__.py (#1788)

add missing __init__ file to vit_det

* Hack package build script to rename to keras-hub (#1793)

This is a temporary way to test out the keras-hub branch.
- Does a global rename of all symbols during package build.
- Registers the "old" name on symbol export for saving compat.
- Adds a github action to publish every commit to keras-hub as
  a new package.
- Removes our descriptions on PyPI temporarily, until we want
  to message this more broadly.

* Add CLIP and T5XXL for StableDiffusionV3 (#1790)

* Add `CLIPTokenizer`, `T5XXLTokenizer`, `CLIPTextEncoder` and `T5XXLTextEncoder`.

* Make CLIPTextEncoder a Backbone

* Add `T5XXLPreprocessor` and remove `T5XXLTokenizer`

Add `CLIPPreprocessor`

* Use `tf = None` at the top

* Replace manual implementation of `CLIPAttention` with `MultiHeadAttention`

* Add Bounding Box Utils (#1791)

* Bounding box utils

* - Correct test cases

* - Remove hard-coded tensorflow dtype

* - fix api gen

* - Fix import for test cases
- Use setup for converters test case

* - fix api_gen issue

* - FIx api gen

* - Fix api gen error

* - Correct test cases as per new api changes

* mobilenet_v3 added in keras-nlp (#1782)

* mobilenet_v3 added in keras-nlp

* minor bug fixed in mobilenet_v3_backbone

* formatting corrected

* refactoring backbone

* correct_pad_downsample method added

* refactoring backbone

* parameters updated

* Test case updated, expected output shape corrected

* code formatted with black

* testcase updated

* refactoring and description added

* comments updated

* added mobilenet v1 and v2

* merge conflict resolved

* version arg removed, and config options added

* input_shape changed to image_shape in arg

* config updated

* input shape corrected

* comments resolved

* activation function format changed

* minor bug fixed

* minor bug fixed

* added vision_backbone_test

* channel_first bug resolved

* channel_first cases working

* comments resolved

* formatting fixed

* refactoring

---------

Co-authored-by: ushareng <[email protected]>

* Pkgoogle/efficient net migration (#1778)

* migrating efficientnet models to keras-hub

* merging changes from other sources

* autoformatting pass

* initial consolidation of efficientnet_backbone

* most updates and removing separate implementation

* cleanup, autoformatting, keras generalization

* removed layer examples outside of efficient net

* many, mainly documentation changes, small test fixes

* Add the ResNet_vd backbone (#1766)

* Add ResNet_vd to ResNet backbone

* Addressed requested parameter changes

* Fixed tests and updated comments

* Added new parameters to docstring

* Add `VAEImageDecoder` for StableDiffusionV3 (#1796)

* Add `VAEImageDecoder` for StableDiffusionV3

* Use `keras.Model` for `VAEImageDecoder` and follow the coding style in `VAEAttention`

* Replace `Backbone` with `keras.Model` in `CLIPTextEncoder` and `T5XXLTextEncoder` (#1802)

* Add pyramid output for densenet, cspDarknet (#1801)

* add pyramid outputs

* fix testcase

* format fix

* make common testcase for pyramid outputs

* change default shape

* simplify testcase

* test case change and add channel axis

* Add `MMDiT` for StableDiffusionV3 (#1806)

* Add `MMDiT`

* Update

* Update

* Update implementation

* Add remaining bbox utils (#1804)

* - Add formats, iou, utils for bounding box

* - Add `AnchorGenerator`, `BoxMatcher` and `NonMaxSuppression` layers

* - Remove scope_name, not required.

* use default keras name scope

* - Correct format error

* - Remove layers as of now and keep them at model level till keras core supports them

* - Correct api_gen

* Fix timm conversion for resnet (#1814)

* Add `StableDiffusion3`

* Fix `_normalize_inputs`

* Separate CLIP encoders from SD3 backbone.

* Simplify `text_to_image` function.

* Address comments

* Minor update and add docstrings.

* Fix

* Update

* Rename to diffuser and decoder

* Define functional model

* Merge from upstream/master

* Delete old SD3

* Fix copyright

* Rename to keras_hub

* Address comments

* Update

* Fix CI

* Fix bugs that occurred in Keras 3.1

---------

Co-authored-by: Divyashree Sreepathihalli <[email protected]>
Co-authored-by: Sachin Prasad <[email protected]>
Co-authored-by: Matt Watson <[email protected]>
Co-authored-by: Siva Sravana Kumar Neeli <[email protected]>
Co-authored-by: Usha Rengaraju <[email protected]>
Co-authored-by: ushareng <[email protected]>
Co-authored-by: pkgoogle <[email protected]>
Co-authored-by: gowthamkpr <[email protected]>
9 people authored Sep 25, 2024
1 parent 3fbbeea commit 743adea
Showing 27 changed files with 2,582 additions and 870 deletions.
13 changes: 13 additions & 0 deletions keras_hub/api/models/__init__.py
@@ -66,6 +66,8 @@
from keras_hub.src.models.bloom.bloom_tokenizer import BloomTokenizer
from keras_hub.src.models.causal_lm import CausalLM
from keras_hub.src.models.causal_lm_preprocessor import CausalLMPreprocessor
from keras_hub.src.models.clip.clip_preprocessor import CLIPPreprocessor
from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
from keras_hub.src.models.csp_darknet.csp_darknet_backbone import (
CSPDarkNetBackbone,
)
@@ -260,14 +262,25 @@
from keras_hub.src.models.sam.sam_image_segmenter import SAMImageSegmenter
from keras_hub.src.models.seq_2_seq_lm import Seq2SeqLM
from keras_hub.src.models.seq_2_seq_lm_preprocessor import Seq2SeqLMPreprocessor
from keras_hub.src.models.stable_diffusion_3.stable_diffusion_3_backbone import (
StableDiffusion3Backbone,
)
from keras_hub.src.models.stable_diffusion_3.stable_diffusion_3_text_to_image import (
StableDiffusion3TextToImage,
)
from keras_hub.src.models.stable_diffusion_3.stable_diffusion_3_text_to_image_preprocessor import (
StableDiffusion3TextToImagePreprocessor,
)
from keras_hub.src.models.t5.t5_backbone import T5Backbone
from keras_hub.src.models.t5.t5_preprocessor import T5Preprocessor
from keras_hub.src.models.t5.t5_tokenizer import T5Tokenizer
from keras_hub.src.models.task import Task
from keras_hub.src.models.text_classifier import TextClassifier
from keras_hub.src.models.text_classifier import TextClassifier as Classifier
from keras_hub.src.models.text_classifier_preprocessor import (
TextClassifierPreprocessor,
)
from keras_hub.src.models.text_to_image import TextToImage
from keras_hub.src.models.vgg.vgg_backbone import VGGBackbone
from keras_hub.src.models.vgg.vgg_image_classifier import VGGImageClassifier
from keras_hub.src.models.vit_det.vit_det_backbone import ViTDetBackbone
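With these exports in place, the StableDiffusion3 task classes become importable from the public keras_hub API. A minimal usage sketch, assuming the public `keras_hub.models` namespace mirrors this `api/models/__init__.py` file; the preset id and the `generate` call are illustrative assumptions, not confirmed by this diff:

```python
import keras_hub

# Hypothetical preset id; check the published presets for the actual name.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3_medium"
)
# `generate` is the assumed task-level entry point for text-to-image tasks.
images = text_to_image.generate("a photograph of an astronaut riding a horse")
```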
1 change: 1 addition & 0 deletions keras_hub/api/tokenizers/__init__.py
@@ -21,6 +21,7 @@
from keras_hub.src.models.bart.bart_tokenizer import BartTokenizer
from keras_hub.src.models.bert.bert_tokenizer import BertTokenizer
from keras_hub.src.models.bloom.bloom_tokenizer import BloomTokenizer
from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
from keras_hub.src.models.deberta_v3.deberta_v3_tokenizer import (
DebertaV3Tokenizer,
)
File renamed without changes.
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from keras import dtype_policies
from keras import layers
from keras import ops

@@ -43,7 +44,7 @@ def __init__(
intermediate_activation = quick_gelu

self.layer_norm_1 = layers.LayerNormalization(
epsilon=0.00001, dtype=self.dtype_policy, name="layer_norm_1"
epsilon=1e-5, dtype="float32", name="layer_norm_1"
)
self.attention = layers.MultiHeadAttention(
num_heads,
@@ -52,7 +53,7 @@ def __init__(
name="attention",
)
self.layer_norm_2 = layers.LayerNormalization(
epsilon=0.00001, dtype=self.dtype_policy, name="layer_norm_2"
epsilon=1e-5, dtype="float32", name="layer_norm_2"
)
self.dense_1 = layers.Dense(
self.intermediate_dim, dtype=self.dtype_policy, name="dense_1"
@@ -67,6 +68,11 @@
def build(self, input_shape):
self.layer_norm_1.build(input_shape)
self.attention.build(input_shape, input_shape, input_shape)
# Before Keras 3.2, there was no setter for `dtype_policy`. Directly
# assign a `DTypePolicy` instead.
self.attention._softmax.dtype_policy = dtype_policies.DTypePolicy(
"float32"
)
self.layer_norm_2.build(input_shape)
self.dense_1.build(input_shape)
input_shape = self.dense_1.compute_output_shape(input_shape)
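The hunks above pin the two layer norms and the attention softmax to float32 so the CLIP encoder block stays numerically stable when the rest of the model runs under a lower-precision dtype policy. A standalone sketch of that pattern (head count and shapes are assumptions):

```python
from keras import dtype_policies, layers

# Keep normalization and the attention softmax in float32 regardless of the
# surrounding dtype policy.
layer_norm = layers.LayerNormalization(epsilon=1e-5, dtype="float32")
attention = layers.MultiHeadAttention(num_heads=8, key_dim=64, name="attention")
attention.build((None, 77, 512), (None, 77, 512), (None, 77, 512))
# Keras < 3.2 exposes no setter for `dtype_policy`, so the softmax sub-layer's
# policy is assigned directly, as in the diff above.
attention._softmax.dtype_policy = dtype_policies.DTypePolicy("float32")
```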
147 changes: 147 additions & 0 deletions keras_hub/src/models/clip/clip_preprocessor.py
@@ -0,0 +1,147 @@
# Copyright 2024 The KerasHub Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import keras

from keras_hub.src.api_export import keras_hub_export
from keras_hub.src.layers.preprocessing.start_end_packer import StartEndPacker
from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
from keras_hub.src.models.preprocessor import Preprocessor
from keras_hub.src.utils.tensor_utils import preprocessing_function

try:
import tensorflow as tf
except ImportError:
tf = None


@keras_hub_export("keras_hub.models.CLIPPreprocessor")
class CLIPPreprocessor(Preprocessor):
"""CLIP preprocessing layer which tokenizes and packs inputs.
This preprocessing layer will do 2 things:
- Tokenize the inputs using the `tokenizer`.
- Construct a dictionary with keys `"token_ids"`, `"padding_mask"`.
This layer can be used directly with `tf.data.Dataset.map` to preprocess
string data in the `(x, y, sample_weight)` format used by
`keras.Model.fit`.
The call method of this layer accepts three arguments, `x`, `y`, and
`sample_weight`. `x` can be a python string or tensor representing a single
segment, a list of python strings representing a batch of single segments,
or a list of tensors representing multiple segments to be packed together.
`y` and `sample_weight` are both optional, can have any format, and will be
passed through unaltered.
`CLIPPreprocessor` forces the input to have only one segment, as CLIP is
mainly used for generation tasks. For tasks having multi-segment inputs
like "glue/mnli", please use a model designed for classification purposes
such as BERT or RoBERTa.
Args:
tokenizer: A `keras_hub.models.CLIPTokenizer` instance.
sequence_length: The length of the packed inputs.
add_start_token: If `True`, the preprocessor will prepend the tokenizer
start token to each input sequence.
add_end_token: If `True`, the preprocessor will append the tokenizer
end token to each input sequence.
to_lower: bool. Whether to lower the inputs.
Call arguments:
x: A string, `tf.Tensor` or list of python strings.
y: Any label data. Will be passed through unaltered.
sample_weight: Any label weight data. Will be passed through unaltered.
sequence_length: Pass to override the configured `sequence_length` of
the layer.
"""

# TODO: Add example once we have a CLIP model.

tokenizer_cls = CLIPTokenizer

def __init__(
self,
tokenizer,
sequence_length=77,
add_start_token=True,
add_end_token=True,
to_lower=True,
**kwargs,
):
super().__init__(**kwargs)
self.tokenizer = tokenizer
self.packer = None
self.sequence_length = sequence_length
self.add_start_token = add_start_token
self.add_end_token = add_end_token
self.to_lower = to_lower

def build(self, input_shape):
# Defer packer creation to `build()` so that we can be sure tokenizer
# assets have loaded when restoring a saved model.
self.packer = StartEndPacker(
start_value=self.tokenizer.start_token_id,
end_value=self.tokenizer.end_token_id,
pad_value=self.tokenizer.end_token_id,
sequence_length=self.sequence_length,
return_padding_mask=True,
)
self.built = True

@preprocessing_function
def call(
self,
x,
y=None,
sample_weight=None,
sequence_length=None,
):
sequence_length = sequence_length or self.sequence_length
if self.to_lower:
x = tf.strings.lower(x)
token_ids, padding_mask = self.packer(
self.tokenizer(x),
sequence_length=sequence_length,
add_start_value=self.add_start_token,
add_end_value=self.add_end_token,
)
x = {
"token_ids": token_ids,
"padding_mask": padding_mask,
}
return keras.utils.pack_x_y_sample_weight(x, y, sample_weight)

def get_config(self):
config = super().get_config()
config.update(
{
"sequence_length": self.sequence_length,
"add_start_token": self.add_start_token,
"add_end_token": self.add_end_token,
"to_lower": self.to_lower,
}
)
return config

@property
def sequence_length(self):
"""The padded length of model input sequences."""
return self._sequence_length

@sequence_length.setter
def sequence_length(self, value):
self._sequence_length = value
if self.packer is not None:
self.packer.sequence_length = value
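
A usage sketch for the new preprocessor with a toy byte-pair vocabulary; the `vocabulary`/`merges` constructor arguments for `CLIPTokenizer` are assumed to follow the usual BytePairTokenizer pattern, and the token values are placeholders:

```python
import tensorflow as tf

from keras_hub.src.models.clip.clip_preprocessor import CLIPPreprocessor
from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer

# Toy vocabulary and merge rules, for illustration only.
vocab = {"<|startoftext|>": 0, "<|endoftext|>": 1, "a": 2, "b</w>": 3, "ab</w>": 4}
merges = ["a b</w>"]
tokenizer = CLIPTokenizer(vocabulary=vocab, merges=merges)
preprocessor = CLIPPreprocessor(tokenizer, sequence_length=8)

# Callable on strings directly, or inside a tf.data pipeline as the docstring
# notes; each element becomes a dict with "token_ids" and "padding_mask".
ds = tf.data.Dataset.from_tensor_slices(["ab ab"])
ds = ds.map(preprocessor, num_parallel_calls=tf.data.AUTOTUNE)
```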
@@ -13,12 +13,8 @@
# limitations under the License.
import pytest

from keras_hub.src.models.stable_diffusion_v3.clip_preprocessor import (
CLIPPreprocessor,
)
from keras_hub.src.models.stable_diffusion_v3.clip_tokenizer import (
CLIPTokenizer,
)
from keras_hub.src.models.clip.clip_preprocessor import CLIPPreprocessor
from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
from keras_hub.src.tests.test_case import TestCase


@@ -43,7 +39,7 @@ def test_preprocessor_basics(self):
input_data=self.input_data,
expected_output={
"token_ids": [[5, 1, 2, 1, 3, 4, 4, 4]],
"padding_mask": [[1, 1, 1, 1, 1, 0, 0, 0]],
"padding_mask": [[1, 1, 1, 1, 1, 1, 0, 0]],
},
)

@@ -54,17 +50,16 @@ def test_no_start_end_token(self):
sequence_length=8,
add_start_token=False,
add_end_token=False,
pad_with_end_token=False,
)
x = preprocessor(input_data)
self.assertAllEqual(x["token_ids"], [[1, 2, 1, 3, 0, 0, 0, 0]] * 4)
self.assertAllEqual(x["token_ids"], [[1, 2, 1, 3, 4, 4, 4, 4]] * 4)
self.assertAllEqual(x["padding_mask"], [[1, 1, 1, 1, 0, 0, 0, 0]] * 4)

def test_sequence_length_override(self):
input_data = " airplane airport"
preprocessor = CLIPPreprocessor(**self.init_kwargs)
x = preprocessor(input_data, sequence_length=4)
self.assertAllEqual(x["token_ids"], [5, 1, 2, 1])
x = preprocessor(input_data, sequence_length=5)
self.assertAllEqual(x["token_ids"], [5, 1, 2, 1, 4])

@pytest.mark.kaggle_key_required
@pytest.mark.extra_large
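The updated expectations follow from the packer configuration in `CLIPPreprocessor.build()`: padding reuses the end-of-text id, and the end token counts as a real position in the padding mask. A small sketch with the ids used by the asserts above (the test vocabulary itself is not shown in this diff):

```python
from keras_hub.src.layers.preprocessing.start_end_packer import StartEndPacker

# Ids mirror the asserts: 5 = start-of-text, 4 = end-of-text (also the pad
# value), 1/2/3 = content tokens.
packer = StartEndPacker(
    start_value=5,
    end_value=4,
    pad_value=4,
    sequence_length=8,
    return_padding_mask=True,
)
token_ids, padding_mask = packer([[1, 2, 1, 3]])
# token_ids    -> [[5, 1, 2, 1, 3, 4, 4, 4]]
# padding_mask -> [[1, 1, 1, 1, 1, 1, 0, 0]]  (start, content, and end are real)
```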