
latent padding training and inference different #660

Open
cdfan0627 opened this issue Jan 12, 2025 · 2 comments
cdfan0627 commented Jan 12, 2025

Hello, I'd like to ask why, during CogVideoX 1.5 I2V model training, the video latent's shape is padded first and the image latent is zero-padded afterwards. The following code is from lora_trainer.py:

```python
patch_size_t = self.state.transformer_config.patch_size_t
if patch_size_t is not None:
    ncopy = latent.shape[2] % patch_size_t
    first_frame = latent[:, :, :1, :, :]  # Get first frame [B, C, 1, H, W]
    latent = torch.cat([first_frame.repeat(1, 1, ncopy, 1, 1), latent], dim=2)
    assert latent.shape[2] % patch_size_t == 0

latent = latent.permute(0, 2, 1, 3, 4)
image_latents = image_latents.permute(0, 2, 1, 3, 4)
padding_shape = (latent.shape[0], latent.shape[1] - 1, *latent.shape[2:])
latent_padding = image_latents.new_zeros(padding_shape)
image_latents = torch.cat([image_latents, latent_padding], dim=1)
```

At inference time, however, the image latent is zero-padded first, and only then is the first frame prepended. The following code is from pipeline_cogvideox_image2video.py:

```python
padding_shape = (
    batch_size,
    num_frames - 1,
    num_channels_latents,
    height // self.vae_scale_factor_spatial,
    width // self.vae_scale_factor_spatial,
)

latent_padding = torch.zeros(padding_shape, device=device, dtype=dtype)
image_latents = torch.cat([image_latents, latent_padding], dim=1)

if self.transformer.config.patch_size_t is not None:
    first_frame = image_latents[:, : image_latents.size(1) % self.transformer.config.patch_size_t, ...]
    image_latents = torch.cat([first_frame, image_latents], dim=1)
```

In theory, training and inference should behave the same for the inference results to be correct, so I'd like to ask: which one is right, the training code or the inference code?
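To make the difference concrete, here is a minimal standalone sketch (not the repository code; all dimensions are made up, assuming patch_size_t = 2 and 13 latent frames roughly as in CogVideoX 1.5) that traces both padding orders on dummy tensors:

```python
import torch

# Hypothetical dimensions; values are random, only shapes and the
# placement of zero vs. non-zero frames matter.
B, C, F, H, W = 1, 16, 13, 8, 8
patch_size_t = 2

latent = torch.randn(B, C, F, H, W)          # video latent [B, C, F, H, W]
image_latents = torch.randn(B, C, 1, H, W)   # encoded first frame

# --- training-style order: pad the video latent, then zero-pad image latents ---
ncopy = latent.shape[2] % patch_size_t                        # 13 % 2 = 1
first_frame = latent[:, :, :1, :, :]
latent_t = torch.cat([first_frame.repeat(1, 1, ncopy, 1, 1), latent], dim=2)
lat = latent_t.permute(0, 2, 1, 3, 4)                         # [B, 14, C, H, W]
img = image_latents.permute(0, 2, 1, 3, 4)                    # [B, 1, C, H, W]
pad = img.new_zeros(lat.shape[0], lat.shape[1] - 1, *lat.shape[2:])
img_train = torch.cat([img, pad], dim=1)                      # [B, 14, C, H, W]

# --- inference-style order: zero-pad first, then prepend a first-frame copy ---
img2 = image_latents.permute(0, 2, 1, 3, 4)                   # [B, 1, C, H, W]
pad2 = torch.zeros(B, F - 1, C, H, W)
img_inf = torch.cat([img2, pad2], dim=1)                      # [B, 13, C, H, W]
r = img_inf.shape[1] % patch_size_t                           # 13 % 2 = 1
img_inf = torch.cat([img_inf[:, :r], img_inf], dim=1)         # [B, 14, C, H, W]

print(img_train.shape, img_inf.shape)  # identical shapes
# Frame placement differs: img_train carries the image latent only at frame 0,
# while img_inf carries it at frames 0 and 1 (frame 0 is the prepended copy).
```

Under these assumptions the two orderings yield the same shape, but the non-zero image latent ends up at different frame indices, which is the alignment question above.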

@cdfan0627 cdfan0627 changed the title latent padding training and inference latent padding training and inference different Jan 12, 2025
@OleehyO OleehyO self-assigned this Jan 13, 2025
OleehyO (Collaborator) commented Jan 13, 2025

For some historical reasons the code in the pipeline is written a bit verbosely; we recommend using the training code as the reference. Although the implementations differ, the final results are the same.

cdfan0627 (Author) commented
But the inference version looks more reasonable to me, because only then does each frame of image_latents align with the corresponding video latent.
