Neural Audio Codec with Continuous Vectors #8

manmay-nakhashi · 2023-04-29T13:45:30Z

@lucidrains This is something different from EnCodec and SoundStream:
We use a neural audio codec to convert speech waveform into continuous vectors instead of discrete tokens
and
The audio encoder consists of several convolutional blocks with a
total downsampling rate of 200 for 16KHz audio.
which means they compress 16 kHz audio (16,000 samples per second) into 80 continuous latent vectors per second.
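The frame-rate arithmetic behind that statement, as a minimal sketch. It assumes the encoder simply drops any partial final hop; a real codec may pad instead, and the per-frame vector dimension is not given in the quote, so it is left out.

```python
# Frame-rate arithmetic for a codec encoder with a fixed total downsampling
# (hop) rate, using the 16 kHz / 200x numbers quoted above.

SAMPLE_RATE_HZ = 16_000   # input audio sample rate
DOWNSAMPLE_RATE = 200     # total stride of the convolutional encoder


def latent_frame_rate(sample_rate_hz: int, downsample_rate: int) -> float:
    """Number of continuous latent vectors produced per second of audio."""
    return sample_rate_hz / downsample_rate


def num_latent_frames(num_samples: int, downsample_rate: int) -> int:
    """Latent frames for a clip, assuming (hypothetically) the partial final hop is dropped."""
    return num_samples // downsample_rate


print(latent_frame_rate(SAMPLE_RATE_HZ, DOWNSAMPLE_RATE))       # 80.0 frames per second
print(num_latent_frames(3 * SAMPLE_RATE_HZ, DOWNSAMPLE_RATE))   # 240 frames for 3 s of audio
```

So "80" is a frame rate (vectors per second), not the size of each vector.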

lucidrains · 2023-04-29T17:50:32Z

Maintainer

@manmay-nakhashi yea, they do assert that, but never showed any experiments comparing the two

what they actually did in the paper follows all the other recent successes: they used the SoundStream architecture with residual VQ, and even applied a special loss to each quantizer codebook


manmay-nakhashi Apr 29, 2023
Author

@lucidrains so computation of continuous vectors is the key and that depends on

which can be computed from codebook embeddings and the quantized token IDs

will it be something like this?

Residual Vector Quantizer

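import torch

def residual_vq(h, codebooks):
    # h: encoder output (batch, dim, time), one latent frame per time step
    # codebooks: (num_quantizers, codebook_size, dim)
    x = h.transpose(1, 2)                      # (batch, time, dim)
    residual = x
    quantized = torch.zeros_like(x)
    indices = []

    for codebook in codebooks:
        # L2 distance from every frame's residual to every code: (batch, time, codebook_size)
        distances = torch.cdist(residual, codebook.expand(x.size(0), -1, -1))

        # Closest code index for each frame
        idx = distances.argmin(dim=-1)         # (batch, time)

        # Look up the selected codebook embeddings
        embeddings = codebook[idx]             # (batch, time, dim)

        # Accumulate the quantized (continuous) vector; pass what is left to the next quantizer
        quantized = quantized + embeddings
        residual = residual - embeddings
        indices.append(idx)

    # z: continuous latent (batch, dim, time); token IDs: (batch, time, num_quantizers)
    z = quantized.transpose(1, 2)
    return z, torch.stack(indices, dim=-1)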

where we concatenate the quantized vectors over time
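The point that the continuous vectors can be computed from the codebook embeddings and the quantized token IDs can be checked with a minimal NumPy sketch. It assumes greedy residual quantization and uses randomly initialised, hypothetical codebooks (a trained codec would learn them); `rvq_encode` and `rvq_decode` are illustrative names, not from the paper or any library.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Greedy residual VQ (hypothetical helper).

    x: (time, dim) continuous frames; codebooks: (num_quantizers, codebook_size, dim).
    Returns token IDs of shape (time, num_quantizers) and the quantized frames.
    """
    residual = x.copy()
    quantized = np.zeros_like(x)
    ids = []
    for codebook in codebooks:
        # Squared L2 distance from each residual frame to each code: (time, codebook_size)
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=-1)
        quantized += codebook[idx]
        residual -= codebook[idx]
        ids.append(idx)
    return np.stack(ids, axis=-1), quantized


def rvq_decode(ids, codebooks):
    """Continuous vectors = sum over quantizers of the embedding each token ID selects."""
    return sum(codebooks[q][ids[:, q]] for q in range(codebooks.shape[0]))


rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 256, 16))   # hypothetical: 4 quantizers, 256 codes, dim 16
frames = rng.normal(size=(80, 16))          # e.g. one second of latents at 80 frames/s

ids, quantized = rvq_encode(frames, codebooks)
z = rvq_decode(ids, codebooks)
print(ids.shape, z.shape)                   # (80, 4) (80, 16)
print(np.allclose(z, quantized))            # True
```

Only the integer IDs and the codebooks are needed to rebuild the continuous latent, which is why a model can work in the continuous space while the codec is still trained with residual VQ.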

lucidrains Apr 29, 2023
Maintainer

yup, all that is already taken care of here https://github.com/lucidrains/vector-quantize-pytorch#residual-vq
