Alternative Noises: Offset, Pyramid, Pink #56

Open
torridgristle opened this issue Mar 12, 2023 · 2 comments

Comments


torridgristle commented Mar 12, 2023

I've been seeing some promising results from using alternative noise methods to teach the model to adjust the lower-frequency components of an input. Pure randn noise is mostly high-frequency content, so Stable Diffusion (and possibly other diffusion models trained on randn noise) learned to create images with roughly the same average brightness and can't make noticeably brighter or darker ones. Sampling appears to use normal randn noise for offset and pyramid noise; I'm not certain about pink.
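To make the low-frequency argument concrete, here is a minimal sketch (the helper name `channel_mean_std` and the 64x64 size are illustrative assumptions, not from the linked articles). It measures how much plain randn noise moves the per-channel mean of a latent. Averaging H*W independent unit-variance samples gives a mean with standard deviation 1/sqrt(H*W), so for 64x64 the mean moves by only about 1/64.

```python
import torch


def channel_mean_std(height, width, samples=4096):
    """Std of the per-channel spatial mean of unit-variance white noise."""
    noise = torch.randn(samples, height, width)
    # The spatial mean is the lowest-frequency (DC) component of each sample.
    return noise.mean(dim=(-2, -1)).std().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    measured = channel_mean_std(64, 64)
    predicted = 1.0 / (64 * 64) ** 0.5  # 1/sqrt(H*W)
    print(f"measured {measured:.4f}, predicted {predicted:.4f}")
```

Because the noise barely perturbs the mean, the mean of the clean input stays largely visible in the noised input throughout training, which is consistent with the model learning to keep it rather than generate it.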

With offset noise, the model learns to shift the overall output level up or down more. It's a very small change to the noise generation for training: noise = torch.randn_like(latents) + 0.1 * torch.randn(latents.shape[0], latents.shape[1], 1, 1, device=latents.device)
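The one-liner above can be wrapped in a small helper. This is a sketch: the name `offset_noise_like` and the `strength` parameter are assumptions for illustration, with 0.1 taken from the snippet above. The per-channel offset is drawn once per (batch, channel) pair and broadcast over the spatial dimensions.

```python
import torch


def offset_noise_like(latents, strength=0.1):
    """White noise plus a random constant offset per (batch, channel)."""
    noise = torch.randn_like(latents)
    b, c = latents.shape[:2]
    # One scalar per sample and channel, broadcast over height and width.
    offset = torch.randn(b, c, 1, 1, device=latents.device, dtype=latents.dtype)
    return noise + strength * offset


if __name__ == "__main__":
    latents = torch.zeros(8, 4, 64, 64)
    noise = offset_noise_like(latents)
    print(noise.shape, noise.mean(dim=(-2, -1)).std())
```

With `strength=0.1` on a 64x64 latent, the per-channel mean of the noise should vary with a standard deviation of roughly sqrt(0.1**2 + 1/4096), about 0.10, compared with about 0.016 for plain randn.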

With pyramid noise, the input is masked more evenly across different frequencies, rather than mostly at high frequencies. The noise is generated by upscaling low-resolution noise by a random factor (they wanted to avoid always doing a 2x upscale), adding more noise after upsampling, and repeating. The code they use is given in the article; Ctrl+F for def pyramid_noise_like(x, discount=0.9):.
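For readers who cannot reach the article, here is a rough sketch of the idea as described above. It is not the article's code: the name `pyramid_noise_sketch`, the `max_levels` cap, the 2x to 4x scale range, and the final rescaling to unit standard deviation are all assumptions. It sums white noise drawn at progressively coarser resolutions, each upsampled to full size and down-weighted by `discount`.

```python
import random

import torch
import torch.nn.functional as F


def pyramid_noise_sketch(x, discount=0.9, max_levels=10):
    """Sum of white noise at several resolutions, upsampled to x's size."""
    b, c, h, w = x.shape
    noise = torch.randn_like(x)
    cur_h, cur_w = h, w
    for level in range(1, max_levels + 1):
        # Random shrink factor so levels are not always exactly 2x apart.
        r = random.uniform(2.0, 4.0)
        cur_h = max(1, int(cur_h / r))
        cur_w = max(1, int(cur_w / r))
        coarse = torch.randn(b, c, cur_h, cur_w, device=x.device, dtype=x.dtype)
        upsampled = F.interpolate(
            coarse, size=(h, w), mode="bilinear", align_corners=False
        )
        noise = noise + upsampled * discount**level
        if cur_h == 1 or cur_w == 1:
            break  # coarsest level reached
    # Assumed normalization: bring the sum back to roughly unit variance.
    return noise / noise.std()


if __name__ == "__main__":
    x = torch.zeros(4, 4, 64, 64)
    print(pyramid_noise_sketch(x).std())
```

A lower `discount` weights the coarse levels less, moving the result closer to plain randn noise.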

With pink noise (EleutherAI Discord message link), I'm not 100% sure of the benefit. It's apparently closer to the noise found in images, so it seems to make sense for image generation, but perhaps it'll be good for audio too.

In case you can't open the Discord link, the code provided by crowsonkb / alstroemeria313 is:

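import math
from dctorch import functional as DF
import torch

# Comments below are annotations added for readability; the code itself is as posted.

def sqrtm(x):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    # Not called below; it may be useful for deriving `color` from a channel covariance.
    vals, vecs = torch.linalg.eigh(x)
    return vecs * vals.sqrt() @ vecs.T

def colored_noise(shape, power=2.0, mean=None, color=None, device='cpu', dtype=torch.float32):
    # shape is (..., C, H, W); mean is a per-channel offset, color a (C, C) channel-mixing matrix.
    mean = torch.zeros([shape[-3]]) if mean is None else mean
    color = torch.eye(shape[-3]) if color is None else color
    # DCT frequencies along height and width, in [0, pi).
    f_h = math.pi * torch.arange(shape[-2], device=device, dtype=dtype) / shape[-2]
    f_w = math.pi * torch.arange(shape[-1], device=device, dtype=dtype) / shape[-1]
    freqs_sq = (f_h[:, None] ** 2 + f_w[None, :] ** 2)
    # Replace the zero (DC) frequency with its neighbor to avoid dividing by zero.
    freqs_sq[..., 0, 0] = freqs_sq[..., 0, 1]
    # Power spectral density proportional to 1 / f**power, normalized to mean 1.
    spd = freqs_sq ** -(power / 2)
    spd /= spd.mean()
    # White noise, mixed across channels, then shaped in the DCT domain.
    noise = torch.randn(shape, device=device, dtype=dtype)
    noise = torch.einsum('...chw,cd->...dhw', noise, color.to(device, dtype))
    noise = DF.idct2(noise * spd.sqrt())
    noise = noise + mean.to(device, dtype)[..., None, None]
    return noise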

Ideally this will help with generating lower frequency components in audio.
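For audio, a 1D analogue of the function above might look like the following sketch. It swaps the 2D inverse DCT for `torch.fft.rfft` / `torch.fft.irfft`. The name `colored_noise_1d`, the default `power=1.0`, and the rescaling to unit standard deviation are assumptions for illustration, not part of the quoted code.

```python
import torch


def colored_noise_1d(shape, power=1.0, device="cpu", dtype=torch.float32):
    """Noise over the last axis with power spectrum roughly 1 / f**power."""
    n = shape[-1]
    white = torch.randn(shape, device=device, dtype=dtype)
    spectrum = torch.fft.rfft(white, dim=-1)
    freqs = torch.fft.rfftfreq(n, device=device, dtype=dtype)
    # Reuse the first nonzero frequency for the DC bin to avoid dividing by zero.
    freqs[0] = freqs[1]
    # Scaling amplitude by f**(-power/2) scales power by f**(-power).
    shaped = torch.fft.irfft(spectrum * freqs ** (-power / 2), n=n, dim=-1)
    # Assumed normalization so the result has roughly unit variance overall.
    return shaped / shaped.std()


if __name__ == "__main__":
    noise = colored_noise_1d((2, 1, 2**15))
    print(noise.shape, noise.std())
```

Setting `power=0.0` leaves the spectrum flat (white noise); larger values concentrate more of the energy at low frequencies.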

@flavioschneider
Member

@torridgristle thanks for sharing! Do you have some results to show for audio? This is something I also wanted to try at some point. Very interested to see how the different types of noise compare.


StevenSchrembeck commented Mar 13, 2023

I wonder what would happen if you had separate networks each specialize in a different frequency band, with the loss computed only on the final mixed output of all of them. Perhaps with more of them in the 500 Hz to 4 kHz range, where our hearing is most sensitive, like you're saying.
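As a starting point for experimenting with that idea, here is a sketch of splitting a waveform into frequency bands that sum back to the original. The helper `split_into_bands`, the band edges, and the sample rate are hypothetical. Nothing here is an existing feature of this repository.

```python
import torch


def split_into_bands(x, sample_rate, edges_hz):
    """Split x (..., time) into len(edges_hz) + 1 bands using FFT masks."""
    n = x.shape[-1]
    spectrum = torch.fft.rfft(x, dim=-1)
    freqs = torch.fft.rfftfreq(n, d=1.0 / sample_rate, device=x.device)
    bounds = [0.0, *edges_hz, float("inf")]
    bands = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        # Keep only the bins inside this band, zero the rest.
        kept = torch.where(mask, spectrum, torch.zeros_like(spectrum))
        bands.append(torch.fft.irfft(kept, n=n, dim=-1))
    return bands


if __name__ == "__main__":
    x = torch.randn(1, 16000)
    bands = split_into_bands(x, sample_rate=16000, edges_hz=[500.0, 4000.0])
    print(len(bands), torch.allclose(sum(bands), x, atol=1e-4))
```

Since the masks partition the spectrum, the bands add up to the input. One network per band could then be trained on targets built this way, with the loss taken on the sum of their outputs.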
