clarifying differences between available models #18

zeke · 2021-09-27T17:54:08Z

Hi @mehdidc 👋🏼 I'm a new team member at @replicate.

I was trying out your model on replicate.ai and noticed that the names of the models are a bit cryptic, so it's hard to know what differences to expect when using each.

Here's where those are declared:

feed_forward_vqgan_clip/predict.py

Lines 10 to 14 in dd640c0

    
MODELS = [
    "cc12m_32x1024_vitgan_v0.1.th",
    "cc12m_32x1024_vitgan_v0.2.th",
    "cc12m_32x1024_mlp_mixer_v0.2.th",
]

Looking at the source for cog's Input class, it looks like options can be a list of anything:

options: Optional[List[Any]] = None

I'm not sure if this is right, but maybe this means that each model could be declared as a tuple with an accompanying label:

MODELS = [
    ("cc12m_32x1024_vitgan_v0.1.th", "This model does x"),
    ("cc12m_32x1024_vitgan_v0.2.th", "This model does y"),
    ("cc12m_32x1024_mlp_mixer_v0.2.th", "This model does z"),
]

We could then display those labels on the model form on replicate.ai to make the available options more clear to users.
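As a rough sketch of the tuple idea, the labeled list could be split into plain filenames and a lookup table. The labels are the placeholders from above, and `MODEL_NAMES`, `MODEL_LABELS`, and `describe` are hypothetical names for illustration, not part of cog or of predict.py:

```python
# Placeholder labels only; the real descriptions would come from the model author.
MODELS = [
    ("cc12m_32x1024_vitgan_v0.1.th", "This model does x"),
    ("cc12m_32x1024_vitgan_v0.2.th", "This model does y"),
    ("cc12m_32x1024_mlp_mixer_v0.2.th", "This model does z"),
]

# Plain filenames, e.g. for an options list that only accepts strings.
MODEL_NAMES = [name for name, _ in MODELS]

# Lookup table so a form could show the label next to each filename.
MODEL_LABELS = dict(MODELS)


def describe(name: str) -> str:
    """Return 'filename: label', or just the filename if no label is known."""
    label = MODEL_LABELS.get(name)
    return f"{name}: {label}" if label else name
```

Keeping the filenames and labels in one list means the two cannot drift apart, while code that only needs the filenames can keep using a flat list.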

Curious to hear your thoughts!

cc @cjwbw @bfirsh @andreasjansson


mehdidc · 2021-10-01T11:30:35Z

Hi @zeke, sorry for the late answer, and thanks for the suggestion. You are absolutely right: the model names are not very informative. The thing is that the models all do the same thing in a sense (they were also trained on the same prompts dataset). Only the architecture differs (vitgan vs. mlp_mixer), and between 0.1 and 0.2 I used a different set of data augmentations. They are provided together because the user might prefer one option over another for a specific prompt. One way to avoid the naming problem would be to not expose the model choice explicitly, but rather to display a grid of images as output, as in ICGAN (https://replicate.ai/arantxacasanova/ic_gan), where each cell of the grid would be the image generated by one model.

So I am not totally sure yet. I will think about it; if you or anyone else has suggestions, I would be glad to hear them.
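To make the grid idea a bit more concrete, here is a minimal sketch of the layout step only: it computes where each model's image would be pasted in the grid. The function name, the square cell size, and the column count are all assumptions for illustration; the actual image generation and pasting are left out.

```python
from typing import List, Tuple


def grid_layout(names: List[str], cell: int, columns: int) -> List[Tuple[str, int, int]]:
    """Return (model name, x offset, y offset) per cell, filled row by row."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    return [
        (name, (i % columns) * cell, (i // columns) * cell)
        for i, name in enumerate(names)
    ]
```

With three models, 256-pixel cells, and two columns, the third model would wrap onto a second row.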

afiaka87 · 2021-10-03T13:45:46Z

@mehdidc @zeke

The distinguishing information is:
modelType: ["mlp_mixer", "vitgan"] -> basically "experimental (mlp_mixer) versus established (vitgan)"
version: ["v0.1", "v0.2"] -> not sure what the precise differences are here, @mehdidc ?
dimension: [128, 256, 512, 1024] -> correlates directly with accuracy of model. bigger is better, but slower.
depth: [8, 16, 32] -> number of hidden layers. correlates directly with accuracy of model. bigger is better, but slower.

this info is contained in the filename (albeit cryptically). The format is:
{dataset}_{depth}x{dimension}_{type}_{version}
where the curly braces mark placeholders and are not part of the name. So
cc12m_32x1024_vitgan_v0
gives you:
dataset: cc12m
depth: 32
dimension: 1024
type: vitgan
version: v0

From skimming your post, @zeke, am I correct in assuming you have a somewhat limited API to work with on replicate? There are a few ways this information could be presented. Perhaps the easiest would be to summarize this info and make it easy to get to from replicate.
