Accessing the intermediate SOM states during the training #182

fransua · 2021-03-30T16:00:08Z

Is your feature request related to a problem? Please describe.
I trained my SOM for 2000 epochs, and would like to store intermediate results (each 500 epochs), something like:

datainfo = loadCSVSet(:test,files,header=false)
som = initGigaSOM(datainfo, 20, 20, seed=seed)
radius_list = [10, 8.9, 7.8, 6.7, 5.6, 4.5, 3.4, 2.3, 1.2, 0.5, 0.1]
for i in 1:10:
    som = trainGigaSOM(som, datainfo, rStart=radius_list[i], rFinal=radius_list[i+1], epochs=200, radiusFun=linearRadius)
    e = embedGigaSOM(som, datainfo)
    e2 = distributed_collect(e)
    writedlm(string("GigaSOM_iker_1400k_embed_seed",seed,"_epochs",epochs,".tsv"),e2,'\t')
    open(f -> serialize(f, som), ("partly_trained_%s.jls", i), "w");
end

I want to assess if I did enough training. Problem is that in with this strategy I can only use a linearRadius, or do a very ugly hack
inputing specific radius function.

Describe the solution you'd like
Perhaps one could input starting/ending epoch/iteration to the train function (here:

GigaSOM.jl/src/analysis/core.jl

Line 130 in f4e712b

for j = 1:epochs

).

Describe alternatives you've considered
Allow to "do something" (call a function) each X epochs in order to serialize the som object, or save the coordinates.

Additional context
none

The text was updated successfully, but these errors were encountered:

exaexa · 2021-03-30T16:40:29Z

Hello! Would a per-epoch callback work for you? very roughly like:

trainGigaSOM(som, di, ...,
  beforeEachEpochDo = (epoch, data, som) ->
    if epoch%20 == 0
      e=embedGigaSOM(data, som)
      writedlm(...e...)
    end )

Anyway, 2000 epochs is a bit of overkill for normal use, so I guess you're basically trying to visualize how the SOM training progresses?

EDIT: one of the good ways to check if your SOM has been trained well is to plot your datapoints (cells) projected to a normal 2D scatterplot, and then plot the som.codes to them -- the distribution should roughly match the one of the datapoints, although will be much sparser. If the SOM is under-trained, you'll likely see that the "corners" of the data are not getting enough "coverage" by som.codes.

fransua · 2021-03-31T10:29:38Z

Hello,
thanks for the quick reply.
Sorry I don't really know Julia, I think what you are proposing is to create a function to be called each xx epochs (20 in the example), to store data or do whatever. This indeed would help, it would allow me to visualize the progress of the training.
One side question: can I create a gigasom object from an embedded array? or do I need to serialize the som instance? From your experience is serialization of these huge gigasom objects a good (possible) option?

Another part of my problem is if I train my data with 1000 epochs, stop the process, and then want to resume the training (after the 1000 epochs). As your epoch count starts a 1 (

GigaSOM.jl/src/analysis/core.jl

Line 130 in f4e712b

for j = 1:epochs

), the expRadius function will start again with this exponential decay if I resume the training after 1000 epochs.
My solution so far is to create a radius function hacking the rStart and rFinal parameters to account for the extra rounds of epochs:

function myRadius(initRadius::Float64, finalRadius::Float64, iteration::Int64, epochs::Int64)
    if (iteration + convert(Int, finalRadius)) <= convert(Int, initRadius)
        val = [expRadius(100.0)(20.0, 0.1, i, 5000) for i in 1:5000][iteration + convert(Int, finalRadius)]
    else
        val = [1 - 1 / 200 * i for i in 1:200][iteration + convert(Int, finalRadius) - convert(Int, initRadius)]
    end
end

(I know... super ugly)
So perhaps adding extra parameters to the trainGigaSOM function would make it easier to split and control the training.

About the 2000 epochs, between 500 and 1000 epochs I still see important movements (same seed), we are now trying 2000 epochs. And as this is very long (days) we (and the planet :P) would rather start from the already pre-trained som... The data is huge, millions of datapoints thousands of dimensions, and btw, we are very thankful for GigaSOM and to your team!

About the check you propose, I tried with a random example:

using GigaSOM
using Plots
using Random

 Random.seed!(1)
       d = randn(30000,5) .+ rand(0:1, 30000, 5).*10
       som = initGigaSOM(d, 20, 20)
       anim = @animate for train in 1:1:100
           radius = (100-train)^2/1000 + 0.1
           print("$train: $radius")
           print("\n")
           som = trainGigaSOM(som, d, epochs=1, rStart=radius, rFinal=radius)
           e = embedGigaSOM(som, d)
           e2 = embedGigaSOM(som, som.codes)
           scatter(e2[:,1], e2[:,2], ms=4, legend=false, alpha=0.5, color="grey")
           scatter!(e[:,1], e[:,2], ms=1, legend=false)
           annotate!(9, 21, text("epoch-$train (r:$(round(radius, digits=1)))", 9))
           xlims!(0, 20)
           ylims!(0, 21)
           savefig("test_$train.png")    
       end

gif(anim, "anim_fps15_1-100_r10-0.1.gif", fps = 15)

which gives me:

But I am not sure that the fact that datapoints are surrounding the map is due to a lack of training but rather to a too large radius.

For instance if I do the same keeping the radius fixed at 1.0 I got:

My interpretation is that after very few epochs with a small radius you will get a more or less homogeneous distribution of datapoints per codebooks.

The way I wanted to check for training was to visualize (measure?) the amount of changes between epochs. If the datapoints and codebooks are still moving a lot, than I would do more training.
Does this strategy sounds good to you?

thanks a lot for the answer, and for the software in general :)

exaexa · 2021-03-31T10:49:11Z

OK, adding the function callback to the TODO list, I'll hopefully get to it soon.

Regarding the SOM training: there's no good way to detect if the SOM is "optimal" -- you can have metrics like MQE etc, but these are equally well optimized by normal k-means and by SOM with a small radius (it's useful to think a bit about the kMeans/SOM correspondence).

With SOMs, the beginning of the training with large radius forces the SOM to become smooth and topologically correct (while ignoring complicated details, such as MQE). Dduring the smooth shift to smaller radius, the learned "global" shape hopefully stays as intact as possible while the optimization gradually optimizes more and more "raw" quantization error, as kMeans.

The optimal training is, say, an equilibrium of those -- if you do too much of the first phase, you'll basically waste the computation and the output's gonna be chunky; if you don't do enough of it you will get nice output but global structure won't make much sense. If you omit the "middle", you will have a good "global structure" but medium-size details within clusters will be mixed up. (Cytometry example: it's gonna be able to separate CD4s from CD8s, but Thelpers will be e.g. split into 2 clusters because there was no "zoom level" where this would be optimized).

For your visualizations:

embedding the SOM codes seems to give interesting result, actually I never thought about it this way (it's basically "embedding the codes on themselves", which doesn't make much sense as a projection, but it's cool nevertheless)
you may want to plot a simple 2D projection of the points (like, just first 2 dimensions) with 2D projection of the codes (also unembedded) so that you see how the SOM behaves in the "real space" throughout the learning process

fransua · 2021-03-31T13:52:09Z

I understand your point about the optimization
thank you for taking the time to answer!

... and indeed it is a much better visualization

(umap)

thanks a lot!

exaexa · 2021-03-31T14:53:17Z

UMAP doesn't convey much about what's actually happening geometrically :]

I wanted to show something like this (this is on 2D data for clarity but feel free to play with it):

The point: you see what the radius choice causes the SOM to do certain stuff. Notably, the "precise mapping" of the actual clusters happens all the way back in the last several steps. We implemented the exponential radius decay precisely because of forcing the SOM to spend some time in this step.

Also, the initialization is not optimal, this would get solved with a few more rounds with high radius (btw. this is usually an issue only in 2D)

(EDIT: compare with your original animation -- in the beginning, all datapoints are "far away" from the SOM borders, thus displayed on the borders of the embedding.)

Code is like yours, except:

d = randn(30000, 2) .+ rand(0:2, 30000, 2) .* 10

and

    scatter(som.codes[:, 1], som.codes[:, 2], ms = 4, legend = false, alpha = 0.5, color = "grey")
    scatter!(d[:, 1], d[:, 2], ms = 1, legend = false)

fransua · 2021-03-31T15:12:34Z

Nice! I see, thanks a lot!
I will give a try to this direct representation also,the problem is that with ~6000 dimensions I am afraid that taking only the first 2 would be misleading.
Anyway, this discussion helped me a lot (and produced many gifs for future "SOM" google image searches :P )

exaexa · 2021-03-31T15:23:50Z

Ah, 6000 dimensions... Suggest squashing it down a bit either by the oldschool PCA, or (usually better) random projections or (even better) random radial basis functions. What's the data?

fransua · 2021-03-31T15:40:12Z

ok, yes, we were going for the random projection (did not knew about random radial basis functions thanks!).
The data are protein-protein interaction (PPI) probabilities (110x110) in millions of subsamples of a global PPI network...

exaexa · 2021-03-31T16:43:22Z

okay wow, that sounds cool. Let us know if you get some nice graphics, we're always interested in user stories. :]

exaexa · 2021-04-01T07:18:44Z

Hi, can you try with the code from #183 ? The new callback parameter for trainGigaSOM is called eachEpoch, gets called after epochs with parameters (epoch, radius, codes), and also once before the first epoch starts with epoch = 0.

laurentheirendt · 2021-04-01T09:33:15Z

Leaving this issue open until checked

fransua · 2021-04-01T10:19:07Z

Hi,
thanks a lot!
got an error, I guess here:

https://github.com/LCSB-BioCore/GigaSOM.jl/blob/develop/src/analysis/core.jl#L161

epoch should be j I guess (or the other way around :P)
Also, as it is, if I want to do the embedding in the eachEpoch function right now I have to create a new Som object from the codes, which I guess is ok, using square root of the length of the codes to get the x/ydim, assuming that the grid is symmetric... but it sounds a bit weird. Wouldn't it be easier to directly pass the Som instead of the codes?
thanks!

exaexa · 2021-04-01T10:34:22Z

I see, sorry, my testing pipeline has totally failed :D There's already a fix in #184.

Regarding som -- you can just put it into the original SOM (should be okay). I'm not sure how the variable aliasing will work, will have a deeper look at it.

fransua · 2021-04-01T10:39:00Z

I saw your travis uses julia 1.5; the error raises only with julia 1.6 (don't know why in julia 1.5 this part of the code is not checked until called with a not None function).
ok I will plugin the new codes into the existing som, true.

exaexa · 2021-04-01T10:42:40Z

I just rewrote it a bit to actually work with a copy of the SOM (that should be done from the beginning anyway). Pushing in 5 minutes. :]

exaexa · 2021-04-01T10:50:10Z

(the code with the tests (running now, hopefully working correctly) is here: https://github.com/exaexa/GigaSOM.jl/tree/develop )

fransua · 2021-04-01T11:01:44Z

in julia 1.6 I got this:

... sorry I can't test further, I started with Julia this week and still don't know how to Pkg.add a local repo xD

exaexa · 2021-04-01T11:09:02Z

I'm battling the test framework in parallel, hopefully I get it to some shape in a few minutes :]

fransua · 2021-04-01T11:19:31Z

Thanks a lot!
(There is really no rush)

exaexa · 2021-04-01T11:26:44Z

no rush

I broke it, feeling a moral obligation to fix it asap :D :D

exaexa · 2021-04-01T11:30:57Z

Anyway the branch seems to have no problems anymore (https://github.com/exaexa/GigaSOM.jl/runs/2245496108 passed); we'll try to merge to develop ASAP. We'll push a version with this.

fransua · 2021-04-01T14:22:30Z

Ok, this is working nicely!
about the same time, but much less RAM usage than with our hack... and our code is now much cleaner!
thanks a lot!

and again thanks for the tool, this is game changing for us! :)

exaexa · 2021-04-01T19:00:30Z

OK, great to hear that! The change will (slowly) bubble to official package repo, should be eventually available in 0.6.5 if you need to depend on it reliably.

Thank you for the idea (and the pictures :D )!

fransua · 2021-04-01T22:22:56Z

thanks!
pictures are so satisfying... :)

exaexa · 2021-04-02T08:47:13Z

Btw could you share the 3D plotting code? I've failed to put that together properly (with 3param scatter I only got empty plots), and have some PRIME material for making animations (https://bioinfo.uochb.cas.cz/embedsom/vignettes/bones.html)

fransua · 2021-04-02T10:21:58Z

This is amazing!!

here is the code:

using GigaSOM
using Random
using Plots


function eachEpoch(epoch::Int64, radius::Float64, som_result::Som)
    if epoch %1 == 0
#         print("$epoch $radius\n")
        scatter3d(datapoints[:,1], datapoints[:,2], datapoints[:,3], ms=1, label="")
        scatter3d!(som_result.codes[:,1], som_result.codes[:,2], som_result.codes[:,3], 
            ms=4, alpha=0.5, color="grey", fg_legend = :transparent, label="", legendtitlefontsize=8,
            legendtitle="epoch: " * lpad(epoch, 3, " "))  # annotate not working in 3D :P
        Plots.frame(anim)
    end
end


Random.seed!(1)
datapoints = randn(30000, 3).+ rand(0:1, 30000, 3).* 10
som = initGigaSOM(datapoints, 20, 20, seed=13)  # yes, I tried a couple to get the visually perpendicular start
anim = Plots.Animation()
trainGigaSOM(som, datapoints, eachEpoch=eachEpoch, epochs=100)

gif(anim, "3d-anim_fps15_1-100_r20-0.1.gif", fps = 15)

Note that I am using the eachEpoch new parameter, and this has one down side in this case, I cannot use the @animate macro and this slows a bit the run (~30%).

exaexa · 2021-04-02T10:55:20Z

Great, thanks a lot! Also nice to have it here as a reference

exaexa · 2021-04-02T13:35:14Z

aaaaaaaaaaaaaaaaaaand we finally have a decent illustration of how the whole thing works! :D

fransua · 2021-04-02T13:51:12Z

very nice!!!
I bet this is going to be in many intro slides!!
(mine for sure xD)

exaexa · 2021-04-06T07:02:18Z

Pinning & renaming this for better future reference.

exaexa mentioned this issue Apr 1, 2021

support user-specified callbacks after each SOM training epoch #183

Merged

exaexa closed this as completed Apr 1, 2021

exaexa pinned this issue Apr 6, 2021

exaexa changed the title ~~Option to resume training at a given stage~~ Accessing the intermediate SOM states during the training Apr 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accessing the intermediate SOM states during the training #182

Accessing the intermediate SOM states during the training #182

fransua commented Mar 30, 2021

exaexa commented Mar 30, 2021 •

edited

Loading

fransua commented Mar 31, 2021

exaexa commented Mar 31, 2021 •

edited

Loading

fransua commented Mar 31, 2021 •

edited

Loading

exaexa commented Mar 31, 2021 •

edited

Loading

fransua commented Mar 31, 2021

exaexa commented Mar 31, 2021

fransua commented Mar 31, 2021

exaexa commented Mar 31, 2021

exaexa commented Apr 1, 2021

laurentheirendt commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

exaexa commented Apr 1, 2021 •

edited

Loading

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021 •

edited

Loading

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 2, 2021

fransua commented Apr 2, 2021 •

edited

Loading

exaexa commented Apr 2, 2021

exaexa commented Apr 2, 2021

fransua commented Apr 2, 2021

exaexa commented Apr 6, 2021

Accessing the intermediate SOM states during the training #182

Accessing the intermediate SOM states during the training #182

Comments

fransua commented Mar 30, 2021

exaexa commented Mar 30, 2021 • edited Loading

fransua commented Mar 31, 2021

exaexa commented Mar 31, 2021 • edited Loading

fransua commented Mar 31, 2021 • edited Loading

exaexa commented Mar 31, 2021 • edited Loading

fransua commented Mar 31, 2021

exaexa commented Mar 31, 2021

fransua commented Mar 31, 2021

exaexa commented Mar 31, 2021

exaexa commented Apr 1, 2021

laurentheirendt commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

exaexa commented Apr 1, 2021 • edited Loading

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021 • edited Loading

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 1, 2021

fransua commented Apr 1, 2021

exaexa commented Apr 2, 2021

fransua commented Apr 2, 2021 • edited Loading

exaexa commented Apr 2, 2021

exaexa commented Apr 2, 2021

fransua commented Apr 2, 2021

exaexa commented Apr 6, 2021

exaexa commented Mar 30, 2021 •

edited

Loading

exaexa commented Mar 31, 2021 •

edited

Loading

fransua commented Mar 31, 2021 •

edited

Loading

exaexa commented Mar 31, 2021 •

edited

Loading

exaexa commented Apr 1, 2021 •

edited

Loading

exaexa commented Apr 1, 2021 •

edited

Loading

fransua commented Apr 2, 2021 •

edited

Loading