Accessing the intermediate SOM states during the training #182
Hello! Would a per-epoch callback work for you? Very roughly like:

```julia
trainGigaSOM(som, di, ...,
    beforeEachEpochDo = (epoch, data, som) ->
        if epoch % 20 == 0
            e = embedGigaSOM(som, data)
            writedlm(...e...)
        end)
```

Anyway, 2000 epochs is a bit of an overkill for normal use, so I guess you're basically trying to visualize how the SOM training progresses?

EDIT: one of the good ways to check if your SOM has been trained well is to plot your datapoints (cells) projected to a normal 2D scatterplot, and then plot the …
Hello, another part of my problem is: if I train my data with 1000 epochs, stop the process, and then want to resume the training (after the 1000 epochs), your epoch count starts at 1 again (GigaSOM.jl/src/analysis/core.jl, line 130 in f4e712b), so the radius schedule restarts from the beginning. My solution so far is to create a radius function hacking the rStart and rFinal parameters to account for the extra rounds of epochs:
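(The original snippet did not survive in this export; below is a hypothetical sketch of such a hack. It assumes the radius function is called as radiusFun(rStart, rFinal, epoch, epochs), matching the training loop in src/analysis/core.jl, and that expRadius() is the default schedule -- both worth double-checking against the installed version.)

```julia
# Hypothetical reconstruction -- not the original poster's code.
# Shift the epoch counter so the radius decays as if these were
# epochs 1001..2000 of a single 2000-epoch schedule:
resumedRadius(rf, epochsDone, epochsTotal) =
    (rStart, rFinal, epoch, epochs) ->
        rf(rStart, rFinal, epoch + epochsDone, epochsTotal)

# resume training of a SOM that already went through 1000 epochs:
som = trainGigaSOM(som, di, epochs = 1000,
    radiusFun = resumedRadius(expRadius(), 1000, 2000))
```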
(I know... super ugly.) About the 2000 epochs: between 500 and 1000 epochs I still see important movements (same seed), so we are now trying 2000 epochs. And as this is very long (days), we (and the planet :P) would rather start from the already pre-trained SOM... The data is huge, millions of datapoints and thousands of dimensions. And btw, we are very thankful for GigaSOM and to your team!

About the check you propose, I tried with a random example, which gives me: … But I am not sure that the datapoints surrounding the map indicate a lack of training rather than a too-large radius. For instance, if I do the same keeping the radius fixed at 1.0, I get: … My interpretation is that after very few epochs with a small radius you will get a more or less homogeneous distribution of datapoints per codebook.

The way I wanted to check for training was to visualize (measure?) the amount of change between epochs. If the datapoints and codebooks are still moving a lot, then I would do more training. Thanks a lot for the answer, and for the software in general :)
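(One possible way to put a number on that "amount of change": snapshot the codebook at every epoch and look at the distance between consecutive snapshots. A minimal sketch, assuming the eachEpoch callback that appears later in this thread, with an (epoch, radius, som) argument order that should be verified against core.jl:)

```julia
using LinearAlgebra

snapshots = Vector{Matrix{Float64}}()
som2 = trainGigaSOM(som, di, epochs = 100,
    eachEpoch = (epoch, radius, som) -> push!(snapshots, copy(som.codes)))

# Frobenius distance between consecutive codebooks; training has
# plausibly settled once this flattens out near zero.
movement = [norm(snapshots[i+1] - snapshots[i]) for i in 1:length(snapshots)-1]
```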
OK, adding the function callback to the TODO list, I'll hopefully get to it soon.

Regarding the SOM training: there's no good way to detect if the SOM is "optimal" -- you can have metrics like MQE etc., but these are equally well optimized by normal k-means and by SOM with a small radius (it's useful to think a bit about the k-means/SOM correspondence). With SOMs, the beginning of the training with a large radius forces the SOM to become smooth and topologically correct (while ignoring complicated details, such as MQE). During the smooth shift to a smaller radius, the learned "global" shape hopefully stays as intact as possible while the optimization gradually optimizes more and more of the "raw" quantization error, as k-means does. The optimal training is, say, an equilibrium of those -- if you do too much of the first phase, you'll basically waste the computation and the output's gonna be chunky; if you don't do enough of it, you will get nice output but the global structure won't make much sense. If you omit the "middle", you will have a good "global structure" but medium-size details within clusters will be mixed up. (Cytometry example: it's gonna be able to separate CD4s from CD8s, but Thelpers will e.g. be split into 2 clusters because there was no "zoom level" where this would be optimized.) For your visualizations: …
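(To make the two-phase picture above concrete, here is a tiny plain-Julia illustration of an exponentially decaying radius schedule -- just the idea, not GigaSOM's actual expRadius implementation:)

```julia
rStart, rFinal, epochs = 10.0, 0.1, 100
radius(epoch) = rStart * (rFinal / rStart)^((epoch - 1) / (epochs - 1))

# early epochs: large radius, global smoothing / topology phase
# late epochs: small radius, local k-means-like quantization phase
foreach(e -> println("epoch $e: radius ≈ $(round(radius(e), digits = 2))"),
        (1, 25, 50, 75, 100))
```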
Nice! I see, thanks a lot!
Ah, 6000 dimensions... I'd suggest squashing it down a bit, either by the old-school PCA, or (usually better) random projections, or (even better) random radial basis functions. What's the data?
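(For reference, a random projection is only a few lines of plain Julia; the sizes here are illustrative, not the poster's:)

```julia
using Random

Random.seed!(1)
n, d, k = 10_000, 6_000, 50
X = randn(n, d)             # stand-in for the real n×6000 data matrix
P = randn(d, k) ./ sqrt(k)  # random Gaussian projection (Johnson-Lindenstrauss style)
Xsmall = X * P              # n×k, approximately distance-preserving
```

The random-RBF variant would replace the linear map with Gaussian responses to a set of random centers, which can also capture some nonlinear structure.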
Ok, yes, we were going for the random projection (did not know about random radial basis functions, thanks!).
okay wow, that sounds cool. Let us know if you get some nice graphics, we're always interested in user stories. :]
Hi, can you try with the code from #183? The new callback parameter for …
Leaving this issue open until checked
Hi, https://github.com/LCSB-BioCore/GigaSOM.jl/blob/develop/src/analysis/core.jl#L161
I see, sorry, my testing pipeline has totally failed :D There's already a fix in #184. Regarding …
I saw your Travis uses Julia 1.5; the error is raised only with Julia 1.6 (don't know why in Julia 1.5 this part of the code is not checked until called with a not …)
I just rewrote it a bit to actually work with a copy of the SOM (which should have been done from the beginning anyway). Pushing in 5 minutes. :]
(the code with the tests (running now, hopefully working correctly) is here: https://github.com/exaexa/GigaSOM.jl/tree/develop )
I'm battling the test framework in parallel; hopefully I'll get it into some shape in a few minutes :]
Thanks a lot!
I broke it, feeling a moral obligation to fix it asap :D :D
Anyway, the branch seems to have no problems anymore (https://github.com/exaexa/GigaSOM.jl/runs/2245496108 passed); we'll try to merge it to develop ASAP and push a version with this.
Ok, this is working nicely! And again, thanks for the tool; this is game-changing for us! :)
OK, great to hear that! The change will (slowly) bubble up to the official package repo; it should eventually be available in 0.6.5 if you need to depend on it reliably. Thank you for the idea (and the pictures :D )!
Btw could you share the 3D plotting code? I've failed to put that together properly (with 3param scatter I only got empty plots), and have some PRIME material for making animations (https://bioinfo.uochb.cas.cz/embedsom/vignettes/bones.html)
This is amazing!! Here is the code:
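(The original code block is not preserved in this export; below is a hypothetical reconstruction of a 3D scatter with Plots.jl -- the package choice, the n×3 embedding layout, and the camera settings are all assumptions:)

```julia
using Plots

e = randn(1000, 3)  # stand-in for an n×3 embedding matrix

scatter(e[:, 1], e[:, 2], e[:, 3];
    markersize = 1.5,
    markerstrokewidth = 0,  # no marker outlines, helps with dense clouds
    legend = false,
    camera = (45, 30))      # azimuth/elevation viewing angle
```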
Note that I am using the new eachEpoch parameter, and this has one downside in this case: I cannot use the …
Great, thanks a lot! Also nice to have it here as a reference.
very nice!!!
Pinning & renaming this for better future reference.
Is your feature request related to a problem? Please describe.
I trained my SOM for 2000 epochs, and would like to store intermediate results (every 500 epochs), something like:
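(The snippet is not preserved here; a hypothetical sketch of the intent, using the stdlib Serialization module -- the chunking scheme and file names are assumptions:)

```julia
using Serialization

# train in 500-epoch chunks, storing each intermediate SOM on disk
for chunk in 1:4
    global som = trainGigaSOM(som, di, epochs = 500)
    serialize("som-$(500 * chunk)-epochs.jls", som)
end
```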
I want to assess whether I did enough training. The problem is that with this strategy I can only use a linearRadius, or do a very ugly hack inputting a specific radius function.
Describe the solution you'd like
Perhaps one could input a starting/ending epoch/iteration to the train function (here: GigaSOM.jl/src/analysis/core.jl, line 130 in f4e712b).
Describe alternatives you've considered
Allow calling a function every X epochs, in order to serialize the som object or save the coordinates.
Additional context
none