
Backward probabilities (\beta) not necessary #1

Open
rakeshvar opened this issue Sep 5, 2020 · 5 comments

Comments


rakeshvar commented Sep 5, 2020

Namaste @maetshju ,
Great work! I have been waiting for CTC in Julia for a long time. I wrote a CTC implementation in Python a very long time ago, before there were big-name packages from Baidu and others.
After studying your code, I realized you do not need to calculate the backward probabilities and take a mean.
You can see my implementation here
You just need to take the corner-most value of the forward probabilities, and that is all.
You could try that and see whether you get the same or similar results.
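For concreteness, here is a rough sketch of what I mean (Python/NumPy, not the actual code from either repository; the name ctc_forward_loss and the tiny example are mine): run only the forward recursion over the blank-extended label and read the loss off the last one or two entries of the final row.

```python
import numpy as np

def ctc_forward_loss(probs, labels, blank=0):
    """CTC negative log-likelihood from the forward probabilities alone.
    probs: (T, K) per-frame class probabilities; labels: target sequence."""
    T = probs.shape[0]
    ext = [blank]
    for l in labels:
        ext += [l, blank]          # blank-extended label: b, l1, b, l2, b, ...
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t-1, s] + (alpha[t-1, s-1] if s > 0 else 0.0)
            # skip transition allowed between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s-2]:
                a += alpha[t-1, s-2]
            alpha[t, s] = a * probs[t, ext[s]]
    # the total likelihood sits in the last one or two cells of the final row
    p = alpha[T-1, S-1] + (alpha[T-1, S-2] if S > 1 else 0.0)
    return -np.log(p)
```

With two frames, two classes, uniform probabilities, and the label [1], the three paths "1-", "-1", and "11" each have probability 0.25, so the loss comes out as -log(0.75).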


maetshju commented Sep 5, 2020

I think that's an interesting idea. I had noticed that the loss values ended up being equivalent (or nearly so) at each time step, so only one time step's loss value would be needed. With the current implementation as it stands, though, I'm not sure I can skip the beta probabilities entirely: I need to calculate the gradients manually, because current releases of the autodiff package Zygote don't support array mutation.

I would certainly be interested in looking into skipping the beta calculations, though, since it would mean fewer instances of the index math that I have found tricky to get right. Perhaps after I get this merged into Flux: FluxML/Flux.jl#1287
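For what it's worth, the per-time-step equivalence can be checked numerically. Below is a hedged Python/NumPy sketch (my own names and conventions, not the package's Julia code): with beta defined over the remaining frames, sum_s alpha[t, s] * beta[t, s] gives the full sequence probability at every t, which is why each time step carries the same loss value.

```python
import numpy as np

def ctc_alpha_beta(probs, labels, blank=0):
    """Forward (alpha) and backward (beta) CTC lattices. Here beta[t, s]
    covers frames t+1..T-1, so alpha[t, s] * beta[t, s] summed over s
    equals the total sequence probability at every time step t."""
    T = probs.shape[0]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    skip = [s > 1 and ext[s] != blank and ext[s] != ext[s-2] for s in range(S)]
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t-1, s] + (alpha[t-1, s-1] if s > 0 else 0.0)
            if skip[s]:
                a += alpha[t-1, s-2]
            alpha[t, s] = a * probs[t, ext[s]]
    beta = np.zeros((T, S))
    beta[T-1, S-1] = 1.0
    if S > 1:
        beta[T-1, S-2] = 1.0
    for t in range(T-2, -1, -1):
        for s in range(S):
            b = beta[t+1, s] * probs[t+1, ext[s]]
            if s + 1 < S:
                b += beta[t+1, s+1] * probs[t+1, ext[s+1]]
            if s + 2 < S and skip[s+2]:
                b += beta[t+1, s+2] * probs[t+1, ext[s+2]]
            beta[t, s] = b
    return alpha, beta
```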

@rakeshvar

I was writing my own CTC and ran into ERROR: LoadError: Mutating arrays is not supported
😄

Maybe I will try avoiding array mutation by allocating \alpha as an array of arrays...
if that fails...
I will try to write the gradient myself... based on your gradient!
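As a sketch of the array-of-arrays idea (illustrative Python, not Zygote-facing Julia; ctc_loss_no_mutation is a made-up name): build each new alpha row as a fresh list instead of writing into a preallocated matrix, so nothing is ever mutated in place.

```python
import math

def ctc_loss_no_mutation(probs, labels, blank=0):
    """CTC forward pass where each alpha row is a fresh list built by a
    comprehension -- the mutation-free shape of code that an autodiff
    system without array-mutation support can, in principle, trace."""
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), len(probs)
    # first row: only the first blank and first label are reachable
    row = [probs[0][ext[s]] if s < 2 else 0.0 for s in range(S)]
    for t in range(1, T):
        def cell(s):
            a = row[s] + (row[s-1] if s > 0 else 0.0)
            if s > 1 and ext[s] != blank and ext[s] != ext[s-2]:
                a += row[s-2]
            return a * probs[t][ext[s]]
        row = [cell(s) for s in range(S)]  # new list each step, no writes
    return -math.log(row[-1] + (row[-2] if S > 1 else 0.0))
```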


maetshju commented Sep 8, 2020

Something that may work is to write a separate function, say, calcAlpha, that returns the indices of the probability values that are multiplied or added to reach the bottom-right corner of the alpha values. You would need either to set up a custom adjoint with @adjoint that returns nothing as the gradient for calcAlpha, or perhaps to use the @nograd macro. Then you would call calcAlpha from within your ctc function to get the indices, call sum or prod on your probability values indexed by the result of calcAlpha, and apply the usual -1 * log as necessary. That should allow Zygote to calculate the gradients for the loss value.
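The @adjoint/@nograd machinery is Zygote-specific, but the split being described can be sketched language-agnostically. The version below (Python/NumPy; calc_path and best_path_loss are made-up names) simplifies to the single most probable path, so the differentiable part really is just a gather, a log, and a sum over fixed integer indices; the full CTC sum would additionally need the add structure at branch points.

```python
import numpy as np

def calc_path(probs, labels, blank=0):
    """Non-differentiable helper (the calcAlpha analogue): a Viterbi-style
    DP over the blank-extended label, returning the (frame, class) index
    pairs of the single most probable path to the bottom-right corner."""
    T = probs.shape[0]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    v = np.full((T, S), -np.inf)      # log-domain path scores
    back = np.zeros((T, S), dtype=int)
    v[0, 0] = np.log(probs[0, blank])
    if S > 1:
        v[0, 1] = np.log(probs[0, ext[1]])
    for t in range(1, T):
        for s in range(S):
            cands = [s]
            if s > 0:
                cands.append(s - 1)
            if s > 1 and ext[s] != blank and ext[s] != ext[s-2]:
                cands.append(s - 2)
            best = max(cands, key=lambda c: v[t-1, c])
            v[t, s] = v[t-1, best] + np.log(probs[t, ext[s]])
            back[t, s] = best
    s = S - 1 if (S == 1 or v[T-1, S-1] >= v[T-1, S-2]) else S - 2
    idxs = []
    for t in range(T - 1, -1, -1):
        idxs.append((t, ext[s]))
        s = back[t, s]
    return idxs[::-1]

def best_path_loss(probs, labels, blank=0):
    # Differentiable part: gather fixed indices, take logs, and sum.
    idxs = calc_path(probs, labels, blank)
    return -sum(np.log(probs[t, k]) for t, k in idxs)
```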

@rakeshvar

Hmmm...
I could not find a way around the 'array mutation' limitation.
I was reading Graves' book, and when writing your own gradient code the betas come in handy, so it makes sense to calculate them. However, you do not need another loop: you can reuse the forward-pass code with the inputs flipped. That won't save much code either, though.
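A quick sketch of the flipped-inputs point (Python/NumPy; the function names are mine): in the convention where beta includes frame t's emission, the backward lattice is exactly the forward code run on time-reversed probabilities and the reversed label, flipped back along both axes. Then sum_s alpha[t, s] * beta[t, s] / y_t(l'_s) recovers the sequence probability at every t.

```python
import numpy as np

def ctc_alpha(probs, labels, blank=0):
    """Plain CTC forward lattice over the blank-extended label."""
    T = probs.shape[0]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t-1, s] + (alpha[t-1, s-1] if s > 0 else 0.0)
            if s > 1 and ext[s] != blank and ext[s] != ext[s-2]:
                a += alpha[t-1, s-2]
            alpha[t, s] = a * probs[t, ext[s]]
    return alpha

def ctc_beta_via_flip(probs, labels, blank=0):
    """Beta lattice (convention including frame t's emission), obtained by
    running the *same* forward code on time-reversed probabilities and the
    reversed label, then flipping the result back along both axes."""
    rev = ctc_alpha(probs[::-1], list(labels)[::-1], blank)
    return rev[::-1, ::-1]
```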

I will just do what you are doing. I used to think manual gradients were for 20th-century losers, but I see they can come in really handy. 😄

Let me know if you want me to look into anything specific in your code.

I had a cute toy example of CTC. You can check it out in the pictures in my Python repository.

@rakeshvar

I implemented a few versions of CTC in Julia.
https://github.com/rakeshvar/Explore-CTC-Loss.jl

Just did it for fun...
