
Independence of the step-size and stochastic gradient? #8

Algue-Rythme opened this issue Apr 27, 2021 · 2 comments

@Algue-Rythme

Dear authors,
Thanks for this work.

According to the paper, Appendix F.1 on page 25: "To enforce independence of the step-size and stochastic gradient, we perform a backtracking line-search at the current iterate w_k using a mini-batch of examples that is independent of the mini-batch on which ∇f_ik(w_k) is evaluated."

I am not sure I understand correctly:

  • do you perform all the Armijo line-search computations with a batch i (and its gradient) to find the learning rate eta, before using eta to perform the gradient step with the gradient of a different batch j?

I ask because I read the implementation of Sls, and it seems that you are using the same batch (x, y) for both the update and the line search, contrary to what the paper specifies. My understanding is that you use the closure() function to perform the Armijo search, and the last Armijo iterate is actually used as the final step (still with the same closure() function).
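To make sure we are talking about the same thing, here is a minimal, self-contained sketch of the scheme I have in mind (the helper names `armijo_backtracking` and `step_with_independent_batches` are mine, not from your code): the Armijo search runs entirely on one batch to pick eta, and the update then uses the gradient of an independent batch. Passing the same batch twice recovers what I currently see in the Sls implementation.

```python
import torch


def armijo_backtracking(loss_fn, params, grad, eta=1.0, c=0.1, beta=0.5, max_iters=20):
    """Shrink eta until f(w - eta*g) <= f(w) - c * eta * ||g||^2 holds."""
    with torch.no_grad():
        f0 = loss_fn()
        grad_norm_sq = sum((g ** 2).sum() for g in grad)
        for _ in range(max_iters):
            # Try the candidate step w - eta * g ...
            for p, g in zip(params, grad):
                p -= eta * g
            f_trial = loss_fn()
            # ... and undo it before deciding whether to shrink eta.
            for p, g in zip(params, grad):
                p += eta * g
            if f_trial <= f0 - c * eta * grad_norm_sq:
                break
            eta *= beta
    return eta


def step_with_independent_batches(model, loss_fn, batch_i, batch_j):
    """Pick eta with an Armijo search on batch_i only, then take the step
    with the gradient of batch_j (pass the same batch twice to recover the
    'same batch' behaviour)."""
    params = list(model.parameters())

    # Line search on batch i: its loss and its gradient define the search.
    x_i, y_i = batch_i
    closure_i = lambda: loss_fn(model(x_i), y_i)
    grad_i = torch.autograd.grad(closure_i(), params)
    eta = armijo_backtracking(closure_i, params, grad_i)

    # Update with the gradient of the independent batch j, scaled by that eta.
    x_j, y_j = batch_j
    grad_j = torch.autograd.grad(loss_fn(model(x_j), y_j), params)
    with torch.no_grad():
        for p, g in zip(params, grad_j):
            p -= eta * g
    return eta
```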

Is there anywhere else in the code where you use this trick of forcing independence between eta and the gradient?

Thank you very much

@Algue-Rythme (Author)

Dear authors,
@IssamLaradji
I am still interested in this topic and I would be glad if you could answer my concern.
Thank you

@Saurabh-29

Hi Algue,

Yes, your understanding of the implementation is right. SLS uses the same batch (x, y) for both the update and the line search, which is in line with the description of SLS in the paper. The main paper states that the line search is performed on the same batch (x, y) on which the gradients are computed. The closure is a proxy for the loss function and returns the loss at the current parameters.
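For reference, the usage pattern looks roughly like this (a minimal sketch; the model, data, and loss here are only illustrative, not taken from the repo): a single mini-batch (x, y) defines the closure, and step() uses that same closure for both the Armijo backtracking and the accepted update.

```python
import torch
import torch.nn.functional as F
import sls  # the Sls optimizer discussed in this issue

model = torch.nn.Linear(10, 2)
opt = sls.Sls(model.parameters())

# Illustrative data; any DataLoader yielding (x, y) batches works the same way.
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16,
)

for x, y in train_loader:
    opt.zero_grad()
    # The closure is just a loss proxy: it re-evaluates the loss of this same
    # batch (x, y) at whatever parameters the optimizer currently holds.
    closure = lambda: F.cross_entropy(model(x), y)
    opt.step(closure)  # the line search and the update both use batch (x, y)
```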

Regards,
