Dear authors, thanks for this work.

According to the paper (Appendix F.1, page 25): "To enforce independence of the step-size and stochastic gradient, we perform a backtracking line-search at the current iterate w_k using a mini-batch of examples that is independent of the mini-batch on which ∇f_ik(w_k) is evaluated."
I am not sure I understand this correctly: do you perform all the computations of the Armijo line search with a batch i (and its gradient) to find the step size eta, and then use that eta to take a gradient step with the gradient of an independent batch j?
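That is, something like this toy sketch (entirely my own code, on a made-up least-squares example, just to make the question concrete):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem; (X_i, y_i) and (X_j, y_j) play the roles of
# the two independent mini-batches (all names here are illustrative).
X_i, y_i = rng.standard_normal((32, 5)), rng.standard_normal(32)
X_j, y_j = rng.standard_normal((32, 5)), rng.standard_normal(32)
w = np.zeros(5)

def loss(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    return X.T @ (X @ w - y) / len(y)

# Armijo backtracking on batch i alone chooses eta ...
g_i = grad(w, X_i, y_i)
f_i = loss(w, X_i, y_i)
eta, beta, c = 1.0, 0.5, 0.1
while loss(w - eta * g_i, X_i, y_i) > f_i - c * eta * (g_i @ g_i):
    eta *= beta  # shrink eta until the Armijo condition holds on batch i

# ... and the step is then taken with the gradient of the independent batch j.
w = w - eta * grad(w, X_j, y_j)
```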
I ask because I read the implementation of SLS, and it seems that you use the same batch (x, y) for both the update and the line search, contrary to what the paper specifies. I understand this happens because you use the closure() function to perform the Armijo search, and the last Armijo iterate is what is actually kept as the final step (still using the same closure()).
Is there anywhere else in the code where you use this trick of forcing independence between eta and the gradient?
Thank you very much
Yes, your understanding of the implementation is right: SLS uses the same batch (x, y) for both the update and the line search, which is in line with the description of SLS in the paper. The main paper states that the line search is run on the same batch (x, y) on which the gradients are computed. The closure is a proxy for the loss function and returns the loss at the current parameters.
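Schematically, the single-batch scheme looks like this (a minimal sketch, not the actual sls code; it assumes closure() only does a forward pass and returns the mini-batch loss, and all names are illustrative):

```python
import torch

def armijo_step(params, closure, eta_max=1.0, beta=0.5, c=0.1, max_backtracks=50):
    # `closure` re-evaluates the loss on the SAME batch (x, y) that supplies
    # the gradient, so the line search and the update share one mini-batch.
    params = list(params)
    for p in params:
        p.grad = None                      # clear stale gradients
    loss0 = closure()                      # forward pass on batch (x, y)
    loss0.backward()                       # gradient on that same batch
    grads = [p.grad.detach().clone() for p in params]
    grad_norm_sq = sum((g * g).sum().item() for g in grads)
    w0 = [p.detach().clone() for p in params]
    f0 = loss0.item()
    eta = eta_max
    with torch.no_grad():
        for _ in range(max_backtracks):
            for p, w, g in zip(params, w0, grads):
                p.copy_(w - eta * g)       # tentative step w_k - eta * grad
            if closure().item() <= f0 - c * eta * grad_norm_sq:
                break                      # Armijo condition holds: keep it
            eta *= beta                    # otherwise shrink eta and retry
    return eta                             # params already hold the accepted step
```

Note that the last tentative iterate of the backtracking loop is exactly the point the parameters are left at, which matches your reading that the final step reuses the same closure().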