
How to calculate R1 & R2 mentioned in the paper? #2

Open
jaytimbadia opened this issue Apr 17, 2022 · 1 comment

Comments

@jaytimbadia

I have not fully understood R1 and R2 yet, but I really want to.

As I understand it:

Larger R1 -> the curvature is denser, meaning the loss function has many ups and downs (a rough landscape with lots of hills and valleys), so signSGD's majority vote may get confused here, and maybe plain SGD should be used instead. Is that right?

Larger R2 -> more noise in the feature set?

Also, what does "dense gradients" mean: larger in magnitude, or that the components of the gradient vector are all roughly the same size?

Deep optimisation tricks like this don't have much accessible literature, and the maths is too complicated to understand just by looking at the paper.

Please help if possible by explaining what the components of R1 & R2 are.

Jay

@jxbz
Owner

jxbz commented May 25, 2022

Hi Jay,

Sorry for the late reply.

The idea in this paper was that phi measures the sparseness / denseness of a vector. When the vector is "dense" (meaning most of the components are similar in magnitude) then phi is close to one. On the other hand, when the vector is "sparse" (meaning a few components are much larger than the others) then phi is close to zero.

This means that phi(L) was supposed to measure whether the function is very curvy in just a few directions ( phi(L) ≈ zero ) or in lots of directions ( phi(L) ≈ one ). Similarly phi(sigma) measures whether the stochastic gradient is noisy in just a few directions or in lots of directions. And phi(g) measures the same property for the expected gradient.
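As a rough numerical illustration (a minimal sketch, assuming phi here is the density measure phi(v) = ||v||_1^2 / (d * ||v||_2^2) for a d-dimensional vector v, as defined in the paper), you could compute it like this:

```python
import numpy as np

def phi(v):
    """Density measure phi(v) = ||v||_1^2 / (d * ||v||_2^2).

    Ranges from 1/d (a single nonzero component, "sparse")
    up to 1 (all components equal in magnitude, "dense").
    Undefined for the zero vector.
    """
    v = np.asarray(v, dtype=float)
    d = v.size
    return np.sum(np.abs(v)) ** 2 / (d * np.sum(v ** 2))

dense = np.ones(1000)        # every component has the same magnitude
sparse = np.zeros(1000)
sparse[0] = 1.0              # one dominant component

print(phi(dense))   # 1.0
print(phi(sparse))  # 0.001 (= 1/d)
```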

In hindsight, it's not obvious that assumption 2 in that paper is a good model of curvature for deep neural networks. My more recent work has attempted to design better notions of curvature for neural nets. See for instance this paper: https://arxiv.org/abs/2002.03432.

In general, our understanding of the optimisation theory of deep neural nets is still evolving. I hope we have better and simpler math to describe it soon.

Jeremy
