
why large influence is harmful #11

Open
zwxu064 opened this issue May 8, 2020 · 4 comments · May be fixed by #32

Comments

@zwxu064

zwxu064 commented May 8, 2020

Is there any way to evaluate the influence function values? I found that the most helpful training image in CIFAR10 has label 2, while the test image has label 4.

@nimarb
Owner

nimarb commented May 10, 2020

There is no absolute threshold for which influence values are harmful or helpful to the model for a single prediction. The values are only meaningful relative to one another, which means you can rank the training points by them.

@QingXuTHU

QingXuTHU commented Jul 14, 2020

I have the same question. I'm not sure whether the implementation calculates I_{up,loss} or -1/n I_{up,loss} from the original paper. I used it on MNIST and found that the average contribution is very negative when the chosen training samples and the test sample show the same digit. I'm not sure whether something went wrong in my experiments, but I hope you can check this part.
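For reference, my reading of equation (2) in the Koh and Liang paper is

I_{up,loss}(z, z_test) = -\nabla_\theta L(z_test, \hat\theta)^T H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)

where \hat\theta are the trained parameters and H_{\hat\theta} is the Hessian of the training objective at \hat\theta. As far as I can tell, the -1/n factor only appears when approximating the effect of actually removing a training point, not in the definition of I_{up,loss} itself.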

@chengrunyang

chengrunyang commented Dec 28, 2020

I have the same doubts as @QingXuTHU. It seems from the code that the implementation calculates 1/n * I_{up, loss}(z, z_test) and then takes the largest values as helpful and the smallest as harmful. However, I think it should be the opposite: points with negative I_{up, loss}(z, z_test) should be helpful, and those with positive I_{up, loss}(z, z_test) should be harmful (see the title of Figure 1 on page 3 of the Koh and Liang paper). I wonder if someone could help check this, thanks!
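If I read Section 2 of the paper correctly, removing z corresponds to upweighting it by epsilon = -1/n, so

L(z_test, \hat\theta_{-z}) - L(z_test, \hat\theta) \approx -(1/n) I_{up,loss}(z, z_test)

i.e. a positive I_{up,loss} means removing z would lower the test loss (z is harmful), and a negative I_{up,loss} means removing z would raise it (z is helpful).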

@lange-martin

> points with negative I_{up, loss}(z, z_test) should be helpful, and those with positive I_{up, loss}(z, z_test) should be harmful (see the title of Figure 1 on page 3 of the Koh and Liang paper)

I agree @chengrunyang. On page 6 of the paper, the authors also clarify that the most helpful images are those with the "most positive -I_{up, loss}" (or equivalently the most negative I_{up, loss}). Since np.argsort sorts the indices in ascending order, the harmful and helpful arrays should be switched in this code snippet:

harmful = np.argsort(influences)
helpful = harmful[::-1]

I stumbled upon this issue while testing how often a training sample is its own most influential sample. Most of the time, the training sample came out as its own most harmful sample, which is counter-intuitive. This bug in the code explains that behavior.
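For reference, a minimal sketch of the swapped assignment, using a hypothetical influences vector (the variable name matches the snippet above; the values are made up):

import numpy as np

# hypothetical influence values I_{up,loss}(z_i, z_test) for five training points
influences = np.array([0.8, -1.2, 0.1, -0.3, 2.5])

# ascending sort: most negative I_{up,loss} (most helpful) first
helpful = np.argsort(influences)   # -> [1, 3, 2, 0, 4]
# reversed: most positive I_{up,loss} (most harmful) first
harmful = helpful[::-1]            # -> [4, 0, 2, 3, 1]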

@lange-martin linked a pull request on Mar 24, 2022 that will close this issue.