Analyse precision recall curve #59

KoenLoeffen · 2023-05-26T04:59:46Z

I have two questions:

The precision-recall curve is a trade-off between the minimum similarity and the percentage matched, so in the ideal case you want both precision and recall to be as high as possible. However, I found in my results that the model with the highest precision and recall isn't always the best. Am I missing something?
How would I set the optimal threshold for the similarity? Is this also based on the precision recall curve?

MaartenGr · 2023-05-28T04:32:46Z

The precision-recall curve is a trade-off between the minimum similarity and the percentage matched, so in the ideal case you want both precision and recall to be as high as possible. However, I found in my results that the model with the highest precision and recall isn't always the best. Am I missing something?

The precision-recall curve is only an approximation, because we do not have the ground truth available. Ideally we still want it to be as high as possible, but since it remains an approximation, a higher curve does not guarantee that the matches are actually better.
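To make the approximation concrete, here is a minimal sketch of how such a curve can be built without ground truth, following the description in the question: the minimum similarity stands in for precision and the fraction of pairs matched at that threshold stands in for recall. The function name `approximate_pr_curve`, the example scores, and the use of `>=` for the cut-off are assumptions for illustration, not PolyFuzz's actual code.

```python
def approximate_pr_curve(similarities, thresholds):
    """Return (threshold, fraction_matched) pairs.

    Assumption: the threshold plays the role of "precision" and the
    fraction of pairs with similarity >= threshold plays the role of
    "recall". No ground-truth labels are involved anywhere.
    """
    total = len(similarities)
    if total == 0:
        return []
    curve = []
    for threshold in thresholds:
        matched = sum(1 for score in similarities if score >= threshold)
        curve.append((threshold, matched / total))
    return curve


# Hypothetical similarity scores, one per matched pair.
scores = [0.95, 0.90, 0.72, 0.55, 0.30]
curve = approximate_pr_curve(scores, [0.25, 0.5, 0.75, 1.0])

for threshold, fraction in curve:
    print(f"min similarity {threshold:.2f} -> {fraction:.0%} matched")
```

Because both axes come from the similarity scores themselves, a model that assigns high scores to wrong matches would still produce a high curve, which is one way the "best" curve and the best model can disagree.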

How would I set the optimal threshold for the similarity? Is this also based on the precision recall curve?

Yes, that is the main purpose of the precision-recall curve as defined in PolyFuzz. It helps you understand what the threshold would be to get a certain amount of matches and the relative accuracy of the results.
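As a rough illustration of reading a threshold off the curve, the sketch below finds the highest minimum similarity that still keeps a chosen share of the pairs. It works on a plain list of scores; the helper name `threshold_for_match_fraction` and the example values are hypothetical, and it is one possible way to pick a threshold rather than something PolyFuzz provides.

```python
import math


def threshold_for_match_fraction(similarities, target_fraction):
    """Highest threshold at which at least `target_fraction` of the
    pairs have similarity >= threshold (hypothetical helper)."""
    if not similarities:
        raise ValueError("similarities must not be empty")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    ranked = sorted(similarities, reverse=True)
    # Number of top-ranked pairs needed to reach the target share.
    needed = math.ceil(target_fraction * len(ranked))
    return ranked[needed - 1]


# Hypothetical similarity scores, one per matched pair.
scores = [0.95, 0.90, 0.72, 0.55, 0.30]

# Keep at least half of the pairs: the three best scores are needed.
half = threshold_for_match_fraction(scores, 0.5)
print(half)

# Keep every pair: the threshold drops to the lowest score.
everything = threshold_for_match_fraction(scores, 1.0)
print(everything)
```

Whether the matches above the chosen threshold are good enough is still worth checking by hand on a sample, since the curve itself is only an approximation.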
