Implement strategy for assessing the quality of the model during lifelong training #87
Comments
A relevant theory paper I found: Understanding Continual Learning Settings with Data Distribution Drift Analysis -- It essentially describes the theory of data distribution shifts, proposes new concepts for analyzing model/data drift, and relates some existing lifelong-learning concepts to this phenomenon. Relevant sections: Sections 3.1, 3.2, 4.1, and 6.2
One idea for validation is to compute the CSA variation across contrasts on the test set of the spine-generic data.
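A minimal sketch of that idea (not the project's actual pipeline): given a CSV of predicted vs. manual CSA values on the spine-generic test set, compute the absolute CSA error per contrast and how much the per-contrast means spread out. The column names ("subject", "contrast", "csa_pred", "csa_manual") are assumptions for illustration.

```python
import pandas as pd

def csa_variation_across_contrasts(csv_path: str) -> pd.DataFrame:
    """Per-contrast absolute CSA error and its variation across contrasts."""
    df = pd.read_csv(csv_path)
    df["abs_csa_error"] = (df["csa_pred"] - df["csa_manual"]).abs()
    per_contrast = df.groupby("contrast")["abs_csa_error"].agg(["mean", "std", "count"])
    # A contrast-agnostic model should keep the spread of per-contrast means small.
    print("Std of per-contrast mean errors:", per_contrast["mean"].std())
    return per_contrast

# per_contrast = csa_variation_across_contrasts("spine_generic_test_csa.csv")  # hypothetical path
```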
With each new release adding more contrasts and datasets, we observed a concerning upward trend in absolute CSA error across all the deployed models, suggesting that the drift is too high and the model is losing its "contrast-agnostic"-ness in some sense. The number one suspicion is the dataset imbalance created by an unusually high number of T2w and T2star images in the training set of the later models (i.e. v2.4, v2.5, etc.). Next step:
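A rough way to quantify the suspected imbalance is to count training images per contrast. The sketch below assumes BIDS-style filenames where the contrast suffix is the last underscore-separated token (e.g. sub-XX_T2w.nii.gz); the directory layout and naming are assumptions, not the repo's actual structure.

```python
from collections import Counter
from pathlib import Path

def count_images_per_contrast(dataset_dir: str) -> Counter:
    """Count NIfTI images per contrast suffix under a dataset directory."""
    counts = Counter()
    for img in Path(dataset_dir).rglob("*.nii.gz"):
        contrast = img.name.replace(".nii.gz", "").split("_")[-1]
        counts[contrast] += 1
    return counts

# print(count_images_per_contrast("data/aggregated_training_set"))  # hypothetical path
```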
I created a balanced version of the aggregated dataset. Below are the details and the splits. Each contrast now has approximately 150 images (a hard-coded value chosen after looking at the total number of images per contrast). Except for a few contrasts (e.g. STIR), all contrasts have at least 150 images. balanced dataset statistics
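A minimal sketch of the capping strategy described above: keep at most `cap` randomly sampled images per contrast. The mapping of contrast to image list is a placeholder, not the actual dataset config.

```python
import random

def balance_by_contrast(images_by_contrast: dict[str, list[str]],
                        cap: int = 150, seed: int = 42) -> dict[str, list[str]]:
    """Cap each contrast at `cap` images, sampled reproducibly."""
    rng = random.Random(seed)
    balanced = {}
    for contrast, images in images_by_contrast.items():
        if len(images) > cap:
            balanced[contrast] = rng.sample(images, cap)
        else:
            # Contrasts with fewer images (e.g. STIR) keep everything they have.
            balanced[contrast] = list(images)
    return balanced
```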
Next step:
Something interesting happened -- it seems that an imbalance in the number of images per contrast in the aggregated dataset is not what is causing the CSA drift. Note that the per-contrast CSA errors are high for the T2star, DWI, and MToff (GRE-T1w) contrasts, yet the dataset statistics in the previous comment show that these contrasts each have 150 images in the training set. I am wondering whether any of the new contrasts added to the training set (after the original 6 contrasts in contrast-agnostic v2.0) are negatively impacting the performance on the remaining contrasts.
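One quick way to test the imbalance hypothesis numerically (a sketch with hypothetical inputs): if imbalance were driving the drift, the per-contrast CSA error should correlate negatively with the number of training images per contrast.

```python
import pandas as pd

def error_vs_training_count(errors: dict[str, float],
                            train_counts: dict[str, int]) -> float:
    """Spearman correlation between per-contrast CSA error and training-set size."""
    df = pd.DataFrame({
        "abs_csa_error": pd.Series(errors),
        "n_train": pd.Series(train_counts),
    }).dropna()
    corr = df["abs_csa_error"].corr(df["n_train"], method="spearman")
    print(df, f"\nSpearman correlation: {corr:.2f}")
    return corr
```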
Interesting investigation! Do you have the absolute CSA error per contrast for each model version?
Nice! Why does DWI have a lower error in v2.5 than in the balanced model (3.39 mm² vs 12.34 mm²)?
Figuring that the dataset imbalance is not the only problem, I trained another model with only DCM pathology data and 1 new contrast (closer to how model v2.3 was trained). It turns out that the per-contrast error is still high for a few contrasts despite the model not seeing several new contrasts. This makes me wonder whether adding new contrasts is not the problem, but rather, in this case, too many compressed cords in the data.
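One way to probe this hypothesis (a sketch; the CSV columns "pathology", "contrast", "abs_csa_error" are assumptions): stratify the per-contrast CSA error by pathology to see whether DCM (compressed cord) cases drive the regression.

```python
import pandas as pd

def error_by_pathology(csv_path: str) -> pd.DataFrame:
    """Mean absolute CSA error broken down by pathology and contrast."""
    df = pd.read_csv(csv_path)
    return df.groupby(["pathology", "contrast"])["abs_csa_error"].agg(["mean", "count"])
```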
As we are adding more contrasts and re-training the model over time (see e.g. #83, #74, ivadomed/canproco#46), we need to put in place a quality-check assessment of model performance shift across various data domains (i.e. monitor catastrophic forgetting).
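A sketch of the kind of quality gate this issue asks for (not an existing script in this repo): compare the per-domain absolute CSA error between the previous and the candidate model release, and flag any domain that regresses by more than a tolerance. The dictionary structure and the 1.0 mm² tolerance are illustrative assumptions.

```python
def check_forgetting(prev_errors: dict[str, float],
                     new_errors: dict[str, float],
                     tolerance_mm2: float = 1.0) -> list[str]:
    """Return domains whose absolute CSA error worsened beyond the tolerance."""
    regressions = []
    for domain, prev_err in prev_errors.items():
        new_err = new_errors.get(domain)
        if new_err is not None and new_err - prev_err > tolerance_mm2:
            regressions.append(f"{domain}: {prev_err:.2f} -> {new_err:.2f} mm^2")
    return regressions

# regressions = check_forgetting(errors_prev_release, errors_new_release)  # hypothetical inputs
# if regressions:
#     raise SystemExit("CSA error regressed on: " + "; ".join(regressions))
```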