-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mldb 1209 statstable fncts noise injection #884
base: master
Are you sure you want to change the base?
Conversation
…d laplacian noise to the counts to allow the usage of stats tables training examples to train a classifier
Conflicts: plugins/stats_table_procedure.cc testing/testing.mk
Conflicts: jml/math/xdiv.h plugins/stats_table_procedure.cc testing/MLDB-873_stats_table_test.py testing/testing.mk
…9_statstable_fncts_noise_injection
…l issue where the rounding was always done to the lower value
…l issue where the rounding was always done to the lower value
…l issue where the rounding was always done to the lower value
plugins/stats_table_procedure.cc
Outdated
@@ -372,11 +376,19 @@ run(const ProcedureRunConfig & run, | |||
/* STATS TABLE FUNCTION */ | |||
/*****************************************************************************/ | |||
|
|||
static const std::string INJECT_NOISE_DOC_STR = | |||
"Inject laplacian noise to counts. This is useful when training " | |||
"a classifier on the examples that were used to genereate the counts. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genereate -> generate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we say that this is acceptable only if the count values are relatively large compare to the noise? How large?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we add more on label data leakage in the documentation of stats table functions. I feel this comment is hard to understand without an example.
|
||
mldb = mldb_wrapper.wrap(mldb) # noqa | ||
|
||
class MLDB1209StatstableBiasNoiseTest(MldbUnitTest): # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had not realized that this is also lacking some validation point. I will add these too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done that.
{ | ||
} | ||
|
||
bool injectNoise; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to transition to C++ 11 member initialization (i.e. bool injectNoise = false)
@@ -245,6 +249,7 @@ struct StatsTablePosNegFunctionConfig { | |||
|
|||
std::string outcomeToUse; | |||
|
|||
bool injectNoise; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
|
||
struct NoiseInjector { | ||
|
||
NoiseInjector() : mu(0), b(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize defaults in the declaration and kill the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after very minor changes to constructor / member initialization.
This is Francois' branch with a change by vbisserie to make it compile and a unit test for the noise generation functions. A small issue was found in the NoiseInjector::add_noise where the rounding was always done to the lower value.