Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added linear regression to compute execution time #46

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

nbacquey
Copy link
Member

@nbacquey nbacquey commented Jan 9, 2019

Uses olsRegress from Statistics.Regression, and adds additional information to the reports and their display. Results on the micro benchmark are strange : some batches have almost perfect confidence, some others are disastrous.

Should be a step in fixing issue #40

Fixes #40, Added information to report display
@nbacquey nbacquey requested a review from aspiwack January 9, 2019 16:28
Copy link
Member

@aspiwack aspiwack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about the maths. And I think we want to replace the previous measurement by the linear regression, rather than adding the linear regression to the average. If the average is inaccurate anyway, what point is there to keep it.

CC @mboes @qnikst

@@ -47,7 +50,10 @@ analyze ident samp = Report
, _reportBenchParams =
map (\(Parameter x) -> fromEnum x) $ benchmarkParameters ident
, _reportTimeInNanos =
totalDuration / trueNumIterations
totalDuration / trueNumIterations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beware! Trailing whitespace here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At any rate, we don't need that time anymore do we? We want to replace it with the slope of the linear regression. Am I correct? (cc @qnikst )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just keep totalDuration here. totalDuration on it's own is meaningful statistics. Sometime we may be interested in the total time of the test and do not care about the time of each measurement, for example that may happen if we want to test RTS options or different RTS optimizations.
So I'd keep the statistics and the name, but it should be just totalDuration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure of what you precisely mean here: should the value of _reportTimeInNanos be replaced by totalDuration, or by slope ? In the latter case, should a new record _totalDuration be created ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_reportTimeInNanos = _totalDuration

@@ -25,6 +25,9 @@ data Report = Report
{ _reportBenchName :: !Text
, _reportBenchParams :: [Int]
, _reportTimeInNanos :: !Double
, _linearRegressionTime :: !Double
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just replace _reportTimeInNanos, that way all current tools benefit from the improvement.

src/Hyperion/Analysis.hs Outdated Show resolved Hide resolved
Copy link
Contributor

@qnikst qnikst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, let me thank you for making this PR happen, I really happy to see this!
Can you address comments by @aspiwack.
If it's not had for you can you add a bit more haddocks for the linear regression and confidence fields (what is the range and what does value there mean), that would definitely help the next reader and the user of the library (though this part is optional).

@@ -47,7 +50,10 @@ analyze ident samp = Report
, _reportBenchParams =
map (\(Parameter x) -> fromEnum x) $ benchmarkParameters ident
, _reportTimeInNanos =
totalDuration / trueNumIterations
totalDuration / trueNumIterations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just keep totalDuration here. totalDuration on it's own is meaningful statistics. Sometime we may be interested in the total time of the test and do not care about the time of each measurement, for example that may happen if we want to test RTS options or different RTS optimizations.
So I'd keep the statistics and the name, but it should be just totalDuration

@qnikst
Copy link
Contributor

qnikst commented Jan 9, 2019

@nbacquey actually it would be nice if you address the comments, but if you want I can take that over can care for the comment and the merge, just tell me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants