Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overwrite mean and cov of distribution with parameter values if given? #82

Closed
marvinpfoertner opened this issue Jul 10, 2020 · 9 comments · Fixed by #164
Closed

Overwrite mean and cov of distribution with parameter values if given? #82

marvinpfoertner opened this issue Jul 10, 2020 · 9 comments · Fixed by #164
Assignees
Labels
improvement Improvements of existing functionality randvars Issues related to random variables refactoring Refactoring of existing functionality

Comments

@marvinpfoertner
Copy link
Collaborator

marvinpfoertner commented Jul 10, 2020

I feel like the self._mean should be overriden with

self._mean = lambda: parameters["mean"] if "mean" in parameters else mean.

def __init__(self, parameters=None, pdf=None, logpdf=None, cdf=None, logcdf=None, sample=None,
mean=None, cov=None, shape=None, dtype=None, random_state=None):
if parameters is None:
parameters = {} # sentinel value to avoid anti-pattern
self._parameters = parameters
self._pdf = pdf
self._logpdf = logpdf
self._cdf = cdf
self._logcdf = logcdf
self._sample = sample
self._mean = mean

@marvinpfoertner
Copy link
Collaborator Author

@JonathanWenger What's you take on this?

@nathanaelbosch
Copy link
Collaborator

Maybe a more general question: What is the parameters value used for?

Depending on that I would argue that we do not actually want two different ways to specify the mean and covariance.

@marvinpfoertner
Copy link
Collaborator Author

I agree, the parameters dict also makes it notoriously hard to implement slicing and masking for random variables.
Wouldn't it be better to put everything that is in parameters right now into members of the Distribution?
That way we avoid having unstructured members.

@JonathanWenger
Copy link
Contributor

I feel like the self._mean should be overriden with

self._mean = lambda: parameters["mean"] if "mean" in parameters else mean.

def __init__(self, parameters=None, pdf=None, logpdf=None, cdf=None, logcdf=None, sample=None,
mean=None, cov=None, shape=None, dtype=None, random_state=None):
if parameters is None:
parameters = {} # sentinel value to avoid anti-pattern
self._parameters = parameters
self._pdf = pdf
self._logpdf = logpdf
self._cdf = cdf
self._logcdf = logcdf
self._sample = sample
self._mean = mean

Yes. This should probably also throw a ValueError if mean is nonetheless specified.

Maybe a more general question: What is the parameters value used for?

Depending on that I would argue that we do not actually want two different ways to specify the mean and covariance.

parameters contains the parameters of the distribution, (e.g. mean and covariance for the normal, shape and scale for the gamma, ...). So this is not necessarily the mean of the distribution. A subclass of distribution may not even implement mean() or cov() if they do not exist.

The reason why there is a way to specify mean and parameters was to allow arbitrary distributions to be defined without subclassing, e.g.

mygammadist = Distribution(parameters={"shape":.1, "scale": .1}, mean=0.1)

The trouble with this is that if parameters is mutable, then inconsistencies arise. However we need to be able to construct arbitrary Distribution objects, e.g. for random variables where we only have a sampling method available.

@marvinpfoertner
Copy link
Collaborator Author

Ah I see. But would you still agree that, for the case of the mean parameter, where we have a "more structured" alternative (the mean method), we should not use the parameters dict?

IMHO the same would hold for the "support" parameter in the Dirac distribution. There we subclass anyway and it would feel just way cleaner if we just created a member "self._support" instead of storing the argument in the parameters dict.

@JonathanWenger
Copy link
Contributor

I don't mind having members for the distribution parameters, but if you then don't (also) store them in the dict, that is pretty inconsistent with the super class definition,i.e. if a user reads the docs of Distribution and tries to inspect the parameters of a Dirac, those would be empty. I'm not sure what the cleanest solution is here.

@JonathanWenger
Copy link
Contributor

JonathanWenger commented Jul 10, 2020

One approach would be to remove the parameters dict and only allow custom parametrized Distributions via subclassing. Any user defined distributions then do not have parameters. Since eventually we will implement most common distributions anyway, this is probably not an issue.

However it would still be nice to offer a function of the form:

def get_parameters(self):
   return self._param1, self._param2

@JonathanWenger JonathanWenger added improvement Improvements of existing functionality refactoring Refactoring of existing functionality labels Jul 10, 2020
@marvinpfoertner
Copy link
Collaborator Author

I like that proposal. However, I would not get rid of the parameters dict in the end.
It can still be useful as an unstructured datastore for the user-defined distributions.
I just wouldn't use it in the internal implementations of the (sub)class.

@JonathanWenger
Copy link
Contributor

JonathanWenger commented Jul 10, 2020

Okay so one can construct a Distribution by passing parameters but parameters is not a public member. Instead it is exposed in the following way in Distribution:

def get_parameters(self):
   return self._parameters

Distribution sub-classes can then create the dict in the get_parameters function from the members representing the parameters.

@nathanaelbosch nathanaelbosch added the randvars Issues related to random variables label Jul 28, 2020
@marvinpfoertner marvinpfoertner self-assigned this Aug 17, 2020
@marvinpfoertner marvinpfoertner linked a pull request Aug 30, 2020 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvements of existing functionality randvars Issues related to random variables refactoring Refactoring of existing functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants