Estimated "true" repertoire as a predictor #1

Open
tbendixen opened this issue Jul 23, 2023 · 5 comments


@tbendixen

tbendixen commented Jul 23, 2023

Thanks for yet another instructive case study, Richard.

I was wondering whether there's a way to extend the example so that the estimated "true" repertoire size is used as a predictor in another model.

For instance, we might imagine a dataset composed of species (instead of individuals). Say we're interested in repertoire size and brain size at the level of species. However, the observed behavioral repertoire of any given species is always an imperfect measure; ideally, we'd want to estimate the "true" repertoire size (perhaps as a function of other variables, e.g. observed repertoire size, research effort, phylogeny, habitat, etc.) and plug that estimate (and its uncertainty) into our outcome model predicting brain size.

It seems conceptually linked to the measurement error and missing data models that are so neatly explained in Statistical Rethinking, although repertoire size is an integer and therefore would require a different computational approach from Gaussian variables.

Best wishes

@rmcelreath
Owner

Yeah, it could be done. The trick would be to marginalize over the unknown repertoire size in the likelihood of the second model. This is like how population size models work.
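A minimal sketch of that marginalization, under assumptions that are not in the thread: the true repertoire N gets a Poisson prior, the observed repertoire is a binomial thinning of N with detection probability `p_detect`, and brain size is Gaussian around a linear function of N. All function and parameter names here are hypothetical. Because N is a discrete unknown, the outcome likelihood sums over every candidate N (truncated at `n_max`) instead of sampling it.

```python
import math


def log_poisson(n, lam):
    """Log pmf of Poisson(lam) at n."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_binomial(y, n, p):
    """Log pmf of Binomial(n, p) at y, for 0 < p < 1 and y <= n."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log1p(-p)


def log_normal(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def repertoire_terms(y_obs, p_detect, lam, n_max=200):
    """log[P(N) * P(y_obs | N)] for each candidate N = y_obs..n_max.

    Assumed first-stage model: N ~ Poisson(lam), y_obs ~ Binomial(N, p_detect).
    """
    return [
        log_poisson(n, lam) + log_binomial(y_obs, n, p_detect)
        for n in range(y_obs, n_max + 1)
    ]


def marginal_log_lik(y_obs, brain, p_detect, lam, alpha, beta, sigma, n_max=200):
    """Log-likelihood of (y_obs, brain) with the unknown N summed out.

    Assumed outcome model: brain ~ Normal(alpha + beta * N, sigma).
    """
    first_stage = repertoire_terms(y_obs, p_detect, lam, n_max)
    terms = [
        t + log_normal(brain, alpha + beta * n, sigma)
        for n, t in zip(range(y_obs, n_max + 1), first_stage)
    ]
    return log_sum_exp(terms)
```

Normalizing the individual terms (subtracting the `log_sum_exp` total) would give the posterior probability of each candidate N for a species, so the uncertainty in "true" repertoire size is carried into the outcome model rather than plugged in as a point estimate.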

One thing I worry about in generalizing to many species is the open-endedness of repertoire size. Would need to think carefully about a good prior family there, something like a Pitman–Yor process.
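To illustrate why a Pitman–Yor process suits an open-ended repertoire, here is a rough sketch that simulates its sequential (Chinese-restaurant-style) scheme: each new observation is either a never-before-seen behavior type or a repeat of an existing one. The function name and the parameter values are illustrative assumptions, not anything proposed in the thread.

```python
import random


def pitman_yor_partition(n_obs, discount, concentration, seed=0):
    """Simulate counts per behavior type under a Pitman-Yor process.

    Requires 0 <= discount < 1 and concentration > 0.
    With i observations so far spread over k types, the next observation is
      - a new type with probability (concentration + discount * k) / (concentration + i)
      - existing type j with probability (counts[j] - discount) / (concentration + i)
    """
    rng = random.Random(seed)
    counts = []
    for i in range(n_obs):
        k = len(counts)
        p_new = (concentration + discount * k) / (concentration + i)
        if rng.random() < p_new:
            counts.append(1)  # a behavior type not seen before
        else:
            weights = [c - discount for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1  # a repeat of an already-seen type
    return counts


# Hypothetical settings: more observations keep revealing new types,
# and most types end up with very few observations.
counts = pitman_yor_partition(n_obs=500, discount=0.5, concentration=1.0, seed=1)
n_types = len(counts)
n_singletons = sum(1 for c in counts if c == 1)
```

A larger `discount` yields heavier tails, i.e. more types seen only once or twice, and the number of types has no fixed upper bound.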

@simeonqs

I have been working on something similar (though much simpler at the moment). There is a large literature on innovation frequency, which is the number of innovations observed in a taxon/species. The generative process is slightly different, but the modelling is probably very similar. I think there is a structural equation model solution, where the 'true' repertoire size in the first model is the predictor or response in the second model. The focus in my simulation so far is more on controlling for research effort though: https://github.com/simeonqs/research_effort. If anybody is interested, I really need to start documenting it. But I would also be very happy if Richard made my paper completely unnecessary!

@tbendixen
Author

@rmcelreath Thanks! I see, good point on the open-endedness. I wonder whether one work-around is to assume a maximum number of possible behaviors in the repertoire of a taxon (say, a literature review returned n identified hunting/foraging behaviors) and treat that as the number of "trials" in a binomial model (e.g., a species might have k out of n identified behaviors). Incidentally, that'd link to my question here on incorporating phylogeny in a binomial model: rmcelreath/stat_rethinking_2023#7

@simeonqs Thanks for the link to your repo, I'll check it out!
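The binomial work-around could be sketched roughly as follows for a single species, assuming a conjugate Beta prior on the probability that any one of the n catalogued behaviors is in the repertoire. The numbers and function names are made up for illustration, and the sketch ignores detection: an observed k is really a lower bound on the true k.

```python
import math


def binomial_log_lik(k, n, p):
    """Log-likelihood of k behaviors present out of n catalogued, for 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def beta_posterior(k, n, a=1.0, b=1.0):
    """Posterior Beta(a + k, b + n - k) parameters under a Beta(a, b) prior."""
    return a + k, b + (n - k)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical species: 12 of 40 catalogued foraging behaviors recorded.
a_post, b_post = beta_posterior(k=12, n=40)
p_hat = beta_mean(a_post, b_post)
```

In a multi-species model, `p` would instead sit on the logit scale with predictors and a phylogenetic covariance term, which is where the linked question about phylogeny comes in.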

@rmcelreath
Owner

The issue that concerns me with a prior on behavioral repertoire is that the distribution is typically highly imbalanced: lots of rare behavior types, which are hard to count. And once you start worrying about sampling effort, the upper bound on the maximum repertoire needs some prior that knows about the imbalanced distribution. Does that make sense?

Very similar problem in estimating number of species in a community. So e.g.: https://doi.org/10.1034/j.1600-0706.2000.890320.x
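A small sketch of why imbalance matters for counting, assuming (purely for illustration) that behavior types follow a Zipf-like frequency distribution and that each observation independently records one type. It computes the expected number of distinct types seen after a given sampling effort; the function names and the Zipf form are assumptions, not taken from the linked paper.

```python
def zipf_probs(n_types, exponent):
    """Type frequencies proportional to 1 / rank**exponent (exponent 0 = uniform)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_observed(probs, effort):
    """Expected number of distinct types seen in `effort` independent observations.

    A type with frequency p is missed in every observation with probability (1 - p)**effort.
    """
    return sum(1.0 - (1.0 - p) ** effort for p in probs)


# Hypothetical true repertoire of 50 behavior types, observed 200 times.
balanced = expected_observed(zipf_probs(50, 0.0), effort=200)
imbalanced = expected_observed(zipf_probs(50, 1.5), effort=200)
```

With the same true repertoire and the same effort, the imbalanced case is expected to reveal fewer types, so an observed count alone says little about the upper bound unless the prior accounts for how skewed the frequencies are.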

@tbendixen
Author

Yeah, it does make sense -- even if it introduces fresh problems! For instance, something like Simpson's diversity index seems apt, but ideally we'd probably want to adjust for something like habitat, body size, etc. too, since some species are just easier to observe than others.

Anyway, thanks so much for taking the time, Richard!
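For reference, a minimal sketch of Simpson's index computed from raw counts of each behavior type. It uses the simple proportion-based form (sum of squared proportions) rather than the finite-sample correction, and it makes no adjustment for habitat, body size, or observability; the function name and example counts are hypothetical.

```python
def simpson_indices(counts):
    """Simpson-type diversity summaries from counts per behavior type.

    dominance    = sum(p_i ** 2), the chance two random observations share a type
    gini_simpson = 1 - dominance
    inverse      = 1 / dominance, the "effective number" of types
    """
    total = sum(counts)
    dominance = sum((c / total) ** 2 for c in counts)
    return {
        "dominance": dominance,
        "gini_simpson": 1.0 - dominance,
        "inverse": 1.0 / dominance,
    }


# Hypothetical counts: an even repertoire versus one dominated by a single behavior.
even = simpson_indices([10, 10, 10, 10])
skewed = simpson_indices([37, 1, 1, 1])
```

Both examples have four observed types, but the skewed one has a much lower effective number of types, which is the imbalance discussed above.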

3 participants