CRAN task view proposal: Paleontology #57

Open
willgearty opened this issue Sep 19, 2023 · 12 comments

@willgearty
Contributor

willgearty commented Sep 19, 2023

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GISers, and data scientists to accomplish such analyses. However, slowly but surely, resources (including explicit R packages) are being developed to cater to these paleontological tasks.

This CTV brings together a) a collection of traditional packages that are often seen in use in standard computational paleontological workflows, b) more recent paleontological or paleo-adjacent packages that are commonly in use in paleontology, and c) cutting edge paleo-explicit packages that we believe should be adopted by the paleontological community. Therefore, the purpose of this CTV is to provide young and old paleontologists something of a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and they are often covered exhaustively in other CTVs and guides. Further, we have excluded older packages that have been superseded by more robust and/or featureful newer packages (e.g., there are a ~million packages related to ENM, but we have only included a handful). We also recognize that there are many other packages out there that are relevant to or explicitly for paleontology (we originally built a list of ~140 packages that we whittled down to the list below). We excluded most of these packages because we, as a group, had little experience with them or because the packages seemed unfinished or too niche to be useful. However, we'd love to hear from anyone that might have suggestions about other packages to include/exclude. Finally, where applicable, we plan to direct users to other CTVs that overlap in scope (see below).

Packages

Data acquisition

mapast, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, chronosphere

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, ggtern, ggtree, SDAR, StratigrapheR, tidypaleo, geoChronR, rphylopic

Paleoecology

ade4, dismo, ecospace, ENMeval, ENMTools, fossil, fundiversity, vegan

Paleobiogeography and biodiversity

BAT, Compadre, divDyn, divvy, iNEXT, sepkoski

Phylogenetics

caper, diversitree, fbdR, FossilSim, geiger, mvMORPH, paleobuddy, paleotree, phytools, strap

Morphology

geomorph, Claddis, dispRity, morphospace

Time series

paleoTS, evoTS, layeranalyzer

Overlap

There is considerable overlap between the scope of this proposed CTV and the scope of other CTVs, including Environmetrics, Phylogenetics, TimeSeries, and Spatial. This stems from the fact that this proposed CTV is subject-oriented, rather than methodology-oriented. This doesn't appear to be unprecedented, though, given there are already CTVs on other subjects (e.g., ChemPhys). Further, this CTV is focused on which packages in these other CTVs may be used specifically within computational paleontological workflows.

Maintainers

Principal maintainer: @willgearty (also the principal maintainer of the Phylogenetics CTV)
Co-maintainers: @AlfioAlessandroChiarenza, @bethany-j-allen, @ChristopherDavidDean, @KEichenseer, @LewisAJones, and @pedrolgodoy
(this is a @palaeoverse project)

@zeileis
Contributor

zeileis commented Oct 2, 2023

Thanks for the proposal, Will @willgearty, and apologies for the slow response! I've finally had a closer look.

I like the proposal but I'm not yet fully convinced that the task view will be sufficiently separated from the existing task views. Relatedly, your process of package selection appears to be somewhat subjective, which we try to avoid in task views by adopting clear inclusion/exclusion criteria. In particular, excluding packages that you feel are too old or that you have no experience with is too subjective.

Hence, I would ask you to establish sufficiently clear rules for inclusion/exclusion of a package, e.g., that it must be explicitly geared towards paleontology or something like that. And rules that would necessitate some individual review process (e.g., to determine whether a package is "useful" or "finished") should be avoided.

Regarding the maintainers: It's great to see an active community proposing a task view. Seven maintainers might still be feasible but maybe a smaller team would be easier to coordinate? Others could still contribute through issues and PRs. Also, I'm not sure whether the palaeoverse community is already diverse and heterogeneous enough that different palaeontological views are reflected in it. Or would it help to bring in maybe one person from the outside as well?

I'm also pinging the principal maintainers of the Spatial, SpatioTemporal, and Environmetrics task views here: @rsbivand, @edzer, @gavinsimpson. Maybe you have some thoughts/ideas as well?

@willgearty
Contributor Author

willgearty commented Oct 3, 2023

Thanks @zeileis for the helpful comments.

We are certainly open to defining clearer rules for package inclusion/exclusion. I think if we are as exclusive as "explicitly geared towards paleontology", we'll be leaving lots of commonly used packages out (but you are right in that it would then be a very clear rule). However, most, if not all of these excluded packages are already in other task views, so they would at least already be covered there.

We'll give a little time for other folks to provide their thoughts/ideas as well, then we'll look into revising accordingly.

@tuxette
Contributor

tuxette commented Oct 4, 2023

Hi all! I am also unsure but, as I see it, the overlap with Phylogenetics is also non-negligible (but you know the TV better than I do). In short, what is not clear to me is: "do you have in mind at least some core packages in your list that are very specific to Paleontology, rather than belonging to other related topics and merely being useful for Paleontology?" My question is probably quite naive (maybe these are clearly listed in your proposal but I am not able to identify them). These are the packages that, somehow, should be put forward in your TV, with packages that have a broader scope but can be useful for the field mentioned afterward. But again, my comment might be completely wrong.

@willgearty
Contributor Author

willgearty commented Jun 24, 2024

My deepest apologies (to my co-maintainers and the CTV editors) for the horrible delay in responding to the feedback here. Despite some reservations, we've decided to go for a more conservative approach, as suggested by @zeileis, that includes only packages that are either explicitly designed for paleontology or are explicitly advertised to paleontologists (it appears this is similar to the approach of the Agriculture CTV, for example).

There are many other packages that paleontologists use as part of their workflows, and so, as part of the development of this CTV, we plan to suggest many of these packages to other CTVs where we believe they will be appropriate. We then plan to link out to these CTVs to ensure that users of the Paleontology CTV can find all of the resources that they may need for their highly interdisciplinary work (see below).

@tuxette there aren't a lot of interpackage dependencies in paleontology, so I wouldn't say any packages really stand out as "core" packages. However, if I had to pick a handful of packages based solely on their breadth of use, I would probably say palaeoverse, paleotree, and paleobioDB, but I'm probably biased. I'd be happy to look into download numbers in the future to identify which packages are most widely used before finalizing the list of "core" packages.

Here is an updated proposal for the Paleontology CTV:

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GISers, and data scientists to accomplish such analyses. However, slowly but surely, resources (including explicit R packages) are being developed to cater to these paleontological tasks.

This CTV brings together the vast majority of paleontological or paleo-adjacent packages that are in use in paleontology. The purpose of this CTV is to provide young and old paleontologists something of a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and they are often covered exhaustively in other CTVs and guides. Further, to keep the list manageable, we also do not include packages that are often used in paleontological workflows but are not explicitly designed for or advertised to paleontologists. Where applicable, we plan to direct users to other CTVs that include many of these packages (and also plan to submit recommendations to these CTVs as necessary).

Packages

Data acquisition

chronosphere, folio, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, rmacrostrat, rpaleoclim

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, GEOmap, rphylopic, SDAR, StratigrapheR, tidypaleo

Paleoecology

analogue, ecospace, fossil, rioja (and Environmetrics CTV)

Paleobiogeography and biodiversity

Compadre, divDyn, divvy, hespdiv, ppgm, sepkoski (and Spatial CTV)

Phylogenetics

CladeDate, fbdR, FossilSim/FossilSimShiny, paleobuddy, paleotree, RRphylo, strap (and Phylogenetics CTV)

Morphology

morphospace (and Phylogenetics CTV)

Time series

adePEM, astrochron, evoTS, paleoTS, RRatepol (and TimeSeries CTV)

Paleoclimate and Earth System variables

Bchron, cRacle, DAIME, geoChronR, isogeochem, pastclim, sedproxy

Overlap

Only 10 of the proposed packages are included in other CTVs (rgbif, analogue, rioja, FossilSim, paleobuddy, paleotree, strap, paleoTS, deeptime, and GEOmap).
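
As a rough check of that count, the tally can be sketched in a few lines of Python. This is only an illustration and not part of the proposal itself: the package names and the overlap list are copied from the text above, FossilSim and FossilSimShiny are assumed to count as two separate packages, and the variable names are made up for the example.

```python
# Package lists copied from the updated proposal above.
# Assumption: "FossilSim/FossilSimShiny" is counted as two packages.
proposed = {
    "Data acquisition": ["chronosphere", "folio", "neotoma2", "paleobioDB", "rgbif",
                         "rgplates", "ridigbio", "rmacrostrat", "rpaleoclim"],
    "Data cleaning": ["CoordinateCleaner", "fossilbrush", "palaeoverse"],
    "Data visualization": ["deeptime", "GEOmap", "rphylopic", "SDAR",
                           "StratigrapheR", "tidypaleo"],
    "Paleoecology": ["analogue", "ecospace", "fossil", "rioja"],
    "Paleobiogeography and biodiversity": ["Compadre", "divDyn", "divvy", "hespdiv",
                                           "ppgm", "sepkoski"],
    "Phylogenetics": ["CladeDate", "fbdR", "FossilSim", "FossilSimShiny", "paleobuddy",
                      "paleotree", "RRphylo", "strap"],
    "Morphology": ["morphospace"],
    "Time series": ["adePEM", "astrocron", "evoTS", "paleoTS", "RRatepol"],
    "Paleoclimate and Earth System variables": ["Bchron", "cRacle", "DAIME", "geoChronR",
                                                "isogeochem", "pastclim", "sedproxy"],
}

# Packages stated above to be listed in other CTVs already.
in_other_ctvs = {"rgbif", "analogue", "rioja", "FossilSim", "paleobuddy",
                 "paleotree", "strap", "paleoTS", "deeptime", "GEOmap"}

# Flatten the sections and intersect with the overlap list.
all_packages = [pkg for section in proposed.values() for pkg in section]
total = len(all_packages)
overlap = sorted(set(all_packages) & in_other_ctvs, key=str.lower)
share = len(overlap) / total

print(f"{len(overlap)} of {total} proposed packages ({share:.0%}) appear in other CTVs")
```

Under those assumptions the list comes to 49 packages, in line with the "~50" quoted in the scope, with roughly a fifth of them shared with other CTVs.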

@willgearty
Contributor Author

@zeileis @tuxette Bumping this since the summer is wrapping up. Please let me know what you think of the new proposal!

@tuxette
Contributor

tuxette commented Sep 5, 2024

@willgearty: Sorry, I completely missed your June update. I took a look at it today and I think that I understand where this is going. For me, this is convincing, but @zeileis has a better global view of the CTVs and possible overlaps, so he might have a different opinion. Also, maybe @rsbivand could have interesting additional insights to provide here?
A minor remark is that the titles sometimes give the impression that the corresponding section is slightly out of scope. For instance, the generic title "Time series" is very broad, and until we look at the package list, it is not clear that it doesn't overlap with the TimeSeries task view (also, shouldn't deeptime be included in this section?). I’m not sure exactly how to improve it, but I suspect the time series have a particular focus that could perhaps be reflected more precisely in the title.

@willgearty
Contributor Author

Thanks @tuxette. That section should probably be titled "Time series analysis" to better reflect that those packages are for analyzing time series, not just visualizing them (this is also why deeptime is not included). I can definitely go back through the headings once the package list is finalized to make sure they are succinct and descriptive.

@zeileis
Contributor

zeileis commented Sep 22, 2024

Will @willgearty, apologies for the late feedback. I agree with Nathalie @tuxette that this goes in the right direction and that the task view is also well-separated from the existing task view topics. I still think that the explanation of the scope needs to be phrased better - but from the current list of packages it's sufficiently clear to me what you want to do. So you can still improve the scope in the next revision.

In short, I endorse this proposal and suggest we let Will and his co-maintainers work out the details. Roger @rsbivand, Dirk @eddelbuettel, Julia @jpiaskowski, and Nathalie @tuxette, if you agree, you can comment below or just react with a thumbs-up.

@jpiaskowski

This looks great (I endorse). You can also list other relevant task views (e.g., TimeSeries) and how they specifically support paleontological applications, but that is your choice.

@zeileis
Contributor

zeileis commented Sep 23, 2024

Thanks for the positive feedback, Julia and Dirk. Together with my endorsement you have the necessary three votes (plus Nathalie was also already very positive). So you can move on and elaborate the entire task view.

Do you want to do that first in your own repository and then transfer it later to the cran-task-views organization? Or should I already open cran-task-views/Paleontology/ for you? Either is fine with me.

@willgearty
Contributor Author

willgearty commented Sep 23, 2024

Fantastic news, thank you all for the feedback and support!

I have a draft in progress here: https://github.com/palaeoverse/PaleontologyTaskView. I'm happy to keep using that and then transfer it later.

@willgearty
Contributor Author

Our task view draft is now ready for review: https://github.com/palaeoverse/PaleontologyTaskView/blob/main/Paleontology.md. I'd also appreciate feedback from @benmarwick to make sure our two task views remain unique and complementary.
