This repository has been archived by the owner on Sep 9, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 107
/
Copy pathREADME.Rmd
227 lines (155 loc) · 7.13 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
rplos
=====
```{r echo=FALSE}
knitr::opts_chunk$set(
fig.path = "man/figures/",
warning = FALSE,
message = FALSE,
collapse = TRUE,
comment = "#>",
fig.cap = ""
)
```
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![cran checks](https://cranchecks.info/badges/worst/rplos)](https://cranchecks.info/pkgs/rplos)
[![R-check](https://github.com/ropensci/rplos/workflows/R-check/badge.svg)](https://github.com/ropensci/rplos/actions?query=workflow%3AR-check)
[![codecov.io](https://codecov.io/github/ropensci/rplos/coverage.svg?branch=master)](https://codecov.io/github/ropensci/rplos?branch=master)
[![rstudio mirror downloads](https://cranlogs.r-pkg.org/badges/rplos)](https://github.com/r-hub/cranlogs.app)
[![cran version](https://www.r-pkg.org/badges/version/rplos)](https://cran.r-project.org/package=rplos)
## Install
You can get this package at CRAN [here](https://cran.r-project.org/package=rplos), or install it within R by doing
```{r eval=FALSE}
install.packages("rplos")
```
Or install the development version from GitHub
```{r eval=FALSE}
remotes::install_github("ropensci/rplos")
```
```{r}
library("rplos")
```
## What is this?
`rplos` is a package for accessing full text articles from the Public Library of Science journals using their API.
## Information
You used to need a key to use `rplos` - you no longer do as of 2015-01-13 (or `v0.4.5.999`).
rplos vignetttes: <https://docs.ropensci.org/rplos/>
PLOS API documentation: <http://api.plos.org/>
PLOS Solr schema is at <https://gist.github.com/openAccess/9e76aa7fa6135be419968b1372c86957> but is 1.5 years old so may not be up to date.
Crossref API documentation can be found at <https://github.com/CrossRef/rest-api-doc>. See also [rcrossref](https://github.com/ropensci/rcrossref) ([on CRAN](https://cran.r-project.org/package=rcrossref)) with a much fuller implementation of R functions for all Crossref endpoints.
## Throttling
Beware, PLOS recently has started throttling requests. That is,
they will give error messages like "(503) Service Unavailable -
The server cannot process the request due to a high load", which
means you've done too many requests in a certain time period. Here's
[what they say](http://api.plos.org/solr/faq/#solr_api_recommended_usage) on the matter:
> Please limit your API requests to 7200 requests a day, 300 per hour, 10 per minute and allow 5 seconds for your search to return results. If you exceed this threshold, we will lock out your IP address. If you're a high-volume user of the PLOS Search API and need more API requests a day, please contact us at [email protected] to discuss your options. We currently limit API users to no more than five concurrent connections from a single IP address.
## Quick start
### Search
Search for the term ecology, and return id (DOI) and publication date, limiting to 5 items
```{r}
searchplos('ecology', 'id,publication_date', limit = 5)
```
Get DOIs for full article in PLoS One
```{r}
searchplos(q="*:*", fl='id', fq=list('journal_key:PLoSONE',
'doc_type:full'), limit=5)
```
Query to get some PLOS article-level metrics, notice difference between two outputs
```{r}
out <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'), fq='doc_type:full')
out_sorted <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'),
fq='doc_type:full', sort='counter_total_all desc')
head(out$data)
head(out_sorted$data)
```
A list of articles about social networks that are popular on a social network
```{r}
searchplos(q="*:*",fl=c('id','alm_twitterCount'),
fq=list('doc_type:full','subject:"Social networks"','alm_twitterCount:[100 TO 10000]'),
sort='counter_total_month desc')
```
Show all articles that have these two words less then about 15 words apart
```{r}
searchplos(q='everything:"sports alcohol"~15', fl='title', fq='doc_type:full', limit=3)
```
Narrow results to 7 words apart, changing the ~15 to ~7
```{r}
searchplos(q='everything:"sports alcohol"~7', fl='title', fq='doc_type:full', limit=3)
```
Remove DOIs for annotations (i.e., corrections) and Viewpoints articles
```{r}
searchplos(q='*:*', fl=c('id','article_type'),
fq=list('-article_type:correction','-article_type:viewpoints'), limit=5)
```
### Faceted search
Facet on multiple fields
```{r}
facetplos(q='alcohol', facet.field=c('journal','subject'), facet.limit=5)
```
Range faceting
```{r}
facetplos(q='*:*', url=url, facet.range='counter_total_all',
facet.range.start=5, facet.range.end=100, facet.range.gap=10)
```
### Highlight searches
Search for and highlight the term _alcohol_ in the abstract field only
```{r}
(out <- highplos(q='alcohol', hl.fl = 'abstract', rows=3))
```
And you can browse the results in your default browser
```{r eval=FALSE}
highbrow(out)
```
![highbrow](man/figures/highbrow.png)
### Full text urls
Simple function to get full text urls for a DOI
```{r}
full_text_urls(doi='10.1371/journal.pone.0086169')
```
### Full text xml given a DOI
```{r}
(out <- plos_fulltext(doi='10.1371/journal.pone.0086169'))
```
Then parse the XML any way you like, here getting the abstract
```{r}
library("XML")
xpathSApply(xmlParse(out$`10.1371/journal.pone.0086169`), "//abstract", xmlValue)
```
### Search within a field
There are a series of convience functions for searching within sections of articles.
* `plosauthor()`
* `plosabstract()`
* `plosfigtabcaps()`
* `plostitle()`
* `plossubject()`
For example:
```{r}
plossubject(q='marine ecology', fl = c('id','journal'), limit = 10)
```
However, you can always just do this in `searchplos()` like `searchplos(q = "subject:science")`. See also the `fq` parameter. The above convenience functions are simply wrappers around `searchplos`, so take all the same parameters.
### Search by article views
Search with term _marine ecology_, by field _subject_, and limit to 5 results
```{r}
plosviews(search='marine ecology', byfield='subject', limit=5)
```
### Visualize
Visualize word use across articles
```{r fig.cap="wordusage"}
plosword(list('monkey','Helianthus','sunflower','protein','whale'), vis = 'TRUE')
```
### progress bars
```{r eval = FALSE}
res <- searchplos(q='*:*', limit = 2000, progress = httr::progress())
#> |=====================================| 100%
#> |=====================================| 100%
#> |=====================================| 100%
#> |=====================================| 100%
```
## Meta
* Please [report any issues or bugs](https://github.com/ropensci/rplos/issues).
* License: MIT
* Get citation information for `rplos` in R doing `citation(package = 'rplos')`
* Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). By contributing to this project, you agree to abide by its terms.
---
This package is part of a richer suite called [fulltext](https://github.com/ropensci/fulltext), along with several other packages, that provides the ability to search for and retrieve full text of open access scholarly articles. We recommend using `fulltext` as the primary R interface to `rplos` unless your needs are limited to this single source.
---