Add qmd replacement script with httr2 #7

Open
wants to merge 1 commit into
base: main
207 changes: 207 additions & 0 deletions Tutorial_DataSubmission_R.qmd
@@ -0,0 +1,207 @@
---
title: "API_CreateSubmit_Tutorial"
output: pdf_document
date: "2024-08-03"
---

```{r}
#| label: "setup"
#| include: false
knitr::opts_chunk$set(echo = TRUE)
```

# Tutorial: Data Submission to ESS-DIVE SANDBOX Using API

The ESS-DIVE Dataset API is a service that enables projects to programmatically
submit and manage datasets with ESS-DIVE. This is an alternative to using the
ESS-DIVE Online form for data uploads. This service encodes metadata using the
JSON-LD specification. JSON-LD is a schema to encode Linked Data using JSON,
and in the future will be used by Google to index metadata for searches.
The use of the standardized JSON-LD schema will dramatically increase the
visibility of datasets, and also enable projects to create one-time code
that can be reused for periodic uploads of datasets to ESS-DIVE.

⭐ Contact [email protected] to
**submit more than 10GB per upload attempt**.
Additional permissions are required.

⭐ Current Maximum Upload Limit: **500 GB per upload attempt**

Please contact [email protected] to submit more than 500GB of
data at once.
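
The thresholds above can be checked locally before an upload is attempted. The following is a minimal sketch using only base R. The helper name `check_upload_size` is hypothetical, and treating 1 GB as 1024^3 bytes is an assumption, since the text does not say how ESS-DIVE counts gigabytes.

```{r}
# Hypothetical helper (not part of the ESS-DIVE API): compare the total size of
# the files planned for one upload attempt with the limits stated above.
check_upload_size <- function(paths, permission_gb = 10, max_gb = 500) {
  bytes_per_gb <- 1024^3  # assumption: binary gigabytes
  total_gb <- sum(file.size(paths)) / bytes_per_gb
  if (is.na(total_gb)) stop("At least one file could not be found.")
  if (total_gb > max_gb) {
    status <- "over maximum: contact ESS-DIVE before uploading"
  } else if (total_gb > permission_gb) {
    status <- "additional permissions required"
  } else {
    status <- "ok"
  }
  list(total_gb = total_gb, status = status)
}

# Demo with a tiny temporary file standing in for a real data file
demo_file <- tempfile(fileext = ".csv")
writeLines("site,value\nA,1", demo_file)
size_check <- check_upload_size(demo_file)
size_check$status
```

Passing a vector of paths sums the sizes, which matches the "per upload attempt" wording of the limits.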

Use Sandbox https://api-sandbox.ess-dive.lbl.gov when testing code to submit
datasets to ESS-DIVE. All code examples use sandbox. Once you have tested your
code and you're ready to create new datasets for publication, use our
production domain https://api.ess-dive.lbl.gov/.
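
One way to keep the switch between sandbox and production explicit is a small lookup function. This is only a sketch: the function name `ess_dive_base_url` and its `target` argument are hypothetical, while both URLs are the ones given above.

```{r}
# Hypothetical helper: return the API base URL for the chosen environment.
ess_dive_base_url <- function(target = c("sandbox", "production")) {
  target <- match.arg(target)  # rejects anything other than the two options
  switch(target,
    sandbox    = "https://api-sandbox.ess-dive.lbl.gov",
    production = "https://api.ess-dive.lbl.gov"
  )
}

ess_dive_base_url()              # defaults to the sandbox
ess_dive_base_url("production")  # use only after testing on the sandbox
```

Defaulting to the sandbox means a forgotten argument cannot create a production dataset by accident.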

For additional information about the API, review the documentation at https://api-sandbox.ess-dive.lbl.gov.

Email ESS-DIVE at [email protected] if you require assistance.

Before creating datasets, you must be registered as an ESS-DIVE data
contributor. To become a data contributor, set up your account by logging in
with your ORCID, then fill out the New Data Contributor form.

After approval, you will be able to find your authentication token in
your ESS-DIVE profile. This token is required to submit datasets
through the API.

## Setup
### Get Authentication Token

1. Go to https://data-sandbox.ess-dive.lbl.gov
2. Sign in with ORCID
3. Click your name in the right-hand corner and select My Profile
4. Click Settings > Authentication Token
5. Scroll down and click Copy on the “Token” tab to get your
authentication token

⭐️ If you are not already registered to submit data with ESS-DIVE,
follow the steps on the Register to Submit Data page: https://docs.ess-dive.lbl.gov/contributing-data/new-contributor-registration

### Install Packages

```{r}
#| label: "package installs"

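# Install the packages (only needed once per R installation)
install.packages("httr2")
install.packages("jsonlite")
install.packages("readr")

# Load the packages so you can use them
# (curl is installed automatically as a dependency of httr2)
library(httr2)
library(curl)
library(readr)
library(jsonlite)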
```

### Dataset API Information and Token

You will need to copy your authentication token from your profile on ESS-DIVE.
Tokens expire *every 24 hours.*

```{r}
#| label: "query building"

token <- "<ENTER TOKEN HERE>"

# DO NOT EDIT
header_authorization <- paste("bearer", token, sep = " ")
base <- "https://api-sandbox.ess-dive.lbl.gov"
endpoint <- "packages"
```
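
Because the token changes every 24 hours, it can be convenient to keep it out of the script itself. The sketch below reads it from an environment variable instead. The variable name `ESSDIVE_TOKEN` and the helpers `read_token` and `build_auth_header` are hypothetical and not part of ESS-DIVE; the header format is the same `bearer <token>` string built above.

```{r}
# Assumption: the token was stored beforehand in an environment variable with
# the hypothetical name ESSDIVE_TOKEN (for example in ~/.Renviron).
read_token <- function(var = "ESSDIVE_TOKEN") {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", var, " is not set; copy a fresh token first.")
  }
  value
}

# Same header format as above: "bearer <token>"
build_auth_header <- function(token) paste("bearer", token, sep = " ")

# Demo with a placeholder value under a separate demo variable (not a real token)
Sys.setenv(ESSDIVE_TOKEN_DEMO = "placeholder-token")
build_auth_header(read_token("ESSDIVE_TOKEN_DEMO"))
```

Failing early on an empty variable gives a clearer message than an authorization error returned by the server.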

## Submit a Dataset
### Create Metadata

Because R has limited support for building complex JSON-LD, create your JSON-LD
metadata as a text file and pass its path to the `read_file` function below.
The example file used here comes from our ESS-DIVE package service
examples GitHub repository
(https://github.com/ess-dive/essdive-package-service-examples).

While creating your metadata, refer to the Dataset Requirements page for
instructions on completing each metadata field: https://docs.ess-dive.lbl.gov/contributing-data/package-level-metadata

Make sure your file is saved in valid JSON-LD format.

Once you have completed your metadata file, replace the example path
below with the path to your file.

```{r}
json_file <- readr::read_file("example-1.jsonld")
```
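
A quick local check can catch a malformed metadata file before it is sent. The following sketch uses `jsonlite::validate()` and `jsonlite::fromJSON()`. The helper name `looks_like_jsonld` is hypothetical, and requiring top-level `@context` and `@type` keys is an assumption about dataset metadata, not a rule taken from the ESS-DIVE documentation. The inline text is a stand-in, not a complete ESS-DIVE record.

```{r}
library(jsonlite)

# Hypothetical helper: TRUE when `txt` parses as JSON and its top level
# carries the JSON-LD keywords "@context" and "@type" (an assumed minimum).
looks_like_jsonld <- function(txt) {
  if (!isTRUE(as.logical(jsonlite::validate(txt)))) return(FALSE)
  parsed <- jsonlite::fromJSON(txt, simplifyVector = FALSE)
  all(c("@context", "@type") %in% names(parsed))
}

# Stand-in text; in the tutorial this would be the `json_file` string
example_jsonld <- '{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "name": "Placeholder dataset title"
}'

looks_like_jsonld(example_jsonld)
looks_like_jsonld('{"name": "missing keywords"}')
looks_like_jsonld("{not valid json")
```

This only checks syntax and two keywords. Whether every required metadata field is present is still decided by the API when the dataset is submitted.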

## Submit Your Dataset
### Submitting Only Metadata

```{r}
# DO NOT EDIT

# Construct the request
req <- request(base_url = base) |>
  # Add the endpoint to the url
  req_url_path_append(paste0("/", endpoint)) |>
  # Attach headers
  req_headers(Authorization = header_authorization,
              "Content-Type" = "application/json") |>
  # Attach the JSON-LD text to the request body
  req_body_raw(json_file)

# See the request that will be sent
req |>
req_dry_run()

# Send the request
resp <- req |>
req_perform()
```

Review the results. The parsed response, `extracted_resp` in the code below,
lets you view your dataset ID, URL, the full dataset metadata
(`extracted_resp$dataset`), warnings or errors, and details about the
dataset submission. If your dataset has been submitted correctly,
`extracted_resp$detail` should return "Dataset created successfully."

```{r}
# Take response body and extract it
extracted_resp <- resp |>
httr2::resp_body_json(simplifyVector = TRUE)

# What are the response elements
attributes(extracted_resp)

# View metadata
extracted_resp$detail
extracted_resp$viewUrl
extracted_resp$errors
```
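
By default `req_perform()` raises an R error for HTTP 4xx and 5xx statuses, so a rejected submission stops the script before the response body can be read. Adding `req_error(is_error = \(resp) FALSE)` to the request keeps the response available for inspection. The sketch below shows one way to summarise such a response. The helper `summarise_submission` is hypothetical, and the response is an offline stand-in built with `httr2::response()`, not real ESS-DIVE output.

```{r}
library(httr2)

# Hypothetical helper: collect the status code and the `detail` and `errors`
# fields (the fields inspected above) from an httr2 response.
summarise_submission <- function(resp) {
  body <- resp_body_json(resp, simplifyVector = TRUE)
  list(
    status = resp_status(resp),
    ok     = !resp_is_error(resp),  # TRUE for status codes below 400
    detail = body$detail,
    errors = body$errors            # NULL when the field is absent
  )
}

# Offline stand-in for a rejected submission; the status code and message
# are invented for illustration.
fake_resp <- response(
  status_code = 400,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw('{"detail": "example failure message"}')
)

submission_summary <- summarise_submission(fake_resp)
submission_summary$status
submission_summary$detail
```

With a real request, the same helper could be applied to the result of `req |> req_error(is_error = \(resp) FALSE) |> req_perform()`.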

### Submit Metadata and Data

To submit the metadata together with a data file, create a folder, add your
data file to it, and then run the following code, replacing
`example_datafile.csv` with the path to your file:

```{r}
# Construct the request
req <- request(base_url = base) |>
  # Add the endpoint to the url
  req_url_path_append(paste0("/", endpoint)) |>
  # Attach headers
  req_headers(Authorization = header_authorization,
              "Content-Type" = "multipart/form-data") |>
  # Attach the JSON metadata and the CSV data file to the request body
  req_body_multipart("json-ld" = json_file,
                     data = curl::form_file("example_datafile.csv"))

# See the request that will be sent
req |>
req_dry_run()

# Send the request
resp <- req |>
req_perform()
```

Review the results. The parsed response, `extracted_resp` in the code below,
lets you view your dataset ID, URL, the full dataset metadata
(`extracted_resp$dataset`), warnings or errors, and details about the
dataset submission. If your dataset has been submitted correctly,
`extracted_resp$detail` should return "Dataset created successfully."

```{r}
# Take response body and extract it
extracted_resp <- resp |>
httr2::resp_body_json(simplifyVector = TRUE)

# What are the response elements
attributes(extracted_resp)

# View metadata
extracted_resp$detail
extracted_resp$viewUrl
extracted_resp$errors
```