argis_rest file formats? #257

Closed
ashjolly opened this issue Mar 26, 2021 · 31 comments

Comments

@ashjolly

Hi team,

In using the bcdata package for downloading spatial data from the Data Catalogue, I've run into multiple instances where data is present as an 'arcgis_rest' file format. My impression is that this happens when the Data Catalogue is scraping data from Map Hub resources.

For example, the Snow Basin Indices item within the Data Catalogue represents snow data present for polygons within the RFC's Snow Map (I believe...) :

RFC Snow Map Link
https://governmentofbc.maps.arcgis.com/apps/webappviewer/index.html?id=b57800e08e46468bab506f9b9f0cbad6

The resultant Data Catalogue entry is in an "arcgis_rest" format:
https://catalogue.data.gov.bc.ca/dataset/snow-basin-indices

library(bcdata)
bcdata::bcdc_get_record("712d39f3-de6f-4ddf-a5e5-2066be5e4482")
bcdata::bcdc_query_geodata("712d39f3-de6f-4ddf-a5e5-2066be5e4482") %>% dplyr::collect()
bcdata::bcdc_get_data("712d39f3-de6f-4ddf-a5e5-2066be5e4482") %>% dplyr::collect()

I notice that the 'bcdata_available' column is FALSE for all of the resources within this link.

We are going to update the polygons within the Snow Map, which will hopefully be reflected in the BC Data Catalogue entry. I'm hoping to be able to point my R script to this entry in the BC Data Catalogue rather than rely on a local copy of the spatial data. The Drought Polygons are in the same situation.

Thanks for your two cents about any suggestions on how to deal with this situation, and thanks for developing such a useful package!

-Ashlee

@stephhazlitt
Member

Thanks for this Issue @ashjolly.

Currently {bcdata} searches and returns all the metadata records available in the B.C. Data Catalogue. It can only pull data from a catalogue record where the data resource is stored in the B.C. Data Catalogue itself (bcdc_get_data()) or the data resource is stored in the B.C. Geographic Warehouse (bcdc_query_geodata()), with a few exceptions around file types (e.g. {bcdata} does not work when a data resource is a set of multiple files in a zipped folder).

As you point out above, some of the catalogue metadata records are for Web Apps/User Tools, such as the Snow Map---where the record provides a licence and metadata for the App itself. Rather than scraping data from a Web App, I think a more direct path would be to add the data layer itself (in a non-proprietary or open or common format 😉) as a resource within the record.

@ashjolly
Author

ashjolly commented Apr 6, 2021

Thanks Stephanie! Excellent points and clarification - I really appreciate it (and apologies for the delay - I was on leave). I know that GeoBC makes the Web App map from spatial data, so I could see a pathway forward where this initial layer is added to the catalogue, along with the Web App metadata you describe. I'll check in about this for the RFC-related resources. Thanks again!

@ashjolly ashjolly closed this as completed Apr 6, 2021
@bevingtona

It might be worth revisiting this idea of bcdata having functionality for ArcGIS REST formats...

For example:

# remotes::install_github("yonghah/esri2sf")
snow_basins_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"
esri2sf::esri2sf(snow_basins_url) |> mapview::mapview(zcol = "Snow_Basin_Index")

There are 59 publicly published datasets in the BC Data Catalogue that are stored as arcgis_rest.

@boshek
Collaborator

boshek commented Jan 17, 2024

My take is that while this is in scope for bcdata, the real limitation is that esri2sf is not on CRAN. Depending on it would at best require a dodgy workaround for CRAN policies and at worst impose a fairly sizable maintenance cost.

@ateucher
Collaborator

ateucher commented Jan 17, 2024

Yeah, I came here to say the same thing. Would love to do it but without esri2sf on CRAN it's not really doable... One possible way would be to return the url when it's an esri endpoint, and add some documentation about how to use esri2sf?
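
A minimal sketch of what "return the url and point to esri2sf" might look like. The helper names `is_arcgis_rest_url()` and `arcgis_hint()` are hypothetical and not part of bcdata, and the URL pattern is an assumption about how ArcGIS REST layer endpoints are usually shaped:

```r
# Hypothetical helpers, not part of bcdata.
is_arcgis_rest_url <- function(url) {
  # Assumption: layer endpoints contain /rest/services/ and end in
  # FeatureServer or MapServer, optionally followed by a numeric layer id.
  grepl("/rest/services/.+/(FeatureServer|MapServer)(/[0-9]+)?/?$", url,
        ignore.case = TRUE)
}

arcgis_hint <- function(url) {
  if (!is_arcgis_rest_url(url)) {
    return(invisible(NULL))
  }
  # Tell the user what to do instead of failing silently.
  message(
    "This resource is an ArcGIS REST endpoint, which bcdata does not read.\n",
    "One option is esri2sf (GitHub only):\n",
    "  esri2sf::esri2sf(\"", url, "\")"
  )
  invisible(url)
}
```

Calling `arcgis_hint()` on a FeatureServer layer URL prints the hint and returns the URL invisibly, so it could still be passed on to another package.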

@boshek
Collaborator

boshek commented Jan 17, 2024

I mean @bevingtona could also write a custom parser. I assume it is "just" some JSON that esri2sf is handling. 😜
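
For a sense of scale, here is a rough sketch of such a parser, assuming the layer's `query` operation can return GeoJSON via `f=geojson` (hosted feature services generally do; older servers may only offer `f=json`). `arcgis_query_url()` and `read_arcgis_layer()` are hypothetical names, and this ignores paging, so a layer with more features than the server's per-request limit would come back truncated:

```r
# Hypothetical helper: build a query URL for one layer of a FeatureServer.
arcgis_query_url <- function(layer_url, where = "1=1", out_fields = "*") {
  params <- c(where = where, outFields = out_fields, f = "geojson")
  # Percent-encode each value so SQL clauses survive the trip.
  query <- paste0(
    names(params), "=",
    vapply(params, utils::URLencode, character(1), reserved = TRUE),
    collapse = "&"
  )
  # Drop any trailing slash before appending the query operation.
  paste0(sub("/+$", "", layer_url), "/query?", query)
}

# Hypothetical reader: sf (via GDAL) can read GeoJSON straight from a URL.
read_arcgis_layer <- function(layer_url, ...) {
  sf::read_sf(arcgis_query_url(layer_url, ...))
}
```

The "just" is doing some work: a real parser would also need paging, authentication tokens and the non-GeoJSON fallback, which is most of what esri2sf handles.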

@stephhazlitt
Member

+1 for breadcrumbs leading users to esri2sf. It might be worthwhile to see if the authors of esri2sf have a path to CRAN?

I'll reiterate my philosophical objection to supporting spatial formats that are not open in the BC Data Catalogue, which is the open data portal. I still think this is akin to the "getting a horse off a balcony" situation, where the horse should not be there in the first place 🐴.

@bevingtona

For sure, great comments @stephhazlitt @ateucher @boshek .. I think the reality is that so many are hosting data in this format.

I'll look into an in-house parser ..

@bevingtona

This one is on CRAN ... arcpullr

# install.packages("arcpullr")
library(arcpullr)
snow_basins_url <- "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0"
arcpullr::get_spatial_layer(snow_basins_url) |> mapview::mapview(zcol = "Snow_Basin_Index")

@ateucher
Collaborator

That does potentially change things, let's reopen this

@ateucher ateucher reopened this Jan 17, 2024
@stephhazlitt
Member

Agreed @ateucher & @bevingtona. If there is a CRAN package we can import to parse these spatial files and there is bandwidth to author a PR, I am +1 for adding this enhancement.

@ateucher
Collaborator

@bevingtona if you have time and inclination to do a PR, that would probably expedite this. I can probably get to it some time, but I can't say when.

@ateucher
Collaborator

ateucher commented Jan 17, 2024

I haven't been in that bit of the package in a while, but it might be mostly a matter of editing this function/table:

bcdc_read_functions <- function(){

and then getting the dependencies in order... and adding tests of course :)

@ateucher
Collaborator

Related: #325

@boshek
Collaborator

boshek commented Jan 18, 2024

Even the testing should be pretty straightforward, as really all we want to do is make sure it actually works, like here:

test_that("bcdc_get_data works with an xls when specifying a specific resource", {
  skip_if_net_down()
  skip_on_cran()
  name <- 'bc-grizzly-bear-habitat-classification-and-rating'
  expect_s3_class(bcdc_get_data(name, resource = '7b09f82f-e7d0-44bf-9310-b94039b323a8'), "tbl")
})

Testing should probably also involve a SQL query passed into arcpullr::get_spatial_layer via ... (i.e. where = "WATERBODY_ROW_NAME = 'Wisconsin River'")

FWIW, the arcpullr package looks pretty full featured. For convenience it is definitely helpful to have a bcdc_get_data "method" to access data like this. But if you were working a ton with a data source like this, you may be better off just getting the relevant url with bcdata and then using the arcpullr package directly.

@stephhazlitt
Member

I think jsonlite::read_json() was the last new data format reader we added, here is the PR that provides a reasonable recipe to follow (edit function table, add test, update NEWS, add import etc.).

@bevingtona

> FWIW, the arcpullr package looks pretty full featured. For convenience it is definitely helpful to have a bcdc_get_data "method" to access data like this. But if you were working a ton with a data source like this, you may be better off just getting the relevant url with bcdata and then using the arcpullr package directly.

So... maybe just a message when it's a REST format to use arcpullr with a syntax example? Or is the use case strong enough to build a few functions?

@ateucher
Collaborator

I think we can justify adding the ability to get ArcGIS data with bcdc_get_data(). But I think adding full query functionality would take quite a bit more work (i.e. enabling bcdc_query_geodata() to query an ArcGIS endpoint in addition to WFS). So I propose just the simple method, where the ... could take the SQL argument and pass it to arcpullr.
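
Roughly, the simple method could look like the sketch below. The `read_functions` table and `read_resource()` are made up for illustration and are only loosely modelled on the idea of a format-to-reader table; the real table in bcdata may be laid out differently. The one arcpullr call it relies on is `get_spatial_layer()`, which takes the layer URL first and accepts a `where` argument:

```r
# Hypothetical format -> reader lookup, for illustration only.
read_functions <- data.frame(
  format  = c("csv", "json", "arcgis_rest"),
  package = c("readr", "jsonlite", "arcpullr"),
  fun     = c("read_csv", "read_json", "get_spatial_layer"),
  stringsAsFactors = FALSE
)

# Hypothetical dispatcher: look up the reader for a format and call it.
read_resource <- function(url, format, ...) {
  row <- read_functions[read_functions$format == format, ]
  if (nrow(row) != 1L) {
    stop("No reader registered for format '", format, "'", call. = FALSE)
  }
  reader <- getExportedValue(row$package, row$fun)
  # Anything in ... (for example where = "<SQL clause>") goes to the reader.
  reader(url, ...)
}
```

With this shape, `read_resource(url, "arcgis_rest", where = "...")` hands the SQL clause to arcpullr untouched, which keeps all the query logic out of bcdata.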

@boshek
Collaborator

boshek commented Jan 31, 2024

So the catalogue only returns URLs to the ArcGIS UI. For example, the record for the snow basin index:

R> rec <- bcdc_tidy_resources('712d39f3-de6f-4ddf-a5e5-2066be5e4482')
R> rec$url
[1] "https://governmentofbc.maps.arcgis.com/home/item.html?id=f842bd03020241ed9512746a83137a1f"
[2] "https://governmentofbc.maps.arcgis.com/home/item.html?id=637a958538e44b928fda568784cbb8eb"

For this to work we'd need a field in the record to have the associated API url: https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/Snow_Basins_Indices_View/FeatureServer

There may be some way to construct that link above but that seems brittle. Instead if one can get the API url included in the record, the rest is pretty easy.
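
For reference, one way to attempt that construction is to ask the portal's sharing REST endpoint (`/sharing/rest/content/items/<id>?f=json`) about the item and read its `url` field. This is only a sketch with hypothetical helper names, and it does not remove the brittleness: the `url` field may be missing, or may point at a web app rather than a FeatureServer, depending on the item type.

```r
# Hypothetical helpers, not part of bcdata.
item_id_from_url <- function(item_url) {
  # Assumption: item ids are 32 hex characters in the id= query parameter.
  m <- regmatches(item_url, regexec("[?&]id=([0-9a-f]{32})", item_url))[[1]]
  if (length(m) < 2L) NA_character_ else m[2]
}

item_metadata_url <- function(item_url) {
  id <- item_id_from_url(item_url)
  if (is.na(id)) {
    stop("No item id found in: ", item_url, call. = FALSE)
  }
  # Keep the scheme and host of the portal the item lives on.
  host <- sub("^(https?://[^/]+).*$", "\\1", item_url)
  paste0(host, "/sharing/rest/content/items/", id, "?f=json")
}

service_url_from_item <- function(item_url) {
  meta <- jsonlite::fromJSON(item_metadata_url(item_url))
  meta$url  # may be NULL, or not a FeatureServer at all
}
```

Having the API url stored in the record itself would still be the more dependable route.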

@ateucher
Collaborator

ateucher commented Feb 5, 2024

It doesn't address the issue @boshek identified, but this looks like an alternative package: https://r.esri.com/arcgislayers/index.html. It looks like it's actually developed by ESRI so may be the most reliable for long-term maintenance... maybe?

Edit: It's not on CRAN yet but I think that is the intention - it's very new

@boshek
Collaborator

boshek commented Feb 6, 2024

I'm going to close this again. There is lots of good info here about what needs to happen for bcdata to access the ArcGIS REST API directly, but ultimately none of it is something bcdata can fix at the moment.

@boshek boshek closed this as completed Feb 6, 2024
@stephhazlitt
Member

@bevingtona @ashjolly FWIW, if either of you are an editor of a record with an arcgis_rest resource and want to add in the associated API url to the record then we would at least have an example to work with to get the plumbing working/tested in bcdata. With a proof-of-concept in-hand, maybe other arcgis_rest catalogue record editors would follow suit.

@bevingtona

Maybe @jongoetz can make this happen?

@stephhazlitt
Member

https://www.esri.com/arcgis-blog/products/developers/announcements/announcing-arcgis-r-package/

@jongoetz

jongoetz commented Apr 2, 2024

Amazing! They have just the solution we need. Thanks Steph!

library(arcgis)
arc_open("https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer/0") |>
  arc_select() |>
  mapview::mapview(zcol = "Snow_Basin_Index")

@ateucher
Collaborator

ateucher commented Apr 2, 2024

Almost! We still need the catalogue records to publish the REST API endpoint rather than (or in addition to) the AGOL GUI (#257 (comment))

@boshek
Collaborator

boshek commented Apr 2, 2024

If folks are really interested in this, the implementation on the bcdata side is pretty easy, so even just getting one record that has the ArcGIS endpoint would enable this as a proof of concept.

@stephhazlitt
Member

Agreed. There is a solution here for bcdata and arcgis_rest resources; we just need a data provider to include the endpoint in the BC Data Catalogue metadata record.

@boshek
Collaborator

boshek commented Apr 2, 2024

cough @jongoetz or @bevingtona cough ;)

@bevingtona

Looks like there is a .json file that maintains the REST links: https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services?f=pjson but there are hundreds.. not sure what they all are.
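
A small sketch of how that listing could be searched, assuming it parses to a `services` array with `name`, `type` and `url` fields. The JSON below is a hand-written stand-in rather than the real response, `Some_Other_Layer` is a made-up entry, and `find_services()` is a hypothetical helper:

```r
# Illustrative stand-in for the directory listing; the real response has
# many more entries and fields.
listing_json <- '{
  "services": [
    {"name": "Snow_Basins_Indices_View", "type": "FeatureServer",
     "url": "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Snow_Basins_Indices_View/FeatureServer"},
    {"name": "Some_Other_Layer", "type": "FeatureServer",
     "url": "https://services6.arcgis.com/ubm4tcTYICKBpist/ArcGIS/rest/services/Some_Other_Layer/FeatureServer"}
  ]
}'

# An array of objects is simplified to a data frame: name, type, url.
services <- jsonlite::fromJSON(listing_json)$services

# Hypothetical helper: filter services by a case-insensitive name pattern.
find_services <- function(services, pattern) {
  services[grepl(pattern, services$name, ignore.case = TRUE), , drop = FALSE]
}

find_services(services, "snow")
```

Name matching like this only narrows the list; it does not say which catalogue record each service belongs to.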

So we'd need to connect each one to its BC Data Catalogue counterpart? ugh

@stephhazlitt
Member

@bevingtona I think data providers who have made the effort to document and make their data findable through the BC Data Catalogue could be convinced to add one more field to records (existing and new) to make the data quickly usable through R. I suggest a start with one record where you or @jongoetz (or someone on your teams) are an editor and we can get a proof of concept in place.
