bcdata::bcdc_list_*_records() only returning first 10 records #353

Open
stephhazlitt opened this issue Jan 7, 2025 · 3 comments
@stephhazlitt
Member

stephhazlitt commented Jan 7, 2025

Looks like a bug 🐞: since the CKAN changes a few weeks ago, bcdata::bcdc_list_organization_records() and bcdata::bcdc_list_group_records() both return only the first 10 records.

For example, the BC Stats organization has 69 records in the BC Data Catalogue, but only 10 are returned:

bcdata::bcdc_list_organization_records("bc-stats")
#> # A tibble: 10 × 43
#>    author     author_email creator_user_id download_audience groups id    isopen
#>  * <chr>      <lgl>        <chr>           <chr>             <list> <chr> <lgl> 
#>  1 <NA>       NA           40a48d33-5a6c-… Public            <list> ca3a… FALSE 
#>  2 53112dd5-… NA           53112dd5-472c-… Public            <list> 55da… FALSE 
#>  3 <NA>       NA           b3245224-9d10-… Public            <list> cace… FALSE 
#>  4 <NA>       NA           40a48d33-5a6c-… Public            <list> 45a0… FALSE 
#>  5 <NA>       NA           67e6caff-767e-… Public            <list> 1482… FALSE 
#>  6 427ce3ac-… NA           427ce3ac-d77d-… Public            <list> 5661… FALSE 
#>  7 <NA>       NA           40a48d33-5a6c-… Public            <list> 2c75… FALSE 
#>  8 <NA>       NA           40a48d33-5a6c-… Public            <list> 3cb4… FALSE 
#>  9 <NA>       NA           40a48d33-5a6c-… Public            <list> 466d… FALSE 
#> 10 <NA>       NA           b3245224-9d10-… Public            <list> 35f5… FALSE 
#> # ℹ 36 more variables: license_id <chr>, license_title <chr>,
#> #   license_url <chr>, maintainer <lgl>, maintainer_email <lgl>,
#> #   metadata_created <chr>, metadata_modified <chr>, metadata_visibility <chr>,
#> #   name <chr>, notes <chr>, num_resources <int>, num_tags <int>,
#> #   organization <df[,10]>, owner_org <chr>, private <lgl>,
#> #   publish_state <chr>, record_create_date <chr>, record_last_modified <chr>,
#> #   record_publish_date <chr>, relationships_as_object <list>, …
@stephhazlitt added the bug ("Something isn't working") label Jan 7, 2025
@stephhazlitt
Member Author

Trying to find where this is bottlenecked. Walking through the code, it looks like the GET request sees all the records but downloads only the first 10.


The recent CKAN update changelog also mentions dropping a number of overrides for endpoints that may be related (e.g., package_update, organization_list).

Maybe one of the dropped overrides had been disabling pagination?
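
To make the symptom concrete, here is a minimal sketch of a truncation check: compare the number of records a response says exist with the number it actually carries. The response is mocked, and the field names package_count and packages, as well as the helper is_truncated(), are assumptions for illustration, not confirmed parts of the bcdata internals or the BCDC response.

```r
# Mock of a parsed API response. The field names `package_count` and
# `packages` are assumptions made for this illustration.
mock_response <- list(
  package_count = 69L,                               # total the server reports
  packages = as.list(sprintf("record-%02d", 1:10))   # what was actually sent
)

# Hypothetical helper: TRUE when the response reports more records
# than it actually carries.
is_truncated <- function(response) {
  length(response$packages) < response$package_count
}

is_truncated(mock_response)
```

A check like this could be used to warn when a listing function silently returns a partial result.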

@stephhazlitt stephhazlitt self-assigned this Jan 8, 2025
@stephhazlitt
Member Author

stephhazlitt commented Jan 8, 2025

It looks like most, if not all, CKAN REST API endpoints, including organization_show, paginate by default, limiting the number of records returned per request. This must have been overridden in the previous version of the BCDC. Working on a PR to add limit and offset query parameters so that all the records ("packages") are returned, as is done in bcdata::bcdc_list().
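
The limit/offset approach can be sketched as a simple paging loop. This is an illustration under stated assumptions, not the code from the PR: fetch_all_records() and mock_fetch_page() are hypothetical names, and the HTTP request is replaced by a mock that serves 69 fake records so the loop can run offline.

```r
# Hypothetical helper: collect every record from a paginated source.
# `fetch_page` is any function(limit, offset) that returns a list of records;
# in a real client it would wrap the HTTP request (not shown here).
fetch_all_records <- function(fetch_page, limit = 10L) {
  records <- list()
  offset <- 0L
  repeat {
    page <- fetch_page(limit = limit, offset = offset)
    records <- c(records, page)
    # A short (or empty) page means there is nothing left to request.
    if (length(page) < limit) break
    offset <- offset + limit
  }
  records
}

# Stand-in for the catalogue: 69 fake record IDs, served one page at a time.
fake_catalogue <- as.list(sprintf("record-%02d", 1:69))
mock_fetch_page <- function(limit, offset) {
  if (offset >= length(fake_catalogue)) return(list())
  end <- min(offset + limit, length(fake_catalogue))
  fake_catalogue[(offset + 1L):end]
}

all_records <- fetch_all_records(mock_fetch_page, limit = 10L)
length(all_records)
```

Stopping on a short page avoids depending on a total count in the response. If the server does report one, looping until that many records have been collected would be an alternative.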

@stephhazlitt
Member Author

Possibly related: bcdata::bcdc_search_facets("organization") returns only 50 records, rather than the well over 100 organizations listed in the GUI.
