ibartomeus/IberianBees

License: CC BY 4.0

IberianBees database v.1.0.0 🐝

This is a repository to document the distribution and diversity of bee species of the Iberian Peninsula. You can see a summary of the data here.

How to contribute:

If you have occurrence data on Iberian bees, fill in this template and send it to [email protected]

How to use this repo

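  • The IberianBees database is at Data/iberian_bees.csv.gz. This is a gzip-compressed CSV file (not a zip archive). Decompress it with any gzip-capable tool, or read it directly in R, which handles gzip compression transparently.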

  • Metadata can be consulted here.

  • Records with names that are not accepted on the Iberian bee species masterlist have been excluded from the final dataset but can be found in Data/Processing_iberian_bees_raw/removed.csv.

  • If you spot a problem, please let @ibartomeus know by creating an issue that cites the unique identifier (uid) of the record that needs to be fixed. This helps avoid duplicated effort.

  • If you are curious about the process, keep reading.
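When reporting a problem, it helps to list the exact records involved. The following is a minimal sketch of how the identifiers could be collected in R. It assumes the table is already loaded as a data frame and that the identifier is stored in a column named uid, which is a guess based on the wording above (check the metadata for the actual column name). The helper name uids_for_issue and the toy data are hypothetical.

```r
# Hypothetical helper: collect the identifiers of flagged records so they
# can be cited in an issue. `id_col` defaults to "uid", which is an
# assumption about the column name; adjust it to match the metadata.
uids_for_issue <- function(data, is_suspicious, id_col = "uid") {
  if (!id_col %in% names(data)) {
    stop("Column '", id_col, "' not found; check the metadata.")
  }
  stopifnot(is.logical(is_suspicious), length(is_suspicious) == nrow(data))
  # Treat NA in the condition as "not flagged"
  flagged <- !is.na(is_suspicious) & is_suspicious
  as.character(data[[id_col]][flagged])
}

# Toy data, made up for illustration (not real records)
toy <- data.frame(
  uid = c("a1", "a2", "a3"),
  Accepted_name = c("Xylocopa violacea", "Apis mellifera", "Xylocopa violacea"),
  Latitude = c(40.4, 95.0, 41.2),
  stringsAsFactors = FALSE
)

# Flag records with an impossible latitude
bad <- uids_for_issue(toy, abs(toy$Latitude) > 90)
print(bad)
```

The returned identifiers can then be pasted into the issue text.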

Process:

To build this database, we follow a reproducible workflow to clean and assemble the data.

1- Use Scripts/1_1_Fetch_data.R to update data from online sources (e.g. GBIF, iNaturalist).

2- Add new datasets (i.e. csv files) locally to Data/Rawdata/csvs/.

3- Process and clean individual files and assign a unique identifier within the folder Scripts/1_2_Processing_raw_data/.

4- Run Scripts/2_Run_all-Merge_all.R. This runs all the individual scripts in Scripts/1_2_Processing_raw_data/ and binds the data. To merge the data without re-running every script, run only the second section of the code, "2 Merge all files".

5- Conduct a final cleaning (for things that weren't fixed in the individual files in step 3). This is done in Scripts/3_1_Final_cleaning.R, which generates the final dataset, Data/iberian_bees.csv.gz.

5.1- Records with non-accepted species names are excluded and saved to Data/Processing_iberian_bees_raw/removed.csv.

5.2- The non-accepted species names (e.g., synonyms) are checked manually from Data/Processing_iberian_bees_raw/to_check.csv and, once reviewed (with taxonomic advice when necessary), added to Data/Processing_iberian_bees_raw/manual_checks.csv. After re-running Scripts/3_1_Final_cleaning.R, the fixed species will be included in the final IberianBees dataset.
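The logic of steps 5.1 and 5.2 can be illustrated with a small sketch. This is not the code in Scripts/3_1_Final_cleaning.R. It is a simplified illustration under stated assumptions: the function name reconcile_names and the column names name and fixed_name are hypothetical, and the data are toy values.

```r
# Sketch of steps 5.1-5.2 (NOT the repository's actual code).
# Hypothetical columns: records$name, manual_checks$name,
# manual_checks$fixed_name. `masterlist` is a character vector of
# accepted names.
reconcile_names <- function(records, masterlist, manual_checks) {
  # Apply reviewed corrections first: replace a name when a fix exists
  idx <- match(records$name, manual_checks$name)
  has_fix <- !is.na(idx)
  records$name[has_fix] <- manual_checks$fixed_name[idx[has_fix]]

  # Split records by whether the (possibly corrected) name is accepted
  accepted <- records$name %in% masterlist
  list(
    kept     = records[accepted, , drop = FALSE],   # final dataset
    removed  = records[!accepted, , drop = FALSE],  # cf. removed.csv
    to_check = sort(unique(records$name[!accepted])) # cf. to_check.csv
  )
}

# Toy data, made up for illustration
records <- data.frame(
  uid  = c("r1", "r2", "r3"),
  name = c("Xylocopa violacea", "Apis mellifica", "Bombus sp."),
  stringsAsFactors = FALSE
)
masterlist <- c("Xylocopa violacea", "Apis mellifera")
manual_checks <- data.frame(
  name       = "Apis mellifica",
  fixed_name = "Apis mellifera",
  stringsAsFactors = FALSE
)

out <- reconcile_names(records, masterlist, manual_checks)
print(out$kept)
print(out$to_check)
```

In this toy run, the reviewed synonym is corrected and kept, while the unresolved name ends up in both the removed records and the list of names still to be checked.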

Metadata is generated using DataSpice.

Example:

Here, we provide an example of how to select, filter and plot the distribution of the species Xylocopa violacea for the records after the year 1999.

  • First, read compressed data in gzip format:
data <- read.table("../Data/iberian_bees.csv.gz",
                   header = TRUE, quote = "\"", sep = ",", row.names = 1)
  • Second, select records of X. violacea after 1999:
library(dplyr) # to filter data
xylocopa <- data %>% filter(Accepted_name == "Xylocopa violacea" & Year > 1999)
  • Finally, load map and plot records:
library(ggplot2) # to load the world map and plot
# Load map (map_data() requires the 'maps' package to be installed)
world <- map_data("world")
# Plot records and adjust the map to the Iberian Peninsula
ggplot(data = xylocopa, aes(Longitude, Latitude)) +
  geom_map(data = world, map = world,
           aes(long, lat, map_id = region),
           color = "white", fill = "grey", size = 0.1) +
  coord_sf(xlim = c(-9, 4), ylim = c(36, 44)) +
  geom_point()
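
As a possible follow-up to the example above, the same selection can be summarised as a count of records per year. This is a minimal sketch in base R (no extra packages), using the Accepted_name and Year columns from the example. The function name records_per_year and the toy data are hypothetical.

```r
# Count records per year for one species, using base R only.
# Column names Accepted_name and Year are taken from the example above.
records_per_year <- function(data, species, after = 1999) {
  # which() drops rows where the condition is NA (missing name or year)
  keep <- which(data$Accepted_name == species & data$Year > after)
  counts <- table(data$Year[keep])
  data.frame(Year = as.integer(names(counts)), n = as.integer(counts))
}

# Toy data, made up for illustration (not real records)
toy <- data.frame(
  Accepted_name = c("Xylocopa violacea", "Xylocopa violacea",
                    "Xylocopa violacea", "Apis mellifera",
                    "Xylocopa violacea"),
  Year = c(1998, 2001, 2001, 2001, 2005),
  stringsAsFactors = FALSE
)

per_year <- records_per_year(toy, "Xylocopa violacea")
print(per_year)
```

With the real dataset, the call would take the data frame read in the first step of the example in place of toy.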