This GitHub repository is part of a collaborative project that aims to create data-driven partitions of the ocean from globally observed data.
All data are available on the main GitHub repository https://github.com/brorfred/ocean_clustering.
The dataset consists of 15 monthly climatologies resampled to the same 1/2° grid, both gridded in netcdf format and tabulated as hdf5 and csv files. The tabulated data are also provided subsampled by including every 2nd, 4th or 8th datapoint.
The effort is part of the Simons Collaboration on Computational Biogeochemical Modeling of Marine Ecosystems Simons CBIOMES, which seeks to develop and apply quantitative models of the structure and function of marine microbial communities at seasonal and basin scales.
- Chlorophyll (Chl), PAR, Kd490, Euphotic Depth
- remotely sensed reflectances (Rrs412, Rrs443, Rrs490, Rrs510, Rrs555, Rrs670)
- Sea Surface Temperature (SST), Mixed Layer Depth (MLD)
- Wind Speed (wind), Eddy Kinetik Energy (EKE), Bathymetry
Below is an example in MATLAB how to download the data and using hierarchical clustering with Ward's method and cosine similarity.
% Specify link to data available from the main GitHub repository (low resolution data)
oceanURlFile = 'https://rsg.pml.ac.uk/shared_files/brj/CBIOMES_ecoregions/ver_0_2/tabulated_geospatial_montly_clim_045_090_ver_0_2.csv';
% Load table from the main GitHub into oceanData
oceanData = webread(oceanURlFile);
% Feature indices (first four indices are index, month, lat, lon)
featureInds = 5:19;
nFeatures = length(featureInds);
% Extract all features from the data file
oceanX = table2array(oceanData(:,featureInds));
% z-scoring
oceanZ = zscore(oceanX);
% Hierarchical clustering
linkZ_cosine = linkage(oceanZ,'ward','cosine');
% Group into the top-7 global yearly clusters
T_cosine = cluster(linkZ_cosine,'maxclust',7);
The image represents the clustering of the ocean into seven regions for the month of September.
An MATLAB notebook using hierarchical clustering with several distance measures is available here.
A complete clustering and analysis framework in R is developed here.