fqqlsun edited this page Mar 4, 2022 · 29 revisions

Welcome to the LEAF production tool wiki!

Lixin Sun and Richard Fernandes, Canada Centre for Remote Sensing, Government of Canada

1. Description

The LEAF production tool is an application built on the Earth Engine (EE) Python API. With this tool, users can efficiently produce biophysical parameter maps, with associated uncertainty estimates, from the surface reflectance satellite imagery stored in Google Earth Engine (GEE). The tool has two major features: (1) a wide range of production requirements can be met by configuring a flexible input parameter dictionary; (2) all results can be exported automatically, in batch mode, to a specified location (either Google Drive or Google Cloud Storage).

The regular output spatial unit of the LEAF production tool is a tile (900 km x 900 km), as defined by the Canadian geospatial tile gridding system (26 tiles cover the Canadian landmass). However, a customized polygon can also be used to define the spatial extent of a production run. Currently, four types of biophysical products (LAI, fCOVER, fAPAR and Albedo) can be generated with this tool. Note that all instructions in this document assume a Windows platform, because that is the only system on which the LEAF production tool has been tested.

2. Set Up Environment For LEAF Production Tool

The prerequisites for running LEAF production tool on a local/client computer are described as follows:

  • Have a Google Cloud account;
  • Install Anaconda and the Earth Engine (EE) Python API on a local/client computer. The LEAF production tool must be executed through Jupyter Notebook. Anaconda is a widely used Python distribution for data science. It includes the libraries required by both the EE Python API and Jupyter Notebook;
  • Clone or download a copy of the Python LEAF production tool from GitHub and save it on the local computer.

2.1 Install Anaconda and EE python API

Installing Anaconda on a local computer is straightforward: go to the Anaconda website and follow the instructions given there. After the installation, search for Anaconda Prompt in the Windows Start menu and open it, because all conda commands must be issued within the Prompt window.

It is highly recommended to create a dedicated conda environment for installing the EE Python API and running the LEAF production tool. The following command creates such an environment (named "leaf_prod" here):

conda create -n leaf_prod

Once a dedicated conda environment has been created, the following commands can be used to activate the environment and install EE Python API to the environment:

conda activate leaf_prod

conda install -c conda-forge earthengine-api

To access the GEE servers through your Google account, you must obtain a credential from Google. In your dedicated environment (e.g., “leaf_prod”), run the earthengine authenticate command and follow the instructions. The command provides a URL that, once you agree to the terms, generates an authorization code. Copy the authorization code and paste it at the command-line prompt.
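Before authenticating, it can help to confirm that the installation from this section is visible from the active conda environment. The following is a minimal sketch, not part of the LEAF tool: the function name check_leaf_environment is hypothetical, and it only checks that the "ee" module and the "earthengine" command can be found, without contacting Google.

```python
import importlib.util
import shutil


def check_leaf_environment():
    """Report whether the pieces installed in section 2.1 can be found."""
    return {
        # The earthengine-api package is imported under the module name "ee".
        "ee_module_found": importlib.util.find_spec("ee") is not None,
        # The same package provides the "earthengine" command-line tool.
        "earthengine_cli_found": shutil.which("earthengine") is not None,
    }


status = check_leaf_environment()
for name, found in status.items():
    print(f"{name}: {'yes' if found else 'no'}")
```

If either entry prints "no", the most likely cause is that the dedicated environment has not been activated in the current Anaconda Prompt window.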

2.2 Obtain a copy of the LEAF production tool from GitHub

There are two ways to obtain the source code hosted on GitHub.

  1. Download the source code using the download option shown on the repository's GitHub page.

  2. Clone the repository to your computer (the same result as downloading, but done with Git). This takes a single command, git clone https://github.com/username/repo-url:

  • Copy the URL shown in the "Clone with HTTPS" box.

  • Open your terminal/command prompt, type git clone followed by the pasted URL, and press Enter.

  • The source code is copied to your chosen location.

3. Run LEAF Production Tool

Running LEAF production tool is straightforward. Only a few python statements are needed. The following python code is a sample.


import ee

ee.Initialize()

import LEAFNets as LFNs

LEAF_PARAMS = {
    'sensor': 101,                  # A sensor type code integer (one of 5, 7, 8, 101 or 102)
    'year': 2020,                   # An image acquisition year
    'months': [5,6,7,8,9,10],       # A list of month integers
    'prod_names': ['LAI', 'fCOVER', 'Albedo', 'fAPAR'],  # A list of biophysical parameter name strings
    'tile_names': ['tile52'],       # A list of tile name strings
    'spatial_scale': 20,            # The spatial resolution (in meters) for exporting results
    'location': 'storage',          # The location of product exporting ("drive" or "storage")
    'bucket': 's2_leaf_2020_v0',    # The bucket name already created on Google Cloud Storage
    'out_folder': ''                # An optional specified folder name on Google Drive
}

export_tasks = LFNs.LEAF_tool_main(LEAF_PARAMS)
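This wiki does not describe what LEAF_tool_main returns. Assuming it is a list of Earth Engine export tasks (an assumption, not confirmed here), each task's status() call returns a dictionary with a "state" entry, and the states can be tallied as sketched below. The helper name summarize_task_states and the task descriptions are hypothetical; the example uses hand-written dictionaries so that it runs without an Earth Engine connection.

```python
from collections import Counter


def summarize_task_states(statuses):
    """Count export tasks per state from Earth Engine style status dictionaries."""
    return Counter(status.get("state", "UNKNOWN") for status in statuses)


# Hand-written stand-ins for the dictionaries returned by ee.batch.Task.status().
example_statuses = [
    {"description": "tile52_LAI", "state": "COMPLETED"},
    {"description": "tile52_fCOVER", "state": "RUNNING"},
    {"description": "tile52_fAPAR", "state": "RUNNING"},
]
summary = summarize_task_states(example_statuses)
print(dict(summary))
```

Under the same assumption, the real tasks could be summarized with summarize_task_states(task.status() for task in export_tasks).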

The LEAF production tool package downloaded/cloned from GitHub provides two files, leaf_tool.py and leaf_tool.ipynb, for quickly running the tool in two different ways. leaf_tool.py is a regular Python source file and leaf_tool.ipynb is a Jupyter Notebook file.

To run leaf_tool.py, open an Anaconda Prompt window, change to the directory where the LEAF code is saved using the cd command, type python leaf_tool.py and press the ENTER key.

To run the LEAF production tool with leaf_tool.ipynb, first start Jupyter Notebook, either by clicking the Jupyter Notebook icon installed by Anaconda in the Start menu or by typing the jupyter notebook command in a terminal (cmd on Windows or Anaconda Prompt). This opens a new browser tab showing the Notebook Dashboard, a control panel that lets you (among other things) select which notebook to open. Navigate to the directory where the LEAF code is saved and click leaf_tool.ipynb, which opens in another browser tab. In leaf_tool.ipynb tab page, run

4. Major Data Structures Used In LEAF Production Tool

There are three major data structures used in the LEAF production tool, all of them Python dictionary objects. Two of them are defined as global variables, named “COLL_OPTIONS” and “PROD_OPTIONS”. The third can be given any name the user prefers, but must be passed to the main function (“LEAF_tool_main”) of the LEAF production tool. The three dictionary objects are described in detail in the following three sections.

4.1 Image Collection Options

This is a Python dictionary currently containing two key:value pairs that store the options associated with the Sentinel-2 and Landsat-8 datasets, respectively. The two keys are “COPERNICUS/S2_SR” and “LANDSAT/LC08/C01/T1_SR”, the catalog names used by Google Earth Engine for the two datasets. The value paired with each key is itself a Python dictionary with 18 key:value pairs.
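How the tool selects one of these two entries is not documented here. One plausible way to relate a sensor code to a collection key is a simple lookup, sketched below. The names SENSOR_TO_COLLECTION and collection_key_for are hypothetical, and the pairing relies only on sensor code 8 standing for Landsat 8 and 101 for Sentinel-2.

```python
# Hypothetical lookup; the real tool may resolve the collection differently.
SENSOR_TO_COLLECTION = {
    8: "LANDSAT/LC08/C01/T1_SR",   # Landsat 8 surface reflectance
    101: "COPERNICUS/S2_SR",       # Sentinel-2 surface reflectance
}


def collection_key_for(sensor_code):
    """Return the GEE catalog name associated with a sensor code."""
    try:
        return SENSOR_TO_COLLECTION[sensor_code]
    except KeyError:
        raise ValueError(
            f"No collection options known for sensor code {sensor_code}"
        ) from None


print(collection_key_for(101))
```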

4.2 Product Options

This is a Python dictionary object with 8 key:value pairs. Each key:value pair holds the options for one biophysical parameter product. Currently, the four major products that can be generated with the LEAF production tool are LAI, fCOVER, fAPAR and Albedo.

4.3 Parameter Dictionary for Running LEAF Production Tool

The execution of the LEAF production tool requires a number of parameters. To simplify passing them, a Python dictionary is used as a container. An example of the parameter dictionary is displayed below:

LEAF_PARAMS = {
    'sensor': 8,
    'year': 2020,
    'months': [5,6,7,8,9,10],
    'prod_names': ['LAI', 'fCOVER', 'Albedo', 'fAPAR'],
    'tile_names': ['tile31', 'tile32', 'tile33', 'tile34', 'tile35', 'tile36'],
    'spatial_scale': 30,
    'location': 'storage',
    'bucket': 'l8_leaf_2020_v0',
    'out_folder': ''
}

Specifically, the dictionary includes 9 “key : value” pairs, which are described in detail as follows:

(1) 'sensor' : a single integer that represents a satellite sensor. The valid values for this key are 5, 7, 8, 9 and 101, which stand for Landsat 5, 7, 8, 9 and Sentinel-2, respectively.

(2) 'year' : a 4-digit integer identifying the year of image acquisition (e.g., 2020).

(3) 'months' : a list of integers (e.g., [6, 7, 8] represents June, July and August). With a list of integers within the range of 1 to 12, several monthly biophysical parameter products can be generated through one execution of the LEAF production tool. If the list contains only one integer and its value is outside that range, the biophysical parameter products corresponding to the peak season (June 15 to September 15) of the year specified by the 'year' key will be produced.

(4) 'prod_names' : a list of strings standing for different biophysical parameters. Currently, the LEAF production tool can generate a subset or the full set of 4 biophysical parameters ['LAI', 'fCOVER', 'Albedo', 'fAPAR'].

(5) 'tile_names' : a list of strings representing different tiles. The basic spatial unit of the LEAF production tool is a tile, which is a 900 km x 900 km area defined by the Canadian tile gridding system. Providing a list of tile names means the biophysical parameter products for multiple tiles can be generated through one execution of the LEAF production tool.

(6) 'spatial_scale' : a single integer defining the spatial resolution (in meters) of exported product maps.

(7) 'location' : a single string specifying the location to export the product maps to. There are two valid strings for this parameter, 'drive' and 'storage', representing Google Drive (GD) and Google Cloud Storage (GCS), respectively.

(8) 'bucket' : the name string of a bucket on GCS. This parameter is only used when the value of the 'location' key is 'storage'.

(9) 'out_folder' : the folder name on GD or GCS for holding a set of exported biophysical parameter maps corresponding to one tile and one year. An empty string for this key means a folder name will be created automatically from the tile name (an element of the list associated with the 'tile_names' key) and the acquisition year (the value of the 'year' key).

In summary, of the 9 key:value pairs of the input dictionary, six keys ('sensor', 'year', 'spatial_scale', 'location', 'bucket' and 'out_folder') require a single value, while the other three ('months', 'prod_names' and 'tile_names') need a list. Different combinations of these lists support various production scenarios. For instance, to generate monthly (e.g., July and August) biophysical parameter maps for multiple tiles (e.g., 'tile41', 'tile42' and 'tile43'), two lists, [7, 8] and ['tile41', 'tile42', 'tile43'], should be provided for the 'months' and 'tile_names' keys.

Once an input dictionary is defined, running the Python LEAF production tool is straightforward. A simple Jupyter Notebook for running this tool is shown in Figure 2.

Figure 2. A simple Jupyter Notebook for running python LEAF production tool
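Because a mistyped key or value is only discovered once a run has started, it may be worth checking the parameter dictionary first. The sketch below is not part of the LEAF tool. It encodes the rules for the nine keys described above, with two assumptions labeled in the comments: the set of valid sensor codes follows the key description, and a bucket name is treated as required when 'location' is 'storage'. All function names are hypothetical.

```python
from itertools import product

# Sensor codes taken from the key description; adjust if the tool accepts others.
VALID_SENSORS = {5, 7, 8, 9, 101}
VALID_PRODUCTS = {"LAI", "fCOVER", "Albedo", "fAPAR"}
VALID_LOCATIONS = {"drive", "storage"}
SCALAR_KEYS = ("sensor", "year", "spatial_scale", "location", "bucket", "out_folder")
LIST_KEYS = ("months", "prod_names", "tile_names")


def validate_leaf_params(params):
    """Return a list of human-readable problems (empty when none are found)."""
    missing = [key for key in SCALAR_KEYS + LIST_KEYS if key not in params]
    if missing:
        return [f"missing keys: {missing}"]

    problems = []
    for key in LIST_KEYS:
        if not isinstance(params[key], list) or not params[key]:
            problems.append(f"'{key}' must be a non-empty list")
    if problems:
        return problems

    if params["sensor"] not in VALID_SENSORS:
        problems.append(f"unknown sensor code: {params['sensor']}")
    unknown = [name for name in params["prod_names"] if name not in VALID_PRODUCTS]
    if unknown:
        problems.append(f"unknown product names: {unknown}")
    if params["location"] not in VALID_LOCATIONS:
        problems.append(f"'location' must be one of {sorted(VALID_LOCATIONS)}")
    # Assumption: a bucket name is needed whenever results go to Cloud Storage.
    if params["location"] == "storage" and not params["bucket"]:
        problems.append("'bucket' is empty but 'location' is 'storage'")
    return problems


def is_peak_season(months):
    """True when 'months' holds one integer outside 1..12 (peak-season request)."""
    return len(months) == 1 and not 1 <= months[0] <= 12


def production_runs(params):
    """List the (tile, period) combinations that one execution would cover."""
    periods = ["peak season"] if is_peak_season(params["months"]) else params["months"]
    return list(product(params["tile_names"], periods))


example = {
    "sensor": 101, "year": 2020, "months": [7, 8],
    "prod_names": ["LAI", "fCOVER"],
    "tile_names": ["tile41", "tile42", "tile43"],
    "spatial_scale": 20, "location": "drive", "bucket": "", "out_folder": "",
}
print(validate_leaf_params(example))   # an empty list means no problems were found
print(len(production_runs(example)))   # 3 tiles x 2 months
```

Returning a list of problems rather than raising on the first one lets a user fix every issue in the dictionary before submitting a long-running batch.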

As a result, the biophysical parameter dataset associated with one tile consists of 11 image files in GeoTIFF format. Specifically, for each of the four biophysical parameters (LAI, fCOVER, Albedo and fAPAR), there are two associated images: a parameter estimate and its corresponding uncertainty map (8 image files in total). Additionally, there are three ancillary image files (quality control, acquisition date and land cover partition). With the spatial resolution set to 20 m, the size of one tile’s dataset (in GeoTIFF format) ranges from 3.1 GB to 44 GB, depending on the location of the tile. Note that a smaller dataset does not necessarily mean less computing time. The size of the image files is determined by the homogeneity of the covered land surface. In northern Canada the land cover is relatively homogeneous, but the number of satellite images involved in the calculation is larger due to the geometry of satellite paths, which leads to a longer computing time.
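The file count and the raster dimensions behind these sizes can be worked out directly. The short sketch below assumes a tile is exactly 900 km on each side and that the export grid divides it evenly at the chosen resolution, which the wiki does not state explicitly.

```python
PRODUCTS = ["LAI", "fCOVER", "Albedo", "fAPAR"]
ANCILLARY = ["quality control", "acquisition date", "land cover partition"]

TILE_SIDE_M = 900 * 1000   # 900 km tile edge, in meters
SCALE_M = 20               # export resolution, in meters

# Each product has an estimate and an uncertainty map, plus the ancillary files.
files_per_tile = 2 * len(PRODUCTS) + len(ANCILLARY)
pixels_per_side = TILE_SIDE_M // SCALE_M
pixels_per_image = pixels_per_side ** 2

print(files_per_tile)     # 11
print(pixels_per_side)    # 45000
print(pixels_per_image)   # 2025000000
```

At roughly two billion pixels per image, the reported on-disk sizes depend on how much the image content varies across the tile rather than on the pixel count alone, which is consistent with the homogeneity remark above.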

The code is executed in Jupyter Notebook. If the “geemap” Python package (https://geemap.org/) is also installed and the notebook file is opened in the Chrome browser (rather than Firefox), the resulting parameter maps can also be visualized and explored within the notebook.

The LEAF production code is released under the Government of Canada's Open Government Licence.

Current Efficiency

A GEE application (in either JavaScript or Python) runs on a client-server architecture in which the majority of the computations are carried out on GEE servers. This means that the efficiency of the LEAF production code is not determined by the client-side computer.

Currently, producing a 20 m resolution biophysical vegetation parameter dataset for one tile from Sentinel-2 imagery takes 2 hours on average, so generating a one-month, 20 m resolution dataset for all of Canada (26 tiles) requires at least 52 hours. Of course, the actual time varies with the location of a tile, the network connection status and the time of execution (day, night or weekend). It can therefore be estimated that three to four days are required to operationally produce a monthly 20 m resolution national biophysical parameter map.
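The 52-hour figure follows from multiplying the tile count by the average time per tile. The sketch below reproduces that arithmetic under the assumption that tiles are processed one after another. The function name is hypothetical.

```python
def national_production_hours(tiles=26, hours_per_tile=2.0):
    """Lower-bound estimate, assuming tiles are processed sequentially."""
    return tiles * hours_per_tile


hours = national_production_hours()
print(hours)                  # 52.0
print(round(hours / 24, 1))   # 2.2
```

The roughly 2.2 days obtained here is a lower bound. The operational estimate of three to four days allows for slower tiles and variable server and network conditions.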

Code in Repository

This repository does not include all the Python code of LEAF production tool. The main purpose of this repository is to keep a communication channel for investigating the efficiency of running LEAF production code on a Google Cloud VM.