The TM Extractor script triggers extraction requests for Tasking Manager projects. It uses the HOTOSM Tasking Manager API and the Raw Data API for data extraction. For more complex queries, see the osm-rawdata module.
- Python 3.x
- Access token for Raw Data API
Make sure you have Python 3 installed on your system.
- Clone the repository and change into its directory:
git clone https://github.com/kshitijrajsharma/TM-Extractor
cd TM-Extractor
Check the Env section to verify that the required environment variables are set, then run the script from the command line with one of the following options:
- For specific TM projects:
python tm_extractor.py --projects 123 456 789
- To fetch projects active within the last 24 hours:
python tm_extractor.py --fetch-active-projects 24
- To track a request and dump the result:
python tm_extractor.py --projects 123 --track
You can run the script manually, or set it up as a systemd service or cron job on your machine if required.
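The command-line options above could be parsed along the following lines. This is only a sketch using the standard argparse module: the flag names come from the commands shown, while `build_parser`, the help strings, and the integer types are assumptions and may not match what tm_extractor.py does internally.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(
        description="Trigger extraction requests for Tasking Manager projects"
    )
    # One or more Tasking Manager project IDs, e.g. --projects 123 456 789
    parser.add_argument("--projects", nargs="+", type=int, help="TM project IDs")
    # Look-back window in hours, e.g. --fetch-active-projects 24
    parser.add_argument(
        "--fetch-active-projects",
        type=int,
        metavar="HOURS",
        help="fetch projects active within the last HOURS hours",
    )
    # Boolean switch: follow the request and dump its result
    parser.add_argument("--track", action="store_true", help="track the request")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--projects", "123", "--track"])
print(args.projects, args.track)
```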
- Create an AWS Lambda Function:
  - In the AWS Management Console, navigate to the Lambda service. Choose an existing role or create one with the necessary permissions.
- Set Environment Variables:
  - Add the following environment variables to your Lambda function configuration:
    - CONFIG_JSON: Path to the config JSON file. Default is config.json.
    - Refer to Configurations for more env variables as required.
- Deploy the Script as a Lambda Function:
  - Zip the contents of your project, excluding virtual environments and unnecessary files (but including config.json).
  - Upload the zip file to your Lambda function.
- Configure Lambda Trigger:
  - Configure an appropriate event source for your Lambda function. This could be an API Gateway, a CloudWatch Event, or another trigger, depending on your requirements.
- Invoke the Lambda Function:
  - Trigger the Lambda function manually or wait for the configured event source to invoke it.
Your Lambda function should now be able to execute the script with the provided configurations.
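To make the Lambda steps more concrete, here is a rough sketch of what a handler reading CONFIG_JSON might look like. The `(event, context)` signature is the standard one for Python Lambda handlers; `load_config`, the `projects` event field, and the response shape are assumptions for illustration and are not taken from the repository.

```python
import json
import os


def load_config(path=None):
    """Load extraction settings from the file named by CONFIG_JSON (default: config.json)."""
    path = path or os.environ.get("CONFIG_JSON", "config.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lambda_handler(event, context):
    """Hypothetical entry point; the event fields used here are assumed, not the project's schema."""
    config = load_config()
    projects = event.get("projects", [])  # assumed event field
    # A real handler would trigger the extraction requests at this point.
    body = {"projects": projects, "config_keys": sorted(config)}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Keeping the config loading in its own function makes it possible to exercise the handler locally with a temporary config file before deploying.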
- Download & install Terraform:
  - Install Terraform here.
- Create AWS Credentials:
  - Create an IAM programmatic user:
    - Configure .aws/credentials or use AWS environment variables for Terraform authentication. Check Here
- Environment Variables: You can set Terraform input variables using TF_VAR_<variable-name-from-vars.tf> if you don't want to provide them each time.
- Run Terraform Plan/Apply:
  - Run terraform init to download the required providers.
  - Run terraform plan to check for infrastructure changes.
  - Run terraform apply to apply the Terraform configuration.
NOTE: Check main.tf for the resources Terraform creates.
You can run the Streamlit app to use the frontend.
- Run locally:
pip install streamlit
streamlit run streamlit_app.py
- To use the hosted service, go to tm-extractor.streamlit.app
Set the following environment variables for proper configuration:
Example:
export RAWDATA_API_AUTH_TOKEN='my_token'
- RAWDATA_API_AUTH_TOKEN: API token for Raw Data API authentication. Request yours from the Raw Data API admins.
- RAW_DATA_API_BASE_URL: Base URL for the Raw Data API. Defaults to https://api-prod.raw-data.hotosm.org/v1
- TM_API_BASE_URL: Base URL for the Tasking Manager API. Defaults to https://tasking-manager-tm4-production-api.hotosm.org/api/v2
- CONFIG_JSON: Path to the config JSON file. Defaults to config.json
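As a sketch of how these variables and their documented defaults fit together, the snippet below reads them with `os.environ`-style lookups. The variable names and default values are the ones listed above; the `read_settings` helper and its dictionary return value are hypothetical.

```python
import os

# Defaults as documented above; the token has no default and must be supplied.
DEFAULTS = {
    "RAW_DATA_API_BASE_URL": "https://api-prod.raw-data.hotosm.org/v1",
    "TM_API_BASE_URL": "https://tasking-manager-tm4-production-api.hotosm.org/api/v2",
    "CONFIG_JSON": "config.json",
}


def read_settings(env=None):
    """Hypothetical helper: resolve each setting from env, falling back to its default."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings["RAWDATA_API_AUTH_TOKEN"] = env.get("RAWDATA_API_AUTH_TOKEN")
    return settings


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = read_settings({"RAWDATA_API_AUTH_TOKEN": "my_token"})
print(settings["CONFIG_JSON"])
```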
The config.json file contains configuration settings for the extraction process. It includes details about the dataset, categories, and geometry of the extraction area.
{
    "geometry": {...},
    "dataset": {...},
    "categories": [...]
}
geometry: Defines the geographical area for extraction. Typically auto-populated with the Tasking Manager (TM) project geometry.
queue: Specifies the Raw Data API queue, usually set to "raw_default" for default processing. This can be changed during a disaster activation, when special services are deployed so that those requests can be prioritized.
dataset: Contains information about the dataset:
- dataset_prefix: Prefix for the dataset extraction, e.g. hotosm_project_123.
- dataset_folder: Default parent folder in which extractions are placed, e.g. TM. Be mindful when changing this.
- dataset_title: Title of the Tasking Manager project, e.g. Tasking Manager Project 123.
categories: Array of extraction categories, each represented by a dictionary with:
- Category Name: Name of the extraction category (e.g., "Buildings", "Roads").
- types: Types of geometries to extract (e.g., "polygons", "lines", "points").
- select: Attributes to select during extraction (e.g., "name", "highway", "surface").
- where: Conditions for filtering the data during extraction (e.g., filtering by tags).
- formats: File formats for export (e.g., "geojson", "shp", "kml").
Adjust these settings based on your project requirements and the types of features you want to extract.
Refer to the sample config.json for default config.
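The structure described above can be illustrated with a small Python sketch that builds a config dictionary and checks its shape. Several details are assumptions: the `queue` key name, keying each category dictionary by its name, treating all four category fields as required, and the `where` expression, whose syntax is purely illustrative. `check_config` is a hypothetical helper, not part of the script.

```python
REQUIRED_KEYS = ("geometry", "dataset", "categories")
CATEGORY_KEYS = ("types", "select", "where", "formats")


def check_config(config):
    """Return a list of problems; an empty list means the expected keys are present."""
    problems = [f"missing top-level key: {k}" for k in REQUIRED_KEYS if k not in config]
    for category in config.get("categories", []):
        # Assumed layout: each category is a one-entry dict keyed by its name.
        for name, spec in category.items():
            problems += [f"{name}: missing {k}" for k in CATEGORY_KEYS if k not in spec]
    return problems


sample = {
    "geometry": {},  # typically filled in from the TM project geometry
    "queue": "raw_default",  # assumed key name for the queue setting
    "dataset": {
        "dataset_prefix": "hotosm_project_123",
        "dataset_folder": "TM",
        "dataset_title": "Tasking Manager Project 123",
    },
    "categories": [
        {
            "Buildings": {
                "types": ["polygons"],
                "select": ["name"],
                "where": "tags['building'] IS NOT NULL",  # illustrative filter only
                "formats": ["geojson"],
            }
        }
    ],
}

print(check_config(sample))
```

The sample config.json in the repository remains the authoritative reference for the real layout.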
The script is designed to trigger extraction requests for Tasking Manager projects using the Raw Data API. It automates the extraction process based on predefined configurations.
- Supports both command line and AWS Lambda execution.
- Dynamically fetches project details, including mapping types and geometry, from the Tasking Manager API.
- Configurable extraction settings using a config.json file.
- Helps debug the service and track API requests.
- Your export download link is generated from the config. Following the raw-data-api logic, it will be Base_download_url/dataset_folder/dataset_prefix/Category_name/feature_type/dataset_prefix_category_name_feature_type_export_format.zip
- Example for the Waterways configuration: here the Category Name is Waterways, dataset_prefix is hotosm_project_9, dataset_folder is TM, feature_type is lines, and the export format is geojson.
"Waterways": {
    "resources": [
        {
            "name": "hotosm_project_9_waterways_lines_geojson.zip",
            "format": "geojson",
            "description": "GeoJSON",
            "url": "https://s3.sample.your_domain.org/TM/hotosm_project_9/waterways/lines/hotosm_project_9_waterways_lines_geojson.zip",
            "last_modifed": "2023-12-28T17:48:21.378667"
        }
    ]
}
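The way the download link is assembled can be sketched in a few lines of Python. This follows the example URL above, where the category name appears in lowercase and the feature type is part of the file name; `build_download_url` is a hypothetical helper, and the actual raw-data-api logic may handle other cases (such as category names with spaces) differently.

```python
def build_download_url(base_url, dataset_folder, dataset_prefix,
                       category, feature_type, export_format):
    """Assemble an export URL following the pattern of the example above."""
    category_slug = category.lower()  # the example URL uses "waterways" for "Waterways"
    filename = f"{dataset_prefix}_{category_slug}_{feature_type}_{export_format}.zip"
    parts = [base_url.rstrip("/"), dataset_folder, dataset_prefix,
             category_slug, feature_type, filename]
    return "/".join(parts)


url = build_download_url(
    "https://s3.sample.your_domain.org",  # placeholder base URL from the example
    "TM",
    "hotosm_project_9",
    "Waterways",
    "lines",
    "geojson",
)
print(url)
```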
See the sample result for an example of what the output looks like.