# job.yml
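# A typical way to submit this job (assuming the Azure CLI with the ml
# extension v2 is installed and a default workspace is configured):
#
#   az ml job create --file job.yml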
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
python startDask.py
--script prep-nyctaxi.py
--nyc_taxi_dataset ${{inputs.nyc_taxi_dataset}}
--output_folder ${{outputs.output_folder}}
inputs:
nyc_taxi_dataset:
path: wasbs://[email protected]/nyctaxi/
mode: ro_mount
outputs:
output_folder:
type: uri_folder
environment:
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: conda.yaml
compute: azureml:cpu-cluster-lg
resources:
instance_count: 4
distribution:
  # the pytorch distribution type is used here only to have Azure ML launch
  # one process per node; those processes then form the Dask cluster
  # (see startDask.py), no PyTorch training is involved
  type: pytorch
display_name: dask-nyctaxi-example
experiment_name: dask-nyctaxi-example
description: This sample shows how to run a distributed Dask job on Azure ML. The 24 GB NYC Taxi dataset is read in CSV format by a 4-node Dask cluster, processed, and then written as job output in Parquet format.
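# The referenced conda.yaml (not included here) is expected to provide the
# Dask runtime for the cluster nodes. A minimal sketch, with package names
# assumed rather than taken from the actual file, might look like:
#
#   channels:
#     - conda-forge
#   dependencies:
#     - dask
#     - distributed
#     - pandas
#     - pyarrow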