This template automates building a ChainerMN cluster on AWS. It creates the following AWS resources:
- VPC and Subnet in which the cluster is placed (you can also use an existing VPC/Subnet)
- S3 Bucket for sharing the ephemeral SSH key that MPI processes in the cluster use to communicate
- Placement group for optimizing network performance
- ChainerMN cluster, which consists of
  - 1 master EC2 instance
  - N (>=0) worker instances (via AutoScalingGroup)
  - chainer user to run MPI jobs in each instance
  - hostfile to run MPI jobs in each instance
- All the instances are launched from Chainer AMI
- (Optional) Amazon Elastic File System (you can also use an existing file system)
  - It is mounted on the cluster instances automatically so that they can share your code and data.
- The required SecurityGroups and IAM Role
Please see template/main.py for detailed resource definitions.
Please also refer to our blog: ChainerMN on AWS with CloudFormation
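
To give a rough idea of the hostfile mentioned in the resource list, here is a minimal sketch that renders one in the Open MPI style (one `<host> slots=<n>` line per node). The function name `render_hostfile`, the IP addresses, and the slot count are hypothetical; the file this template actually generates may be laid out differently, so check template/main.py for the real definition.

```python
from typing import Iterable


def render_hostfile(hosts: Iterable[str], slots_per_host: int) -> str:
    """Return Open MPI-style hostfile text: one '<host> slots=<n>' line per host.

    Illustrative helper only; it is not part of this template.
    """
    if slots_per_host < 1:
        raise ValueError("slots_per_host must be >= 1")
    return "".join(f"{host} slots={slots_per_host}\n" for host in hosts)


# Hypothetical private addresses: one master followed by two workers.
example = render_hostfile(["10.0.0.10", "10.0.0.11", "10.0.0.12"], slots_per_host=1)
print(example, end="")
```

A file like this is what `mpiexec --hostfile <path>` reads in Open MPI to decide where to place processes.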
make build
# Configure your AWS account first.
# This creates a stack from the template you built.
make create-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
# run ChainerMN's train_mnist.py
make e2e-test TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
# clean up the stack
make delete-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
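
If you run these steps often, it can help to make sure the stack is deleted even when the end-to-end test fails. The following is a minimal sketch of that idea, not part of this repository: `run_e2e`, `make_cmd`, and the injectable `runner` parameter are hypothetical names, and only the `make` targets and variables shown above are assumed to exist.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def make_cmd(target: str, stack: str, key_pair: str) -> List[str]:
    """Build the argument list for one of the stack-related make targets."""
    return ["make", target, f"TEST_STACK={stack}", f"KEY_PAIR_NAME={key_pair}"]


def run_e2e(stack: str, key_pair: str, runner: Runner = _run) -> None:
    """Build the template, create the stack, run the test, then delete the stack.

    The delete step sits in a `finally` block, so it is attempted whether the
    e2e test passes or raises. A failure in build or create-stack is not
    cleaned up by this sketch.
    """
    runner(["make", "build"])
    runner(make_cmd("create-stack", stack, key_pair))
    try:
        runner(make_cmd("e2e-test", stack, key_pair))
    finally:
        runner(make_cmd("delete-stack", stack, key_pair))
```

With AWS credentials configured, calling `run_e2e("YOUR_TEST_STACK_NAME", "YOUR_KEY_PAIR_NAME")` runs the same four commands in order. Passing a custom `runner` lets you log or dry-run the commands instead of executing them.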
# Configure AWS account properly first.
# build template
make build
# perform e2e test
make create-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
make e2e-test TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
make delete-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
# publish to stage
make publish STAGE=(production|staging)
- Initial release
- Based on Chainer AMI 0.1.0
- Based on Chainer AMI
- Released Template
MIT License (see LICENSE file).