Skip to content

Commit

Permalink
Merge pull request #293 from DataBiosphere/dev
Browse files Browse the repository at this point in the history
PR for 0.4.13 release
  • Loading branch information
wnojopra authored Jul 18, 2024
2 parents 0c3e313 + 37b4e0e commit 0cbbdb5
Show file tree
Hide file tree
Showing 11 changed files with 152 additions and 29 deletions.
30 changes: 19 additions & 11 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# How to Contribute
# Contributing

We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.
Expand All @@ -8,16 +8,24 @@ just a few small guidelines you need to follow.
Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution,
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to https://cla.developers.google.com/ to see
your current agreements on file or to sign a new one.
part of the project. Review and sign our
[Contributor License Agreement](https://docs.google.com/document/d/1Yc-z59DQKRqiqpVgHcIHC3xhfwm3DoPmpc9eFwr_J8c/edit)
and send it to our IP team at [email protected].

You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.
## How to Contribute

## Code reviews
There are many ways to contribute, including:

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.
* **Reporting Bugs:** If you find a bug, please open an issue describing the problem in detail. Include steps to reproduce, expected behavior, and any relevant error messages.
* **Suggesting Enhancements:** Have an idea for a new feature or improvement? Open an issue to discuss it with us.
* **Improving Documentation:** Help make our documentation clearer and more helpful by suggesting changes or fixing errors.
* **Submitting Pull Requests (PRs):** The best way to contribute code is by submitting a pull request. Before you start working on a major change, please open an issue to discuss the proposed change first.

## Pull Request Guidelines

1. **Fork the repository:** Create your own copy of the repository.
2. **Create a branch:** Make your changes on a new branch.
3. **Write clear commit messages:** Explain what each commit does.
4. **Follow coding style:** Match the existing code style as closely as possible.
5. **Add tests:** Include tests for your changes (if applicable).
6. **Open a pull request:** Submit your changes for review.
4 changes: 4 additions & 0 deletions docs/retries.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,10 @@ current attempt will still be treated as a preemptible attempt used.
Transient failure rates should be much lower in practice than preemption rates
and more complex retry logic is not clearly more desirable.

When using the `google-batch` provider, using the `--preemptible` flag will
cause your tasks to be run on [Spot VMs](https://cloud.google.com/spot-vms).
Unlike standard GCE preemptible VMs, Spot VMs do not have a 24-hour time limit.

## Tracking task attempts

When viewing tasks with `dstat --full` the attempt number will be available
Expand Down
2 changes: 1 addition & 1 deletion dsub/_dsub_version.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,4 +26,4 @@
0.1.3.dev0 -> 0.1.3 -> 0.1.4.dev0 -> ...
"""

DSUB_VERSION = '0.4.12'
DSUB_VERSION = '0.4.13'
1 change: 1 addition & 0 deletions dsub/providers/batch_dummy.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ class AllocationPolicy(object):
NetworkPolicy = None
Accelerator = None
LocationPolicy = None
ProvisioningModel = None

class LogsPolicy(object):
Destination = None
Expand Down
34 changes: 30 additions & 4 deletions dsub/providers/google_batch.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,15 +23,15 @@
import textwrap
from typing import Dict, List, Set

from ..lib import dsub_util
from ..lib import job_model
from ..lib import param_util
from ..lib import providers_util
from . import base
from . import google_base
from . import google_batch_operations
from . import google_custom_machine
from . import google_utils
from ..lib import job_model
from ..lib import param_util
from ..lib import providers_util


# pylint: disable=g-import-not-at-top
try:
Expand Down Expand Up @@ -298,6 +298,7 @@ def get_field(self, field: str, default: str = None):
elif field == 'provider-attributes':
# TODO: This needs to return instance (VM) metadata
value = {}
value['preemptible'] = google_batch_operations.get_preemptible(self._op)
elif field == 'events':
# TODO: This needs to return a list of events
value = []
Expand Down Expand Up @@ -394,16 +395,25 @@ class GoogleBatchJobProvider(google_utils.GoogleJobProviderBase):
def __init__(
self, dry_run: bool, project: str, location: str, credentials=None
):
storage_service = dsub_util.get_storage_service(credentials=credentials)

self._dry_run = dry_run
self._location = location
self._project = project
self._storage_service = storage_service

def _batch_handler_def(self):
return GoogleBatchBatchHandler

def _operations_cancel_api_def(self):
return batch_v1.BatchServiceClient().delete_job

def _get_provisioning_model(self, task_resources):
if task_resources.preemptible:
return batch_v1.AllocationPolicy.ProvisioningModel.SPOT
else:
return batch_v1.AllocationPolicy.ProvisioningModel.STANDARD

def _get_batch_job_regions(self, regions, zones) -> List[str]:
"""Returns the list of regions and zones to use for a Batch Job request.
Expand Down Expand Up @@ -743,6 +753,7 @@ def _create_batch_request(
accelerator_type=job_resources.accelerator_type,
accelerator_count=job_resources.accelerator_count,
),
provisioning_model=self._get_provisioning_model(task_resources),
)

ipt = google_batch_operations.build_instance_policy_or_template(
Expand Down Expand Up @@ -835,6 +846,17 @@ def submit_job(
requests = []

for task_view in job_model.task_view_generator(job_descriptor):

job_params = task_view.job_params
task_params = task_view.task_descriptors[0].task_params

outputs = job_params['outputs'] | task_params['outputs']
if skip_if_output_present:
# check whether the output's already there
if dsub_util.outputs_are_present(outputs, self._storage_service):
print('Skipping task because its outputs are present')
continue

request = self._create_batch_request(task_view)
if self._dry_run:
requests.append(request)
Expand All @@ -849,6 +871,10 @@ def submit_job(
# closely resembles yaml, but can't actually be serialized into yaml.
# Ideally, we could serialize these request objects to yaml or json.
print(requests)

if not requests and not launched_tasks:
return {'job-id': dsub_util.NO_JOB}

return {
'job-id': job_descriptor.job_metadata['job-id'],
'user-id': job_descriptor.job_metadata.get('user-id'),
Expand Down
13 changes: 13 additions & 0 deletions dsub/providers/google_batch_operations.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,16 @@ def get_status_events(op: batch_v1.types.Job):
return op.status.status_events


def get_preemptible(op: batch_v1.types.Job) -> bool:
pm = op.allocation_policy.instances[0].policy.provisioning_model
if pm == batch_v1.AllocationPolicy.ProvisioningModel.SPOT:
return True
elif pm == batch_v1.AllocationPolicy.ProvisioningModel.STANDARD:
return False
else:
raise ValueError(f'Invalid provisioning_model value: {pm}')


def build_job(
task_groups: List[batch_v1.types.TaskGroup],
allocation_policy: batch_v1.types.AllocationPolicy,
Expand Down Expand Up @@ -317,6 +327,7 @@ def build_instance_policy(
disks: List[batch_v1.types.AllocationPolicy.AttachedDisk],
machine_type: str,
accelerators: MutableSequence[batch_v1.types.AllocationPolicy.Accelerator],
provisioning_model: batch_v1.types.AllocationPolicy.ProvisioningModel,
) -> batch_v1.types.AllocationPolicy.InstancePolicy:
"""Build an instance policy for a Batch request.
Expand All @@ -325,6 +336,7 @@ def build_instance_policy(
disks (List[AttachedDisk]): Non-boot disks to be attached for each VM.
machine_type (str): The Compute Engine machine type.
accelerators (List): The accelerators attached to each VM instance.
provisioning_model (enum): Either SPOT (preemptible) or STANDARD
Returns:
An object representing an instance policy.
Expand All @@ -334,6 +346,7 @@ def build_instance_policy(
instance_policy.disks = [disks]
instance_policy.machine_type = machine_type
instance_policy.accelerators = accelerators
instance_policy.provisioning_model = provisioning_model

return instance_policy

Expand Down
5 changes: 3 additions & 2 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,9 @@
# version mismatches.
# This version list generated: 05/03/2024
# direct dependencies
'google-api-python-client>=2.47.0,<=2.127.0',
'google-api-python-client>=2.47.0,<=2.131.0',
'google-auth>=2.6.6,<=2.29.0',
'google-cloud-batch<=0.17.18',
'google-cloud-batch<=0.17.20',
'python-dateutil<=2.9.0',
'pytz<=2024.1',
'pyyaml<=6.0.1',
Expand All @@ -29,6 +29,7 @@
'google-api-core>=2.7.3,<=2.19.0',
'google-auth-httplib2<=0.2.0',
'httplib2<=0.22.0',
'protobuf>=3.19.0,<=5.26.0',
'pyasn1<=0.6.0',
'pyasn1-modules<=0.4.0',
'rsa<=4.9',
Expand Down
13 changes: 3 additions & 10 deletions test/integration/e2e_logging_content.sh
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ EOF
)"

run_dsub \
--unique-job-id \
--command '\
echo -n '"'${STDOUT_MSG%.}'"' && \
1>&2 echo -n '"'${STDERR_MSG%.}'"'
Expand All @@ -73,24 +74,16 @@ echo "Checking output..."
# Check the results
readonly STDOUT_RESULT_EXPECTED="$(echo -n "${STDOUT_MSG%.}")"

# There is a bug with the Batch API where blank lines
# do not get printed to log files.
# Temporarily ignore blank lines for the Batch provider.
diff_args=()
if [[ "${DSUB_PROVIDER}" == "google-batch" ]]; then
diff_args+=("--ignore-blank-lines")
fi

readonly STDOUT_RESULT="$(gsutil cat "${STDOUT_LOG}")"
if ! diff "${diff_args[@]}" <(echo "${STDOUT_RESULT_EXPECTED}") <(echo "${STDOUT_RESULT}"); then
if ! diff <(echo "${STDOUT_RESULT_EXPECTED}") <(echo "${STDOUT_RESULT}"); then
echo "STDOUT file does not match expected"
exit 1
fi

readonly STDERR_RESULT_EXPECTED="$(echo -n "${STDERR_MSG%.}")"

readonly STDERR_RESULT="$(gsutil cat "${STDERR_LOG}")"
if ! diff "${diff_args[@]}" <(echo "${STDERR_RESULT_EXPECTED}") <(echo "${STDERR_RESULT}"); then
if ! diff <(echo "${STDERR_RESULT_EXPECTED}") <(echo "${STDERR_RESULT}"); then
echo "STDERR file does not match expected"
exit 1
fi
Expand Down
1 change: 1 addition & 0 deletions test/integration/e2e_verify_failure_log.sh
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ source "${SCRIPT_DIR}/test_setup_e2e.sh"

# Run the job
if JOB_ID=$(run_dsub \
--unique-job-id \
--image gcr.io/no.such.image \
--command 'echo "Test"' \
--wait); then
Expand Down
66 changes: 66 additions & 0 deletions test/integration/unit_flags.google-batch.sh
Original file line number Diff line number Diff line change
Expand Up @@ -375,6 +375,67 @@ function test_zones() {
}
readonly -f test_zones

function test_preemptible_zero() {
local subtest="${FUNCNAME[0]}"

if call_dsub \
--command 'echo "${TEST_NAME}"' \
--preemptible 0; then

# Check that the output contains expected values
result=$(grep " provisioning_model:" "${TEST_STDERR}" | awk '{print $2}')
if [[ "${result}" != "STANDARD" ]]; then
1>&2 echo "provisioning_model was actually ${result}, expected STANDARD"
exit 1
fi
test_passed "${subtest}"
else
test_failed "${subtest}"
fi
}
readonly -f test_preemptible_zero

function test_preemptible_off() {
local subtest="${FUNCNAME[0]}"

if call_dsub \
--command 'echo "${TEST_NAME}"' \
--regions us-central1; then

# Check that the output contains expected values
result=$(grep " provisioning_model:" "${TEST_STDERR}" | awk '{print $2}')
if [[ "${result}" != "STANDARD" ]]; then
1>&2 echo "provisioning_model was actually ${result}, expected STANDARD"
exit 1
fi
test_passed "${subtest}"
else
test_failed "${subtest}"
fi
}
readonly -f test_preemptible_off

function test_preemptible_on() {
local subtest="${FUNCNAME[0]}"

if call_dsub \
--command 'echo "${TEST_NAME}"' \
--regions us-central1 \
--preemptible; then

# Check that the output contains expected values
result=$(grep " provisioning_model:" "${TEST_STDERR}" | awk '{print $2}')
if [[ "${result}" != "SPOT" ]]; then
1>&2 echo "provisioning_model was actually ${result}, expected SPOT"
exit 1
fi
test_passed "${subtest}"
else
test_failed "${subtest}"
fi
}
readonly -f test_preemptible_on

# # Run the tests
trap "exit_handler" EXIT

Expand Down Expand Up @@ -408,3 +469,8 @@ test_neither_region_nor_zone
test_region_and_zone
test_regions
test_zones

echo
test_preemptible_zero
test_preemptible_off
test_preemptible_on
12 changes: 11 additions & 1 deletion test/run_tests.sh
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,10 @@ function get_test_providers() {
return
fi
case "${test_file}" in
e2e_after.py | \
e2e_after.sh | \
e2e_after_fail.py | \
e2e_after_fail.sh | \
e2e_command_flag.sh | \
e2e_dsub_summary.sh | \
e2e_env_list.py | \
Expand All @@ -233,6 +237,7 @@ function get_test_providers() {
e2e_io_auto.sh | \
e2e_io_gcs_tasks.sh | \
e2e_io_mount_bucket.google-v2.sh | \
e2e_io_mount_bucket_requester_pays.google-v2.sh | \
e2e_io_recursive.sh | \
e2e_io_tasks.sh | \
e2e_logging_content.sh | \
Expand All @@ -241,13 +246,18 @@ function get_test_providers() {
e2e_logging_paths_basic_tasks.sh | \
e2e_logging_paths_log_suffix_tasks.sh | \
e2e_logging_paths_pattern_tasks.sh | \
e2e_logging_paths_retry_failure_tasks.sh | \
e2e_logging_paths_retry_tasks.sh | \
e2e_non_root.sh | \
e2e_preemptible_retries_fail.google-v2.sh | \
e2e_python.sh | \
e2e_requester_pays_buckets.sh | \
e2e_retries_success.sh | \
e2e_retries_fail_1.sh | \
e2e_retries_fail_2.sh | \
e2e_runtime.sh)
e2e_runtime.sh | \
e2e_skip.sh | \
e2e_skip_tasks.sh)
local all_provider_list="${DSUB_PROVIDER:-local google-v2 google-cls-v2 google-batch}"
;;
*)
Expand Down

0 comments on commit 0cbbdb5

Please sign in to comment.