Cherry-pick hint job scheduling #384

Merged 1 commit on May 12, 2024
8 changes: 6 additions & 2 deletions docs/user_guides/projects/jobs/pyspark_job.md
@@ -12,9 +12,13 @@ All members of a project in Hopsworks can launch the following types of applicat
 - Apache Spark

 Launching a job of any type is very similar process, what mostly differs between job types is
-the various configuration parameters each job type comes with. After following this guide you will be able to create a PySpark job.
+the various configuration parameters each job type comes with. Hopsworks clusters support scheduling to run jobs on a regular basis,
+e.g backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the python API,
+checkout [our Scheduling guide](schedule_job.md).
+
-The PySpark program can either be a `.py` script or a `.ipynb` file.

+PySpark program can either be a `.py` script or a `.ipynb` file, however be mindful of how to access/create
+the spark session based on the extension you provide.

 !!! notice "Instantiate the SparkSession"
 For a `.py` file, remember to instantiate the SparkSession i.e `spark=SparkSession.builder.getOrCreate()`
4 changes: 3 additions & 1 deletion docs/user_guides/projects/jobs/python_job.md
@@ -12,7 +12,9 @@ All members of a project in Hopsworks can launch the following types of applicat
 - Apache Spark

 Launching a job of any type is very similar process, what mostly differs between job types is
-the various configuration parameters each job type comes with. After following this guide you will be able to create a Python job.
+the various configuration parameters each job type comes with. Hopsworks support scheduling jobs to run on a regular basis,
+e.g backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the python API,
+checkout [our Scheduling guide](schedule_job.md).

 !!! note "Kubernetes integration required"
 Python Jobs are only available if Hopsworks has been integrated with a Kubernetes cluster.
2 changes: 1 addition & 1 deletion docs/user_guides/projects/jobs/schedule_job.md
@@ -6,7 +6,7 @@ description: Documentation on how to schedule a job on Hopsworks.

 ## Introduction

-Hopsworks jobs can be scheduled to run at regular intervals using the scheduling function provided by Hopsworks. Each job can be configured to have a single schedule.
+Hopsworks clusters can run jobs on a schedule, allowing you to automate the execution. Whether you need to backfill your feature groups on a nightly basis or run a model training pipeline every week, the Hopsworks scheduler will help you automate these tasks. Each job can be configured to have a single schedule. For more advanced use cases, Hopsworks integrates with any DAG manager and directly with the open-source [Apache Airflow](https://airflow.apache.org/use-cases/), check out our [Airflow Guide](../airflow/airflow.md).

 Schedules can be defined using the drop down menus in the UI or a Quartz [cron](https://en.wikipedia.org/wiki/Cron) expression.

5 changes: 4 additions & 1 deletion docs/user_guides/projects/jobs/spark_job.md
@@ -12,7 +12,10 @@ All members of a project in Hopsworks can launch the following types of applicat
 - Apache Spark

 Launching a job of any type is very similar process, what mostly differs between job types is
-the various configuration parameters each job type comes with. After following this guide you will be able to create a Spark job.
+the various configuration parameters each job type comes with. Hopsworks support scheduling to run jobs on a regular basis,
+e.g backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the python API,
+checkout [our Scheduling guide](schedule_job.md).
+

 ## UI
