A "hello world" example of how to implement a containerized Java batch job (extractor, pipeline, algorithm) that runs on a schedule.
This module illustrates the following capabilities:
- Metrics for monitoring
- Configuration via file and env. variables
- Logging
- Wrapping everything nicely into a container
You can run this module in several ways: 1) locally as a Java application, 2) locally as a container on K8s, 3) on a remote K8s cluster. All options allow you to both run but also enjoy a full debugging developer experience.
The minimum requirements for running the module locally:
- Java 17 SDK
- Maven
On Linux/MaxOS:
$ mvn compile exec:java -Dexec.mainClass="com.cognite.examples.Demo"
On Windows Powershell:
> mvn compile exec:java -D exec.mainClass="com.cognite.examples.Demo"
Minimum requirements for running the module on K8s:
- Java 17 SDK: https://adoptium.net/
- Maven: https://maven.apache.org/download.cgi
- Skaffold: https://github.com/GoogleContainerTools/skaffold/releases
- Local K8s with kubectl
Make sure your kube context points to the K8s cluster that you want to run the container on. For example, if you have Docker desktop installed, you should see something like the following:
$ kubectl config current-context
docker-desktop
Then you can build and deploy the container using Skaffold's dev
mode:
$ skaffold dev
This will compile the code, build the container locally and deploy it as a job
on your local K8s cluster. By using
skaffold dev
you also get automatic log tailing so the container logs will be output to your console. When the
container job finishes, you can press ctrl + c
and all resources will be cleaned up.
Metrics are exposed as prometheus metrics
via Prometheus client libraries. You instrument your code by:
- Import the libraries. Have a look at the
pom.xml
file. - Defining the metrics.
- Populating/updating the metric values.
- Exposing the metrics via a http endpoint (for services) or by pushing them to the gateway (for batch jobs).
Demo.java
illustrates how to set up basic batch job metrics and push them to the metrics gateway.
Adding configurability allows you to easily reuse your data application by adjusting parameters at deployment time (as opposed to having to do code changes). Smallrye Config
is a configuration library that enables configuration via yaml config files and environment variables.
Smallrye
will try to resolve a configuration entry by checking a set of "sources":
- The default config file packaged with the code at ./src/main/resources/META-INF/microprofile.yaml.
- The optional, deployment-provided config file at
./config/config.yaml
. - Environment variables.
- System properties (set via the cmd arguments).
A config source with a higher number will override a config source with a smaller number. I.e. an environment variable will override the config file setting.
You should always provide a default config file packaged with the code at ./src/main/resources/META-INF/microprofile.yaml
. This file ensures you provide your module with sensible running defaults as well as serve as the configuration template for deployments. In the default config file you should also define the auth config keys, but not populate them with any values. Check the example 2-raw-to-clean-batch-job for an illustration of how to deal with auth.
Demo.java illustrates how to access the configuration settings in your code. Have a look at the static members near the top of the file.
./kubernetes-manifests/*
illustrate how to supply a configuration file when running this module as a container on K8s. The basic steps are as follows:
- Define the
config.yaml
file with the settings you want to apply. This would typically be all the configuration settings except the secrets (i.e. keys, passwords, etc.). - Define the main application manifest,
k8-demo.job.yaml
. In this example, we useJob
as the workload. You would typically useCronJob
orDeployment
in test/prod whileJob
is a good option for dev. In the manifest, we mount the configuration as a file for the container (your code) to read. - Bind it all together via
kustomization.yaml
.
We recommend following container logging best-practices and route log entries to st out
and st error
. This ensures that the logs will be picked up by K8s. Then your K8s infrastructure can forward the logs to the sink of your choice via a centralized infrastructure as opposed to configuring each individual code module.
In this example, we use slf4j and Logback as logging libraries. The logger configuration is defined at ./src/resources/logback.xml and specifies logging to st out/error
.
There are several ways to package a Java application as a container: Docker build
, Jib and Buildpacks. Out of these options, Jib
is a fast, mature and convenient option today while we expect buildpacks
to become an improved alternative in the longer run.
This example illustrates how to use Jib to build the container with Skaffold as the orchestrator.
To enable Jib
, just add it to the Maven pom.xml as a build plugin:
<build>
<plugins>
<plugin>
<groupId>com.google.cloud.tools</groupId>
<artifactId>jib-maven-plugin</artifactId>
<version>${jib.maven.plugin.version}</version>
<configuration>
<container>
<creationTime>USE_CURRENT_TIMESTAMP</creationTime>
</container>
</configuration>
</plugin>
</plugins>
</build>
Then we use the skaffold.yaml configuration to specify how to build the container. Skaffold
has a native integration with Jib
and will manage Jib's work. The build
section in skaffold.yaml
directs how the container is built:
build:
local:
push: false # When building locally, do not push the image to a repository
artifacts:
- image: k8-demo # Image name
context: .
jib: # Use Jib as the container builder
fromImage: "gcr.io/distroless/java17-debian11" # Use a "distroless" base image
tagPolicy:
dateTime: # The image will be tagged with the build timestamp
format: "20060102T150405"
timezone: "UTC"