Base image for Overview converters.
A converter's job is to turn files of one type into files of another type. It does this in a loop. It receives jobs from an internal Overview HTTP server.
This base image provides portable executables that communicate with Overview. They make up a framework: they'll call your converter program, which you can write in any language.
Your converter will have a Dockerfile that looks like this:
FROM overview/overview-converter-framework AS framework
# multi-stage build
FROM alpine:3.7 AS build
... (build your executables, including `do-convert-single-file`)
FROM alpine:3.7 AS production
# Add ca-certificates to let container download from S3 https:// URLs
RUN apk add --update --no-cache ca-certificates
WORKDIR /app
# The framework provides the main executable
COPY --from=framework /app/run /app/run
# Your `do-convert` code can choose from a few different input and output
# formats. The framework provides many `/app/convert` implementations: pick
# the one that matches your `do-convert`.
COPY --from=framework /app/convert-single-file /app/convert
COPY --from=build /app/do-convert-single-file /app/do-convert-single-file
This framework runs on a loop:
- Download a task from Overview as JSON.
- Open a stream to download the body of the input file.
- Stream the body to /app/convert MIME-BOUNDARY JSON and pipe the results to Overview.
/app/run handles all communication with Overview. In particular:
- /app/run polls for tasks at POLL_URL. Overview's administrator must set POLL_URL for your container.
- /app/run will retry if there is a connection error.
- /app/run will never crash.
- TODO: /app/run will poll Overview to check if the task is canceled. It will notify /app/convert with SIGINT if the task is canceled.
/app/convert is a program we provide, under a few different names. That is, when you create your program you'll choose one of the following implementations to copy into /app/convert in your image.
From /app/run's point of view, /app/convert will read the input stream and JSON command-line argument and produce a multipart/form-data output stream with MIME boundary MIME-BOUNDARY (in C lingo, argv[1]).
/app/convert will never crash, and it will always output a data stream that Overview can handle.
Your code is invoked by /app/convert, following one of these strategies:
The /app/convert-single-file version of /app/convert will:
- Write standard input to input.blob in a temporary directory and verify it's the correct size
- Run /app/do-convert-single-file JSON (your code) in the temporary directory
- Translate the stdout from your code into progress events or an error event
- When your code exits with status 0 and no error message, pipe output.json, output.blob -- and if they exist, output-thumbnail.jpg, output-thumbnail.png and output.txt -- and a done event
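To make the stdout translation step concrete, here is a minimal sketch of how one line could be classified, using the line formats this README lists for your program's stdout. It is not the framework's actual code: the function name, the tuple return values and the assumption that a fraction lies between 0 and 1 are all hypothetical.

```python
import re

# Line shapes come from this README; everything else here is an assumption.
_PAGES = re.compile(r"p(\d+)/(\d+)")
_BYTES = re.compile(r"b(\d+)/(\d+)")
_FRACTION = re.compile(r"\d+(\.\d+)?")


def interpret_line(line):
    """Classify one stdout line as a progress report or an error message."""
    line = line.rstrip("\n")
    m = _PAGES.fullmatch(line)
    if m:
        return ("pages", int(m.group(1)), int(m.group(2)))
    m = _BYTES.fullmatch(line)
    if m:
        return ("bytes", int(m.group(1)), int(m.group(2)))
    # Assumption: a fraction is a plain decimal number between 0 and 1.
    if _FRACTION.fullmatch(line) and 0.0 <= float(line) <= 1.0:
        return ("fraction", float(line))
    # "Anything else at all" counts as an error message.
    return ("error", line)


for example in ["p1/2", "b102/412", "0.324", "could not open file"]:
    print(interpret_line(example))
```

The practical consequence for your code is that any stray line on stdout, such as a debug message, would be reported as an error.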
Special cases:
- Cancelation: if /app/run sends a SIGINT signal, /app/convert sends your program SIGINT. Your program should kill and wait for any child processes, then exit. Its standard output and standard error will be ignored.
- Error: if /app/do-convert-single-file exits with non-zero return value, /app/convert pipes an error event.
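The cancelation rule is easy to get wrong when your program shells out to other tools. Here is a minimal sketch in Python, assuming your converter runs one child process at a time. It relies on Python's default behavior of turning SIGINT into KeyboardInterrupt. The function names, the popen parameter (there only so a stand-in can be passed during testing) and the exit code 1 are illustrative choices, not part of the framework.

```python
import subprocess
import sys


def run_child(argv, popen=subprocess.Popen):
    """Run a child process; on SIGINT, kill it and wait for it, then re-raise."""
    child = popen(argv)
    try:
        return child.wait()
    except KeyboardInterrupt:
        # Canceled: make sure the child is gone before we exit.
        child.kill()
        child.wait()
        raise


def main(argv):
    try:
        return run_child(argv)
    except KeyboardInterrupt:
        return 1  # arbitrary: this README does not specify an exit code here


# Demo: a child that exits immediately, so main() returns its status.
print(main([sys.executable, "-c", "pass"]))
```

Killing the child outside a signal handler keeps the cleanup in ordinary control flow, which avoids calling wait() on the same process from two places at once.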
You must provide /app/do-convert-single-file. The framework will invoke /app/do-convert-single-file JSON. Your program can read input.blob in the current working directory. Your program must:
- Write progress messages to stdout, newline-delimited, that look like:
  - p1/2 -- "finished processing page 1 of 2"
  - b102/412 -- "finished processing byte 102 of 412"
  - 0.324 -- "finished processing 32.4% of input"
  - anything else at all -- "ERROR: [the line of text]"
- Write output.json, output.blob, and optionally output-thumbnail.jpg, output-thumbnail.png and/or output.txt.
- Exit with status code 0. Any other exit code is an error in your code.
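As an illustration of these requirements, here is a minimal sketch of a converter that copies its input unchanged. The convert function and its workdir parameter are hypothetical (the framework runs your program in the temporary directory, so the default "." applies there), and the empty output.json is a placeholder, since this README does not describe the fields Overview expects.

```python
import json
import os
import tempfile


def convert(task_json, workdir="."):
    """Copy input.blob to output.blob and write a placeholder output.json."""
    json.loads(task_json)  # raises if the argument is not valid JSON
    with open(os.path.join(workdir, "input.blob"), "rb") as f:
        data = f.read()
    with open(os.path.join(workdir, "output.blob"), "wb") as f:
        f.write(data)
    # Placeholder: the real metadata fields are not covered in this README.
    with open(os.path.join(workdir, "output.json"), "w") as f:
        json.dump({}, f)
    # One progress line in the "bN/TOTAL" form; flush so it is seen promptly.
    print(f"b{len(data)}/{len(data)}", flush=True)
    return 0  # becomes the exit status


# Demo in a scratch directory standing in for the framework's temporary one.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "input.blob"), "wb") as f:
        f.write(b"hello")
    status = convert("{}", demo_dir)
    print(sorted(os.listdir(demo_dir)), status)
```

In a real executable you would drop the demo and end the script with something like sys.exit(convert(sys.argv[1])).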
You can test /app/do-convert-single-file by creating a Docker image with the special framework program, /app/test-convert-single-file. This is designed to integrate with automated build environments like Docker Hub.
Your Docker build stage doesn't need a CMD. It should include:
- /app/test-convert-single-file -- and you should RUN [ "/app/test-convert-single-file" ]
- /app/do-convert-single-file and everything it depends on -- /app/test-convert-single-file will invoke it once per test
- /app/test/test-*: one directory per test, e.g. /app/test/test-with-ocr. Each test directory should contain:
  - input.blob
  - input.json -- the JSON passed to do-convert-single-file
  - stdout -- expected standard output from do-convert-single-file
  - 0.blob -- expected 0.blob output
  - 0.json -- expected 0.json output
  - 0.txt (optional) -- expected 0.txt output
  - 0-thumbnail.{png,jpg} (optional) -- expected output
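If you prefer to generate test directories rather than assemble them by hand, a small helper along these lines may be enough. It is a sketch only: write_test_case and its parameters are hypothetical, and it writes just the required files from the list above.

```python
import json
import os
import tempfile


def write_test_case(root, name, input_bytes, task, expected_stdout,
                    expected_blob, expected_json):
    """Create root/test-NAME containing the required files for one test."""
    test_dir = os.path.join(root, "test-" + name)
    os.makedirs(test_dir, exist_ok=True)
    files = {
        "input.blob": input_bytes,
        "input.json": json.dumps(task).encode("utf-8"),
        "stdout": expected_stdout.encode("utf-8"),
        "0.blob": expected_blob,
        "0.json": json.dumps(expected_json).encode("utf-8"),
    }
    for filename, content in files.items():
        with open(os.path.join(test_dir, filename), "wb") as f:
            f.write(content)
    return test_dir


# Demo in a scratch directory; in your repository, root would be the folder
# that your Dockerfile copies to /app/test.
with tempfile.TemporaryDirectory() as root:
    path = write_test_case(root, "copy", b"hello", {}, "b5/5\n", b"hello", {})
    print(sorted(os.listdir(path)))
```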
test-convert-single-file will run do-convert-single-file in a separate directory per test. It will output in TAP format and exit with status code 1 if any test fails.
The test output is designed to help you correct your tests. For instance, here is example output from a test that fails because do-convert-single-file wrote a 0-thumbnail.jpg that the test directory does not expect:
Step 12/13 : RUN [ "/app/test-convert-single-file" ]
---> Running in f65521f3a30c
1..3
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
not ok 1 - test-jpg-ocr
do-convert-single-file wrote /tmp/test-do-convert-single-file912093989/0-thumbnail.jpg, but we expected it not to exist
...
Upon seeing this error, you can
docker cp f65521f3a30c:/tmp/test-do-convert-single-file912093989/0-thumbnail.jpg .
to inspect the file in question (and perhaps make it the expected one).
PDF output is a common case. We use QPDF for file comparison, to ease debugging.
Your Dockerfile must install QPDF -- e.g., apk --no-cache add qpdf -- before the RUN [ "/app/test-convert-single-file" ] step if you are testing PDF output.
The /app/convert-stream-to-mime-multipart version of /app/convert will:
- Create an empty temporary directory
- Run /app/do-convert-stream-to-mime-multipart MIME-BOUNDARY JSON (your code) within the temporary directory
- Stream the input file from Overview to your program's stdin and pipe your program's stdout to Overview
Special cases:
- Cancelation: if /app/run sends a SIGINT signal, /app/convert sends your program SIGINT. Your program should kill and wait for any child processes, then exit. Its standard output and standard error will be ignored.
- Error: if your program exits with non-zero return value, /app/convert pipes an error event.
- Buggy code: /app/convert emits an error event if your program does not produce an error or done event or end with --MIME-BOUNDARY--.
- Temporary files: if your program emits temporary files to its current working directory, they will be deleted.
You must provide /app/do-convert-stream-to-mime-multipart. The framework will invoke it with MIME-BOUNDARY and JSON as arguments. MIME-BOUNDARY will match the regex [a-fA-F0-9]{1,60}. Your program can read the input file from its standard input.
Your program must write valid multipart/form-data output to stdout. For instance:
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="0.json"\r\n
\r\n
{JSON for first output file}\r\n
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="0.blob"\r\n
\r\n
Blob for first output file\r\n
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="progress"\r\n
\r\n
{"pages":{"nProcessed":1,"nTotal":3}}\r\n
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="done"\r\n
\r\n
--MIME-BOUNDARY--
Rules:
- Your output must end with a done or error element. A done element should be empty; an error element must include an error message.
- Your output must be in order: 0.json, 0.blob, (optionally 0.png, 0.jpg and/or 0.txt), 1.json, 1.blob, ..., done.
- You should output an accurate progress report before each N.json to help Overview's progress bar behave well.
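The rules above can be sketched as a small writer that reproduces the byte layout of the example: a progress part, then 0.json, 0.blob and an empty done part. The function names are hypothetical, and the sketch handles a single output file whose blob is already in memory. A real converter would take the boundary from sys.argv[1] and write to sys.stdout.buffer.

```python
import io
import json
import re

# The boundary format is the regex given in this README.
BOUNDARY_RE = re.compile(r"[a-fA-F0-9]{1,60}")


def write_part(out, boundary, name, body=None):
    """Write one form-data part; body=None writes an empty part."""
    out.write(b"--" + boundary + b"\r\n")
    out.write(b'Content-Disposition: form-data; name="' + name + b'"\r\n\r\n')
    if body is not None:
        out.write(body + b"\r\n")


def emit_single_output(out, boundary, metadata, blob):
    """Emit progress, 0.json, 0.blob and done, in the order the rules require."""
    if not BOUNDARY_RE.fullmatch(boundary):
        raise ValueError("unexpected MIME boundary: %r" % boundary)
    b = boundary.encode("ascii")
    progress = json.dumps({"pages": {"nProcessed": 0, "nTotal": 1}})
    write_part(out, b, b"progress", progress.encode("utf-8"))
    write_part(out, b, b"0.json", json.dumps(metadata).encode("utf-8"))
    write_part(out, b, b"0.blob", blob)
    write_part(out, b, b"done")
    out.write(b"--" + b + b"--")


# Demo: write to an in-memory buffer instead of stdout.
demo = io.BytesIO()
emit_single_output(demo, "abc123", {}, b"converted bytes")
print(demo.getvalue().decode("utf-8"))
```

Writing bytes rather than text keeps the \r\n line endings and binary blobs intact on every platform.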
Even more lightweight than /app/convert-stream-to-mime-multipart is to roll your own version of /app/convert. Beware, though:
- Your own version of /app/convert must always output messages to Overview: especially a done or error event. Without those events, Overview will never finish processing the file: it will retry indefinitely.
- Your own version of /app/convert must always exit successfully. The trickiest case, in our experience, is handling "out of memory." If your /app/convert does not exit successfully, Overview will retry indefinitely and the file will never be processed.
- Your own version of /app/convert should output helpful error messages, so you can debug it easily.
- Your own version of /app/convert should end quickly after receiving SIGINT, because Overview will ignore all further output.
- Your own version of /app/convert must ensure temporary files created during one invocation aren't read by the next invocation: that would leak users' documents to other users.
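To give a sense of what these guarantees involve, here is a rough sketch of a wrapper that always ends its output with a done or error part, returns 0, and works in a fresh temporary directory that is deleted afterwards. The names safe_convert and convert_in_dir are hypothetical. The sketch does not cover a process that the kernel kills outright (for example when the system runs out of memory), nor a converter that fails halfway through writing a part.

```python
import io
import os
import tempfile


def write_part(out, boundary, name, body=None):
    """Write one form-data part; body=None writes an empty part."""
    out.write(b"--" + boundary + b"\r\n")
    out.write(b'Content-Disposition: form-data; name="' + name + b'"\r\n\r\n')
    if body is not None:
        out.write(body + b"\r\n")


def safe_convert(out, boundary, convert_in_dir):
    """Run convert_in_dir(tmpdir, out); always finish with done or error."""
    b = boundary.encode("ascii")
    try:
        # The directory and everything in it are removed when the block exits,
        # so one invocation cannot read another invocation's files.
        with tempfile.TemporaryDirectory() as tmpdir:
            convert_in_dir(tmpdir, out)
    except Exception as exc:
        message = str(exc) or type(exc).__name__
        write_part(out, b, b"error", message.encode("utf-8"))
    else:
        write_part(out, b, b"done")
    out.write(b"--" + b + b"--")
    return 0  # exit status: always success


def demo_converter(tmpdir, out):
    """A converter that writes a scratch file and then fails."""
    with open(os.path.join(tmpdir, "scratch"), "wb") as f:
        f.write(b"temporary data")
    raise RuntimeError("unsupported file type")


demo_out = io.BytesIO()
print(safe_convert(demo_out, "abc123", demo_converter))
print(demo_out.getvalue().decode("utf-8"))
```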
/app/convert-stream-to-mime-multipart is small and fast, and it solves these problems for you. You probably want it.
./dev will start a development loop that runs tests. Restart it if you edit Dockerfile.
docker build . will run all tests.
Tests are in ./test/*/suite.bats. They're run in bats, an ideal framework for testing programs that pipe data around.
./release MAJOR.MINOR.PATCH will push to GitHub. Docker Hub will build the images for mass consumption.
This software is Copyright 2011-2018 Jonathan Stray and Copyright 2019-2020 Overview Computing Inc., and distributed under the terms of the GNU Affero General Public License. See the LICENSE file for details.