Basic dita documentation #29

Open
wants to merge 17 commits into
base: dev
18 changes: 11 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -49,7 +49,7 @@ This bash script runs the post correction using a pre-trained
OCRs should be used, models for these OCR steps are required and must
be configured in a corresponding configuration file (see ocrd-tool.json).

Arguments:
This tool accepts the following arguments:
* `--parameter` path to configuration file
* `--input-file-grp` name of the master-OCR file group
* `--output-file-grp` name of the post-correction file group
@@ -62,7 +62,7 @@ This tool is used to align the master OCR with any additional support
OCRs. It accepts a comma-separated list of input file groups, which
it aligns in order.

Arguments:
This tool accepts the following arguments:
* `--parameter` path to configuration file
* `--input-file-grp` comma-separated list of the input file groups;
first input file group is the master OCR
@@ -72,8 +72,10 @@ Arguments:

### ocrd-cis-train.sh
Script to train a model from a list of ground-truth archives (see
ocrd-tool.json) for the post correction. The tool somewhat mimics the
behaviour of other ocrd tools:
ocrd-tool.json) for the post correction.

The tool somewhat mimics the behaviour of other ocrd tools and accepts
the following arguments:
* `--mets` for the workspace
* `--log-level` is passed to other tools
* `--parameter` is used as configuration
@@ -85,10 +87,12 @@ Helper tool to get the path of the installed data files. Usage:
path to the default 3-grams language model file.

### ocrd-cis-wer
Helper tool to calculate the word error rate aligned ocr files. It
writes a simple JSON-formated stats file to the given output file group.
Helper tool to calculate the word error rate of aligned ocr files. It
writes a simple JSON-formatted stats file to the given output file
group.

Arguments:
This tool accepts the following arguments:
* `--parameter` set configuration file
* `--input-file-grp` input file group of aligned ocr results with
their respective ground truth.
* `--output-file-grp` name of the file group for the stats file
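As a rough illustration of what a word error rate over aligned tokens means, the following Python sketch counts the mismatching token pairs and serialises the result as JSON. The function name `word_error_rate`, the pair representation and the JSON keys are assumptions made for this example; they are not taken from ocrd-cis-wer, whose actual computation and stats layout may differ.

```python
import json


def word_error_rate(aligned_pairs):
    """Return the fraction of aligned (ocr, gt) token pairs that differ.

    Hypothetical helper: the input is a list of (ocr_token, gt_token)
    tuples that are assumed to be aligned already.
    """
    if not aligned_pairs:
        return 0.0
    errors = sum(1 for ocr, gt in aligned_pairs if ocr != gt)
    return errors / len(aligned_pairs)


# Made-up sample data: one of four tokens was misrecognised.
pairs = [("the", "the"), ("qu1ck", "quick"), ("fox", "fox"), ("jumps", "jumps")]

# The key names "words" and "wer" are illustrative only.
stats = {"words": len(pairs), "wer": word_error_rate(pairs)}
print(json.dumps(stats))
```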
5 changes: 5 additions & 0 deletions data/docs/ocrd-cis-align/authors.md
@@ -0,0 +1,5 @@
# Authors
1. Christoph Weber
2. Florian Fink
3. Robert Sachunsky
4. Tobias Englmeier
22 changes: 22 additions & 0 deletions data/docs/ocrd-cis-align/copyright.md
@@ -0,0 +1,22 @@
# License
MIT License

Copyright (c) 2018 Centrum für Informations- und Sprachverarbeitung (CIS)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
5 changes: 5 additions & 0 deletions data/docs/ocrd-cis-align/description.md
@@ -0,0 +1,5 @@
# Description of ocrd-cis-align {#description .concept}
Aligns tokens of multiple input file groups to one output file group.
This tool is used to align the master OCR with any additional support
OCRs. It accepts a comma-separated list of input file groups, which
it aligns in order.
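To make the idea of token alignment concrete, here is a minimal Python sketch that pairs the tokens of a master line with those of one support line. It uses the standard library's `difflib.SequenceMatcher` purely for illustration; this is an assumption of the example and not necessarily the algorithm that ocrd-cis-align implements. The function name `align_tokens` and the sample lines are hypothetical.

```python
from difflib import SequenceMatcher


def align_tokens(master, support):
    """Pair each master token with the support token(s) matched to it.

    Illustrative only: relies on difflib, which is not necessarily what
    ocrd-cis-align does internally.
    """
    pairs = []
    matcher = SequenceMatcher(a=master, b=support, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical stretches are paired token by token.
            pairs.extend(zip(master[i1:i2], support[j1:j2]))
        else:
            # Replaced, deleted or inserted stretch: keep both sides as a unit.
            pairs.append((" ".join(master[i1:i2]), " ".join(support[j1:j2])))
    return pairs


master = "the quick brown fox".split()
support = "the qu ick brown fox".split()  # the support OCR split one word
for m, s in align_tokens(master, support):
    print(f"{m!r} <-> {s!r}")
```

Aligning more than one support OCR could repeat the same step against the master for each additional file group, in the order given.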
42 changes: 42 additions & 0 deletions data/docs/ocrd-cis-align/glossary.xml
@@ -0,0 +1,42 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE glossgroup
PUBLIC "-//OASIS//DTD DITA Glossary Group//EN" "glossgroup.dtd">
<glossgroup id="Glossar">
<title>Glossary</title>
<!--
<glossentry id="txtline">
<glossterm>Textline</glossterm>
<glossdef>A TextLine is a block of text without line break.
</glossdef>
</glossentry>
<glossentry id="gt">
<glossterm>Ground Truth</glossterm>
<glossdef>Ground truth (GT) in the context of OCR-D are
transcriptions, specific structure descriptions and word lists.
These are essentially available in PAGE XML format in
combination with the original image. Essential parts of
the GT were created manually.
</glossdef>
-->
</glossgroup>
<!--
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE glossgroup
PUBLIC "-//OASIS//DTD DITA Glossary Group//EN" "glossgroup.dtd">
<glossgroup id="Glossar">
<title>Glossary</title>
<glossentry id="txtline">
<glossterm>Textline</glossterm>
<glossdef>A TextLine is a block of text without line break.
</glossdef>
</glossentry>
<glossentry id="gt">
<glossterm>Ground Truth</glossterm>
<glossdef>Ground truth (GT) in the context of OCR-D are
transcriptions, specific structure descriptions and word lists.
These are essentially available in PAGE XML format in
combination with the original image. Essential parts of
the GT were created manually.
</glossdef>
</glossgroup>
-->
1 change: 1 addition & 0 deletions data/docs/ocrd-cis-align/inputFormatDescription.md
@@ -0,0 +1 @@
# Input format {#inputFormatDescription .reference}
4 changes: 4 additions & 0 deletions data/docs/ocrd-cis-align/installation.md
@@ -0,0 +1,4 @@
# Installation of ocrd-cis-align {#installation .task}
1. Initialize virtualenv: `python3 -m venv path/to/dir` (optional)
2. Install ocrd_cis: `make install`
3. Test the installation: `make test` (optional)
1 change: 1 addition & 0 deletions data/docs/ocrd-cis-align/name.md
@@ -0,0 +1 @@
# ocrd-cis-align
8 changes: 8 additions & 0 deletions data/docs/ocrd-cis-align/option.md
@@ -0,0 +1,8 @@
# Options for ocrd-cis-align {#option .reference}
This tool accepts the following arguments:
* `--parameter` path to configuration file
* `--input-file-grp` comma-separated list of the input file groups;
first input file group is the master OCR
* `--output-file-grp` name of the file group for the aligned result
* `--log-level` set log level
* `--mets` path to METS file in workspace
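The options above can be combined into a single invocation. The following Python sketch assembles such an argument list and puts the master OCR file group first in the comma-separated input list. The helper `build_align_command`, the file group names, the paths and the `INFO` log level are placeholders chosen for this example, not values prescribed by the tool.

```python
import shlex


def build_align_command(mets, master_grp, support_grps, output_grp,
                        parameter, log_level="INFO"):
    """Assemble an ocrd-cis-align argument list (hypothetical helper)."""
    # The master OCR file group must come first in the comma-separated list.
    input_grps = ",".join([master_grp, *support_grps])
    return [
        "ocrd-cis-align",
        "--mets", mets,
        "--input-file-grp", input_grps,
        "--output-file-grp", output_grp,
        "--parameter", parameter,
        "--log-level", log_level,
    ]


cmd = build_align_command(
    mets="workspace/mets.xml",        # placeholder path
    master_grp="OCR-D-MASTER",        # placeholder file group names
    support_grps=["OCR-D-SUPPORT-1", "OCR-D-SUPPORT-2"],
    output_grp="OCR-D-ALIGN",
    parameter="config.json",          # placeholder configuration file
)
# Print the command as a shell-quoted string instead of running it.
print(shlex.join(cmd))
```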
1 change: 1 addition & 0 deletions data/docs/ocrd-cis-align/outputFormatDescription.md
@@ -0,0 +1 @@
# Output format {#outputFormatDescription .reference}
5 changes: 5 additions & 0 deletions data/docs/ocrd-cis-align/parameters.md
@@ -0,0 +1,5 @@
# Parameters {#parameters .reference}
The tool ocrd-cis-align accepts the following configuration parameters:
```json
{}
```
1 change: 1 addition & 0 deletions data/docs/ocrd-cis-align/release_notes.md
@@ -0,0 +1 @@
# Release notes
2 changes: 2 additions & 0 deletions data/docs/ocrd-cis-align/reporting.md
@@ -0,0 +1,2 @@
# Reporting
Report any bugs or problems on the [issues page](https://github.com/cisocrgroup/ocrd_cis/issues)
2 changes: 2 additions & 0 deletions data/docs/ocrd-cis-align/tool.md
@@ -0,0 +1,2 @@
# Tool ocrd-cis-align {#Tool .concept}
Align multiple OCRs and/or GTs
18 changes: 18 additions & 0 deletions data/docs/ocrd-cis-align/topicmap.xml
@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
<topicref href="name.md" format="markdown"/>
<topicref href="release_notes.md" format="markdown"/>
<topicref href="installation.md" format="markdown"/>
<topicref href="tool.md" format="markdown"/>
<topicref href="description.md" format="markdown"/>
<topicref href="option.md" format="markdown"/>
<topicref href="inputFormatDescription.md" format="markdown"/>
<topicref href="parameters.md" format="markdown"/>
<topicref href="outputFormatDescription.md" format="markdown"/>
<topicref href="troubleshooting.xml"/>
<topicref href="glossary.xml"/>
<topicref href="authors.md" format="markdown"/>
<topicref href="reporting.md" format="markdown"/>
<topicref href="copyright.md" format="markdown"/>
</map>
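A quick way to check which topics a map like this pulls in, and in which order, is to parse it with Python's standard library. The sketch below works on a shortened copy of the map embedded as a byte string; the helper name `list_topics` is hypothetical, and reporting a missing `format` attribute as `dita` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the topic map, embedded so the sketch is self-contained.
TOPICMAP = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <topicref href="name.md" format="markdown"/>
  <topicref href="troubleshooting.xml"/>
  <topicref href="copyright.md" format="markdown"/>
</map>
"""


def list_topics(xml_bytes):
    """Return (href, format) pairs in document order.

    Topics without a format attribute are reported as "dita" here; that
    default is an assumption of this sketch.
    """
    root = ET.fromstring(xml_bytes)
    return [(ref.get("href"), ref.get("format", "dita"))
            for ref in root.iter("topicref")]


for href, fmt in list_topics(TOPICMAP):
    print(f"{fmt:9} {href}")
```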
29 changes: 29 additions & 0 deletions data/docs/ocrd-cis-align/troubleshooting.xml
@@ -0,0 +1,29 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE troubleshooting
PUBLIC "-//OASIS//DTD DITA 1.3 Troubleshooting//EN" "troubleshooting.dtd">
<troubleshooting id="Troubleshooting">
<title>Troubleshooting</title>
<!--
<troublebody>
<condition>
<title>Condition</title>
<p></p>
</condition>
<troubleSolution>
<cause>
<title>Cause</title>
<p></p>
</cause>
<remedy>
<title>Remedy</title>
<responsibleParty></responsibleParty>
<steps>
<step>
<cmd></cmd>
</step>
</steps>
</remedy>
</troubleSolution>
</troublebody>
-->
</troubleshooting>
5 changes: 5 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/authors.md
@@ -0,0 +1,5 @@
# Authors
1. Christoph Weber
2. Florian Fink
3. Robert Sachunsky
4. Tobias Englmeier
22 changes: 22 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/copyright.md
@@ -0,0 +1,22 @@
# License
MIT License

Copyright (c) 2018 Centrum für Informations- und Sprachverarbeitung (CIS)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
5 changes: 5 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/description.md
@@ -0,0 +1,5 @@
# Description of ocrd-cis-post-correct.sh {#description .concept}
This bash script runs the post correction using a pre-trained
[model](http://cis.lmu.de/~finkf/model.zip). If additional support
OCRs should be used, models for these OCR steps are required and must
be configured in a corresponding configuration file (see ocrd-tool.json).
42 changes: 42 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/glossary.xml
@@ -0,0 +1,42 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE glossgroup
PUBLIC "-//OASIS//DTD DITA Glossary Group//EN" "glossgroup.dtd">
<glossgroup id="Glossar">
<title>Glossary</title>
<!--
<glossentry id="txtline">
<glossterm>Textline</glossterm>
<glossdef>A TextLine is a block of text without line break.
</glossdef>
</glossentry>
<glossentry id="gt">
<glossterm>Ground Truth</glossterm>
<glossdef>Ground truth (GT) in the context of OCR-D are
transcriptions, specific structure descriptions and word lists.
These are essentially available in PAGE XML format in
combination with the original image. Essential parts of
the GT were created manually.
</glossdef>
-->
</glossgroup>
<!--
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE glossgroup
PUBLIC "-//OASIS//DTD DITA Glossary Group//EN" "glossgroup.dtd">
<glossgroup id="Glossar">
<title>Glossary</title>
<glossentry id="txtline">
<glossterm>Textline</glossterm>
<glossdef>A TextLine is a block of text without line break.
</glossdef>
</glossentry>
<glossentry id="gt">
<glossterm>Ground Truth</glossterm>
<glossdef>Ground truth (GT) in the context of OCR-D are
transcriptions, specific structure descriptions and word lists.
These are essentially available in PAGE XML format in
combination with the original image. Essential parts of
the GT were created manually.
</glossdef>
</glossgroup>
-->
@@ -0,0 +1 @@
# Input format {#inputFormatDescription .reference}
4 changes: 4 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/installation.md
@@ -0,0 +1,4 @@
# Installation of ocrd-cis-post-correct.sh {#installation .task}
1. Initialize virtualenv: `python3 -m venv path/to/dir` (optional)
2. Install ocrd_cis: `make install`
3. Test the installation: `make test` (optional)
1 change: 1 addition & 0 deletions data/docs/ocrd-cis-post-correct.sh/name.md
@@ -0,0 +1 @@
# ocrd-cis-post-correct.sh
7 changes: 7 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/option.md
@@ -0,0 +1,7 @@
# Options for ocrd-cis-post-correct.sh {#option .reference}
This tool accepts the following arguments:
* `--parameter` path to configuration file
* `--input-file-grp` name of the master-OCR file group
* `--output-file-grp` name of the post-correction file group
* `--log-level` set log level
* `--mets` path to METS file in workspace
@@ -0,0 +1 @@
# Output format {#outputFormatDescription .reference}
5 changes: 5 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/parameters.md
@@ -0,0 +1,5 @@
# Parameters {#parameters .reference}
The tool ocrd-cis-post-correct.sh accepts the following configuration parameters:
```json
null
```
1 change: 1 addition & 0 deletions data/docs/ocrd-cis-post-correct.sh/release_notes.md
@@ -0,0 +1 @@
# Release notes
2 changes: 2 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/reporting.md
@@ -0,0 +1,2 @@
# Reporting
Report any bugs or problems on the [issues page](https://github.com/cisocrgroup/ocrd_cis/issues)
2 changes: 2 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/tool.md
@@ -0,0 +1,2 @@
# Tool ocrd-cis-post-correct.sh {#Tool .concept}
null
18 changes: 18 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/topicmap.xml
@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
<topicref href="name.md" format="markdown"/>
<topicref href="release_notes.md" format="markdown"/>
<topicref href="installation.md" format="markdown"/>
<topicref href="tool.md" format="markdown"/>
<topicref href="description.md" format="markdown"/>
<topicref href="option.md" format="markdown"/>
<topicref href="inputFormatDescription.md" format="markdown"/>
<topicref href="parameters.md" format="markdown"/>
<topicref href="outputFormatDescription.md" format="markdown"/>
<topicref href="troubleshooting.xml"/>
<topicref href="glossary.xml"/>
<topicref href="authors.md" format="markdown"/>
<topicref href="reporting.md" format="markdown"/>
<topicref href="copyright.md" format="markdown"/>
</map>
29 changes: 29 additions & 0 deletions data/docs/ocrd-cis-post-correct.sh/troubleshooting.xml
@@ -0,0 +1,29 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE troubleshooting
PUBLIC "-//OASIS//DTD DITA 1.3 Troubleshooting//EN" "troubleshooting.dtd">
<troubleshooting id="Troubleshooting">
<title>Troubleshooting</title>
<!--
<troublebody>
<condition>
<title>Condition</title>
<p></p>
</condition>
<troubleSolution>
<cause>
<title>Cause</title>
<p></p>
</cause>
<remedy>
<title>Remedy</title>
<responsibleParty></responsibleParty>
<steps>
<step>
<cmd></cmd>
</step>
</steps>
</remedy>
</troubleSolution>
</troublebody>
-->
</troubleshooting>
5 changes: 5 additions & 0 deletions data/docs/ocrd-cis-train.sh/authors.md
@@ -0,0 +1,5 @@
# Authors
1. Christoph Weber
2. Florian Fink
3. Robert Sachunsky
4. Tobias Englmeier
22 changes: 22 additions & 0 deletions data/docs/ocrd-cis-train.sh/copyright.md
@@ -0,0 +1,22 @@
# License
MIT License

Copyright (c) 2018 Centrum für Informations- und Sprachverarbeitung (CIS)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.