HangulDB-Image

A Korean handwriting image dataset parsed from HangulDB.

Samples

Image width and height vary from sample to sample. I intentionally preserved the original dimensions to stay consistent with the source data.
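Because the images keep their original sizes, most training pipelines need to bring them to a common shape first. The sketch below is one possible approach, not something this repo prescribes: it computes the geometry for fitting an image into a square canvas while preserving the aspect ratio. The function name `letterbox_geometry` and the `target` parameter are hypothetical, and the actual resizing and padding are left to whichever image library you use.

```python
from typing import Tuple


def letterbox_geometry(width: int, height: int, target: int) -> Tuple[int, int, int, int]:
    """Return (new_width, new_height, pad_left, pad_top) for fitting a
    width x height image into a target x target square without distortion.

    Hypothetical helper: the repo itself does not resize anything.
    """
    # Scale so that the longer side exactly fills the target size.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Center the resized image on the square canvas.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

For example, a sample that is twice as wide as it is tall fills the full width of the canvas and is padded equally above and below.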

This repo contains PE92, SERI95, and HanDB.

PE92 contains 2350 classes, each with 100 samples.
SERI95 contains 520 classes, each with 1000 samples.
HanDB merges SERI95 and PE92. That is, the 520 classes that appear in SERI95 have 1100 samples each, and the others (1820 classes) have 100 samples each.
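To check these counts on a local copy, a minimal sketch along the following lines may help. It assumes the `<dataset_name>/<label>/<sample_index>.jpg` layout described under Architecture; the function names are hypothetical and not part of this repo.

```python
from collections import Counter
from pathlib import Path


def samples_per_class(dataset_dir: str) -> Counter:
    """Count the .jpg files in each label directory of one dataset."""
    counts = Counter()
    for label_dir in Path(dataset_dir).iterdir():
        if label_dir.is_dir():
            counts[label_dir.name] = sum(1 for _ in label_dir.glob("*.jpg"))
    return counts


def count_histogram(counts: Counter) -> Counter:
    """Map 'samples per class' to 'number of classes with that many samples'."""
    return Counter(counts.values())
```

Calling `count_histogram(samples_per_class(path))` on the HanDB directory should give one entry per distinct class size, which makes it easy to compare against the figures listed above.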

Architecture

All three datasets share the same directory structure:

<dataset_name>/<label>/<sample_index>.jpg
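Given this layout, the samples can be enumerated with the standard library alone. The following is a minimal sketch; `iter_samples` is a hypothetical helper, and the directory you pass in (for example a local `PE92` folder) is assumed to follow the structure above.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_samples(dataset_dir: str) -> Iterator[Tuple[str, Path]]:
    """Yield (label, image_path) pairs from <dataset_dir>/<label>/<sample_index>.jpg."""
    root = Path(dataset_dir)
    # Each subdirectory name is treated as the class label.
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(label_dir.glob("*.jpg")):
            yield label_dir.name, image_path
```

Each yielded path can then be opened with any image library. Note that the inner sort is lexicographic, so `10.jpg` comes before `2.jpg`; sort by `int(path.stem)` instead if numeric order matters.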

Warning

In PE92, some of the last few samples in each class are mislabeled.
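One cautious way to work around this is to skip the trailing samples of each class. The sketch below assumes that file names are plain integer indices (as in `<sample_index>.jpg`) and that dropping a fixed number of trailing files is acceptable. Both the helper name `trusted_samples` and the default `drop_last=5` are hypothetical; the repo does not state how many samples per class are affected, so inspect the data before choosing a value.

```python
from pathlib import Path
from typing import List


def trusted_samples(label_dir: str, drop_last: int = 5) -> List[Path]:
    """Return a class's image paths in index order, minus the last `drop_last`.

    Assumes every file stem is an integer sample index; `drop_last` is a
    placeholder value, not a number documented by the dataset.
    """
    paths = sorted(Path(label_dir).glob("*.jpg"), key=lambda p: int(p.stem))
    if drop_last <= 0:
        return paths
    return paths[:-drop_last]
```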

parser.ipynb parses an hgu1 file into individual jpg files. You can use it to check that the original dataset is parsed correctly.