Importing a preexisting database
Work in progress
This topic is not strictly related to EFES, but is connected more broadly to the creation of EpiDoc files from preexisting document collections. The following instructions are not intended as an ideal universal workflow; they derive from specific sample cases.
Assuming that you have an Excel file in which each row contains some data related to a specific document/inscription:
1. Fill in the empty cells, e.g. with '-'.
2. Export the Excel file as XML Spreadsheet, saving it on the Desktop e.g. as 'all.xml' (this will generate one XML file with all the documents).
3. Restore any missed apostrophes and accents in the XML file (e.g. replace all &apos; with ').
4. Delete all the occurrences of 'ss:' from the XML file.
5. In the XML file, replace all the occurrences of <Row .+?> with <Row> (using regular expressions; this can be done with Oxygen XML Editor 'Find/Replace', selecting the 'Regular expression' option).
6. Delete everything before the <Row> of the first useful row and everything after the </Row> of the last useful row.
7. Type the following command in the Terminal:
    cd ~/Desktop && awk '{if ($0 ~ /<Row>/) a++} { print >> ("doc"a".xml") } {close("doc"a".xml")}' all.xml
   (this will generate one XML file for each document: doc1.xml, doc2.xml, and so on. The command appends to existing files, so delete any earlier doc*.xml files before running it again.)
8. Create an XSLT file to transform the generated raw XML files into XML files based on the EpiDoc template; you can name it e.g. 'xml-to-epidoc.xsl' and save it on the Desktop (see an example here).
9. In Oxygen XML Editor, create a new Project (from the Project menu or tab) and add all the generated XML files to the Project.
10. Select all the XML files from the Project side tab, right-click on them and select 'Transform' > 'Configure transformation scenario' > 'New' > 'XML transformation with XSLT', with these values:
    XML URL: ${currentFileURL}
    XSL URL: ${cfd}/xml-to-epidoc.xsl
    Save as: ${cfd}/epidoc/${cfne}
   (these values should be changed if your XML and XSLT files are located elsewhere). Then select 'Apply associated' (this will generate an EpiDoc XML file for each document).
11. Add the link to the EpiDoc schema to all files with Oxygen XML Editor 'Find/Replace in Files', selecting as Scope the 'epidoc' folder with the new EpiDoc files, selecting the 'Regular expression' option and replacing <TEI with:
    <?xml-model href="http://epidoc.stoa.org/schema/latest/tei-epidoc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>\n<TEI
12. Move all the EpiDoc files inside the EFES 'epidoc' folder.
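
The clean-up and splitting steps above (removing 'ss:', normalising the <Row> tags, and writing one file per row) could also be scripted instead of being done by hand in Oxygen. The following is only a minimal sketch, not a tested replacement for the workflow: the scratch directory, the sample rows and the file name 'clean.xml' are made up for illustration, and it assumes that every <Row ...> start tag sits on its own line, as the awk command above also does.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/all.xml" <<'EOF'
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Document one</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Document two</Data></Cell>
</Row>
EOF

# Remove every 'ss:' prefix, then strip the attributes from <Row ...> tags.
sed -E -e 's/ss://g' -e 's/<Row [^>]*>/<Row>/g' \
    "$workdir/all.xml" > "$workdir/clean.xml"

# Start a new docN.xml at every <Row> line; lines before the first <Row>
# are skipped because no output file has been chosen yet.
awk -v dir="$workdir" '
  /<Row>/   { if (out != "") close(out); n++; out = dir "/doc" n ".xml" }
  out != "" { print > out }
' "$workdir/clean.xml"
```

Unlike the one-line command above, this variant overwrites the docN.xml files rather than appending to them, so it can be rerun safely. Anything that follows the last </Row> still ends up in the last file, so the trailing part of the export has to be trimmed beforehand, as described in the list.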
If the data come instead from a database that can export its records as XML:
1. Export the database as a single XML file, saving it on the Desktop e.g. as 'all.xml'.
2. Restore any missed apostrophes and accents in the XML file (e.g. replace all &apos; with ').
3. In the XML file, replace all the occurrences of <ROW .+?> with <ROW> (using regular expressions; this can be done with Oxygen XML Editor 'Find/Replace', selecting the 'Regular expression' option).
4. Delete everything before the <ROW> of the first useful row and everything after the </ROW> of the last useful row.
5. Type the following command in the Terminal:
    cd ~/Desktop && awk '{if ($0 ~ /<ROW>/) a++} { print >> ("doc"a".xml") } {close("doc"a".xml")}' all.xml
   (this will generate one XML file for each document).
6. Follow steps 8-12 of the previous section.
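
The schema-link step of the previous section, which this procedure reuses, can also be done from the Terminal rather than with 'Find/Replace in Files'. The sketch below is one possible way to do it, under stated assumptions: the function name add_schema_link, the scratch directory and the sample file are hypothetical, and only the first <TEI in each file is touched.

```shell
# Hypothetical 'epidoc' folder with one sample file, for illustration only.
workdir=$(mktemp -d)
mkdir "$workdir/epidoc"
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' \
  '<TEI xmlns="http://www.tei-c.org/ns/1.0">' '</TEI>' \
  > "$workdir/epidoc/doc1.xml"

# Processing instruction quoted in the instructions above.
pi='<?xml-model href="http://epidoc.stoa.org/schema/latest/tei-epidoc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>'

add_schema_link() {
  # $1: folder that holds the generated EpiDoc files
  for f in "$1"/*.xml; do
    # Skip files that already carry an xml-model line, so reruns are harmless.
    grep -qF '<?xml-model' "$f" && continue
    # Put the processing instruction on its own line before the first <TEI.
    awk -v pi="$pi" '!seen && sub(/<TEI/, pi "\n<TEI") { seen = 1 } { print }' \
      "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

add_schema_link "$workdir/epidoc"
add_schema_link "$workdir/epidoc"   # second run changes nothing
```

Each file is rewritten through a temporary copy, so it is safest to try this on a backup of the 'epidoc' folder first.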