FileBasedMiniDMS.php by Stefan Weiss (2017-2023)
https://github.com/stweiss/FileBasedMiniDMS
- Place this file on your FileServer/NAS
- For OCR (Step 1): Install Docker and pull an ocrmypdf image, e.g.
docker pull jbarlow83/ocrmypdf
- For Automatic rename (Step 1.1): make sure that pdftotext is available.
- Copy config.php.template to config.php.
- Adjust settings for this script in config.php to fit your needs.
- Create a cron job on your FileServer/NAS to execute this script regularly. (In DSM you can do this in Control Panel -> Task Scheduler.) Root privileges might be required.
ex. php /volume1/home/stefan/Scans/FileBasedMiniDMS.php
or redirect stdout to see PHP Warnings/Errors:
php /volume1/home/stefan/Scans/FileBasedMiniDMS.php >> /volume1/home/stefan/Scans/my.log 2>&1
This script works in three steps. Each step can be turned on/off in config.php:
OCR (Step 1): PDF files in the $inboxfolder whose filenames match $matchWithoutOCR are OCR'ed.
Automatic rename (Step 1.1): The PDF is renamed to the following structure: "<date> <name> <tags>.pdf"
<date>: The script tries to find a date in the pdf. If none is found the current date is used.
<name>: You can define $renamerules. The first rule which matches the ocr'ed content of the first page is used. You can use the operators & (AND) and , (OR) and you can use the wildcard operators ? and *.
<tags>: In $tagrules you can specify your tags. All matching rules will add their tag to the filename. You can use the same operators here.
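The rename step described above can be sketched roughly as follows. This is an illustration, not the script's actual code: the function names, the rule array layout (result => rule), the case-insensitive substring matching, the "Scan" fallback name, and the precedence (a comma separates alternatives, & joins terms that must all match) are all assumptions. Only the yyyy-mm-dd date form is handled here.

```php
<?php
// Sketch only: rule layout, precedence and matching mode are assumptions.

/** Turn one term with ? and * wildcards into a case-insensitive regex. */
function termToRegex(string $term): string
{
    $quoted = preg_quote(trim($term), '/');
    // preg_quote escapes ? and * as \? and \*; map them back to wildcards.
    $quoted = str_replace(['\\?', '\\*'], ['.', '.*'], $quoted);
    return '/' . $quoted . '/is';
}

/** Assumed precedence: "," separates alternatives, "&" joins required terms. */
function ruleMatches(string $rule, string $text): bool
{
    foreach (explode(',', $rule) as $alternative) {
        $all = true;
        foreach (explode('&', $alternative) as $term) {
            if (trim($term) === '' || !preg_match(termToRegex($term), $text)) {
                $all = false;
                break;
            }
        }
        if ($all) {
            return true;
        }
    }
    return false;
}

/** First valid, non-future yyyy-mm-dd date in the text, else $today. */
function findDate(string $text, string $today): string
{
    if (preg_match_all('/\b(\d{4})-(\d{2})-(\d{2})\b/', $text, $hits, PREG_SET_ORDER)) {
        foreach ($hits as $hit) {
            $valid = checkdate((int) $hit[2], (int) $hit[3], (int) $hit[1]);
            if ($valid && $hit[0] <= $today) { // ISO dates compare correctly as strings
                return $hit[0];
            }
        }
    }
    return $today;
}

/** Build "<date> <name> <tags>.pdf" from the OCR'ed text of the first page. */
function buildFilename(string $text, array $renamerules, array $tagrules, string $today): string
{
    $name = 'Scan'; // hypothetical fallback when no rename rule matches
    foreach ($renamerules as $candidate => $rule) {
        if (ruleMatches($rule, $text)) {
            $name = $candidate; // the first matching rule wins
            break;
        }
    }
    $tags = [];
    foreach ($tagrules as $tag => $rule) {
        if (ruleMatches($rule, $text)) {
            $tags[] = '#' . $tag; // every matching rule adds its tag
        }
    }
    return trim(findDate($text, $today) . ' ' . $name . ' ' . implode(' ', $tags)) . '.pdf';
}

// Usage with made-up rules and text:
$text        = 'Invoice 2019-03-14 from ACME Energy, customer no. 12345';
$renamerules = ['ACME invoice' => 'acme&invoice', 'Letter' => 'dear*sir'];
$tagrules    = ['bills' => 'invoice,bill', 'energy' => 'energy'];
echo buildFilename($text, $renamerules, $tagrules, '2023-02-03'), "\n";
// prints: 2019-03-14 ACME invoice #bills #energy.pdf
```

Passing the current date in as a parameter keeps the sketch deterministic; a real run would use date('Y-m-d').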
Hashtag linking: The script creates a subfolder for each hashtag it finds in your filenames and places a hardlink to the file in that folder. Documents are expected to be stored flat in one folder, with names of the form "<any name> #hashtag1 #hashtag2.extension".
e.g. "Documents/Scans/2015-12-25 Bill of Santa Clause #bills #2015.pdf" will be linked into:
- "Documents/Scans/tags/2015/2015-12-25 Bill of Santa Clause #bills.pdf"
- "Documents/Scans/tags/bills/2015-12-25 Bill of Santa Clause #2015.pdf"
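As the example shows, the link inside each tag folder drops that folder's own tag and keeps the others. A minimal sketch of that mapping follows; the function names are hypothetical, the "tags" folder name is taken from the example, and the real script may handle edge cases differently.

```php
<?php
// Sketch only: function names are hypothetical, not the script's own.

/** Map each hashtag in $filename to its link path relative to the scan folder. */
function tagLinkTargets(string $filename, string $tagsFolder = 'tags'): array
{
    $ext  = pathinfo($filename, PATHINFO_EXTENSION);
    $base = pathinfo($filename, PATHINFO_FILENAME);

    preg_match_all('/#([^\s#]+)/', $base, $found);               // collect all hashtags
    $name = trim(preg_replace('/\s*#[^\s#]+/', '', $base));      // name without hashtags

    $targets = [];
    foreach ($found[1] as $tag) {
        $linkName = $name;
        foreach (array_diff($found[1], [$tag]) as $other) {
            $linkName .= ' #' . $other;                          // keep the remaining tags
        }
        $suffix = $ext !== '' ? '.' . $ext : '';
        $targets[$tag] = $tagsFolder . '/' . $tag . '/' . $linkName . $suffix;
    }
    return $targets;
}

/** Create the tag folders and hardlinks for one file in $scanFolder. */
function linkIntoTagFolders(string $scanFolder, string $filename): void
{
    foreach (tagLinkTargets($filename) as $target) {
        $dest = $scanFolder . '/' . $target;
        if (!is_dir(dirname($dest))) {
            mkdir(dirname($dest), 0777, true);
        }
        if (!file_exists($dest)) {
            link($scanFolder . '/' . $filename, $dest);          // hardlink, not a copy
        }
    }
}

// Usage: print the targets for the example filename.
foreach (tagLinkTargets('2015-12-25 Bill of Santa Clause #bills #2015.pdf') as $target) {
    echo $target, "\n";
}
// prints:
// tags/bills/2015-12-25 Bill of Santa Clause #2015.pdf
// tags/2015/2015-12-25 Bill of Santa Clause #bills.pdf
```

Hardlinks only work within one filesystem, so in this sketch the tag folders live below the scan folder, as in the example.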
Q: How do I assign another tag to my file?
A: Simply rename the file in the $scanfolder and add the tag at the end (but
before the extension).
Q: How can I fix a typo in a document's filename?
A: Simply rename the file in the $scanfolder. The tags are created from scratch
at the next scheduled interval, and the old links and tags are removed
automatically.
Make sure to have a backup before you start using this script. You use this software at your own risk.
Version 0.18 (03.02.2023)
- new feature: in case you manually rename the file to have a proper date, map that back to the file modification date. Adds $doFixTimestampBasedOnName to the config!
- fixed finding correct date in form of yyyy-mm-dd in a PDF
Version 0.17 (10.02.2021)
- renamed default config.php to config.php.template (in git), so you can pull without overwriting your local config.php (issue #13)
- introduced option to set the detected date as file date (variable $setfiletime, enabled by default) (issue #5)
- skip future dates during date detection
- minor bugfixes
Version 0.16 (29.10.2019)
- Add date format 01. Januar, 2019 / 01 Januar 2019 (thanks @SirUli) (pull #12)
- internal clean up
Version 0.15 (27.09.2019)
- now compatible with ocrmypdf v9.0.0
Version 0.14 (12.04.2019)
- don't OCR files that have already been OCR'ed. This should only have happened in rare special cases. (issue #9)
- change to long php opening tags for better php compatibility (issue #6)
Version 0.13 (22.10.2018)
- improved detection of dates (thanks vanto) (pull #7)
Version 0.12b (12.06.2017)
- New: $dateseperator can be modified in config.php
- Change: Default date for rename is now creation date of the pdf. (was "now" before)
Version 0.11 (08.06.2017)
- New: automatic OCR and automatic rename
Version 0.02 (02.03.2016)
- release of this file based document management system.
- sorts files with hashtags into hashtag-folders.