- Copy `subset_1_filtered_updated_final_output.csv` into the local folder.
  - This is the raw output from the complete PDF extraction pipeline.
- Process the raw OCR output using `process_raw_output.ipynb`.
  - Demonstrates regex application via the `extract_meaningful_text` function.
  - Pipeline:
    - Filter pipeline errors from dirty web-scraped PDFs.
    - Apply regex to produce continuous training text.
  - Modify the `extract_meaningful_text` function as needed for different outputs.
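
The two pipeline stages can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the error markers, the specific regex rules, and the names `is_pipeline_error`, `extract_meaningful_text_sketch`, and `clean_rows` are all assumptions made for the example. The real `extract_meaningful_text` in `process_raw_output.ipynb` may apply different patterns.

```python
import re

# Hypothetical error markers; the real pipeline's error strings are not
# documented in this README.
ERROR_MARKERS = ("ERROR", "Traceback")


def is_pipeline_error(raw: str) -> bool:
    """Return True when a row is empty or starts with an assumed error marker."""
    stripped = raw.strip()
    return not stripped or any(stripped.startswith(m) for m in ERROR_MARKERS)


def extract_meaningful_text_sketch(raw: str) -> str:
    """Illustrative stand-in for the notebook's extract_meaningful_text."""
    # Rejoin words hyphenated across a line break.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Drop lines that contain only digits (likely page numbers).
    text = re.sub(r"(?m)^\s*\d+\s*$", "", text)
    # Collapse all remaining whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def clean_rows(rows):
    """Filter out error rows, then turn each remaining row into continuous text."""
    return [extract_meaningful_text_sketch(r) for r in rows if not is_pipeline_error(r)]
```

To produce a different output format, only the substitutions inside the extraction function would need to change; the filtering step stays the same.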
aisingapore/goto_indo_journal_pipeline