
A PDF table extractor that presents the extracted data in CSV format. Output in JSON and XML form is currently in the testing stage.

Dependencies:--

1) pdftohtml: this tool must be installed. It is used to convert the given PDF to XML with the command:
pdftohtml filename.pdf -xml output--filename.xml
2) The lxml parser is required
3) Beautiful Soup 4 is required
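To make the role of the intermediate XML concrete, here is a minimal sketch of how positioned text from pdftohtml's XML output could be grouped into CSV rows. It is not the algorithm used in code.py: it assumes each table row shares an exact `top` value, it uses the standard library's xml.etree.ElementTree as a stand-in for lxml and Beautiful Soup so that it runs without extra installs, and the names `SAMPLE_XML`, `xml_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample in the shape pdftohtml -xml emits:
# <text> elements carrying top/left pixel positions inside <page>.
SAMPLE_XML = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="100" left="50" width="40" height="12" font="0">Name</text>
<text top="100" left="200" width="30" height="12" font="0">Age</text>
<text top="120" left="200" width="20" height="12" font="0">30</text>
<text top="120" left="50" width="40" height="12" font="0"><b>Alice</b></text>
</page>
</pdf2xml>"""


def xml_to_rows(xml_text):
    """Group <text> elements by their 'top' value, ordered left to right."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        by_top = defaultdict(list)
        for el in page.iter("text"):
            # itertext() also collects text nested in inline tags such as <b>.
            content = "".join(el.itertext()).strip()
            by_top[int(el.get("top"))].append((int(el.get("left")), content))
        for top in sorted(by_top):
            rows.append([text for _, text in sorted(by_top[top])])
    return rows


def rows_to_csv(rows):
    """Serialize the rows to a CSV string."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Real PDFs often place cells of one row at slightly different `top` values, so a practical version would need a tolerance when grouping; this sketch leaves that out.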

Steps:--

1) Convert the given PDF to XML with the pdftohtml tool, using the command above.

2) Change directory to where code.py is placed, then run:
python code.py -f filename.xml > /path/to/destination_filename.csv

That is, the output CSV is redirected to the destination file using the ">" operator, and /path/to/ is the path where the final CSV output should be written.
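The two steps can also be chained from a small script. This is a sketch only: `build_commands` and `run_pipeline` are hypothetical helper names, it assumes that pdftohtml and python are on the PATH and that the script runs from the directory containing code.py, and the XML file name is passed in explicitly rather than taken from the README.

```python
import subprocess


def build_commands(pdf_path, xml_path):
    """Return the argument lists for the conversion and extraction steps."""
    convert_cmd = ["pdftohtml", pdf_path, "-xml", xml_path]
    extract_cmd = ["python", "code.py", "-f", xml_path]
    return convert_cmd, extract_cmd


def run_pipeline(pdf_path, xml_path, csv_path):
    """Convert the PDF to XML, then write code.py's output to csv_path."""
    convert_cmd, extract_cmd = build_commands(pdf_path, xml_path)
    # check=True raises CalledProcessError if either command fails.
    subprocess.run(convert_cmd, check=True)
    # Passing the open file as stdout plays the role of the shell's ">".
    with open(csv_path, "w") as out:
        subprocess.run(extract_cmd, stdout=out, check=True)
```

A call such as `run_pipeline("filename.pdf", "filename.xml", "/path/to/destination_filename.csv")` would then stand in for the two manual commands.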

[ I have tested and used the above on Linux (Ubuntu 14.04). ]

nikhilponnuru/DistillTable

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages