Parse Wikipedia XML data dumps to create an interactive graph to explore relationships between Wikipedia pages.
This project has been split into separate, smaller projects in the subfolders explorer/ and dbLoader/. The READMEs for these projects are linked below.
- Parse the XML file, extracting page names and links
- Analyse performance using gprof
- Parallelise the parser
- Import data into Neo4j using the neo4j-admin import tool
- Set up a third-party visual Neo4j graph database explorer
- Develop a custom graph storage library
- Develop a custom viewer to visualise the data
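The first step, extracting page names and links from the XML dump, can be illustrated with a minimal Python sketch. This is not the project's own parser. The language, the function names `parse_dump`, `local` and `LINK_RE`, and the simplified link regex are all hypothetical choices for illustration. The sketch streams the file with `xml.etree.ElementTree.iterparse` so that a multi-gigabyte dump is never loaded into memory all at once. It also assumes links use the usual `[[Target]]` / `[[Target|label]]` wikitext syntax and does not filter out namespaced targets such as `File:` or `Category:`.

```python
import re
import xml.etree.ElementTree as ET
from io import BytesIO

# Simplified (hypothetical) wikilink pattern: matches [[Target]], [[Target|label]]
# and [[Target#Section|label]], capturing only "Target". Pure in-page anchors
# such as [[#Section]] are skipped because the target part must be non-empty.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:[#|][^\[\]]*)?\]\]")


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def parse_dump(source):
    """Yield (title, [link targets]) for each <page> in a MediaWiki XML dump.

    The file is streamed with iterparse; each <page> element is cleared once
    it has been processed so its text does not accumulate in memory.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = local(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            links = [m.strip() for m in LINK_RE.findall(text)]
            yield title, links
            title, text = None, ""
            elem.clear()  # free the page's children (revision, text, ...)


# Tiny hand-written dump in the MediaWiki export format, for demonstration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Graph theory</title>
    <revision><text>See [[Vertex (graph theory)|vertices]] and [[Edge]].</text></revision>
  </page>
  <page>
    <title>Edge</title>
    <revision><text>Part of [[Graph theory#Definitions|a graph]].</text></revision>
  </page>
</mediawiki>"""

for page_title, page_links in parse_dump(BytesIO(SAMPLE)):
    print(page_title, "->", page_links)
```

Running this prints `Graph theory -> ['Vertex (graph theory)', 'Edge']` and then `Edge -> ['Graph theory']`. The resulting (title, links) pairs map directly onto graph nodes and edges. A production parser would also need to handle redirects, namespaces and title normalisation.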
To keep track of what I am doing on this project, I have tried to keep a progress log. Some sections are missing or very sparse.
Progress Log