Parse Wikipedia XML data dumps to create an interactive graph

ejagombar/WikiMapper

WikiMapper

Parse Wikipedia XML data dumps to create an interactive graph to explore relationships between Wikipedia pages.

This project has been split into separate, smaller projects in the subfolders explorer/ and dbLoader/. The READMEs for these projects are linked below.

Overall Project Aims

  • Parse the XML file, extracting page names and links
  • Analyse performance using GProf
  • Parallelise the parser
  • Import data into Neo4j using Neo4j-Admin-Import
  • Set up a third-party visual Neo4j graph database explorer
  • Develop a custom graph storage library
  • Develop a custom viewer to visualise the data
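
As an illustration of the first aim (extracting page names and links), here is a minimal Python sketch. It is not the parser used in this repository. It assumes the standard MediaWiki export layout (`<page>` elements containing `<title>` and `<text>`) and treats anything inside `[[...]]` as a link. The names `iter_pages`, `local_name` and `WIKILINK` are hypothetical, and namespaced targets such as `File:` or `Category:` are not filtered out.

```python
import io
import re
import xml.etree.ElementTree as ET

# Matches [[Target]] and [[Target|label]]; captures only the target,
# stopping at '|' (label), '#' (section anchor) or a bracket.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, [link targets]) for each <page> element in the dump.

    Assumption: `source` is a filename or binary file object holding
    MediaWiki export XML. The dump is streamed, not loaded whole.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        links = [target.strip() for target in WIKILINK.findall(text)]
        yield title, links
        elem.clear()  # drop the page's contents once processed


if __name__ == "__main__":
    # Tiny hand-written sample standing in for a real dump file.
    sample = b"""<mediawiki>
      <page>
        <title>Graph</title>
        <revision><text>See [[Graph theory|graphs]] and [[Neo4j]].</text></revision>
      </page>
    </mediawiki>"""
    for title, links in iter_pages(io.BytesIO(sample)):
        print(title, links)
```

Streaming with `iterparse` matters because full Wikipedia dumps are far too large to load into memory at once. Each yielded pair maps naturally onto one node and a set of outgoing edges in the graph.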

Progress Log

To keep track of what I am doing on this project, I have tried to keep a progress log. Some sections are missing or very sparse. Progress Log
