Datasets created during Hierarchical Interest Graph generation

pavan046/higdataset

Hierarchical Interest Graph datasets

This repository contains the datasets created for our approach to generating a Hierarchical Interest Graph for Twitter users.

Both datasets are provided as zipped files and are described below:

  1. wikipedia-hierarchy-priority -- This dataset contains the Wikipedia Hierarchy as a tab-separated (TSV) file. Each line of the TSV has the format
    <sub-category><tab><category><tab><priority>
    <sub-category> and <category> are labels of Wikipedia categories
    <priority> is an integer value that reflects the suitability of <category> as a category of <sub-category>

           Numbers: 802,194 Categories 
                    1,177,558 Category-Subcategory relationships
    
  2. irrelevant-wiki-categories -- This dataset contains the list of Wikipedia Admin Categories that are removed from the Wikipedia Category Graph before the Wikipedia Hierarchy is created.

           Numbers: 64,362 Categories
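
As a rough illustration of the TSV format described in item 1, the following Python sketch reads hierarchy rows into a dictionary that maps each sub-category to its (category, priority) pairs. The function name `load_hierarchy` and the sample rows are made up for this example, and the sketch assumes the file has no header row; none of this is taken from the actual dataset.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_hierarchy(lines: Iterable[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each sub-category to a list of (category, priority) pairs.

    Hypothetical helper: assumes three tab-separated fields per line
    and no header row.
    """
    parents = defaultdict(list)
    # QUOTE_NONE keeps any quote characters in category labels untouched.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        sub_category, category, priority = row
        parents[sub_category].append((category, int(priority)))
    return dict(parents)


# Invented sample rows, for illustration only (not real dataset content).
sample = [
    "Baseball_teams\tBaseball\t1\n",
    "Baseball_teams\tSports_teams\t2\n",
]
hierarchy = load_hierarchy(sample)
```

To use it on the real data, unzip the archive and pass an open text file handle (for example one opened with `encoding="utf-8"` and `newline=""`) to `load_hierarchy`. The name of the file inside the archive is not stated here, so check it after extraction.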
    
