
haifeng-zhang/Wiki_Semantic_Intention

 
 


Identifying Edit Intentions from Revisions in Wikipedia

In collaboration with Wikipedia editors, we develop a 13-category taxonomy of the semantic intentions behind edits to Wikipedia articles. Using labeled article edits, we build a classifier of edit intentions that achieves a micro-averaged F1 score of 0.621.

Install

conda create --name wiki_edit_intention python=3.5 
source activate wiki_edit_intention
pip install mwapi 
pip install revscoring

You might also need to install some dependencies (e.g., scipy, numpy, and scikit-learn, which is imported as sklearn).

Run

To generate the features for each revision, run:

python ./feat_src/wiki_edit_main.py edit_intention_dataset.csv

This generates an ARFF file named "edit_intention_dataset.feats.arff".
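To check which features ended up in the generated file, the header can be inspected with a few lines of standard-library Python. This is only a sketch: it assumes the file follows the usual ARFF layout (`@attribute` declarations followed by `@data`), and the helper name `arff_attribute_names` and the sample header are made up for illustration. It avoids f-strings so that it also runs on the Python 3.5 environment created above.

```python
def arff_attribute_names(lines):
    """Return the attribute names declared in an ARFF header.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumes the standard ARFF layout; this helper is not part of the repo.
    """
    names = []
    for line in lines:
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if lowered.startswith("@attribute"):
            rest = stripped[len("@attribute"):].strip()
            if not rest:
                continue  # malformed declaration, skip it
            if rest[0] in ("'", '"'):
                # Names containing spaces are quoted in ARFF.
                end = rest.find(rest[0], 1)
                names.append(rest[1:end])
            else:
                names.append(rest.split(None, 1)[0])
    return names


# Hypothetical header, only to demonstrate the helper.
sample = [
    "@relation example",
    "@attribute feature_a numeric",
    "@attribute 'feature b' numeric",
    "@attribute label {0,1}",
    "@data",
    "1.0,2.0,0",
]
print(arff_attribute_names(sample))

# With the real file:
# with open("edit_intention_dataset.feats.arff") as f:
#     print(arff_attribute_names(f))
```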

To predict the edit intentions for a set of revisions, run:

python ./pred_src/wiki_model.py edit_intention_dataset.feats.arff test_file_to_be_predicted.arff

Data

To retrieve the content of each revision, use:

https://en.wikipedia.org/wiki/WP:Labels?diff=<replace_with_revision_id>
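Filling in the URL by hand gets tedious for many revisions, so a small helper can do the substitution. The sketch below only builds the URL string from the pattern above; the function name `revision_diff_url` and the example revision ID are hypothetical, and no network request is made.

```python
# URL pattern from this README; {rev_id} stands for the revision ID.
DIFF_URL_TEMPLATE = "https://en.wikipedia.org/wiki/WP:Labels?diff={rev_id}"


def revision_diff_url(rev_id):
    """Build the diff URL for one revision ID (given as int or numeric string)."""
    rev_id = int(rev_id)  # raises ValueError for non-numeric input
    if rev_id <= 0:
        raise ValueError("revision IDs are positive integers")
    return DIFF_URL_TEMPLATE.format(rev_id=rev_id)


# 123456789 is a placeholder, not a revision from the dataset.
print(revision_diff_url(123456789))
```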

The mapping from edit intention to numeric label is shown below:

		{	
			'counter-vandalism':0, 
			'fact-update': 1, 
			'refactoring':2, 
			'copy-editing':3, 
			'other':4, 
			'wikification':5, 
			'vandalism':6, 
			'simplification':7, 
			'elaboration':8, 
			'verifiability':9, 
			'process':10, 
			'clarification':11,
			'disambiguation':12, 
			'point-of-view':13
		}
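To turn numeric labels back into intention names, the mapping can simply be inverted. The sketch below assumes labels are available as integers (or numeric strings); how `wiki_model.py` actually writes its predictions is not specified here, and the names `INTENTION_TO_LABEL`, `LABEL_TO_INTENTION`, and `decode_labels` are illustrative only.

```python
# Mapping copied from this README: intention name -> numeric label.
INTENTION_TO_LABEL = {
    'counter-vandalism': 0,
    'fact-update': 1,
    'refactoring': 2,
    'copy-editing': 3,
    'other': 4,
    'wikification': 5,
    'vandalism': 6,
    'simplification': 7,
    'elaboration': 8,
    'verifiability': 9,
    'process': 10,
    'clarification': 11,
    'disambiguation': 12,
    'point-of-view': 13,
}

# Inverse lookup: numeric label -> intention name.
LABEL_TO_INTENTION = {label: name for name, label in INTENTION_TO_LABEL.items()}


def decode_labels(labels):
    """Translate an iterable of numeric labels into intention names."""
    return [LABEL_TO_INTENTION[int(label)] for label in labels]


print(decode_labels([3, 8, 13]))
```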

Our trained word embeddings for Wikipedia article revisions are available for download at https://goo.gl/An7DZP (wiki_revision_trained_embedding.bin).

Cite

If you use our tools for your work, please cite the following paper:

  • Yang, Diyi, Aaron Halfaker, Robert Kraut, and Eduard Hovy. "Identifying semantic edit intentions from revisions in Wikipedia." In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2000-2010. 2017.

License

This project is licensed under the MIT License - see the LICENSE file for details.
