A list of selected resources, methods, and tools dedicated to Legal Text Analytics.
Please read the contribution guidelines before contributing. To add a resource, please raise a pull request. We also welcome discussion and proposals of new ideas (including additional content sections) as issues.
- Selected Use Cases
- Methods
- Libraries
- Datasets and Data
- Annotation and Data Schemes
- Annotation Tools
- Research Groups and Labs
- Tutorials
- Optical Character Recognition (find more information here)
- Legal Document Pre-processing (find more information here)
- Clause Segmentation and Sentence Boundary Detection
- Information Extraction and Named Entity Recognition (find more information here)
- Legal Norm Classification
- Machine Translation
- Document Comparison and Semantic Matching
- Text Summarization
- Argument Mining
- Question Answering
- Legal Case Outcome Prediction
- Reference and Coreference Extraction
- Document Assembling and Generation
- Voice Transcription
- Anomaly Detection
- Data Anonymization
- NLP Overview
- NLP Progress
- Text Visualizations
- Optical Character Recognition
- Rule-based methods for NLP, Apache Ruta, JAPE Grammar
- Statistical NLP
- Machine Learning Frameworks
- Neural networks and deep learning for NLP Tutorial
- spaCy - Industrial-Strength Natural Language Processing
- scikit-learn - Machine learning in Python
- NLTK - Natural Language Toolkit
- Apache UIMA
- GATE - General Architecture for Text Engineering
- Transformers and pre-trained embedding models including LegalBERT
- German BERT Model: Deepset AI
- Flair - SOTA NLP (incl. biomedical and legal data)
- Blackstone - Legal Named Entity Recognition and Text Categorizer
- Legal Reference Detection, Legal Reference Detection II
- Haystack - Transformers at scale for question answering & neural search
- NLP Datasets
- OpenLegalData
- Legal Entity Recognition
- Legal Text Summarization
- Legal Text Translation
- Legal Document Classification
- Legal Sentence Classification (German)
- 100k German Court Decisions
- Legal Paper Datasets
- Awesome Legal Data
- Germany: Gesetze im Internet, Rechtsprechung im Internet, Verwaltungsvorschriften im Internet, Annotated German Court Decisions (Judgment style), German Federal Courts Dataset
- Switzerland: Swiss Legislation Corpus French and German
- German NLP Resources: Awesome German NLP
- ECtHR: Judicial Decisions of the European Court of Human Rights
- EU: European Union Law (eurlex R Package), Digital Corpus of the European Parliament (DCEP)
- Canada: Federal Laws and Regulations (ftp://205.193.86.89/)
- UK: UK Law Reports & Case Law Search
- USA: Caselaw Access Project
- USA: Supreme Court Database
- Israel: The Israeli Supreme Court Database
- United Nations: United Nations General Debate Corpus, United Nations Parallel Corpus
- International Law: Text of Trade Agreements (ToTA), Electronic Database on Investment Treaties (EDIT)
- Meta Search: Google Dataset Search
- Overview of Political Science Datasets: PolData
- Stanford University - CodeX: The Stanford Center for Legal Informatics
- Technical University of Munich
- Bucerius Center on the Legal Profession
- Suffolk Law School - Legal Innovation & Technology (LIT) Lab
- University of Ottawa - Legal Technology Lab
- University of Vienna - Department of Innovation and Digitalisation in Law
- University of Amsterdam - Leibniz Center for Law
- University of Helsinki - LegalTech Research Lab
- Hofstra University - Law, Logic & Technology Research Laboratory
- Computational Legal Studies
- MonkeyLearn - Text Analysis
- Using NLP to understand laws
- Document Representation for Legal Texts
- Data Science for Lawyers - Learning Resources
- Coding for Lawyers (discontinued)
- Custom NLP Approaches to Data Anonymization
- Information Extraction in legal documents
See contributors and committers (and many more).
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.