I trained an NER model on 33,000 Indian Supreme Court judgments (1950–2024). CASE_CITATION hits 97.76% F1, +17 points over the only prior baseline [P]
TL;DR: Released en_legal_ner_ind_trf v0.1 - InLegalBERT fine-tuned on ~34,700 silver-annotated chunks from 33k Indian SC judgments. 13 labels. 78.67% overall F1. CASE_CITATION at 97.76% already exceeds OpenNyAI's PRECEDENT score by +17 points. Free, Apache-2.0.
Why this exists
OpenNyAI is the only prior Indian legal NER model with any community presence. It's unmaintained and degrades on pre-1990 OCR-era text - the first 40 years of India's constitutional jurisprudence.
No replacement existed.
Results
| Entity | F1 | Support |
|---|---|---|
| CASE_CITATION | 97.76% | 3,821 |
| PROVISION | 96.35% | 20,248 |
| STATUTE | 91.94% | 8,187 |
| LAWYER | 74.67% | 3,982 |
| JUDGE | 68.06% | 1,978 |
| DATE | 55.15% | 3,289 |
| RESPONDENT | 50.44% | 1,731 |
| COURT | 50.34% | 1,033 |
| WITNESS | 49.77% | 762 |
| OTHER_PERSON | 47.11% | 4,266 |
| PETITIONER | 44.71% | 1,573 |
| ORG | 41.34% | 2,128 |
| GPE | 36.56% ⚠ | 1,197 |
| micro avg | 78.67% | 54,195 |
Evaluated on a held-out validation split (~500 documents, stride=512, non-overlapping). The 25-file locked test set is untouched - head-to-head with OpenNyAI runs in v1.0.
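For anyone reproducing the eval, here's a minimal chunking sketch - my reconstruction, not the released code. One gotcha: HF tokenizers define stride as the *overlap* between windows, so a 512-token step with 512-token windows maps to stride=0 there.

```python
from transformers import AutoTokenizer

# Sketch of non-overlapping 512-token windowing (assumes the public
# law-ai/InLegalBERT tokenizer; not the repo's actual eval script).
tokenizer = AutoTokenizer.from_pretrained("law-ai/InLegalBERT")

def chunk_document(text: str, max_length: int = 512):
    enc = tokenizer(
        text,
        max_length=max_length,
        truncation=True,
        stride=0,                        # HF stride = overlap, so 0 = non-overlapping
        return_overflowing_tokens=True,  # emit every window, not just the first
        return_offsets_mapping=True,     # char offsets, needed for span-level scoring
    )
    return [
        {"input_ids": ids, "offsets": offs}
        for ids, offs in zip(enc["input_ids"], enc["offset_mapping"])
    ]
```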
Comparison note: OpenNyAI (RoBERTa + transition-based parser, gold-annotated) achieved 91.1% overall strict F1. Not directly comparable - different test sets, different annotation quality, different corpus scope. The +17 point gap on CASE_CITATION is the one apples-to-apples number worth flagging.
The annotation pipeline
Silver labels come from four automatic pipelines, merged per document (a merge sketch follows the list):
- Regex: 14-pattern citation extractor plus a statute/provision extractor → CASE_CITATION, STATUTE, PROVISION
- Metadata projection: case metadata JSONs mapped to character offsets via RapidFuzz → JUDGE, PETITIONER, RESPONDENT
- Transformer NER: OpenNyAI's en_legal_ner_trf, offset-corrected → LAWYER, COURT, ORG, GPE, DATE, OTHER_PERSON, WITNESS
- Gazetteer: 858 Central Acts with alias resolution → confirms and adds STATUTE spans
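The post doesn't spell out how conflicts between the four sources get resolved (the author offers to discuss it at the end), so here's one plausible precedence-based merge as a sketch - the priority ordering below is my assumption, not the documented rule:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # character offset into the judgment text
    end: int
    label: str
    source: str  # which of the four pipelines produced it

# Assumed precedence: high-precision pattern sources beat fuzzy metadata
# projection, which beats raw model output. Not necessarily the actual rule.
PRIORITY = {"regex": 0, "gazetteer": 1, "metadata": 2, "transformer": 3}

def merge_spans(spans: list[Span]) -> list[Span]:
    """Greedy merge: prefer higher-priority sources, then longer spans;
    keep a span only if it doesn't overlap anything already accepted."""
    kept: list[Span] = []
    for s in sorted(spans, key=lambda s: (PRIORITY[s.source], -(s.end - s.start))):
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)
```

Under a rule like this, a regex CASE_CITATION would swallow a shorter transformer DATE nested inside it, which matches the intuition that the regex patterns are the highest-precision source.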
Trained with Focal Loss (γ=2.0) to handle label imbalance between STATUTE/CASE_CITATION and O tokens. Hardware: Kaggle T4 (free tier).
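For reference, a minimal focal-loss sketch for token classification - my reconstruction of the setup described (γ only, no class-weight α term since the post doesn't mention one):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0, ignore_index=-100):
    """Focal loss for token classification: down-weights easy, abundant
    tokens (mostly the O class) so the rare entity tokens dominate the
    gradient. logits: (batch, seq, num_labels); labels: (batch, seq)."""
    num_labels = logits.size(-1)
    ce = F.cross_entropy(
        logits.reshape(-1, num_labels), labels.reshape(-1),
        reduction="none", ignore_index=ignore_index,
    )
    pt = torch.exp(-ce)              # model's probability of the true class
    loss = (1.0 - pt) ** gamma * ce  # focusing term shrinks easy-token loss
    mask = labels.reshape(-1) != ignore_index
    return loss[mask].mean()
```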
Known weak spots - being honest
GPE (36.56%) and ORG (41.34%) are the problem labels. In Indian legal text, "State of Maharashtra" or "Union of India" appear as GPE, PETITIONER, RESPONDENT, or ORG depending on context. A linear token classification head can't resolve overlapping roles. CRF head is v1.0's job.
Positional bias - silver training data has repetitive header structures. Performance degrades when parties appear mid-document.
Pre-1990 OCR noise - judgments from 1950–1989 vary in quality. Recall drops the further back you go.
What's next
300-file gold annotation is in progress (3 volunteers on board). v1.0 will add a CRF head, run the locked test set, and publish the official head-to-head with OpenNyAI.
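For the curious, here's roughly what a CRF head on top of the encoder could look like, using the pytorch-crf package - an illustration under my own assumptions, not the v1.0 code:

```python
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BertCrfTagger(nn.Module):
    """Sketch: encoder emissions feed a CRF so label transitions are scored
    jointly (e.g. I-GPE right after B-ORG becomes structurally unlikely)
    instead of each token being classified independently."""
    def __init__(self, encoder, num_labels):
        super().__init__()
        self.encoder = encoder  # e.g. a loaded InLegalBERT AutoModel
        self.classifier = nn.Linear(encoder.config.hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if labels is not None:
            # labels must hold valid tag indices (not -100) wherever mask is on
            return -self.crf(emissions, labels, mask=mask)  # negative log-likelihood
        return self.crf.decode(emissions, mask=mask)        # Viterbi-decoded tag paths
```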
Model: huggingface.co/evolawyer/inlegalbert-sc-ner-silver
Dataset: huggingface.co/datasets/evolawyer/indian-sc-judgments-ner-silver
GitHub: github.com/evolawyer/inlegalbert-sc-ner-silver
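Quick-start, assuming the checkpoint plays nicely with the stock HF token-classification pipeline (untested on my end):

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="evolawyer/inlegalbert-sc-ner-silver",
    aggregation_strategy="simple",  # merge wordpieces into whole entity spans
)
text = "In Kesavananda Bharati v. State of Kerala, AIR 1973 SC 1461, the Court held ..."
for ent in ner(text):
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```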
Happy to go deep on the annotation pipeline, conflict resolution between the four label sources, or the Focal Loss setup.