Datasets

Two core datasets, both publicly available; no private data is used.

CourtListener Federal Appellate

Status: 🔄 In progress

Size: 1,465,484 opinions

License: CC BY-ND 4.0

Role: Retrieval corpus + SQLite citation index
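The section does not describe the citation index beyond "SQLite citation index". Below is a minimal sketch of one possible layout, assuming a simple edge table of (citing opinion, cited opinion) pairs with a secondary index for "who cites X?" lookups. The table names, column names, and the helpers `build_index`, `cited_by`, and `citation_count` are all hypothetical and are not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per opinion, one row per citation edge.
SCHEMA = """
CREATE TABLE IF NOT EXISTS opinions (
    opinion_id INTEGER PRIMARY KEY,
    case_name  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS citations (
    citing_id INTEGER NOT NULL REFERENCES opinions(opinion_id),
    cited_id  INTEGER NOT NULL REFERENCES opinions(opinion_id),
    PRIMARY KEY (citing_id, cited_id)
);
-- Secondary index so "who cites X?" lookups avoid a full table scan.
CREATE INDEX IF NOT EXISTS idx_citations_cited ON citations(cited_id);
"""


def build_index(conn, opinions, edges):
    """Create the tables, then bulk-load opinions and (citing, cited) edges.
    Duplicate edges are skipped thanks to the composite primary key."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT OR IGNORE INTO opinions VALUES (?, ?)", opinions)
    conn.executemany("INSERT OR IGNORE INTO citations VALUES (?, ?)", edges)
    conn.commit()


def cited_by(conn, opinion_id):
    """Return IDs of opinions that cite `opinion_id`, in ascending order."""
    rows = conn.execute(
        "SELECT citing_id FROM citations WHERE cited_id = ? ORDER BY citing_id",
        (opinion_id,),
    )
    return [row[0] for row in rows]


def citation_count(conn, opinion_id):
    """Number of distinct opinions citing `opinion_id`."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM citations WHERE cited_id = ?", (opinion_id,)
    ).fetchone()
    return count


# Toy demo with made-up opinions: 2 and 3 cite 1; 3 cites 2; (3, 1) is repeated.
conn = sqlite3.connect(":memory:")
build_index(
    conn,
    opinions=[(1, "A v. B"), (2, "C v. D"), (3, "E v. F")],
    edges=[(2, 1), (3, 1), (3, 2), (3, 1)],
)
print(cited_by(conn, 1))  # [2, 3]
```

Keeping citations as a separate edge table (rather than a column on each opinion) lets both "what does X cite?" and "who cites X?" be answered with indexed lookups, which matters at the scale of roughly 1.5M opinions.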

LePaRD (ACL 2024)

Status: ⏳ Pending

Size: ~4M pairs (capped at 500K–1M pairs for this project)

License: Open research

Role: Training + evaluation (Priority 1)
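The section states a cap of 500K–1M pairs but not how the subset is chosen. One common option is uniform random subsampling with reservoir sampling (Algorithm R). It reads the ~4M pairs as a stream and keeps at most `cap` of them in memory. The sketch below assumes that approach. The function name `reservoir_sample` and the toy (passage, context) tuples are illustrative only and are not taken from the project.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], cap: int, seed: int = 0) -> List[T]:
    """Uniformly sample at most `cap` items from a stream of unknown length
    (Algorithm R). Memory use is O(cap), so the full dataset never has to be
    loaded at once. A fixed `seed` makes the subset reproducible."""
    if cap <= 0:
        return []
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < cap:
            # Fill the reservoir with the first `cap` items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability cap / (i + 1),
            # replacing a uniformly chosen existing slot.
            j = rng.randint(0, i)
            if j < cap:
                reservoir[j] = item
    return reservoir


# Toy stream of hypothetical (passage, citing-context) pairs.
pairs = ((f"passage-{k}", f"context-{k}") for k in range(10_000))
sample = reservoir_sample(pairs, cap=500, seed=42)
print(len(sample))  # 500
```

Because the reservoir keeps the stream's fill order in part, it is worth shuffling the result before splitting it into training and evaluation sets.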

API Placeholder

GET /api/datasets: returns dataset statistics and DVC artifact metadata (pending)
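Since the endpoint is still a placeholder, here is a minimal sketch of what it might return, built only with the Python standard library. The JSON field names (`status`, `size`, `cap`, `dvc`), the helper `datasets_payload`, and the `DatasetsHandler` class are hypothetical. The `dvc` entries are left as `null` because no artifact metadata exists yet.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical static payload mirroring the two datasets listed above.
# A real service would read these values from build outputs and DVC files.
DATASETS = [
    {
        "name": "CourtListener Federal Appellate",
        "status": "in_progress",
        "size": {"count": 1_465_484, "unit": "opinions"},
        "license": "CC BY-ND 4.0",
        "role": "Retrieval corpus + SQLite citation index",
        "dvc": None,  # artifact metadata not yet available
    },
    {
        "name": "LePaRD (ACL 2024)",
        "status": "pending",
        "size": {"approx_count": 4_000_000, "unit": "pairs",
                 "cap": [500_000, 1_000_000]},
        "license": "Open research",
        "role": "Training + evaluation (Priority 1)",
        "dvc": None,
    },
]


def datasets_payload() -> bytes:
    """Serialize the dataset list as the JSON response body."""
    return json.dumps({"datasets": DATASETS}).encode("utf-8")


class DatasetsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when matching the route.
        if urlsplit(self.path).path != "/api/datasets":
            self.send_error(404)
            return
        body = datasets_payload()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Blocking dev server; call serve() to try the endpoint locally."""
    HTTPServer(("127.0.0.1", port), DatasetsHandler).serve_forever()
```

Keeping the payload builder separate from the handler lets the response shape be tested without starting a server, and makes it easy to swap in a real web framework later.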