Research Question
Which retrieval architecture best improves evidence grounding and reduces contradiction and neutral-evidence failures in a legal retrieval-augmented generation (RAG) system built over U.S. federal appellate opinions?
Hypotheses
- H1: The hybrid pipeline (BM25 + BGE-M3 retrieval with CrossEncoder reranking) achieves significantly higher Recall@10 than either BM25 or BGE-M3 alone (paired bootstrap test, p < 0.05).
- H2: Retrieval architectures with higher Recall@10 produce a significantly lower contradiction rate in downstream generation, normalized both per claim and per 1K tokens.
- H3: The hybrid pipeline achieves higher Recall@10 than BGE-M3 alone.
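The hybrid architecture can be pictured as two first-stage retrievers whose rankings are merged and then reranked. The sketch below is one plausible wiring, not necessarily the one this project uses: it assumes reciprocal rank fusion (RRF, with the conventional constant k = 60) as the merge step, and `rerank_score` is a hypothetical stand-in for a CrossEncoder call on a (query, document) pair. The function names and the candidate-pool size are illustrative.

```python
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(
    query: str,
    bm25_ranking: Sequence[str],
    dense_ranking: Sequence[str],
    rerank_score: Callable[[str, str], float],  # hypothetical CrossEncoder wrapper
    n_candidates: int = 100,
    top_k: int = 10,
) -> List[str]:
    """First stage: fuse BM25 and dense (e.g. BGE-M3) rankings. Second stage: rerank."""
    candidates = reciprocal_rank_fusion([bm25_ranking, dense_ranking])[:n_candidates]
    # The reranker scores each (query, doc) pair jointly; sorted() is stable, so ties
    # keep their fused order.
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_k]
```

RRF is used here only because it needs no calibration between BM25 scores and dense similarity scores; a weighted sum of normalized scores is a common alternative.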
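Recall@10 is computed per query, and the paired bootstrap compares two systems on the same queries. A minimal sketch, assuming binary relevance labels per query and a one-sided paired bootstrap in which the p-value is the fraction of resampled query sets where the candidate system's mean Recall@10 is not higher than the baseline's. The function names, resample count, and seed are illustrative.

```python
import random
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of a query's relevant documents that appear in its top-k results."""
    if not relevant:
        raise ValueError("recall is undefined for a query with no relevant documents")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def paired_bootstrap_p(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided test of mean(a) > mean(b); scores are paired by query index."""
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("need two equal-length, non-empty per-query score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    not_better = 0
    for _ in range(n_resamples):
        # Resample queries with replacement, keeping each query's pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return not_better / n_resamples
```

Under this sketch, H1 would be supported when `paired_bootstrap_p(hybrid_recalls, baseline_recalls) < 0.05` for both the BM25 and the BGE-M3 baseline.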
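H2 normalizes contradictions two ways. A minimal sketch, assuming each generated answer has already been split into claims upstream and each claim judged against its retrieved evidence (for example by an NLI model). The `AnswerAudit` record is hypothetical, and micro-averaging (pooling counts across answers before dividing) is an assumed design choice rather than something the plan specifies.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class AnswerAudit:
    """Hypothetical per-answer counts produced by an upstream claim-checking step."""
    n_claims: int        # atomic claims extracted from the generated answer
    n_contradicted: int  # claims judged to contradict the retrieved evidence
    n_tokens: int        # tokens in the generated answer


def contradiction_rates(audits: Sequence[AnswerAudit]) -> Tuple[float, float]:
    """Return (contradictions per claim, contradictions per 1K tokens), micro-averaged."""
    total_claims = sum(a.n_claims for a in audits)
    total_contradicted = sum(a.n_contradicted for a in audits)
    total_tokens = sum(a.n_tokens for a in audits)
    if total_claims == 0 or total_tokens == 0:
        raise ValueError("no claims or tokens to normalize by")
    per_claim = total_contradicted / total_claims
    per_1k_tokens = 1000 * total_contradicted / total_tokens
    return per_claim, per_1k_tokens
```

Macro-averaging (computing a rate per answer, then averaging) would weight short answers more heavily; which variant fits H2 depends on whether the unit of analysis is the claim or the answer.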
Motivation
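Motivated by Mata v. Avianca, Inc. (S.D.N.Y. 2023), a documented case of legal hallucination in which attorneys filed a brief citing nonexistent cases generated by ChatGPT and were sanctioned by the court. The system targets U.S. federal appellate opinions from CourtListener (1,465,484 opinions).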
API Placeholder
GET /api/research-question: returns the structured hypotheses and metrics (pending).
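The endpoint's response format is still pending. As a hypothetical illustration only, a payload carrying the hypotheses and the metrics they use might be serialized as below; every field name and value label here is an assumption, and no web framework is implied.

```python
import json
from typing import Any, Dict, List

# Hypothetical payload for GET /api/research-question; all field names are illustrative.
HYPOTHESES: List[Dict[str, Any]] = [
    {"id": "H1", "metric": "recall@10",
     "claim": "hybrid > BM25 and hybrid > BGE-M3",
     "test": "paired_bootstrap", "alpha": 0.05},
    {"id": "H2", "metric": "contradiction_rate",
     "claim": "higher recall@10 -> lower contradiction rate",
     "normalization": ["per_claim", "per_1k_tokens"]},
    {"id": "H3", "metric": "recall@10", "claim": "hybrid > BGE-M3"},
]


def research_question_payload() -> str:
    """Serialize the hypotheses and the distinct metrics they reference as JSON."""
    payload = {
        "hypotheses": HYPOTHESES,
        "metrics": sorted({h["metric"] for h in HYPOTHESES}),
        "status": "pending",
    }
    return json.dumps(payload, indent=2)
```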