Research Question

Which retrieval architecture (BM25 alone, BGE-M3 alone, or hybrid BM25+BGE-M3+CrossEncoder) most improves evidence grounding and most reduces contradiction and neutral-evidence failures in a legal RAG system built over U.S. federal appellate opinions?

Hypotheses

  • H1: Hybrid BM25+BGE-M3+CrossEncoder achieves significantly higher Recall@10 than either BM25 alone or BGE-M3 alone (paired bootstrap, p < 0.05).
  • H2: Architectures with higher Recall@10 produce a significantly lower contradiction rate in downstream generation (contradictions normalized both per claim and per 1K tokens).
  • H3: Hybrid BM25+BGE-M3+CrossEncoder achieves higher Recall@10 than BGE-M3 alone (the dense-only comparison, stated without a significance threshold).
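
The quantities named in the hypotheses can be made concrete with a minimal sketch. It assumes per-query lists of ranked document IDs and gold relevant sets, and it uses one common variant of the paired bootstrap (resampling per-query score differences and counting how often the first system fails to win). The function names, the resample count, and the one-sided formulation are illustrative assumptions, not details taken from the study design.

```python
import random
from typing import Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of a query's relevant documents found in the top-k results."""
    if not relevant_ids:
        raise ValueError("query has no relevant documents")
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def paired_bootstrap_p(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 10_000,  # assumed default, not from the source
    seed: int = 0,
) -> float:
    """One-sided paired bootstrap (one common variant).

    Resamples queries with replacement and returns the fraction of
    resamples in which system A's total score does NOT exceed system B's.
    A small value suggests A's advantage is stable across query samples.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    # Pairing: both systems are scored on the same queries, so resample differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        total = sum(diffs[rng.randrange(n)] for _ in range(n))
        if total <= 0:
            not_better += 1
    return not_better / n_resamples


def contradiction_rates(n_contradictions: int, n_claims: int, n_tokens: int) -> Tuple[float, float]:
    """Contradiction rate per claim and per 1K tokens (the two H2 normalizations)."""
    if n_claims <= 0 or n_tokens <= 0:
        raise ValueError("claim and token counts must be positive")
    return n_contradictions / n_claims, 1000 * n_contradictions / n_tokens
```

Under this sketch, H1 would be checked by computing recall_at_k per query for the hybrid system and for each baseline, then calling paired_bootstrap_p once per baseline and comparing the result against 0.05.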

Motivation

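The work is motivated by Mata v. Avianca, Inc. (S.D.N.Y. 2023), a documented case of legal hallucination with real-world consequences, in which attorneys were sanctioned for filing a brief that cited nonexistent, AI-generated cases. The target corpus is U.S. federal appellate opinions from CourtListener (1,465,484 opinions).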

API Placeholder

GET /api/research-question — returns structured hypotheses and metrics (pending)
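
Since the endpoint is still pending, the following is only a sketch of what a structured response could look like. Every field name here (question, hypotheses, id, statement, metric, status) and the build_response helper are hypothetical and chosen for illustration; the actual schema has not been defined.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Hypothesis:
    id: str          # hypothetical field: "H1", "H2", ...
    statement: str   # hypothetical field: short form of the hypothesis
    metric: str      # hypothetical field: metric the hypothesis is tested on


@dataclass
class ResearchQuestionResponse:
    question: str
    hypotheses: List[Hypothesis] = field(default_factory=list)
    status: str = "pending"  # mirrors the "(pending)" note on the endpoint


def build_response() -> str:
    """Serialize a placeholder payload for GET /api/research-question."""
    resp = ResearchQuestionResponse(
        question="Which retrieval setup most improves evidence grounding?",
        hypotheses=[
            Hypothesis("H1", "Hybrid beats BM25 and BGE-M3 alone", "Recall@10"),
            Hypothesis("H2", "Higher Recall@10 lowers contradictions", "contradiction rate"),
            Hypothesis("H3", "Hybrid beats BGE-M3 alone", "Recall@10"),
        ],
    )
    return json.dumps(asdict(resp), indent=2)
```

Keeping hypotheses as a list of small records, rather than free text, would let a client render or filter them by metric once real results are attached.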