Contoural
AI citation analysis pipeline for an information governance consulting firm. Cut service delivery time from 40 hours to 10 per engagement. 340k+ citations classified at 80–90% precision, turning hourly billable work into fixed-price contracts.
When Contoural brought me in, they had a vague ask: "We want to use AI." So I embedded with their consultants and spent the first few weeks watching.
What I observed: for every client engagement, they downloaded a 10,000-row Excel file of legal citations from a third-party vendor. They went through it row by row, classifying which citations were relevant to that specific client. The vendor data was full of errors, so consultants were also constantly making corrections. The whole process took about 40 hours per engagement. And after each project, they'd download a fresh file for the next client. Every correction they'd made? Gone. Start over.
I saw two problems: classification (which citations are relevant to this client?) and data persistence (corrections should accumulate across projects, not reset). The original ask, an AI extraction tool to replace the vendor, wouldn't have touched either. I pitched a different solution.
The pipeline
The system syncs citation data directly from FilersKeepers, the vendor Contoural was already paying for. Rather than replacing the vendor (the original ask), I built on top of their data. The citations come pre-extracted. My job was classification: is this citation relevant to this specific client's business?
Classification happens in stages. First, a rule-based pass handles the known cases: statute of limitations citations get routed to their own sheet, citations already marked "always relevant" or "never relevant" skip the AI entirely, and a configurable retention period filter catches the obvious misses. What's left goes to the LLM classifier: GPT-4o via the OpenAI Responses API, with context built per citation (client business description, jurisdiction, all citation fields) and structured output enforced. The model returns a relevance label, its reasoning, and a confidence score. Anything uncertain surfaces for human review.
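The staged flow can be sketched roughly like this. The field names, routing labels, and prompt wording are illustrative (not the production schema), and the structured-output call assumes a recent `openai` Python SDK:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    text: str
    category: str                  # e.g. "statute_of_limitations"
    known_label: Optional[str]     # "always_relevant", "never_relevant", or None
    retention_years: Optional[int]


def rule_based_pass(c: Citation, max_retention_years: int = 100) -> Optional[str]:
    """First-stage routing; returns a decision, or None to defer to the LLM."""
    if c.category == "statute_of_limitations":
        return "sol_sheet"                  # routed to its own sheet
    if c.known_label in ("always_relevant", "never_relevant"):
        return c.known_label                # skips the AI entirely
    if c.retention_years is not None and c.retention_years > max_retention_years:
        return "retention_filtered"         # configurable retention-period filter
    return None                             # falls through to the LLM classifier


def classify_with_llm(c: Citation, client_description: str, jurisdiction: str) -> dict:
    """Second stage: GPT-4o with structured output via the Responses API."""
    from openai import OpenAI               # requires the `openai` package + API key
    from pydantic import BaseModel

    class Relevance(BaseModel):
        relevant: bool
        reasoning: str
        confidence: float                   # low values surface for human review

    resp = OpenAI().responses.parse(
        model="gpt-4o",
        input=(
            f"Client business: {client_description}\n"
            f"Jurisdiction: {jurisdiction}\n"
            f"Citation: {c.text}\n"
            "Is this citation relevant to this client's business?"
        ),
        text_format=Relevance,              # enforces the structured schema
    )
    return resp.output_parsed.model_dump()
```

The cheap deterministic rules run first so the LLM only sees the genuinely ambiguous rows, which keeps both cost and review volume down.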

The hard part: legal precision
For most classification tasks, 85% precision is fine. Legal compliance is not most tasks. A citation that looks correct but points to the wrong section number, the wrong version, or a repealed statute can create real liability. False positives here aren't just annoying: they have consequences.
Getting to 80–90% precision required a lot of iteration on the classification prompts and significant work on the normalization layer. Citations in legal documents are inconsistently formatted (abbreviated, parenthetical, embedded in footnotes), and getting that normalization right was a bigger project than the AI classification itself.
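To illustrate the kind of work that normalization layer does, here is a minimal sketch that canonicalizes a few common patterns (dotted reporter abbreviations, section symbols, trailing subsection parentheticals). The function and its rules are invented for illustration; the real layer covered far more variants:

```python
import re


def normalize_citation(raw: str) -> str:
    """Reduce a raw citation string to a canonical 'title source section' form.

    A sketch only: a production normalizer would also handle footnote
    markers, embedded citations, and many more abbreviation styles.
    """
    s = raw.strip()
    # Collapse dotted abbreviations: "C.F.R." -> "CFR", "U.S.C." -> "USC"
    s = re.sub(r"(?<=[A-Z])\.", "", s)
    # Drop section symbols
    s = s.replace("§", "")
    # Strip subsection parentheticals: "1910.1200(g)(8)" -> "1910.1200"
    s = re.sub(r"\([a-z0-9]+\)", "", s)
    # Squeeze whitespace left behind by the removals
    return " ".join(s.split())
```

With a canonical form in hand, two differently formatted references to the same provision collapse to one row, which is what makes deduplication and stable correction keys possible.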
The correction system decision
The pipeline handles the AI classification. But the more interesting engineering decision was the corrections database. Every user of the system can correct citation data in real time, and those corrections persist to production immediately.
Senior domain experts pushed back: they wanted final sign-off on every correction before it went live. My counterargument was that an approval gate would kill the culture of contribution. If corrections sit in a queue, people stop bothering. The whole value of crowdsourcing is that fixes happen immediately when someone notices something wrong.
I shipped direct corrections for all users, with a full audit log and easy reversion. Hundreds of corrections have been crowdsourced. We've never had to revert one.
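A minimal sketch of that design, using SQLite with invented table and column names: every correction applies to production immediately, and the audit row carries enough state to revert it later.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Citations table plus an append-only audit log (schema is illustrative)."""
    conn.executescript("""
        CREATE TABLE citations (
            id   INTEGER PRIMARY KEY,
            text TEXT NOT NULL
        );
        CREATE TABLE corrections (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            citation_id  INTEGER NOT NULL REFERENCES citations(id),
            old_text     TEXT NOT NULL,
            new_text     TEXT NOT NULL,
            corrected_by TEXT NOT NULL,
            corrected_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
    """)


def correct(conn: sqlite3.Connection, citation_id: int, new_text: str, user: str) -> int:
    """Apply a correction immediately and log it; returns the audit-log id."""
    (old_text,) = conn.execute(
        "SELECT text FROM citations WHERE id = ?", (citation_id,)
    ).fetchone()
    conn.execute("UPDATE citations SET text = ? WHERE id = ?", (new_text, citation_id))
    cur = conn.execute(
        "INSERT INTO corrections (citation_id, old_text, new_text, corrected_by)"
        " VALUES (?, ?, ?, ?)",
        (citation_id, old_text, new_text, user),
    )
    return cur.lastrowid


def revert(conn: sqlite3.Connection, correction_id: int) -> None:
    """Easy reversion: restore the old value recorded in the audit log."""
    citation_id, old_text = conn.execute(
        "SELECT citation_id, old_text FROM corrections WHERE id = ?", (correction_id,)
    ).fetchone()
    conn.execute("UPDATE citations SET text = ? WHERE id = ?", (old_text, citation_id))
```

The audit log is what makes the no-approval-gate bet safe: any bad edit is one `revert` away, so the cost of trusting contributors stays low.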
What changed for the business
Service delivery time dropped from 40 hours to 10 hours per engagement (75% reduction). That change in unit economics is what made fixed-price contracts viable. Before: hourly billing, cost hard to predict, hard to scale to large document sets. After: predictable cost, ability to take on engagements that would have been unprofitable to staff manually.
The system has processed 340k+ citations, runs in production supporting Contoural's Fortune 500 client base, and Contoural is now training external consultants to use it. The work from this engagement is still running today.