Search academic literature, build dependency graphs of mathematical results, and hunt for high-impact errors that cascade through downstream work.
Generates queries from a topic config, searches 6 academic APIs in parallel, scores papers by theoretical density and domain fit, downloads LaTeX, extracts formal results, and builds a dependency graph.
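The parallel fan-out across sources can be sketched with a thread pool; the source names and `search_source` helper here are illustrative stand-ins, not the tool's actual API clients:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-source search function; real ones would wrap the
# individual academic APIs (arXiv, OpenAlex, Semantic Scholar, ...).
def search_source(source: str, query: str) -> list[dict]:
    return [{"source": source, "title": f"{query} result from {source}"}]

def parallel_search(query: str, sources: list[str]) -> list[dict]:
    """Fan one query out to every source concurrently, then merge results."""
    results = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(search_source, s, query) for s in sources]
        for fut in as_completed(futures):
            results.extend(fut.result())
    return results

papers = parallel_search("modular forms", ["arxiv", "openalex", "s2"])
```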
Decomposes proofs into skeleton steps, identifies load-bearing steps via BFS from main theorems, runs N parallel verifiers per step, then filters false positives through 3 gates before tracing impact downstream.
GPT + Claude dual provider. CLI or API mode. Per-phase checkpointing with resume. Full I/O logging per LLM call.
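Per-phase checkpointing with resume can be as simple as caching each phase's output on disk and short-circuiting on rerun; this is a minimal sketch, not the tool's actual checkpoint format:

```python
import json
import os

def run_phase(name: str, fn, state: dict, ckpt_dir: str = "checkpoints"):
    """Run a pipeline phase, skipping it if a checkpoint already exists."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"{name}.json")
    if os.path.exists(path):              # resume: reuse the prior output
        with open(path) as f:
            return json.load(f)
    result = fn(state)                    # otherwise run the phase...
    with open(path, "w") as f:            # ...and checkpoint its output
        json.dump(result, f)
    return result
```

A crashed run then restarts from the first phase without a checkpoint file instead of from scratch.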
JSON config: topic, keywords, seed arXiv IDs, citation range filters, context documents.
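A config covering those fields might look like the following; the key names and values are illustrative, not the tool's actual schema:

```json
{
  "topic": "analytic number theory",
  "keywords": ["sieve methods", "prime gaps"],
  "seed_arxiv_ids": ["1234.56789"],
  "citation_range": {"min": 10, "max": 500},
  "context_documents": ["notes/background.md"]
}
```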
An LLM rates theoretical density and domain fit for each paper; these ratings are combined with citation count, influential citations, and arXiv/venue bonuses into a weighted score.
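The combination step amounts to a weighted sum over normalized signals; the field names, caps, and weights below are assumptions for illustration, not the tool's actual formula:

```python
def score_paper(paper: dict, weights: dict) -> float:
    """Combine LLM ratings and citation metrics into one weighted score."""
    bonus = 0.0
    if paper.get("on_arxiv"):
        bonus += weights["arxiv_bonus"]
    if paper.get("top_venue"):
        bonus += weights["venue_bonus"]
    return (
        weights["density"] * paper["theoretical_density"]    # LLM-rated, 0..1
        + weights["fit"] * paper["domain_fit"]               # LLM-rated, 0..1
        + weights["citations"] * min(paper["citations"] / 100, 1.0)
        + weights["influential"] * min(paper["influential_citations"] / 20, 1.0)
        + bonus
    )
```

Capping the citation terms keeps one massively cited survey from drowning out the LLM's density and fit judgments.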
DAG of results with internal (Thm→Lem) and external (cite) edges. Load-bearing nodes annotated by downstream count.
Each proof is decomposed into 4–10 atomic steps. Each step records its claim, technique, dependencies on prior steps, and external results invoked. Two quality checks verify coherence and source attribution.
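One natural shape for a decomposed step record is a small dataclass; the field names and example steps are illustrative, not the tool's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ProofStep:
    """One atomic step of a decomposed proof."""
    index: int
    claim: str                      # what this step asserts
    technique: str                  # e.g. "induction", "triangle inequality"
    depends_on: list[int] = field(default_factory=list)  # prior step indices
    invokes: list[str] = field(default_factory=list)     # external results used

steps = [
    ProofStep(1, "Reduce to the compact case", "compactness argument"),
    ProofStep(2, "Apply the cited bound", "direct application",
              depends_on=[1], invokes=["[7] Thm 2.1"]),
]
```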
Deterministic BFS from main theorems through hard dependency edges. Only load-bearing and external-invocation steps are verified, focusing effort where errors would cascade farthest.
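The traversal itself is a plain breadth-first search over the hard-dependency map; everything reached from a main theorem is marked load-bearing. A minimal sketch, with an illustrative dependency map:

```python
from collections import deque

def load_bearing(main_theorems, hard_deps):
    """BFS from the main theorems along hard dependency edges;
    every step reached is load-bearing and gets verified."""
    seen = set(main_theorems)
    queue = deque(main_theorems)
    while queue:
        node = queue.popleft()
        for dep in hard_deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

deps = {"Thm 1": ["Lem 2", "Prop 3"], "Lem 2": ["Lem 4"], "Rem 5": []}
```

Steps outside the reached set, like the remark above, are never sent to the verifiers.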
Three gates: Dismissal tries to defend the proof; Counterexample tries to break the step's claim with specific values; Audit re-reads the raw LaTeX. Errors surviving all three gates are classified as fatal, structural, or cosmetic.