We build tools that stress-test, verify, and structure mathematical knowledge — for LLM training, for automated refereeing, and for retrieval that understands mathematical structure, not just text.
Automated auditing of mathematical claims in papers and LLM outputs. We find errors, flag counterexamples, and verify proofs — at a throughput no human review process can match.
Adversarial stress-testing of LLM mathematical reasoning and research papers. We generate the hardest edge cases, verify solutions, and produce clean training data from the wreckage.
Semantic search that exploits mathematical structure — not just text similarity. Our retrieval understands theorems, definitions, and proof dependencies, enabling AI systems to reason over mathematics, not just pattern-match.
Live tools for stress-testing and structuring mathematics: refereeing, torture testing, structured retrieval, and dimensionality reduction.
Advanced dimensionality reduction library that improves upon UMAP in both performance and accuracy. Built on JAX for GPU-accelerated computation, with theoretical improvements grounded in differential geometry.
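A minimal sketch of the kind of computation JAX makes cheap here: a UMAP-style attraction/repulsion loss, differentiated and jit-compiled end to end. The loss, curve parameters, and function names are illustrative assumptions, not this library's API.

    import jax
    import jax.numpy as jnp

    def umap_style_loss(emb, pos_pairs, neg_pairs, a=1.577, b=0.895):
        """Attraction/repulsion cross-entropy on the low-dimensional embedding.
        pos_pairs / neg_pairs: (m, 2) integer index arrays of neighbor and
        sampled non-neighbor pairs; a, b are the usual UMAP curve parameters."""
        def q(pairs):  # low-dimensional similarity in (0, 1)
            d2 = jnp.sum((emb[pairs[:, 0]] - emb[pairs[:, 1]]) ** 2, axis=-1)
            return 1.0 / (1.0 + a * (d2 + 1e-6) ** b)
        attract = -jnp.log(q(pos_pairs)).mean()             # pull neighbors together
        repel = -jnp.log(1.0 - q(neg_pairs) + 1e-6).mean()  # push non-neighbors apart
        return attract + repel

    @jax.jit  # the whole gradient step compiles into one fused GPU computation
    def step(emb, pos_pairs, neg_pairs, lr=0.1):
        return emb - lr * jax.grad(umap_style_loss)(emb, pos_pairs, neg_pairs)

Looping step over freshly sampled negative pairs reproduces the shape of UMAP's stochastic optimization, compiled end to end.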
AI-powered search and Q&A over 700,000+ arXiv mathematics papers. Ask natural-language questions — get AI-synthesized answers with citations. Hybrid BGE embeddings + full-text search. Multi-LLM support.
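One standard way to combine a dense embedding ranking with a full-text ranking is reciprocal rank fusion, sketched below with made-up arXiv ids; the fusion actually used in the product may differ.

    from collections import defaultdict

    def reciprocal_rank_fusion(dense_ranked, keyword_ranked, k=60):
        """Fuse an embedding-similarity ranking with a full-text ranking.
        Each argument is a list of document ids, best first; RRF combines
        the two without needing their raw scores to be comparable."""
        scores = defaultdict(float)
        for ranking in (dense_ranked, keyword_ranked):
            for rank, doc_id in enumerate(ranking):
                scores[doc_id] += 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    # e.g. fuse BGE-embedding neighbors with full-text hits (made-up ids)
    hybrid = reciprocal_rank_fusion(
        ["2301.00001", "1907.12345", "2005.54321"],
        ["1907.12345", "2206.99999", "2301.00001"],
    )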
AI system that automatically audits mathematical proofs in arXiv papers for errors and counterexamples. Analyzed 31,000+ papers across math.DS and math.GT categories. 70% of flagged papers contain explicit counterexamples to their main claims.
Adversarial stress-testing for LLMs and research papers. Generates hard edge-case problems, finds counterexamples to claimed theorems, and produces verified problem–solution pairs for LLM training pipelines. Includes a referee mode for paper-level auditing.
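In outline, such a pipeline is a generate, solve, verify loop. The callables below are hypothetical stand-ins for the edge-case generator, the model under test, and an independent checker; they are not this product's API.

    def stress_test(generate, solve, verify, n_rounds=1000):
        """Generate -> solve -> verify loop.
        generate() proposes a hard edge-case problem, solve() is the
        model under test, and verify() is an independent checker
        (a CAS, a proof checker, or a counterexample search)."""
        verified_pairs, failures = [], []
        for _ in range(n_rounds):
            problem = generate()
            answer = solve(problem)
            if verify(problem, answer):
                verified_pairs.append((problem, answer))  # clean training data
            else:
                failures.append((problem, answer))        # audit material
        return verified_pairs, failures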
Head-to-head comparison of 7 frontier LLMs on Lean 4 formalization of 222 classical analysis problems. GPT-5.4 and Gemini 3.1 Pro lead at ~38%; extended chain-of-thought appears to hurt formalization precision.
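For a sense of the task, here is a statement of the benchmark's flavor formalized in Lean 4 against Mathlib. This is our own illustrative example, not one of the 222 benchmark problems.

    import Mathlib

    -- A continuous function on [0, 1] attains its maximum: the kind of
    -- classical-analysis statement models are asked to formalize and prove.
    example (f : ℝ → ℝ) (hf : ContinuousOn f (Set.Icc 0 1)) :
        ∃ x ∈ Set.Icc (0 : ℝ) 1, ∀ y ∈ Set.Icc (0 : ℝ) 1, f y ≤ f x :=
      isCompact_Icc.exists_forall_ge
        ⟨0, Set.mem_Icc.mpr ⟨le_refl 0, zero_le_one⟩⟩ hf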
Mathematical content goes in. Verified, structured, machine-ready knowledge comes out.
Mathematical content from any source — arXiv papers, LLM outputs, training corpora, textbooks. We parse LaTeX, natural language, and code.
We represent mathematical objects as mathematical objects — theorems, definitions, proof dependencies — not bags of words. This is where our RAG gains its edge.
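A minimal sketch of what that means concretely: a statement as a graph node with explicit dependency edges. Field names and labels are illustrative, not our production schema.

    from dataclasses import dataclass, field

    @dataclass
    class Statement:
        """A theorem or definition as a graph node, not a bag of words."""
        kind: str                 # "theorem", "definition", "lemma", ...
        label: str                # e.g. "thm:banach-fixed-point"
        text: str                 # LaTeX or natural-language body
        uses: list[str] = field(default_factory=list)  # labels it depends on

    banach = Statement(
        kind="theorem",
        label="thm:banach-fixed-point",
        text="A contraction on a complete metric space has a unique fixed point.",
        uses=["def:complete-metric-space", "def:contraction"],
    )

Retrieval can then walk the uses edges to fetch exactly the definitions a theorem depends on, rather than hoping text overlap surfaces them.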
Automated refereeing: counterexample search, proof auditing, adversarial stress-testing. We find the errors that human reviewers miss and that LLMs confidently generate.
Clean training data with verified solutions. Audit reports with specific error citations. Searchable knowledge bases with mathematical structure preserved.
Algorithms for ideal convex polyhedra in hyperbolic 3-space using Rivin's variational characterization. Volume distributions follow a Beta distribution; maximal configurations exhibit rational dihedral angles.
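For context, a sketch of Rivin's characterization as we recall it; the precise combinatorial conditions are in Rivin's paper.

    % Conditions for exterior dihedral angles \theta(e) to be realized
    % by a convex ideal polyhedron in hyperbolic 3-space (Rivin):
    \begin{align*}
      & 0 < \theta(e) < \pi              && \text{for every edge } e, \\
      & \sum_{e \ni v} \theta(e) = 2\pi  && \text{around every ideal vertex } v, \\
      & \sum_{e \in \gamma} \theta(e) > 2\pi
        && \text{for every closed cut } \gamma \text{ not encircling a single vertex.}
    \end{align*}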
Using the Aristotle system, we verified 80 of 80 LLM-generated Lean proofs, bridging the "30× gap" between informal and formal mathematical reasoning.
Comprehensive benchmark of LLM-based vs. specialist OCR for mathematical content. Key finding: Gemini Flash is 6× cheaper than Mathpix and more accurate.
We work with AI labs, publishers, and research groups who need mathematical knowledge they can trust. If you are training models, curating datasets, or building on mathematical content — let's talk.
Start the Conversation: igor@dimensionreducers.com