Introducing SubQ
Build agents that never run out of context.
SubQ isn’t just another model; it’s an architectural breakthrough. It can process up to 12 million tokens with [X]% retrieval accuracy and near-zero latency, so you can build agents that take on your most ambitious work.

Your team's context problem costs more than you think.
RAG pipelines, compaction, and grep loops cost your team time and precision. SubQ gives that time back, with full-codebase context in every pass.
SubQ
The new architecture for serious engineers.
Complete weeks of work at a time without degrading. Reason across entire codebases. Merge hundreds of PRs at once. All in one shot, without losing accuracy, speed, or context.
[X]% Accuracy at 12M Tokens
Measured on MRCR V2 across the full context window
52x Faster Than FlashAttention
A rebuilt attention mechanism means faster inference with linear scaling. The advantage compounds as context grows.
260x Cheaper Than Other Leading LLMs
SubQ's linear attention scales cost with context. Your infrastructure budget scales with your ambition.
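To make the scaling claim concrete: under standard quadratic attention, doubling the context roughly quadruples the attention cost, while under linear scaling it only doubles. The short sketch below is illustrative arithmetic only; the token counts and unit costs are arbitrary and are not SubQ benchmarks or pricing.

# Illustrative arithmetic only: how attention cost grows as context doubles
# under quadratic scaling (standard transformers) versus linear scaling.
# The token counts and unit costs are arbitrary, not SubQ measurements.

def quadratic_cost(n: int) -> int:
    return n * n   # every token attends to every other token: O(n^2)

def linear_cost(n: int) -> int:
    return n       # a fixed amount of work per token: O(n)

prev = 1_500_000
for n in (3_000_000, 6_000_000, 12_000_000):
    print(f"{prev:,} -> {n:,} tokens: "
          f"quadratic cost grows {quadratic_cost(n) / quadratic_cost(prev):.0f}x, "
          f"linear cost grows {linear_cost(n) / linear_cost(prev):.0f}x")
    prev = n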
SubQ is not just another model. It's an architectural breakthrough.
SubQ is the first model built on a fully subquadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter. SubQ finds and focuses only on those, ensuring compute is used where it matters most.
At 12M tokens, this reduces compute by almost 1,000x, changing the way LLMs scale.
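As an intuition for what focusing only on the relationships that matter looks like, here is a minimal top-k sparse attention sketch in NumPy: each query attends only to its k highest-scoring keys, so only a small fraction of token pairs contribute to the output. This is a generic illustration, not SubQ's actual architecture; a real subquadratic implementation also avoids scoring every pair in the first place, and the k value and tensor shapes here are arbitrary.

# Generic top-k sparse attention sketch (not SubQ's architecture).
import numpy as np

def topk_sparse_attention(q, keys, values, k=8):
    """For each query, attend only to its k highest-scoring keys."""
    scores = q @ keys.T / np.sqrt(q.shape[-1])        # (n_q, n_k) similarity scores
    # Keep only the top-k scores per query; mask the rest out before the softmax.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]    # k-th largest score per query
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                           # (n_q, d_v) attended values

rng = np.random.default_rng(0)
n, d = 64, 32
q = rng.standard_normal((n, d))
keys = rng.standard_normal((n, d))
values = rng.standard_normal((n, d))
out = topk_sparse_attention(q, keys, values, k=8)
print(out.shape)  # (64, 32): each output mixes only 8 of the 64 values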
Independently verified benchmarks
SubQ leads on long-context retrieval and coding tasks.
MRCR V2 Score: [X]% at 12 million tokens.
The standard for multi-hop retrieval across massive context windows. SubQ scored [X]% at 12M tokens, holding accuracy at the full window where it matters most.
Full methodology and raw data available here.
SWE-Bench Pro Score: [X]% of tasks resolved · $[X] per task.
Live GitHub issue resolution on large, production codebases. SubQ resolved [X]% of tasks at $[X] per task.
Full methodology and raw data available here.
Two ways to use SubQ.
Engineering: How SubQ handles 12M token context windows
Research: Why subquadratic attention changes everything
Benchmarks: Benchmarking long-context models at scale
We built the architecture the industry said wasn't possible.
Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs on a proprietary post-transformer architecture. While other major labs focus on incremental transformer improvements, we're pushing foundational change at the model architecture level: enabling large-context, multimodal inference that scales efficiently where transformers can't.