Introducing SubQ

Build agents that never run out of context.

SubQ isn’t just another model. It represents an architectural breakthrough. It can process up to 12 million tokens with [X]% retrieval accuracy and near-zero latency, so you can build agents that handle your most ambitious work.

Your team's context problem costs more than you think.

RAG pipelines, compaction, and grep loops cost your team time and precision. SubQ gives that time back, with full-codebase context in every pass.


SubQ

The new architecture for serious engineers.

Complete weeks of work at a time without degrading. Reason across entire codebases. Merge hundreds of PRs at once. All in one shot, without losing accuracy, speed, or context.

XX% Accuracy at 12M Tokens

Measured on MRCR V2 across the full context window

52x Faster Than FlashAttention

A rebuilt attention mechanism means faster inference with linear scaling. The advantage compounds as context grows.

260x Cheaper Than Other Leading LLMs

SubQ's linear attention scales cost with context. Your infrastructure budget scales with your ambition.
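The arithmetic behind linear cost scaling is easy to sketch. A minimal toy comparison (the cost functions and the fixed 12,000-token attention budget are illustrative assumptions, not SubQ internals or pricing):

```python
# Illustrative only: how attention cost grows with context length
# under quadratic vs. linear attention. Constants are arbitrary.

def quadratic_cost(n_tokens: int) -> int:
    """Standard attention: every token attends to every other token."""
    return n_tokens * n_tokens

def linear_cost(n_tokens: int, budget: int = 12_000) -> int:
    """Linear-style attention: each token attends to a fixed budget of tokens."""
    return n_tokens * budget

for n in (1_000_000, 2_000_000, 12_000_000):
    print(f"{n:>12,} tokens  quadratic={quadratic_cost(n):.2e}  linear={linear_cost(n):.2e}")
```

Doubling the context quadruples the quadratic cost but only doubles the linear one, which is why the gap widens as context grows.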

SubQ is not just another model. It's an architectural breakthrough.

SubQ is the first model built on a fully subquadratic sparse-attention architecture. Today's LLMs waste compute by processing every possible relationship between words, yet only a small fraction of those relationships matter. SubQ finds and focuses on only those, spending compute where it matters most.

At 12M tokens, this reduces compute almost 1,000x, changing the way LLMs scale.
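The general technique reads well as a few lines of code. Below is a generic top-k sparse-attention toy in NumPy, one common way to keep only the strongest relationships; it illustrates the broad idea, not SubQ's proprietary mechanism:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """Toy top-k sparse attention: each query attends only to its
    top_k highest-scoring keys; every other relationship is dropped.
    Generic illustration only, not SubQ's actual architecture."""
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (n, n) similarity scores
    # Keep each row's top_k entries; mask the rest to -inf before softmax.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (16, 8)
```

In this toy, cost scales with n × top_k instead of n², so a fixed attention budget per query is what turns quadratic growth into linear growth.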

Other models: standard attention
SubQ: sparse attention

Verified independent benchmarks

SubQ leads on long-context retrieval and coding tasks.

MRCR V2 Score: [X]% at 12 million tokens.

The standard for multi-hop retrieval across massive context windows. SubQ scored [X]% at 12M tokens, holding accuracy at the full window where it matters most.

Full methodology and raw data available here

SWE-Bench Pro Score: [X]% tasks resolved · $[X] per task.

Live GitHub issue resolution on large, production codebases. SubQ resolved [X]% of tasks at $[X] per task.

Full methodology and raw data available here

Two ways to use SubQ.

API

The full-context API for engineering teams.

Process full repositories and pipeline states in a single API call at linear cost. Built for AI engineers and software developers shipping context-heavy products.
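A sketch of what a single full-repo call could look like. The endpoint URL, model name, and payload fields below are hypothetical placeholders for illustration, not a published SubQ API:

```python
# Hypothetical sketch: endpoint, model name, and field names are
# invented placeholders, not a documented SubQ API.
import json
import urllib.request

def build_full_context_payload(repo_files: dict[str, str], prompt: str) -> dict:
    """Pack an entire repository into one request body (placeholder schema)."""
    return {
        "model": "subq-1",  # placeholder model name
        "context": [{"path": p, "content": c} for p, c in sorted(repo_files.items())],
        "prompt": prompt,
    }

def call_subq(payload: dict, api_key: str) -> str:
    """Send the whole payload in one call (placeholder endpoint)."""
    req = urllib.request.Request(
        "https://api.example.com/v1/complete",  # placeholder URL
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]

payload = build_full_context_payload(
    {"src/main.py": "print('hi')", "README.md": "# demo"},
    "Find dead code across the whole repo.",
)
print(len(payload["context"]))  # 2
```

The point of the sketch: no chunking, ranking, or retrieval step sits between your repository and the model; the whole tree goes in one request.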

Code

See the full repo. Every file. Every time.

SubQ Code holds your entire codebase in memory and reasons across all of it. Planning runs 5x faster than SOTA coding IDEs. Implementation runs 2-4x faster.

How SubQ handles 12M token context windows (Engineering)

Why subquadratic attention changes everything (Research)

Benchmarking long-context models at scale (Benchmarks)

About Subquadratic

We built the architecture the industry said wasn't possible.

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs on a proprietary post-transformer architecture. While other major labs focus on incremental transformer improvements, we're pushing foundational change at the model architecture level, enabling large-context, multimodal inference that scales efficiently where transformers can't.

Contact

Work with SubQ

For sales & enterprise

For press