Deep Cogito goes big, releasing 4 new open source hybrid reasoning models with self-improving ‘intuition’

Jul 31, 2025 | Technology

Deep Cogito, a lesser-known AI research startup based in San Francisco, founded by ex-Googlers, today released four new open-ish large language models (LLMs) that attempt something few others do: learn how to reason more effectively over time — and get better at it on their own.

The models, released as part of Cogito’s v2 family, range from 70 billion to 671 billion parameters and are available for AI developers and enterprises to use under a mix of limited and fully open licensing terms. They include:

Cogito v2-70B (Dense)

Cogito v2-109B (Mixture-of-Experts)

Cogito v2-405B (Dense)

Cogito v2-671B (Mixture-of-Experts)

The Cogito v2 series includes both dense and Mixture-of-Experts (MoE) models, each suited to different needs. Dense models, like the 70B and 405B variants, activate all parameters on every forward pass, making them more predictable and easier to deploy across a wide range of hardware.

They’re ideal for low-latency applications, fine-tuning, and environments with limited GPU capacity. MoE models, such as the 109B and 671B versions, use a sparse routing mechanism to activate only a few specialized “expert” subnetworks at a time, allowing for much larger total model sizes without proportional increases in compute cost.

This makes them well suited for high-performance inference tasks, research into complex reasoning, or serving frontier-level accuracy at lower runtime expense. In Cogito v2, the 671B MoE model serves as the flagship, leveraging its scale and routing efficiency to match or exceed leading open models on benchmarks—while using significantly shorter reasoning chains.
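For readers unfamiliar with sparse routing, the toy PyTorch sketch below shows the basic top-k gating idea: a small router scores each token, and only the few highest-scoring expert subnetworks actually run for that token. The layer sizes, expert count, and top-k value here are illustrative assumptions, not details of Cogito v2's actual architecture.

```python
# Minimal sketch of top-k sparse expert routing (the core idea behind MoE layers).
# Illustrative only; dimensions and hyperparameters are assumptions, not Cogito v2's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)           # normalize the surviving routing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(SparseMoE()(tokens).shape)  # torch.Size([4, 512]); only 2 of 8 experts run per token
```

Because only a couple of experts fire per token, total parameter count can grow far faster than the per-token compute, which is what lets the 671B MoE stay economical to serve.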

They’re available now on Hugging Face for enterprises to download and use, and through Unsloth for local use, or, for those who can’t run inference on their own hardware, through application programming interfaces (APIs) from Together AI, Baseten and RunPod.
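For teams taking the hosted route, access typically looks like any OpenAI-compatible chat completion call. The snippet below is a sketch against Together AI's endpoint; the model identifier shown is a placeholder assumption, so check the provider's catalog for the exact string.

```python
# Hedged sketch of calling a Cogito v2 model through Together AI's
# OpenAI-compatible API. The model name below is a placeholder, not a
# confirmed identifier; consult the provider's model catalog.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",   # Together's OpenAI-compatible endpoint
    api_key="YOUR_TOGETHER_API_KEY",          # placeholder; supply your own key
)

response = client.chat.completions.create(
    model="deepcogito/cogito-v2-70b",         # hypothetical identifier for illustration
    messages=[{"role": "user",
               "content": "Explain the difference between dense and MoE models in two sentences."}],
)
print(response.choices[0].message.content)
```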

There’s also an 8-bit floating point (FP8) quantized version of the 671B model, which shrinks the numbers used to represent the model’s parameters from 16 bits to 8 bits, helping users run the massive model faster, cheaper, and on more accessible hardware — sometimes with only a negligible loss in accuracy.
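To make the hardware implication concrete, here is a quick back-of-the-envelope calculation (weights only, ignoring activations and KV cache) of roughly how much memory the 671B model's parameters occupy at 16-bit versus 8-bit precision.

```python
# Back-of-the-envelope weight-memory footprint for a 671B-parameter model.
# Counts weight storage only; activations, KV cache, and runtime overhead excluded.
params = 671e9

for label, bytes_per_param in [("16-bit (BF16/FP16)", 2), ("8-bit (FP8)", 1)]:
    gb = params * bytes_per_param / 1e9
    print(f"{label}: ~{gb:,.0f} GB of weights")

# Prints:
# 16-bit (BF16/FP16): ~1,342 GB of weights
# 8-bit (FP8): ~671 GB of weights
```

Halving the bit-width roughly halves the number of GPUs needed just to hold the weights, which is why FP8 releases matter for teams without frontier-scale clusters.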
