The State of Open-Source Models 2026
The definitive board for finding and evaluating open-weight models: all twenty-one, ranked, graded, and tagged. New models ship monthly; the verification stamp is your freshness check.
The open-source AI landscape is changing fast: the best open-weight models now trade blows with the closed frontier. There are more of them than any one team can track, and the most interesting story on the board is who's building them. This is the whole class, ranked and graded: an opinionated read. The specs and benchmarks are sourced; the ranking, the grades, and the verdicts are a point of view, not a measurement. Skim the board, or drill into any model. Use the tags to find one that fits your hardware, your license, and your use case.
Verified 2026-06-24. These ship monthly - treat this as a living board, and see the source notes before betting on a number.
How to read the board
One ranked list, 1 to 21, ordered by significance: a blend of raw capability, how genuinely open the thing is, how deployable it is, and how much the lab moves the field. Capability is anchored on Artificial Analysis and LMArena rather than vendor decks. The ranking, the radar grades, and the verdicts are mine; the benchmarks are sourced.
Many of these entries aren't a single model; they're a family. Qwen alone spans a 0.6B you can run on a phone to a 235B-A22B that needs an 8-GPU node. The practical effect: don't ask "can I run Qwen," ask which Qwen fits your GPU.
The frontier of open weights is Chinese, and it isn't close. Z.ai, Moonshot, DeepSeek, MiniMax, and a deep bench behind them are doing work the Western open labs aren't matching right now, mostly under MIT licenses you can actually use.
For self-hosting, it's the total parameter count that sets your memory floor, not the active count: figure roughly one byte per parameter at FP8, half a byte at 4-bit. An 8-GPU H100 node gives you about 640GB; an H200 node, about 1.1TB. Mixture-of-experts keeps the active params small, which buys speed and cost, but you still have to hold the whole model in memory.
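The rule of thumb above is easy to turn into a quick sizing check. What follows is a minimal sketch, not a capacity planner: the function names and the 20% headroom reserved for KV cache and activations are assumptions of this example, and the node capacities are the approximate figures quoted above.

```python
# Bytes per parameter at each precision, per the rule of thumb in the text.
BYTES_PER_PARAM = {"fp8": 1.0, "int4": 0.5}

# Approximate pooled memory of an 8-GPU node, in GB (figures from the text).
NODE_MEMORY_GB = {"8xH100": 640, "8xH200": 1100}


def weight_memory_gb(total_params_b: float, precision: str) -> float:
    """Weights-only footprint in GB, given TOTAL parameters in billions.

    Uses total, not active, parameters: a mixture-of-experts model still
    has to hold every expert in memory.
    """
    return total_params_b * BYTES_PER_PARAM[precision]


def fits(total_params_b: float, precision: str, node: str,
         headroom: float = 0.2) -> bool:
    """True if the weights fit on the node after reserving headroom.

    The 20% default headroom is an assumption of this sketch, standing in
    for KV cache, activations, and runtime overhead.
    """
    budget_gb = NODE_MEMORY_GB[node] * (1 - headroom)
    return weight_memory_gb(total_params_b, precision) <= budget_gb


if __name__ == "__main__":
    # Total parameter counts as listed on the board's cards.
    for name, params_b in [("Command A+", 218), ("Hunyuan (Hy3)", 295),
                           ("Step 3.7 Flash", 198)]:
        print(name, weight_memory_gb(params_b, "fp8"), "GB at FP8,",
              "fits 8xH100:", fits(params_b, "fp8", "8xH100"))
```

Real footprints also depend on context length, batch size, and the serving stack, so treat the result as a floor rather than a guarantee.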
Six capability grades — reasoning, coding, math, knowledge, agentic, long-context — each scored 0 to 100 and averaged into a composite. My judgment, informed by benchmarks and the Artificial Analysis Intelligence Index, not a lab result.
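For anyone who wants to reproduce the composite, here is a minimal sketch. It assumes the six grades are averaged with equal weight, which is the plainest reading of "averaged"; the axis keys and the sample grades are hypothetical and do not describe any model on the board.

```python
from statistics import mean

# The six radar axes named in the text (key spellings are this sketch's choice).
AXES = ("reasoning", "coding", "math", "knowledge", "agentic", "long_context")


def composite(grades: dict[str, float]) -> float:
    """Unweighted mean of the six capability grades, each on a 0-100 scale."""
    missing = [axis for axis in AXES if axis not in grades]
    if missing:
        raise ValueError(f"missing grades for: {missing}")
    for axis in AXES:
        if not 0 <= grades[axis] <= 100:
            raise ValueError(f"{axis} grade must be between 0 and 100")
    return mean(grades[axis] for axis in AXES)


if __name__ == "__main__":
    # Hypothetical grades, for illustration only.
    example = {"reasoning": 80, "coding": 70, "math": 90,
               "knowledge": 60, "agentic": 50, "long_context": 70}
    print(composite(example))
```

An equal-weight mean keeps the composite easy to audit; if you care about one axis more than the others, compare models on that axis directly instead of on the composite.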
Almost every headline benchmark below is vendor-reported — run by the lab on its own model under maximum thinking effort. Scale's SEAL leaderboard finds vendor scaffolding inflates results by ten to thirty points. The one genuinely comparable spine is the Artificial Analysis Intelligence Index.
Twenty-one labs, scouted
Command A+
Cohere: sovereign enterprise generalist
Cohere's first Apache 2.0 release consolidates five previous Command models into one. It's a genuine enterprise generalist: agentic workflows, multimodal (text, image, tool use), 48 languages, 128K context. Native citation grounding is real and useful (when it pulls from a tool it emits explicit grounding spans), but it's one feature, not the whole story. 218B MoE with 25B active, so two H100s or a single Blackwell runs it.
Nemotron 3 / Llama Nemotron
NVIDIA: Llama, tuned by the people who make the GPUs
NVIDIA distills and tunes Llama into a reasoning/agentic family (Super 49B, Ultra 253B), now with a hybrid Mamba-Transformer MoE line at 1M context and a 30B multimodal Nano Omni. Tuned for throughput on the silicon you're already buying.
Hunyuan (Hy3)
Tencent: a capable Chinese generalist with WeChat-scale backing
Tencent's open 295B MoE (21B active, 256K context), led by a former OpenAI researcher, with WeChat-scale distribution behind it. Capable generalist; numbers still mostly vendor-reported.
Step 3.7 Flash
StepFun: DeepSeek performance at a fraction of the size
A Shanghai lab's tiny-active MoE vision-language model (198B / 11B active) that, on its own benchmarks, outruns much larger models at ~$0.10 per million tokens. Impressive if it holds; verify before quoting.
OLMo 3
Ai2 (Allen Institute): the open model you can actually audit end-to-end
The only truly open model here: Ai2 publishes the full flow (datasets, intermediate checkpoints, RL stages) plus a "Think" lineage with inspectable reasoning traces. America's open-everything option, and the best fully-open 32B.
Granite 4.1
IBM: the compliance-officer's open model
The compliance lane: modest 3B/8B/30B sizes for existing hardware, ISO 42001, US vendor, IBM indemnity. Not the smartest model on the board, but the easiest to get through procurement.
Jamba
AI21 Labs: a long-context specialist with a different engine
A hybrid SSM-Transformer (Mamba) architecture that trades peak smarts for long-context speed: up to ~2.5× faster on long inputs, with the longest context in its size class. The pick when the document, not the reasoning, is the hard part.
Falcon H1R
TII (UAE): a 7B that reasons like a 50B
Abu Dhabi's TII ships efficient hybrids: a 7B that reportedly out-reasons models up to 7× its size, plus open vision (Perception, OCR) and the leading Arabic models. The Gulf's sovereign-AI entrant.
InternVL 3
Shanghai AI Lab: the open answer to GPT-4o vision
If the job is seeing, not just reading, this is the open frontier: SOTA open multimodal benchmarks (72.2 MMMU at 78B) while keeping strong text skills. The default open VLM.
Three things, if you only take three
The frontier of open weights is Chinese. It's not just the top four but a whole bench (Hunyuan, StepFun, ERNIE) behind them, mostly under MIT and Apache licenses you can use commercially. The Western open ecosystem now competes on values and trust (Mistral on sovereignty, Cohere and IBM on compliance, Ai2 on radical transparency) rather than on topping the leaderboard.
There's open-weight, where you download the model and own your deployment, and open-lab, where a lab that also publishes weights keeps its best model behind an API. Qwen and ERNIE are the warning shots. Watch whether the others follow, because the day the frontier goes API-only is the day "open" stops meaning what you think it means.
A single 8-GPU node running MiniMax-M2, DeepSeek V4-Flash, or Cohere Command A+ gets you genuinely close to the closed frontier, on hardware you control, with data that never leaves your building. Depending on what you're protecting, that might be the only ranking that matters.
Capability anchored on Artificial Analysis and LMArena where available; vendor-reported figures are labelled. Parameter counts and licenses from official model cards; hardware tiers are rule-of-thumb estimates. Every model here is a post-Jan-2026 release - confirm the current flagship and its open status on publish day.
Northwood provides the compute, models, software, and services to deploy open weights inside sensitive technical organizations - with permissioning, citations, and audit logs built in. Your knowledge never leaves your environment.