The Big Board·2026 Open-Weight Class
Last updated 24 Jun 2026· Living board

The State of Open-Source Models 2026

The definitive board for finding and evaluating open-weight models - all twenty-one, ranked, graded, and tagged. New models ship monthly; the stamp above is your freshness check.

The Open Models Desk · 16 min read
No. 1 Overall · GLM-5.2 · Z.ai (Zhipu AI) · Composite 91

The open-source AI landscape is changing fast: the best open-weight models now trade blows with the closed frontier. There are more of them than any one team can track, and the most interesting story on the board is who's building them. This is the whole class, ranked and graded: an opinionated read. The specs and benchmarks are sourced; the ranking, the grades, and the verdicts are a point of view, not a measurement. Skim the board, or drill into any model. Use the tags to find one that fits your hardware, your license, and your use case.

Verified 2026-06-24. These ship monthly - treat this as a living board, and see the source notes before betting on a number.

How to read the board

One ranked list, 1 to 21, ordered by significance: a blend of raw capability, how genuinely open the thing is, how deployable it is, and how much the lab moves the field. Capability is anchored on Artificial Analysis and LMArena rather than vendor decks. The ranking, the radar grades, and the verdicts are mine; the benchmarks are sourced.

A lot of these entries aren't a model, they're a family. Qwen alone spans a 0.6B you can run on a phone to a 235B-A22B that needs an 8-GPU node. The practical effect: don't ask "can I run Qwen," ask which Qwen fits your GPU.

The frontier of open weights is Chinese, and it isn't close. Z.ai, Moonshot, DeepSeek, MiniMax, and a deep bench behind them are doing work the Western open labs aren't matching right now, mostly under MIT licenses you can actually use.

Rule of thumb

For self-hosting, it's the total parameter count that sets your memory floor, not the active count: figure roughly one byte per parameter at FP8, half a byte at 4-bit. An 8-GPU H100 node gives you about 640GB; an H200 node, about 1.1TB. Mixture-of-experts keeps the active params small, which buys speed and cost, but you still have to hold the whole model in memory.
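The rule of thumb can be turned into a quick check. The following is a minimal sketch, not a sizing tool: the bytes-per-parameter figures and node capacities come from the paragraph above (BF16 at two bytes per parameter is standard), the function names are made up for illustration, and the ~753B total is GLM-5.2's figure from its card. It counts weights only, so KV cache and activation memory come on top.

```python
# Rule-of-thumb constants from the text; names and structure are illustrative.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}
NODE_MEMORY_GB = {"8xH100": 640, "8xH200": 1100}


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Approximate memory needed just to hold the weights, in GB.

    Billions of parameters times bytes per parameter gives gigabytes directly.
    Uses TOTAL parameters, not active ones: an MoE still has to be fully resident.
    """
    return total_params_billions * BYTES_PER_PARAM[precision]


def fits(total_params_billions: float, precision: str, node: str) -> bool:
    """True if the weights alone fit in the node's aggregate GPU memory."""
    return weight_memory_gb(total_params_billions, precision) <= NODE_MEMORY_GB[node]


if __name__ == "__main__":
    total = 753  # ~753B total parameters (GLM-5.2, per its card)
    for precision in ("bf16", "fp8", "int4"):
        need = weight_memory_gb(total, precision)
        for node in NODE_MEMORY_GB:
            verdict = "fits" if fits(total, precision, node) else "does not fit"
            print(f"{total}B @ {precision}: ~{need:.0f} GB, {verdict} on {node}")
```

Treat a "fits" as a floor, not a green light: leave headroom for context length and batch size before committing to a node.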

On the score

Six capability grades — reasoning, coding, math, knowledge, agentic, long-context — each scored 0 to 100 and averaged into a composite. My judgment, informed by benchmarks and the Artificial Analysis Intelligence Index, not a lab result.
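As a worked example of the composite, here is a small sketch using GLM-5.2's six grades from its card. Two things are assumptions rather than anything the board states: the average is taken as unweighted, and the result is rounded half-up (the grades average to 90.5 and the card shows 91, which is consistent with that rule but does not prove it). The function and variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

AXES = ("reasoning", "coding", "math", "knowledge", "agentic", "long_context")


def composite(grades: dict) -> int:
    """Unweighted mean of the six axis grades, rounded half-up to an integer.

    Half-up rounding is an assumption; Python's built-in round() rounds
    halves to the nearest even number and would turn 90.5 into 90.
    """
    missing = [axis for axis in AXES if axis not in grades]
    if missing:
        raise ValueError(f"missing grades: {missing}")
    total = sum(grades[axis] for axis in AXES)
    mean = Decimal(total) / Decimal(len(AXES))
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Grades as listed on the GLM-5.2 card.
glm_5_2 = {
    "reasoning": 88,
    "coding": 95,
    "math": 92,
    "knowledge": 86,
    "agentic": 90,
    "long_context": 92,
}

if __name__ == "__main__":
    print(composite(glm_5_2))  # 543 / 6 = 90.5, which rounds half-up to 91
```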

On the numbers

Almost every headline benchmark below is vendor-reported — run by the lab on its own model under maximum thinking effort. Scale's SEAL leaderboard finds vendor scaffolding inflates results by ten to thirty points. The one genuinely comparable spine is the Artificial Analysis Intelligence Index.

Twenty-one labs, scouted

Filter by category, hardware, and license; sort the eight graded models (Tiers 1 and 2) by any radar axis; open a card for the full breakdown.
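If you are reading this as plain text rather than on the interactive page, the same filtering is easy to reproduce. The sketch below is hypothetical: the `Card` type and `shortlist` function are invented for illustration, and only a handful of cards are included, with licenses, hardware tiers, tags, and composites copied from the cards on this board. It sorts by composite because per-axis grades are listed here only for GLM-5.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    license: str
    hardware: str
    tags: tuple
    composite: int


# A subset of the board, values copied from the cards.
BOARD = [
    Card("GLM-5.2", "MIT", "One 8-GPU node", ("Generalist", "Coding", "Agentic"), 91),
    Card("Kimi K2.6", "Modified MIT", "Multi-node", ("Agentic", "Coding", "Generalist"), 90),
    Card("DeepSeek V4", "MIT", "Multi-node", ("Reasoning", "Generalist"), 90),
    Card("MiniMax-M2", "MIT", "One 8-GPU node", ("Agentic", "Coding"), 85),
    Card("Qwen3 family", "Apache 2.0", "One 8-GPU node", ("Generalist", "Multimodal", "Small/edge"), 86),
    Card("Gemma 4", "Apache 2.0*", "Single GPU", ("Small/edge", "Multimodal", "Generalist"), 84),
]


def shortlist(board, *, hardware=None, licenses=None, tag=None):
    """Keep cards matching every filter that was given, best composite first."""
    picks = [
        card
        for card in board
        if (hardware is None or card.hardware == hardware)
        and (licenses is None or card.license in licenses)
        and (tag is None or tag in card.tags)
    ]
    return sorted(picks, key=lambda card: card.composite, reverse=True)


if __name__ == "__main__":
    for card in shortlist(BOARD, hardware="One 8-GPU node", licenses={"MIT"}, tag="Agentic"):
        print(card.composite, card.name)
```

With these six cards, asking for an MIT-licensed agentic model that fits one 8-GPU node leaves GLM-5.2 and MiniMax-M2, in that order.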

The editorial board
All 21 models, ranked by significance
Tier 1 · The Frontier · Trading blows with the closed frontier
01
Composite 91
MIT·~753B / 40B active·1M ctx

GLM-5.2

Z.ai (Zhipu AI)
Generalist · Coding · Agentic · One 8-GPU node
In practice
Claude Opus, if Opus shipped its weights

The single best open-weight model you can download today. GLM-5 took the top open-weight slot on the Artificial Analysis Intelligence Index in April, the first open model to reach that tier, and GLM-5.2 pushed the coding numbers higher in June.

Excels at agentic coding and long-horizon tool use. GLM-5 posted 77.8% on SWE-bench Verified, within about three points of Claude Opus 4.6.

Best for agentic coding and tool use you self-host
Grades
Reasoning 88
Coding 95
Math 92
Knowledge 86
Agentic 90
Long ctx 92
Benchmarks
51 · AA Intelligence Index
independent · Artificial Analysis
62.1 (vendor) · SWE-bench Pro (5.2)
vendor-reported · Z.ai
The team

Z.ai is the international brand of Zhipu AI, spun out of Tsinghua. The detail that matters: GLM-5 was reportedly trained end to end on Huawei Ascend silicon. That's not a model release, it's a sovereignty statement: a frontier model built without a single Nvidia GPU.

Deployment

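A ~753B MoE (40B active): at roughly one byte per parameter, FP8 weights come to about 753GB, which overflows an 8xH100 node (about 640GB) but fits an 8xH200 node; 4-bit (about 377GB) fits an 8xH100 node. Full BF16 (about 1.5TB) spans two H200 nodes.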

Strengths
+ Top open-weight on the AA leaderboard (Apr 2026)
+ Best-in-class open coding
+ 1M context
+ MIT license
Watch-outs
- Frontier-class hardware to self-host
- Headline coding/AIME numbers are vendor-reported
The catch. Running it through the Z.ai API carries China data-residency risk — which is exactly why the open weights matter: you don't have to.
02
Composite 90
Modified MIT·1T / 32B active·256K ctx

Kimi K2.6

Moonshot AI
Agentic · Coding · Generalist · Multi-node
In practice
GPT-5.5 on coding, open and ~80% cheaper

If GLM is the best model, Kimi is the one that doesn't quit. Artificial Analysis called K2.6 the new leading open-weight model, and the headline is endurance.

Excels at long-horizon agentic work. An agent-swarm architecture reportedly ran 12 hours and 4,000 coordinated steps without falling over. It's the first open model to beat GPT-5.4 on SWE-bench Pro, and it ties GPT-5.5 there at 58.6%.

Best for long-horizon agent runs that can't fall over halfway
03
Composite 90
MIT·Flash 284B / 13B·1M ctx

DeepSeek V4

DeepSeek (High-Flyer)
Reasoning · Generalist · Multi-node
In practice
Gemini Pro reasoning, at open-weight prices

GLM is the best model on the board. DeepSeek is the most important lab on it.

Excels at frontier reasoning and competition math. V3.2-Speciale reached parity with Gemini 3.0 Pro on hard reasoning and took gold-medal results on the 2025 IMO and IOI.

Best for frontier reasoning and math, with a deployable Flash tier
04
Composite 85
MIT·230B / 10B active·204K ctx

MiniMax-M2

MiniMax
Agentic · Coding · One 8-GPU node
In practice
A self-hostable Claude for tool-calling

The other frontier models make you choose between intelligence and a hardware budget. MiniMax mostly doesn't.

Excels at agentic tool-calling and coding at a serving cost the others can't touch. VentureBeat called M2 the new king of open-source LLMs for agentic work, and it topped the open-weight Intelligence Index at release.

Best for agentic tool-calling on one node, cheaply
Tier 2 · The Field · Strong, with an asterisk
05
Composite 86
Apache 2.0·0.6B → 235B-A22B·up to 262K

Qwen3 family

Alibaba
Generalist · Multimodal · Small/edge · One 8-GPU node
In practice
The Android of open models

Qwen isn't one model, it's a fleet, and that's its advantage and its asterisk.

Excels at breadth. Over 200 languages, every size from a 0.6B that runs on a phone to a 235B-A22B that needs an 8-GPU node, and the deepest tooling ecosystem here. If you want one family to standardize an org on, this is it.

Best for standardizing an org on one family across every size
06
Composite 80
Apache 2.0·675B / 41B active·256K ctx

Mistral Large 3

Mistral AI
Generalist · Multimodal · Enterprise · One 8-GPU node
In practice
A GPT-4-class generalist with an EU passport

Mistral is here for a reason that isn't a benchmark, and it's an honest one: it's the credible Western option, and for a lot of buyers that's the whole decision.

Excels at being the non-Chinese answer. Apache 2.0 by conviction, real multimodal range, and a data-sovereignty story that lands in European boardrooms and regulated US sectors that would rather not run weights from Hangzhou.

Best for the credible non-Chinese pick; jurisdiction and license
07
Composite 84
Apache 2.0*·E2B 2B → 31B dense·256K ctx

Gemma 4

Google DeepMind
Small/edge · Multimodal · Generalist · Single GPU
In practice
Gemini, shrunk to fit one GPU

The best capability you can run on a single GPU, full stop, and Gemma 4 (March 2026) widened that lead.

Excels at punching wildly above its size. The 31B dense model ranks #3 among all open models on Arena AI (Elo 1,452), and the generational jump is the real story: AIME 2026 went from 20.8% on Gemma 3 to 89.2%, and Codeforces Elo from 110 to 2,150, the largest single-generation leap on record for an open model.

Best for best capability you can run on a single GPU; 140+ languages, vision
08
Composite 77
Llama Community License·Maverick 17B active·very long

Llama 4

Meta
Generalist · Multimodal · One 8-GPU node
In practice
GPT-4o-class multimodal, a step off the pace

The lab that opened this whole category, now visibly behind, and Gemma 4 just passed it. It stays on the board for what it still is, not just what it was.

Excels at multimodal in its size class and, still, at ecosystem gravity: the deepest fine-tune and tooling base anywhere (Nvidia's Nemotron family is distilled from it).

Best for multimodal in its size class, plus the deepest tooling base
Tier 3 · The Specialists · Built for a job, not a leaderboard
09

Command A+

Cohere · sovereign enterprise generalist
Enterprise · Agentic · Multimodal · Single GPU

Cohere's first Apache 2.0 release consolidates five previous Command models into one. It's a genuine enterprise generalist: agentic workflows, multimodal (text, image, tool use), 48 languages, 128K context. Native citation grounding is real and useful — when it pulls from a tool it emits explicit grounding spans — but it's one feature, not the whole story. 218B MoE with 25B active, so two H100s or a single Blackwell runs it.

218B / 25B active·128K ctx·37 AA Intelligence Index
10

Nemotron 3 / Llama Nemotron

NVIDIA · Llama, tuned by the people who make the GPUs
Reasoning · Agentic · Enterprise · Multimodal · One 8-GPU node

NVIDIA distills and tunes Llama into a reasoning/agentic family (Super 49B, Ultra 253B), now with a hybrid Mamba-Transformer MoE line at 1M context and a 30B multimodal Nano Omni. Tuned for throughput on the silicon you're already buying.

Super 49B / Ultra 253B·up to 1M·leading open GPQA / AIME / BFCL
11

Hunyuan (Hy3)

Tencent · a capable Chinese generalist with WeChat-scale backing
Generalist · Agentic · One 8-GPU node

Tencent's open 295B MoE (21B active, 256K context), led by a former OpenAI researcher, with WeChat-scale distribution behind it. Capable generalist; numbers still mostly vendor-reported.

295B / 21B active·256K·gains on vendor-reported benchmarks
12

Step 3.7 Flash

StepFun · DeepSeek performance at a fraction of the size
Agentic · Coding · Multimodal · Small/edge · One 8-GPU node

A Shanghai lab's tiny-active MoE vision-language model (198B / 11B active) that, on its own benchmarks, outruns much larger models at ~$0.10 per million tokens. Impressive if it holds; verify before quoting.

198B / 11B active·262K·74.4% SWE-bench Verified (3.5)
13

OLMo 3

Ai2 (Allen Institute) · the open model you can actually audit end-to-end
Fully-open · Reasoning · Single GPU

The only truly open model here: Ai2 publishes the full flow — datasets, intermediate checkpoints, RL stages — plus a "Think" lineage with inspectable reasoning traces. America's open-everything option, and the best fully-open 32B.

7B / 13B / 32B·long·beats Qwen3 32B on AIME 2025 (32B Think)
14

Granite 4.1

IBM · the compliance-officer's open model
Enterprise · Small/edge · Single GPU

The compliance lane: modest 3B/8B/30B sizes for existing hardware, ISO 42001, US vendor, IBM indemnity. Not the smartest model on the board — the easiest to get through procurement.

3B / 8B / 30B·long·8B matches or beats the 4.0 32B MoE
15

Jamba

AI21 Labs · a long-context specialist with a different engine
Long-context · Enterprise · One 8-GPU node

A hybrid SSM-Transformer (Mamba) architecture that trades peak smarts for long-context speed — up to ~2.5× faster on long inputs, the longest context in its size class. The pick when the document, not the reasoning, is the hard part.

Mini + Large (SSM-Transformer)·256K effective·65.4 Arena Hard (Large)
16

Falcon H1R

TII (UAE) · a 7B that reasons like a 50B
Small/edge · Reasoning · Multimodal · Single GPU

Abu Dhabi's TII ships efficient hybrids: a 7B that reportedly out-reasons models up to 7× its size, plus open vision (Perception, OCR) and the leading Arabic models. The Gulf's sovereign-AI entrant.

H1R 7B (hybrid)·long·beats Qwen3 32B vs larger models
17

InternVL 3

Shanghai AI Lab · the open answer to GPT-4o vision
Multimodal · One 8-GPU node

If the job is seeing, not just reading, this is the open frontier: SOTA open multimodal benchmarks (72.2 MMMU at 78B) while keeping strong text skills. The default open VLM.

up to 78B·extended multimodal·72.2 MMMU (78B)
On the board · The rest of the class
# | Model | Why it's here | License
18 | Phi-4 (Microsoft) | Data-quality-over-scale; beats GPT-4o on MATH/GPQA at 3.8–14B. For small-model reasoning from curated data. | MIT
19 | ERNIE 4.5 (open) / 5.1 (Baidu) | 10 models open-sourced (up to 424B/47B active); the strong 5.x flagships are closed. For a capable open Chinese MoE. | Apache 2.0
20 | Yi (01.AI) | Yi 34B was the first Chinese model to top Hugging Face's leaderboard. The family (Lightning, Large, VL, Coder) is solid and easy to run, though no longer at the frontier. For a proven, lightweight open family. | open (Yi)
21 | SmolLM3 (Hugging Face) | A 3B with dual-mode reasoning, 128K context, the full recipe published. The reference small open model. For the strongest genuinely-tiny, fully-open reasoner; edge and on-device. | Apache 2.0

Three things, if you only take three

01
The frontier of open weights is Chinese, and it's deep.

Not just the top four, but a whole bench behind them (Hunyuan, StepFun, ERNIE), mostly under MIT and Apache licenses you can use commercially. The Western open ecosystem now competes on values and trust (Mistral on sovereignty, Cohere and IBM on compliance, Ai2 on radical transparency) rather than on topping the leaderboard.

02
"Open" is splitting in two.

There's open-weight, where you download the model and own your deployment, and open-lab, where a lab that also publishes weights keeps its best model behind an API. Qwen and ERNIE are the warning shots. Watch whether the others follow, because the day the frontier goes API-only is the day "open" stops meaning what you think it means.

03
And the practical one: you no longer need a frontier closed model to do frontier work.

A single 8-GPU node running MiniMax-M2, DeepSeek V4-Flash, or Cohere Command A+ gets you genuinely close, on hardware you control, with data that never leaves your building. Depending on what you're protecting, that might be the only ranking that matters.

Sources & method

Capability anchored on Artificial Analysis and LMArena where available; vendor-reported figures are labelled. Parameter counts and licenses from official model cards; hardware tiers are rule-of-thumb estimates. Every model here is a post-Jan-2026 release - confirm the current flagship and its open status on publish day.

Artificial Analysis - Intelligence Index & open-weights leaderboard; Kimi K2.6 write-up - artificialanalysis.ai
DeepSeek-V3.2 technical report - arXiv 2512.02556
Kimi K2 - github.com/moonshotai/Kimi-K2
MiniMax-M2 - github.com/MiniMax-AI/MiniMax-M2
VentureBeat - "new king of open source LLMs"
Mistral 3 / Large 3 - mistral.ai/news/mistral-3
Llama 4 - ai.meta.com; Behemoth status - interconnects.ai
Gemma 4 - blog.google + ai.google.dev model card
Cohere Command A+
NVIDIA Nemotron - developer.nvidia.com
Ai2 OLMo 3 - allenai.org/blog/olmo3
IBM Granite 4.1
AI21 Jamba
TII Falcon H1R
InternVL3 - arXiv 2504.10479
Tencent Hunyuan Hy3
StepFun Step 3.x
Baidu ERNIE 4.5/5.1
01.AI Yi - arXiv 2403.04652
HF SmolLM3
Microsoft Phi-4
Ranking, radar grades, and verdicts are the author's; benchmarks are sourced. Several figures are vendor-reported - labelled in each card.