The Big Board·2026 Open-Weight Class

Last updated 24 Jun 2026· Living board


The State of Open-Source Models 2026

The definitive board for finding and evaluating open-weight models - all twenty-one, ranked, graded, and tagged. New models ship monthly; the stamp above is your freshness check.

The Open Models Desk·16 min read

No. 1 Overall

GLM-5.2

Z.ai (Zhipu AI)

The open-source AI landscape is changing fast: the best open-weight models now trade blows with the closed frontier. There are more of them than any one team can track, and the most interesting story on the board is who's building them. This is the whole class, ranked and graded: an opinionated read. The specs and benchmarks are sourced; the ranking, the grades, and the verdicts are a point of view, not a measurement. Skim the board, or drill into any model. Use the tags to find one that fits your hardware, your license, and your use case.

Verified 2026-06-24. These ship monthly - treat this as a living board, and see the source notes before betting on a number.

How to read the board

One ranked list, 1 to 21, ordered by significance: a blend of raw capability, how genuinely open the thing is, how deployable it is, and how much the lab moves the field. Capability is anchored on Artificial Analysis and LMArena rather than vendor decks. The ranking, the radar grades, and the verdicts are mine; the benchmarks are sourced.

A lot of these entries aren't a model, they're a family. Qwen alone spans a 0.6B you can run on a phone to a 235B-A22B that needs an 8-GPU node. The practical effect: don't ask "can I run Qwen," ask which Qwen fits your GPU.

The frontier of open weights is Chinese, and it isn't close. Z.ai, Moonshot, DeepSeek, MiniMax, and a deep bench behind them are doing work the Western open labs aren't matching right now, mostly under MIT licenses you can actually use.

Rule of thumb

For self-hosting, it's the total parameter count that sets your memory floor, not the active count: figure roughly one byte per parameter at FP8, half a byte at 4-bit. An 8-GPU H100 node gives you about 640GB; an H200 node, about 1.1TB. Mixture-of-experts keeps the active params small, which buys speed and cost, but you still have to hold the whole model in memory.
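The rule of thumb above can be sketched in a few lines of Python. This is a rough estimate, not a sizing tool: it counts weights only (no KV cache, activations, or runtime overhead), the function and table names are made up for illustration, the node capacities are the approximate figures quoted above, and the two-bytes-per-parameter figure for BF16 is the standard size of that format rather than something stated in the rule.

```python
# Rule-of-thumb memory floor for self-hosting: weights only.
# Real deployments also need headroom for KV cache and activations.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

# Approximate aggregate GPU memory per 8-GPU node, in GB (figures quoted above).
NODE_GB = {"8xH100": 640, "8xH200": 1100}


def weight_gb(total_params_b: float, precision: str) -> float:
    """Approximate weight footprint in GB for total_params_b billion parameters."""
    return total_params_b * BYTES_PER_PARAM[precision]


def fits(total_params_b: float, precision: str, node: str) -> bool:
    """True if the weights alone fit in the node's aggregate GPU memory."""
    return weight_gb(total_params_b, precision) <= NODE_GB[node]


if __name__ == "__main__":
    # Example: a ~753B-total MoE. Total parameters set the floor, not active ones.
    for precision in BYTES_PER_PARAM:
        verdict = {node: fits(753, precision, node) for node in NODE_GB}
        print(f"{precision}: ~{weight_gb(753, precision):.0f} GB, fits: {verdict}")
```

Note that the active parameter count never appears in the calculation. That is the point of the rule: mixture-of-experts changes speed and cost per token, not the memory you have to provision.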

On the score

Six capability grades — reasoning, coding, math, knowledge, agentic, long-context — each scored 0 to 100 and averaged into a composite. My judgment, informed by benchmarks and the Artificial Analysis Intelligence Index, not a lab result.
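As a worked example, here is the averaging applied to the six grades shown on the GLM-5.2 card. The unweighted mean comes from the description above; rounding half up to a whole number is an assumption, chosen because it reproduces the composite of 91 printed on that card. The function name is illustrative only.

```python
import math

# Grades as listed on the GLM-5.2 card.
GLM_5_2_GRADES = {
    "reasoning": 88,
    "coding": 95,
    "math": 92,
    "knowledge": 86,
    "agentic": 90,
    "long_context": 92,
}


def composite(grades: dict[str, int]) -> int:
    """Unweighted mean of the six grades, rounded half up (assumed rounding rule)."""
    mean = sum(grades.values()) / len(grades)
    # Python's built-in round() rounds halves to the nearest even number,
    # so an explicit half-up rule is used instead.
    return math.floor(mean + 0.5)


if __name__ == "__main__":
    print(composite(GLM_5_2_GRADES))
```

The six grades sum to 543, so the mean is exactly 90.5. The rounding rule therefore decides whether the card reads 90 or 91, which is a useful reminder of how little a one-point composite gap means.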

On the numbers

Almost every headline benchmark below is vendor-reported — run by the lab on its own model under maximum thinking effort. Scale's SEAL leaderboard finds vendor scaffolding inflates results by ten to thirty points. The one genuinely comparable spine is the Artificial Analysis Intelligence Index.
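One hedged way to read a vendor-reported number is to subtract the ten-to-thirty-point inflation range quoted above and treat the result as a band rather than a point. This is a back-of-the-envelope sketch, not a calibration method: the helper name is hypothetical, and the inflation range will not apply uniformly to every benchmark or model.

```python
def discounted_band(vendor_score: float, low: float = 10.0, high: float = 30.0) -> tuple[float, float]:
    """Return (pessimistic, optimistic) scores after removing assumed vendor inflation.

    low/high are the inflation bounds in points; scores are floored at zero.
    """
    pessimistic = max(vendor_score - high, 0.0)
    optimistic = max(vendor_score - low, 0.0)
    return (round(pessimistic, 1), round(optimistic, 1))


if __name__ == "__main__":
    # Example: a vendor-reported 62.1 on some benchmark.
    print(discounted_band(62.1))
```

Under these assumptions, a vendor-reported 62.1 could correspond to anything from roughly 32 to 52 under independent scaffolding. A band that wide is why independently run indices make a better basis for comparison.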

Twenty-one labs, scouted

Filter by category, hardware, and license; sort the eight graded frontier models by any radar axis; open a card for the full breakdown.
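For readers working from the text rather than the interactive board, the same filter-and-sort can be sketched in Python. The entries below are transcribed from the cards in this article; the Entry type, the field names, and the shortlist helper are illustrative assumptions, and the DeepSeek row pairs the V4 card's composite with the Flash tier's parameter count, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    license: str
    total_params_b: float  # total (not active) parameters, in billions
    composite: int


# Transcribed from the cards in this article.
BOARD = [
    Entry("GLM-5.2", "MIT", 753, 91),
    Entry("Kimi K2.6", "Modified MIT", 1000, 90),
    Entry("DeepSeek V4 (Flash)", "MIT", 284, 90),
    Entry("MiniMax-M2", "MIT", 230, 85),
    Entry("Mistral Large 3", "Apache 2.0", 675, 80),
]


def shortlist(board, licenses, max_weight_gb, bytes_per_param=1.0):
    """Keep entries with an allowed license whose weights fit, best composite first."""
    keep = [
        e for e in board
        if e.license in licenses
        and e.total_params_b * bytes_per_param <= max_weight_gb
    ]
    return sorted(keep, key=lambda e: e.composite, reverse=True)


if __name__ == "__main__":
    # MIT-licensed models whose FP8 weights fit an ~640GB 8xH100 node.
    for entry in shortlist(BOARD, {"MIT"}, 640):
        print(entry.name, entry.composite)
```

Passing bytes_per_param=0.5 widens the shortlist to models that fit only at 4-bit, trading some quality for a smaller hardware floor.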

The editorial board

All 21 models, ranked by significance

Tier 1·The Frontier·Trading blows with the closed frontier

Composite 91

MIT·~753B / 40B active·1M ctx

GLM-5.2

Z.ai (Zhipu AI)

In practice

Claude Opus, if Opus shipped its weights

The single best open-weight model you can download today. GLM-5 took the top open-weight slot on the Artificial Analysis Intelligence Index in April, the first open model to reach that tier, and GLM-5.2 pushed the coding numbers higher in June.

Excels at agentic coding and long-horizon tool use. GLM-5 posted 77.8% on SWE-bench Verified, within about three points of Claude Opus 4.6.

Best for — agentic coding and tool use you self-host

Grades

Reasoning 88

Coding 95

Math 92

Knowledge 86

Agentic 90

Long ctx 92

Benchmarks

51 · AA Intelligence Index

independent · Artificial Analysis

62.1 (vendor) · SWE-bench Pro (5.2)

vendor-reported · Z.ai

The team

Z.ai is the international brand of Zhipu AI, spun out of Tsinghua. The detail that matters: GLM-5 was reportedly trained end to end on Huawei Ascend silicon. That's not a model release, it's a sovereignty statement: a frontier model built without a single Nvidia GPU.

Deployment

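A ~753B MoE (40B active): by the rule of thumb above, plan for an 8xH200 node at FP8 (~753GB of weights, more than an 8xH100 node's ~640GB) or an 8xH100 node at 4-bit (~377GB); full BF16 (~1.5TB) spans two nodes.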

Strengths

+ Top open-weight on the AA leaderboard (Apr 2026)

+ Best-in-class open coding

+ 1M context

+ MIT license

Watch-outs

- Frontier-class hardware to self-host

- Headline coding/AIME numbers are vendor-reported

The catch. Running it through the Z.ai API carries China data-residency risk — which is exactly why the open weights matter: you don't have to.

Composite 90

Modified MIT·1T / 32B active·256K ctx

Kimi K2.6

Moonshot AI

In practice

GPT-5.5 on coding, open and ~80% cheaper

If GLM is the best model, Kimi is the one that doesn't quit. Artificial Analysis called K2.6 the new leading open-weight model, and the headline is endurance.

Excels at long-horizon agentic work. An agent-swarm architecture reportedly ran 12 hours and 4,000 coordinated steps without falling over. It's the first open model to beat GPT-5.4 on SWE-bench Pro, and it ties GPT-5.5 there at 58.6%.

Best for — long-horizon agent runs that can't fall over halfway

Composite 90

MIT·Flash 284B / 13B·1M ctx

DeepSeek V4

DeepSeek (High-Flyer)

In practice

Gemini Pro reasoning, at open-weight prices

GLM is the best model on the board. DeepSeek is the most important lab on it.

Excels at frontier reasoning and competition math. V3.2-Speciale reached parity with Gemini 3.0 Pro on hard reasoning and took gold-medal results on the 2025 IMO and IOI.

Best for — frontier reasoning and math, with a deployable Flash tier

Composite 85

MIT·230B / 10B active·204K ctx

MiniMax-M2

MiniMax

In practice

A self-hostable Claude for tool-calling

The other frontier models make you choose between intelligence and a hardware budget. MiniMax mostly doesn't.

Excels at agentic tool-calling and coding at a serving cost the others can't touch. VentureBeat called M2 the new king of open-source LLMs for agentic work, and it topped the open-weight Intelligence Index at release.

Best for — agentic tool-calling on one node, cheaply

Tier 2·The Field·Strong, with an asterisk

Composite 86

Apache 2.0·0.6B → 235B-A22B·up to 262K

Qwen3 family

Alibaba

In practice

The Android of open models

Qwen isn't one model, it's a fleet, and that's its advantage and its asterisk.

Excels at breadth. Over 200 languages, every size from a 0.6B that runs on a phone to a 235B-A22B that needs an 8-GPU node, and the deepest tooling ecosystem here. If you want one family to standardize an org on, this is it.

Best for — standardizing an org on one family across every size

Composite 80

Apache 2.0·675B / 41B active·256K ctx

Mistral Large 3

Mistral AI

In practice

A GPT-4-class generalist with an EU passport

Mistral is here for a reason that isn't a benchmark, and it's an honest one: it's the credible Western option, and for a lot of buyers that's the whole decision.

Excels at being the non-Chinese answer. Apache 2.0 by conviction, real multimodal range, and a data-sovereignty story that lands in European boardrooms and regulated US sectors that would rather not run weights from Hangzhou.

Best for — the credible non-Chinese pick; jurisdiction and license

Composite 84

Apache 2.0*·E2B 2B → 31B dense·256K ctx

Gemma 4

Google DeepMind

In practice

Gemini, shrunk to fit one GPU

The best capability you can run on a single GPU, full stop, and Gemma 4 (March 2026) widened that lead.

Excels at punching wildly above its size. The 31B dense model ranks #3 among all open models on Arena AI (Elo 1,452), and the generational jump is the real story: AIME 2026 went from 20.8% on Gemma 3 to 89.2%, and Codeforces Elo from 110 to 2,150, the largest single-generation leap on record for an open model.

Best for — best capability you can run on a single GPU; 140+ languages, vision

Composite 77

Llama Community License·Maverick 17B active·very long

Llama 4

Three things, if you only take three

The frontier of open weights is Chinese, and it's deep.

Not just the top four, but a whole bench behind them (Hunyuan, StepFun, ERNIE), mostly under MIT and Apache licenses you can use commercially. The Western open ecosystem now competes on values and trust (Mistral on sovereignty, Cohere and IBM on compliance, Ai2 on radical transparency) rather than on topping the leaderboard.

"Open" is splitting in two.

There's open-weight, where you download the model and own your deployment, and open-lab, where a lab that also publishes weights keeps its best model behind an API. Qwen and ERNIE are the warning shots. Watch whether the others follow, because the day the frontier goes API-only is the day "open" stops meaning what you think it means.

And the practical one: you no longer need a frontier closed model to do frontier work.

A single 8-GPU node running MiniMax-M2, DeepSeek V4-Flash, or Cohere Command A+ gets you genuinely close, on hardware you control, with data that never leaves your building. Depending on what you're protecting, that might be the only ranking that matters.

Sources & method

Capability anchored on Artificial Analysis and LMArena where available; vendor-reported figures are labelled. Parameter counts and licenses from official model cards; hardware tiers are rule-of-thumb estimates. Every model here is a post-Jan-2026 release - confirm the current flagship and its open status on publish day.

Artificial Analysis: Intelligence Index & open-weights leaderboard; Kimi K2.6 write-up (artificialanalysis.ai)

DeepSeek-V3.2 technical report (arXiv 2512.02556); Kimi K2 (github.com/moonshotai/Kimi-K2)

MiniMax-M2 (github.com/MiniMax-AI/MiniMax-M2); VentureBeat, "new king of open source LLMs"

Mistral 3 / Large 3 (mistral.ai/news/mistral-3); Llama 4 (ai.meta.com); Behemoth status (interconnects.ai)

Gemma 4 (blog.google + ai.google.dev model card); Cohere Command A+; NVIDIA Nemotron (developer.nvidia.com)

Ai2 OLMo 3 (allenai.org/blog/olmo3); IBM Granite 4.1; AI21 Jamba; TII Falcon H1R; InternVL3 (arXiv 2504.10479)

Tencent Hunyuan Hy3; StepFun Step 3.x; Baidu ERNIE 4.5/5.1; 01.AI Yi (arXiv 2403.04652); HF SmolLM3; Microsoft Phi-4

Ranking, radar grades, and verdicts are the author's; benchmarks are sourced. Several figures are vendor-reported - labelled in each card.

