Groq Benchmark Results

Groq is an inference provider known for high-throughput, low-latency serving on custom LPU hardware. Benchscope records public evaluation runs across five model families hosted on Groq, covering GAIA_TEXT, MUSR, GSM8K, and MMLU.

About Groq Endpoints

Groq's LPU architecture prioritizes inference speed. Benchmark scores from Groq endpoints reflect Groq's specific serving configuration and hardware, not model capability in isolation. Scores for a model family on Groq can differ from scores for the same model hosted elsewhere because of quantization, infrastructure, or serving optimizations. For the cleanest cross-provider comparisons, use canonical-prompt runs.

Hosted Model Families

Model families with public evaluation runs on Groq: Llama 3.3 70B, Qwen3 32B, Llama 3.1 8B, GPT-OSS 20B, GPT-OSS 120B.

Recent Groq Runs

Groq / Llama 3.3 70B on GAIA_TEXT: completed; 20.0%; 1015 ms p50 latency; 10 samples.
Groq / Qwen3 32B on GAIA_TEXT: partial; 33.3%; 8102 ms p50 latency; 10 samples.
Groq / Qwen3 32B on MUSR: completed; 30.0%; 1672 ms p50 latency; 20 samples.
Groq / Llama 3.1 8B on GSM8K: completed; 74.1%; 422 ms p50 latency; 27 samples.
Groq / Qwen3 32B on GSM8K: completed; 89.0%; 4318 ms p50 latency; 100 samples.
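
The run lines above follow a regular pattern, so they can be loaded into structured records for filtering or comparison. The following is a minimal sketch, assuming every line keeps the exact format shown here. The `Run` type, its field names, and the `parse_run` helper are hypothetical and are not part of any Benchscope API.

```python
import re
from dataclasses import dataclass


# Hypothetical record type; field names are illustrative, not a Benchscope schema.
@dataclass
class Run:
    provider: str
    model: str
    benchmark: str
    status: str
    score_pct: float
    p50_latency_ms: int
    samples: int


# Assumed line format, inferred only from the five runs listed above.
RUN_PATTERN = re.compile(
    r"^(?P<provider>[^/]+) / (?P<model>.+) on (?P<benchmark>\S+): "
    r"(?P<status>\w+); (?P<score>[\d.]+)%; "
    r"(?P<latency>\d+) ms p50 latency; (?P<samples>\d+) samples\.$"
)


def parse_run(line: str) -> Run:
    """Parse one run line into a Run record, or raise ValueError."""
    match = RUN_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized run line: {line!r}")
    return Run(
        provider=match["provider"].strip(),
        model=match["model"],
        benchmark=match["benchmark"],
        status=match["status"],
        score_pct=float(match["score"]),
        p50_latency_ms=int(match["latency"]),
        samples=int(match["samples"]),
    )


RUN_LINES = [
    "Groq / Llama 3.3 70B on GAIA_TEXT: completed; 20.0%; 1015 ms p50 latency; 10 samples.",
    "Groq / Qwen3 32B on GAIA_TEXT: partial; 33.3%; 8102 ms p50 latency; 10 samples.",
    "Groq / Qwen3 32B on MUSR: completed; 30.0%; 1672 ms p50 latency; 20 samples.",
    "Groq / Llama 3.1 8B on GSM8K: completed; 74.1%; 422 ms p50 latency; 27 samples.",
    "Groq / Qwen3 32B on GSM8K: completed; 89.0%; 4318 ms p50 latency; 100 samples.",
]

runs = [parse_run(line) for line in RUN_LINES]

# Keep only completed runs, then pick the one with the lowest p50 latency.
completed = [r for r in runs if r.status == "completed"]
fastest = min(completed, key=lambda r: r.p50_latency_ms)
print(f"{fastest.model} on {fastest.benchmark}: {fastest.p50_latency_ms} ms")
# prints "Llama 3.1 8B on GSM8K: 422 ms"
```

Keeping the sample count and status next to each score makes it easier to set aside partial or very small runs before comparing models.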
