Groq Benchmark Results

Groq is an inference provider known for high-throughput, low-latency serving on custom LPU hardware. Benchscope records public evaluation runs across 5 model families hosted on Groq, covering the GAIA_TEXT, MUSR, GSM8K, and MMLU benchmarks.

About Groq Endpoints

Groq's LPU architecture prioritizes inference speed. Benchmark scores from Groq endpoints reflect Groq's specific serving configuration and hardware, not model capability in isolation. The same model family can score differently on Groq than on another provider because of quantization, infrastructure, or serving optimizations. For the cleanest cross-provider comparisons, use canonical-prompt runs.
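The comparison advice above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical run record with `provider`, `model_family`, `benchmark`, `score`, and `canonical_prompt` fields; Benchscope's actual data format may differ, and the function name `cross_provider_delta` is likewise hypothetical. The sketch keeps only canonical-prompt runs for one model family and benchmark, then reports how far one provider's mean score sits from another's.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Run:
    # Hypothetical record shape for illustration only; not Benchscope's schema.
    provider: str
    model_family: str
    benchmark: str
    score: float
    canonical_prompt: bool


def cross_provider_delta(
    runs: Iterable[Run],
    model_family: str,
    benchmark: str,
    provider_a: str,
    provider_b: str,
) -> Optional[float]:
    """Return mean(provider_a) - mean(provider_b) over canonical-prompt runs.

    Only runs for the given model family and benchmark are considered.
    Returns None if either provider has no matching canonical-prompt runs.
    """
    # Materialize once so generators can be passed in safely.
    matching: List[Run] = [
        r
        for r in runs
        if r.canonical_prompt
        and r.model_family == model_family
        and r.benchmark == benchmark
    ]

    def mean_for(provider: str) -> Optional[float]:
        scores = [r.score for r in matching if r.provider == provider]
        return sum(scores) / len(scores) if scores else None

    a, b = mean_for(provider_a), mean_for(provider_b)
    if a is None or b is None:
        return None
    return a - b
```

Restricting to canonical-prompt runs means a nonzero delta is less likely to come from prompt differences, although serving factors such as quantization can still contribute to it.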

Hosted Model Families

Model families with public evaluation runs on Groq: Llama 3.3 70B, Qwen3 32B, Llama 3.1 8B, GPT-OSS 20B, GPT-OSS 120B.
