LocalScore - Open Local LLM Benchmark

Measure how well your hardware runs local AI models. Open-source benchmark tests prompt speed, generation speed, and time-to-first-token for LLMs.

TL;DR

LocalScore is an open-source benchmarking tool that measures how fast large language models run on your specific hardware, helping you decide whether your computer can handle local AI tasks.

What Is LocalScore?

LocalScore is an open-source benchmarking tool designed to measure how fast Large Language Models (LLMs) run on your specific hardware. It’s a Mozilla Builders Project that provides both a CLI tool for running benchmarks and a public database for comparing results across different hardware configurations.

The tool addresses a common question in the local AI community: “Can my computer run this model, and how well will it perform?”

How LocalScore Works

LocalScore measures three key performance metrics:

  • Prompt Processing Speed: How quickly your system processes input text (tokens per second)
  • Generation Speed: How fast your system generates new text (tokens per second)
  • Time to First Token: The latency before the first response appears (milliseconds)
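
As a rough illustration of what these three numbers mean, the sketch below derives them from raw timings of a single run. The `RunTimings` record and its field names are hypothetical, chosen for this example; they are not LocalScore's internal data structures.

```python
from dataclasses import dataclass


@dataclass
class RunTimings:
    """Hypothetical timing record for one benchmark run (illustrative only)."""
    prompt_tokens: int          # tokens in the input prompt
    generated_tokens: int       # tokens produced by the model
    prompt_seconds: float       # wall-clock time spent processing the prompt
    generation_seconds: float   # wall-clock time spent generating output
    first_token_seconds: float  # delay from request start to first output token


def prompt_speed(t: RunTimings) -> float:
    """Prompt processing speed in tokens per second."""
    return t.prompt_tokens / t.prompt_seconds


def generation_speed(t: RunTimings) -> float:
    """Generation speed in tokens per second."""
    return t.generated_tokens / t.generation_seconds


def time_to_first_token_ms(t: RunTimings) -> float:
    """Time to first token, converted from seconds to milliseconds."""
    return t.first_token_seconds * 1000.0


# Example with made-up timings, not measured results.
run = RunTimings(prompt_tokens=1024, generated_tokens=16,
                 prompt_seconds=2.0, generation_seconds=0.5,
                 first_token_seconds=2.25)
print(prompt_speed(run), generation_speed(run), time_to_first_token_ms(run))
```

The two speeds are throughput figures (higher is better), while time to first token is a latency (lower is better).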

These metrics are combined into a single LocalScore value, where higher is better. A few reference points:

  • 1,000: Excellent performance
  • 250: Passable performance
  • Under 100: Likely a poor user experience in at least one respect
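
The reference points above can be turned into a small lookup helper. This is a minimal sketch: the three anchors (1,000, 250, and under 100) come from the scale above, but treating them as hard band boundaries, and the label for the 100 to 249 range, are assumptions made for this example.

```python
def describe_localscore(score: float) -> str:
    """Map a LocalScore value to a rough label.

    Only the 1,000 / 250 / under-100 reference points are documented;
    the in-between band and its label are an assumption of this sketch.
    """
    if score < 0:
        raise ValueError("a LocalScore value cannot be negative")
    if score >= 1000:
        return "excellent"
    if score >= 250:
        return "passable"
    if score < 100:
        return "poor"
    return "below passable"  # 100-249: assumed label, not an official tier


for value in (1200, 250, 187, 9):
    print(value, describe_localscore(value))
```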

Under the hood, LocalScore uses Llamafile to ensure portability across different systems and hardware accelerators.

The Test Suite

LocalScore runs three test scenarios that emulate common LLM tasks:

| Prompt Tokens | Generation Tokens | Use Cases |
|---------------|-------------------|-----------|
| 1024 | 16 | Classification, sentiment analysis, keyword extraction |
| 4096 | 256 | Long document Q&A, RAG, short summaries |
| 2048 | 256 | Article summarization, medium-length generation |
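
To get a feel for what these scenarios cost in wall-clock time, the sketch below estimates the duration of each one from a prompt speed and a generation speed. It assumes both speeds stay constant across prompt lengths and ignores model load time, which real runs will not match exactly; the function name is illustrative.

```python
# (prompt tokens, generation tokens) for the three scenarios in the table.
SCENARIOS = [(1024, 16), (4096, 256), (2048, 256)]


def estimate_seconds(prompt_tokens: int, generation_tokens: int,
                     prompt_tps: float, generation_tps: float) -> float:
    """Rough time for one scenario: prompt phase plus generation phase.

    Assumes constant throughput in each phase, which is a simplification.
    """
    return prompt_tokens / prompt_tps + generation_tokens / generation_tps


# Example speeds in tokens per second; substitute your own measurements.
prompt_tps, generation_tps = 374.0, 64.7
for prompt_tokens, generation_tokens in SCENARIOS:
    seconds = estimate_seconds(prompt_tokens, generation_tokens,
                               prompt_tps, generation_tps)
    print(f"{prompt_tokens:>5} in / {generation_tokens:>3} out: ~{seconds:.1f} s")
```

The estimate makes the trade-off visible: the 4096-token scenario is dominated by prompt processing, while shorter prompts with 256 generated tokens lean more on generation speed.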

Official test models are available in three sizes:

  • Tiny (1B): Requires approximately 2GB memory
  • Small (8B): Requires approximately 6GB memory
  • Medium (14B): Requires approximately 10GB memory
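
A quick way to apply these figures is to filter the model sizes by the memory you have available. The sketch below uses the approximate requirements listed above; it does not reserve headroom for the operating system or other applications, so treat a borderline fit with caution.

```python
# Approximate memory requirements in GB, taken from the list above.
MODEL_MEMORY_GB = {
    "Tiny (1B)": 2,
    "Small (8B)": 6,
    "Medium (14B)": 10,
}


def models_that_fit(available_gb: float) -> list:
    """Return the model sizes whose approximate requirement fits in memory."""
    return [name for name, needed in MODEL_MEMORY_GB.items()
            if needed <= available_gb]


print(models_that_fit(8))     # a machine with 8 GB free
print(models_that_fit(15.5))  # a machine with 15.5 GB free
```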

Running LocalScore

Step 1: Download the CLI

Visit the download page and select your operating system:

macOS/Linux:

curl -OL https://localscore.ai/download/localscore-tiny
chmod +x localscore-tiny

Windows:

Download localscore-0.9.3.exe from the website.

Step 2: Download a Model

You can use the official models or your own .gguf files. The CLI will guide you through downloading test models if needed.

Step 3: Run the Benchmark

macOS/Linux:

./localscore-tiny

Windows:

localscore-0.9.3.exe -m Llama-3.2-1B-Instruct-Q4_K_M.gguf

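To benchmark your own .gguf model, pass its path with -m to the standalone CLI binary (shown here as localscore-0.9.3, the non-Windows counterpart of the .exe above):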

./localscore-0.9.3 -m path/to/model.gguf
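
If you want to script these runs, a thin Python wrapper is one option. The sketch below only uses the `-m` flag shown above; the function names are hypothetical, and it makes no assumptions about any other command-line options. It checks that the binary is executable before launching it, since a missing `chmod +x` is an easy mistake.

```python
import os
import subprocess
from typing import List, Optional


def build_command(binary: str, model_path: Optional[str] = None) -> List[str]:
    """Build the argument list; -m is added only when a model path is given."""
    command = [binary]
    if model_path is not None:
        command += ["-m", model_path]
    return command


def run_benchmark(binary: str, model_path: Optional[str] = None) -> int:
    """Run the benchmark and return its exit code."""
    if not (os.path.isfile(binary) and os.access(binary, os.X_OK)):
        raise FileNotFoundError(
            f"{binary} is missing or not executable (try chmod +x)")
    if model_path is not None and not os.path.isfile(model_path):
        raise FileNotFoundError(f"model file not found: {model_path}")
    return subprocess.run(build_command(binary, model_path)).returncode


# Shows the command that would be run, without running it.
print(build_command("./localscore-0.9.3", "path/to/model.gguf"))
```

Calling `run_benchmark("./localscore-tiny")` would launch the bundled benchmark, while passing a second argument benchmarks your own model file.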

Step 4: Submit Your Results

After running the benchmark, you can submit your results to the public database to help others compare hardware performance.

Real-World Results

As of June 16, 2026, recent submissions to the public database include:

  • AMD Ryzen Threadripper PRO 3945WX with 123.4GB RAM: Llama 3.2 1B at 64.7 tokens/s generation, 374 tokens/s prompt processing, LocalScore 187
  • AMD Ryzen 5 5600 with 31.9GB RAM: Llama 3.2 1B at 42.1 tokens/s generation, 260 tokens/s prompt processing, LocalScore 124
  • AMD Ryzen 5 3600 with 15.5GB RAM: Qwen2.5 14B at 4.2 tokens/s generation, LocalScore 9

The database includes results from Apple Silicon (M3, M4 Pro, M3 Ultra), AMD Ryzen processors, Intel CPUs, and various GPU configurations.
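
When comparing entries like the three listed above, it helps to compare only results for the same model, since a 14B result says little about how the same machine handles a 1B model. The sketch below encodes those three results and computes a generation-speed ratio; the `Result` type and helper names are illustrative, not part of LocalScore.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    hardware: str
    model: str
    generation_tps: float
    prompt_tps: Optional[float]  # None where the article gives no figure
    localscore: int


# The three results quoted in this section.
RESULTS = [
    Result("AMD Ryzen Threadripper PRO 3945WX", "Llama 3.2 1B", 64.7, 374.0, 187),
    Result("AMD Ryzen 5 5600", "Llama 3.2 1B", 42.1, 260.0, 124),
    Result("AMD Ryzen 5 3600", "Qwen2.5 14B", 4.2, None, 9),
]


def same_model(results: List[Result], model: str) -> List[Result]:
    """Keep only results for one model so the comparison is like for like."""
    return [r for r in results if r.model == model]


def generation_speedup(a: Result, b: Result) -> float:
    """How many times faster a generates tokens than b."""
    return a.generation_tps / b.generation_tps


llama = same_model(RESULTS, "Llama 3.2 1B")
print(f"{llama[0].hardware} vs {llama[1].hardware}: "
      f"{generation_speedup(llama[0], llama[1]):.2f}x generation speed")
```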

Why Use LocalScore?

Hardware Purchase Decisions

If you’re considering building or buying a system for local AI, LocalScore helps you understand what performance to expect. You can browse results from similar hardware configurations before making a purchase.

Model Selection

Different models have different performance characteristics. LocalScore helps you understand which model sizes run well on your hardware and what kind of user experience to expect.

Performance Comparison

The standardized test suite makes it easy to compare performance across different hardware, operating systems, and model configurations.

Limitations

LocalScore focuses on standard inference workloads. It does not currently cover:

  • Speculative decoding performance
  • Multi-model or agent workflows
  • Fine-tuning or training workloads
  • Ollama integration (requested by community)
  • API-based comparisons across inference backends (vLLM, TGI, llama.cpp)

The developers have acknowledged these gaps, and several of them, such as Ollama integration, are open community feature requests.

Practical Evaluation Checklist

Use LocalScore to answer these questions:

  • Can my current hardware run 8B models at acceptable speed?
  • Is it worth upgrading to 32GB RAM for 14B models?
  • How does my Apple Silicon Mac compare to a dedicated GPU setup?
  • What’s the real-world performance difference between Q4_K_M and Q5_K_M quantization?
  • Should I run models on CPU or invest in GPU acceleration?

Security and Privacy

LocalScore is designed for local-first AI workflows:

  • All benchmarks run entirely on your hardware
  • No data is sent to external servers during testing
  • Result submission is optional and anonymous
  • Works completely offline after downloading models

FAQ

Q: Is LocalScore open source? A: Yes, LocalScore is an open-source project sponsored by Mozilla Builders.

Q: What models does it support? A: LocalScore works with any .gguf format model. The CLI includes options to download official test models in 1B, 8B, and 14B sizes.

Q: Does it work with Ollama? A: Not currently. LocalScore uses Llamafile under the hood for portability. Ollama integration has been requested by the community.

Q: Can I benchmark remote servers? A: The current version runs locally. Community members have requested support for benchmarking remote headless servers and comparing different inference backends.

Q: How long does a benchmark take? A: It depends on your hardware and the model size. Tiny (1B) models on modern hardware complete in seconds. Larger models on CPU-only systems may take several minutes.

Q: Is there a GUI? A: LocalScore is a CLI tool, but the website provides a browsable database of results with visualizations at localscore.ai/latest.

Conclusion

LocalScore fills an important gap in the local AI ecosystem by providing standardized, comparable benchmarks for LLM performance across different hardware configurations. Whether you’re deciding between hardware upgrades, choosing model sizes, or just curious about your system’s AI capabilities, LocalScore gives you concrete data to work with.

As a Mozilla Builders Project, it brings institutional credibility to the open-source local AI movement. The public results database creates a valuable community resource that grows more useful with each submission.

If you run local LLMs or are considering it, a LocalScore benchmark takes just a few minutes, and submitting your result adds to a growing body of knowledge about real-world AI performance on consumer hardware.