The Mission

Run the largest possible frontier AI model locally, without the complexity.

The goal is simple: run the biggest open-weight model possible at good tokens per second, entirely on local, consumer hardware. No cloud. No API keys. Just your machine.

Follow the build →
Current state · 16GB VRAM
Qwen3.6 35B model · MoE
40.4 tokens/sec · 35B model
2.2× over Ollama default
$0 cloud cost · fully local
Latest Posts · 3 published
Running Qwen3.6 35B at 40 TPS on Consumer Hardware

Ollama's defaults leave a 2.2× speedup on the table for MoE models. A deep dive into the memory bandwidth hierarchy and why GPU utilization % is a misleading metric.
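The core of that argument is a back-of-envelope bound: each generated token has to stream the model's active weights through memory at least once, so bandwidth, not compute, sets the ceiling. A minimal sketch of that estimate, using purely illustrative numbers (3B active parameters, 4-bit weights, 300 GB/s) rather than the post's measured figures:

```python
def decode_tps_upper_bound(active_params_billions: float,
                           bytes_per_weight: float,
                           bandwidth_gb_s: float) -> float:
    """Roofline-style bound: tokens/sec <= bandwidth / bytes read per token.
    Ignores KV-cache traffic and activation overhead, so it is optimistic."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Illustrative assumptions only: a MoE with ~3B active parameters,
# 4-bit quantization (~0.5 bytes/weight), 300 GB/s effective bandwidth.
print(f"~{decode_tps_upper_bound(3.0, 0.5, 300):.0f} tok/s upper bound")
```

The same formula is why a MoE with a small active parameter count can decode far faster than a dense model of the same total size on the same memory system.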

Change Default Directory for Ollama

How to change the default directory for Ollama models on Windows using an environment variable.
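For context, Ollama reads the OLLAMA_MODELS environment variable to decide where model files live. Here is a minimal sketch that reports what the current session will use; the fallback path is an assumption about the usual per-user default, so verify it against your install:

```python
import os
from pathlib import Path

# Ollama uses the OLLAMA_MODELS environment variable as its model
# storage directory when it is set.
models_dir = os.environ.get("OLLAMA_MODELS")
if models_dir:
    print(f"Ollama model store: {models_dir}")
else:
    # Assumed default layout (%USERPROFILE%\.ollama\models on Windows).
    print(f"OLLAMA_MODELS unset; likely default: {Path.home() / '.ollama' / 'models'}")
```

On Windows the variable is typically set as a user environment variable, and Ollama needs a restart before it picks up the new location.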

What Are LLMs (Large Language Models)?

A clear breakdown of what large language models are, how they work, and why they matter for local inference.

What this is

Compiled Thoughts is a public build log with one goal: run the biggest open-weight model possible at good tokens per second, entirely on local hardware.

Open-weight models are getting bigger and better fast. But actually running them — on your own machine, without cloud APIs or expensive subscriptions — is still harder than it should be. This blog is about closing that gap.

Every post is a step in the build: benchmarks, tooling, configuration, failures, and breakthroughs. All numbers are real and measured.
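As an example of what "measured" means here, Ollama's HTTP API reports generated-token counts and timings directly, so tokens per second comes from the server rather than a stopwatch. A minimal sketch against a local server; the model tag is a placeholder for whatever you have pulled locally:

```python
import requests

# Ask a local Ollama server for a short completion without streaming.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:latest",   # placeholder tag; use your local model
        "prompt": "Explain mixture-of-experts routing in two sentences.",
        "stream": False,
    },
    timeout=600,
)
data = resp.json()

# eval_count is generated tokens; eval_duration is in nanoseconds.
tps = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{data['eval_count']} tokens generated at {tps:.1f} tok/s")
```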

More posts

Browse all articles, or follow the build from the beginning.

View all posts →