Ollama leaves 2.2× performance on the table for MoE models. A deep dive into memory bandwidth hierarchy and why GPU utilization % is a misleading metric.
The goal is simple: run the biggest open-weight model possible at good tokens per second, entirely on local, consumer-grade hardware. No cloud. No API keys. Just your machine.
Follow the build →
How to change the default directory for Ollama models on Windows using an environment variable.
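As a quick preview of that technique: Ollama looks up the `OLLAMA_MODELS` environment variable to locate its model store, so pointing it at another drive is a one-line change. A minimal sketch for Windows (the path below is a placeholder, not a recommendation):

```shell
:: Persistently redirect Ollama's model storage (Command Prompt).
:: "D:\ollama\models" is an example path -- use any directory with enough space.
setx OLLAMA_MODELS "D:\ollama\models"
```

Restart Ollama afterward so the new value is picked up; existing models can be moved into the new directory rather than re-downloaded.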
A clear breakdown of what large language models are, how they work, and why they matter for local inference.
Compiled Thoughts is a public build log with one goal: run the biggest open-weight model possible at good tokens per second, entirely on local hardware.
Open-weight models are getting bigger and better, fast. But actually running them — on your own machine, without cloud APIs or expensive subscriptions — is still harder than it should be. This blog is about closing that gap.
Every post is a step in the build: benchmarks, tooling, configuration, failures, and breakthroughs. All numbers are real and measured.