Updates / 2026-01-16

Goal: Optimize for old potatoes

Today I landed on a direction that feels obvious in hindsight. I keep seeing teams treat LLMs as the brain of an agentic system when they are better used as a tool. The arms race for bigger models and bigger clusters locks ordinary people into cloud workflows for tiny tasks, and I want the opposite. I want maximum utility per watt, and I want it to run locally.

So I started pushing the decision layer down into smaller, auditable components. I run a chain of lightweight classifiers and rules first, then invoke a large model only when I actually need text generation. That shift keeps function calling, multi-step tool use, retrieval, and file actions on the table while pulling most of the compute cost off the critical path. I am testing this on a 2014 Mac mini with 8GB of RAM, CPU only, and the system stays responsive even though the accuracy still needs work. That is expected this early; the training set is still small.
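To make the cascade concrete, here is a minimal Python sketch. It is illustrative scaffolding, not the actual Control OS code: classify_intent is a keyword stub standing in for the lightweight classifiers, and HANDLERS and call_llm are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    action: str        # e.g. "open_file", "search", "generate_text"
    args: dict
    confidence: float  # produced by the cheap classifier, 0.0 to 1.0


def classify_intent(request: str) -> Decision:
    """Stand-in for the small local classifiers (keyword rules here)."""
    text = request.lower()
    if text.startswith("open "):
        return Decision("open_file", {"path": request[5:].strip()}, 0.95)
    if text.startswith("find "):
        return Decision("search", {"query": request[5:].strip()}, 0.90)
    # Anything else genuinely needs text generation.
    return Decision("generate_text", {"prompt": request}, 0.40)


# Deterministic handlers run first; the LLM is just one more tool at the end.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "open_file": lambda args: f"[opened {args['path']}]",
    "search": lambda args: f"[searched for {args['query']}]",
}


def call_llm(prompt: str) -> str:
    """Placeholder for the one expensive step, reached only as a last resort."""
    return f"[LLM output for: {prompt}]"


def handle(request: str) -> str:
    decision = classify_intent(request)
    handler = HANDLERS.get(decision.action)
    if handler is not None:
        return handler(decision.args)         # cheap, deterministic path
    return call_llm(decision.args["prompt"])  # expensive path


if __name__ == "__main__":
    print(handle("open notes.txt"))           # no LLM involved
    print(handle("write a birthday letter"))  # falls through to the LLM
```

The point of the shape is that the expensive call sits behind every cheap check, so most requests never reach it.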

The more I play with this, the more I see how it helps alignment. When each component has a narrow, explicit job, it cannot wander far from the user's goal because it does not even have the capacity to invent a new one. The LLM becomes the mouth, not the driver, and the agentic loop stops trying to be clever for its own sake. I can see a path where this runs on a Raspberry Pi and still feels fast.
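One way to show what a narrow, explicit job means in code (a sketch, not the actual Control OS layout): give the component a closed enum as its output type, so it has no channel through which to emit a new goal.

```python
from enum import Enum


class FileAction(Enum):
    READ = "read"
    WRITE = "write"
    ASK_USER = "ask_user"  # the only escape hatch is asking, never improvising


def plan_file_step(have_path: bool, have_content: bool) -> FileAction:
    """Deterministic planner: every input maps to exactly one allowed action."""
    if not have_path:
        return FileAction.ASK_USER
    if have_content:
        return FileAction.WRITE
    return FileAction.READ
```

A component like this can be wrong, but it cannot be creative, which is exactly the property you want in the decision layer.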

Another reason this matters is hallucination. A token-prediction model almost never stops with a clean “I do not know” because it is built to keep predicting. Ask it to write a letter without context and it will invent context to keep the text flowing. In this stack, the “I need more information” path is deterministic: it is enforced whenever confidence falls below a hard threshold. The LLM shows up at the end of the chain with a prompt that already contains the decision, so it can ask for missing details or render the output without guessing. That means the system can refuse to pretend and still stay useful.
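A rough sketch of that gate, with the threshold and field names invented for illustration, using a letter-writing task as the example:

```python
# Assumed values for illustration only; the real stack would tune these.
CONFIDENCE_THRESHOLD = 0.80
REQUIRED_FIELDS = {"recipient", "occasion", "tone"}  # e.g. for a letter task


def gate(known_fields: dict[str, str], confidence: float) -> str:
    """Deterministic refusal-to-guess: the decision is made before the LLM runs."""
    missing = REQUIRED_FIELDS - known_fields.keys()
    if missing or confidence < CONFIDENCE_THRESHOLD:
        # The prompt handed to the LLM already contains this decision,
        # so generation can only ask for details, not invent them.
        return "ask_user:" + ",".join(sorted(missing) or ["confirm_intent"])
    return "render_output"


print(gate({"recipient": "Ana"}, 0.9))  # -> ask_user:occasion,tone
print(gate({"recipient": "Ana", "occasion": "birthday", "tone": "warm"}, 0.9))
# -> render_output
```

Because the gate runs before the model, “I need more information” is a branch in code, not a behavior you hope the model learns.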

Next I want to tighten the pipeline and push it back into the Control OS modules for integration tests. I am also building a local provider layer that lets people swap in OpenAI, Anthropic, or Google when they want richer output, but the baseline stays local and cheap. That is the point of the stack and I am committed to it.
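That swap-in layer reduces to one small interface. Here is a minimal sketch, assuming a single generate() method; the cloud adapters are deliberately left as stubs, since each would wrap its own vendor SDK, and the local backend stays the default.

```python
from typing import Protocol


class TextBackend(Protocol):
    """Anything that can turn a prompt into text, local or cloud."""
    def generate(self, prompt: str) -> str: ...


class LocalBackend:
    """Default: a small local model (stubbed here)."""
    def generate(self, prompt: str) -> str:
        return f"[local model: {prompt}]"


class CloudBackend:
    """Optional richer output; wire in the vendor's SDK client here."""
    def __init__(self, vendor: str):
        self.vendor = vendor

    def generate(self, prompt: str) -> str:
        raise NotImplementedError(f"plug in the {self.vendor} client")


def get_backend(name: str = "local") -> TextBackend:
    """Local unless the user explicitly asks for a cloud vendor."""
    if name == "local":
        return LocalBackend()
    return CloudBackend(name)
```

Defaulting to LocalBackend keeps the cheap path as the path of least resistance; a cloud model is an upgrade you opt into, never a dependency.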

Author

Anthony Cote

Builder of Control OS and the Law of Instrumental Integrity. I share the real-time decisions, friction, and progress here as the work unfolds.