Updates / 2026-01-14

I think I found the angle

I spent time this week comparing ways to generate text and interpret inputs, and a clearer shape started to form. I keep coming back to a simple idea: the model can be a tool in the stack instead of the brain that decides the action. I think a lot of alignment problems show up because we push language models into roles that make auditing hard. We can inspect token probabilities and read a rationale, yet still lack a reliable way to verify why a decision happened in the first place. Anthropic’s public work on reward hacking and misalignment makes that gap feel more urgent.

The alternative I am testing breaks decision-making into smaller steps: a network of interconnected, auditable classifiers turns messy prose into structured data, and the decision is made over that data. That keeps the outcomes deterministic while preserving the flexibility of language-model output where it belongs. The model handles the voice and the code after the decision has been made, which keeps the action path clean and inspectable.
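
A minimal sketch of the shape I mean, in Python. Everything here is illustrative: the classifiers are keyword stand-ins for trained models, and the names (Intent, Risk, decide, render_reply) are placeholders I made up for this post, not anything from Control OS.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    REFUND_REQUEST = "refund_request"
    QUESTION = "question"
    OTHER = "other"


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class StructuredInput:
    """Messy prose reduced to auditable, structured fields."""
    intent: Intent
    risk: Risk


def classify_intent(text: str) -> Intent:
    # Stand-in for a trained intent classifier.
    lowered = text.lower()
    if "refund" in lowered:
        return Intent.REFUND_REQUEST
    if "?" in text:
        return Intent.QUESTION
    return Intent.OTHER


def classify_risk(text: str) -> Risk:
    # Stand-in for a trained risk classifier.
    return Risk.HIGH if "legal" in text.lower() else Risk.LOW


def decide(structured: StructuredInput) -> str:
    # Deterministic decision over structured fields: this path is the one you audit.
    if structured.risk is Risk.HIGH:
        return "escalate_to_human"
    if structured.intent is Intent.REFUND_REQUEST:
        return "open_refund_ticket"
    return "answer_directly"


def render_reply(action: str, original_text: str) -> str:
    # Only here would a language model be called, to phrase a decision already made.
    return f"[LLM renders a reply for action '{action}']"


if __name__ == "__main__":
    message = "I want a refund for my last order."
    structured = StructuredInput(
        intent=classify_intent(message),
        risk=classify_risk(message),
    )
    action = decide(structured)  # deterministic and inspectable
    print(action, "->", render_reply(action, message))
```

The point of the sketch is the ordering: classification first, a plain deterministic decision second, and the generative model last, where the worst it can do is phrase things badly rather than choose the wrong action.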

That leaves the hard part: training a lot of classifiers and arranging them into a stack that stays faithful to the mission kernel. I am in the weeds now, but for the first time in a while the structure feels like it is clicking into place.

Author

Anthony Cote

Builder of Control OS and the Law of Instrumental Integrity. I share the real-time decisions, friction, and progress here as the work unfolds.