Hunting the Repetition Loop in a Self-Hosted LLM Agent

When the Agent Kept Repeating Itself

At first I thought a request had hung. Where a tool call should have been, the model was instead generating its way toward max_tokens and getting nowhere — sometimes repeating the same sentence over and over, other times just producing low-value filler that never resolved into the JSON the tool call needed. Either way it would burn through the token budget, occasionally time out, and take the whole agent loop down with it. ...

June 23, 2026 · nbdawn

LLMs: The Things We've Been Overlooking

“What temperature are you using?” If someone asks, what do you say? “The default.” “0.7.” “I don’t know — does it matter?” Most answers fall into one of those three. And if you try to justify the answer, you run out of words fast. That’s how we use LLMs. We call the APIs every day — stuff prompts into messages, send them off, get responses. But when the question becomes “What does Temperature actually do?”, “How is Top-P different from Temperature?”, “Does Prompt Caching just work if you turn it on?”, “Will hallucinations go away with a better model?” — the answers get fuzzy. ...

April 12, 2026 · nbdawn