Inference

LLMs: The Things We’ve Been Overlooking

“What temperature are you using?” If someone asks, what do you say? “The default.” “0.7.” “I don’t know. Does it matter?” Most answers fall into one of those three. And if you try to justify the answer, you run out of words fast.

That’s how we use LLMs. We call the APIs every day: stuff prompts into messages, send them off, get responses. But when the question becomes “What does Temperature actually do?”, “How is Top-P different from Temperature?”, “Does Prompt Caching just work if you turn it on?”, or “Will hallucinations go away with a better model?”, the answers get fuzzy. ...

Inference

Hunting the Repetition Loop in a Self-Hosted LLM Agent
