
AI Writing Assistants
So what even is an AI writing assistant? Think of it like a very well-read friend who has absorbed an enormous amount of …
Jun 01, 2026 · 3 min read
Large language models (LLMs) are next-token predictors. That’s the entire mechanism. Given a sequence of tokens, the model outputs a probability distribution over every token in its vocabulary, then samples from that distribution to pick the next one. Repeat until done.
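That predict-sample-repeat loop fits in a few lines. This is a toy sketch, not how any real LLM is built: `toy_model` and its hand-written probability table are hypothetical stand-ins for a neural network, and the vocabulary is a handful of made-up tokens.

```python
import random

# Hypothetical stand-in for a trained model: maps the last token to a
# probability distribution over the next token. A real LLM conditions on
# the whole sequence and computes these probabilities with a neural network.
TOY_TABLE = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.5, "<end>": 0.5},
    "sat": {"<end>": 1.0},
}


def toy_model(tokens):
    """Return a {token: probability} distribution for the next token."""
    return TOY_TABLE[tokens[-1]]


def generate(max_tokens=10, seed=None):
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = toy_model(tokens)
        # Sample one token according to the predicted probabilities.
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<end>":
            break  # "repeat until done"
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


print(" ".join(generate(seed=0)))
```

Run it with different seeds and you get different sentences from the same table, which is the sampling step at work.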
“Token” ≠ word. Tokens are subword units — unbelievable might be un, believ, able. English text typically runs around 0.75 words per token, meaning a 1,000-word post is roughly 1,300 tokens.
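The arithmetic is easy to check. A minimal sketch, assuming the 0.75 words-per-token rule of thumb holds (it varies by tokenizer and by language); the function names are made up for illustration.

```python
def estimate_tokens(word_count, words_per_token=0.75):
    """Rough token count for English text of a given word count."""
    return round(word_count / words_per_token)


def estimate_words(token_count, words_per_token=0.75):
    """Rough word count that fits in a given number of tokens."""
    return round(token_count * words_per_token)


print(estimate_tokens(1000))    # 1333
print(estimate_words(128_000))  # 96000
```

For a real count, you would run the text through the specific model's tokenizer instead of estimating.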
Every call to the model operates on a fixed context window — the full sequence of tokens the model can “see” at once. As of mid-2025, frontier models range from ~128K tokens (roughly 100K words) to over 1M tokens.
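The context includes everything sent with the call: any system prompt or instructions, the conversation so far, any documents or pasted text, and the tokens the model has already generated in its current response.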
When the window fills, older content either falls off (sliding window) or gets summarized depending on the implementation. This is why long conversations sometimes feel like the model “forgot” something from earlier.
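Here is roughly what the sliding-window variant looks like. This is a sketch under assumptions: `fit_to_window` and `rough_token_count` are hypothetical helpers, and the word-based token estimate stands in for a real tokenizer.

```python
def rough_token_count(text, words_per_token=0.75):
    # Assumption: crude estimate from word count; real systems use the model's tokenizer.
    return round(len(text.split()) / words_per_token)


def fit_to_window(messages, max_tokens, count_tokens=rough_token_count):
    """Keep the most recent messages whose combined token count fits in max_tokens."""
    kept = []
    used = 0
    # Walk backwards from the newest message so the oldest ones drop first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Ada and I write about gardening.",
    "Please draft an intro about tomatoes.",
    "Make it shorter and friendlier.",
]
print(fit_to_window(history, max_tokens=15))
```

With a 15-token budget only the last two messages survive, so the model never sees the name from the first one. That is the "forgetting" effect in miniature.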
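The probability distribution at each step is shaped by a temperature parameter: low values concentrate probability on the most likely tokens, which makes output more predictable, while high values flatten the distribution, which gives less likely tokens a better chance of being picked.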
Most writing tools set temperature somewhere between 0.7 and 1.0. “More creative” outputs are literally higher-temperature samples.
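To see what temperature does to the numbers, here is a small sketch of the usual approach: divide the model's raw scores (logits) by the temperature before turning them into probabilities. The three logits are invented for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability lands on the top token. At high temperature the three options move closer together, so sampling picks the unlikely ones more often.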
The model has no mechanism for distinguishing “I know this” from “I’m inferring this.” It predicts plausible continuations of text. A plausible continuation of “The capital of France is” is “Paris.” A plausible continuation of “The CEO of [obscure company] is” might be a name that sounds right but isn’t.
The model doesn’t know it’s wrong. It produced a high-probability continuation given its training. This is a fundamental property of the architecture, not a bug that will be patched away.
Mitigation strategies: retrieval-augmented generation (RAG), grounding responses against a known document corpus, asking the model to cite sources, and just… fact-checking outputs on anything that matters.
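The RAG idea in miniature: find relevant text first, then put it in the prompt so the model is continuing from facts rather than from vibes. This is a toy sketch. The word-overlap scoring, the function names, and the example corpus are all made up for illustration, and real systems typically use embedding-based search instead.

```python
def score(query, document):
    """Count how many distinct query words appear in the document (toy relevance score)."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words)


def retrieve(query, corpus, k=1):
    """Return the k documents with the highest overlap score."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_grounded_prompt(query, corpus, k=1):
    """Prepend retrieved passages so the model can answer from them."""
    passages = retrieve(query, corpus, k)
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )


corpus = [  # made-up example documents
    "Acme Widgets was founded in 1999 in Leeds.",
    "The current CEO of Acme Widgets is Jane Example.",
    "Tomatoes need six hours of sun per day.",
]
print(build_grounded_prompt("Who is the CEO of Acme Widgets", corpus))
```

The resulting string is what you would send to the model. It still needs fact-checking, but the right answer is now sitting in the context instead of being left to recall.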
Prompts are just the beginning of the input sequence. Everything that follows is prediction. This means:
Pre-training: show the model trillions of tokens of internet text, have it predict the next token, backpropagate the error, update the weights. Repeat billions of times.
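One predict-and-correct step can be shown with plain numbers. A simplified sketch: it nudges three made-up logits directly, whereas a real model backpropagates the same error signal through billions of weights. The loss is the standard cross-entropy on the true next token.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits, target_index):
    """Cross-entropy: negative log-probability assigned to the true next token."""
    return -math.log(softmax(logits)[target_index])


def gradient_step(logits, target_index, learning_rate=1.0):
    """One gradient-descent update; d(loss)/d(logit_i) = p_i - 1[i == target]."""
    probs = softmax(logits)
    return [
        z - learning_rate * (p - (1.0 if i == target_index else 0.0))
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


logits = [0.0, 0.0, 0.0]  # model starts with no preference among 3 tokens
target = 1                # index of the token that actually came next
before = next_token_loss(logits, target)
after = next_token_loss(gradient_step(logits, target), target)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

The loss drops after the update because the correct token's score went up and the others went down. Pre-training is this step repeated over an enormous amount of text.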
Fine-tuning: take the pre-trained model, show it examples of the specific behavior you want (Q&A pairs, instruction-following, etc.), repeat.
RLHF (Reinforcement Learning from Human Feedback): have humans rank model outputs, train a reward model on those rankings, use that reward model to fine-tune further. This is the step that turns “plausible text predictor” into “assistant that tries to be helpful.”
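The "train a reward model on those rankings" step is often done with a pairwise loss: given two outputs where humans preferred one, penalize the reward model when it scores the rejected one higher. The sketch below shows that common formulation. Whether any particular assistant uses exactly this loss is an assumption, and the reward numbers are invented.

```python
import math


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)): small when the preferred output scores higher."""
    margin = reward_preferred - reward_rejected
    # log(1 + exp(-margin)) equals -log(sigmoid(margin))
    return math.log1p(math.exp(-margin))


# Hypothetical reward-model scores for two candidate responses.
print(round(pairwise_ranking_loss(2.0, 0.0), 3))  # agrees with the human ranking
print(round(pairwise_ranking_loss(0.0, 2.0), 3))  # disagrees, so the loss is larger
```

Minimizing this loss over many ranked pairs pushes the reward model's scores to agree with human preferences, and those scores then guide the further fine-tuning.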
The weights — billions of floating point numbers — are the model. Everything it “knows” is encoded in those weights from training. It learns nothing new at inference time (absent tools or RAG).
