LLM Wiki

10 Apr 2026·3 min read

Prompting

Be explicit about format. Say "respond in bullet points" or "return valid JSON with keys x, y, z." If you don't say it, you'll get prose.
Few-shot beats explanation. Instead of describing the rule, show 2–3 examples of input → output. The model copies patterns faster than it follows instructions.
Chain-of-thought. For anything involving reasoning or math, add "think step by step" or just ask for the reasoning before the answer. Accuracy improves significantly.
Tell it what not to do. "Don't add boilerplate", "don't explain what you're doing" — constraints on the negative side are often more effective than positive instructions.
Role framing. "You are a senior backend engineer reviewing this PR" shapes the tone, depth, and assumed context of the response.
Hedge explicitly. Add "if you're not sure, say so" — models will otherwise confidently fill gaps with plausible-sounding nonsense.

What's in the middle gets ignored. Models attend better to the start and end of the context window. Put the most important instructions at the top, key constraints at the bottom.
Summarize long threads. In a long session, paste a manual summary of what was decided before continuing. Don't trust the model to weight old turns correctly.
Paste the error, not a description of it. Raw stack traces or compiler errors give the model signal that "it throws an error on line 42" doesn't.

Small model + clear prompt often beats big model + vague prompt. Clarify the task before upgrading the model.
Use fast/cheap models (Haiku, Flash) for classification, extraction, reformatting. Use capable models (Sonnet, GPT-4o) for reasoning, code review, synthesis.
Reserve the biggest models (Opus, o3) for genuinely hard problems — multi-step planning, subtle debugging, research synthesis.

Specify language, version, and framework. "In Python 3.12 using FastAPI" is not optional context.
After it writes code, ask "what edge cases does this not handle?" — a separate pass catches things the generation pass misses.
Paste the actual function signature and types when asking for changes, not just a description of the function.
When debugging: share the code + the exact error + what you expected. All three. Missing any one degrades the response.

Ground it. Paste the relevant docs, code, or text, and ask it to answer only from that. "Based only on the following…" works.
Ask for citations. "Quote the part of the text that supports this." Forces it to stay anchored.
Verify any specific facts, numbers, or API details. The model will invent plausible-looking function names and parameter defaults.
Cross-check with a second prompt: "What's wrong with this answer?" after getting a response sometimes surfaces issues.

Treat the first response as a draft. "Make it shorter", "make it more direct", "cut the intro" — iterating beats trying to write a perfect prompt upfront.
If output is consistently off, change the framing, not just the wording. Sometimes "rewrite this as a numbered list" unlocks better structure than any other adjustment.
Asking "what assumptions did you make?" after a response reveals hidden choices that caused the output to diverge from what you wanted.

No memory between sessions unless you give it the context explicitly.
Knowledge cutoff is real — anything recent (new library versions, recent events) needs to be provided in the prompt.
Token limits are hard. When a file is too long to paste, extract the relevant function or section, not the whole file.
Temperature 0 for deterministic tasks (code, extraction). Higher temperature for brainstorming or creative work.