Skip to content
← Back to Notes

LLM Wiki

Prompting

  • Be explicit about format. Say "respond in bullet points" or "return valid JSON with keys x, y, z." If you don't say it, you'll get prose.
  • Few-shot beats explanation. Instead of describing the rule, show 2–3 examples of input → output. The model copies patterns faster than it follows instructions.
  • Chain-of-thought. For anything involving reasoning or math, add "think step by step" or just ask for the reasoning before the answer. Accuracy improves significantly.
  • Tell it what not to do. "Don't add boilerplate", "don't explain what you're doing" — constraints on the negative side are often more effective than positive instructions.
  • Role framing. "You are a senior backend engineer reviewing this PR" shapes the tone, depth, and assumed context of the response.
  • Hedge explicitly. Add "if you're not sure, say so" — models will otherwise confidently fill gaps with plausible-sounding nonsense.

Context

  • What's in the middle gets ignored. Models attend better to the start and end of the context window. Put the most important instructions at the top, key constraints at the bottom.
  • Summarize long threads. In a long session, paste a manual summary of what was decided before continuing. Don't trust the model to weight old turns correctly.
  • Paste the error, not a description of it. Raw stack traces or compiler errors give the model signal that "it throws an error on line 42" doesn't.

Model choice

  • Small model + clear prompt often beats big model + vague prompt. Clarify the task before upgrading the model.
  • Use fast/cheap models (Haiku, Flash) for classification, extraction, reformatting. Use capable models (Sonnet, GPT-4o) for reasoning, code review, synthesis.
  • Reserve the biggest models (Opus, o3) for genuinely hard problems — multi-step planning, subtle debugging, research synthesis.

Coding

  • Specify language, version, and framework. "In Python 3.12 using FastAPI" is not optional context.
  • After it writes code, ask "what edge cases does this not handle?" — a separate pass catches things the generation pass misses.
  • Paste the actual function signature and types when asking for changes, not just a description of the function.
  • When debugging: share the code + the exact error + what you expected. All three. Missing any one degrades the response.

Avoiding hallucination

  • Ground it. Paste the relevant docs, code, or text, and ask it to answer only from that. "Based only on the following…" works.
  • Ask for citations. "Quote the part of the text that supports this." Forces it to stay anchored.
  • Verify any specific facts, numbers, or API details. The model will invent plausible-looking function names and parameter defaults.
  • Cross-check with a second prompt: "What's wrong with this answer?" after getting a response sometimes surfaces issues.

Output iteration

  • Treat the first response as a draft. "Make it shorter", "make it more direct", "cut the intro" — iterating beats trying to write a perfect prompt upfront.
  • If output is consistently off, change the framing, not just the wording. Sometimes "rewrite this as a numbered list" unlocks better structure than any other adjustment.
  • Asking "what assumptions did you make?" after a response reveals hidden choices that caused the output to diverge from what you wanted.

Practical limits

  • No memory between sessions unless you give it the context explicitly.
  • Knowledge cutoff is real — anything recent (new library versions, recent events) needs to be provided in the prompt.
  • Token limits are hard. When a file is too long to paste, extract the relevant function or section, not the whole file.
  • Temperature 0 for deterministic tasks (code, extraction). Higher temperature for brainstorming or creative work.