# Context Engineering
*Part of the “Machine Learning for Engineers” series*

**Previous recap:** *Vector Distance Metrics*
---
## Introduction: From Chatbot to Decision Engine
Large Language Models (LLMs) have shifted from being **casual chatbots** to **core decision-making components** in complex systems.
This evolution demands a new way to **communicate** with them during inference.
- **Old approach:** *Prompt engineering* — crafting precise phrasing to “beg” for the right answer.
- **Limitations:** High trial-and-error, fragile results, no guaranteed accuracy.
- **New approach:** *Context engineering* — deliberately and dynamically feeding the LLM **all the tokens** it needs for the task.
Today, we’ll explore **context engineering** via a simple example:
> *What is the greatest sci‑fi movie of all time?*
---
## The Context Window
A [Large Language Model (LLM)](https://chrisloy.dev/post/2025/03/23/will-ai-replace-software) processes information as **tokens** (roughly words or word fragments).
Its **context window** — from tens of thousands up to hundreds of thousands of tokens, depending on the model — defines how much it can “see” at once.
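To make “tokens” concrete, here is a minimal sketch using the Hugging Face `transformers` library (the GPT-2 tokenizer is a purely illustrative choice):

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer, chosen purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "The greatest sci-fi movie of all time is"
token_ids = tokenizer.encode(text)

# Each id maps to a word or word fragment; the context window bounds
# how many of these the model can attend to at once.
print(len(token_ids), tokenizer.convert_ids_to_tokens(token_ids))
```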
### Training Phase
- LLMs are trained by reading large sequences of tokens from vast corpora (e.g., text scraped from the internet).
### Inference Phase
- The model **predicts the next token** based on all tokens in its current context window.
- The “prompt” is simply the token sequence seen so far.
Example:
> Prompt: `"The greatest sci‑fi movie of all time is…"`
> Prediction: `perhaps Star Wars`

Initially, this “completion mode” was impressive — but hard to control for style, rules, or constraints.
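A minimal sketch of completion mode, again via `transformers` (model choice illustrative; the continuation will vary from run to run):

```python
from transformers import pipeline

# Raw "completion mode": the model simply continues the token sequence.
generator = pipeline("text-generation", model="gpt2")  # illustrative model

prompt = "The greatest sci-fi movie of all time is"
result = generator(prompt, max_new_tokens=10, do_sample=True)

# Prints the prompt plus a plausible continuation, e.g. "... perhaps Star Wars".
# There is no built-in way here to enforce style, rules, or constraints.
print(result[0]["generated_text"])
```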
---
## The Chat Format Advantage
To make LLMs easier to guide, developers added **structured conversational formats** to training data:
- Special tokens define roles: *user* vs. *assistant*.
- **System messages** provide persistent role or style instructions.

Now the context window may include:
- Chat history
- System instructions
- User queries
- Additional metadata
> Example: If instructed *“You are a film critic”*, the model’s completions might shift from *Star Wars* to *Blade Runner* — reflecting the critic’s perspective.
**Key point:** The **architecture** hasn’t changed — only the *framing* of inputs.
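To see that framing concretely, here is a sketch using `transformers` chat templating (the Zephyr tokenizer is an illustrative choice of chat-tuned model):

```python
from transformers import AutoTokenizer

# A chat is still one token sequence; special tokens mark the roles.
messages = [
    {"role": "system", "content": "You are a film critic."},
    {"role": "user", "content": "What is the greatest sci-fi movie of all time?"},
]

# Any chat-tuned model's tokenizer can render the role structure into
# plain text (model choice illustrative).
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
print(tokenizer.apply_chat_template(messages, tokenize=False))
# Prints roughly "<|system|>\nYou are a film critic...<|user|>\n...":
# the same architecture, just a different framing of the input tokens.
```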
---
## Prompt Engineering — And Its Limits
Prompt engineering is about finding the right input phrasing to coax the LLM into producing the desired output.
- Often trial-and-error
- Relies on probability, not certainty
- Closer to “casting spells” than true engineering
Example:
> *“You are a knowledgeable, impartial film critic who knows film award history.”*
The hope is better accuracy, but there are no guarantees.
---
## In-Context Learning
LLMs can use **examples and data fed during inference** to guide output — this is **in-context learning**.
### What You Can Feed into Context:
- **Hardcoded examples** — curated Q&A, formatting samples
- **Non-text data** — images, audio, or video converted into tokens
- **Tool/function definitions** — enabling execution outside the LLM
- **Retrieved documents & summaries** — e.g., via **RAG**
- **Conversation history/memory** — summarised long-term interactions
Example: For the movie ranking task, include:
- Box office history
- Top 100 lists
- Rotten Tomatoes scores
- Award results
**Challenge:** Even with 100k+ tokens, space runs out quickly, and irrelevant or stale context raises the risk of **hallucinations**.
**Solution:** Curate for *relevance*, *brevity*, *recency*, and *accuracy*.
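One way to make that curation concrete: a sketch that greedily packs candidate snippets into a fixed token budget, most relevant first (the scores, snippets, and 4-characters-per-token estimate are all illustrative assumptions):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (illustrative, not exact).
    return len(text) // 4

def pack_context(snippets: list[tuple[float, str]], budget: int) -> str:
    """Greedily add snippets, most relevant first, until the budget is spent."""
    chosen, used = [], 0
    for _, text in sorted(snippets, reverse=True):  # highest score first
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip anything that would overflow the window
        chosen.append(text)
        used += cost
    return "\n\n".join(chosen)

# Hypothetical, pre-scored snippets for the movie-ranking task.
snippets = [
    (0.9, "Top 100 sci-fi films (critics' poll): ..."),
    (0.7, "Rotten Tomatoes scores for leading sci-fi films: ..."),
    (0.4, "Box office history for the genre: ..."),
]
context = pack_context(snippets, budget=2_000)
```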
---
## Treating the LLM as an Analyst
Language encodes **meaning**, not just facts.
When applied correctly, LLMs can act like **analysts**:
- Supply relevant **up-to-date information**
- Define tasks **precisely**
- Document available **tools**
- Avoid relying purely on outdated memory
Instead of crafting the “perfect prompt,” **engineer the token set** required for the task.
---
## Applying Context Engineering to a Real Task
**Question:** *What is the average weekly box office revenue for UK cinemas?*
**Oracle-mode answer** (training data alone, no added context): a stale 2019 figure of ~£24m/week.
**Context engineering approach:**
Include:
- Current date (`June 2024`)
- Latest figures (e.g., [BBC article](https://www.bbc.co.uk/news/articles/cx2j1jpnglvo))
- Calculation instructions for *total ÷ 52 weeks*
**Output:**
> *In 2024, the average weekly UK box office revenue was £18.8m.*
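A sketch of what that context assembly and arithmetic look like (the annual total below is back-derived from the £18.8m figure above, so treat it as illustrative):

```python
# Everything the model needs, stated explicitly in the context window.
current_date = "June 2024"
retrieved_fact = "UK annual box office takings: ~£978m"  # illustrative figure

prompt = (
    f"Today's date: {current_date}.\n"
    f"Source: {retrieved_fact}.\n"
    "Task: compute the average weekly box office revenue (total / 52 weeks)."
)

# The same calculation the prompt asks for, in plain Python:
annual_total_gbp = 978_000_000
weekly_average = annual_total_gbp / 52
print(f"£{weekly_average / 1e6:.1f}m per week")  # -> £18.8m per week
```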
---
## RAG — A Context Engineering Pattern
**Retrieval-Augmented Generation** = fetching relevant data at inference time and inserting it into the LLM’s context.
- Conceptually simple; technically demanding to implement robustly.
- Helps avoid hallucinations by grounding the output in current data.
Example uses:
- Search for latest movie reviews and awards
- Inject summaries into the prompt
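A minimal RAG sketch under illustrative assumptions (toy corpus, and word-overlap scoring standing in for real vector retrieval):

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant documents for the query.

    Toy scorer: word overlap. A real system would use vector embeddings
    and a distance metric, plus chunking and re-ranking.
    """
    def score(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

# Hypothetical documents fetched from a search index.
corpus = [
    "Awards round-up: this year's sci-fi winners ...",
    "Latest reviews of new sci-fi movie releases ...",
    "A beginner's guide to sourdough baking ...",
]

question = "What is the greatest sci-fi movie of all time?"
docs = retrieve(question, corpus)

# Ground the generation in the retrieved data.
prompt = "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {question}"
```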
---
## Context Engineering Design Patterns
Like software engineering, context engineering benefits from reusable **patterns**:
- **RAG** — Inject topically relevant documents
- **Tool Calling** — Integrate external computation/functions
- **Structured Output** — Fix output format as JSON/XML
- **Chain of Thought / ReAct** — Include visible reasoning steps
- **Context Compression** — Shorten histories into key facts
- **Memory** — Persist knowledge between sessions
These patterns enable **composable designs** — easy to extend and maintain.
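As one concrete example, a structured-output sketch: pin the format in the instructions, then validate the reply before anything downstream consumes it (the schema is illustrative):

```python
import json

# Structured Output: fix the reply format so downstream code can rely on it.
SYSTEM = (
    "Reply ONLY with JSON matching this shape: "
    '{"title": str, "year": int, "reasons": [str]}'
)

def parse_ranking(raw: str) -> dict:
    """Validate the model's reply; malformed output fails fast here,
    not deep inside the rest of the system."""
    data = json.loads(raw)
    assert isinstance(data["title"], str) and isinstance(data["year"], int)
    return data

# An illustrative raw reply from the model:
raw = '{"title": "Blade Runner", "year": 1982, "reasons": ["world-building"]}'
print(parse_ranking(raw)["title"])  # -> Blade Runner
```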
---
## Building Multi-Agent Systems
Production-scale AI will often use **multiple specialised agents**, each with tailored context:
Example: *Multi-agent movie ranker*
- **Chatbot Agent** — Talks to the user
- **Safety Agent** — Filters malicious input
- **Preference Agent** — Applies user-specific filters
- **Critic Agent** — Combines facts for the final ranking
Agents pass outputs into each other’s context windows — much like API calls.
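A minimal sketch of those hand-offs (the agent internals are stubbed; a real system would back each with its own LLM call and curated context):

```python
from dataclasses import dataclass

@dataclass
class Message:
    """The 'API contract' passed between agents' context windows."""
    content: str
    safe: bool = True

def safety_agent(msg: Message) -> Message:
    # Stub: a real agent would be an LLM call with a safety-focused context.
    msg.safe = "ignore previous instructions" not in msg.content.lower()
    return msg

def preference_agent(msg: Message, likes: list[str]) -> Message:
    # Enrich the context with user-specific preferences.
    msg.content += f"\nUser preferences: {', '.join(likes)}"
    return msg

def critic_agent(msg: Message) -> str:
    # Stub: would combine retrieved facts with the enriched context.
    return f"Final ranking, based on context:\n{msg.content}"

# Each agent's output becomes part of the next agent's context window.
msg = safety_agent(Message("What is the greatest sci-fi movie of all time?"))
if msg.safe:
    print(critic_agent(preference_agent(msg, likes=["cyberpunk", "1980s"])))
```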
---
## Key Takeaways
To engineer context effectively:
1. Treat LLMs as **analysts**, not oracles.
2. Own the **whole context window**, not just the user prompt.
3. Use **tested patterns** for reliability and reuse.
4. Treat agent-to-agent handovers as **API contracts**.
By doing so, we bring the rigor of **software engineering** to **context engineering** — enabling accurate, maintainable, and scalable AI systems.
---
**Reference:** [Original Post](https://chrisloy.dev/post/2025/08/03/context-engineering)