# Context Engineering
*Part of the “Machine Learning for Engineers” series*

**Previous recap:** *Vector Distance Metrics*
---
## Introduction: From Chatbot to Decision Engine
Large Language Models (LLMs) have shifted from being **casual chatbots** to **core decision-making components** in complex systems.
This evolution demands a new way to **communicate** with them during inference.
- **Old approach:** *Prompt engineering* — crafting precise phrasing to “beg” for the right answer.
- **Limitations:** High trial-and-error, fragile results, no guaranteed accuracy.
- **New approach:** *Context engineering* — deliberately and dynamically feeding the LLM **all the tokens** it needs for the task.
Today, we’ll explore **context engineering** via a simple example:
> *What is the greatest sci‑fi movie of all time?*
---
## The Context Window
A [Large Language Model (LLM)](https://chrisloy.dev/post/2025/03/23/will-ai-replace-software) processes information as **tokens** (roughly words or word fragments).
Its **context window** — from tens of thousands up to hundreds of thousands of tokens, depending on the model — defines how much it can “see” at once.
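To make “tokens” concrete, here is a minimal sketch using the Hugging Face `transformers` library (the GPT-2 tokenizer is a purely illustrative choice):

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer, chosen purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "The greatest sci-fi movie of all time is"
token_ids = tokenizer.encode(text)

# Each id maps to a word or word fragment; the context window bounds
# how many of these the model can attend to at once.
print(len(token_ids), tokenizer.convert_ids_to_tokens(token_ids))
```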
### Training Phase
- LLMs are trained by reading large sequences of tokens from vast corpora (e.g., text scraped from the internet).
### Inference Phase
- The model **predicts the next token** based on all tokens in its current context window.
- The “prompt” is simply the token sequence seen so far.
Example:
> Prompt: `"The greatest sci‑fi movie of all time is…"`
> Prediction: `perhaps Star Wars`

Initially, this “completion mode” was impressive — but hard to control for style, rules, or constraints.
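A minimal sketch of completion mode, again via `transformers` (model choice illustrative; the continuation will vary from run to run):

```python
from transformers import pipeline

# Raw "completion mode": the model simply continues the token sequence.
generator = pipeline("text-generation", model="gpt2")  # illustrative model

prompt = "The greatest sci-fi movie of all time is"
result = generator(prompt, max_new_tokens=10, do_sample=True)

# Prints the prompt plus a plausible continuation, e.g. "... perhaps Star Wars".
# There is no built-in way here to enforce style, rules, or constraints.
print(result[0]["generated_text"])
```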
---
## The Chat Format Advantage
To make LLMs easier to guide, developers added **structured conversational formats** to training data:
- Special tokens define roles: *user* vs. *assistant*.
- **System messages** provide persistent role or style instructions.

Now the context window may include:
- Chat history
- System instructions
- User queries
- Additional metadata
> Example: If instructed *“You are a film critic”*, the model’s completions might shift from *Star Wars* to *Blade Runner* — reflecting the critic’s perspective.
**Key point:** The **architecture** hasn’t changed — only the *framing* of inputs.
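To see that framing concretely, here is a sketch using `transformers` chat templating (the Zephyr tokenizer is an illustrative choice of chat-tuned model):

```python
from transformers import AutoTokenizer

# A chat is still one token sequence; special tokens mark the roles.
messages = [
    {"role": "system", "content": "You are a film critic."},
    {"role": "user", "content": "What is the greatest sci-fi movie of all time?"},
]

# Any chat-tuned model's tokenizer can render the role structure into
# plain text (model choice illustrative).
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
print(tokenizer.apply_chat_template(messages, tokenize=False))
# Prints roughly "<|system|>\nYou are a film critic...<|user|>\n...":
# the same architecture, just a different framing of the input tokens.
```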
---
## Prompt Engineering — And Its Limits
Prompt engineering is about finding the right input phrasing to coax the LLM into producing the desired output.
- Often trial-and-error
- Relies on probability, not certainty
- Closer to “casting spells” than true engineering
Example:
> *“You are a knowledgeable, impartial film critic who knows film award history.”*
The hope is better accuracy, but there are no guarantees.
---
## In-Context Learning
LLMs can use **examples and data fed during inference** to guide output — this is **in-context learning**.
### What You Can Feed into Context:
- **Hardcoded examples** — curated Q&A, formatting samples
- **Non-text data** — images, audio, or video converted into tokens
- **Tool/function definitions** — enabling execution outside the LLM
- **Retrieved documents & summaries** — e.g., via **RAG**
- **Conversation history/memory** — summarised long-term interactions
Example: For the movie ranking task, include:
- Box office history
- Top 100 lists
- Rotten Tomatoes scores
- Award results
**Challenge:** Even with 100k+ tokens, space runs out quickly, and irrelevant or stale context raises the risk of **hallucinations**.
**Solution:** Curate for *relevance*, *brevity*, *recency*, and *accuracy*.
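One way to make that curation concrete: a sketch that greedily packs candidate snippets into a fixed token budget, most relevant first (the scores, snippets, and 4-characters-per-token estimate are all illustrative assumptions):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (illustrative, not exact).
    return len(text) // 4

def pack_context(snippets: list[tuple[float, str]], budget: int) -> str:
    """Greedily add snippets, most relevant first, until the budget is spent."""
    chosen, used = [], 0
    for _, text in sorted(snippets, reverse=True):  # highest score first
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip anything that would overflow the window
        chosen.append(text)
        used += cost
    return "\n\n".join(chosen)

# Hypothetical, pre-scored snippets for the movie-ranking task.
snippets = [
    (0.9, "Top 100 sci-fi films (critics' poll): ..."),
    (0.7, "Rotten Tomatoes scores for leading sci-fi films: ..."),
    (0.4, "Box office history for the genre: ..."),
]
context = pack_context(snippets, budget=2_000)
```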
---
## Treating the LLM as an Analyst
Language encodes **meaning**, not just facts.
When applied correctly, LLMs can act like **analysts**:
- Supply relevant **up-to-date information**
- Define tasks **precisely**
- Document available **tools**
- Avoid relying purely on outdated memory
Instead of crafting the “perfect prompt,” **engineer the token set** required for the task.
---
## Applying Context Engineering to a Real Task
**Question:** *What is the average weekly box office revenue for UK cinemas?*
**Oracle-mode answer** (training data alone, no added context): a stale 2019 figure of ~£24m/week.
**Context engineering approach:**
Include:
- Current date (`June 2024`)
- Latest figures (e.g., [BBC article](https://www.bbc.co.uk/news/articles/cx2j1jpnglvo))
- Calculation instructions for *total ÷ 52 weeks*
**Output:**
> *In 2024, the average weekly UK box office revenue was £18.8m.*
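A sketch of what that context assembly and arithmetic look like (the annual total below is back-derived from the £18.8m figure above, so treat it as illustrative):

```python
# Everything the model needs, stated explicitly in the context window.
current_date = "June 2024"
retrieved_fact = "UK annual box office takings: ~£978m"  # illustrative figure

prompt = (
    f"Today's date: {current_date}.\n"
    f"Source: {retrieved_fact}.\n"
    "Task: compute the average weekly box office revenue (total / 52 weeks)."
)

# The same calculation the prompt asks for, in plain Python:
annual_total_gbp = 978_000_000
weekly_average = annual_total_gbp / 52
print(f"£{weekly_average / 1e6:.1f}m per week")  # -> £18.8m per week
```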
---
## RAG — A Context Engineering Pattern
**Retrieval-Augmented Generation** = fetching relevant data at inference time and inserting it into the LLM’s context.
- Conceptually simple; technically demanding to implement robustly.
- Helps avoid hallucinations by grounding the output in current data.
Example uses:
- Search for latest movie reviews and awards
- Inject summaries into the prompt
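A minimal RAG sketch under illustrative assumptions (toy corpus, and word-overlap scoring standing in for real vector retrieval):

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant documents for the query.

    Toy scorer: word overlap. A real system would use vector embeddings
    and a distance metric, plus chunking and re-ranking.
    """
    def score(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

# Hypothetical documents fetched from a search index.
corpus = [
    "Awards round-up: this year's sci-fi winners ...",
    "Latest reviews of new sci-fi movie releases ...",
    "A beginner's guide to sourdough baking ...",
]

question = "What is the greatest sci-fi movie of all time?"
docs = retrieve(question, corpus)

# Ground the generation in the retrieved data.
prompt = "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {question}"
```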
---
## Context Engineering Design Patterns
Like software engineering, context engineering benefits from reusable **patterns**:
- **RAG** — Inject topically relevant documents
- **Tool Calling** — Integrate external computation/functions
- **Structured Output** — Fix output format as JSON/XML
- **Chain of Thought / ReAct** — Include visible reasoning steps
- **Context Compression** — Shorten histories into key facts
- **Memory** — Persist knowledge between sessions
These patterns enable **composable designs** — easy to extend and maintain.
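As one concrete example, a structured-output sketch: pin the format in the instructions, then validate the reply before anything downstream consumes it (the schema is illustrative):

```python
import json

# Structured Output: fix the reply format so downstream code can rely on it.
SYSTEM = (
    "Reply ONLY with JSON matching this shape: "
    '{"title": str, "year": int, "reasons": [str]}'
)

def parse_ranking(raw: str) -> dict:
    """Validate the model's reply; malformed output fails fast here,
    not deep inside the rest of the system."""
    data = json.loads(raw)
    assert isinstance(data["title"], str) and isinstance(data["year"], int)
    return data

# An illustrative raw reply from the model:
raw = '{"title": "Blade Runner", "year": 1982, "reasons": ["world-building"]}'
print(parse_ranking(raw)["title"])  # -> Blade Runner
```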
---
## Building Multi-Agent Systems
Production-scale AI will often use **multiple specialised agents**, each with tailored context:
Example: *Multi-agent movie ranker*
- **Chatbot Agent** — Talks to the user
- **Safety Agent** — Filters malicious input
- **Preference Agent** — Applies user-specific filters
- **Critic Agent** — Combines facts for the final ranking
Agents pass outputs into each other’s context windows — much like API calls.
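A minimal sketch of those hand-offs (the agent internals are stubbed; a real system would back each with its own LLM call and curated context):

```python
from dataclasses import dataclass

@dataclass
class Message:
    """The 'API contract' passed between agents' context windows."""
    content: str
    safe: bool = True

def safety_agent(msg: Message) -> Message:
    # Stub: a real agent would be an LLM call with a safety-focused context.
    msg.safe = "ignore previous instructions" not in msg.content.lower()
    return msg

def preference_agent(msg: Message, likes: list[str]) -> Message:
    # Enrich the context with user-specific preferences.
    msg.content += f"\nUser preferences: {', '.join(likes)}"
    return msg

def critic_agent(msg: Message) -> str:
    # Stub: would combine retrieved facts with the enriched context.
    return f"Final ranking, based on context:\n{msg.content}"

# Each agent's output becomes part of the next agent's context window.
msg = safety_agent(Message("What is the greatest sci-fi movie of all time?"))
if msg.safe:
    print(critic_agent(preference_agent(msg, likes=["cyberpunk", "1980s"])))
```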
---
## Key Takeaways
To engineer context effectively:
1. Treat LLMs as **analysts**, not oracles.
2. Own the **whole context window**, not just the user prompt.
3. Use **tested patterns** for reliability and reuse.
4. Treat agent-to-agent handovers as **API contracts**.
By doing so, we bring the rigor of **software engineering** to **context engineering** — enabling accurate, maintainable, and scalable AI systems.
---
**Reference:** [Original Post](https://chrisloy.dev/post/2025/08/03/context-engineering)