The Road to Rebuilding LangChain Chatbots and Lessons Learned
Background
Every successful platform needs reliable support. We found our engineers were spending hours chasing answers to technical questions—creating a critical bottleneck for users.
We decided to solve this using our own stack: LangChain, LangGraph, and LangSmith. We initially built chat.langchain.com as a prototype with two primary purposes:
- Product Q&A — Provide instant, authoritative answers to product questions for both users and internal staff.
- Customer Prototype — Demonstrate how customers can build sophisticated, reliable agents using LangChain tools.
However, our support engineers weren’t actively using the chatbot. Solving that adoption problem taught us how to build reliable, production-grade agents our customers could adapt.
---
The Challenge
Engineers didn't avoid Chat LangChain because it was broken; they needed deeper guidance than documentation alone could provide.
Their typical workflow when investigating an issue:
- Search documentation (docs.langchain.com) — to understand the intended behavior.
- Check the knowledge base (support.langchain.com) — to see how similar issues were resolved in practice.
- Inspect code — use Claude Code to search and verify the implementation.
> Flow: Docs → Knowledge Base → Codebase.
> Docs provide the official narrative, the KB contains real-world fixes, and code is the ground truth.
We realized the chatbot needed to embed this exact three-step troubleshooting flow.
---
Automating the Workflow
We built an internal Deep Agent with:
- Documentation search agent
- Knowledge base search agent
- Codebase search agent
Each subagent:
- Asks follow-up questions
- Filters irrelevant results
- Passes refined insights to a main orchestrator agent
The orchestrator synthesizes inputs into actionable, verified answers—with citations and exact code lines.
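This is essentially the architecture the open-source `deepagents` package provides; a minimal sketch, assuming its `create_deep_agent` API, with illustrative names and prompts rather than our production configuration:
```python
from deepagents import create_deep_agent

# Illustrative subagent configs; the real prompts and tool bindings are more detailed.
docs_subagent = {
    "name": "docs-search",
    "description": "Searches docs.langchain.com and returns only the key passages.",
    "prompt": "Search the documentation, filter irrelevant hits, refine and repeat.",
}
kb_subagent = {
    "name": "kb-search",
    "description": "Scans support.langchain.com titles, then reads the best articles.",
    "prompt": "Scan titles first; read only the most relevant articles in full.",
}
code_subagent = {
    "name": "code-search",
    "description": "Greps the codebase for ground-truth implementation details.",
    "prompt": "Search patterns, list directories, read exact lines before answering.",
}

# The orchestrator sees only each subagent's distilled findings, never raw results.
agent = create_deep_agent(
    tools=[],  # shared tools, if any; each subagent binds its own in practice
    instructions="Synthesize subagent findings into a verified, cited answer.",
    subagents=[docs_subagent, kb_subagent, code_subagent],
)
```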
---
Realizing Public Chat LangChain Needed an Upgrade
The public chatbot still used a chunk → embedding → vector search pattern, which:
- Required constant reindexing whenever docs changed
- Fragmented context across chunk boundaries
- Produced incomplete citations
Our internal Deep Agent was higher quality and more precise, and it needed to be public.
---
Designing the New Agent Architecture
Category 1: Documentation & KB Questions
Tool: createAgent — minimal overhead, executes tools fast.
Why:
- No planning/orchestration phase.
- Answers most questions in 3–6 tool calls within seconds.
Models Offered:
- Claude Haiku 4.5 — fastest, high accuracy in tool execution
- GPT‑4o Mini / GPT‑4o‑nano — alternatives
Optimization:
- Used LangSmith to trace conversations
- Reduced unnecessary tool calls
- Improved follow-up prompting
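Putting Category 1 together: a minimal sketch using the Python `create_agent` equivalent, with an illustrative model identifier and prompt (the tools are the ones shown later in this post):
```python
from langchain.agents import create_agent

docs_agent = create_agent(
    model="anthropic:claude-haiku-4-5",  # illustrative model id
    tools=[SearchDocsByLangChain, search_support_articles, get_article_content],
    system_prompt=(
        "Answer documentation and knowledge-base questions. Search broadly, "
        "read the best hits in full, and cite the exact pages you used."
    ),
)

result = docs_agent.invoke(
    {"messages": [{"role": "user", "content": "How do I stream tool calls?"}]}
)
```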
---
Category 2: Codebase Questions
Tool: Deep Agent with domain-specific subgraphs
Subagents:
- Docs search
- KB search
- Code search
Benefits:
- Prevents overload in main agent
- Allows deep digging per domain
Tradeoff:
Slower (1–3 min) for complex queries, but far more thorough.
We initially rolled this out to a select group of users.
---
Why We Removed Vector Embeddings for Docs
Embedding-based RAG is great for unstructured content (PDFs), but for our structured docs it caused:
- Broken context from chunking
- Constant reindexing overhead
- Vague citations
Solution: Direct API access to existing structure.
- Full pages from Mintlify API
- Title-first search in KB, then full article read
- Code search via uploaded repos to LangGraph Cloud + `ripgrep`
---
Smart Prompting for Human-Like Search
Instead of similarity scores, the agent searches like a human:
- Broad keyword search
- Evaluate results critically
- Refine search terms until the right context is found
This iterative process encourages “active research” rather than passive retrieval.
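In practice this guidance lives in the system prompt; a sketch of the kind of instructions we mean (the wording is illustrative):
```python
SEARCH_GUIDANCE = """\
When searching:
1. Start broad: use keywords taken directly from the user's question.
2. Evaluate each result critically: does it actually answer the question?
3. If not, refine the terms (synonyms, error strings, exact API names) and retry.
4. Stop only once you have context that directly supports a cited answer.
"""
```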
---
Tool Design Highlights
Docs Search (Mintlify)
Returns full pages with headers, subsections, and examples.
```python
from typing import Optional

import requests
from langchain_core.tools import tool

# MINTLIFY_API_URL and _format_search_results are defined elsewhere in the service.

@tool
def SearchDocsByLangChain(query: str, page_size: int = 5, language: Optional[str] = None) -> str:
    """Search LangChain documentation via Mintlify API"""
    params = {"query": query, "page_size": page_size}
    if language:
        params["language"] = language
    response = requests.get(MINTLIFY_API_URL, params=params)
    return _format_search_results(response.json())
```
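As a `@tool`, it can be invoked directly for testing, e.g. `SearchDocsByLangChain.invoke({"query": "streaming", "page_size": 3})`.
---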
KB Search (Pylon)
Two-step:
- Scan titles
- Read selected articles in full
```python
from typing import List

from langchain_core.tools import tool

@tool
def search_support_articles(collections: str = "all", limit: int = 50) -> str:
    """Get article titles to scan"""
    ...

@tool
def get_article_content(article_ids: List[str]) -> str:
    """Read the most relevant articles"""
    ...
```
---
Code Search
Three tools:
- Search patterns (`ripgrep`)
- List directories (file structure)
- Read files (exact lines)
```python
from typing import Optional

from langchain_core.tools import tool

@tool
def search_public_code(pattern: str, path: Optional[str] = None) -> str:
    """Search the uploaded repositories for a pattern (backed by ripgrep)."""
    ...
```
Managing Context Overload with Subgraphs
Before: A single agent saw every raw search result, which created too much noise.
Now: Each subagent returns only its distilled, high-signal findings to the orchestrator.
Benefits:
- Precision
- Reduced irrelevant details
- Cleaner synthesis
---
Production-Ready Infrastructure
Added middleware:
```python
middleware = [
    guardrails_middleware,
    model_retry_middleware,
    model_fallback_middleware,
    anthropic_cache_middleware,
]
```
Handles:
- Off-topic filtering
- API retries
- Model fallback
- Response caching
---
Delivering to Users
Thread Handling:
```ts
const userThreads = await client.threads.search({
  metadata: { user_id: userId },
  limit: THREAD_FETCH_LIMIT,
})
```
Streaming Responses:
```ts
const streamResponse = client.runs.stream(threadId, "docs_agent", {...})
```
Modes:
- messages — progressive tokens
- updates — live tool calls
- values — final state
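The snippets above use the JS SDK; a minimal Python equivalent with `langgraph_sdk`, assuming a placeholder deployment URL:
```python
import asyncio

from langgraph_sdk import get_client

async def main() -> None:
    client = get_client(url="http://localhost:2024")  # placeholder URL
    thread = await client.threads.create()
    async for part in client.runs.stream(
        thread["thread_id"],
        "docs_agent",
        input={"messages": [{"role": "user", "content": "How do I add middleware?"}]},
        stream_mode="messages",  # "updates" and "values" work the same way
    ):
        print(part.event, part.data)

asyncio.run(main())
```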
---
Results
- Sub-15-second answers with createAgent for docs/KB questions
- Immediate reflection of doc updates
- Precise citations with verifiable links
- Engineers use Deep Agent for complex tickets → hours saved
---
Key Takeaways
- Automate proven user workflows
- Vector embeddings not ideal for structured content
- Direct structured access yields better citations/context
- Mirror human reasoning patterns in tool/agent design
- Use Deep Agents + subgraphs to manage multi-domain context
- Infrastructure (middleware) matters for production reliability
---
What’s Next
- Public codebase search — Direct repo verification + line citations
---
Try It
Visit chat.langchain.com and test models:
- Claude Haiku 4.5 (fastest)
- GPT‑4o Mini
- GPT‑4o‑nano
---
Bottom line: Whether debugging code, answering product questions, or scaling content workflows, combining well-designed agent architectures with human-like reasoning and direct access to structured content yields faster, more accurate, and verifiable responses, ready for both internal users and public audiences.