
Dynamic Middleware: Adapting Prompts, Tools, and Models at Runtime

Every piece of middleware Aria has used so far behaves identically no matter who's talking to her. But Beacon & Co. isn't just Julie — Aria might eventually field questions from colleagues and external clients, and those two groups shouldn't get the same treatment. A colleague asking about catalog numbers is a normal internal question. A client asking the same thing might be asking for data they shouldn't see at all.

This article covers dynamic middleware: middleware that adapts the prompt, the available tools, and even the underlying model based on who's asking and what has happened in the conversation. It builds on the UserRole-style context from Article 7 and the SQL tool from Article 9.

🔴 Skill level: Advanced.

Quick Reference

When to use this: Whenever an agent's prompt, available tools, or model choice should change based on context (who's asking) or state (what's happened so far), rather than staying fixed.

Basic syntax:

from langchain.agents.middleware import wrap_model_call, dynamic_prompt, ModelRequest, ModelResponse

@dynamic_prompt
def my_prompt(request: ModelRequest) -> str:
    return "..." if request.runtime.context.user_role == "internal" else "..."

@wrap_model_call
def my_middleware(request: ModelRequest, handler) -> ModelResponse:
    request = request.override(tools=[...])  # or model=..., system_prompt=...
    return handler(request)

Common patterns:

  • @dynamic_prompt — a simple decorator just for generating a system prompt dynamically
  • @wrap_model_call — a general-purpose decorator that can override tools, model, or prompt before the model is actually called
  • request.override(...) changes settings for this one call only — it never changes the agent's permanent configuration

Gotchas:

  • ⚠️ request.override(...) is temporary — it affects only the specific call currently being processed, not future calls or the agent definition itself.
  • ⚠️ @dynamic_prompt and @wrap_model_call look similar but aren't interchangeable: @dynamic_prompt only generates a prompt string; @wrap_model_call can change anything about the request.

See also: Runtime Context: Injecting User-Specific Data, Managing Conversation History: Summarization and Trimming

What You Need to Know First

What We'll Cover in This Article

  • A new kind of middleware that wraps the model call itself
  • How to generate a different system prompt depending on who's asking
  • How to restrict which tools are available depending on who's asking
  • How to switch the underlying model based on how long the conversation has run

What We'll Explain Along the Way

  • The difference between @dynamic_prompt and @wrap_model_call
  • What request.override(...) actually changes, and for how long

A New Kind of Middleware: Wrapping the Model Call

The @before_agent middleware from Article 12 runs once, before the agent starts processing a new message. @wrap_model_call is more specific: it wraps the moment the agent is actually about to call the model, letting you inspect or change the request right before that happens — then you explicitly call handler(request) to let it continue.

Think of it like a manager who reviews a request right before it goes out, with the authority to adjust some of the details — but only for that one specific request, not as a permanent policy change. That's exactly what request.override(...) does: it modifies settings for the call currently being processed, and nothing beyond it.
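
To make the "one request only" idea concrete, here is a minimal plain-Python sketch. It does not use LangChain: ToyRequest and its override method are hypothetical stand-ins, written on the assumption that an override produces a modified copy of the request rather than mutating the original.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyRequest:
    """Hypothetical stand-in for a model request (not LangChain's ModelRequest)."""
    model: str
    tools: tuple[str, ...]

    def override(self, **changes) -> "ToyRequest":
        # Return a modified copy; the original object is left untouched.
        return replace(self, **changes)


base = ToyRequest(model="gpt-5-nano", tools=("web_search", "sql_query"))
narrowed = base.override(tools=("web_search",))

print(narrowed.tools)  # ('web_search',)
print(base.tools)      # ('web_search', 'sql_query')
```

The base request still lists both tools after the override, which is the behavior the manager analogy describes: the adjustment travels with one request and nothing else.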

We'll build up three uses of this pattern, in order of complexity, all using the same context shape so each section extends the last rather than starting over.

Dynamic Prompts: Different Tone for Different Users

Let's start with the simplest version: @dynamic_prompt, a decorator specifically for generating a system prompt based on context.

# Purpose: Define context distinguishing internal colleagues from external clients
# Context: Extends the context pattern from Article 7 with a new field
# Input: N/A — this defines a shape
# Output: A reusable context class

from dotenv import load_dotenv
load_dotenv()

from dataclasses import dataclass

@dataclass
class UserRole:
    user_role: str = "external"  # "internal" or "external"

# Purpose: Generate a different system prompt depending on user_role
# Context: @dynamic_prompt only needs to return a string — the simplest
# form of dynamic middleware
# Input: The current request, including its context
# Output: A system prompt string, chosen based on who's asking

from langchain.agents.middleware import dynamic_prompt, ModelRequest

@dynamic_prompt
def aria_dynamic_prompt(request: ModelRequest) -> str:
    """Generate Aria's system prompt based on whether the user is internal or external."""
    user_role = request.runtime.context.user_role

    if user_role == "internal":
        return (
            "You are Aria, an internal assistant for Beacon & Co. staff. "
            "You can be direct and technical; your audience knows the business."
        )
    else:
        return (
            "You are Aria, representing Beacon & Co. to external contacts. "
            "Be warm, professional, and careful not to share internal details."
        )

# Purpose: See the prompt change based on context, with no other code different
# Context: Same agent, same question, different context value
# Input: A simple greeting
# Output: A response in a noticeably different tone depending on user_role

from langchain.agents import create_agent
from langchain.messages import HumanMessage

agent = create_agent(
    model="gpt-5-nano",
    context_schema=UserRole,
    middleware=[aria_dynamic_prompt],
)

response = agent.invoke(
    {"messages": [HumanMessage(content="Hi, can you help me with something?")]},
    context=UserRole(user_role="external"),
)
print(response["messages"][-1].content)

Try the same call again with UserRole(user_role="internal") and compare — same agent, same question, genuinely different tone, entirely from context.
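
If you want to check the role-to-prompt logic without calling a model at all, one option is to keep the selection in a plain function. The sketch below is an assumption-labeled illustration, not part of the article's agent: PROMPTS and choose_prompt are hypothetical names, and falling back to the external prompt for unknown roles is a design choice made here, not something the library does for you.

```python
# Hypothetical lookup table: one prompt per known role.
PROMPTS = {
    "internal": (
        "You are Aria, an internal assistant for Beacon & Co. staff. "
        "You can be direct and technical; your audience knows the business."
    ),
    "external": (
        "You are Aria, representing Beacon & Co. to external contacts. "
        "Be warm, professional, and careful not to share internal details."
    ),
}


def choose_prompt(user_role: str) -> str:
    """Pick a system prompt; unknown roles get the more cautious external prompt."""
    return PROMPTS.get(user_role, PROMPTS["external"])


print(choose_prompt("internal") == PROMPTS["internal"])    # True
print(choose_prompt("contractor") == PROMPTS["external"])  # True
```

A function like this could be called from inside the decorated prompt function, which keeps the part worth unit-testing free of any model or API key.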

Dynamic Tools: Restricting Access Based on Who's Asking

A different tone is one thing. Restricting capabilities is the more consequential use case — recall Article 9's SQL tool, which can query Beacon & Co.'s internal catalog. An external client almost certainly shouldn't have that same access. This needs @wrap_model_call, since it changes something beyond just the prompt:

# Purpose: Restrict which tools are available based on user_role
# Context: External users get web_search only; internal users get SQL access too
# Input: The current request, including context
# Output: A request with tools overridden based on who's asking

from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from typing import Callable

@wrap_model_call
def dynamic_tool_access(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelResponse:
    """Restrict tool access based on whether the user is internal or external."""
    user_role = request.runtime.context.user_role

    # Internal users keep every tool configured on the agent.
    # Anyone else is narrowed to web search only.
    # web_search is assumed to be defined already, as in the earlier articles.
    if user_role != "internal":
        request = request.override(tools=[web_search])

    return handler(request)

Before wiring this in — predict what happens if an external user asks a question that genuinely requires the SQL catalog data, like "How many tracks does our top artist have?" Take a moment.

...

Aria won't be able to answer accurately — sql_query simply isn't in her available tools for this request, so she'll either say she can't access that information, or answer using only web_search, which won't have Beacon & Co.'s actual internal data. That's the restriction working as intended, not a bug.

# Purpose: Confirm the restriction actually works
# Context: Same question, two different contexts
# Input: A question requiring internal catalog data
# Output: A capable answer for "internal", a limited one for "external"

agent = create_agent(
    model="gpt-5-nano",
    tools=[web_search, sql_query],  # both tools configured by default
    middleware=[dynamic_tool_access],
    context_schema=UserRole,
)

question = HumanMessage(content="How many tracks does our top artist have in the catalog?")

# External: should NOT be able to access sql_query
response_external = agent.invoke({"messages": [question]}, context=UserRole(user_role="external"))
print(response_external["messages"][-1].content)

# Internal: SHOULD be able to access sql_query
response_internal = agent.invoke({"messages": [question]}, context=UserRole(user_role="internal"))
print(response_internal["messages"][-1].content)

Notice that tools=[web_search, sql_query] is passed to create_agent as usual — the middleware is what narrows that list down for specific calls, not the original configuration itself.
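
As roles multiply, an if/else per role gets harder to audit. A possible alternative, sketched here in plain Python with tool names standing in for real tool objects, is a per-role allowlist. TOOLS_BY_ROLE and allowed_tools are hypothetical names introduced for illustration, and defaulting unknown roles to the external allowlist is an assumption about what "safe" means for Beacon & Co.

```python
# Hypothetical allowlist: which tool names each role may use.
TOOLS_BY_ROLE = {
    "internal": {"web_search", "sql_query"},
    "external": {"web_search"},
}


def allowed_tools(configured: list[str], user_role: str) -> list[str]:
    """Filter the configured tools down to those the role may use.

    Unknown roles are treated as external, the most restricted option here.
    """
    allowed = TOOLS_BY_ROLE.get(user_role, TOOLS_BY_ROLE["external"])
    return [name for name in configured if name in allowed]


configured = ["web_search", "sql_query"]
print(allowed_tools(configured, "internal"))  # ['web_search', 'sql_query']
print(allowed_tools(configured, "external"))  # ['web_search']
print(allowed_tools(configured, "unknown"))   # ['web_search']
```

Filtering the configured list, rather than building a new list from scratch, means a role can never be handed a tool the agent was not configured with in the first place.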

Dynamic Models: Escalating to a Stronger Model for Long Conversations

The last piece doesn't depend on context at all — it depends on state: specifically, how long the conversation has gotten. A short, simple exchange might be handled perfectly well by an efficient, inexpensive model. A long, complicated thread might benefit from a more capable one.

# Purpose: Switch models based on how long the conversation has run
# Context: Uses request.messages (a shortcut for request.state["messages"]),
# not context — conversation length is state, not a fixed per-user fact
# Input: The current request, including its message history
# Output: A request with the model overridden based on conversation length

from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from langchain.chat_models import init_chat_model
from typing import Callable

standard_model = init_chat_model("gpt-5-nano")
capable_model = init_chat_model("gpt-5") # a more capable model, same provider

@wrap_model_call
def escalate_for_long_conversations(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelResponse:
    """Use a more capable model once a conversation has gotten long."""
    message_count = len(request.messages)  # shortcut for request.state["messages"]

    if message_count > 10:
        request = request.override(model=capable_model)
    else:
        request = request.override(model=standard_model)

    return handler(request)

# Purpose: Confirm the model actually changes based on conversation length
# Context: A short conversation vs. a long one, same agent
# Input: Two different message lists
# Output: response_metadata showing different model names were actually used

agent = create_agent(
    model="gpt-5-nano",
    middleware=[escalate_for_long_conversations],
    system_prompt="You are Aria, a helpful assistant for Beacon & Co.",
)

short_response = agent.invoke(
    {"messages": [HumanMessage(content="What time is it?")]}
)
print(short_response["messages"][-1].response_metadata["model_name"])

long_conversation = [HumanMessage(content=f"Question {i}") for i in range(11)]
long_response = agent.invoke({"messages": long_conversation})
print(long_response["messages"][-1].response_metadata["model_name"])

The two model_name values should differ — confirming the model itself was swapped mid-flight, based purely on how many messages were already in the conversation, with no change to how create_agent was originally configured.
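
The threshold is easy to get wrong by one, so it can help to check the decision rule on its own before involving a model. This is a minimal sketch of the same "more than 10 messages" rule; ESCALATION_THRESHOLD and pick_model_name are hypothetical names, and the function returns model names as strings rather than model objects.

```python
ESCALATION_THRESHOLD = 10  # same cutoff as the middleware above


def pick_model_name(message_count: int) -> str:
    """Return the model name the escalation rule would choose."""
    if message_count > ESCALATION_THRESHOLD:
        return "gpt-5"
    return "gpt-5-nano"


for count in (1, 10, 11):
    print(count, pick_model_name(count))
# 1 gpt-5-nano
# 10 gpt-5-nano
# 11 gpt-5
```

Note that exactly 10 messages still uses the standard model, because the comparison is strictly greater than. That is why the test conversation above needs 11 messages, not 10.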

💡 We used two models from the same provider here to avoid requiring a second API key. If you want to mix providers entirely (say, an OpenAI model for routine work and a different provider's model for escalation), you'd set up that provider's API key the same way you did for OpenAI in Article 1.

Common Misconceptions

❌ Misconception: request.override(...) permanently changes the agent

Reality: override() only affects the specific request currently being processed by that middleware function. The agent's original configuration — whatever was passed to create_agent — is untouched, and the next call starts fresh from that original configuration again.

Why this matters: You don't need to "reset" anything between calls — each call independently goes through the middleware and gets its own override decision, based on that call's own context and state.

❌ Misconception: @dynamic_prompt and @wrap_model_call are interchangeable

Reality: @dynamic_prompt is a narrower, simpler tool — it only generates a system prompt string. @wrap_model_call is general-purpose — it can override tools, model, prompt, or anything else about the request.

Why this matters: If you only need to change the prompt, @dynamic_prompt is simpler and clearer about intent. Reach for @wrap_model_call when you need to change something beyond just the prompt text.

Troubleshooting Common Issues

Problem: AttributeError accessing request.runtime.context

Symptoms: Middleware reading context fails with an attribute error.

Common Causes:

  1. context_schema wasn't set on create_agent, so there's no context shape to read (most common)
  2. No context=... was passed to invoke(), and the dataclass has no usable default

Solution: Confirm context_schema=UserRole (or your equivalent) is set on create_agent, and that a context=... value is passed to every invoke() call that needs it.
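
If you would rather have middleware degrade safely than raise, a defensive read is one option. The sketch below uses plain stand-in objects instead of a real runtime; read_user_role is a hypothetical helper, and it assumes that a missing context shows up as None or as an object without a user_role attribute.

```python
from types import SimpleNamespace


def read_user_role(runtime) -> str:
    """Read user_role defensively, falling back to the restricted 'external' role."""
    context = getattr(runtime, "context", None)
    # getattr on None simply returns the default, so no extra None check is needed.
    return getattr(context, "user_role", "external")


# Stand-ins for a runtime with and without a usable context.
with_context = SimpleNamespace(context=SimpleNamespace(user_role="internal"))
without_context = SimpleNamespace(context=None)

print(read_user_role(with_context))     # internal
print(read_user_role(without_context))  # external
```

Falling back to the most restricted role means a forgotten context=... argument narrows what Aria can do instead of widening it.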

Problem: The model never seems to switch in the dynamic models example

Symptoms: response_metadata["model_name"] is the same regardless of conversation length.

Common Causes:

  1. The message count threshold (> 10) wasn't actually crossed in your test
  2. The middleware wasn't included in the middleware=[...] list passed to create_agent

Solution: Double-check your test conversation actually exceeds the threshold, and confirm the middleware function is present in middleware=[...].

Check Your Understanding

Quick Quiz

  1. What's the key difference between @dynamic_prompt and @wrap_model_call?

    Answer:

    @dynamic_prompt only generates a system prompt string. @wrap_model_call is general-purpose and can override anything about the request — tools, model, prompt, or more — before the model is actually called.

  2. Does request.override(tools=[web_search]) change what tools are available in future, separate conversations?

    Answer:

    No — the override only applies to the specific call currently being processed. The agent's original tool configuration is untouched, and future calls go through the same middleware logic fresh, based on their own context.

  3. Why does the dynamic-model example use request.messages (state) instead of request.runtime.context?

    Answer:

    Because conversation length is information that accumulates during the conversation — that's state, not a fixed per-user fact set by your code in advance. Context wouldn't naturally capture something that changes as messages are exchanged.

Hands-On Exercise

Challenge: Combine all three patterns into one agent: dynamic prompt based on user_role, dynamic tools based on user_role, and dynamic model escalation based on conversation length — all as separate middleware functions in the same middleware=[...] list.

Solution:

agent = create_agent(
    model="gpt-5-nano",
    tools=[web_search, sql_query],
    context_schema=UserRole,
    middleware=[
        aria_dynamic_prompt,
        dynamic_tool_access,
        escalate_for_long_conversations,
    ],
)

Explanation: Middleware functions in the list are independent — each handles its own concern (prompt, tools, model) without needing to know about the others, which is exactly why combining them is just a matter of listing all three.
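
To see why independent wrappers combine cleanly, here is a toy model of wrap-style middleware in plain Python. Nothing here is LangChain code: requests are dictionaries, fake_model stands in for the model call, and compose is a hypothetical helper. The nesting order it uses (first listed runs first) is an assumption of this sketch, so check the library documentation before relying on ordering in a real agent.

```python
from functools import reduce
from typing import Callable

Request = dict
Handler = Callable[[Request], str]
Middleware = Callable[[Request, Handler], str]


def set_prompt(request: Request, handler: Handler) -> str:
    return handler({**request, "system_prompt": "external prompt"})


def limit_tools(request: Request, handler: Handler) -> str:
    return handler({**request, "tools": ["web_search"]})


def pick_model(request: Request, handler: Handler) -> str:
    return handler({**request, "model": "gpt-5"})


def fake_model(request: Request) -> str:
    """Stand-in for the real model call: just report what it received."""
    return f"{request['model']} | {request['tools']} | {request['system_prompt']}"


def compose(middlewares: list[Middleware], final: Handler) -> Handler:
    """Nest the wrappers so the first one listed runs first."""
    def wrap(next_handler: Handler, mw: Middleware) -> Handler:
        return lambda request: mw(request, next_handler)
    return reduce(wrap, reversed(middlewares), final)


run = compose([set_prompt, limit_tools, pick_model], fake_model)
original = {"model": "gpt-5-nano", "tools": ["web_search", "sql_query"], "system_prompt": ""}
result = run(original)

print(result)             # gpt-5 | ['web_search'] | external prompt
print(original["model"])  # gpt-5-nano
```

Each wrapper changes a different key and passes a fresh copy along, so none of them needs to know the others exist, and the original request is unchanged afterward.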

Summary: Key Takeaways

  • @wrap_model_call wraps the moment right before a model is actually called, letting middleware inspect or change the request first
  • @dynamic_prompt is a simpler, narrower decorator just for generating prompts dynamically
  • request.override(...) changes settings for one call only — never the agent's permanent configuration
  • Dynamic prompts and dynamic tools typically read from context (fixed, set by your code); dynamic models in this example read from state (accumulates during the conversation)
  • Aria can now adapt her tone, her available tools, and even her underlying model — all automatically, based on who's asking and what's happened so far

Version Information

Tested with:

  • Python: >=3.10, <4.0
  • langchain: >=1.1.3 (latest stable as of writing: 1.3.4) — wrap_model_call, dynamic_prompt, and init_chat_model are all part of core langchain

Known issues:

  • ⚠️ If mixing model providers (not just model names within one provider), you'll need that provider's separate API key, set up the same way as Article 1's OpenAI key.

What's Next?

You now understand how to make every part of an agent's behavior — prompt, tools, and even model — adapt dynamically, based on context and state.

The natural next step is Capstone: A Production-Ready Authenticated Email Assistant — the final article in this series, where everything from prompting to dynamic middleware to human-in-the-loop comes together into one complete, realistic version of Aria.

References