1. Introduction
RAAF Core is the foundation of the Ruby AI Agents Factory. It provides the essential classes and patterns for building AI agents. Understanding these core concepts is crucial for building effective AI systems.
1.1. The Agent-Based Architecture Philosophy
Most AI frameworks treat agents as black boxes—you provide a prompt and receive a response with little control over the process. RAAF takes a different approach by decomposing AI systems into discrete, composable components that can be understood, tested, and modified independently.
This architectural approach mirrors proven patterns from software engineering: just as web applications are built with HTTP, routing, and MVC patterns, AI systems benefit from structured patterns around agents, tools, and memory management. These patterns aren't just implementation details—they're the architectural foundations that determine system maintainability and effectiveness.
1.2. Core Architectural Concepts
Separation of Concerns: Each component handles a specific aspect of AI system operation. Agents define behavior, runners manage execution, tools provide capabilities, and memory handles context. This separation enables independent development, testing, and modification of each component.
Stateless Design: Agents are stateless templates that define behavior without maintaining conversation state. This design enables concurrent execution, simplified testing, and better scalability.
Composability: Components can be combined and recombined to create different system behaviors. The same agent can be used with different tools, different memory configurations, and different providers.
Explicit State Management: Rather than hiding state in agent internals, RAAF makes conversation state explicit and manageable through the runner component.
1.3. Core Design Principles
Orthogonality: Each component has a single, well-defined responsibility. The memory system operates independently of the tool system, providers can be swapped without affecting agents, and tool implementations can change without agent modifications. This separation of concerns enables rapid iteration and effective debugging.
Reversibility: Design decisions should be easily changeable. Provider selection, tool implementations, and memory strategies can be modified without system-wide changes. This flexibility provides insurance against changing requirements and evolving AI capabilities.
Incremental Development: Start with simple, working systems and add complexity gradually. A basic agent with one tool provides immediate value and serves as a foundation for more sophisticated capabilities. This approach reduces risk and accelerates development.
Explicit Error Handling: Problems are surfaced early and clearly through structured error responses. Tools return standardized error information, agents validate inputs, and memory systems handle context overflow gracefully. This explicit approach prevents mysterious failures and enables effective debugging.
Predictable Behavior: System behavior should be deterministic and testable. Agents behave consistently across different contexts, tools produce reliable outputs, and memory management follows predictable patterns.
1.4. Core Components and Their Interactions
Agent: The behavioral template that defines how an AI assistant should operate. Agents specify instructions, available tools, and interaction patterns, but maintain no conversation state. This stateless design enables reuse across multiple concurrent conversations.
Runner: The execution engine that manages conversation state and coordinates between agents and AI providers. Runners handle context management, tool execution, and response generation while maintaining isolation between different conversations.
Tools: The interface layer that enables agents to interact with external systems. Tools provide standardized ways for agents to access databases, APIs, file systems, and other resources. They abstract implementation details while providing reliable, testable interfaces.
Memory: The context management system that handles conversation history, token limits, and state persistence. Memory systems can employ different strategies for context retention, from simple sliding windows to sophisticated semantic retrieval.
Providers: The abstraction layer over AI model APIs. Providers normalize different AI service interfaces (OpenAI, Claude, etc.) into a consistent interface, enabling provider independence and flexible model routing.
1.5. Component Interaction Model
These components interact through well-defined interfaces that maintain separation of concerns. The runner coordinates between components without creating tight coupling, enabling independent modification and testing. This architecture provides practical benefits: you can modify memory strategies without affecting tool implementations, swap providers without changing agent definitions, and add new tools without touching core execution logic.
2. Agent Architecture
2.1. Conceptual Foundation
An agent in RAAF represents a behavioral template for AI assistants—a specification of how an AI should behave, what tools it can use, and how it should interact with users. The fundamental insight is that agents are stateless templates, not stateful conversational entities.
2.2. The Template vs. Instance Model
Traditional AI systems often conflate the definition of behavior (what the AI should do) with the execution of that behavior (tracking a specific conversation). RAAF separates these concerns:
- Agent: Defines consistent behavior patterns, available tools, and interaction rules (stateless by design)
- Runner Instance: Manages specific conversation state and coordinates template execution
This separation enables one agent to power thousands of concurrent conversations while maintaining complete isolation between them.
2.3. Benefits of Stateless Design
Concurrent Execution: Multiple conversations can use the same agent simultaneously without interference or resource conflicts.
Predictable Testing: Each test execution starts with a clean behavioral template, eliminating test failures caused by residual state from previous executions.
Simplified Debugging: Problems are reproducible because conversation state is explicitly managed by the runner component rather than hidden within agent internals.
Flexible Composition: Agents can be combined with different tools, memory configurations, and providers without concern for internal state conflicts.
Horizontal Scaling: Agents can be distributed across multiple servers since they contain no conversation-specific state.
2.4. Architectural Implications
This stateless design has profound implications for system architecture. It enables microservice deployment patterns, simplifies load balancing, and eliminates the complex state synchronization typically required in distributed AI systems. The agent becomes a pure functional specification that can be cached, replicated, and executed anywhere without coordination overhead.
2.5. Understanding Statelessness
The stateless nature of agents means they contain no conversation-specific information. They define capabilities and behavior patterns but maintain no memory of previous interactions. This is analogous to a class definition in object-oriented programming—the class defines behavior, but instances maintain state.
What Agents Contain: - Behavioral instructions and personality - Available tool definitions and configurations - Response formatting and validation rules - Error handling and escalation patterns
What Agents Don't Contain: - Conversation history or context - User-specific information or preferences - Session state or temporary data - Execution metrics or performance data
2.6. Practical Implications
This architectural choice enables practical benefits at scale. A single agent can serve thousands of concurrent users without resource conflicts or cross-contamination. Customer service departments can use one agent definition across all representatives, ensuring consistent behavior while maintaining conversation isolation.
The stateless design also simplifies deployment and maintenance. Agents can be updated and deployed independently of running conversations, enabling continuous improvement without service interruption.
2.7. Agent Creation
Creating an agent involves defining three fundamental components: identity, behavior, and capabilities.
agent = RAAF::Agent.new(
name: "DataAnalyst",
instructions: "You are a data analyst...",
model: "gpt-4o"
)
2.8. Essential Agent Components
Identity (Name): Provides a unique identifier for the agent, crucial for multi-agent scenarios, tracing, and debugging. The name serves as a reference point for handoffs, monitoring, and system coordination.
Behavior (Instructions): Defines the agent's personality, expertise domain, and interaction patterns. Instructions shape how the agent interprets queries, formats responses, and handles various scenarios. Well-crafted instructions create consistent, predictable behavior across different contexts.
Capabilities (Model): Specifies the underlying AI model and its capabilities. Different models have different strengths, costs, and performance characteristics. The model choice affects response quality, speed, and operational costs.
2.9. Template Instantiation Philosophy
The agent serves as a behavioral specification that can be instantiated multiple times through different runners. Each instantiation maintains the same behavioral patterns while operating on different conversation contexts. This design enables:
- Consistent Behavior: All instances follow the same behavioral template
- Independent Execution: Each instance operates without affecting others
- Resource Efficiency: Templates can be shared across multiple conversations
- Simplified Management: Behavioral updates affect all instances uniformly
2.10. Design Considerations
Agents should be designed with reusability in mind. Instructions should be general enough to handle various scenarios within the domain while specific enough to provide consistent behavior. The agent becomes a behavioral contract that defines how the AI assistant will operate across different contexts.
Designing for Reusability: A well-designed agent can serve multiple use cases without modification. Consider a customer service agent - rather than creating separate agents for order inquiries, shipping questions, and return requests, design one agent with comprehensive instructions that covers all customer service scenarios. This approach reduces maintenance overhead and ensures consistent customer experience across different interaction types.
The key to reusable agent design lies in abstracting the core purpose while allowing flexibility in execution. Instructions should define the agent's expertise domain and behavioral guidelines without hardcoding specific responses or workflows. Tools provide the specific capabilities, context variables supply the runtime data, and the model selection determines the sophistication level - but the agent itself remains a stable, reusable component.
This reusability extends across different deployment contexts. The same agent definition can power a web chat interface, a mobile app, an API endpoint, or even voice interactions. By keeping agents stateless and context-agnostic, you create building blocks that compose into larger systems without modification.
2.11. Core Agent Properties
2.11.1. Name: System Identity
The agent name serves multiple architectural purposes beyond simple identification:
Multi-Agent Coordination: In systems with multiple agents, names enable handoffs, delegation, and coordination. Agents can reference each other by name for workflow management.
Tracing and Monitoring: Names provide context for logging, debugging, and performance monitoring. System administrators can track agent-specific metrics and behavior patterns.
Error Handling: Names help identify the source of issues in complex systems with multiple agents operating simultaneously.
name: "CustomerService" # Used for handoffs and tracing
instructions: "...",
model: "gpt-4o"
)
The name parameter provides a unique identifier for the agent that enables routing, monitoring, and debugging. The name becomes part of the agent's identity in multi-agent systems, and it's more than just a label—it's an architectural decision that affects how your system operates.
Names enable handoffs between agents. When your customer service agent realizes it needs legal expertise, it can transfer the conversation to a "LegalExpert" agent by name. This isn't just labeling—it's the foundation of agent coordination. Names also appear in your monitoring dashboards, error logs, and tracing systems. "Agent1" tells you nothing when debugging a production issue; "DatabaseQueryOptimizer" tells you exactly what went wrong and where to look. Good names reduce mean time to resolution.
Names matter because they enable handoffs. When your customer service agent realizes it needs legal expertise, it can transfer the conversation to a "LegalExpert" agent by name. Names also appear in your monitoring and debugging output, so choose them carefully. "Agent1" tells you nothing; "DatabaseQueryOptimizer" tells you exactly what went wrong when it fails.
A good naming convention might be: {Domain}{Role}
like "FinanceAnalyst", "ContentModerator", or "DataValidator". Be specific enough that your future self will understand the agent's purpose immediately.
2.11.2. Instructions
Instructions define the agent's behavior and personality. This is where you encode your domain expertise into the AI:
2.11.2.1. Effective Instruction Writing
Instructions are the primary way to encode domain knowledge into AI agents. Vague instructions lead to inconsistent behavior and poor user experiences.
The difference between effective and ineffective instructions is specificity and context:
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
ruby: /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-wxnxgq.rb:452: syntax errors found (SyntaxError) 450 | agent = RAAF::Agent.new( 451 | name: "CustomerServicePro", > 452 | instructions: <<~INSTRUCTIONS | ^~~~~~~~~~~~ unterminated heredoc; can't find string "INSTRUCTIONS" anywhere before EOF 453 | You are a senior customer service representative for TechCorp. 454 | ~~~~~~~ 490 | puts e.backtrace.first(3).join("\n") if e.backtrace 491 | exit 1 > 492 | end | ^ unexpected end-of-input, assuming it is closing the parent top level context | ^ unexpected end-of-input; expected a `)` to close the arguments > 493 | | ^ expected an `end` to close the `begin` statement
# Before: Vague and ineffective
agent = RAAF::Agent.new(
instructions: "Be a helpful customer service agent"
)
# After: Specific and effective
agent = RAAF::Agent.new(
name: "CustomerServicePro",
instructions: <<~INSTRUCTIONS
You are a senior customer service representative for TechCorp.
Your personality:
- Professional but warm (like a helpful colleague, not a robot)
- Patient with frustrated customers
- Solution-oriented rather than problem-focused
Your approach:
1. Acknowledge the customer's concern first
2. Ask clarifying questions if needed
3. Provide specific solutions, not generic advice
4. Always offer a next step
Example good response:
"I understand how frustrating that must be. Let me help you resolve this.
Can you tell me which version of the software you're using?"
Example bad response:
"Have you tried turning it off and on again?"
Never:
- Blame the customer
- Say "that's not possible" (say "let me find a solution" instead)
- End without offering next steps
INSTRUCTIONS,
model: "gpt-4o"
)
Specific instructions with examples and constraints produce more consistent and effective agent behavior.
2.11.2.2. What Makes Instructions Effective?
Role Definition: Don't just say what they are, say what level they are. "You are a customer service agent" vs. "You are a senior customer service representative with 10 years of experience"
Behavioral Examples: Show, don't just tell. Include examples of good and bad responses. AI models learn patterns from examples better than from abstract rules.
Specific Constraints: Instead of "be professional," say "use formal language but include personal touches like 'I understand' and 'Let me help'"
Domain Context: Include relevant background. If it's a technical product, mention that. If customers are typically frustrated, acknowledge that.
Test your instructions by asking the agent to explain its understanding of its role. If the explanation doesn't match your intent, refine the instructions.
2.11.3. Model
The AI model to use. RAAF supports many providers, and this choice has real-world implications for your application:
# OpenAI models
agent = RAAF::Agent.new(model: "gpt-4o") # Premium: high quality, higher cost
agent = RAAF::Agent.new(model: "gpt-4o-mini") # Balanced: good quality, lower cost
# Anthropic models
agent = RAAF::Agent.new(model: "claude-3-5-sonnet-20241022") # Excellent reasoning
# Groq models
agent = RAAF::Agent.new(model: "mixtral-8x7b-32768") # Fast inference, good for simple tasks
# LiteLLM universal access
agent = RAAF::Agent.new(model: "bedrock/claude-3") # Access to AWS Bedrock models
The model choice isn't just about capabilities—it's about trade-offs. GPT-4o gives you the best reasoning but costs more per token. GPT-4o-mini is fast and cheap but might struggle with complex logic. LiteLLM provides access to diverse providers and models through a universal interface.
Here's a practical approach: start with a premium model like GPT-4o during development. Once you understand your requirements, you can optimize by using cheaper models for simple tasks and reserving premium models for complex reasoning. RAAF's provider abstraction makes this kind of optimization trivial to implement.
2.11.4. Additional Configuration
agent = RAAF::Agent.new(
name: "Assistant",
instructions: "You are helpful",
model: "gpt-4o",
temperature: 0.7, # Control randomness
max_tokens: 1000, # Limit response length
top_p: 0.9 # Nucleus sampling
)
These parameters control fine-grained aspects of response generation that affect creativity, length, and consistency. Temperature controls the randomness of responses, where 0.0 produces deterministic output and 1.0 creates very creative responses. For customer service, you might want 0.3 for consistent, factual responses. For creative writing, 0.8 might be better. Max_tokens prevents runaway responses that could exhaust your token budget.
Start with defaults like temperature: 0.7 and max_tokens: 1000, then adjust based on your use case. Use lower temperature for factual tasks, higher for creative work. Set max_tokens based on your typical response length needs plus a buffer for complex queries.
3. The Runner Execution Model
The Runner is responsible for executing agents and managing conversation state. It coordinates between agents, tools, memory, and monitoring systems. But think of it as more than just a coordinator—it's the operating system for your AI interactions.
3.1. Why the Runner Pattern Is Necessary
Direct API calls to AI models create several problems in production systems:
Memory management: Each request starts fresh, losing conversation context and requiring users to repeat information.
Tool execution: Without coordination, tools can call each other recursively, creating infinite loops and unexpected costs.
Error handling: Failed requests have no recovery mechanism, leading to poor user experience.
State management: Concurrent requests can interfere with each other when sharing context improperly.
Monitoring: No visibility into what's happening during AI interactions makes debugging difficult.
The Runner pattern addresses these issues by providing a stateful execution environment that manages the complexities of AI interactions.
3.2. What the Runner Actually Does
The runner is your AI system's operating system. Here's what it manages:
- Conversation State: Maintains context across multiple turns
- Tool Orchestration: Ensures tools are called safely and efficiently
- Memory Management: Handles context windows and conversation history
- Error Recovery: Retries failed requests, handles provider outages
- Monitoring Integration: Tracks every interaction for debugging
- Multi-Agent Coordination: Manages handoffs between specialists
Think of it this way: The agent is the "what" (what should the AI do?), the runner is the "how" (how do we make it happen reliably?).
3.3. Basic Usage
agent = RAAF::Agent.new(
name: "Assistant",
instructions: "You are helpful",
model: "gpt-4o"
)
runner = RAAF::Runner.new(agent: agent)
result = runner.run("Hello!")
puts result.messages.last[:content]
This code creates a complete AI conversation system with just four lines of code. The agent defines the behavior and capabilities, the runner manages the execution and state, and the result provides structured access to everything that happened during the conversation.
The runner serves as the stateful coordinator that bridges your stateless agent with the realities of AI provider APIs. It handles message formatting, conversation context, error recovery, and result processing—all the messy details that would otherwise clutter your application code. The runner encapsulates the complexity of AI interactions while exposing a simple interface. You call run()
with a message and get a structured result, while behind the scenes the runner handles provider-specific formatting, context management, and error recovery.
This simple example hides a lot of complexity. Behind the scenes, the runner is:
- Formatting the message for the specific AI provider
- Managing the conversation context (system message, user message, etc.)
- Handling the API call with proper error handling and retries
- Processing the response and updating the conversation state
- Returning a structured result that your application can work with
The beauty of this abstraction is that all this complexity is hidden. You just call run()
and get a result. But the abstraction is also reversible—you can dig deeper when you need to.
3.4. Runner Configuration
runner = RAAF::Runner.new(
agent: agent,
max_turns: 10, # Limit conversation turns
stream: false, # Enable/disable streaming
parallel: true # Enable parallel tool execution
)
These configurations control runtime behavior that affects performance, user experience, and resource usage. Max_turns prevents infinite loops in multi-turn conversations. If an agent gets stuck in a reasoning loop or keeps asking clarifying questions, max_turns provides a safety net. It's better to gracefully terminate than to exhaust your token budget.
When streaming is enabled, it sends partial responses as they're generated, improving perceived performance. Users see progress immediately rather than waiting for complete responses. This is crucial for long-form content or complex reasoning tasks. Parallel execution enables multiple tool calls to run simultaneously, reducing total response time. If an agent needs to check both weather and tRAAFic, parallel execution runs both calls concurrently rather than sequentially.
3.5. Result Object
The runner returns a RAAF::Result
object containing:
result = runner.run("Hello!")
# Access messages
result.messages.each do |message|
puts "#{message[:role]}: #{message[:content]}"
end
# Check completion status
puts "Success: #{result.success?}"
puts "Agent: #{result.agent.name}"
# Usage statistics
puts "Tokens used: #{result.usage}"
The result object provides structured access to conversation outcomes, including messages, status, and usage metrics. It's your window into what happened during the AI interaction.
The messages array contains the complete conversation history, including system messages, user input, and AI responses. This is crucial for debugging, auditing, and understanding conversation flow. Token usage directly correlates to cost. By monitoring usage patterns, you can optimize prompts, identify expensive operations, and implement cost controls. This data is essential for production deployments where costs matter.
The success flag indicates whether the conversation completed normally or encountered errors. Failed conversations might indicate model issues, rate limiting, or malformed requests that need handling.
3.6. Multiple Turns
For multi-turn conversations, the runner maintains context:
runner = RAAF::Runner.new(agent: agent)
# Turn 1
result1 = runner.run("My name is Alice")
puts result1.messages.last[:content]
# Turn 2 - agent remembers Alice's name
result2 = runner.run("What's my name?")
puts result2.messages.last[:content]
This demonstrates persistent conversation memory across multiple interactions. The runner maintains context between turns, enabling natural conversation flow. Without memory, every interaction would be independent, making the AI feel robotic and frustrating. Context persistence enables the AI to reference previous information, build on earlier responses, and maintain conversational coherence.
Each call to run()
adds to the conversation history stored in the runner. The runner automatically includes this history in subsequent API calls, so the AI model has access to the complete conversation context. Long conversations consume more tokens and eventually hit context limits. RAAF provides memory management strategies to handle this gracefully, but understanding the trade-offs is important for production applications.
4. Tool System
Tools enable agents to interact with external systems, databases, APIs, and perform computations. But here's the key insight: tools are how you bridge the gap between AI reasoning and real-world action.
4.1. Why Tools Are Essential for AI Agents
Without tools, AI agents can only provide information and advice. They cannot take actions, access real-time data, or integrate with existing systems.
Before tools: Agents can discuss solutions but cannot implement them. Users must take actions manually based on the agent's recommendations.
With tools: Agents can look up information, perform calculations, update databases, send notifications, and interact with external APIs.
This capability gap is the difference between a chatbot and a digital assistant. Tools enable agents to complete tasks rather than just discussing them.
4.2. What Makes a Good Tool?
Focused Purpose: Each tool does one thing well
- Bad:
manage_customer()
that does everything - Good:
lookup_order()
,cancel_order()
,process_refund()
Safe by Design: Tools validate inputs and limit actions
- Never expose raw database access
- Always validate permissions
- Return structured errors, not stack traces
Self-Documenting: The AI needs to understand when and how to use it
- Clear parameter names
- Descriptive return values
- Helpful error messages
4.3. How Tools Become Your Safety Net
Tools aren't just about functionality—they're about control. Instead of giving an AI model unlimited access to your systems, you create specific, controlled interfaces. It's like the difference between giving someone your credit card versus giving them a gift card to a specific store.
This is defensive programming at its finest. Each tool is a controlled entry point that:
- Validates inputs before taking action
- Enforces business rules and permissions
- Provides audit trails of what was done
- Limits the blast radius of potential mistakes
4.4. Creating Tools
4.4.1. Method-Based Tools
The simplest way to create tools is from Ruby methods. This approach leverages Ruby's strengths—you're just writing normal methods that happen to be callable by AI:
def get_weather(location)
# Simulate weather API call
"Weather in #{location}: 72°F, sunny"
end
agent = RAAF::Agent.new(
name: "WeatherBot",
instructions: "Help users get weather information",
model: "gpt-4o"
)
agent.add_tool(method(:get_weather))
This code demonstrates the most straightforward way to give an AI agent the ability to retrieve weather information. The get_weather
method is just a regular Ruby method that takes a location parameter and returns a weather description. The magic happens when you call agent.add_tool(method(:get_weather))
– this converts your Ruby method into something the AI can understand and call.
Behind the scenes, RAAF examines your method signature and automatically generates a JSON schema that describes the tool to the AI model. When the AI decides to use this tool, RAAF handles the parameter serialization, method invocation, and result formatting. From your perspective, you're just writing normal Ruby code, but the AI sees it as a structured tool with clear inputs and outputs.
The beauty of this approach is testability. Since your tool is just a Ruby method, you can test it independently of the AI system. Want to verify that your weather lookup works? Just call get_weather("New York")
in your test suite. No mocking, no complex setup—just regular Ruby methods.
This simplicity is deceptive. Behind the scenes, RAAF is introspecting your method signature, generating JSON schemas for the AI model, and handling the serialization of parameters and results. You write Ruby code; RAAF handles the AI integration.
The method-based approach also means you can test your tools independently of the AI system. Want to verify that your weather lookup works? Just call get_weather("New York")
in your test suite. No mocking, no complex setup—just regular Ruby methods.
4.4.2. Keyword Arguments
Tools with complex parameters use keyword arguments:
def send_email(to:, subject:, body:, priority: 'normal')
# Send email logic
{
status: 'sent',
message_id: 'msg_123',
timestamp: Time.now
}
end
agent.add_tool(method(:send_email))
# Agent can now call:
# send_email(to: "[email protected]", subject: "Hello", body: "Hi there!")
This example shows how to create tools with more complex parameter requirements. The send_email
method uses keyword arguments to clearly define what information is required and what has sensible defaults. The AI model will understand that to
, subject
, and body
are required parameters, while priority
is optional and defaults to 'normal' if not specified.
The method returns a structured hash with status information, which the AI can use to understand whether the operation succeeded and potentially communicate results back to the user. This structured approach to tool responses is crucial for building reliable AI workflows – the AI needs to know not just that something happened, but whether it succeeded and what the results were.
4.4.3. Class-Based Tools
For more complex tools, use classes:
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: undefined local variable or method 'agent' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-h1d0p5.rb:460:in '<main>'
class DatabaseTool
def initialize(connection)
@db = connection
end
def call(query:, limit: 100)
results = @db.execute(query).limit(limit)
{
rows: results.to_a,
count: results.count,
execution_time: @db.last_execution_time
}
end
end
db_tool = DatabaseTool.new(database_connection)
agent.add_tool(db_tool.method(:call))
When your tools need to maintain state or require complex initialization, class-based tools provide a clean solution. This database tool example shows how to encapsulate database connection management within the tool class itself. The constructor takes a database connection, and the call
method provides a safe interface for executing queries with built-in result limiting.
The structured return value includes not just the query results, but also metadata like row count and execution time. This additional information helps the AI understand the scope and performance characteristics of the query, which can be useful for generating appropriate responses to users.
4.4.4. Lambda Tools
For inline tools:
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: undefined local variable or method 'agent' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-lwfnv1.rb:448:in '<main>'
get_time = ->(timezone: 'UTC') do
Time.now.in_time_zone(timezone).strftime("%I:%M %p %Z")
end
agent.add_tool(get_time)
For simple tools that don't warrant a full method definition, lambda expressions provide a concise alternative. This time tool uses a lambda to create a quick timezone-aware time formatter. The lambda accepts a timezone parameter with a sensible default, and returns a formatted time string.
Lambda tools are particularly useful for simple transformations or calculations that you want to expose to the AI without the overhead of defining a full method. They're perfect for utility functions that you might otherwise inline in your code.
4.5. Tool Parameters and Documentation
RAAF automatically extracts parameter information from method signatures:
def book_flight(
origin:, # Required parameter
destination:, # Required parameter
departure_date:, # Required parameter
passengers: 1, # Optional with default
class_type: 'economy' # Optional with default
)
# Implementation
end
This method signature demonstrates how RAAF automatically understands parameter requirements from your Ruby code. Parameters without default values become required in the AI's understanding of the tool, while parameters with defaults become optional. The AI model will know that it must provide origin, destination, and departure_date, but can optionally specify passengers and class_type.
This automatic parameter extraction eliminates the need for separate API documentation – your method signature becomes the tool's specification. The AI understands not just what parameters are available, but which ones are required and what the default values are for optional parameters. end
agent.add_tool(method(:book_flight)) ```
For better documentation, use comments:
# Books a flight for the specified passengers
# @param origin [String] Departure airport code (e.g., 'LAX')
# @param destination [String] Arrival airport code (e.g., 'JFK')
# @param departure_date [String] Departure date in YYYY-MM-DD format
# @param passengers [Integer] Number of passengers (default: 1)
# @param class_type [String] Seat class: 'economy', 'business', or 'first'
# @return [Hash] Flight booking confirmation details
def book_flight(origin:, destination:, departure_date:, passengers: 1, class_type: 'economy')
# Implementation
end
4.6. Tool Error Handling
Tools should handle errors gracefully:
def get_stock_price(symbol:)
begin
api = StockAPI.new
price_data = api.get_current_price(symbol.upcase)
{
symbol: symbol.upcase,
price: price_data[:current_price],
currency: price_data[:currency],
last_updated: price_data[:timestamp]
}
rescue StockAPI::InvalidSymbolError
{ error: "Invalid stock symbol: #{symbol}" }
rescue StockAPI::APIError => e
{ error: "Unable to fetch stock data: #{e.message}" }
rescue => e
{ error: "Unexpected error: #{e.message}" }
end
end
agent.add_tool(method(:get_stock_price))
4.7. Async Tools
For long-running operations:
def analyze_large_dataset(dataset_url:)
# Start background job
job_id = DataAnalysisJob.perform_async(dataset_url)
{
job_id: job_id,
status: 'started',
estimated_completion: 5.minutes.from_now,
status_url: "/api/jobs/#{job_id}/status"
}
end
def check_job_status(job_id:)
job = DataAnalysisJob.find(job_id)
{
job_id: job_id,
status: job.status,
progress: job.progress_percentage,
result: job.completed? ? job.result : nil,
error: job.failed? ? job.error_message : nil
}
end
agent.add_tool(method(:analyze_large_dataset))
agent.add_tool(method(:check_job_status))
5. Memory Management
5.1. Why Memory Management Is Critical
Memory management becomes essential when conversations exceed model context windows. Without proper memory handling, AI agents lose access to conversation history, leading to:
Context loss: The agent forgets previous interactions and asks users to repeat information.
Inconsistent responses: Without full context, the agent may contradict earlier statements.
Poor user experience: Users expect agents to remember what they've discussed throughout the conversation.
Degraded performance: The agent cannot build on previous context to provide relevant responses.
Context window limits are hard constraints that require proactive management to maintain conversation quality.
5.2. What Is Context Window Management?
Think of the context window like a whiteboard during a meeting:
- Fixed Size: You can only fit so much on the board
- Everything Counts: Every word takes up space
- Old Stuff Must Go: To add new info, you need to erase something
But unlike a whiteboard, you can't just erase random parts. You need intelligent strategies:
- Sliding Window: Keep the most recent N messages
- Importance-Based: Keep critical information, drop small talk
- Summarization: Compress old conversations into summaries
- Semantic Selection: Use embeddings to keep relevant context
5.3. How RAAF Handles Memory
RAAF provides sophisticated memory management to handle conversation context and long-running interactions. But memory management in AI systems is fundamentally different from traditional applications.
In a typical web application, you might cache database queries or memoize expensive computations. In AI systems, you're managing context—the conversation history that determines what the AI knows and remembers about the current interaction.
Context is a finite resource. Every AI model has a maximum context window—the amount of conversation history it can process at once. As conversations get longer, you hit this limit. The naive approach is to just truncate old messages, but that loses valuable context. The smart approach is to manage memory strategically.
Think of it like managing RAM in a computer: you keep frequently accessed data in fast storage, archive less important data, and have strategies for what to do when you're running out of space.
5.4. The Context Window Problem
Every AI model has a maximum context window. GPT-4o can handle about 128K tokens, which sounds like a lot until you realize that a typical conversation with a few tool calls can easily consume 10K tokens. A day-long customer service conversation might need 100K tokens or more.
When you hit the context limit, you have several options:
- Truncate ruthlessly: Drop old messages (loses context)
- Summarize intelligently: Compress old messages into summaries (preserves key information)
- Select strategically: Keep only the most relevant messages (maintains focus)
RAAF supports all these strategies, and you can combine them based on your needs.
5.5. Memory Stores: Where Your AI's Memories Live
Here's a question that cost us $12,000 to answer: "Where should we store conversation history?"
We started simple—just kept everything in memory. Then our server restarted and wiped out 10,000 active customer conversations. The support tickets were... memorable.
5.5.1. The Three Memory Store Personalities
1. In-Memory Store: The Speed Demon
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
<internal:/Users/hajee/.rvm/rubies/ruby-3.4.5/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require': cannot load such file -- raaf (LoadError) from <internal:/Users/hajee/.rvm/rubies/ruby-3.4.5/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require' from /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-yledm2.rb:444:in '<main>'
require 'raaf'
# Fast but forgetful
memory_manager = RAAF::Memory::MemoryManager.new(
store: RAAF::Memory::InMemoryStore.new(
capacity: 10_000, # Max conversations
ttl: 3600 # Expire after 1 hour
),
max_tokens: 4000
)
runner = RAAF::Runner.new(
agent: agent,
memory_manager: memory_manager
)
Real-world use cases:
- Demo environments (who cares if it forgets?)
- Stateless chat widgets (each page refresh = new conversation)
- High-frequency trading bots (speed > persistence)
- Development and testing (restart often)
The brutal truth: In-memory is seductive because it's so fast. But the first time you lose a critical conversation, you'll wish you'd chosen differently.
2. File Store: The Reliable Workhorse
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NoMethodError: undefined method 'join' for an instance of String /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-xmyeo0.rb:475:in '<main>'
# The improved version we use in production
class ProductionFileStore < RAAF::Memory::FileStore
def initialize(config)
super(
directory: config[:base_dir],
# Organize by date for easier management
path_pattern: '%{year}/%{month}/%{day}/%{session_id}.json',
# Compress old conversations
compression: :gzip,
# Encrypt sensitive data
encryption: {
algorithm: 'AES-256-GCM',
key: ENV['CONVERSATION_ENCRYPTION_KEY']
}
)
end
def store_message(session_id, message)
# Add metadata for debugging
message[:stored_at] = Time.now
message[:server_version] = RAAF::VERSION
super(session_id, message)
# Async backup to S3
BackupWorker.perform_async(session_id) if production?
end
end
memory_manager = RAAF::Memory::MemoryManager.new(
store: ProductionFileStore.new(
base_dir: Rails.root.join('storage/conversations'),
retention_days: 90 # GDPR compliance
),
max_tokens: 4000
)
Why file storage isn't as simple as it seems:
- File locking: Two processes writing = corrupted data
- Directory limits: Some filesystems choke at 10k files per directory
- Search performance: Finding a conversation from last month? Good luck
- Backup complexity: Copying millions of small files is painful
3. Vector Store: The Smart Archive
# This is where things get interesting
class IntelligentVectorStore < RAAF::Memory::VectorStore
def initialize
super(
# Use OpenAI for embeddings
embedding_model: 'text-embedding-3-small',
# PostgreSQL with pgvector for production
database: {
adapter: 'postgresql',
extension: 'vector',
pool: 25
},
# Automatic clustering for similar conversations
clustering: {
enabled: true,
algorithm: 'hnsw',
min_cluster_size: 5
}
)
end
def find_similar_conversations(query, limit: 5)
embedding = generate_embedding(query)
# Find semantically similar past conversations
results = database.execute(<<-SQL)
SELECT session_id, message,
1 - (embedding <=> '#{embedding}') as similarity
FROM conversation_messages
WHERE embedding <=> '#{embedding}' < 0.3
ORDER BY embedding <=> '#{embedding}'
LIMIT #{limit}
SQL
# Use similar conversations as context
results.map do |row|
{
session_id: row['session_id'],
message: row['message'],
similarity: row['similarity'],
learned_context: extract_key_facts(row['message'])
}
end
end
end
# Now your AI can say: "I see you had a similar issue last month..."
memory_manager = RAAF::Memory::MemoryManager.new(
store: IntelligentVectorStore.new,
max_tokens: 8000,
# Hybrid approach: recent + relevant
retrieval_strategy: :hybrid,
recent_messages: 10,
relevant_messages: 5
)
Vector store superpowers:
- Find all conversations about "refunds" even if they never used that word
- Identify patterns across thousands of support tickets
- Build knowledge from past conversations
- "This customer asked about X before, here's what worked"
The hidden costs:
- Embedding generation: ~$0.0001 per message (adds up fast)
- Database requirements: PostgreSQL with pgvector or similar
- Query complexity: Vector math isn't intuitive
- Index maintenance: Rebuilding can take hours
5.5.2. Choosing Your Memory Store: A Decision Framework
# Here's how we help clients choose
def recommend_memory_store(requirements)
case
when requirements[:conversations_per_day] < 100 &&
requirements[:persistence_needed] == false
:in_memory
when requirements[:compliance_requirements].any? ||
requirements[:audit_trail_needed]
:file_store # Easier to audit and backup
when requirements[:conversations_per_day] > 10_000 ||
requirements[:semantic_search_needed]
:vector_store
else
:file_store # Safe default
end
end
5.6. Memory Strategies: The Art of Forgetting Intelligently
Context window management requires strategic decisions about what information to preserve and what to discard.
In long conversations, important details can be lost when context windows overflow. This can lead to inconsistent or inappropriate responses based on incomplete information.
Effective memory management requires strategic selection of what information to retain and what to discard.
5.6.1. Memory Management Strategies
1. Sliding Window: The Goldfish Memory
# The naive approach we all start with
memory_manager = RAAF::Memory::MemoryManager.new(
store: store,
max_tokens: 4000,
pruning_strategy: :sliding_window,
# Keep at least this many recent messages
min_messages_to_keep: 4,
# Always preserve system message
preserve_messages: [:system, :first_user]
)
# What actually happens in memory:
# [System] You are a helpful assistant...
# [User] Help me plan a wedding (removed)
# [AI] I'd be happy to help! (removed)
# [User] Budget is $50k (removed)
# [AI] Great! Let's start... (removed)
# [User] Actually make it $30k <- Kept
# [AI] No problem, adjusting... <- Kept
# [User] What about flowers? <- Kept
# [AI] For a $30k budget... <- Kept
When it works brilliantly:
- Customer service ("Reset my password")
- Simple Q&A ("What's your return policy?")
- Transactional interactions
- Demo environments
When it fails spectacularly:
- Complex troubleshooting (forgets the original problem)
- Multi-step processes (loses critical early decisions)
- Relationship building (forgets personal details)
The $50,000 lesson: A sales bot using sliding window forgot the customer mentioned their $50k budget early in the conversation. It kept pushing enterprise plans. Customer got frustrated and left.
2. Summarization: The Meeting Minutes
# The sophisticated approach that actually works
class ProductionSummarizer
def initialize
@memory_manager = RAAF::Memory::MemoryManager.new(
store: store,
max_tokens: 4000,
pruning_strategy: :summarization,
# Summarize when we hit 80% capacity
summarization_threshold: 0.8,
# Use fast, cheap model for summaries
summary_model: 'gpt-4o-mini',
# But use smart prompts
summary_prompt: <<~PROMPT
Summarize this conversation, preserving:
1. Key decisions made
2. Important numbers (budgets, dates, quantities)
3. User preferences and constraints
4. Current task status
5. Unresolved questions
Format: Brief bullet points
Max length: 500 tokens
PROMPT
)
end
def smart_summarize(messages)
# Group messages into logical chunks
chunks = group_by_topic(messages)
# Summarize each chunk separately
summaries = chunks.map do |chunk|
{
topic: chunk[:topic],
summary: summarize_chunk(chunk[:messages]),
key_facts: extract_facts(chunk[:messages])
}
end
# Combine into structured summary
{
conversation_summary: combine_summaries(summaries),
key_facts: summaries.flat_map { |s| s[:key_facts] },
open_topics: extract_open_questions(messages)
}
end
end
Real success story: A legal assistant bot helping with contract review. 8-hour conversation, 200+ pages discussed. The summarization strategy preserved:
- Every clause the user flagged as concerning
- All agreed changes
- Reasoning for each decision
- Outstanding questions for legal counsel
The partner was amazed the bot remembered a subtle point from hour 2 that became critical in hour 7.
The gotchas nobody mentions:
- Summaries can hallucinate ("User agreed to X" when they didn't)
- Critical details get lost ("around $10k" becomes "$10k" exactly)
- Emotional context disappears (frustrated user seems neutral)
- Costs add up (summarizing every N messages)
3. Semantic Similarity: The Mind Reader
# The approach that feels like magic when it works
class SemanticMemoryManager
def initialize
@memory = RAAF::Memory::MemoryManager.new(
store: RAAF::Memory::VectorStore.new,
max_tokens: 4000,
pruning_strategy: :semantic_similarity
)
# Track topic evolution
@topic_embeddings = []
end
def select_relevant_context(current_query)
# Get embedding for current query
query_embedding = embed(current_query)
# Find messages similar to current topic
relevant_messages = @memory.search(
embedding: query_embedding,
threshold: 0.7,
limit: 20
)
# But also keep temporal context
recent_messages = @memory.recent(5)
# And critical markers
important_messages = @memory.tagged(:important)
# Intelligently merge
merge_contexts(
relevant_messages,
recent_messages,
important_messages
)
end
def tag_important_messages(message)
# Auto-tag messages with key information
tags = []
tags << :budget if message =~ /\$\d+|budget|cost/i
tags << :deadline if message =~ /by|before|deadline|due/i
tags << :decision if message =~ /decided|agreed|confirmed/i
tags << :problem if message =~ /issue|problem|broken|error/i
@memory.tag_message(message, tags) if tags.any?
end
end
Where semantic similarity shines:
- Technical support ("Remember when we discussed the API error?")
- Research assistance (pulls relevant info from earlier)
- Creative projects (maintains thematic consistency)
- Medical consultations (surfaces related symptoms)
The unexpected failure mode: A customer support bot kept pulling irrelevant messages because "refund" and "fund" were semantically similar. Customer asked about mutual funds, bot kept offering refunds.
4. Hybrid Strategy: The Production Reality
# What we actually use in production
class HybridMemoryStrategy
def initialize
@strategies = {
recent: SlidingWindow.new(last_n: 10),
important: ImportanceScorer.new(threshold: 0.8),
semantic: SemanticMatcher.new(top_k: 5),
summary: ChunkSummarizer.new(chunk_size: 50)
}
end
def build_context(current_input, conversation_history)
context_budget = 4000 # tokens
selected_messages = []
# Layer 1: Always include system prompt (200 tokens)
selected_messages << conversation_history[:system]
context_budget -= 200
# Layer 2: Recent messages (up to 1000 tokens)
recent = @strategies[:recent].select(conversation_history)
recent_tokens = count_tokens(recent)
if recent_tokens <= 1000
selected_messages.concat(recent)
context_budget -= recent_tokens
else
# Summarize older recent messages
summary = @strategies[:summary].summarize(recent[0...-5])
selected_messages << summary
selected_messages.concat(recent[-5..-1])
context_budget -= count_tokens(summary) + count_tokens(recent[-5..-1])
end
# Layer 3: Semantically relevant (up to 1500 tokens)
relevant = @strategies[:semantic].find_relevant(
current_input,
conversation_history
)
relevant_tokens = count_tokens(relevant)
if relevant_tokens <= context_budget - 500 # Leave room for response
selected_messages.concat(relevant)
else
# Take most relevant that fit
relevant.each do |msg|
msg_tokens = count_tokens(msg)
if context_budget - msg_tokens >= 500
selected_messages << msg
context_budget -= msg_tokens
end
end
end
# Layer 4: Fill remaining space with important messages
important = @strategies[:important].rank(conversation_history)
important.each do |msg|
msg_tokens = count_tokens(msg)
if context_budget - msg_tokens >= 500
selected_messages << msg
context_budget -= msg_tokens
end
end
# Return organized context
organize_chronologically(selected_messages)
end
end
Why hybrid wins: No single strategy handles all cases. Our production system:
- Uses sliding window for immediate context
- Applies semantic search for relevant history
- Summarizes verbose sections
- Preserves critical decisions
- Adapts based on conversation type
5.7. Memory Operations
# Access conversation history
memory = runner.memory_manager.get_memory(session_id: 'user_123')
puts "Messages: #{memory.messages.count}"
puts "Total tokens: #{memory.token_count}"
# Clear memory
runner.memory_manager.clear_memory(session_id: 'user_123')
# Add custom memory
runner.memory_manager.add_memory(
session_id: 'user_123',
role: 'system',
content: 'User prefers technical explanations'
)
6. Structured Outputs
RAAF provides universal structured output support that works across all AI providers.
6.1. Why Structured Outputs Changed Our Error Rate from 15% to 0.1%
Here's a conversation from our pre-structured era:
User: "What's the weather?"
AI: "It's currently 72 degrees Fahrenheit in San Francisco with partly
cloudy skies and a gentle breeze from the west at about 10 mph."
Our parsing code: *crashes trying to extract temperature*
We were using regex to parse natural language. It worked... sometimes. Other times the AI would say "seventy-two degrees" or "72°F" or "quite warm, around 72" and our system would fail.
Structured outputs guarantee that the AI returns data in exactly the format you need. No parsing. No regex. No guessing.
6.2. What Are Structured Outputs?
Instead of hoping the AI formats its response correctly, you define a schema and the AI must follow it:
Without Structure: "The temperature is 72°F"
With Structure: {"temperature": 72, "unit": "fahrenheit"}
It's like the difference between asking someone to "write down the time" (could be "3:30", "half past three", "15:30") versus filling out a form with specific fields for hours and minutes.
6.3. When Structured Outputs Save the Day
- Data Integration: When AI output feeds into other systems
- Multi-Agent Workflows: When agents need to pass structured data
- UI Rendering: When you're displaying AI data in specific formats
- Validation: When you need guaranteed data types and formats
6.4. Basic Structured Output
6.4.1. The $50,000 Data Format Disaster
A fintech company was building an AI system to process loan applications. Their AI returned loan decisions as natural language: "We approve this loan for $50,000 at 4.2% interest."
Their backend system couldn't parse it. Every response was different:
- "Approved for $50K at 4.2%"
- "Loan amount: fifty thousand dollars, rate: 4.2%"
- "Yes, approve. Amount: $50,000. Interest: 4.2%"
The integration team spent 3 months building a parser. It failed 30% of the time. They lost deals because approvals took days instead of seconds.
The fix? Structured outputs.
6.4.2. Your First Structured Output
require 'raaf'
# Define response schema
WeatherSchema = {
type: 'object',
properties: {
location: { type: 'string' },
temperature: { type: 'number' },
conditions: { type: 'string' },
humidity: { type: 'number' },
wind_speed: { type: 'number' }
},
required: ['location', 'temperature', 'conditions']
}
agent = RAAF::Agent.new(
name: "WeatherBot",
instructions: "Provide weather information in the specified format",
model: "gpt-4o",
response_format: WeatherSchema
)
runner = RAAF::Runner.new(agent: agent)
result = runner.run("What's the weather in San Francisco?")
# Result will be structured JSON matching the schema
weather_data = JSON.parse(result.messages.last[:content])
puts "Temperature: #{weather_data['temperature']}°F"
6.4.3. Common Data Types and Patterns
String Types with Validation
PersonSchema = {
type: 'object',
properties: {
name: {
type: 'string',
minLength: 1,
maxLength: 100
},
email: {
type: 'string',
format: 'email'
},
phone: {
type: 'string',
pattern: '^\\+?[1-9]\\d{1,14}$' # International phone format
},
role: {
type: 'string',
enum: ['admin', 'user', 'guest']
}
},
required: ['name', 'email']
}
Numeric Types with Constraints
PricingSchema = {
type: 'object',
properties: {
base_price: {
type: 'number',
minimum: 0,
maximum: 1000000
},
discount_percent: {
type: 'number',
minimum: 0,
maximum: 100
},
quantity: {
type: 'integer',
minimum: 1,
maximum: 10000
},
is_taxable: {
type: 'boolean'
}
},
required: ['base_price', 'quantity']
}
Array Handling
ShoppingCartSchema = {
type: 'object',
properties: {
items: {
type: 'array',
minItems: 1,
maxItems: 50,
items: {
type: 'object',
properties: {
product_id: { type: 'string' },
quantity: { type: 'integer', minimum: 1 }
},
required: ['product_id', 'quantity']
}
},
total_items: {
type: 'integer',
minimum: 1
}
},
required: ['items', 'total_items']
}
6.4.4. Real-World Schema Examples
Financial Transaction Schema
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: undefined local variable or method 'runner' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-anite9.rb:490:in '<main>'
TransactionSchema = {
type: 'object',
properties: {
transaction_id: { type: 'string' },
amount: {
type: 'number',
minimum: 0.01,
maximum: 1000000
},
currency: {
type: 'string',
enum: ['USD', 'EUR', 'GBP', 'JPY']
},
transaction_type: {
type: 'string',
enum: ['credit', 'debit', 'transfer']
},
merchant: {
type: 'object',
properties: {
name: { type: 'string' },
category: { type: 'string' },
location: { type: 'string' }
},
required: ['name', 'category']
},
timestamp: {
type: 'string',
format: 'date-time'
},
status: {
type: 'string',
enum: ['pending', 'completed', 'failed', 'cancelled']
}
},
required: ['transaction_id', 'amount', 'currency', 'transaction_type', 'timestamp', 'status']
}
# Usage
agent = RAAF::Agent.new(
name: "TransactionProcessor",
instructions: "Process financial transactions and extract structured data",
model: "gpt-4o",
response_format: TransactionSchema
)
result = runner.run("Process this transaction: $250.00 payment to Amazon on 2024-01-15")
transaction = JSON.parse(result.messages.last[:content])
Customer Support Ticket Schema
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: undefined local variable or method 'runner' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-u3i0p5.rb:500:in '<main>'
SupportTicketSchema = {
type: 'object',
properties: {
ticket_id: { type: 'string' },
customer_info: {
type: 'object',
properties: {
name: { type: 'string' },
email: { type: 'string', format: 'email' },
tier: { type: 'string', enum: ['free', 'premium', 'enterprise'] }
},
required: ['name', 'email', 'tier']
},
issue: {
type: 'object',
properties: {
category: {
type: 'string',
enum: ['billing', 'technical', 'account', 'feature_request']
},
priority: {
type: 'string',
enum: ['low', 'medium', 'high', 'critical']
},
description: { type: 'string' },
affected_features: {
type: 'array',
items: { type: 'string' }
}
},
required: ['category', 'priority', 'description']
},
resolution: {
type: 'object',
properties: {
status: {
type: 'string',
enum: ['open', 'in_progress', 'resolved', 'closed']
},
estimated_resolution_time: { type: 'string' },
assigned_agent: { type: 'string' }
},
required: ['status']
}
},
required: ['ticket_id', 'customer_info', 'issue', 'resolution']
}
# Usage for ticket classification
agent = RAAF::Agent.new(
name: "TicketClassifier",
instructions: "Classify and structure customer support tickets",
model: "gpt-4o",
response_format: SupportTicketSchema
)
result = runner.run("Customer [email protected] (premium) says: 'My dashboard won't load and I can't see my analytics. This is urgent for tomorrow's board meeting.'")
ticket = JSON.parse(result.messages.last[:content])
6.4.5. Error Handling for Structured Outputs
class StructuredOutputHandler
def self.safe_parse(agent_response, schema)
begin
# Parse the JSON response
parsed_data = JSON.parse(agent_response)
# Validate against schema
validation_errors = validate_schema(parsed_data, schema)
if validation_errors.any?
{
success: false,
errors: validation_errors,
raw_response: agent_response
}
else
{
success: true,
data: parsed_data
}
end
rescue JSON::ParserError => e
{
success: false,
errors: ["Invalid JSON format: #{e.message}"],
raw_response: agent_response
}
end
end
private
def self.validate_schema(data, schema)
errors = []
# Check required fields
schema[:required]&.each do |field|
unless data.key?(field.to_s)
errors << "Missing required field: #{field}"
end
end
# Check data types
schema[:properties]&.each do |field, constraints|
next unless data.key?(field.to_s)
value = data[field.to_s]
expected_type = constraints[:type]
unless validate_type(value, expected_type)
errors << "Field '#{field}' should be #{expected_type}, got #{value.class}"
end
end
errors
end
def self.validate_type(value, expected_type)
case expected_type
when 'string'
value.is_a?(String)
when 'number'
value.is_a?(Numeric)
when 'integer'
value.is_a?(Integer)
when 'boolean'
value.is_a?(TrueClass) || value.is_a?(FalseClass)
when 'array'
value.is_a?(Array)
when 'object'
value.is_a?(Hash)
else
false
end
end
end
# Usage
result = runner.run("Extract customer data from this text...")
response = result.messages.last[:content]
parsed_result = StructuredOutputHandler.safe_parse(response, CustomerSchema)
if parsed_result[:success]
customer_data = parsed_result[:data]
puts "Customer: #{customer_data['name']}"
else
puts "Parsing failed: #{parsed_result[:errors].join(', ')}"
# Handle fallback or retry logic
end
6.4.6. Performance Tips for Structured Outputs
# Tip 1: Use concise schemas for faster generation
CompactSchema = {
type: 'object',
properties: {
status: { type: 'string' },
result: { type: 'string' }
},
required: ['status']
}
# Tip 2: Cache schema validation
class SchemaCache
@cache = {}
def self.validate(data, schema)
schema_key = schema.hash
validator = @cache[schema_key] ||= build_validator(schema)
validator.validate(data)
end
end
# Tip 3: Use progressive schema complexity
BasicContactSchema = {
type: 'object',
properties: {
name: { type: 'string' },
email: { type: 'string' }
},
required: ['name', 'email']
}
# Extend only when needed
DetailedContactSchema = BasicContactSchema.merge({
properties: BasicContactSchema[:properties].merge({
phone: { type: 'string' },
address: { type: 'object' },
preferences: { type: 'array' }
})
})
The key to successful structured outputs is starting simple and adding complexity only when your use case demands it. Every field in your schema should solve a real problem, not anticipate theoretical needs.
6.5. Complex Schemas
6.5.1. Complex Schema Patterns
An e-commerce platform needed to integrate with 12 different systems: payment processors, inventory management, shipping carriers, tax calculators, and fraud detection. Each system expected different data formats.
The Problem: Their AI agent was generating orders in natural language, then they had 12 different parsers trying to extract the right data for each system. Maintenance was complex and error-prone.
The Solution: One complex schema that captured all the needed data, with clear relationships and validation rules.
6.5.2. Building Complex Schemas: The Layer-by-Layer Approach
Layer 1: Basic Object Structure
# Start with the core entities
BasicOrderSchema = {
type: 'object',
properties: {
order_id: { type: 'string' },
customer: { type: 'object' },
items: { type: 'array' },
total: { type: 'number' }
},
required: ['order_id', 'customer', 'items', 'total']
}
Layer 2: Add Nested Object Details
# Define customer object structure
CustomerObject = {
type: 'object',
properties: {
customer_id: { type: 'string' },
name: { type: 'string' },
email: { type: 'string', format: 'email' },
phone: { type: 'string' },
address: {
type: 'object',
properties: {
street: { type: 'string' },
city: { type: 'string' },
state: { type: 'string' },
postal_code: { type: 'string' },
country: { type: 'string', enum: ['US', 'CA', 'GB', 'AU'] }
},
required: ['street', 'city', 'state', 'postal_code', 'country']
},
billing_address: {
type: 'object',
properties: {
same_as_shipping: { type: 'boolean' },
street: { type: 'string' },
city: { type: 'string' },
state: { type: 'string' },
postal_code: { type: 'string' },
country: { type: 'string', enum: ['US', 'CA', 'GB', 'AU'] }
},
required: ['same_as_shipping']
}
},
required: ['customer_id', 'name', 'email', 'address']
}
Layer 3: Complex Array Items
# Define item structure with variants and metadata
ItemObject = {
type: 'object',
properties: {
product_id: { type: 'string' },
name: { type: 'string' },
category: { type: 'string' },
quantity: { type: 'integer', minimum: 1, maximum: 100 },
unit_price: { type: 'number', minimum: 0 },
discount: {
type: 'object',
properties: {
type: { type: 'string', enum: ['percentage', 'fixed'] },
value: { type: 'number', minimum: 0 },
reason: { type: 'string' }
}
},
variants: {
type: 'array',
items: {
type: 'object',
properties: {
name: { type: 'string' },
value: { type: 'string' }
},
required: ['name', 'value']
}
},
metadata: {
type: 'object',
properties: {
weight: { type: 'number' },
dimensions: {
type: 'object',
properties: {
length: { type: 'number' },
width: { type: 'number' },
height: { type: 'number' }
}
},
sku: { type: 'string' },
barcode: { type: 'string' }
}
}
},
required: ['product_id', 'name', 'quantity', 'unit_price']
}
Layer 4: Complete Complex Schema
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: uninitialized constant ItemObject /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-x69il2.rb:455:in '<main>'
# The full enterprise-grade order schema
CompleteOrderSchema = {
type: 'object',
properties: {
order_id: { type: 'string' },
order_number: { type: 'string' },
customer: CustomerObject,
items: {
type: 'array',
minItems: 1,
maxItems: 50,
items: ItemObject
},
pricing: {
type: 'object',
properties: {
subtotal: { type: 'number', minimum: 0 },
tax: { type: 'number', minimum: 0 },
shipping: { type: 'number', minimum: 0 },
discount: { type: 'number', minimum: 0 },
total: { type: 'number', minimum: 0 }
},
required: ['subtotal', 'tax', 'shipping', 'total']
},
payment: {
type: 'object',
properties: {
method: { type: 'string', enum: ['credit_card', 'debit_card', 'paypal', 'bank_transfer'] },
status: { type: 'string', enum: ['pending', 'authorized', 'captured', 'failed'] },
transaction_id: { type: 'string' },
amount: { type: 'number', minimum: 0 }
},
required: ['method', 'status', 'amount']
},
fulfillment: {
type: 'object',
properties: {
method: { type: 'string', enum: ['standard', 'express', 'overnight', 'pickup'] },
carrier: { type: 'string' },
tracking_number: { type: 'string' },
estimated_delivery: { type: 'string', format: 'date' },
status: { type: 'string', enum: ['pending', 'processing', 'shipped', 'delivered', 'cancelled'] }
},
required: ['method', 'status']
},
timestamps: {
type: 'object',
properties: {
created_at: { type: 'string', format: 'date-time' },
updated_at: { type: 'string', format: 'date-time' },
confirmed_at: { type: 'string', format: 'date-time' }
},
required: ['created_at', 'updated_at']
},
metadata: {
type: 'object',
properties: {
source: { type: 'string', enum: ['web', 'mobile', 'api', 'phone'] },
referrer: { type: 'string' },
session_id: { type: 'string' },
user_agent: { type: 'string' },
ip_address: { type: 'string' },
notes: { type: 'string' }
}
}
},
required: ['order_id', 'customer', 'items', 'pricing', 'payment', 'fulfillment', 'timestamps']
}
# Usage
agent = RAAF::Agent.new(
name: "OrderProcessor",
instructions: "Process complex orders with full details for enterprise integration",
model: "gpt-4o",
response_format: CompleteOrderSchema
)
result = runner.run("Process this order: Customer John Smith wants 2 iPhone 15 Pro cases and 1 screen protector, shipping to 123 Main St, Boston, MA 02101")
order = JSON.parse(result.messages.last[:content])
6.5.3. Advanced Schema Patterns
Pattern 1: Conditional Properties
# Schema that changes based on other properties
ConditionalSchema = {
type: 'object',
properties: {
notification_type: { type: 'string', enum: ['email', 'sms', 'push'] },
recipient: { type: 'string' },
content: { type: 'string' }
},
required: ['notification_type', 'recipient', 'content'],
# Conditional validation
if: { properties: { notification_type: { const: 'email' } } },
then: {
properties: {
recipient: { type: 'string', format: 'email' },
subject: { type: 'string' },
html_content: { type: 'string' }
},
required: ['subject']
},
else: {
if: { properties: { notification_type: { const: 'sms' } } },
then: {
properties: {
recipient: { type: 'string', pattern: '^\\+?[1-9]\\d{1,14}$' },
message: { type: 'string', maxLength: 160 }
},
required: ['message']
}
}
}
Pattern 2: Polymorphic Objects
# Schema that handles different object types
PolymorphicSchema = {
type: 'object',
properties: {
events: {
type: 'array',
items: {
type: 'object',
properties: {
event_type: { type: 'string', enum: ['user_action', 'system_event', 'error'] },
timestamp: { type: 'string', format: 'date-time' },
data: { type: 'object' }
},
required: ['event_type', 'timestamp', 'data'],
# Different data structure based on event_type
if: { properties: { event_type: { const: 'user_action' } } },
then: {
properties: {
data: {
type: 'object',
properties: {
user_id: { type: 'string' },
action: { type: 'string' },
page: { type: 'string' },
session_id: { type: 'string' }
},
required: ['user_id', 'action']
}
}
},
else: {
if: { properties: { event_type: { const: 'system_event' } } },
then: {
properties: {
data: {
type: 'object',
properties: {
service: { type: 'string' },
operation: { type: 'string' },
status: { type: 'string' },
duration_ms: { type: 'number' }
},
required: ['service', 'operation', 'status']
}
}
}
}
}
}
},
required: ['events']
}
Pattern 3: Recursive Schemas
# Schema that references itself (like file system trees)
RecursiveSchema = {
type: 'object',
properties: {
name: { type: 'string' },
type: { type: 'string', enum: ['file', 'directory'] },
size: { type: 'number' },
children: {
type: 'array',
items: { '$ref': '#' } # References the root schema
}
},
required: ['name', 'type'],
# Conditional requirements
if: { properties: { type: { const: 'file' } } },
then: { required: ['size'] },
else: { required: ['children'] }
}
6.5.4. Schema Optimization Strategies
Strategy 1: Progressive Schema Building
class SchemaBuilder
def self.build_progressive(base_schema, extensions = {})
result = base_schema.deep_dup
extensions.each do |key, extension|
case key
when :add_properties
result[:properties].merge!(extension)
when :add_required
result[:required] ||= []
result[:required].concat(extension)
when :add_constraints
extension.each do |prop, constraints|
result[:properties][prop]&.merge!(constraints)
end
end
end
result
end
end
# Usage
base_schema = {
type: 'object',
properties: {
name: { type: 'string' },
email: { type: 'string' }
},
required: ['name']
}
# Add complexity progressively
enhanced_schema = SchemaBuilder.build_progressive(base_schema, {
add_properties: {
phone: { type: 'string' },
address: { type: 'object' }
},
add_required: ['email'],
add_constraints: {
email: { format: 'email' },
name: { minLength: 2, maxLength: 50 }
}
})
Strategy 2: Schema Composition
# Reusable schema components
AddressSchema = {
type: 'object',
properties: {
street: { type: 'string' },
city: { type: 'string' },
state: { type: 'string' },
postal_code: { type: 'string' },
country: { type: 'string' }
},
required: ['street', 'city', 'state', 'postal_code', 'country']
}
PersonSchema = {
type: 'object',
properties: {
name: { type: 'string' },
email: { type: 'string', format: 'email' },
address: AddressSchema
},
required: ['name', 'email', 'address']
}
# Compose into larger schemas
CompanySchema = {
type: 'object',
properties: {
name: { type: 'string' },
employees: {
type: 'array',
items: PersonSchema
},
headquarters: AddressSchema
},
required: ['name', 'employees', 'headquarters']
}
6.5.5. Managing Schema Complexity
Rule 1: Keep Related Data Together
# Bad: Scattered related fields
BadSchema = {
properties: {
first_name: { type: 'string' },
last_name: { type: 'string' },
email: { type: 'string' },
phone: { type: 'string' },
street: { type: 'string' },
city: { type: 'string' }
}
}
# Good: Grouped related fields
GoodSchema = {
properties: {
personal_info: {
type: 'object',
properties: {
first_name: { type: 'string' },
last_name: { type: 'string' }
}
},
contact_info: {
type: 'object',
properties: {
email: { type: 'string' },
phone: { type: 'string' }
}
},
address: {
type: 'object',
properties: {
street: { type: 'string' },
city: { type: 'string' }
}
}
}
}
Rule 2: Use Clear Naming Conventions
# Clear, consistent naming
NamingSchema = {
properties: {
user_id: { type: 'string' }, # Not: id, userId, user_identifier
created_at: { type: 'string' }, # Not: created, createdAt, creation_time
is_active: { type: 'boolean' }, # Not: active, status, enabled
email_address: { type: 'string' }, # Not: email, mail, e_mail
phone_number: { type: 'string' }, # Not: phone, tel, telephone
order_items: { type: 'array' }, # Not: items, products, line_items
total_amount: { type: 'number' }, # Not: total, amount, price
payment_method: { type: 'string' }, # Not: payment, method, pay_method
shipping_address: { type: 'object' }, # Not: address, ship_to, destination
billing_address: { type: 'object' } # Not: bill_to, payment_address
}
}
The key to complex schemas is building them incrementally, testing at each stage, and focusing on real business requirements rather than theoretical completeness.
6.6. Schema Validation
6.6.1. Schema Validation Importance
It was 2 AM on Black Friday. Our AI agent was processing thousands of orders per minute. Everything was working perfectly—until it wasn't.
An order came in with a negative quantity: { "quantity": -5 }
. Our system processed it, charged the customer, and sent a refund request to the payment processor. The payment processor saw the negative amount and issued a credit instead of a refund.
Result? We accidentally gave a customer $500 instead of processing their $100 order.
Schema validation is essential for production AI systems to ensure data integrity and prevent errors.
6.6.2. Multi-Layer Validation Strategy
Layer 1: Provider-Level Validation
# Most providers now support native structured outputs
agent = RAAF::Agent.new(
name: "DataExtractor",
instructions: "Extract structured data",
model: "gpt-4o",
response_format: schema,
strict: true # Enable strict schema validation
)
runner = RAAF::Runner.new(agent: agent)
begin
result = runner.run("Extract customer data from this text...")
data = JSON.parse(result.messages.last[:content])
# Data is guaranteed to match schema
process_customer_data(data)
rescue RAAF::StructuredOutput::ValidationError => e
puts "Schema validation failed: #{e.message}"
puts "Validation errors: #{e.errors}"
end
Layer 2: Application-Level Validation
class SchemaValidator
def self.validate(data, schema)
errors = []
# Check required fields
missing_fields = check_required_fields(data, schema)
errors.concat(missing_fields) if missing_fields.any?
# Check data types
type_errors = check_data_types(data, schema)
errors.concat(type_errors) if type_errors.any?
# Check constraints
constraint_errors = check_constraints(data, schema)
errors.concat(constraint_errors) if constraint_errors.any?
# Check business rules
business_errors = check_business_rules(data, schema)
errors.concat(business_errors) if business_errors.any?
{
valid: errors.empty?,
errors: errors,
data: data
}
end
private
def self.check_required_fields(data, schema)
errors = []
required_fields = schema.dig(:required) || []
required_fields.each do |field|
unless data.key?(field.to_s) || data.key?(field.to_sym)
errors << "Missing required field: #{field}"
end
end
errors
end
def self.check_data_types(data, schema)
errors = []
properties = schema.dig(:properties) || {}
properties.each do |field, constraints|
next unless data.key?(field.to_s) || data.key?(field.to_sym)
value = data[field.to_s] || data[field.to_sym]
expected_type = constraints[:type]
unless valid_type?(value, expected_type)
errors << "Field '#{field}' should be #{expected_type}, got #{value.class}"
end
end
errors
end
def self.check_constraints(data, schema)
errors = []
properties = schema.dig(:properties) || {}
properties.each do |field, constraints|
next unless data.key?(field.to_s) || data.key?(field.to_sym)
value = data[field.to_s] || data[field.to_sym]
# Check string constraints
if constraints[:type] == 'string'
errors.concat(validate_string_constraints(field, value, constraints))
end
# Check numeric constraints
if constraints[:type] == 'number' || constraints[:type] == 'integer'
errors.concat(validate_numeric_constraints(field, value, constraints))
end
# Check array constraints
if constraints[:type] == 'array'
errors.concat(validate_array_constraints(field, value, constraints))
end
# Check enum constraints
if constraints[:enum]
unless constraints[:enum].include?(value)
errors << "Field '#{field}' must be one of #{constraints[:enum].join(', ')}, got '#{value}'"
end
end
end
errors
end
def self.validate_string_constraints(field, value, constraints)
errors = []
if constraints[:minLength] && value.length < constraints[:minLength]
errors << "Field '#{field}' must be at least #{constraints[:minLength]} characters"
end
if constraints[:maxLength] && value.length > constraints[:maxLength]
errors << "Field '#{field}' must be at most #{constraints[:maxLength]} characters"
end
if constraints[:pattern] && !value.match(Regexp.new(constraints[:pattern]))
errors << "Field '#{field}' must match pattern #{constraints[:pattern]}"
end
if constraints[:format] == 'email' && !value.match(/\A[^@]+@[^@]+\.[^@]+\z/)
errors << "Field '#{field}' must be a valid email address"
end
errors
end
def self.validate_numeric_constraints(field, value, constraints)
errors = []
if constraints[:minimum] && value < constraints[:minimum]
errors << "Field '#{field}' must be at least #{constraints[:minimum]}"
end
if constraints[:maximum] && value > constraints[:maximum]
errors << "Field '#{field}' must be at most #{constraints[:maximum]}"
end
if constraints[:type] == 'integer' && !value.is_a?(Integer)
errors << "Field '#{field}' must be an integer"
end
errors
end
def self.validate_array_constraints(field, value, constraints)
errors = []
if constraints[:minItems] && value.length < constraints[:minItems]
errors << "Field '#{field}' must have at least #{constraints[:minItems]} items"
end
if constraints[:maxItems] && value.length > constraints[:maxItems]
errors << "Field '#{field}' must have at most #{constraints[:maxItems]} items"
end
# Validate array items
if constraints[:items]
value.each_with_index do |item, index|
item_errors = validate(item, constraints[:items])
if item_errors[:errors].any?
errors << "Item #{index} in field '#{field}': #{item_errors[:errors].join(', ')}"
end
end
end
errors
end
def self.check_business_rules(data, schema)
errors = []
# Custom business rule validation
if schema[:business_rules]
schema[:business_rules].each do |rule|
unless rule[:validator].call(data)
errors << rule[:message]
end
end
end
errors
end
def self.valid_type?(value, expected_type)
case expected_type
when 'string'
value.is_a?(String)
when 'number'
value.is_a?(Numeric)
when 'integer'
value.is_a?(Integer)
when 'boolean'
value.is_a?(TrueClass) || value.is_a?(FalseClass)
when 'array'
value.is_a?(Array)
when 'object'
value.is_a?(Hash)
when 'null'
value.nil?
else
false
end
end
end
Layer 3: Business Rule Validation
# Schema with business rules
OrderSchema = {
type: 'object',
properties: {
customer_id: { type: 'string' },
items: {
type: 'array',
items: {
type: 'object',
properties: {
product_id: { type: 'string' },
quantity: { type: 'integer', minimum: 1 },
price: { type: 'number', minimum: 0 }
}
}
},
total: { type: 'number', minimum: 0 }
},
required: ['customer_id', 'items', 'total'],
# Custom business rules
business_rules: [
{
validator: ->(data) {
# Total must equal sum of item prices
calculated_total = data['items'].sum { |item| item['price'] * item['quantity'] }
(data['total'] - calculated_total).abs < 0.01
},
message: "Total must equal sum of item prices"
},
{
validator: ->(data) {
# Maximum 20 items per order
data['items'].length <= 20
},
message: "Orders cannot exceed 20 items"
},
{
validator: ->(data) {
# No duplicate products
product_ids = data['items'].map { |item| item['product_id'] }
product_ids.uniq.length == product_ids.length
},
message: "Order cannot contain duplicate products"
}
]
}
# Usage
result = runner.run("Process this order...")
response = result.messages.last[:content]
data = JSON.parse(response)
validation_result = SchemaValidator.validate(data, OrderSchema)
if validation_result[:valid]
process_order(data)
else
puts "Validation failed:"
validation_result[:errors].each { |error| puts " - #{error}" }
end
6.6.3. Real-Time Validation Patterns
Pattern 1: Streaming Validation
class StreamingValidator
def initialize(schema)
@schema = schema
@partial_data = {}
@errors = []
end
def validate_chunk(chunk)
@partial_data.merge!(chunk)
# Validate what we have so far
current_errors = validate_partial(@partial_data, @schema)
{
valid_so_far: current_errors.empty?,
errors: current_errors,
can_continue: can_continue_parsing?(current_errors)
}
end
def finalize
# Final validation of complete data
SchemaValidator.validate(@partial_data, @schema)
end
private
def validate_partial(data, schema)
# Only validate fields that are present
# Don't fail on missing required fields yet
errors = []
schema[:properties]&.each do |field, constraints|
next unless data.key?(field.to_s)
value = data[field.to_s]
# Validate type
unless SchemaValidator.valid_type?(value, constraints[:type])
errors << "Field '#{field}' has invalid type"
end
# Validate constraints
if constraints[:type] == 'string'
errors.concat(SchemaValidator.validate_string_constraints(field, value, constraints))
end
end
errors
end
def can_continue_parsing?(errors)
# Don't continue if we have fundamental type errors
!errors.any? { |error| error.include?('invalid type') }
end
end
Pattern 2: Conditional Validation
class ConditionalValidator
def self.validate(data, schema)
errors = []
# Base validation
base_errors = SchemaValidator.validate(data, schema)
errors.concat(base_errors[:errors])
# Conditional validation
if schema[:conditional_rules]
schema[:conditional_rules].each do |rule|
if rule[:condition].call(data)
conditional_errors = validate_conditional_schema(data, rule[:then_schema])
errors.concat(conditional_errors)
elsif rule[:else_schema]
conditional_errors = validate_conditional_schema(data, rule[:else_schema])
errors.concat(conditional_errors)
end
end
end
{
valid: errors.empty?,
errors: errors,
data: data
}
end
private
def self.validate_conditional_schema(data, schema)
result = SchemaValidator.validate(data, schema)
result[:errors]
end
end
# Usage
PaymentSchema = {
type: 'object',
properties: {
payment_method: { type: 'string', enum: ['credit_card', 'paypal', 'bank_transfer'] },
amount: { type: 'number', minimum: 0 }
},
required: ['payment_method', 'amount'],
conditional_rules: [
{
condition: ->(data) { data['payment_method'] == 'credit_card' },
then_schema: {
type: 'object',
properties: {
card_number: { type: 'string', pattern: '^[0-9]{16}$' },
expiry: { type: 'string', pattern: '^[0-9]{2}/[0-9]{2}$' },
cvv: { type: 'string', pattern: '^[0-9]{3,4}$' }
},
required: ['card_number', 'expiry', 'cvv']
}
},
{
condition: ->(data) { data['payment_method'] == 'paypal' },
then_schema: {
type: 'object',
properties: {
paypal_email: { type: 'string', format: 'email' }
},
required: ['paypal_email']
}
}
]
}
6.6.4. Performance-Optimized Validation
class CachedValidator
def initialize
@validation_cache = {}
@schema_cache = {}
end
def validate(data, schema)
# Cache compiled schemas
schema_key = schema.hash
compiled_schema = @schema_cache[schema_key] ||= compile_schema(schema)
# Cache validation results for identical data
data_key = data.hash
cache_key = "#{schema_key}_#{data_key}"
@validation_cache[cache_key] ||= perform_validation(data, compiled_schema)
end
private
def compile_schema(schema)
# Pre-compile regex patterns, enum sets, etc.
compiled = schema.deep_dup
compiled[:properties]&.each do |field, constraints|
if constraints[:pattern]
constraints[:compiled_pattern] = Regexp.new(constraints[:pattern])
end
if constraints[:enum]
constraints[:enum_set] = Set.new(constraints[:enum])
end
end
compiled
end
def perform_validation(data, compiled_schema)
# Use compiled patterns for faster validation
errors = []
compiled_schema[:properties]&.each do |field, constraints|
next unless data.key?(field.to_s)
value = data[field.to_s]
# Use compiled regex
if constraints[:compiled_pattern] && !value.match(constraints[:compiled_pattern])
errors << "Field '#{field}' must match pattern"
end
# Use compiled enum set
if constraints[:enum_set] && !constraints[:enum_set].include?(value)
errors << "Field '#{field}' must be one of allowed values"
end
end
{
valid: errors.empty?,
errors: errors,
data: data
}
end
end
6.6.5. Validation Error Recovery
class ValidationErrorRecovery
def self.attempt_recovery(data, schema, errors)
recovery_attempts = []
errors.each do |error|
case error
when /Missing required field: (.+)/
field = $1
recovery_attempts << attempt_infer_missing_field(data, field, schema)
when /Field '(.+)' should be (.+), got (.+)/
field, expected_type, actual_type = $1, $2, $3
recovery_attempts << attempt_type_conversion(data, field, expected_type)
when /Field '(.+)' must be at least (\d+) characters/
field, min_length = $1, $2.to_i
recovery_attempts << attempt_pad_string(data, field, min_length)
when /Field '(.+)' must be one of (.+), got '(.+)'/
field, allowed_values, actual_value = $1, $2, $3
recovery_attempts << attempt_enum_correction(data, field, allowed_values, actual_value)
end
end
# Apply successful recovery attempts
recovered_data = data.dup
recovery_attempts.compact.each do |recovery|
recovered_data[recovery[:field]] = recovery[:value]
end
{
recovered_data: recovered_data,
recovery_applied: recovery_attempts.any?(&:present?)
}
end
private
def self.attempt_infer_missing_field(data, field, schema)
# Try to infer missing field from context
case field
when 'created_at'
{ field: field, value: Time.now.iso8601 }
when 'status'
{ field: field, value: 'pending' }
when 'id'
{ field: field, value: SecureRandom.uuid }
else
nil
end
end
def self.attempt_type_conversion(data, field, expected_type)
current_value = data[field]
case expected_type
when 'string'
{ field: field, value: current_value.to_s }
when 'number'
{ field: field, value: current_value.to_f }
when 'integer'
{ field: field, value: current_value.to_i }
when 'boolean'
{ field: field, value: !!current_value }
else
nil
end
end
def self.attempt_pad_string(data, field, min_length)
current_value = data[field]
if current_value.length < min_length
{ field: field, value: current_value.ljust(min_length, '0') }
else
nil
end
end
def self.attempt_enum_correction(data, field, allowed_values, actual_value)
allowed_array = allowed_values.split(', ')
# Try fuzzy matching
best_match = allowed_array.min_by do |allowed|
levenshtein_distance(actual_value.downcase, allowed.downcase)
end
if levenshtein_distance(actual_value.downcase, best_match.downcase) <= 2
{ field: field, value: best_match }
else
nil
end
end
def self.levenshtein_distance(s1, s2)
# Simple Levenshtein distance implementation
return s2.length if s1.empty?
return s1.length if s2.empty?
costs = Array.new(s2.length + 1) { |i| i }
(1..s1.length).each do |i|
costs[0] = i
nw = i - 1
(1..s2.length).each do |j|
costs[j], nw = [
costs[j] + 1,
costs[j - 1] + 1,
nw + (s1[i - 1] == s2[j - 1] ? 0 : 1)
].min, costs[j]
end
end
costs[s2.length]
end
end
The key to effective schema validation is layering multiple validation strategies: provider-level for structure, application-level for business rules, and recovery mechanisms for graceful error handling.
6.7. Provider-Specific Adaptations
RAAF automatically adapts schemas for different providers:
# Same schema works with different providers
schema = { type: 'object', properties: { name: { type: 'string' } } }
# OpenAI - uses native structured outputs
openai_agent = RAAF::Agent.new(
model: "gpt-4o",
response_format: schema
)
# Anthropic - uses guided generation
claude_agent = RAAF::Agent.new(
model: "claude-3-5-sonnet-20241022",
response_format: schema
)
# Groq - uses constrained generation
groq_agent = RAAF::Agent.new(
model: "mixtral-8x7b-32768",
response_format: schema
)
7. Context Management
Context is the connective tissue of AI conversations. Without proper context management, your AI agent suffers from digital amnesia—forgetting everything between interactions, losing track of user preferences, and restarting complex workflows from scratch.
Think of context as the agent's working memory. Just as a skilled consultant remembers your company's specific requirements, industry context, and past discussions, an AI agent needs context to provide personalized, coherent interactions. The difference between a forgettable chatbot and a truly useful AI assistant is how well it manages and leverages context.
Context management in RAAF addresses three critical challenges:
- State Persistence: Maintaining information across conversation turns
- Scope Control: Determining what information should be shared between agents
- Performance Impact: Balancing context richness with token efficiency
7.1. Why Context Management Matters
Consider a customer service scenario without context management:
# Without context - Every interaction starts fresh
user: "I need help with my order"
agent: "Sure! Can you provide your order number?"
user: "It's ORD-12345"
agent: "Thank you. What's your issue?"
user: "I need to change the shipping address"
agent: "What's your new address?"
user: "123 Main St, New York"
agent: "Done! Is there anything else?"
user: "Actually, can you check if there are any other orders?"
agent: "I'd be happy to help! Can you provide your order number?"
The agent forgot everything about the customer, even though they just discussed order ORD-12345. This creates a frustrating experience.
With proper context management:
# With context - Information persists and accumulates
user: "I need help with my order"
agent: "I'd be happy to help! Can you provide your order number?"
user: "It's ORD-12345"
agent: "Thank you. I can see this is for John Smith, shipping to 456 Oak Ave. What's your issue?"
user: "I need to change the shipping address"
agent: "What's your new address?"
user: "123 Main St, New York"
agent: "I've updated order ORD-12345 to ship to 123 Main St, New York. Is there anything else?"
user: "Actually, can you check if there are any other orders?"
agent: "Looking at your account, I see you have two other orders: ORD-12340 (delivered) and ORD-12342 (processing). Would you like details on either of these?"
The agent remembers the customer's identity, current order, and proactively provides relevant information. This is the power of context management.
7.2. Context Variables: The Foundation of Stateful AI
Context variables are key-value pairs that persist throughout a conversation and can be accessed by all tools and agents in the workflow. They bridge the gap between stateless AI models and stateful application requirements.
# Comprehensive context setup
runner = RAAF::Runner.new(
agent: agent,
context_variables: {
# User identity and preferences
user_id: 'user_123',
user_name: 'John Smith',
user_role: 'premium_customer',
language: 'en',
timezone: 'America/New_York',
# Session information
session_type: 'support',
session_id: 'sess_abc123',
channel: 'web_chat',
# Business context
company_id: 'comp_456',
account_tier: 'enterprise',
region: 'north_america',
# Workflow state
current_step: 'initial_contact',
escalation_level: 0,
previous_interactions: 3
}
)
# Context variables are automatically injected into tools
def get_user_preferences(user_id: nil)
# user_id is automatically provided from context
preferences = UserPreferences.find_by(user_id: user_id)
{
theme: preferences.theme,
notification_preferences: preferences.notifications,
communication_style: preferences.style,
preferred_channel: preferences.channel
}
end
def personalize_response(content:, user_id: nil, language: nil, timezone: nil)
# Multiple context variables injected automatically
localized_content = I18n.with_locale(language) do
content.gsub('{timezone}', timezone)
end
{
content: localized_content,
personalization_applied: true,
user_id: user_id
}
end
agent.add_tool(method(:get_user_preferences))
agent.add_tool(method(:personalize_response))
This comprehensive context setup enables sophisticated personalization and workflow management. The agent can access user preferences, maintain session state, and make decisions based on business context—all without requiring explicit parameters in each tool call.
7.3. Dynamic Context Updates: Context That Evolves
Context isn't static—it evolves as conversations progress. Dynamic context updates allow agents to learn and adapt based on new information discovered during the interaction.
def update_user_language(language:)
# Update context during conversation
runner.update_context(language: language)
# Update user preferences persistently
UserPreferences.find_by(user_id: context[:user_id]).update(language: language)
{
status: 'updated',
new_language: language,
message: "I've switched to #{language}. All future interactions will use this language."
}
end
def escalate_support_level(reason:)
current_level = runner.context[:escalation_level] || 0
new_level = current_level + 1
# Update context to reflect escalation
runner.update_context(
escalation_level: new_level,
current_step: 'escalated_support',
escalation_reason: reason,
escalation_timestamp: Time.current
)
# Notify human agents if escalation reaches threshold
if new_level >= 2
HumanEscalationService.notify(
user_id: runner.context[:user_id],
session_id: runner.context[:session_id],
reason: reason,
context: runner.context
)
end
{
status: 'escalated',
escalation_level: new_level,
next_steps: new_level >= 2 ? 'human_agent_notified' : 'continued_ai_support'
}
end
agent.add_tool(method(:update_user_language))
agent.add_tool(method(:escalate_support_level))
Dynamic context updates enable several powerful patterns:
- Learning and Adaptation: The agent learns user preferences and adjusts behavior
- Workflow Progression: Context tracks where the user is in a multi-step process
- Escalation Management: Context maintains escalation state and triggers appropriate actions
- Session State: Context preserves important session information across interactions
7.4. Context Inheritance: Sharing Knowledge Between Agents
In multi-agent workflows, context inheritance ensures that knowledge and state transfer seamlessly between specialized agents. This prevents information loss during agent handoffs and maintains conversation continuity.
# Define specialized agents with context awareness
research_agent = RAAF::Agent.new(
name: "Researcher",
instructions: "Research topics thoroughly using available sources. Access user preferences and previous research from context.",
model: "gpt-4o"
)
writer_agent = RAAF::Agent.new(
name: "Writer",
instructions: "Write engaging content based on research. Use user preferences and context to personalize the writing style.",
model: "gpt-4o"
)
editor_agent = RAAF::Agent.new(
name: "Editor",
instructions: "Edit and polish content. Ensure consistency with user preferences and brand guidelines from context.",
model: "gpt-4o"
)
# Rich context that spans all agents
runner = RAAF::Runner.new(
agent: research_agent,
agents: [research_agent, writer_agent, editor_agent],
context_variables: {
# Content requirements
topic: 'Ruby programming best practices',
audience: 'intermediate developers',
content_type: 'technical_blog_post',
target_length: 2000,
# User preferences
writing_style: 'conversational',
code_style: 'ruby_standard',
examples_preferred: true,
# Brand guidelines
brand_voice: 'authoritative_but_approachable',
company_standards: 'enterprise_focused',
# Workflow state
deadline: '2024-01-20',
priority: 'high',
stakeholder_approval_required: true,
# Research constraints
source_types: ['official_docs', 'community_posts', 'code_examples'],
fact_checking_required: true,
citation_style: 'informal'
}
)
# Context-aware tools that leverage inherited information
def gather_research_sources(topic: nil, audience: nil, source_types: nil)
# All parameters automatically injected from context
sources = SourceFinder.find(
topic: topic,
audience_level: audience,
types: source_types
)
{
sources_found: sources.length,
sources: sources,
research_quality: assess_source_quality(sources),
context_used: { topic: topic, audience: audience, types: source_types }
}
end
def draft_content(topic: nil, audience: nil, writing_style: nil, examples_preferred: nil)
# Context provides all necessary information
content = ContentGenerator.generate(
topic: topic,
audience: audience,
style: writing_style,
include_examples: examples_preferred
)
{
content: content,
word_count: content.split.length,
style_applied: writing_style,
examples_included: examples_preferred
}
end
research_agent.add_tool(method(:gather_research_sources))
writer_agent.add_tool(method(:draft_content))
Context inheritance enables sophisticated multi-agent workflows where each agent builds on the work of previous agents while maintaining access to all relevant information. The research agent discovers sources, the writer creates content, and the editor polishes—all with consistent access to user preferences and workflow state.
7.5. Context Scoping: Controlling Information Flow
Not all context should be shared everywhere. Context scoping allows you to control what information is available to which agents and tools, enabling both security and performance optimizations.
# Context with different scopes
runner = RAAF::Runner.new(
agent: customer_service_agent,
context_variables: {
# Global context - available to all agents
user_id: 'user_123',
session_id: 'sess_abc',
language: 'en',
# Agent-specific context
agent_contexts: {
'CustomerService' => {
escalation_level: 0,
case_number: 'CS-2024-001',
customer_tier: 'premium'
},
'TechnicalSupport' => {
issue_category: 'api_integration',
severity: 'high',
sla_deadline: '2024-01-15T10:00:00Z'
},
'BillingSupport' => {
account_balance: 450.00,
payment_method: 'credit_card',
billing_cycle: 'monthly'
}
},
# Sensitive context - restricted access
sensitive_context: {
credit_card_last_four: '1234',
account_pin: '7890',
security_questions: ['mother_maiden_name', 'first_pet']
}
}
)
# Context-scoped tool that only accesses appropriate information
def check_billing_status(user_id: nil, account_balance: nil, payment_method: nil)
# Only receives billing-related context
{
user_id: user_id,
current_balance: account_balance,
payment_method: payment_method,
next_billing_date: calculate_next_billing(user_id),
payment_status: check_payment_status(user_id)
}
end
# Security-conscious tool with explicit sensitive data access
def verify_identity(user_id: nil, sensitive_context: nil)
# Explicitly requires sensitive context
security_check = IdentityVerifier.verify(
user_id: user_id,
provided_pin: sensitive_context[:account_pin],
security_answers: sensitive_context[:security_questions]
)
{
identity_verified: security_check.verified?,
verification_method: security_check.method,
risk_score: security_check.risk_score,
# Never return sensitive data in tool response
sensitive_data_accessed: true
}
end
billing_agent.add_tool(method(:check_billing_status))
security_agent.add_tool(method(:verify_identity))
Context scoping provides several benefits:
- Security: Sensitive information is only available to authorized agents
- Performance: Agents don't receive unnecessary context that would consume tokens
- Clarity: Each agent only sees context relevant to its function
- Compliance: Audit trails can track which agents accessed sensitive information
7.6. Context Lifecycle Management
Context has a lifecycle that must be managed to prevent memory leaks, maintain performance, and ensure data freshness.
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NoMethodError: undefined method 'hour' for an instance of Integer /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-ifeltj.rb:459:in 'ContextManager#create_context' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-ifeltj.rb:565:in '<main>'
class ContextManager
def initialize
@context_store = {}
@context_ttl = {}
@context_access_log = {}
end
def create_context(session_id, initial_context = {})
@context_store[session_id] = {
created_at: Time.current,
last_accessed: Time.current,
data: initial_context,
version: 1
}
@context_ttl[session_id] = Time.current + 1.hour
@context_access_log[session_id] = []
log_context_event(session_id, 'created', initial_context.keys)
end
def update_context(session_id, updates)
return unless @context_store.key?(session_id)
context = @context_store[session_id]
old_version = context[:version]
context[:data].merge!(updates)
context[:last_accessed] = Time.current
context[:version] += 1
# Extend TTL on updates
@context_ttl[session_id] = Time.current + 1.hour
log_context_event(session_id, 'updated', updates.keys, old_version, context[:version])
end
def get_context(session_id, agent_name = nil)
return {} unless @context_store.key?(session_id)
# Check if context has expired
if Time.current > @context_ttl[session_id]
expire_context(session_id)
return {}
end
context = @context_store[session_id]
context[:last_accessed] = Time.current
# Apply agent-specific scoping
scoped_context = apply_context_scoping(context[:data], agent_name)
log_context_access(session_id, agent_name, scoped_context.keys)
scoped_context
end
def expire_context(session_id)
return unless @context_store.key?(session_id)
context = @context_store[session_id]
# Archive context for analytics
ContextArchive.create(
session_id: session_id,
created_at: context[:created_at],
last_accessed: context[:last_accessed],
version: context[:version],
access_count: @context_access_log[session_id].length,
data: context[:data]
)
@context_store.delete(session_id)
@context_ttl.delete(session_id)
@context_access_log.delete(session_id)
log_context_event(session_id, 'expired')
end
def cleanup_expired_contexts
expired_sessions = @context_ttl.select { |_, ttl| Time.current > ttl }.keys
expired_sessions.each { |session_id| expire_context(session_id) }
end
private
def apply_context_scoping(context, agent_name)
# Apply agent-specific context scoping
scoped_context = context.except(:sensitive_context)
if agent_name && context[:agent_contexts]&.[](agent_name)
scoped_context.merge!(context[:agent_contexts][agent_name])
end
scoped_context
end
def log_context_event(session_id, event_type, keys = [], old_version = nil, new_version = nil)
ContextAuditLog.create(
session_id: session_id,
event_type: event_type,
keys_affected: keys,
old_version: old_version,
new_version: new_version,
timestamp: Time.current
)
end
def log_context_access(session_id, agent_name, keys_accessed)
@context_access_log[session_id] << {
agent: agent_name,
keys: keys_accessed,
timestamp: Time.current
}
end
end
# Usage in production
context_manager = ContextManager.new
# Create context for new session
context_manager.create_context('sess_123', {
user_id: 'user_456',
language: 'en',
session_type: 'support'
})
# Update context during conversation
context_manager.update_context('sess_123', {
escalation_level: 1,
issue_category: 'billing'
})
# Get context for specific agent
billing_context = context_manager.get_context('sess_123', 'BillingSupport')
Context lifecycle management ensures:
- Memory Efficiency: Expired contexts are automatically cleaned up
- Data Freshness: Context has configurable time-to-live
- Audit Compliance: All context access is logged for security audits
- Performance: Context scoping reduces token usage per agent
7.7. Context Persistence Strategies
Different applications require different context persistence strategies. RAAF supports multiple approaches to match your scalability and reliability requirements.
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: uninitialized constant Redis /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-97hls8.rb:449:in '<main>'
# In-memory context (development/testing)
memory_context = RAAF::Context::InMemoryStore.new
# Redis-based context (production)
redis_context = RAAF::Context::RedisStore.new(
redis_client: Redis.new(url: ENV['REDIS_URL']),
key_prefix: 'raaf:context:',
ttl: 3600 # 1 hour
)
# Database-backed context (enterprise)
db_context = RAAF::Context::DatabaseStore.new(
table_name: 'agent_contexts',
connection: ActiveRecord::Base.connection
)
# Hybrid context (high-performance production)
hybrid_context = RAAF::Context::HybridStore.new(
hot_store: redis_context, # Frequent access
cold_store: db_context, # Long-term storage
migration_threshold: 5.minutes # Move to cold storage after 5 minutes
)
# Configure runner with appropriate context store
runner = RAAF::Runner.new(
agent: agent,
context_store: hybrid_context,
context_variables: initial_context
)
Each persistence strategy has different trade-offs:
- In-Memory: Fastest, no persistence, suitable for development
- Redis: Fast, clusterable, good for production
- Database: Durable, queryable, good for compliance
- Hybrid: Best of both worlds, optimal for high-scale production
7.8. Context and Token Optimization
Context directly impacts token usage and costs. Smart context management can reduce token consumption by 30-50% while maintaining conversation quality.
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NoMethodError: undefined method 'map' for an instance of String /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-rb6gk0.rb:461:in 'TokenOptimizedContext#optimize_context' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-rb6gk0.rb:508:in '<main>'
# Token-optimized context management
class TokenOptimizedContext
def initialize(max_tokens: 4000)
@max_tokens = max_tokens
@context_priorities = {
user_id: 100, # Always keep
session_id: 100, # Always keep
current_step: 90, # High priority
escalation_level: 90, # High priority
user_preferences: 80, # Medium-high priority
previous_actions: 70, # Medium priority
debug_info: 10 # Low priority
}
end
def optimize_context(context)
# Calculate token usage for each context item
context_with_tokens = context.map do |key, value|
token_count = estimate_tokens(value)
priority = @context_priorities[key] || 50
{
key: key,
value: value,
tokens: token_count,
priority: priority,
score: priority / token_count # Priority per token
}
end
# Sort by score and select items within token budget
selected_items = []
total_tokens = 0
context_with_tokens.sort_by { |item| -item[:score] }.each do |item|
if total_tokens + item[:tokens] <= @max_tokens
selected_items << item
total_tokens += item[:tokens]
end
end
# Convert back to context hash
optimized_context = selected_items.each_with_object({}) do |item, context|
context[item[:key]] = item[:value]
end
{
context: optimized_context,
tokens_used: total_tokens,
tokens_saved: estimate_tokens(context) - total_tokens,
items_dropped: context.length - selected_items.length
}
end
private
def estimate_tokens(value)
# Rough estimation: 1 token per 4 characters
value.to_s.length / 4
end
end
# Use token-optimized context
context_optimizer = TokenOptimizedContext.new(max_tokens: 3000)
result = context_optimizer.optimize_context(large_context)
runner = RAAF::Runner.new(
agent: agent,
context_variables: result[:context]
)
This optimization approach:
- Prioritizes Essential Information: User ID and session data are always preserved
- Maximizes Value: Selects context items with the highest priority-to-token ratio
- Stays Within Budget: Ensures context doesn't exceed token limits
- Provides Metrics: Shows how much was saved and what was dropped
Context management is the foundation of intelligent, stateful AI systems. By properly managing context variables, inheritance, scoping, and lifecycle, you create AI agents that feel less like chatbots and more like knowledgeable assistants who remember, learn, and adapt to user needs.
8. Error Handling and Recovery
RAAF provides comprehensive error handling and recovery mechanisms.
8.1. Provider Failures
runner = RAAF::Runner.new(
agent: agent,
retry_config: {
max_retries: 3,
backoff_strategy: :exponential,
retry_on: [
RAAF::Errors::ProviderTimeoutError,
RAAF::Errors::RateLimitError
]
}
)
8.2. Tool Failures
def unreliable_tool(data:)
begin
# Potentially failing operation
external_api.process(data)
rescue ExternalAPI::Error => e
# Return error info instead of raising
{
error: true,
error_type: 'api_failure',
message: e.message,
retry_after: e.retry_after
}
end
end
agent.add_tool(method(:unreliable_tool))
8.3. Circuit Breaker Pattern
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: undefined local variable or method 'agent' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-nclqkk.rb:482:in '<main>'
class CircuitBreakerTool
def initialize
@failure_count = 0
@last_failure_time = nil
@circuit_open = false
end
def call(data:)
return { error: 'Circuit breaker open' } if circuit_open?
begin
result = external_service.call(data)
reset_circuit
result
rescue => e
record_failure
{ error: e.message }
end
end
private
def circuit_open?
@circuit_open && (Time.now - @last_failure_time) < 60
end
def record_failure
@failure_count += 1
@last_failure_time = Time.now
@circuit_open = true if @failure_count >= 5
end
def reset_circuit
@failure_count = 0
@circuit_open = false
end
end
agent.add_tool(CircuitBreakerTool.new.method(:call))
9. Performance Optimization: Making AI Fast, Cheap, and Reliable
Let me share a horror story. We had a client whose AI bill hit $47,000 in one day. Not because of a bug—their system was working perfectly. It was just working very, very inefficiently.
Here's how we cut their costs by 93% while making their system 5x faster.
9.1. The Hidden Performance Killers in AI Systems
Before we dive into solutions, understand what's actually slow:
- API Latency: 200-800ms per call (can't fix this)
- Token Processing: 50-100 tokens/second (model dependent)
- Tool Execution: 10ms-30s (your code)
- Memory Operations: 1-1000ms (your architecture)
- Network Overhead: 10-100ms (geography matters)
The trick is optimizing what you can control.
9.2. Connection Pooling: Stop Creating, Start Reusing
The $15,000 Mistake
# Bad: Creates new connection for every request
class NaiveAgent
def process_request(prompt)
# This constructor makes 3 HTTP calls!
provider = RAAF::Models::ResponsesProvider.new(
api_key: ENV['OPENAI_API_KEY']
)
runner = RAAF::Runner.new(agent: self, provider: provider)
runner.run(prompt)
end
end
# At 1000 requests/hour, that's 3000 unnecessary API calls
The Production Solution
# Good: Reuse connections like a pro
class ProductionAgent
def initialize
# Connection pool configuration based on real metrics
@provider = RAAF::Models::ResponsesProvider.new(
api_key: ENV['OPENAI_API_KEY'],
# Pool size = expected concurrent requests + buffer
pool_size: calculate_pool_size,
# Timeout based on p99 response time + margin
timeout: 30,
# Keep connections warm
keepalive: true,
keepalive_timeout: 60,
# Retry configuration
retry_config: {
max_attempts: 3,
base_delay: 0.5,
max_delay: 2,
multiplier: 2
}
)
# Pre-warm the connection pool
warm_connection_pool
end
private
def calculate_pool_size
# Based on your actual traffic patterns
base_concurrent_requests = 10
spike_buffer = 5
failover_capacity = 5
base_concurrent_requests + spike_buffer + failover_capacity
end
def warm_connection_pool
# Make dummy requests to establish connections
pool_size.times do
Thread.new { @provider.test_connection rescue nil }
end
end
end
Real Impact:
- Connection time: 200ms → 0ms (reused)
- Error rate: 0.1% → 0.01% (retry logic)
- Cost: No change, but 20% faster responses
9.3. Parallel Tool Execution: Work Smarter, Not Harder
The Serial Slowdown
# Bad: Tools run one at a time (10 seconds total)
def analyze_customer(customer_id)
profile = fetch_profile(customer_id) # 2 seconds
orders = fetch_orders(customer_id) # 3 seconds
support = fetch_tickets(customer_id) # 2 seconds
social = fetch_social(customer_id) # 3 seconds
{ profile: profile, orders: orders, support: support, social: social }
end
The Parallel Paradise
# Good: Tools run simultaneously (3 seconds total)
class ParallelCustomerAnalyzer
def initialize
@runner = RAAF::Runner.new(
agent: agent,
# Enable parallel execution
parallel_tools: true,
# Limit based on provider rate limits
max_parallel_tools: determine_safe_parallel_limit,
# Timeout for parallel operations
parallel_timeout: 10,
# What to do when one fails
partial_failure_strategy: :continue_with_results
)
end
def analyze_customer_parallel(customer_id)
# RAAF automatically parallelizes these tool calls
@runner.run(<<~PROMPT)
Analyze customer #{customer_id}:
1. Fetch their profile
2. Get order history
3. Review support tickets
4. Check social engagement
Return all data in structured format.
PROMPT
end
private
def determine_safe_parallel_limit
# Based on rate limits and system resources
provider_rate_limit = 50 # requests per second
safety_factor = 0.5 # Use 50% of limit
avg_tools_per_request = 4
(provider_rate_limit * safety_factor / avg_tools_per_request).floor
end
end
Common Gotchas:
- Some tools have dependencies (fetch auth token → use token)
- Database connection pools need sizing for parallel queries
- Error handling gets complex (one fails, others succeed?)
- Memory usage can spike with parallel operations
9.4. Response Caching: The Same Question Twice
The Embarrassing Discovery
# Our logs showed:
# 10:00 AM - User: "What's your refund policy?"
# 10:01 AM - Same user: "What's your refund policy?"
# 10:02 AM - Same user: "Tell me about refunds"
# Each request = $0.02. Thousands of users = $$$
The Smart Cache
class IntelligentCacheManager
def initialize
# Multi-layer cache strategy
@memory_cache = RAAF::Cache::MemoryStore.new(
size_limit: 1000, # Keep hot data in RAM
ttl: 300 # 5 minutes for immediate repeats
)
@file_cache = RAAF::Cache::FileStore.new(
directory: './cache',
ttl: 3600, # 1 hour for common questions
max_size: '1GB',
compression: :gzip # Save disk space
)
@semantic_cache = RAAF::Cache::SemanticStore.new(
embedding_model: 'text-embedding-3-small',
similarity_threshold: 0.95, # Nearly identical questions
ttl: 86400 # 24 hours for FAQ-style content
)
end
def get_or_generate(prompt, context = {})
# Level 1: Exact match in memory
cache_key = generate_cache_key(prompt, context)
if result = @memory_cache.get(cache_key)
log_cache_hit(:memory, prompt)
return result
end
# Level 2: Exact match on disk
if result = @file_cache.get(cache_key)
@memory_cache.set(cache_key, result) # Promote to memory
log_cache_hit(:file, prompt)
return result
end
# Level 3: Semantic similarity match
if result = @semantic_cache.find_similar(prompt)
log_cache_hit(:semantic, prompt)
return adapt_response(result, prompt) # Slight personalization
end
# Level 4: Generate new response
result = generate_response(prompt, context)
# Cache at all levels
cache_response(cache_key, prompt, result)
result
end
private
def generate_cache_key(prompt, context)
# Include context that affects response
relevant_context = context.slice(:user_tier, :language, :region)
Digest::SHA256.hexdigest("#{prompt}:#{relevant_context.sort.to_s}")
end
def should_cache?(prompt, response)
# Don't cache personal data
return false if prompt =~ /my account|my order|my data/i
# Don't cache time-sensitive info
return false if response =~ /today|current|now|at this moment/i
# Don't cache errors
return false if response[:error]
true
end
end
Cache Hit Rates from Production:
- FAQ-style questions: 87% cache hit rate
- Technical documentation: 72% cache hit rate
- Personal queries: 3% cache hit rate (as expected)
- Overall cost reduction: 42%
9.5. Token Optimization: Every Token Counts
The Token Waste Audit
# We analyzed 10,000 requests and found:
# - 35% of tokens were system prompts (repeated every call)
# - 20% were verbose tool descriptions
# - 15% were formatting instructions
# - Only 30% were actual user content!
The Optimization Playbook
class TokenOptimizedAgent
def initialize
@router = RAAF::Agent.new(
name: "Router",
# Minimal instructions for routing
instructions: "Route to: support, sales, or tech",
model: "gpt-4o-mini", # $0.0001 per 1K tokens
max_tokens: 10 # Just need one word
)
@specialists = {
support: build_specialist("support", "gpt-4o-mini"),
sales: build_specialist("sales", "gpt-4o"),
tech: build_specialist("tech", "gpt-4o")
}
end
def process_optimized(user_input)
# Step 1: Compress user input
compressed = compress_input(user_input)
# Step 2: Quick routing (10 tokens max)
route = @router.run(compressed)
# Step 3: Specialist handles with full context
specialist = @specialists[route.to_sym]
specialist.run(user_input) # Original input for accuracy
end
private
def compress_input(input)
# Remove redundancy, keep essence
Compressor.new.compress(input, preserve: [:intent, :key_facts])
end
def build_specialist(type, model)
RAAF::Agent.new(
name: type.capitalize,
instructions: load_compressed_instructions(type),
model: model,
# Token-saving configurations
response_format: { type: 'json' }, # No markdown overhead
temperature: 0.3, # More predictable = shorter
# Smart truncation
max_completion_tokens: calculate_optimal_max(type),
# Reuse context across turns
context_window_management: :aggressive
)
end
end
Advanced Token Strategies:
# 1. Dynamic instruction loading
def load_instructions_for_context(user_type, query_type)
base_instructions = "Be helpful and accurate."
# Only add what's needed
case query_type
when :refund
base_instructions += " Refund policy: 30 days, original payment method."
when :technical
base_instructions += load_technical_context
end
base_instructions
end
# 2. Response compression
def compress_response(response)
# Remove fluff, keep facts
response
.gsub(/(?:I understand|I can help|I'd be happy to)\s*/i, '')
.gsub(/\s+/, ' ')
.strip
end
# 3. Batching for efficiency
def process_batch(queries)
# Combine multiple queries into one request
combined = queries.map.with_index { |q, i| "#{i+1}. #{q}" }.join("\n")
response = agent.run(<<~PROMPT)
Answer each numbered question concisely:
#{combined}
PROMPT
# Split responses
parse_batched_response(response)
end
Real-World Impact:
- Average tokens per request: 1,847 → 743
- Cost per request: $0.037 → $0.015
- Response time: 3.2s → 1.8s
- Monthly savings: $18,000
9.6. The Performance Monitoring Dashboard
class PerformanceMonitor
def track_request(request_id, &block)
metrics = {
start_time: Time.now,
tokens_sent: 0,
tokens_received: 0,
cache_hits: [],
tool_timings: {},
errors: []
}
result = yield(metrics)
metrics[:end_time] = Time.now
metrics[:total_duration] = metrics[:end_time] - metrics[:start_time]
metrics[:cost] = calculate_cost(metrics)
# Alert on anomalies
check_performance_thresholds(metrics)
# Store for analysis
store_metrics(request_id, metrics)
result
end
private
def check_performance_thresholds(metrics)
if metrics[:total_duration] > 10
alert("Slow request: #{metrics[:total_duration]}s")
end
if metrics[:cost] > 0.50
alert("Expensive request: $#{metrics[:cost]}")
end
if metrics[:tokens_sent] > 3000
alert("Token limit warning: #{metrics[:tokens_sent]} tokens")
end
end
end
10. Best Practices
10.1. Hard-Won Lessons from Production AI Systems
After building dozens of AI agent systems, these are the patterns that separate the successful from the struggling:
10.2. Agent Design: The Art of Specialization
10.2.1. Why "Do Everything" Agents Always Fail
We once built a "SuperAgent" that could handle customer service, technical support, sales, and account management. It knew everything. It could do anything. It was terrible at everything.
The problem? Conflicting instructions created confusion:
- Customer service training said "always be empathetic"
- Sales training said "always be closing"
- Technical support training said "always be precise"
The result was an agent that tried to close sales while customers were reporting bugs.
10.2.2. The Single Responsibility Principle
Bad:
agent = RAAF::Agent.new(
name: "DoEverything",
instructions: "Handle customer service, sales, support, and billing"
)
Good:
# Each agent has ONE clear job
customer_service = RAAF::Agent.new(
name: "CustomerService",
instructions: "Help customers with order issues and account questions"
)
sales_agent = RAAF::Agent.new(
name: "Sales",
instructions: "Identify opportunities and guide purchase decisions"
)
tech_support = RAAF::Agent.new(
name: "TechSupport",
instructions: "Diagnose technical issues and provide solutions"
)
10.2.3. Clear Instructions: The Difference Between Success and Chaos
Vague Instructions Lead to Unpredictable Behavior:
# This agent will disappoint you
agent = RAAF::Agent.new(
instructions: "Be helpful and professional"
)
Specific Instructions Create Consistent Results:
agent = RAAF::Agent.new(
name: "RefundProcessor",
instructions: <<~INSTRUCTIONS
You process refund requests following these rules:
ALWAYS:
- Verify order exists and is within 30-day window
- Check if item was delivered
- Calculate refund including original shipping
- Provide clear timeline (3-5 business days)
NEVER:
- Process refunds over $500 without manager approval
- Refund digital products after download
- Make exceptions to the 30-day policy
For edge cases, escalate to human review.
INSTRUCTIONS
)
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
ruby: /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-wtjxd0.rb:447: syntax errors found (SyntaxError) 445 | customer_service_agent = RAAF::Agent.new( 446 | name: "CustomerService", > 447 | instructions: <<~INSTRUCTIONS | ^~~~~~~~~~~~ unterminated heredoc; can't find string "INSTRUCTIONS" anywhere before EOF 448 | You help customers with: 449 | ~~~~~~~ 470 | puts e.backtrace.first(3).join("\n") if e.backtrace 471 | exit 1 > 472 | end | ^ unexpected end-of-input, assuming it is closing the parent top level context | ^ unexpected end-of-input; expected a `)` to close the arguments > 473 | | ^ expected an `end` to close the `begin` statement
# Good: Focused agent with clear purpose
customer_service_agent = RAAF::Agent.new(
name: "CustomerService",
instructions: <<~INSTRUCTIONS
You help customers with:
- Order status inquiries
- Account issues
- Product questions
Always be polite and helpful. Escalate complex issues to human agents.
INSTRUCTIONS,
model: "gpt-4o"
)
# Add focused tools
customer_service_agent.add_tool(method(:lookup_order))
customer_service_agent.add_tool(method(:check_account_status))
customer_service_agent.add_tool(method(:escalate_to_human))
10.3. Tool Design Patterns
10.3.1. Tool Safety Considerations
Tools with broad capabilities can cause unintended consequences. For example, a generic "update_database" tool might perform operations beyond the intended scope when given ambiguous instructions.
Tool design requires balancing functionality with safety, clarity, and predictability.
10.3.2. Tool Design Principles
1. One Tool, One Purpose
# Bad: The kitchen sink approach
def manage_customer(action:, **params)
case action
when 'create' then create_customer(params)
when 'update' then update_customer(params)
when 'delete' then delete_customer(params)
when 'merge' then merge_customers(params)
# 20 more actions...
end
end
# Good: Specific, predictable tools
def create_customer(name:, email:, company: nil)
# Single, clear purpose
end
def update_customer_email(customer_id:, new_email:)
# Specific update, not generic "update anything"
end
2. Defensive Programming is Not Optional
class SafeDatabaseTool
MAX_RECORDS_PER_OPERATION = 100
def update_customer_status(customer_id:, new_status:)
# Layer 1: Input validation
validate_customer_id!(customer_id)
validate_status!(new_status)
# Layer 2: Permission check
unless can_modify_customer?(customer_id)
return { error: "Permission denied for customer #{customer_id}" }
end
# Layer 3: Business rule validation
current_status = get_customer_status(customer_id)
unless valid_status_transition?(current_status, new_status)
return {
error: "Invalid status transition",
current: current_status,
requested: new_status,
allowed: allowed_transitions_from(current_status)
}
end
# Layer 4: Audit trail
log_change_request(
user: current_user,
action: 'status_update',
customer_id: customer_id,
from: current_status,
to: new_status
)
# Layer 5: Actual update with rollback capability
transaction do
previous_state = capture_state(customer_id)
begin
result = Customer.find(customer_id).update!(status: new_status)
# Layer 6: Verification
unless verify_update(customer_id, new_status)
raise "Update verification failed"
end
{
success: true,
customer_id: customer_id,
previous_status: current_status,
new_status: new_status,
updated_at: Time.now,
rollback_token: generate_rollback_token(previous_state)
}
rescue => e
rollback_to(previous_state)
{ error: "Update failed: #{e.message}" }
end
end
end
end
3. Clear Naming: Your Tool's First Documentation
# Bad: Ambiguous names
def process_data(data) # Process how?
def handle_request(request) # Handle what?
def fix_issue(issue_id) # Fix how?
# Good: Self-documenting names
def validate_email_format(email:)
def calculate_shipping_cost(weight:, destination:)
def escalate_ticket_to_supervisor(ticket_id:, reason:)
4. Structured Responses: Consistency is King
module ToolResponseFormatter
def success_response(data = {})
{
success: true,
timestamp: Time.now.iso8601,
data: data,
warnings: collect_warnings
}
end
def error_response(error, recoverable: true)
{
success: false,
timestamp: Time.now.iso8601,
error: {
message: sanitize_error_message(error),
type: error.class.name,
recoverable: recoverable,
suggestions: suggest_fixes(error)
}
}
end
def partial_success_response(succeeded, failed)
{
success: false,
partial: true,
timestamp: Time.now.iso8601,
succeeded: succeeded,
failed: failed,
summary: {
total: succeeded.count + failed.count,
success_count: succeeded.count,
failure_count: failed.count
}
}
end
end
5. Documentation That Actually Helps
# This is what your AI reads to understand your tool
class WellDocumentedTool
# Books a meeting room for the specified time slot
#
# @param room_id [String] The room identifier (e.g., 'conf-3rd-floor-a')
# @param start_time [String] ISO 8601 formatted start time
# @param duration_minutes [Integer] Meeting duration (15-480 minutes)
# @param attendee_count [Integer] Number of attendees (1-50)
# @param title [String] Meeting title (required for calendar)
#
# @return [Hash] Booking confirmation with:
# - booking_id: Unique identifier for the booking
# - calendar_link: URL to add to calendar
# - room_details: Capacity, equipment, location
# - conflicts: Any conflicting bookings that were checked
#
# @example
# book_meeting_room(
# room_id: 'conf-3rd-floor-a',
# start_time: '2024-03-15T14:00:00Z',
# duration_minutes: 60,
# attendee_count: 8,
# title: 'Q1 Planning Review'
# )
def book_meeting_room(room_id:, start_time:, duration_minutes:,
attendee_count:, title:)
# Implementation
end
end
10.4. Memory Management: The Art of Strategic Forgetting
10.4.1. Why Perfect Memory is Actually Terrible
A financial advisor bot remembered everything. Every. Single. Detail. After six months, it was tracking:
- Every stock price mentioned
- Every "what if" scenario discussed
- Every typo and correction
- Every "let me think about it"
Result? The bot spent more time processing irrelevant history than giving advice. Response time went from 2 seconds to 45 seconds. Customers left.
10.4.2. The Memory Management Playbook
1. Choose Your Memory Store Like Your Life Depends On It
class MemoryStoreSelector
def self.recommend(requirements)
case
# Demos and prototypes
when requirements[:persistence] == false &&
requirements[:scale] < 100
configure_in_memory_store
# Most production applications
when requirements[:compliance] ||
requirements[:audit_trail]
configure_file_store_with_encryption
# High-scale with intelligence
when requirements[:scale] > 10_000 ||
requirements[:semantic_search]
configure_vector_store_with_clustering
# When in doubt
else
configure_hybrid_store
end
end
private
def self.configure_hybrid_store
RAAF::Memory::HybridStore.new(
# Recent messages in memory
hot_storage: RAAF::Memory::InMemoryStore.new(
capacity: 1000,
ttl: 300 # 5 minutes
),
# Everything else on disk
cold_storage: RAAF::Memory::FileStore.new(
directory: 'data/conversations',
compression: :zstd,
encryption: true
),
# Semantic index for search
index: RAAF::Memory::VectorStore.new(
embedding_model: 'text-embedding-3-small'
)
)
end
end
2. Implement Progressive Summarization
class ProgressiveSummarizer
SUMMARY_TRIGGERS = {
token_threshold: 0.7, # Start when 70% full
message_count: 50, # Or after 50 messages
time_elapsed: 3600, # Or after 1 hour
topic_changes: 3 # Or after 3 major topic shifts
}
def should_summarize?(conversation)
SUMMARY_TRIGGERS.any? do |trigger, threshold|
case trigger
when :token_threshold
conversation.token_usage > (conversation.max_tokens * threshold)
when :message_count
conversation.messages.count > threshold
when :time_elapsed
Time.now - conversation.started_at > threshold
when :topic_changes
detect_topic_changes(conversation) > threshold
end
end
end
def create_summary(messages)
# Group by topic and importance
grouped = group_messages_intelligently(messages)
# Summarize each group appropriately
summaries = grouped.map do |group|
case group[:type]
when :decision
# Preserve exact decision details
preserve_decisions(group[:messages])
when :exploration
# Compress exploratory discussion
summarize_exploration(group[:messages])
when :small_talk
# Minimal summary for chitchat
minimal_summary(group[:messages])
end
end
# Combine with metadata
{
summary: summaries.join("\n\n"),
key_facts: extract_key_facts(messages),
open_questions: find_unresolved_questions(messages),
important_entities: extract_entities(messages),
summary_date: Time.now,
original_message_count: messages.count,
compression_ratio: calculate_compression(messages, summaries)
}
end
end
3. Smart Context Selection
class ContextOptimizer
def build_optimal_context(current_query, history, budget = 4000)
context = []
remaining_tokens = budget
# Priority 1: System message (non-negotiable)
system_msg = history[:system_message]
context << system_msg
remaining_tokens -= count_tokens(system_msg)
# Priority 2: Current query context
query_context = find_related_context(current_query, history)
query_context.each do |msg|
if remaining_tokens > count_tokens(msg) + 500 # Reserve for response
context << msg
remaining_tokens -= count_tokens(msg)
end
end
# Priority 3: Recent messages
history[:messages].last(10).each do |msg|
if remaining_tokens > count_tokens(msg) + 500
context << msg unless context.include?(msg)
remaining_tokens -= count_tokens(msg)
end
end
# Priority 4: Important markers
find_important_messages(history).each do |msg|
if remaining_tokens > count_tokens(msg) + 500
context << msg unless context.include?(msg)
remaining_tokens -= count_tokens(msg)
end
end
# Return chronologically ordered
order_chronologically(context)
end
end
10.5. Testing Strategies
10.5.1. The Testing Pyramid for AI Systems
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: uninitialized constant Tools /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-svflnq.rb:445:in '<main>'
# Level 1: Tool Unit Tests (Fast, Deterministic)
RSpec.describe Tools::OrderLookup do
describe '#find_order' do
it 'returns order details for valid ID' do
result = subject.find_order(order_id: '12345')
expect(result).to match({
success: true,
data: hash_including(
order_id: '12345',
status: String,
total: Numeric
)
})
end
it 'handles non-existent orders gracefully' do
result = subject.find_order(order_id: 'INVALID')
expect(result).to match({
success: false,
error: hash_including(
message: /not found/i,
recoverable: true
)
})
end
it 'validates order ID format' do
expect {
subject.find_order(order_id: 'ABC-123')
}.to raise_error(ArgumentError, /Invalid order ID format/)
end
end
end
# Level 2: Agent Behavior Tests (Mocked AI)
RSpec.describe Agents::CustomerService do
let(:agent) { described_class.new }
let(:mock_ai) { instance_double(RAAF::Models::Provider) }
before do
allow(agent).to receive(:provider).and_return(mock_ai)
end
it 'looks up order when asked about status' do
allow(mock_ai).to receive(:complete).and_return(
double(content: 'lookup_order(order_id: "12345")')
)
result = agent.process("What's the status of order 12345?")
expect(result).to include('order has been shipped')
end
it 'escalates when unable to help' do
allow(mock_ai).to receive(:complete).and_return(
double(content: 'escalate_to_human(reason: "refund_over_limit")')
)
result = agent.process("I need a $5000 refund")
expect(result).to include('connecting you with a specialist')
end
end
# Level 3: Integration Tests (Real AI, Controlled Environment)
RSpec.describe 'Customer Service Flow', :integration do
let(:runner) { RAAF::Runner.new(agent: CustomerServiceAgent.new) }
it 'handles complete refund flow' do
VCR.use_cassette('customer_service/refund_flow') do
# Initial inquiry
result1 = runner.run("I want to return order 12345")
expect(result1.messages.last[:content]).to include('return policy')
# Confirm intent
result2 = runner.run("Yes, I want a refund")
expect(result2.messages.last[:content]).to include('processing')
# Verify completion
expect(result2.tool_calls).to include(
hash_including(name: 'process_refund')
)
end
end
end
# Level 4: Contract Tests (Provider Compatibility)
RSpec.describe 'Provider Compatibility', :contract do
providers = [
RAAF::Models::OpenAI.new,
RAAF::Models::Anthropic.new,
RAAF::Models::Groq.new
]
providers.each do |provider|
context "with #{provider.class.name}" do
it 'handles tool calls correctly' do
agent = RAAF::Agent.new(
instructions: 'You help with math',
provider: provider
)
agent.add_tool(method(:calculate))
runner = RAAF::Runner.new(agent: agent)
result = runner.run("What's 2+2?")
expect(result.success?).to be true
expect(result.messages.last[:content]).to include('4')
end
end
end
end
10.5.2. Testing Anti-Patterns to Avoid
EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance.
Error: NameError: uninitialized constant OpenAI /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-f6ye7r.rb:446:in 'block in <main>' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-f6ye7r.rb:334:in 'Object#it' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-f6ye7r.rb:445:in '<main>'
# DON'T: Test implementation details
it 'calls OpenAI with correct parameters' do
expect(OpenAI::Client).to receive(:new).with(api_key: 'key')
# This breaks when you switch providers
end
# DO: Test behavior
it 'responds to customer inquiries' do
result = runner.run("What's your return policy?")
expect(result.messages.last[:content]).to match(/30 days/)
end
# DON'T: Test exact AI responses
it 'returns exact message' do
expect(response).to eq("Hello! I'd be happy to help you today.")
# AI responses vary
end
# DO: Test response characteristics
it 'greets customer politely' do
expect(response).to match(/hello|hi|greetings/i)
expect(response).to match(/help|assist|support/i)
end
10.6. Production Monitoring: Because You Can't Fix What You Can't See
class ProductionMonitor
CRITICAL_METRICS = {
response_time: { threshold: 5.0, unit: :seconds },
token_usage: { threshold: 4000, unit: :tokens },
cost_per_request: { threshold: 0.50, unit: :dollars },
error_rate: { threshold: 0.02, unit: :percentage },
tool_success_rate: { threshold: 0.95, unit: :percentage }
}
def monitor_request(request_id)
start_metrics = capture_start_state
begin
result = yield
end_metrics = capture_end_state
metrics = calculate_metrics(start_metrics, end_metrics)
# Check thresholds
check_critical_thresholds(metrics)
# Store for analysis
store_metrics(request_id, metrics)
# Real-time dashboards
publish_to_dashboard(metrics)
result
rescue => e
record_error(request_id, e)
alert_on_critical_error(e)
raise
end
end
def daily_report
{
summary: {
total_requests: count_requests,
success_rate: calculate_success_rate,
average_response_time: average(:response_time),
total_cost: sum(:cost),
token_usage: percentile(:token_usage, 95)
},
trends: {
cost_trend: calculate_trend(:cost, days: 7),
performance_trend: calculate_trend(:response_time, days: 7),
error_trend: calculate_trend(:errors, days: 7)
},
anomalies: detect_anomalies,
recommendations: generate_recommendations
}
end
end
11. Troubleshooting
11.1. Common Issues
Tool not being called
- Check tool name and parameters
- Verify agent instructions mention the tool
- Ensure parameter types match
Memory growing too large
- Reduce max_tokens setting
- Enable pruning strategy
- Use compression
Slow responses
- Enable parallel tool execution
- Use faster models for simple tasks
- Implement caching
Provider errors
- Check API keys and quotas
- Implement retry logic
- Use circuit breaker pattern
11.2. Debugging
# Enable debug logging
ENV['RAAF_LOG_LEVEL'] = 'debug'
ENV['RAAF_DEBUG_CATEGORIES'] = 'api,tools,memory'
# Add debug tracer
tracer = RAAF::Tracing::SpanTracer.new
tracer.add_processor(RAAF::Tracing::ConsoleProcessor.new)
runner = RAAF::Runner.new(
agent: agent,
tracer: tracer
)
12. Next Steps
Now that you understand the core concepts:
- Tool System Guide - Deep dive into building tools
- Memory Guide - Advanced memory management
- Multi-Agent Guide - Build agent workflows
- Provider Guide - Use different AI providers
- DSL Guide - Declarative agent configuration