More at guides.raaf-ai.dev:

RAAF Core Guide

This guide covers the core concepts and architecture of Ruby AI Agents Factory (RAAF). It explains agents, runners, tools, and the fundamental patterns for building AI systems.

After reading this guide, you will know:

  • How agents work and their key capabilities
  • The agent-runner execution model
  • How to create and use tools effectively
  • Memory management and context handling
  • Structured outputs and type safety
  • Best practices for agent design

1. Introduction

RAAF Core is the foundation of the Ruby AI Agents Factory. It provides the essential classes and patterns for building AI agents. Understanding these core concepts is crucial for building effective AI systems.

1.1. The Agent-Based Architecture Philosophy

Most AI frameworks treat agents as black boxes—you provide a prompt and receive a response with little control over the process. RAAF takes a different approach by decomposing AI systems into discrete, composable components that can be understood, tested, and modified independently.

This architectural approach mirrors proven patterns from software engineering: just as web applications are built with HTTP, routing, and MVC patterns, AI systems benefit from structured patterns around agents, tools, and memory management. These patterns aren't just implementation details—they're the architectural foundations that determine system maintainability and effectiveness.

1.2. Core Architectural Concepts

Separation of Concerns: Each component handles a specific aspect of AI system operation. Agents define behavior, runners manage execution, tools provide capabilities, and memory handles context. This separation enables independent development, testing, and modification of each component.

Stateless Design: Agents are stateless templates that define behavior without maintaining conversation state. This design enables concurrent execution, simplified testing, and better scalability.

Composability: Components can be combined and recombined to create different system behaviors. The same agent can be used with different tools, different memory configurations, and different providers.

Explicit State Management: Rather than hiding state in agent internals, RAAF makes conversation state explicit and manageable through the runner component.

1.3. Core Design Principles

Orthogonality: Each component has a single, well-defined responsibility. The memory system operates independently of the tool system, providers can be swapped without affecting agents, and tool implementations can change without agent modifications. This separation of concerns enables rapid iteration and effective debugging.

Reversibility: Design decisions should be easily changeable. Provider selection, tool implementations, and memory strategies can be modified without system-wide changes. This flexibility provides insurance against changing requirements and evolving AI capabilities.

Incremental Development: Start with simple, working systems and add complexity gradually. A basic agent with one tool provides immediate value and serves as a foundation for more sophisticated capabilities. This approach reduces risk and accelerates development.

Explicit Error Handling: Problems are surfaced early and clearly through structured error responses. Tools return standardized error information, agents validate inputs, and memory systems handle context overflow gracefully. This explicit approach prevents mysterious failures and enables effective debugging.

Predictable Behavior: System behavior should be deterministic and testable. Agents behave consistently across different contexts, tools produce reliable outputs, and memory management follows predictable patterns.

1.4. Core Components and Their Interactions

Agent: The behavioral template that defines how an AI assistant should operate. Agents specify instructions, available tools, and interaction patterns, but maintain no conversation state. This stateless design enables reuse across multiple concurrent conversations.

Runner: The execution engine that manages conversation state and coordinates between agents and AI providers. Runners handle context management, tool execution, and response generation while maintaining isolation between different conversations.

Tools: The interface layer that enables agents to interact with external systems. Tools provide standardized ways for agents to access databases, APIs, file systems, and other resources. They abstract implementation details while providing reliable, testable interfaces.

Memory: The context management system that handles conversation history, token limits, and state persistence. Memory systems can employ different strategies for context retention, from simple sliding windows to sophisticated semantic retrieval.

Providers: The abstraction layer over AI model APIs. Providers normalize different AI service interfaces (OpenAI, Claude, etc.) into a consistent interface, enabling provider independence and flexible model routing.

1.5. Component Interaction Model

These components interact through well-defined interfaces that maintain separation of concerns. The runner coordinates between components without creating tight coupling, enabling independent modification and testing. This architecture provides practical benefits: you can modify memory strategies without affecting tool implementations, swap providers without changing agent definitions, and add new tools without touching core execution logic.

2. Agent Architecture

2.1. Conceptual Foundation

An agent in RAAF represents a behavioral template for AI assistants—a specification of how an AI should behave, what tools it can use, and how it should interact with users. The fundamental insight is that agents are stateless templates, not stateful conversational entities.

2.2. The Template vs. Instance Model

Traditional AI systems often conflate the definition of behavior (what the AI should do) with the execution of that behavior (tracking a specific conversation). RAAF separates these concerns:

  • Agent: Defines consistent behavior patterns, available tools, and interaction rules (stateless by design)
  • Runner Instance: Manages specific conversation state and coordinates template execution

This separation enables one agent to power thousands of concurrent conversations while maintaining complete isolation between them.

2.3. Benefits of Stateless Design

Concurrent Execution: Multiple conversations can use the same agent simultaneously without interference or resource conflicts.

Predictable Testing: Each test execution starts with a clean behavioral template, eliminating test failures caused by residual state from previous executions.

Simplified Debugging: Problems are reproducible because conversation state is explicitly managed by the runner component rather than hidden within agent internals.

Flexible Composition: Agents can be combined with different tools, memory configurations, and providers without concern for internal state conflicts.

Horizontal Scaling: Agents can be distributed across multiple servers since they contain no conversation-specific state.

2.4. Architectural Implications

This stateless design has profound implications for system architecture. It enables microservice deployment patterns, simplifies load balancing, and eliminates the complex state synchronization typically required in distributed AI systems. The agent becomes a pure functional specification that can be cached, replicated, and executed anywhere without coordination overhead.

2.5. Understanding Statelessness

The stateless nature of agents means they contain no conversation-specific information. They define capabilities and behavior patterns but maintain no memory of previous interactions. This is analogous to a class definition in object-oriented programming—the class defines behavior, but instances maintain state.

What Agents Contain: - Behavioral instructions and personality - Available tool definitions and configurations - Response formatting and validation rules - Error handling and escalation patterns

What Agents Don't Contain: - Conversation history or context - User-specific information or preferences - Session state or temporary data - Execution metrics or performance data

2.6. Practical Implications

This architectural choice enables practical benefits at scale. A single agent can serve thousands of concurrent users without resource conflicts or cross-contamination. Customer service departments can use one agent definition across all representatives, ensuring consistent behavior while maintaining conversation isolation.

The stateless design also simplifies deployment and maintenance. Agents can be updated and deployed independently of running conversations, enabling continuous improvement without service interruption.

2.7. Agent Creation

Creating an agent involves defining three fundamental components: identity, behavior, and capabilities.

agent = RAAF::Agent.new(
  name: "DataAnalyst",
  instructions: "You are a data analyst...",
  model: "gpt-4o"
)

2.8. Essential Agent Components

Identity (Name): Provides a unique identifier for the agent, crucial for multi-agent scenarios, tracing, and debugging. The name serves as a reference point for handoffs, monitoring, and system coordination.

Behavior (Instructions): Defines the agent's personality, expertise domain, and interaction patterns. Instructions shape how the agent interprets queries, formats responses, and handles various scenarios. Well-crafted instructions create consistent, predictable behavior across different contexts.

Capabilities (Model): Specifies the underlying AI model and its capabilities. Different models have different strengths, costs, and performance characteristics. The model choice affects response quality, speed, and operational costs.

2.9. Template Instantiation Philosophy

The agent serves as a behavioral specification that can be instantiated multiple times through different runners. Each instantiation maintains the same behavioral patterns while operating on different conversation contexts. This design enables:

  • Consistent Behavior: All instances follow the same behavioral template
  • Independent Execution: Each instance operates without affecting others
  • Resource Efficiency: Templates can be shared across multiple conversations
  • Simplified Management: Behavioral updates affect all instances uniformly

2.10. Design Considerations

Agents should be designed with reusability in mind. Instructions should be general enough to handle various scenarios within the domain while specific enough to provide consistent behavior. The agent becomes a behavioral contract that defines how the AI assistant will operate across different contexts.

Designing for Reusability: A well-designed agent can serve multiple use cases without modification. Consider a customer service agent - rather than creating separate agents for order inquiries, shipping questions, and return requests, design one agent with comprehensive instructions that covers all customer service scenarios. This approach reduces maintenance overhead and ensures consistent customer experience across different interaction types.

The key to reusable agent design lies in abstracting the core purpose while allowing flexibility in execution. Instructions should define the agent's expertise domain and behavioral guidelines without hardcoding specific responses or workflows. Tools provide the specific capabilities, context variables supply the runtime data, and the model selection determines the sophistication level - but the agent itself remains a stable, reusable component.

This reusability extends across different deployment contexts. The same agent definition can power a web chat interface, a mobile app, an API endpoint, or even voice interactions. By keeping agents stateless and context-agnostic, you create building blocks that compose into larger systems without modification.

2.11. Core Agent Properties

2.11.1. Name: System Identity

The agent name serves multiple architectural purposes beyond simple identification:

Multi-Agent Coordination: In systems with multiple agents, names enable handoffs, delegation, and coordination. Agents can reference each other by name for workflow management.

Tracing and Monitoring: Names provide context for logging, debugging, and performance monitoring. System administrators can track agent-specific metrics and behavior patterns.

Error Handling: Names help identify the source of issues in complex systems with multiple agents operating simultaneously.

name: "CustomerService"  # Used for handoffs and tracing
  instructions: "...",
  model: "gpt-4o"
)

The name parameter provides a unique identifier for the agent that enables routing, monitoring, and debugging. The name becomes part of the agent's identity in multi-agent systems, and it's more than just a label—it's an architectural decision that affects how your system operates.

Names enable handoffs between agents. When your customer service agent realizes it needs legal expertise, it can transfer the conversation to a "LegalExpert" agent by name. This isn't just labeling—it's the foundation of agent coordination. Names also appear in your monitoring dashboards, error logs, and tracing systems. "Agent1" tells you nothing when debugging a production issue; "DatabaseQueryOptimizer" tells you exactly what went wrong and where to look. Good names reduce mean time to resolution.

Names matter because they enable handoffs. When your customer service agent realizes it needs legal expertise, it can transfer the conversation to a "LegalExpert" agent by name. Names also appear in your monitoring and debugging output, so choose them carefully. "Agent1" tells you nothing; "DatabaseQueryOptimizer" tells you exactly what went wrong when it fails.

A good naming convention might be: {Domain}{Role} like "FinanceAnalyst", "ContentModerator", or "DataValidator". Be specific enough that your future self will understand the agent's purpose immediately.

2.11.2. Instructions

Instructions define the agent's behavior and personality. This is where you encode your domain expertise into the AI:

2.11.2.1. Effective Instruction Writing

Instructions are the primary way to encode domain knowledge into AI agents. Vague instructions lead to inconsistent behavior and poor user experiences.

The difference between effective and ineffective instructions is specificity and context:

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. ruby: /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-wxnxgq.rb:452: syntax errors found (SyntaxError) 450 | agent = RAAF::Agent.new( 451 | name: "CustomerServicePro", > 452 | instructions: <<~INSTRUCTIONS | ^~~~~~~~~~~~ unterminated heredoc; can't find string "INSTRUCTIONS" anywhere before EOF 453 | You are a senior customer service representative for TechCorp. 454 | ~~~~~~~ 490 | puts e.backtrace.first(3).join("\n") if e.backtrace 491 | exit 1 > 492 | end | ^ unexpected end-of-input, assuming it is closing the parent top level context | ^ unexpected end-of-input; expected a `)` to close the arguments > 493 | | ^ expected an `end` to close the `begin` statement

# Before: Vague and ineffective
agent = RAAF::Agent.new(
  instructions: "Be a helpful customer service agent"
)

# After: Specific and effective
agent = RAAF::Agent.new(
  name: "CustomerServicePro",
  instructions: <<~INSTRUCTIONS
    You are a senior customer service representative for TechCorp.

    Your personality:

    - Professional but warm (like a helpful colleague, not a robot)
    - Patient with frustrated customers
    - Solution-oriented rather than problem-focused

    Your approach:

    1. Acknowledge the customer's concern first
    2. Ask clarifying questions if needed
    3. Provide specific solutions, not generic advice
    4. Always offer a next step

    Example good response:
    "I understand how frustrating that must be. Let me help you resolve this. 
    Can you tell me which version of the software you're using?"

    Example bad response:
    "Have you tried turning it off and on again?"

    Never:

    - Blame the customer
    - Say "that's not possible" (say "let me find a solution" instead)
    - End without offering next steps
  INSTRUCTIONS,
  model: "gpt-4o"
)

Specific instructions with examples and constraints produce more consistent and effective agent behavior.

2.11.2.2. What Makes Instructions Effective?

Role Definition: Don't just say what they are, say what level they are. "You are a customer service agent" vs. "You are a senior customer service representative with 10 years of experience"

Behavioral Examples: Show, don't just tell. Include examples of good and bad responses. AI models learn patterns from examples better than from abstract rules.

Specific Constraints: Instead of "be professional," say "use formal language but include personal touches like 'I understand' and 'Let me help'"

Domain Context: Include relevant background. If it's a technical product, mention that. If customers are typically frustrated, acknowledge that.

Test your instructions by asking the agent to explain its understanding of its role. If the explanation doesn't match your intent, refine the instructions.

2.11.3. Model

The AI model to use. RAAF supports many providers, and this choice has real-world implications for your application:

# OpenAI models
agent = RAAF::Agent.new(model: "gpt-4o")        # Premium: high quality, higher cost
agent = RAAF::Agent.new(model: "gpt-4o-mini")   # Balanced: good quality, lower cost

# Anthropic models
agent = RAAF::Agent.new(model: "claude-3-5-sonnet-20241022")  # Excellent reasoning

# Groq models
agent = RAAF::Agent.new(model: "mixtral-8x7b-32768")  # Fast inference, good for simple tasks

# LiteLLM universal access
agent = RAAF::Agent.new(model: "bedrock/claude-3")  # Access to AWS Bedrock models

The model choice isn't just about capabilities—it's about trade-offs. GPT-4o gives you the best reasoning but costs more per token. GPT-4o-mini is fast and cheap but might struggle with complex logic. LiteLLM provides access to diverse providers and models through a universal interface.

Here's a practical approach: start with a premium model like GPT-4o during development. Once you understand your requirements, you can optimize by using cheaper models for simple tasks and reserving premium models for complex reasoning. RAAF's provider abstraction makes this kind of optimization trivial to implement.

2.11.4. Additional Configuration

agent = RAAF::Agent.new(
  name: "Assistant",
  instructions: "You are helpful",
  model: "gpt-4o",
  temperature: 0.7,        # Control randomness
  max_tokens: 1000,        # Limit response length
  top_p: 0.9              # Nucleus sampling
)

These parameters control fine-grained aspects of response generation that affect creativity, length, and consistency. Temperature controls the randomness of responses, where 0.0 produces deterministic output and 1.0 creates very creative responses. For customer service, you might want 0.3 for consistent, factual responses. For creative writing, 0.8 might be better. Max_tokens prevents runaway responses that could exhaust your token budget.

Start with defaults like temperature: 0.7 and max_tokens: 1000, then adjust based on your use case. Use lower temperature for factual tasks, higher for creative work. Set max_tokens based on your typical response length needs plus a buffer for complex queries.

3. The Runner Execution Model

The Runner is responsible for executing agents and managing conversation state. It coordinates between agents, tools, memory, and monitoring systems. But think of it as more than just a coordinator—it's the operating system for your AI interactions.

3.1. Why the Runner Pattern Is Necessary

Direct API calls to AI models create several problems in production systems:

Memory management: Each request starts fresh, losing conversation context and requiring users to repeat information.

Tool execution: Without coordination, tools can call each other recursively, creating infinite loops and unexpected costs.

Error handling: Failed requests have no recovery mechanism, leading to poor user experience.

State management: Concurrent requests can interfere with each other when sharing context improperly.

Monitoring: No visibility into what's happening during AI interactions makes debugging difficult.

The Runner pattern addresses these issues by providing a stateful execution environment that manages the complexities of AI interactions.

3.2. What the Runner Actually Does

The runner is your AI system's operating system. Here's what it manages:

  1. Conversation State: Maintains context across multiple turns
  2. Tool Orchestration: Ensures tools are called safely and efficiently
  3. Memory Management: Handles context windows and conversation history
  4. Error Recovery: Retries failed requests, handles provider outages
  5. Monitoring Integration: Tracks every interaction for debugging
  6. Multi-Agent Coordination: Manages handoffs between specialists

Think of it this way: The agent is the "what" (what should the AI do?), the runner is the "how" (how do we make it happen reliably?).

3.3. Basic Usage

agent = RAAF::Agent.new(
  name: "Assistant",
  instructions: "You are helpful",
  model: "gpt-4o"
)

runner = RAAF::Runner.new(agent: agent)
result = runner.run("Hello!")

puts result.messages.last[:content]

This code creates a complete AI conversation system with just four lines of code. The agent defines the behavior and capabilities, the runner manages the execution and state, and the result provides structured access to everything that happened during the conversation.

The runner serves as the stateful coordinator that bridges your stateless agent with the realities of AI provider APIs. It handles message formatting, conversation context, error recovery, and result processing—all the messy details that would otherwise clutter your application code. The runner encapsulates the complexity of AI interactions while exposing a simple interface. You call run() with a message and get a structured result, while behind the scenes the runner handles provider-specific formatting, context management, and error recovery.

This simple example hides a lot of complexity. Behind the scenes, the runner is:

  1. Formatting the message for the specific AI provider
  2. Managing the conversation context (system message, user message, etc.)
  3. Handling the API call with proper error handling and retries
  4. Processing the response and updating the conversation state
  5. Returning a structured result that your application can work with

The beauty of this abstraction is that all this complexity is hidden. You just call run() and get a result. But the abstraction is also reversible—you can dig deeper when you need to.

3.4. Runner Configuration

runner = RAAF::Runner.new(
  agent: agent,
  max_turns: 10,           # Limit conversation turns
  stream: false,           # Enable/disable streaming
  parallel: true           # Enable parallel tool execution
)

These configurations control runtime behavior that affects performance, user experience, and resource usage. Max_turns prevents infinite loops in multi-turn conversations. If an agent gets stuck in a reasoning loop or keeps asking clarifying questions, max_turns provides a safety net. It's better to gracefully terminate than to exhaust your token budget.

When streaming is enabled, it sends partial responses as they're generated, improving perceived performance. Users see progress immediately rather than waiting for complete responses. This is crucial for long-form content or complex reasoning tasks. Parallel execution enables multiple tool calls to run simultaneously, reducing total response time. If an agent needs to check both weather and tRAAFic, parallel execution runs both calls concurrently rather than sequentially.

3.5. Result Object

The runner returns a RAAF::Result object containing:

result = runner.run("Hello!")

# Access messages
result.messages.each do |message|
  puts "#{message[:role]}: #{message[:content]}"
end

# Check completion status
puts "Success: #{result.success?}"
puts "Agent: #{result.agent.name}"

# Usage statistics
puts "Tokens used: #{result.usage}"

The result object provides structured access to conversation outcomes, including messages, status, and usage metrics. It's your window into what happened during the AI interaction.

The messages array contains the complete conversation history, including system messages, user input, and AI responses. This is crucial for debugging, auditing, and understanding conversation flow. Token usage directly correlates to cost. By monitoring usage patterns, you can optimize prompts, identify expensive operations, and implement cost controls. This data is essential for production deployments where costs matter.

The success flag indicates whether the conversation completed normally or encountered errors. Failed conversations might indicate model issues, rate limiting, or malformed requests that need handling.

3.6. Multiple Turns

For multi-turn conversations, the runner maintains context:

runner = RAAF::Runner.new(agent: agent)

# Turn 1
result1 = runner.run("My name is Alice")
puts result1.messages.last[:content]

# Turn 2 - agent remembers Alice's name
result2 = runner.run("What's my name?")
puts result2.messages.last[:content]

This demonstrates persistent conversation memory across multiple interactions. The runner maintains context between turns, enabling natural conversation flow. Without memory, every interaction would be independent, making the AI feel robotic and frustrating. Context persistence enables the AI to reference previous information, build on earlier responses, and maintain conversational coherence.

Each call to run() adds to the conversation history stored in the runner. The runner automatically includes this history in subsequent API calls, so the AI model has access to the complete conversation context. Long conversations consume more tokens and eventually hit context limits. RAAF provides memory management strategies to handle this gracefully, but understanding the trade-offs is important for production applications.

4. Tool System

Tools enable agents to interact with external systems, databases, APIs, and perform computations. But here's the key insight: tools are how you bridge the gap between AI reasoning and real-world action.

4.1. Why Tools Are Essential for AI Agents

Without tools, AI agents can only provide information and advice. They cannot take actions, access real-time data, or integrate with existing systems.

Before tools: Agents can discuss solutions but cannot implement them. Users must take actions manually based on the agent's recommendations.

With tools: Agents can look up information, perform calculations, update databases, send notifications, and interact with external APIs.

This capability gap is the difference between a chatbot and a digital assistant. Tools enable agents to complete tasks rather than just discussing them.

4.2. What Makes a Good Tool?

Focused Purpose: Each tool does one thing well

  • Bad: manage_customer() that does everything
  • Good: lookup_order(), cancel_order(), process_refund()

Safe by Design: Tools validate inputs and limit actions

  • Never expose raw database access
  • Always validate permissions
  • Return structured errors, not stack traces

Self-Documenting: The AI needs to understand when and how to use it

  • Clear parameter names
  • Descriptive return values
  • Helpful error messages

4.3. How Tools Become Your Safety Net

Tools aren't just about functionality—they're about control. Instead of giving an AI model unlimited access to your systems, you create specific, controlled interfaces. It's like the difference between giving someone your credit card versus giving them a gift card to a specific store.

This is defensive programming at its finest. Each tool is a controlled entry point that:

  • Validates inputs before taking action
  • Enforces business rules and permissions
  • Provides audit trails of what was done
  • Limits the blast radius of potential mistakes

4.4. Creating Tools

4.4.1. Method-Based Tools

The simplest way to create tools is from Ruby methods. This approach leverages Ruby's strengths—you're just writing normal methods that happen to be callable by AI:

def get_weather(location)
  # Simulate weather API call
  "Weather in #{location}: 72°F, sunny"
end

agent = RAAF::Agent.new(
  name: "WeatherBot",
  instructions: "Help users get weather information",
  model: "gpt-4o"
)

agent.add_tool(method(:get_weather))

This code demonstrates the most straightforward way to give an AI agent the ability to retrieve weather information. The get_weather method is just a regular Ruby method that takes a location parameter and returns a weather description. The magic happens when you call agent.add_tool(method(:get_weather)) – this converts your Ruby method into something the AI can understand and call.

Behind the scenes, RAAF examines your method signature and automatically generates a JSON schema that describes the tool to the AI model. When the AI decides to use this tool, RAAF handles the parameter serialization, method invocation, and result formatting. From your perspective, you're just writing normal Ruby code, but the AI sees it as a structured tool with clear inputs and outputs.

The beauty of this approach is testability. Since your tool is just a Ruby method, you can test it independently of the AI system. Want to verify that your weather lookup works? Just call get_weather("New York") in your test suite. No mocking, no complex setup—just regular Ruby methods.

This simplicity is deceptive. Behind the scenes, RAAF is introspecting your method signature, generating JSON schemas for the AI model, and handling the serialization of parameters and results. You write Ruby code; RAAF handles the AI integration.

The method-based approach also means you can test your tools independently of the AI system. Want to verify that your weather lookup works? Just call get_weather("New York") in your test suite. No mocking, no complex setup—just regular Ruby methods.

4.4.2. Keyword Arguments

Tools with complex parameters use keyword arguments:

def send_email(to:, subject:, body:, priority: 'normal')
  # Send email logic
  {
    status: 'sent',
    message_id: 'msg_123',
    timestamp: Time.now
  }
end

agent.add_tool(method(:send_email))

# Agent can now call:
# send_email(to: "[email protected]", subject: "Hello", body: "Hi there!")

This example shows how to create tools with more complex parameter requirements. The send_email method uses keyword arguments to clearly define what information is required and what has sensible defaults. The AI model will understand that to, subject, and body are required parameters, while priority is optional and defaults to 'normal' if not specified.

The method returns a structured hash with status information, which the AI can use to understand whether the operation succeeded and potentially communicate results back to the user. This structured approach to tool responses is crucial for building reliable AI workflows – the AI needs to know not just that something happened, but whether it succeeded and what the results were.

4.4.3. Class-Based Tools

For more complex tools, use classes:

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: undefined local variable or method 'agent' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-h1d0p5.rb:460:in '<main>'

class DatabaseTool
  def initialize(connection)
    @db = connection
  end

  def call(query:, limit: 100)
    results = @db.execute(query).limit(limit)
    {
      rows: results.to_a,
      count: results.count,
      execution_time: @db.last_execution_time
    }
  end
end

db_tool = DatabaseTool.new(database_connection)
agent.add_tool(db_tool.method(:call))

When your tools need to maintain state or require complex initialization, class-based tools provide a clean solution. This database tool example shows how to encapsulate database connection management within the tool class itself. The constructor takes a database connection, and the call method provides a safe interface for executing queries with built-in result limiting.

The structured return value includes not just the query results, but also metadata like row count and execution time. This additional information helps the AI understand the scope and performance characteristics of the query, which can be useful for generating appropriate responses to users.

4.4.4. Lambda Tools

For inline tools:

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: undefined local variable or method 'agent' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-lwfnv1.rb:448:in '<main>'

get_time = ->(timezone: 'UTC') do
  Time.now.in_time_zone(timezone).strftime("%I:%M %p %Z")
end

agent.add_tool(get_time)

For simple tools that don't warrant a full method definition, lambda expressions provide a concise alternative. This time tool uses a lambda to create a quick timezone-aware time formatter. The lambda accepts a timezone parameter with a sensible default, and returns a formatted time string.

Lambda tools are particularly useful for simple transformations or calculations that you want to expose to the AI without the overhead of defining a full method. They're perfect for utility functions that you might otherwise inline in your code.

4.5. Tool Parameters and Documentation

RAAF automatically extracts parameter information from method signatures:

def book_flight(
  origin:,           # Required parameter
  destination:,      # Required parameter  
  departure_date:,   # Required parameter
  passengers: 1,     # Optional with default
  class_type: 'economy'  # Optional with default
)
  # Implementation
end

This method signature demonstrates how RAAF automatically understands parameter requirements from your Ruby code. Parameters without default values become required in the AI's understanding of the tool, while parameters with defaults become optional. The AI model will know that it must provide origin, destination, and departure_date, but can optionally specify passengers and class_type.

This automatic parameter extraction eliminates the need for separate API documentation – your method signature becomes the tool's specification. The AI understands not just what parameters are available, but which ones are required and what the default values are for optional parameters. end

agent.add_tool(method(:book_flight)) ```

For better documentation, use comments:

# Books a flight for the specified passengers
# @param origin [String] Departure airport code (e.g., 'LAX')
# @param destination [String] Arrival airport code (e.g., 'JFK')
# @param departure_date [String] Departure date in YYYY-MM-DD format
# @param passengers [Integer] Number of passengers (default: 1)
# @param class_type [String] Seat class: 'economy', 'business', or 'first'
# @return [Hash] Flight booking confirmation details
def book_flight(origin:, destination:, departure_date:, passengers: 1, class_type: 'economy')
  # Implementation
end

4.6. Tool Error Handling

Tools should handle errors gracefully:

def get_stock_price(symbol:)
  begin
    api = StockAPI.new
    price_data = api.get_current_price(symbol.upcase)

    {
      symbol: symbol.upcase,
      price: price_data[:current_price],
      currency: price_data[:currency],
      last_updated: price_data[:timestamp]
    }
  rescue StockAPI::InvalidSymbolError
    { error: "Invalid stock symbol: #{symbol}" }
  rescue StockAPI::APIError => e
    { error: "Unable to fetch stock data: #{e.message}" }
  rescue => e
    { error: "Unexpected error: #{e.message}" }
  end
end

agent.add_tool(method(:get_stock_price))

4.7. Async Tools

For long-running operations:

def analyze_large_dataset(dataset_url:)
  # Start background job
  job_id = DataAnalysisJob.perform_async(dataset_url)

  {
    job_id: job_id,
    status: 'started',
    estimated_completion: 5.minutes.from_now,
    status_url: "/api/jobs/#{job_id}/status"
  }
end

def check_job_status(job_id:)
  job = DataAnalysisJob.find(job_id)

  {
    job_id: job_id,
    status: job.status,
    progress: job.progress_percentage,
    result: job.completed? ? job.result : nil,
    error: job.failed? ? job.error_message : nil
  }
end

agent.add_tool(method(:analyze_large_dataset))
agent.add_tool(method(:check_job_status))

5. Memory Management

5.1. Why Memory Management Is Critical

Memory management becomes essential when conversations exceed model context windows. Without proper memory handling, AI agents lose access to conversation history, leading to:

Context loss: The agent forgets previous interactions and asks users to repeat information.

Inconsistent responses: Without full context, the agent may contradict earlier statements.

Poor user experience: Users expect agents to remember what they've discussed throughout the conversation.

Degraded performance: The agent cannot build on previous context to provide relevant responses.

Context window limits are hard constraints that require proactive management to maintain conversation quality.

5.2. What Is Context Window Management?

Think of the context window like a whiteboard during a meeting:

  • Fixed Size: You can only fit so much on the board
  • Everything Counts: Every word takes up space
  • Old Stuff Must Go: To add new info, you need to erase something

But unlike a whiteboard, you can't just erase random parts. You need intelligent strategies:

  1. Sliding Window: Keep the most recent N messages
  2. Importance-Based: Keep critical information, drop small talk
  3. Summarization: Compress old conversations into summaries
  4. Semantic Selection: Use embeddings to keep relevant context

5.3. How RAAF Handles Memory

RAAF provides sophisticated memory management to handle conversation context and long-running interactions. But memory management in AI systems is fundamentally different from traditional applications.

In a typical web application, you might cache database queries or memoize expensive computations. In AI systems, you're managing context—the conversation history that determines what the AI knows and remembers about the current interaction.

Context is a finite resource. Every AI model has a maximum context window—the amount of conversation history it can process at once. As conversations get longer, you hit this limit. The naive approach is to just truncate old messages, but that loses valuable context. The smart approach is to manage memory strategically.

Think of it like managing RAM in a computer: you keep frequently accessed data in fast storage, archive less important data, and have strategies for what to do when you're running out of space.

5.4. The Context Window Problem

Every AI model has a maximum context window. GPT-4o can handle about 128K tokens, which sounds like a lot until you realize that a typical conversation with a few tool calls can easily consume 10K tokens. A day-long customer service conversation might need 100K tokens or more.

When you hit the context limit, you have several options:

  1. Truncate ruthlessly: Drop old messages (loses context)
  2. Summarize intelligently: Compress old messages into summaries (preserves key information)
  3. Select strategically: Keep only the most relevant messages (maintains focus)

RAAF supports all these strategies, and you can combine them based on your needs.

5.5. Memory Stores: Where Your AI's Memories Live

Here's a question that cost us $12,000 to answer: "Where should we store conversation history?"

We started simple—just kept everything in memory. Then our server restarted and wiped out 10,000 active customer conversations. The support tickets were... memorable.

5.5.1. The Three Memory Store Personalities

1. In-Memory Store: The Speed Demon

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. <internal:/Users/hajee/.rvm/rubies/ruby-3.4.5/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require': cannot load such file -- raaf (LoadError) from <internal:/Users/hajee/.rvm/rubies/ruby-3.4.5/lib/ruby/3.4.0/rubygems/core_ext/kernel_require.rb>:136:in 'Kernel#require' from /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-yledm2.rb:444:in '<main>'

require 'raaf'

# Fast but forgetful
memory_manager = RAAF::Memory::MemoryManager.new(
  store: RAAF::Memory::InMemoryStore.new(
    capacity: 10_000,  # Max conversations
    ttl: 3600          # Expire after 1 hour
  ),
  max_tokens: 4000
)

runner = RAAF::Runner.new(
  agent: agent,
  memory_manager: memory_manager
)

Real-world use cases:

  • Demo environments (who cares if it forgets?)
  • Stateless chat widgets (each page refresh = new conversation)
  • High-frequency trading bots (speed > persistence)
  • Development and testing (restart often)

The brutal truth: In-memory is seductive because it's so fast. But the first time you lose a critical conversation, you'll wish you'd chosen differently.

2. File Store: The Reliable Workhorse

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NoMethodError: undefined method 'join' for an instance of String /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-xmyeo0.rb:475:in '<main>'

# The improved version we use in production
class ProductionFileStore < RAAF::Memory::FileStore
  def initialize(config)
    super(
      directory: config[:base_dir],
      # Organize by date for easier management
      path_pattern: '%{year}/%{month}/%{day}/%{session_id}.json',
      # Compress old conversations
      compression: :gzip,
      # Encrypt sensitive data
      encryption: {
        algorithm: 'AES-256-GCM',
        key: ENV['CONVERSATION_ENCRYPTION_KEY']
      }
    )
  end

  def store_message(session_id, message)
    # Add metadata for debugging
    message[:stored_at] = Time.now
    message[:server_version] = RAAF::VERSION

    super(session_id, message)

    # Async backup to S3
    BackupWorker.perform_async(session_id) if production?
  end
end

memory_manager = RAAF::Memory::MemoryManager.new(
  store: ProductionFileStore.new(
    base_dir: Rails.root.join('storage/conversations'),
    retention_days: 90  # GDPR compliance
  ),
  max_tokens: 4000
)

Why file storage isn't as simple as it seems:

  • File locking: Two processes writing = corrupted data
  • Directory limits: Some filesystems choke at 10k files per directory
  • Search performance: Finding a conversation from last month? Good luck
  • Backup complexity: Copying millions of small files is painful

3. Vector Store: The Smart Archive

# This is where things get interesting
class IntelligentVectorStore < RAAF::Memory::VectorStore
  def initialize
    super(
      # Use OpenAI for embeddings
      embedding_model: 'text-embedding-3-small',

      # PostgreSQL with pgvector for production
      database: {
        adapter: 'postgresql',
        extension: 'vector',
        pool: 25
      },

      # Automatic clustering for similar conversations
      clustering: {
        enabled: true,
        algorithm: 'hnsw',
        min_cluster_size: 5
      }
    )
  end

  def find_similar_conversations(query, limit: 5)
    embedding = generate_embedding(query)

    # Find semantically similar past conversations
    results = database.execute(<<-SQL)
      SELECT session_id, message, 
             1 - (embedding <=> '#{embedding}') as similarity
      FROM conversation_messages
      WHERE embedding <=> '#{embedding}' < 0.3
      ORDER BY embedding <=> '#{embedding}'
      LIMIT #{limit}
    SQL

    # Use similar conversations as context
    results.map do |row|
      {
        session_id: row['session_id'],
        message: row['message'],
        similarity: row['similarity'],
        learned_context: extract_key_facts(row['message'])
      }
    end
  end
end

# Now your AI can say: "I see you had a similar issue last month..."
memory_manager = RAAF::Memory::MemoryManager.new(
  store: IntelligentVectorStore.new,
  max_tokens: 8000,

  # Hybrid approach: recent + relevant
  retrieval_strategy: :hybrid,
  recent_messages: 10,
  relevant_messages: 5
)

Vector store superpowers:

  • Find all conversations about "refunds" even if they never used that word
  • Identify patterns across thousands of support tickets
  • Build knowledge from past conversations
  • "This customer asked about X before, here's what worked"

The hidden costs:

  • Embedding generation: ~$0.0001 per message (adds up fast)
  • Database requirements: PostgreSQL with pgvector or similar
  • Query complexity: Vector math isn't intuitive
  • Index maintenance: Rebuilding can take hours

5.5.2. Choosing Your Memory Store: A Decision Framework

# Here's how we help clients choose
def recommend_memory_store(requirements)
  case
  when requirements[:conversations_per_day] < 100 && 
       requirements[:persistence_needed] == false
    :in_memory

  when requirements[:compliance_requirements].any? ||
       requirements[:audit_trail_needed]
    :file_store  # Easier to audit and backup

  when requirements[:conversations_per_day] > 10_000 ||
       requirements[:semantic_search_needed]
    :vector_store

  else
    :file_store  # Safe default
  end
end

5.6. Memory Strategies: The Art of Forgetting Intelligently

Context window management requires strategic decisions about what information to preserve and what to discard.

In long conversations, important details can be lost when context windows overflow. This can lead to inconsistent or inappropriate responses based on incomplete information.

Effective memory management requires strategic selection of what information to retain and what to discard.

5.6.1. Memory Management Strategies

1. Sliding Window: The Goldfish Memory

# The naive approach we all start with
memory_manager = RAAF::Memory::MemoryManager.new(
  store: store,
  max_tokens: 4000,
  pruning_strategy: :sliding_window,

  # Keep at least this many recent messages
  min_messages_to_keep: 4,

  # Always preserve system message
  preserve_messages: [:system, :first_user]
)

# What actually happens in memory:
# [System] You are a helpful assistant...
# [User] Help me plan a wedding (removed)
# [AI] I'd be happy to help! (removed)
# [User] Budget is $50k (removed)
# [AI] Great! Let's start... (removed)
# [User] Actually make it $30k  <- Kept
# [AI] No problem, adjusting... <- Kept
# [User] What about flowers?     <- Kept
# [AI] For a $30k budget...      <- Kept

When it works brilliantly:

  • Customer service ("Reset my password")
  • Simple Q&A ("What's your return policy?")
  • Transactional interactions
  • Demo environments

When it fails spectacularly:

  • Complex troubleshooting (forgets the original problem)
  • Multi-step processes (loses critical early decisions)
  • Relationship building (forgets personal details)

The $50,000 lesson: A sales bot using sliding window forgot the customer mentioned their $50k budget early in the conversation. It kept pushing enterprise plans. Customer got frustrated and left.

2. Summarization: The Meeting Minutes

# The sophisticated approach that actually works
class ProductionSummarizer
  def initialize
    @memory_manager = RAAF::Memory::MemoryManager.new(
      store: store,
      max_tokens: 4000,
      pruning_strategy: :summarization,

      # Summarize when we hit 80% capacity
      summarization_threshold: 0.8,

      # Use fast, cheap model for summaries
      summary_model: 'gpt-4o-mini',

      # But use smart prompts
      summary_prompt: <<~PROMPT
        Summarize this conversation, preserving:

        1. Key decisions made
        2. Important numbers (budgets, dates, quantities)
        3. User preferences and constraints
        4. Current task status
        5. Unresolved questions

        Format: Brief bullet points
        Max length: 500 tokens
      PROMPT
    )
  end

  def smart_summarize(messages)
    # Group messages into logical chunks
    chunks = group_by_topic(messages)

    # Summarize each chunk separately
    summaries = chunks.map do |chunk|
      {
        topic: chunk[:topic],
        summary: summarize_chunk(chunk[:messages]),
        key_facts: extract_facts(chunk[:messages])
      }
    end

    # Combine into structured summary
    {
      conversation_summary: combine_summaries(summaries),
      key_facts: summaries.flat_map { |s| s[:key_facts] },
      open_topics: extract_open_questions(messages)
    }
  end
end

Real success story: A legal assistant bot helping with contract review. 8-hour conversation, 200+ pages discussed. The summarization strategy preserved:

  • Every clause the user flagged as concerning
  • All agreed changes
  • Reasoning for each decision
  • Outstanding questions for legal counsel

The partner was amazed the bot remembered a subtle point from hour 2 that became critical in hour 7.

The gotchas nobody mentions:

  • Summaries can hallucinate ("User agreed to X" when they didn't)
  • Critical details get lost ("around $10k" becomes "$10k" exactly)
  • Emotional context disappears (frustrated user seems neutral)
  • Costs add up (summarizing every N messages)

3. Semantic Similarity: The Mind Reader

# The approach that feels like magic when it works
class SemanticMemoryManager
  def initialize
    @memory = RAAF::Memory::MemoryManager.new(
      store: RAAF::Memory::VectorStore.new,
      max_tokens: 4000,
      pruning_strategy: :semantic_similarity
    )

    # Track topic evolution
    @topic_embeddings = []
  end

  def select_relevant_context(current_query)
    # Get embedding for current query
    query_embedding = embed(current_query)

    # Find messages similar to current topic
    relevant_messages = @memory.search(
      embedding: query_embedding,
      threshold: 0.7,
      limit: 20
    )

    # But also keep temporal context
    recent_messages = @memory.recent(5)

    # And critical markers
    important_messages = @memory.tagged(:important)

    # Intelligently merge
    merge_contexts(
      relevant_messages,
      recent_messages,
      important_messages
    )
  end

  def tag_important_messages(message)
    # Auto-tag messages with key information
    tags = []
    tags << :budget if message =~ /\$\d+|budget|cost/i
    tags << :deadline if message =~ /by|before|deadline|due/i
    tags << :decision if message =~ /decided|agreed|confirmed/i
    tags << :problem if message =~ /issue|problem|broken|error/i

    @memory.tag_message(message, tags) if tags.any?
  end
end

Where semantic similarity shines:

  • Technical support ("Remember when we discussed the API error?")
  • Research assistance (pulls relevant info from earlier)
  • Creative projects (maintains thematic consistency)
  • Medical consultations (surfaces related symptoms)

The unexpected failure mode: A customer support bot kept pulling irrelevant messages because "refund" and "fund" were semantically similar. Customer asked about mutual funds, bot kept offering refunds.

4. Hybrid Strategy: The Production Reality

# What we actually use in production
class HybridMemoryStrategy
  def initialize
    @strategies = {
      recent: SlidingWindow.new(last_n: 10),
      important: ImportanceScorer.new(threshold: 0.8),
      semantic: SemanticMatcher.new(top_k: 5),
      summary: ChunkSummarizer.new(chunk_size: 50)
    }
  end

  def build_context(current_input, conversation_history)
    context_budget = 4000  # tokens
    selected_messages = []

    # Layer 1: Always include system prompt (200 tokens)
    selected_messages << conversation_history[:system]
    context_budget -= 200

    # Layer 2: Recent messages (up to 1000 tokens)
    recent = @strategies[:recent].select(conversation_history)
    recent_tokens = count_tokens(recent)
    if recent_tokens <= 1000
      selected_messages.concat(recent)
      context_budget -= recent_tokens
    else
      # Summarize older recent messages
      summary = @strategies[:summary].summarize(recent[0...-5])
      selected_messages << summary
      selected_messages.concat(recent[-5..-1])
      context_budget -= count_tokens(summary) + count_tokens(recent[-5..-1])
    end

    # Layer 3: Semantically relevant (up to 1500 tokens)
    relevant = @strategies[:semantic].find_relevant(
      current_input, 
      conversation_history
    )
    relevant_tokens = count_tokens(relevant)
    if relevant_tokens <= context_budget - 500  # Leave room for response
      selected_messages.concat(relevant)
    else
      # Take most relevant that fit
      relevant.each do |msg|
        msg_tokens = count_tokens(msg)
        if context_budget - msg_tokens >= 500
          selected_messages << msg
          context_budget -= msg_tokens
        end
      end
    end

    # Layer 4: Fill remaining space with important messages
    important = @strategies[:important].rank(conversation_history)
    important.each do |msg|
      msg_tokens = count_tokens(msg)
      if context_budget - msg_tokens >= 500
        selected_messages << msg
        context_budget -= msg_tokens
      end
    end

    # Return organized context
    organize_chronologically(selected_messages)
  end
end

Why hybrid wins: No single strategy handles all cases. Our production system:

  • Uses sliding window for immediate context
  • Applies semantic search for relevant history
  • Summarizes verbose sections
  • Preserves critical decisions
  • Adapts based on conversation type

5.7. Memory Operations

# Access conversation history
memory = runner.memory_manager.get_memory(session_id: 'user_123')
puts "Messages: #{memory.messages.count}"
puts "Total tokens: #{memory.token_count}"

# Clear memory
runner.memory_manager.clear_memory(session_id: 'user_123')

# Add custom memory
runner.memory_manager.add_memory(
  session_id: 'user_123',
  role: 'system',
  content: 'User prefers technical explanations'
)

6. Structured Outputs

RAAF provides universal structured output support that works across all AI providers.

6.1. Why Structured Outputs Changed Our Error Rate from 15% to 0.1%

Here's a conversation from our pre-structured era:

User: "What's the weather?"
AI: "It's currently 72 degrees Fahrenheit in San Francisco with partly 
cloudy skies and a gentle breeze from the west at about 10 mph."
Our parsing code: *crashes trying to extract temperature*

We were using regex to parse natural language. It worked... sometimes. Other times the AI would say "seventy-two degrees" or "72°F" or "quite warm, around 72" and our system would fail.

Structured outputs guarantee that the AI returns data in exactly the format you need. No parsing. No regex. No guessing.

6.2. What Are Structured Outputs?

Instead of hoping the AI formats its response correctly, you define a schema and the AI must follow it:

Without Structure: "The temperature is 72°F" With Structure: {"temperature": 72, "unit": "fahrenheit"}

It's like the difference between asking someone to "write down the time" (could be "3:30", "half past three", "15:30") versus filling out a form with specific fields for hours and minutes.

6.3. When Structured Outputs Save the Day

  1. Data Integration: When AI output feeds into other systems
  2. Multi-Agent Workflows: When agents need to pass structured data
  3. UI Rendering: When you're displaying AI data in specific formats
  4. Validation: When you need guaranteed data types and formats

6.4. Basic Structured Output

6.4.1. The $50,000 Data Format Disaster

A fintech company was building an AI system to process loan applications. Their AI returned loan decisions as natural language: "We approve this loan for $50,000 at 4.2% interest."

Their backend system couldn't parse it. Every response was different:

  • "Approved for $50K at 4.2%"
  • "Loan amount: fifty thousand dollars, rate: 4.2%"
  • "Yes, approve. Amount: $50,000. Interest: 4.2%"

The integration team spent 3 months building a parser. It failed 30% of the time. They lost deals because approvals took days instead of seconds.

The fix? Structured outputs.

6.4.2. Your First Structured Output

require 'raaf'

# Define response schema
WeatherSchema = {
  type: 'object',
  properties: {
    location: { type: 'string' },
    temperature: { type: 'number' },
    conditions: { type: 'string' },
    humidity: { type: 'number' },
    wind_speed: { type: 'number' }
  },
  required: ['location', 'temperature', 'conditions']
}

agent = RAAF::Agent.new(
  name: "WeatherBot",
  instructions: "Provide weather information in the specified format",
  model: "gpt-4o",
  response_format: WeatherSchema
)

runner = RAAF::Runner.new(agent: agent)
result = runner.run("What's the weather in San Francisco?")

# Result will be structured JSON matching the schema
weather_data = JSON.parse(result.messages.last[:content])
puts "Temperature: #{weather_data['temperature']}°F"

6.4.3. Common Data Types and Patterns

String Types with Validation

PersonSchema = {
  type: 'object',
  properties: {
    name: { 
      type: 'string',
      minLength: 1,
      maxLength: 100
    },
    email: { 
      type: 'string',
      format: 'email'
    },
    phone: {
      type: 'string',
      pattern: '^\\+?[1-9]\\d{1,14}$'  # International phone format
    },
    role: {
      type: 'string',
      enum: ['admin', 'user', 'guest']
    }
  },
  required: ['name', 'email']
}

Numeric Types with Constraints

PricingSchema = {
  type: 'object',
  properties: {
    base_price: {
      type: 'number',
      minimum: 0,
      maximum: 1000000
    },
    discount_percent: {
      type: 'number',
      minimum: 0,
      maximum: 100
    },
    quantity: {
      type: 'integer',
      minimum: 1,
      maximum: 10000
    },
    is_taxable: {
      type: 'boolean'
    }
  },
  required: ['base_price', 'quantity']
}

Array Handling

ShoppingCartSchema = {
  type: 'object',
  properties: {
    items: {
      type: 'array',
      minItems: 1,
      maxItems: 50,
      items: {
        type: 'object',
        properties: {
          product_id: { type: 'string' },
          quantity: { type: 'integer', minimum: 1 }
        },
        required: ['product_id', 'quantity']
      }
    },
    total_items: {
      type: 'integer',
      minimum: 1
    }
  },
  required: ['items', 'total_items']
}

6.4.4. Real-World Schema Examples

Financial Transaction Schema

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: undefined local variable or method 'runner' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-anite9.rb:490:in '<main>'

TransactionSchema = {
  type: 'object',
  properties: {
    transaction_id: { type: 'string' },
    amount: {
      type: 'number',
      minimum: 0.01,
      maximum: 1000000
    },
    currency: {
      type: 'string',
      enum: ['USD', 'EUR', 'GBP', 'JPY']
    },
    transaction_type: {
      type: 'string',
      enum: ['credit', 'debit', 'transfer']
    },
    merchant: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        category: { type: 'string' },
        location: { type: 'string' }
      },
      required: ['name', 'category']
    },
    timestamp: {
      type: 'string',
      format: 'date-time'
    },
    status: {
      type: 'string',
      enum: ['pending', 'completed', 'failed', 'cancelled']
    }
  },
  required: ['transaction_id', 'amount', 'currency', 'transaction_type', 'timestamp', 'status']
}

# Usage
agent = RAAF::Agent.new(
  name: "TransactionProcessor",
  instructions: "Process financial transactions and extract structured data",
  model: "gpt-4o",
  response_format: TransactionSchema
)

result = runner.run("Process this transaction: $250.00 payment to Amazon on 2024-01-15")
transaction = JSON.parse(result.messages.last[:content])

Customer Support Ticket Schema

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: undefined local variable or method 'runner' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-u3i0p5.rb:500:in '<main>'

SupportTicketSchema = {
  type: 'object',
  properties: {
    ticket_id: { type: 'string' },
    customer_info: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        email: { type: 'string', format: 'email' },
        tier: { type: 'string', enum: ['free', 'premium', 'enterprise'] }
      },
      required: ['name', 'email', 'tier']
    },
    issue: {
      type: 'object',
      properties: {
        category: {
          type: 'string',
          enum: ['billing', 'technical', 'account', 'feature_request']
        },
        priority: {
          type: 'string',
          enum: ['low', 'medium', 'high', 'critical']
        },
        description: { type: 'string' },
        affected_features: {
          type: 'array',
          items: { type: 'string' }
        }
      },
      required: ['category', 'priority', 'description']
    },
    resolution: {
      type: 'object',
      properties: {
        status: {
          type: 'string',
          enum: ['open', 'in_progress', 'resolved', 'closed']
        },
        estimated_resolution_time: { type: 'string' },
        assigned_agent: { type: 'string' }
      },
      required: ['status']
    }
  },
  required: ['ticket_id', 'customer_info', 'issue', 'resolution']
}

# Usage for ticket classification
agent = RAAF::Agent.new(
  name: "TicketClassifier",
  instructions: "Classify and structure customer support tickets",
  model: "gpt-4o",
  response_format: SupportTicketSchema
)

result = runner.run("Customer [email protected] (premium) says: 'My dashboard won't load and I can't see my analytics. This is urgent for tomorrow's board meeting.'")
ticket = JSON.parse(result.messages.last[:content])

6.4.5. Error Handling for Structured Outputs

class StructuredOutputHandler
  def self.safe_parse(agent_response, schema)
    begin
      # Parse the JSON response
      parsed_data = JSON.parse(agent_response)

      # Validate against schema
      validation_errors = validate_schema(parsed_data, schema)

      if validation_errors.any?
        {
          success: false,
          errors: validation_errors,
          raw_response: agent_response
        }
      else
        {
          success: true,
          data: parsed_data
        }
      end

    rescue JSON::ParserError => e
      {
        success: false,
        errors: ["Invalid JSON format: #{e.message}"],
        raw_response: agent_response
      }
    end
  end

  private

  def self.validate_schema(data, schema)
    errors = []

    # Check required fields
    schema[:required]&.each do |field|
      unless data.key?(field.to_s)
        errors << "Missing required field: #{field}"
      end
    end

    # Check data types
    schema[:properties]&.each do |field, constraints|
      next unless data.key?(field.to_s)

      value = data[field.to_s]
      expected_type = constraints[:type]

      unless validate_type(value, expected_type)
        errors << "Field '#{field}' should be #{expected_type}, got #{value.class}"
      end
    end

    errors
  end

  def self.validate_type(value, expected_type)
    case expected_type
    when 'string'
      value.is_a?(String)
    when 'number'
      value.is_a?(Numeric)
    when 'integer'
      value.is_a?(Integer)
    when 'boolean'
      value.is_a?(TrueClass) || value.is_a?(FalseClass)
    when 'array'
      value.is_a?(Array)
    when 'object'
      value.is_a?(Hash)
    else
      false
    end
  end
end

# Usage
result = runner.run("Extract customer data from this text...")
response = result.messages.last[:content]

parsed_result = StructuredOutputHandler.safe_parse(response, CustomerSchema)

if parsed_result[:success]
  customer_data = parsed_result[:data]
  puts "Customer: #{customer_data['name']}"
else
  puts "Parsing failed: #{parsed_result[:errors].join(', ')}"
  # Handle fallback or retry logic
end

6.4.6. Performance Tips for Structured Outputs

# Tip 1: Use concise schemas for faster generation
CompactSchema = {
  type: 'object',
  properties: {
    status: { type: 'string' },
    result: { type: 'string' }
  },
  required: ['status']
}

# Tip 2: Cache schema validation
class SchemaCache
  @cache = {}

  def self.validate(data, schema)
    schema_key = schema.hash
    validator = @cache[schema_key] ||= build_validator(schema)
    validator.validate(data)
  end
end

# Tip 3: Use progressive schema complexity
BasicContactSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    email: { type: 'string' }
  },
  required: ['name', 'email']
}

# Extend only when needed
DetailedContactSchema = BasicContactSchema.merge({
  properties: BasicContactSchema[:properties].merge({
    phone: { type: 'string' },
    address: { type: 'object' },
    preferences: { type: 'array' }
  })
})

The key to successful structured outputs is starting simple and adding complexity only when your use case demands it. Every field in your schema should solve a real problem, not anticipate theoretical needs.

6.5. Complex Schemas

6.5.1. Complex Schema Patterns

An e-commerce platform needed to integrate with 12 different systems: payment processors, inventory management, shipping carriers, tax calculators, and fraud detection. Each system expected different data formats.

The Problem: Their AI agent was generating orders in natural language, then they had 12 different parsers trying to extract the right data for each system. Maintenance was complex and error-prone.

The Solution: One complex schema that captured all the needed data, with clear relationships and validation rules.

6.5.2. Building Complex Schemas: The Layer-by-Layer Approach

Layer 1: Basic Object Structure

# Start with the core entities
BasicOrderSchema = {
  type: 'object',
  properties: {
    order_id: { type: 'string' },
    customer: { type: 'object' },
    items: { type: 'array' },
    total: { type: 'number' }
  },
  required: ['order_id', 'customer', 'items', 'total']
}

Layer 2: Add Nested Object Details

# Define customer object structure
CustomerObject = {
  type: 'object',
  properties: {
    customer_id: { type: 'string' },
    name: { type: 'string' },
    email: { type: 'string', format: 'email' },
    phone: { type: 'string' },
    address: {
      type: 'object',
      properties: {
        street: { type: 'string' },
        city: { type: 'string' },
        state: { type: 'string' },
        postal_code: { type: 'string' },
        country: { type: 'string', enum: ['US', 'CA', 'GB', 'AU'] }
      },
      required: ['street', 'city', 'state', 'postal_code', 'country']
    },
    billing_address: {
      type: 'object',
      properties: {
        same_as_shipping: { type: 'boolean' },
        street: { type: 'string' },
        city: { type: 'string' },
        state: { type: 'string' },
        postal_code: { type: 'string' },
        country: { type: 'string', enum: ['US', 'CA', 'GB', 'AU'] }
      },
      required: ['same_as_shipping']
    }
  },
  required: ['customer_id', 'name', 'email', 'address']
}

Layer 3: Complex Array Items

# Define item structure with variants and metadata
ItemObject = {
  type: 'object',
  properties: {
    product_id: { type: 'string' },
    name: { type: 'string' },
    category: { type: 'string' },
    quantity: { type: 'integer', minimum: 1, maximum: 100 },
    unit_price: { type: 'number', minimum: 0 },
    discount: {
      type: 'object',
      properties: {
        type: { type: 'string', enum: ['percentage', 'fixed'] },
        value: { type: 'number', minimum: 0 },
        reason: { type: 'string' }
      }
    },
    variants: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          value: { type: 'string' }
        },
        required: ['name', 'value']
      }
    },
    metadata: {
      type: 'object',
      properties: {
        weight: { type: 'number' },
        dimensions: {
          type: 'object',
          properties: {
            length: { type: 'number' },
            width: { type: 'number' },
            height: { type: 'number' }
          }
        },
        sku: { type: 'string' },
        barcode: { type: 'string' }
      }
    }
  },
  required: ['product_id', 'name', 'quantity', 'unit_price']
}

Layer 4: Complete Complex Schema

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: uninitialized constant ItemObject /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-x69il2.rb:455:in '<main>'

# The full enterprise-grade order schema
CompleteOrderSchema = {
  type: 'object',
  properties: {
    order_id: { type: 'string' },
    order_number: { type: 'string' },
    customer: CustomerObject,
    items: {
      type: 'array',
      minItems: 1,
      maxItems: 50,
      items: ItemObject
    },
    pricing: {
      type: 'object',
      properties: {
        subtotal: { type: 'number', minimum: 0 },
        tax: { type: 'number', minimum: 0 },
        shipping: { type: 'number', minimum: 0 },
        discount: { type: 'number', minimum: 0 },
        total: { type: 'number', minimum: 0 }
      },
      required: ['subtotal', 'tax', 'shipping', 'total']
    },
    payment: {
      type: 'object',
      properties: {
        method: { type: 'string', enum: ['credit_card', 'debit_card', 'paypal', 'bank_transfer'] },
        status: { type: 'string', enum: ['pending', 'authorized', 'captured', 'failed'] },
        transaction_id: { type: 'string' },
        amount: { type: 'number', minimum: 0 }
      },
      required: ['method', 'status', 'amount']
    },
    fulfillment: {
      type: 'object',
      properties: {
        method: { type: 'string', enum: ['standard', 'express', 'overnight', 'pickup'] },
        carrier: { type: 'string' },
        tracking_number: { type: 'string' },
        estimated_delivery: { type: 'string', format: 'date' },
        status: { type: 'string', enum: ['pending', 'processing', 'shipped', 'delivered', 'cancelled'] }
      },
      required: ['method', 'status']
    },
    timestamps: {
      type: 'object',
      properties: {
        created_at: { type: 'string', format: 'date-time' },
        updated_at: { type: 'string', format: 'date-time' },
        confirmed_at: { type: 'string', format: 'date-time' }
      },
      required: ['created_at', 'updated_at']
    },
    metadata: {
      type: 'object',
      properties: {
        source: { type: 'string', enum: ['web', 'mobile', 'api', 'phone'] },
        referrer: { type: 'string' },
        session_id: { type: 'string' },
        user_agent: { type: 'string' },
        ip_address: { type: 'string' },
        notes: { type: 'string' }
      }
    }
  },
  required: ['order_id', 'customer', 'items', 'pricing', 'payment', 'fulfillment', 'timestamps']
}

# Usage
agent = RAAF::Agent.new(
  name: "OrderProcessor",
  instructions: "Process complex orders with full details for enterprise integration",
  model: "gpt-4o",
  response_format: CompleteOrderSchema
)

result = runner.run("Process this order: Customer John Smith wants 2 iPhone 15 Pro cases and 1 screen protector, shipping to 123 Main St, Boston, MA 02101")
order = JSON.parse(result.messages.last[:content])

6.5.3. Advanced Schema Patterns

Pattern 1: Conditional Properties

# Schema that changes based on other properties
ConditionalSchema = {
  type: 'object',
  properties: {
    notification_type: { type: 'string', enum: ['email', 'sms', 'push'] },
    recipient: { type: 'string' },
    content: { type: 'string' }
  },
  required: ['notification_type', 'recipient', 'content'],

  # Conditional validation
  if: { properties: { notification_type: { const: 'email' } } },
  then: {
    properties: {
      recipient: { type: 'string', format: 'email' },
      subject: { type: 'string' },
      html_content: { type: 'string' }
    },
    required: ['subject']
  },

  else: {
    if: { properties: { notification_type: { const: 'sms' } } },
    then: {
      properties: {
        recipient: { type: 'string', pattern: '^\\+?[1-9]\\d{1,14}$' },
        message: { type: 'string', maxLength: 160 }
      },
      required: ['message']
    }
  }
}

Pattern 2: Polymorphic Objects

# Schema that handles different object types
PolymorphicSchema = {
  type: 'object',
  properties: {
    events: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          event_type: { type: 'string', enum: ['user_action', 'system_event', 'error'] },
          timestamp: { type: 'string', format: 'date-time' },
          data: { type: 'object' }
        },
        required: ['event_type', 'timestamp', 'data'],

        # Different data structure based on event_type
        if: { properties: { event_type: { const: 'user_action' } } },
        then: {
          properties: {
            data: {
              type: 'object',
              properties: {
                user_id: { type: 'string' },
                action: { type: 'string' },
                page: { type: 'string' },
                session_id: { type: 'string' }
              },
              required: ['user_id', 'action']
            }
          }
        },

        else: {
          if: { properties: { event_type: { const: 'system_event' } } },
          then: {
            properties: {
              data: {
                type: 'object',
                properties: {
                  service: { type: 'string' },
                  operation: { type: 'string' },
                  status: { type: 'string' },
                  duration_ms: { type: 'number' }
                },
                required: ['service', 'operation', 'status']
              }
            }
          }
        }
      }
    }
  },
  required: ['events']
}

Pattern 3: Recursive Schemas

# Schema that references itself (like file system trees)
RecursiveSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    type: { type: 'string', enum: ['file', 'directory'] },
    size: { type: 'number' },
    children: {
      type: 'array',
      items: { '$ref': '#' }  # References the root schema
    }
  },
  required: ['name', 'type'],

  # Conditional requirements
  if: { properties: { type: { const: 'file' } } },
  then: { required: ['size'] },
  else: { required: ['children'] }
}

6.5.4. Schema Optimization Strategies

Strategy 1: Progressive Schema Building

class SchemaBuilder
  def self.build_progressive(base_schema, extensions = {})
    result = base_schema.deep_dup

    extensions.each do |key, extension|
      case key
      when :add_properties
        result[:properties].merge!(extension)
      when :add_required
        result[:required] ||= []
        result[:required].concat(extension)
      when :add_constraints
        extension.each do |prop, constraints|
          result[:properties][prop]&.merge!(constraints)
        end
      end
    end

    result
  end
end

# Usage
base_schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    email: { type: 'string' }
  },
  required: ['name']
}

# Add complexity progressively
enhanced_schema = SchemaBuilder.build_progressive(base_schema, {
  add_properties: {
    phone: { type: 'string' },
    address: { type: 'object' }
  },
  add_required: ['email'],
  add_constraints: {
    email: { format: 'email' },
    name: { minLength: 2, maxLength: 50 }
  }
})

Strategy 2: Schema Composition

# Reusable schema components
AddressSchema = {
  type: 'object',
  properties: {
    street: { type: 'string' },
    city: { type: 'string' },
    state: { type: 'string' },
    postal_code: { type: 'string' },
    country: { type: 'string' }
  },
  required: ['street', 'city', 'state', 'postal_code', 'country']
}

PersonSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    email: { type: 'string', format: 'email' },
    address: AddressSchema
  },
  required: ['name', 'email', 'address']
}

# Compose into larger schemas
CompanySchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    employees: {
      type: 'array',
      items: PersonSchema
    },
    headquarters: AddressSchema
  },
  required: ['name', 'employees', 'headquarters']
}

6.5.5. Managing Schema Complexity

Rule 1: Keep Related Data Together

# Bad: Scattered related fields
BadSchema = {
  properties: {
    first_name: { type: 'string' },
    last_name: { type: 'string' },
    email: { type: 'string' },
    phone: { type: 'string' },
    street: { type: 'string' },
    city: { type: 'string' }
  }
}

# Good: Grouped related fields
GoodSchema = {
  properties: {
    personal_info: {
      type: 'object',
      properties: {
        first_name: { type: 'string' },
        last_name: { type: 'string' }
      }
    },
    contact_info: {
      type: 'object',
      properties: {
        email: { type: 'string' },
        phone: { type: 'string' }
      }
    },
    address: {
      type: 'object',
      properties: {
        street: { type: 'string' },
        city: { type: 'string' }
      }
    }
  }
}

Rule 2: Use Clear Naming Conventions

# Clear, consistent naming
NamingSchema = {
  properties: {
    user_id: { type: 'string' },          # Not: id, userId, user_identifier
    created_at: { type: 'string' },       # Not: created, createdAt, creation_time
    is_active: { type: 'boolean' },       # Not: active, status, enabled
    email_address: { type: 'string' },    # Not: email, mail, e_mail
    phone_number: { type: 'string' },     # Not: phone, tel, telephone
    order_items: { type: 'array' },       # Not: items, products, line_items
    total_amount: { type: 'number' },     # Not: total, amount, price
    payment_method: { type: 'string' },   # Not: payment, method, pay_method
    shipping_address: { type: 'object' }, # Not: address, ship_to, destination
    billing_address: { type: 'object' }   # Not: bill_to, payment_address
  }
}

The key to complex schemas is building them incrementally, testing at each stage, and focusing on real business requirements rather than theoretical completeness.

6.6. Schema Validation

6.6.1. Schema Validation Importance

It was 2 AM on Black Friday. Our AI agent was processing thousands of orders per minute. Everything was working perfectly—until it wasn't.

An order came in with a negative quantity: { "quantity": -5 }. Our system processed it, charged the customer, and sent a refund request to the payment processor. The payment processor saw the negative amount and issued a credit instead of a refund.

Result? We accidentally gave a customer $500 instead of processing their $100 order.

Schema validation is essential for production AI systems to ensure data integrity and prevent errors.

6.6.2. Multi-Layer Validation Strategy

Layer 1: Provider-Level Validation

# Most providers now support native structured outputs
agent = RAAF::Agent.new(
  name: "DataExtractor",
  instructions: "Extract structured data",
  model: "gpt-4o",
  response_format: schema,
  strict: true  # Enable strict schema validation
)

runner = RAAF::Runner.new(agent: agent)

begin
  result = runner.run("Extract customer data from this text...")
  data = JSON.parse(result.messages.last[:content])

  # Data is guaranteed to match schema
  process_customer_data(data)

rescue RAAF::StructuredOutput::ValidationError => e
  puts "Schema validation failed: #{e.message}"
  puts "Validation errors: #{e.errors}"
end

Layer 2: Application-Level Validation

class SchemaValidator
  def self.validate(data, schema)
    errors = []

    # Check required fields
    missing_fields = check_required_fields(data, schema)
    errors.concat(missing_fields) if missing_fields.any?

    # Check data types
    type_errors = check_data_types(data, schema)
    errors.concat(type_errors) if type_errors.any?

    # Check constraints
    constraint_errors = check_constraints(data, schema)
    errors.concat(constraint_errors) if constraint_errors.any?

    # Check business rules
    business_errors = check_business_rules(data, schema)
    errors.concat(business_errors) if business_errors.any?

    {
      valid: errors.empty?,
      errors: errors,
      data: data
    }
  end

  private

  def self.check_required_fields(data, schema)
    errors = []
    required_fields = schema.dig(:required) || []

    required_fields.each do |field|
      unless data.key?(field.to_s) || data.key?(field.to_sym)
        errors << "Missing required field: #{field}"
      end
    end

    errors
  end

  def self.check_data_types(data, schema)
    errors = []
    properties = schema.dig(:properties) || {}

    properties.each do |field, constraints|
      next unless data.key?(field.to_s) || data.key?(field.to_sym)

      value = data[field.to_s] || data[field.to_sym]
      expected_type = constraints[:type]

      unless valid_type?(value, expected_type)
        errors << "Field '#{field}' should be #{expected_type}, got #{value.class}"
      end
    end

    errors
  end

  def self.check_constraints(data, schema)
    errors = []
    properties = schema.dig(:properties) || {}

    properties.each do |field, constraints|
      next unless data.key?(field.to_s) || data.key?(field.to_sym)

      value = data[field.to_s] || data[field.to_sym]

      # Check string constraints
      if constraints[:type] == 'string'
        errors.concat(validate_string_constraints(field, value, constraints))
      end

      # Check numeric constraints
      if constraints[:type] == 'number' || constraints[:type] == 'integer'
        errors.concat(validate_numeric_constraints(field, value, constraints))
      end

      # Check array constraints
      if constraints[:type] == 'array'
        errors.concat(validate_array_constraints(field, value, constraints))
      end

      # Check enum constraints
      if constraints[:enum]
        unless constraints[:enum].include?(value)
          errors << "Field '#{field}' must be one of #{constraints[:enum].join(', ')}, got '#{value}'"
        end
      end
    end

    errors
  end

  def self.validate_string_constraints(field, value, constraints)
    errors = []

    if constraints[:minLength] && value.length < constraints[:minLength]
      errors << "Field '#{field}' must be at least #{constraints[:minLength]} characters"
    end

    if constraints[:maxLength] && value.length > constraints[:maxLength]
      errors << "Field '#{field}' must be at most #{constraints[:maxLength]} characters"
    end

    if constraints[:pattern] && !value.match(Regexp.new(constraints[:pattern]))
      errors << "Field '#{field}' must match pattern #{constraints[:pattern]}"
    end

    if constraints[:format] == 'email' && !value.match(/\A[^@]+@[^@]+\.[^@]+\z/)
      errors << "Field '#{field}' must be a valid email address"
    end

    errors
  end

  def self.validate_numeric_constraints(field, value, constraints)
    errors = []

    if constraints[:minimum] && value < constraints[:minimum]
      errors << "Field '#{field}' must be at least #{constraints[:minimum]}"
    end

    if constraints[:maximum] && value > constraints[:maximum]
      errors << "Field '#{field}' must be at most #{constraints[:maximum]}"
    end

    if constraints[:type] == 'integer' && !value.is_a?(Integer)
      errors << "Field '#{field}' must be an integer"
    end

    errors
  end

  def self.validate_array_constraints(field, value, constraints)
    errors = []

    if constraints[:minItems] && value.length < constraints[:minItems]
      errors << "Field '#{field}' must have at least #{constraints[:minItems]} items"
    end

    if constraints[:maxItems] && value.length > constraints[:maxItems]
      errors << "Field '#{field}' must have at most #{constraints[:maxItems]} items"
    end

    # Validate array items
    if constraints[:items]
      value.each_with_index do |item, index|
        item_errors = validate(item, constraints[:items])
        if item_errors[:errors].any?
          errors << "Item #{index} in field '#{field}': #{item_errors[:errors].join(', ')}"
        end
      end
    end

    errors
  end

  def self.check_business_rules(data, schema)
    errors = []

    # Custom business rule validation
    if schema[:business_rules]
      schema[:business_rules].each do |rule|
        unless rule[:validator].call(data)
          errors << rule[:message]
        end
      end
    end

    errors
  end

  def self.valid_type?(value, expected_type)
    case expected_type
    when 'string'
      value.is_a?(String)
    when 'number'
      value.is_a?(Numeric)
    when 'integer'
      value.is_a?(Integer)
    when 'boolean'
      value.is_a?(TrueClass) || value.is_a?(FalseClass)
    when 'array'
      value.is_a?(Array)
    when 'object'
      value.is_a?(Hash)
    when 'null'
      value.nil?
    else
      false
    end
  end
end

Layer 3: Business Rule Validation

# Schema with business rules
OrderSchema = {
  type: 'object',
  properties: {
    customer_id: { type: 'string' },
    items: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          product_id: { type: 'string' },
          quantity: { type: 'integer', minimum: 1 },
          price: { type: 'number', minimum: 0 }
        }
      }
    },
    total: { type: 'number', minimum: 0 }
  },
  required: ['customer_id', 'items', 'total'],

  # Custom business rules
  business_rules: [
    {
      validator: ->(data) {
        # Total must equal sum of item prices
        calculated_total = data['items'].sum { |item| item['price'] * item['quantity'] }
        (data['total'] - calculated_total).abs < 0.01
      },
      message: "Total must equal sum of item prices"
    },
    {
      validator: ->(data) {
        # Maximum 20 items per order
        data['items'].length <= 20
      },
      message: "Orders cannot exceed 20 items"
    },
    {
      validator: ->(data) {
        # No duplicate products
        product_ids = data['items'].map { |item| item['product_id'] }
        product_ids.uniq.length == product_ids.length
      },
      message: "Order cannot contain duplicate products"
    }
  ]
}

# Usage
result = runner.run("Process this order...")
response = result.messages.last[:content]
data = JSON.parse(response)

validation_result = SchemaValidator.validate(data, OrderSchema)

if validation_result[:valid]
  process_order(data)
else
  puts "Validation failed:"
  validation_result[:errors].each { |error| puts "  - #{error}" }
end

6.6.3. Real-Time Validation Patterns

Pattern 1: Streaming Validation

class StreamingValidator
  def initialize(schema)
    @schema = schema
    @partial_data = {}
    @errors = []
  end

  def validate_chunk(chunk)
    @partial_data.merge!(chunk)

    # Validate what we have so far
    current_errors = validate_partial(@partial_data, @schema)

    {
      valid_so_far: current_errors.empty?,
      errors: current_errors,
      can_continue: can_continue_parsing?(current_errors)
    }
  end

  def finalize
    # Final validation of complete data
    SchemaValidator.validate(@partial_data, @schema)
  end

  private

  def validate_partial(data, schema)
    # Only validate fields that are present
    # Don't fail on missing required fields yet
    errors = []

    schema[:properties]&.each do |field, constraints|
      next unless data.key?(field.to_s)

      value = data[field.to_s]

      # Validate type
      unless SchemaValidator.valid_type?(value, constraints[:type])
        errors << "Field '#{field}' has invalid type"
      end

      # Validate constraints
      if constraints[:type] == 'string'
        errors.concat(SchemaValidator.validate_string_constraints(field, value, constraints))
      end
    end

    errors
  end

  def can_continue_parsing?(errors)
    # Don't continue if we have fundamental type errors
    !errors.any? { |error| error.include?('invalid type') }
  end
end

Pattern 2: Conditional Validation

class ConditionalValidator
  def self.validate(data, schema)
    errors = []

    # Base validation
    base_errors = SchemaValidator.validate(data, schema)
    errors.concat(base_errors[:errors])

    # Conditional validation
    if schema[:conditional_rules]
      schema[:conditional_rules].each do |rule|
        if rule[:condition].call(data)
          conditional_errors = validate_conditional_schema(data, rule[:then_schema])
          errors.concat(conditional_errors)
        elsif rule[:else_schema]
          conditional_errors = validate_conditional_schema(data, rule[:else_schema])
          errors.concat(conditional_errors)
        end
      end
    end

    {
      valid: errors.empty?,
      errors: errors,
      data: data
    }
  end

  private

  def self.validate_conditional_schema(data, schema)
    result = SchemaValidator.validate(data, schema)
    result[:errors]
  end
end

# Usage
PaymentSchema = {
  type: 'object',
  properties: {
    payment_method: { type: 'string', enum: ['credit_card', 'paypal', 'bank_transfer'] },
    amount: { type: 'number', minimum: 0 }
  },
  required: ['payment_method', 'amount'],

  conditional_rules: [
    {
      condition: ->(data) { data['payment_method'] == 'credit_card' },
      then_schema: {
        type: 'object',
        properties: {
          card_number: { type: 'string', pattern: '^[0-9]{16}$' },
          expiry: { type: 'string', pattern: '^[0-9]{2}/[0-9]{2}$' },
          cvv: { type: 'string', pattern: '^[0-9]{3,4}$' }
        },
        required: ['card_number', 'expiry', 'cvv']
      }
    },
    {
      condition: ->(data) { data['payment_method'] == 'paypal' },
      then_schema: {
        type: 'object',
        properties: {
          paypal_email: { type: 'string', format: 'email' }
        },
        required: ['paypal_email']
      }
    }
  ]
}

6.6.4. Performance-Optimized Validation

class CachedValidator
  def initialize
    @validation_cache = {}
    @schema_cache = {}
  end

  def validate(data, schema)
    # Cache compiled schemas
    schema_key = schema.hash
    compiled_schema = @schema_cache[schema_key] ||= compile_schema(schema)

    # Cache validation results for identical data
    data_key = data.hash
    cache_key = "#{schema_key}_#{data_key}"

    @validation_cache[cache_key] ||= perform_validation(data, compiled_schema)
  end

  private

  def compile_schema(schema)
    # Pre-compile regex patterns, enum sets, etc.
    compiled = schema.deep_dup

    compiled[:properties]&.each do |field, constraints|
      if constraints[:pattern]
        constraints[:compiled_pattern] = Regexp.new(constraints[:pattern])
      end

      if constraints[:enum]
        constraints[:enum_set] = Set.new(constraints[:enum])
      end
    end

    compiled
  end

  def perform_validation(data, compiled_schema)
    # Use compiled patterns for faster validation
    errors = []

    compiled_schema[:properties]&.each do |field, constraints|
      next unless data.key?(field.to_s)

      value = data[field.to_s]

      # Use compiled regex
      if constraints[:compiled_pattern] && !value.match(constraints[:compiled_pattern])
        errors << "Field '#{field}' must match pattern"
      end

      # Use compiled enum set
      if constraints[:enum_set] && !constraints[:enum_set].include?(value)
        errors << "Field '#{field}' must be one of allowed values"
      end
    end

    {
      valid: errors.empty?,
      errors: errors,
      data: data
    }
  end
end

6.6.5. Validation Error Recovery

class ValidationErrorRecovery
  def self.attempt_recovery(data, schema, errors)
    recovery_attempts = []

    errors.each do |error|
      case error
      when /Missing required field: (.+)/
        field = $1
        recovery_attempts << attempt_infer_missing_field(data, field, schema)

      when /Field '(.+)' should be (.+), got (.+)/
        field, expected_type, actual_type = $1, $2, $3
        recovery_attempts << attempt_type_conversion(data, field, expected_type)

      when /Field '(.+)' must be at least (\d+) characters/
        field, min_length = $1, $2.to_i
        recovery_attempts << attempt_pad_string(data, field, min_length)

      when /Field '(.+)' must be one of (.+), got '(.+)'/
        field, allowed_values, actual_value = $1, $2, $3
        recovery_attempts << attempt_enum_correction(data, field, allowed_values, actual_value)
      end
    end

    # Apply successful recovery attempts
    recovered_data = data.dup
    recovery_attempts.compact.each do |recovery|
      recovered_data[recovery[:field]] = recovery[:value]
    end

    {
      recovered_data: recovered_data,
      recovery_applied: recovery_attempts.any?(&:present?)
    }
  end

  private

  def self.attempt_infer_missing_field(data, field, schema)
    # Try to infer missing field from context
    case field
    when 'created_at'
      { field: field, value: Time.now.iso8601 }
    when 'status'
      { field: field, value: 'pending' }
    when 'id'
      { field: field, value: SecureRandom.uuid }
    else
      nil
    end
  end

  def self.attempt_type_conversion(data, field, expected_type)
    current_value = data[field]

    case expected_type
    when 'string'
      { field: field, value: current_value.to_s }
    when 'number'
      { field: field, value: current_value.to_f }
    when 'integer'
      { field: field, value: current_value.to_i }
    when 'boolean'
      { field: field, value: !!current_value }
    else
      nil
    end
  end

  def self.attempt_pad_string(data, field, min_length)
    current_value = data[field]
    if current_value.length < min_length
      { field: field, value: current_value.ljust(min_length, '0') }
    else
      nil
    end
  end

  def self.attempt_enum_correction(data, field, allowed_values, actual_value)
    allowed_array = allowed_values.split(', ')

    # Try fuzzy matching
    best_match = allowed_array.min_by do |allowed|
      levenshtein_distance(actual_value.downcase, allowed.downcase)
    end

    if levenshtein_distance(actual_value.downcase, best_match.downcase) <= 2
      { field: field, value: best_match }
    else
      nil
    end
  end

  def self.levenshtein_distance(s1, s2)
    # Simple Levenshtein distance implementation
    return s2.length if s1.empty?
    return s1.length if s2.empty?

    costs = Array.new(s2.length + 1) { |i| i }

    (1..s1.length).each do |i|
      costs[0] = i
      nw = i - 1

      (1..s2.length).each do |j|
        costs[j], nw = [
          costs[j] + 1,
          costs[j - 1] + 1,
          nw + (s1[i - 1] == s2[j - 1] ? 0 : 1)
        ].min, costs[j]
      end
    end

    costs[s2.length]
  end
end

The key to effective schema validation is layering multiple validation strategies: provider-level for structure, application-level for business rules, and recovery mechanisms for graceful error handling.

6.7. Provider-Specific Adaptations

RAAF automatically adapts schemas for different providers:

# Same schema works with different providers
schema = { type: 'object', properties: { name: { type: 'string' } } }

# OpenAI - uses native structured outputs
openai_agent = RAAF::Agent.new(
  model: "gpt-4o",
  response_format: schema
)

# Anthropic - uses guided generation
claude_agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022", 
  response_format: schema
)

# Groq - uses constrained generation
groq_agent = RAAF::Agent.new(
  model: "mixtral-8x7b-32768",
  response_format: schema
)

7. Context Management

Context is the connective tissue of AI conversations. Without proper context management, your AI agent suffers from digital amnesia—forgetting everything between interactions, losing track of user preferences, and restarting complex workflows from scratch.

Think of context as the agent's working memory. Just as a skilled consultant remembers your company's specific requirements, industry context, and past discussions, an AI agent needs context to provide personalized, coherent interactions. The difference between a forgettable chatbot and a truly useful AI assistant is how well it manages and leverages context.

Context management in RAAF addresses three critical challenges:

  1. State Persistence: Maintaining information across conversation turns
  2. Scope Control: Determining what information should be shared between agents
  3. Performance Impact: Balancing context richness with token efficiency

7.1. Why Context Management Matters

Consider a customer service scenario without context management:

# Without context - Every interaction starts fresh
user: "I need help with my order"
agent: "Sure! Can you provide your order number?"
user: "It's ORD-12345"
agent: "Thank you. What's your issue?"
user: "I need to change the shipping address"
agent: "What's your new address?"
user: "123 Main St, New York"
agent: "Done! Is there anything else?"
user: "Actually, can you check if there are any other orders?"
agent: "I'd be happy to help! Can you provide your order number?"

The agent forgot everything about the customer, even though they just discussed order ORD-12345. This creates a frustrating experience.

With proper context management:

# With context - Information persists and accumulates
user: "I need help with my order"
agent: "I'd be happy to help! Can you provide your order number?"
user: "It's ORD-12345"
agent: "Thank you. I can see this is for John Smith, shipping to 456 Oak Ave. What's your issue?"
user: "I need to change the shipping address"
agent: "What's your new address?"
user: "123 Main St, New York"
agent: "I've updated order ORD-12345 to ship to 123 Main St, New York. Is there anything else?"
user: "Actually, can you check if there are any other orders?"
agent: "Looking at your account, I see you have two other orders: ORD-12340 (delivered) and ORD-12342 (processing). Would you like details on either of these?"

The agent remembers the customer's identity, current order, and proactively provides relevant information. This is the power of context management.

7.2. Context Variables: The Foundation of Stateful AI

Context variables are key-value pairs that persist throughout a conversation and can be accessed by all tools and agents in the workflow. They bridge the gap between stateless AI models and stateful application requirements.

# Comprehensive context setup
runner = RAAF::Runner.new(
  agent: agent,
  context_variables: {
    # User identity and preferences
    user_id: 'user_123',
    user_name: 'John Smith',
    user_role: 'premium_customer',
    language: 'en',
    timezone: 'America/New_York',

    # Session information
    session_type: 'support',
    session_id: 'sess_abc123',
    channel: 'web_chat',

    # Business context
    company_id: 'comp_456',
    account_tier: 'enterprise',
    region: 'north_america',

    # Workflow state
    current_step: 'initial_contact',
    escalation_level: 0,
    previous_interactions: 3
  }
)

# Context variables are automatically injected into tools
def get_user_preferences(user_id: nil)
  # user_id is automatically provided from context
  preferences = UserPreferences.find_by(user_id: user_id)

  {
    theme: preferences.theme,
    notification_preferences: preferences.notifications,
    communication_style: preferences.style,
    preferred_channel: preferences.channel
  }
end

def personalize_response(content:, user_id: nil, language: nil, timezone: nil)
  # Multiple context variables injected automatically
  localized_content = I18n.with_locale(language) do
    content.gsub('{timezone}', timezone)
  end

  {
    content: localized_content,
    personalization_applied: true,
    user_id: user_id
  }
end

agent.add_tool(method(:get_user_preferences))
agent.add_tool(method(:personalize_response))

This comprehensive context setup enables sophisticated personalization and workflow management. The agent can access user preferences, maintain session state, and make decisions based on business context—all without requiring explicit parameters in each tool call.

7.3. Dynamic Context Updates: Context That Evolves

Context isn't static—it evolves as conversations progress. Dynamic context updates allow agents to learn and adapt based on new information discovered during the interaction.

def update_user_language(language:)
  # Update context during conversation
  runner.update_context(language: language)

  # Update user preferences persistently
  UserPreferences.find_by(user_id: context[:user_id]).update(language: language)

  {
    status: 'updated',
    new_language: language,
    message: "I've switched to #{language}. All future interactions will use this language."
  }
end

def escalate_support_level(reason:)
  current_level = runner.context[:escalation_level] || 0
  new_level = current_level + 1

  # Update context to reflect escalation
  runner.update_context(
    escalation_level: new_level,
    current_step: 'escalated_support',
    escalation_reason: reason,
    escalation_timestamp: Time.current
  )

  # Notify human agents if escalation reaches threshold
  if new_level >= 2
    HumanEscalationService.notify(
      user_id: runner.context[:user_id],
      session_id: runner.context[:session_id],
      reason: reason,
      context: runner.context
    )
  end

  {
    status: 'escalated',
    escalation_level: new_level,
    next_steps: new_level >= 2 ? 'human_agent_notified' : 'continued_ai_support'
  }
end

agent.add_tool(method(:update_user_language))
agent.add_tool(method(:escalate_support_level))

Dynamic context updates enable several powerful patterns:

  1. Learning and Adaptation: The agent learns user preferences and adjusts behavior
  2. Workflow Progression: Context tracks where the user is in a multi-step process
  3. Escalation Management: Context maintains escalation state and triggers appropriate actions
  4. Session State: Context preserves important session information across interactions

7.4. Context Inheritance: Sharing Knowledge Between Agents

In multi-agent workflows, context inheritance ensures that knowledge and state transfer seamlessly between specialized agents. This prevents information loss during agent handoffs and maintains conversation continuity.

# Define specialized agents with context awareness
research_agent = RAAF::Agent.new(
  name: "Researcher",
  instructions: "Research topics thoroughly using available sources. Access user preferences and previous research from context.",
  model: "gpt-4o"
)

writer_agent = RAAF::Agent.new(
  name: "Writer",
  instructions: "Write engaging content based on research. Use user preferences and context to personalize the writing style.",
  model: "gpt-4o"
)

editor_agent = RAAF::Agent.new(
  name: "Editor",
  instructions: "Edit and polish content. Ensure consistency with user preferences and brand guidelines from context.",
  model: "gpt-4o"
)

# Rich context that spans all agents
runner = RAAF::Runner.new(
  agent: research_agent,
  agents: [research_agent, writer_agent, editor_agent],
  context_variables: {
    # Content requirements
    topic: 'Ruby programming best practices',
    audience: 'intermediate developers',
    content_type: 'technical_blog_post',
    target_length: 2000,

    # User preferences
    writing_style: 'conversational',
    code_style: 'ruby_standard',
    examples_preferred: true,

    # Brand guidelines
    brand_voice: 'authoritative_but_approachable',
    company_standards: 'enterprise_focused',

    # Workflow state
    deadline: '2024-01-20',
    priority: 'high',
    stakeholder_approval_required: true,

    # Research constraints
    source_types: ['official_docs', 'community_posts', 'code_examples'],
    fact_checking_required: true,
    citation_style: 'informal'
  }
)

# Context-aware tools that leverage inherited information
def gather_research_sources(topic: nil, audience: nil, source_types: nil)
  # All parameters automatically injected from context
  sources = SourceFinder.find(
    topic: topic,
    audience_level: audience,
    types: source_types
  )

  {
    sources_found: sources.length,
    sources: sources,
    research_quality: assess_source_quality(sources),
    context_used: { topic: topic, audience: audience, types: source_types }
  }
end

def draft_content(topic: nil, audience: nil, writing_style: nil, examples_preferred: nil)
  # Context provides all necessary information
  content = ContentGenerator.generate(
    topic: topic,
    audience: audience,
    style: writing_style,
    include_examples: examples_preferred
  )

  {
    content: content,
    word_count: content.split.length,
    style_applied: writing_style,
    examples_included: examples_preferred
  }
end

research_agent.add_tool(method(:gather_research_sources))
writer_agent.add_tool(method(:draft_content))

Context inheritance enables sophisticated multi-agent workflows where each agent builds on the work of previous agents while maintaining access to all relevant information. The research agent discovers sources, the writer creates content, and the editor polishes—all with consistent access to user preferences and workflow state.

7.5. Context Scoping: Controlling Information Flow

Not all context should be shared everywhere. Context scoping allows you to control what information is available to which agents and tools, enabling both security and performance optimizations.

# Context with different scopes
runner = RAAF::Runner.new(
  agent: customer_service_agent,
  context_variables: {
    # Global context - available to all agents
    user_id: 'user_123',
    session_id: 'sess_abc',
    language: 'en',

    # Agent-specific context
    agent_contexts: {
      'CustomerService' => {
        escalation_level: 0,
        case_number: 'CS-2024-001',
        customer_tier: 'premium'
      },
      'TechnicalSupport' => {
        issue_category: 'api_integration',
        severity: 'high',
        sla_deadline: '2024-01-15T10:00:00Z'
      },
      'BillingSupport' => {
        account_balance: 450.00,
        payment_method: 'credit_card',
        billing_cycle: 'monthly'
      }
    },

    # Sensitive context - restricted access
    sensitive_context: {
      credit_card_last_four: '1234',
      account_pin: '7890',
      security_questions: ['mother_maiden_name', 'first_pet']
    }
  }
)

# Context-scoped tool that only accesses appropriate information
def check_billing_status(user_id: nil, account_balance: nil, payment_method: nil)
  # Only receives billing-related context
  {
    user_id: user_id,
    current_balance: account_balance,
    payment_method: payment_method,
    next_billing_date: calculate_next_billing(user_id),
    payment_status: check_payment_status(user_id)
  }
end

# Security-conscious tool with explicit sensitive data access
def verify_identity(user_id: nil, sensitive_context: nil)
  # Explicitly requires sensitive context
  security_check = IdentityVerifier.verify(
    user_id: user_id,
    provided_pin: sensitive_context[:account_pin],
    security_answers: sensitive_context[:security_questions]
  )

  {
    identity_verified: security_check.verified?,
    verification_method: security_check.method,
    risk_score: security_check.risk_score,
    # Never return sensitive data in tool response
    sensitive_data_accessed: true
  }
end

billing_agent.add_tool(method(:check_billing_status))
security_agent.add_tool(method(:verify_identity))

Context scoping provides several benefits:

  1. Security: Sensitive information is only available to authorized agents
  2. Performance: Agents don't receive unnecessary context that would consume tokens
  3. Clarity: Each agent only sees context relevant to its function
  4. Compliance: Audit trails can track which agents accessed sensitive information

7.6. Context Lifecycle Management

Context has a lifecycle that must be managed to prevent memory leaks, maintain performance, and ensure data freshness.

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NoMethodError: undefined method 'hour' for an instance of Integer /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-ifeltj.rb:459:in 'ContextManager#create_context' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-ifeltj.rb:565:in '<main>'

class ContextManager
  def initialize
    @context_store = {}
    @context_ttl = {}
    @context_access_log = {}
  end

  def create_context(session_id, initial_context = {})
    @context_store[session_id] = {
      created_at: Time.current,
      last_accessed: Time.current,
      data: initial_context,
      version: 1
    }

    @context_ttl[session_id] = Time.current + 1.hour
    @context_access_log[session_id] = []

    log_context_event(session_id, 'created', initial_context.keys)
  end

  def update_context(session_id, updates)
    return unless @context_store.key?(session_id)

    context = @context_store[session_id]
    old_version = context[:version]

    context[:data].merge!(updates)
    context[:last_accessed] = Time.current
    context[:version] += 1

    # Extend TTL on updates
    @context_ttl[session_id] = Time.current + 1.hour

    log_context_event(session_id, 'updated', updates.keys, old_version, context[:version])
  end

  def get_context(session_id, agent_name = nil)
    return {} unless @context_store.key?(session_id)

    # Check if context has expired
    if Time.current > @context_ttl[session_id]
      expire_context(session_id)
      return {}
    end

    context = @context_store[session_id]
    context[:last_accessed] = Time.current

    # Apply agent-specific scoping
    scoped_context = apply_context_scoping(context[:data], agent_name)

    log_context_access(session_id, agent_name, scoped_context.keys)

    scoped_context
  end

  def expire_context(session_id)
    return unless @context_store.key?(session_id)

    context = @context_store[session_id]

    # Archive context for analytics
    ContextArchive.create(
      session_id: session_id,
      created_at: context[:created_at],
      last_accessed: context[:last_accessed],
      version: context[:version],
      access_count: @context_access_log[session_id].length,
      data: context[:data]
    )

    @context_store.delete(session_id)
    @context_ttl.delete(session_id)
    @context_access_log.delete(session_id)

    log_context_event(session_id, 'expired')
  end

  def cleanup_expired_contexts
    expired_sessions = @context_ttl.select { |_, ttl| Time.current > ttl }.keys
    expired_sessions.each { |session_id| expire_context(session_id) }
  end

  private

  def apply_context_scoping(context, agent_name)
    # Apply agent-specific context scoping
    scoped_context = context.except(:sensitive_context)

    if agent_name && context[:agent_contexts]&.[](agent_name)
      scoped_context.merge!(context[:agent_contexts][agent_name])
    end

    scoped_context
  end

  def log_context_event(session_id, event_type, keys = [], old_version = nil, new_version = nil)
    ContextAuditLog.create(
      session_id: session_id,
      event_type: event_type,
      keys_affected: keys,
      old_version: old_version,
      new_version: new_version,
      timestamp: Time.current
    )
  end

  def log_context_access(session_id, agent_name, keys_accessed)
    @context_access_log[session_id] << {
      agent: agent_name,
      keys: keys_accessed,
      timestamp: Time.current
    }
  end
end

# Usage in production
context_manager = ContextManager.new

# Create context for new session
context_manager.create_context('sess_123', {
  user_id: 'user_456',
  language: 'en',
  session_type: 'support'
})

# Update context during conversation
context_manager.update_context('sess_123', {
  escalation_level: 1,
  issue_category: 'billing'
})

# Get context for specific agent
billing_context = context_manager.get_context('sess_123', 'BillingSupport')

Context lifecycle management ensures:

  1. Memory Efficiency: Expired contexts are automatically cleaned up
  2. Data Freshness: Context has configurable time-to-live
  3. Audit Compliance: All context access is logged for security audits
  4. Performance: Context scoping reduces token usage per agent

7.7. Context Persistence Strategies

Different applications require different context persistence strategies. RAAF supports multiple approaches to match your scalability and reliability requirements.

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: uninitialized constant Redis /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-97hls8.rb:449:in '<main>'

# In-memory context (development/testing)
memory_context = RAAF::Context::InMemoryStore.new

# Redis-based context (production)
redis_context = RAAF::Context::RedisStore.new(
  redis_client: Redis.new(url: ENV['REDIS_URL']),
  key_prefix: 'raaf:context:',
  ttl: 3600 # 1 hour
)

# Database-backed context (enterprise)
db_context = RAAF::Context::DatabaseStore.new(
  table_name: 'agent_contexts',
  connection: ActiveRecord::Base.connection
)

# Hybrid context (high-performance production)
hybrid_context = RAAF::Context::HybridStore.new(
  hot_store: redis_context,      # Frequent access
  cold_store: db_context,        # Long-term storage
  migration_threshold: 5.minutes  # Move to cold storage after 5 minutes
)

# Configure runner with appropriate context store
runner = RAAF::Runner.new(
  agent: agent,
  context_store: hybrid_context,
  context_variables: initial_context
)

Each persistence strategy has different trade-offs:

  • In-Memory: Fastest, no persistence, suitable for development
  • Redis: Fast, clusterable, good for production
  • Database: Durable, queryable, good for compliance
  • Hybrid: Best of both worlds, optimal for high-scale production

7.8. Context and Token Optimization

Context directly impacts token usage and costs. Smart context management can reduce token consumption by 30-50% while maintaining conversation quality.

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NoMethodError: undefined method 'map' for an instance of String /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-rb6gk0.rb:461:in 'TokenOptimizedContext#optimize_context' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-rb6gk0.rb:508:in '<main>'

# Token-optimized context management
class TokenOptimizedContext
  def initialize(max_tokens: 4000)
    @max_tokens = max_tokens
    @context_priorities = {
      user_id: 100,           # Always keep
      session_id: 100,        # Always keep
      current_step: 90,       # High priority
      escalation_level: 90,   # High priority
      user_preferences: 80,   # Medium-high priority
      previous_actions: 70,   # Medium priority
      debug_info: 10          # Low priority
    }
  end

  def optimize_context(context)
    # Calculate token usage for each context item
    context_with_tokens = context.map do |key, value|
      token_count = estimate_tokens(value)
      priority = @context_priorities[key] || 50

      {
        key: key,
        value: value,
        tokens: token_count,
        priority: priority,
        score: priority / token_count  # Priority per token
      }
    end

    # Sort by score and select items within token budget
    selected_items = []
    total_tokens = 0

    context_with_tokens.sort_by { |item| -item[:score] }.each do |item|
      if total_tokens + item[:tokens] <= @max_tokens
        selected_items << item
        total_tokens += item[:tokens]
      end
    end

    # Convert back to context hash
    optimized_context = selected_items.each_with_object({}) do |item, context|
      context[item[:key]] = item[:value]
    end

    {
      context: optimized_context,
      tokens_used: total_tokens,
      tokens_saved: estimate_tokens(context) - total_tokens,
      items_dropped: context.length - selected_items.length
    }
  end

  private

  def estimate_tokens(value)
    # Rough estimation: 1 token per 4 characters
    value.to_s.length / 4
  end
end

# Use token-optimized context
context_optimizer = TokenOptimizedContext.new(max_tokens: 3000)
result = context_optimizer.optimize_context(large_context)

runner = RAAF::Runner.new(
  agent: agent,
  context_variables: result[:context]
)

This optimization approach:

  1. Prioritizes Essential Information: User ID and session data are always preserved
  2. Maximizes Value: Selects context items with the highest priority-to-token ratio
  3. Stays Within Budget: Ensures context doesn't exceed token limits
  4. Provides Metrics: Shows how much was saved and what was dropped

Context management is the foundation of intelligent, stateful AI systems. By properly managing context variables, inheritance, scoping, and lifecycle, you create AI agents that feel less like chatbots and more like knowledgeable assistants who remember, learn, and adapt to user needs.

8. Error Handling and Recovery

RAAF provides comprehensive error handling and recovery mechanisms.

8.1. Provider Failures

runner = RAAF::Runner.new(
  agent: agent,
  retry_config: {
    max_retries: 3,
    backoff_strategy: :exponential,
    retry_on: [
      RAAF::Errors::ProviderTimeoutError,
      RAAF::Errors::RateLimitError
    ]
  }
)

8.2. Tool Failures

def unreliable_tool(data:)
  begin
    # Potentially failing operation
    external_api.process(data)
  rescue ExternalAPI::Error => e
    # Return error info instead of raising
    {
      error: true,
      error_type: 'api_failure',
      message: e.message,
      retry_after: e.retry_after
    }
  end
end

agent.add_tool(method(:unreliable_tool))

8.3. Circuit Breaker Pattern

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: undefined local variable or method 'agent' for main /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-nclqkk.rb:482:in '<main>'

class CircuitBreakerTool
  def initialize
    @failure_count = 0
    @last_failure_time = nil
    @circuit_open = false
  end

  def call(data:)
    return { error: 'Circuit breaker open' } if circuit_open?

    begin
      result = external_service.call(data)
      reset_circuit
      result
    rescue => e
      record_failure
      { error: e.message }
    end
  end

  private

  def circuit_open?
    @circuit_open && (Time.now - @last_failure_time) < 60
  end

  def record_failure
    @failure_count += 1
    @last_failure_time = Time.now
    @circuit_open = true if @failure_count >= 5
  end

  def reset_circuit
    @failure_count = 0
    @circuit_open = false
  end
end

agent.add_tool(CircuitBreakerTool.new.method(:call))

9. Performance Optimization: Making AI Fast, Cheap, and Reliable

Let me share a horror story. We had a client whose AI bill hit $47,000 in one day. Not because of a bug—their system was working perfectly. It was just working very, very inefficiently.

Here's how we cut their costs by 93% while making their system 5x faster.

9.1. The Hidden Performance Killers in AI Systems

Before we dive into solutions, understand what's actually slow:

  1. API Latency: 200-800ms per call (can't fix this)
  2. Token Processing: 50-100 tokens/second (model dependent)
  3. Tool Execution: 10ms-30s (your code)
  4. Memory Operations: 1-1000ms (your architecture)
  5. Network Overhead: 10-100ms (geography matters)

The trick is optimizing what you can control.

9.2. Connection Pooling: Stop Creating, Start Reusing

The $15,000 Mistake

# Bad: Creates new connection for every request
class NaiveAgent
  def process_request(prompt)
    # This constructor makes 3 HTTP calls!
    provider = RAAF::Models::ResponsesProvider.new(
      api_key: ENV['OPENAI_API_KEY']
    )
    runner = RAAF::Runner.new(agent: self, provider: provider)
    runner.run(prompt)
  end
end

# At 1000 requests/hour, that's 3000 unnecessary API calls

The Production Solution

# Good: Reuse connections like a pro
class ProductionAgent
  def initialize
    # Connection pool configuration based on real metrics
    @provider = RAAF::Models::ResponsesProvider.new(
      api_key: ENV['OPENAI_API_KEY'],

      # Pool size = expected concurrent requests + buffer
      pool_size: calculate_pool_size,

      # Timeout based on p99 response time + margin
      timeout: 30,

      # Keep connections warm
      keepalive: true,
      keepalive_timeout: 60,

      # Retry configuration
      retry_config: {
        max_attempts: 3,
        base_delay: 0.5,
        max_delay: 2,
        multiplier: 2
      }
    )

    # Pre-warm the connection pool
    warm_connection_pool
  end

  private

  def calculate_pool_size
    # Based on your actual traffic patterns
    base_concurrent_requests = 10
    spike_buffer = 5
    failover_capacity = 5

    base_concurrent_requests + spike_buffer + failover_capacity
  end

  def warm_connection_pool
    # Make dummy requests to establish connections
    pool_size.times do
      Thread.new { @provider.test_connection rescue nil }
    end
  end
end

Real Impact:

  • Connection time: 200ms → 0ms (reused)
  • Error rate: 0.1% → 0.01% (retry logic)
  • Cost: No change, but 20% faster responses

9.3. Parallel Tool Execution: Work Smarter, Not Harder

The Serial Slowdown

# Bad: Tools run one at a time (10 seconds total)
def analyze_customer(customer_id)
  profile = fetch_profile(customer_id)      # 2 seconds
  orders = fetch_orders(customer_id)        # 3 seconds
  support = fetch_tickets(customer_id)      # 2 seconds
  social = fetch_social(customer_id)        # 3 seconds

  { profile: profile, orders: orders, support: support, social: social }
end

The Parallel Paradise

# Good: Tools run simultaneously (3 seconds total)
class ParallelCustomerAnalyzer
  def initialize
    @runner = RAAF::Runner.new(
      agent: agent,

      # Enable parallel execution
      parallel_tools: true,

      # Limit based on provider rate limits
      max_parallel_tools: determine_safe_parallel_limit,

      # Timeout for parallel operations
      parallel_timeout: 10,

      # What to do when one fails
      partial_failure_strategy: :continue_with_results
    )
  end

  def analyze_customer_parallel(customer_id)
    # RAAF automatically parallelizes these tool calls
    @runner.run(<<~PROMPT)
      Analyze customer #{customer_id}:

      1. Fetch their profile
      2. Get order history
      3. Review support tickets  
      4. Check social engagement

      Return all data in structured format.
    PROMPT
  end

  private

  def determine_safe_parallel_limit
    # Based on rate limits and system resources
    provider_rate_limit = 50  # requests per second
    safety_factor = 0.5       # Use 50% of limit
    avg_tools_per_request = 4

    (provider_rate_limit * safety_factor / avg_tools_per_request).floor
  end
end

Common Gotchas:

  • Some tools have dependencies (fetch auth token → use token)
  • Database connection pools need sizing for parallel queries
  • Error handling gets complex (one fails, others succeed?)
  • Memory usage can spike with parallel operations

9.4. Response Caching: The Same Question Twice

The Embarrassing Discovery

# Our logs showed:
# 10:00 AM - User: "What's your refund policy?"
# 10:01 AM - Same user: "What's your refund policy?"
# 10:02 AM - Same user: "Tell me about refunds"
# Each request = $0.02. Thousands of users = $$$

The Smart Cache

class IntelligentCacheManager
  def initialize
    # Multi-layer cache strategy
    @memory_cache = RAAF::Cache::MemoryStore.new(
      size_limit: 1000,  # Keep hot data in RAM
      ttl: 300           # 5 minutes for immediate repeats
    )

    @file_cache = RAAF::Cache::FileStore.new(
      directory: './cache',
      ttl: 3600,              # 1 hour for common questions
      max_size: '1GB',
      compression: :gzip      # Save disk space
    )

    @semantic_cache = RAAF::Cache::SemanticStore.new(
      embedding_model: 'text-embedding-3-small',
      similarity_threshold: 0.95,  # Nearly identical questions
      ttl: 86400                   # 24 hours for FAQ-style content
    )
  end

  def get_or_generate(prompt, context = {})
    # Level 1: Exact match in memory
    cache_key = generate_cache_key(prompt, context)
    if result = @memory_cache.get(cache_key)
      log_cache_hit(:memory, prompt)
      return result
    end

    # Level 2: Exact match on disk
    if result = @file_cache.get(cache_key)
      @memory_cache.set(cache_key, result)  # Promote to memory
      log_cache_hit(:file, prompt)
      return result
    end

    # Level 3: Semantic similarity match
    if result = @semantic_cache.find_similar(prompt)
      log_cache_hit(:semantic, prompt)
      return adapt_response(result, prompt)  # Slight personalization
    end

    # Level 4: Generate new response
    result = generate_response(prompt, context)

    # Cache at all levels
    cache_response(cache_key, prompt, result)

    result
  end

  private

  def generate_cache_key(prompt, context)
    # Include context that affects response
    relevant_context = context.slice(:user_tier, :language, :region)
    Digest::SHA256.hexdigest("#{prompt}:#{relevant_context.sort.to_s}")
  end

  def should_cache?(prompt, response)
    # Don't cache personal data
    return false if prompt =~ /my account|my order|my data/i

    # Don't cache time-sensitive info
    return false if response =~ /today|current|now|at this moment/i

    # Don't cache errors
    return false if response[:error]

    true
  end
end

Cache Hit Rates from Production:

  • FAQ-style questions: 87% cache hit rate
  • Technical documentation: 72% cache hit rate
  • Personal queries: 3% cache hit rate (as expected)
  • Overall cost reduction: 42%

9.5. Token Optimization: Every Token Counts

The Token Waste Audit

# We analyzed 10,000 requests and found:
# - 35% of tokens were system prompts (repeated every call)
# - 20% were verbose tool descriptions
# - 15% were formatting instructions
# - Only 30% were actual user content!

The Optimization Playbook

class TokenOptimizedAgent
  def initialize
    @router = RAAF::Agent.new(
      name: "Router",
      # Minimal instructions for routing
      instructions: "Route to: support, sales, or tech",
      model: "gpt-4o-mini",  # $0.0001 per 1K tokens
      max_tokens: 10         # Just need one word
    )

    @specialists = {
      support: build_specialist("support", "gpt-4o-mini"),
      sales: build_specialist("sales", "gpt-4o"),
      tech: build_specialist("tech", "gpt-4o")
    }
  end

  def process_optimized(user_input)
    # Step 1: Compress user input
    compressed = compress_input(user_input)

    # Step 2: Quick routing (10 tokens max)
    route = @router.run(compressed)

    # Step 3: Specialist handles with full context
    specialist = @specialists[route.to_sym]
    specialist.run(user_input)  # Original input for accuracy
  end

  private

  def compress_input(input)
    # Remove redundancy, keep essence
    Compressor.new.compress(input, preserve: [:intent, :key_facts])
  end

  def build_specialist(type, model)
    RAAF::Agent.new(
      name: type.capitalize,
      instructions: load_compressed_instructions(type),
      model: model,

      # Token-saving configurations
      response_format: { type: 'json' },  # No markdown overhead
      temperature: 0.3,                   # More predictable = shorter

      # Smart truncation
      max_completion_tokens: calculate_optimal_max(type),

      # Reuse context across turns
      context_window_management: :aggressive
    )
  end
end

Advanced Token Strategies:

# 1. Dynamic instruction loading
def load_instructions_for_context(user_type, query_type)
  base_instructions = "Be helpful and accurate."

  # Only add what's needed
  case query_type
  when :refund
    base_instructions += " Refund policy: 30 days, original payment method."
  when :technical
    base_instructions += load_technical_context
  end

  base_instructions
end

# 2. Response compression
def compress_response(response)
  # Remove fluff, keep facts
  response
    .gsub(/(?:I understand|I can help|I'd be happy to)\s*/i, '')
    .gsub(/\s+/, ' ')
    .strip
end

# 3. Batching for efficiency
def process_batch(queries)
  # Combine multiple queries into one request
  combined = queries.map.with_index { |q, i| "#{i+1}. #{q}" }.join("\n")

  response = agent.run(<<~PROMPT)
    Answer each numbered question concisely:
    #{combined}
  PROMPT

  # Split responses
  parse_batched_response(response)
end

Real-World Impact:

  • Average tokens per request: 1,847 → 743
  • Cost per request: $0.037 → $0.015
  • Response time: 3.2s → 1.8s
  • Monthly savings: $18,000

9.6. The Performance Monitoring Dashboard

class PerformanceMonitor
  def track_request(request_id, &block)
    metrics = {
      start_time: Time.now,
      tokens_sent: 0,
      tokens_received: 0,
      cache_hits: [],
      tool_timings: {},
      errors: []
    }

    result = yield(metrics)

    metrics[:end_time] = Time.now
    metrics[:total_duration] = metrics[:end_time] - metrics[:start_time]
    metrics[:cost] = calculate_cost(metrics)

    # Alert on anomalies
    check_performance_thresholds(metrics)

    # Store for analysis
    store_metrics(request_id, metrics)

    result
  end

  private

  def check_performance_thresholds(metrics)
    if metrics[:total_duration] > 10
      alert("Slow request: #{metrics[:total_duration]}s")
    end

    if metrics[:cost] > 0.50
      alert("Expensive request: $#{metrics[:cost]}")
    end

    if metrics[:tokens_sent] > 3000
      alert("Token limit warning: #{metrics[:tokens_sent]} tokens")
    end
  end
end

10. Best Practices

10.1. Hard-Won Lessons from Production AI Systems

After building dozens of AI agent systems, these are the patterns that separate the successful from the struggling:

10.2. Agent Design: The Art of Specialization

10.2.1. Why "Do Everything" Agents Always Fail

We once built a "SuperAgent" that could handle customer service, technical support, sales, and account management. It knew everything. It could do anything. It was terrible at everything.

The problem? Conflicting instructions created confusion:

  • Customer service training said "always be empathetic"
  • Sales training said "always be closing"
  • Technical support training said "always be precise"

The result was an agent that tried to close sales while customers were reporting bugs.

10.2.2. The Single Responsibility Principle

Bad:

agent = RAAF::Agent.new(
  name: "DoEverything",
  instructions: "Handle customer service, sales, support, and billing"
)

Good:

# Each agent has ONE clear job
customer_service = RAAF::Agent.new(
  name: "CustomerService",
  instructions: "Help customers with order issues and account questions"
)

sales_agent = RAAF::Agent.new(
  name: "Sales",
  instructions: "Identify opportunities and guide purchase decisions"
)

tech_support = RAAF::Agent.new(
  name: "TechSupport",
  instructions: "Diagnose technical issues and provide solutions"
)

10.2.3. Clear Instructions: The Difference Between Success and Chaos

Vague Instructions Lead to Unpredictable Behavior:

# This agent will disappoint you
agent = RAAF::Agent.new(
  instructions: "Be helpful and professional"
)

Specific Instructions Create Consistent Results:

agent = RAAF::Agent.new(
  name: "RefundProcessor",
  instructions: <<~INSTRUCTIONS
    You process refund requests following these rules:

    ALWAYS:

    - Verify order exists and is within 30-day window
    - Check if item was delivered
    - Calculate refund including original shipping
    - Provide clear timeline (3-5 business days)

    NEVER:

    - Process refunds over $500 without manager approval
    - Refund digital products after download
    - Make exceptions to the 30-day policy

    For edge cases, escalate to human review.
  INSTRUCTIONS
)

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. ruby: /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-wtjxd0.rb:447: syntax errors found (SyntaxError) 445 | customer_service_agent = RAAF::Agent.new( 446 | name: "CustomerService", > 447 | instructions: <<~INSTRUCTIONS | ^~~~~~~~~~~~ unterminated heredoc; can't find string "INSTRUCTIONS" anywhere before EOF 448 | You help customers with: 449 | ~~~~~~~ 470 | puts e.backtrace.first(3).join("\n") if e.backtrace 471 | exit 1 > 472 | end | ^ unexpected end-of-input, assuming it is closing the parent top level context | ^ unexpected end-of-input; expected a `)` to close the arguments > 473 | | ^ expected an `end` to close the `begin` statement

# Good: Focused agent with clear purpose
customer_service_agent = RAAF::Agent.new(
  name: "CustomerService",
  instructions: <<~INSTRUCTIONS
    You help customers with:

    - Order status inquiries
    - Account issues
    - Product questions

    Always be polite and helpful. Escalate complex issues to human agents.
  INSTRUCTIONS,
  model: "gpt-4o"
)

# Add focused tools
customer_service_agent.add_tool(method(:lookup_order))
customer_service_agent.add_tool(method(:check_account_status))
customer_service_agent.add_tool(method(:escalate_to_human))

10.3. Tool Design Patterns

10.3.1. Tool Safety Considerations

Tools with broad capabilities can cause unintended consequences. For example, a generic "update_database" tool might perform operations beyond the intended scope when given ambiguous instructions.

Tool design requires balancing functionality with safety, clarity, and predictability.

10.3.2. Tool Design Principles

1. One Tool, One Purpose

# Bad: The kitchen sink approach
def manage_customer(action:, **params)
  case action
  when 'create' then create_customer(params)
  when 'update' then update_customer(params)
  when 'delete' then delete_customer(params)
  when 'merge' then merge_customers(params)
  # 20 more actions...
  end
end

# Good: Specific, predictable tools
def create_customer(name:, email:, company: nil)
  # Single, clear purpose
end

def update_customer_email(customer_id:, new_email:)
  # Specific update, not generic "update anything"
end

2. Defensive Programming is Not Optional

class SafeDatabaseTool
  MAX_RECORDS_PER_OPERATION = 100

  def update_customer_status(customer_id:, new_status:)
    # Layer 1: Input validation
    validate_customer_id!(customer_id)
    validate_status!(new_status)

    # Layer 2: Permission check
    unless can_modify_customer?(customer_id)
      return { error: "Permission denied for customer #{customer_id}" }
    end

    # Layer 3: Business rule validation
    current_status = get_customer_status(customer_id)
    unless valid_status_transition?(current_status, new_status)
      return { 
        error: "Invalid status transition",
        current: current_status,
        requested: new_status,
        allowed: allowed_transitions_from(current_status)
      }
    end

    # Layer 4: Audit trail
    log_change_request(
      user: current_user,
      action: 'status_update',
      customer_id: customer_id,
      from: current_status,
      to: new_status
    )

    # Layer 5: Actual update with rollback capability
    transaction do
      previous_state = capture_state(customer_id)

      begin
        result = Customer.find(customer_id).update!(status: new_status)

        # Layer 6: Verification
        unless verify_update(customer_id, new_status)
          raise "Update verification failed"
        end

        {
          success: true,
          customer_id: customer_id,
          previous_status: current_status,
          new_status: new_status,
          updated_at: Time.now,
          rollback_token: generate_rollback_token(previous_state)
        }

      rescue => e
        rollback_to(previous_state)
        { error: "Update failed: #{e.message}" }
      end
    end
  end
end

3. Clear Naming: Your Tool's First Documentation

# Bad: Ambiguous names
def process_data(data) # Process how?
def handle_request(request) # Handle what?
def fix_issue(issue_id) # Fix how?

# Good: Self-documenting names
def validate_email_format(email:)
def calculate_shipping_cost(weight:, destination:)
def escalate_ticket_to_supervisor(ticket_id:, reason:)

4. Structured Responses: Consistency is King

module ToolResponseFormatter
  def success_response(data = {})
    {
      success: true,
      timestamp: Time.now.iso8601,
      data: data,
      warnings: collect_warnings
    }
  end

  def error_response(error, recoverable: true)
    {
      success: false,
      timestamp: Time.now.iso8601,
      error: {
        message: sanitize_error_message(error),
        type: error.class.name,
        recoverable: recoverable,
        suggestions: suggest_fixes(error)
      }
    }
  end

  def partial_success_response(succeeded, failed)
    {
      success: false,
      partial: true,
      timestamp: Time.now.iso8601,
      succeeded: succeeded,
      failed: failed,
      summary: {
        total: succeeded.count + failed.count,
        success_count: succeeded.count,
        failure_count: failed.count
      }
    }
  end
end

5. Documentation That Actually Helps

# This is what your AI reads to understand your tool
class WellDocumentedTool
  # Books a meeting room for the specified time slot
  #
  # @param room_id [String] The room identifier (e.g., 'conf-3rd-floor-a')
  # @param start_time [String] ISO 8601 formatted start time
  # @param duration_minutes [Integer] Meeting duration (15-480 minutes)
  # @param attendee_count [Integer] Number of attendees (1-50)
  # @param title [String] Meeting title (required for calendar)
  # 
  # @return [Hash] Booking confirmation with:
  #   - booking_id: Unique identifier for the booking
  #   - calendar_link: URL to add to calendar
  #   - room_details: Capacity, equipment, location
  #   - conflicts: Any conflicting bookings that were checked
  #
  # @example
  #   book_meeting_room(
  #     room_id: 'conf-3rd-floor-a',
  #     start_time: '2024-03-15T14:00:00Z',
  #     duration_minutes: 60,
  #     attendee_count: 8,
  #     title: 'Q1 Planning Review'
  #   )
  def book_meeting_room(room_id:, start_time:, duration_minutes:, 
                       attendee_count:, title:)
    # Implementation
  end
end

10.4. Memory Management: The Art of Strategic Forgetting

10.4.1. Why Perfect Memory is Actually Terrible

A financial advisor bot remembered everything. Every. Single. Detail. After six months, it was tracking:

  • Every stock price mentioned
  • Every "what if" scenario discussed
  • Every typo and correction
  • Every "let me think about it"

Result? The bot spent more time processing irrelevant history than giving advice. Response time went from 2 seconds to 45 seconds. Customers left.

10.4.2. The Memory Management Playbook

1. Choose Your Memory Store Like Your Life Depends On It

class MemoryStoreSelector
  def self.recommend(requirements)
    case
    # Demos and prototypes
    when requirements[:persistence] == false &&
         requirements[:scale] < 100
      configure_in_memory_store

    # Most production applications  
    when requirements[:compliance] || 
         requirements[:audit_trail]
      configure_file_store_with_encryption

    # High-scale with intelligence
    when requirements[:scale] > 10_000 ||
         requirements[:semantic_search]
      configure_vector_store_with_clustering

    # When in doubt
    else
      configure_hybrid_store
    end
  end

  private

  def self.configure_hybrid_store
    RAAF::Memory::HybridStore.new(
      # Recent messages in memory
      hot_storage: RAAF::Memory::InMemoryStore.new(
        capacity: 1000,
        ttl: 300  # 5 minutes
      ),

      # Everything else on disk
      cold_storage: RAAF::Memory::FileStore.new(
        directory: 'data/conversations',
        compression: :zstd,
        encryption: true
      ),

      # Semantic index for search
      index: RAAF::Memory::VectorStore.new(
        embedding_model: 'text-embedding-3-small'
      )
    )
  end
end

2. Implement Progressive Summarization

class ProgressiveSummarizer
  SUMMARY_TRIGGERS = {
    token_threshold: 0.7,      # Start when 70% full
    message_count: 50,         # Or after 50 messages
    time_elapsed: 3600,        # Or after 1 hour
    topic_changes: 3           # Or after 3 major topic shifts
  }

  def should_summarize?(conversation)
    SUMMARY_TRIGGERS.any? do |trigger, threshold|
      case trigger
      when :token_threshold
        conversation.token_usage > (conversation.max_tokens * threshold)
      when :message_count
        conversation.messages.count > threshold
      when :time_elapsed
        Time.now - conversation.started_at > threshold
      when :topic_changes
        detect_topic_changes(conversation) > threshold
      end
    end
  end

  def create_summary(messages)
    # Group by topic and importance
    grouped = group_messages_intelligently(messages)

    # Summarize each group appropriately
    summaries = grouped.map do |group|
      case group[:type]
      when :decision
        # Preserve exact decision details
        preserve_decisions(group[:messages])
      when :exploration
        # Compress exploratory discussion
        summarize_exploration(group[:messages])
      when :small_talk
        # Minimal summary for chitchat
        minimal_summary(group[:messages])
      end
    end

    # Combine with metadata
    {
      summary: summaries.join("\n\n"),
      key_facts: extract_key_facts(messages),
      open_questions: find_unresolved_questions(messages),
      important_entities: extract_entities(messages),
      summary_date: Time.now,
      original_message_count: messages.count,
      compression_ratio: calculate_compression(messages, summaries)
    }
  end
end

3. Smart Context Selection

class ContextOptimizer
  def build_optimal_context(current_query, history, budget = 4000)
    context = []
    remaining_tokens = budget

    # Priority 1: System message (non-negotiable)
    system_msg = history[:system_message]
    context << system_msg
    remaining_tokens -= count_tokens(system_msg)

    # Priority 2: Current query context
    query_context = find_related_context(current_query, history)
    query_context.each do |msg|
      if remaining_tokens > count_tokens(msg) + 500  # Reserve for response
        context << msg
        remaining_tokens -= count_tokens(msg)
      end
    end

    # Priority 3: Recent messages
    history[:messages].last(10).each do |msg|
      if remaining_tokens > count_tokens(msg) + 500
        context << msg unless context.include?(msg)
        remaining_tokens -= count_tokens(msg)
      end
    end

    # Priority 4: Important markers
    find_important_messages(history).each do |msg|
      if remaining_tokens > count_tokens(msg) + 500
        context << msg unless context.include?(msg)
        remaining_tokens -= count_tokens(msg)
      end
    end

    # Return chronologically ordered
    order_chronologically(context)
  end
end

10.5. Testing Strategies

10.5.1. The Testing Pyramid for AI Systems

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: uninitialized constant Tools /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-svflnq.rb:445:in '<main>'

# Level 1: Tool Unit Tests (Fast, Deterministic)
RSpec.describe Tools::OrderLookup do
  describe '#find_order' do
    it 'returns order details for valid ID' do
      result = subject.find_order(order_id: '12345')

      expect(result).to match({
        success: true,
        data: hash_including(
          order_id: '12345',
          status: String,
          total: Numeric
        )
      })
    end

    it 'handles non-existent orders gracefully' do
      result = subject.find_order(order_id: 'INVALID')

      expect(result).to match({
        success: false,
        error: hash_including(
          message: /not found/i,
          recoverable: true
        )
      })
    end

    it 'validates order ID format' do
      expect {
        subject.find_order(order_id: 'ABC-123')
      }.to raise_error(ArgumentError, /Invalid order ID format/)
    end
  end
end

# Level 2: Agent Behavior Tests (Mocked AI)
RSpec.describe Agents::CustomerService do
  let(:agent) { described_class.new }
  let(:mock_ai) { instance_double(RAAF::Models::Provider) }

  before do
    allow(agent).to receive(:provider).and_return(mock_ai)
  end

  it 'looks up order when asked about status' do
    allow(mock_ai).to receive(:complete).and_return(
      double(content: 'lookup_order(order_id: "12345")')
    )

    result = agent.process("What's the status of order 12345?")

    expect(result).to include('order has been shipped')
  end

  it 'escalates when unable to help' do
    allow(mock_ai).to receive(:complete).and_return(
      double(content: 'escalate_to_human(reason: "refund_over_limit")')
    )

    result = agent.process("I need a $5000 refund")

    expect(result).to include('connecting you with a specialist')
  end
end

# Level 3: Integration Tests (Real AI, Controlled Environment)
RSpec.describe 'Customer Service Flow', :integration do
  let(:runner) { RAAF::Runner.new(agent: CustomerServiceAgent.new) }

  it 'handles complete refund flow' do
    VCR.use_cassette('customer_service/refund_flow') do
      # Initial inquiry
      result1 = runner.run("I want to return order 12345")
      expect(result1.messages.last[:content]).to include('return policy')

      # Confirm intent
      result2 = runner.run("Yes, I want a refund")
      expect(result2.messages.last[:content]).to include('processing')

      # Verify completion
      expect(result2.tool_calls).to include(
        hash_including(name: 'process_refund')
      )
    end
  end
end

# Level 4: Contract Tests (Provider Compatibility)
RSpec.describe 'Provider Compatibility', :contract do
  providers = [
    RAAF::Models::OpenAI.new,
    RAAF::Models::Anthropic.new,
    RAAF::Models::Groq.new
  ]

  providers.each do |provider|
    context "with #{provider.class.name}" do
      it 'handles tool calls correctly' do
        agent = RAAF::Agent.new(
          instructions: 'You help with math',
          provider: provider
        )
        agent.add_tool(method(:calculate))

        runner = RAAF::Runner.new(agent: agent)
        result = runner.run("What's 2+2?")

        expect(result.success?).to be true
        expect(result.messages.last[:content]).to include('4')
      end
    end
  end
end

10.5.2. Testing Anti-Patterns to Avoid

EXAMPLE VALIDATION FAILED - This example needs work and contributions are welcome! Please see Contributing to RAAF for guidance. Error: NameError: uninitialized constant OpenAI /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-f6ye7r.rb:446:in 'block in <main>' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-f6ye7r.rb:334:in 'Object#it' /var/folders/r5/1t1h14ts04v5plm6tg1237pr0000gn/T/code_block20250725-12953-f6ye7r.rb:445:in '<main>'

# DON'T: Test implementation details
it 'calls OpenAI with correct parameters' do
  expect(OpenAI::Client).to receive(:new).with(api_key: 'key')
  # This breaks when you switch providers
end

# DO: Test behavior
it 'responds to customer inquiries' do
  result = runner.run("What's your return policy?")
  expect(result.messages.last[:content]).to match(/30 days/)
end

# DON'T: Test exact AI responses
it 'returns exact message' do
  expect(response).to eq("Hello! I'd be happy to help you today.")
  # AI responses vary
end

# DO: Test response characteristics
it 'greets customer politely' do
  expect(response).to match(/hello|hi|greetings/i)
  expect(response).to match(/help|assist|support/i)
end

10.6. Production Monitoring: Because You Can't Fix What You Can't See

class ProductionMonitor
  CRITICAL_METRICS = {
    response_time: { threshold: 5.0, unit: :seconds },
    token_usage: { threshold: 4000, unit: :tokens },
    cost_per_request: { threshold: 0.50, unit: :dollars },
    error_rate: { threshold: 0.02, unit: :percentage },
    tool_success_rate: { threshold: 0.95, unit: :percentage }
  }

  def monitor_request(request_id)
    start_metrics = capture_start_state

    begin
      result = yield

      end_metrics = capture_end_state
      metrics = calculate_metrics(start_metrics, end_metrics)

      # Check thresholds
      check_critical_thresholds(metrics)

      # Store for analysis
      store_metrics(request_id, metrics)

      # Real-time dashboards
      publish_to_dashboard(metrics)

      result

    rescue => e
      record_error(request_id, e)
      alert_on_critical_error(e)
      raise
    end
  end

  def daily_report
    {
      summary: {
        total_requests: count_requests,
        success_rate: calculate_success_rate,
        average_response_time: average(:response_time),
        total_cost: sum(:cost),
        token_usage: percentile(:token_usage, 95)
      },

      trends: {
        cost_trend: calculate_trend(:cost, days: 7),
        performance_trend: calculate_trend(:response_time, days: 7),
        error_trend: calculate_trend(:errors, days: 7)
      },

      anomalies: detect_anomalies,

      recommendations: generate_recommendations
    }
  end
end

11. Troubleshooting

11.1. Common Issues

Tool not being called

  • Check tool name and parameters
  • Verify agent instructions mention the tool
  • Ensure parameter types match

Memory growing too large

  • Reduce max_tokens setting
  • Enable pruning strategy
  • Use compression

Slow responses

  • Enable parallel tool execution
  • Use faster models for simple tasks
  • Implement caching

Provider errors

  • Check API keys and quotas
  • Implement retry logic
  • Use circuit breaker pattern

11.2. Debugging

# Enable debug logging
ENV['RAAF_LOG_LEVEL'] = 'debug'
ENV['RAAF_DEBUG_CATEGORIES'] = 'api,tools,memory'

# Add debug tracer
tracer = RAAF::Tracing::SpanTracer.new
tracer.add_processor(RAAF::Tracing::ConsoleProcessor.new)

runner = RAAF::Runner.new(
  agent: agent,
  tracer: tracer
)

12. Next Steps

Now that you understand the core concepts:



Back to top