
RAAF Providers Guide

This guide covers the multi-provider architecture of Ruby AI Agents Factory (RAAF). RAAF supports 100+ AI providers, allowing you to use the best model for each task while avoiding vendor lock-in.

After reading this guide, you will know:

  • How to configure and use different AI providers
  • Provider-specific features and capabilities
  • Cost optimization strategies across providers
  • Provider failover and load balancing
  • Performance tuning for different providers

1. Introduction

1.1. The Provider Abstraction Philosophy

RAAF's provider-agnostic architecture represents a fundamental shift from vendor-specific AI integration to a unified interface that treats AI providers as interchangeable resources. This abstraction layer enables strategic flexibility while maintaining technical simplicity.

1.2. Core Architectural Benefits

Provider Interoperability: Different AI providers can be used within the same application without code changes. This enables best-of-breed selection where each provider's strengths are leveraged for appropriate tasks.

Migration Flexibility: Switching providers requires only configuration changes, not code rewrites. This protects applications from provider-specific technical debt and enables rapid adaptation to market changes.

Cost Optimization: Tasks can be routed to the most cost-effective provider for each specific use case. Simple queries can use cheaper models while complex analysis uses more capable (and expensive) models.

Risk Mitigation: Provider lock-in risks are eliminated through abstraction. Applications maintain the ability to switch providers in response to pricing changes, availability issues, or capability improvements.

Innovation Access: New providers and models can be integrated rapidly without affecting existing application code. This enables quick evaluation and adoption of emerging AI capabilities.

1.3. Abstraction Layer Design

The provider abstraction layer normalizes different AI APIs into a consistent interface while preserving provider-specific capabilities. This design maintains the flexibility to leverage unique provider features while ensuring portability across different AI services.

Critical Provider Portability Warning: While RAAF provides technical portability across providers, operational portability requires careful consideration. Different language models exhibit vastly different behaviors, reasoning patterns, and response styles even with identical prompts and configurations. What works perfectly with GPT-4 might fail completely with Claude or produce unexpected results with open-source models. The abstraction layer handles API differences, but it cannot normalize model behavior differences.

ALWAYS thoroughly test your complete agent setup after changing providers, including edge cases, error handling, and multi-turn conversations. Consider maintaining provider-specific test suites and be prepared to adjust prompts, temperature settings, and tool configurations for each provider.
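
A provider-specific test suite can start as one smoke test per provider. Here is a minimal sketch; the models and the expected answer are assumptions for illustration, and a real suite would add edge cases and multi-turn checks:

# One smoke test per provider; identical prompts can behave differently
RSpec.describe 'Provider portability' do
  {
    openai: 'gpt-4o-mini',
    anthropic: 'claude-3-5-haiku-20241022'
  }.each do |provider_name, model|
    it "answers a factual question on #{provider_name}" do
      agent = RAAF::Agent.new(
        name: 'SmokeTest',
        instructions: 'Answer in one word.',
        model: model
      )
      result = RAAF::Runner.new(agent: agent).run('What is the capital of France?')

      expect(result.messages.last[:content]).to match(/paris/i)
    end
  end
end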

1.4. Strategic Risks of Provider Lock-In

The AI provider ecosystem evolves rapidly, with frequent changes in model availability, pricing structures, and competitive landscape. Applications tightly coupled to specific providers face several strategic risks:

Technical Debt Accumulation: Code optimized for provider-specific APIs, response formats, and behavioral quirks becomes increasingly difficult to migrate. This technical debt compounds over time, making provider switching more expensive and complex.

Cost Vulnerability: Price increases become unavoidable without alternative providers configured and ready for deployment. Organizations lose negotiating power and cost optimization opportunities.

Innovation Lag: New providers with superior capabilities or pricing require significant development effort to integrate, creating delays in accessing competitive advantages.

Reliability Exposure: Provider outages, rate limiting, or service degradation can't be mitigated without backup options. Single points of failure increase system risk.

Migration Complexity: Switching providers requires comprehensive rewrites of AI integration code, extensive testing, and coordinated deployment across multiple systems.

1.5. Business Continuity Implications

Provider lock-in creates business continuity risks that extend beyond technical concerns. Organizations become dependent on external providers for core functionality without fallback options. This dependency can impact customer service, operational efficiency, and competitive positioning when provider issues occur.

1.6. Provider Agnosticism: Strategic Flexibility

Provider agnosticism doesn't require using every available provider—it's about maintaining strategic options and the capability to respond rapidly to changing conditions.

Service Continuity: When providers experience outages or service degradation, applications can automatically failover to alternative providers without manual intervention or code changes.

Cost Responsiveness: Price increases can be addressed immediately by routing traffic to more cost-effective providers, maintaining operational efficiency while preserving budget targets.

Innovation Adoption: New models and providers can be evaluated and integrated rapidly, enabling organizations to access competitive advantages without extensive development cycles.

Negotiation Leverage: Multiple provider options create negotiating leverage for better pricing, terms, and service levels. Providers understand they are not the only option.

Quality Optimization: Different providers excel at different tasks. Provider agnosticism enables routing specific workloads to the most suitable models for optimal results.

1.7. Implementation Philosophy

Effective provider agnosticism requires architectural design that abstracts provider differences while preserving unique capabilities. This approach maintains technical flexibility without sacrificing the specific advantages that different providers offer.

1.8. How RAAF Makes Provider Switching Trivial

RAAF abstracts away provider differences while preserving their unique capabilities. You write your agent once, and RAAF handles the provider-specific details behind the scenes: authentication, request and response formats, retries, and error handling.

1.9. Supported Providers

  • OpenAI - GPT-4o, GPT-4o-mini, GPT-3.5 Turbo
  • Anthropic - Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
  • Cohere - Command models
  • Groq - High-speed inference for Llama, Mixtral
  • Together AI - Open source models
  • LiteLLM - Universal gateway to 100+ providers

All supported providers must have tool/function calling capabilities. Providers that don't support tool calling have been removed to ensure consistent handoff behavior across all deployments.
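
Since tool calling is guaranteed, a tool defined once can travel with the agent across any of these providers. A minimal sketch (the weather lookup is a placeholder, not a real API):

# One tool definition, reusable with any supported provider
weather_tool = lambda do |city:|
  { city: city, forecast: "sunny", temp_c: 21 }  # placeholder data
end

%w[gpt-4o claude-3-5-sonnet-20241022 mixtral-8x7b-32768].each do |model|
  agent = RAAF::Agent.new(
    name: "WeatherBot",
    instructions: "Use the weather tool to answer questions",
    model: model  # provider auto-detected from the model name
  )
  agent.add_tool(weather_tool)
end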

1.10. Universal Interface


# Same code works with any provider
model_name = "gpt-4o"  # any supported model name; provider auto-detected

agent = RAAF::Agent.new(
  name: "Assistant",
  instructions: "You are helpful",
  model: model_name
)

# OpenAI
agent = RAAF::Agent.new(model: "gpt-4o")

# Anthropic  
agent = RAAF::Agent.new(model: "claude-3-5-sonnet-20241022")

# Groq
agent = RAAF::Agent.new(model: "mixtral-8x7b-32768")

# LiteLLM with universal access
agent = RAAF::Agent.new(model: "bedrock/claude-3")

1.11. Automatic Provider Detection

The model selection examples demonstrate RAAF's provider-agnostic approach. Model names serve as the primary mechanism for provider detection:

  • OpenAI models: names like "gpt-4o" automatically route to OpenAI services
  • Anthropic models: names like "claude-3-5-sonnet-20241022" route to Anthropic services
  • Groq models: names like "mixtral-8x7b-32768" route to Groq services
  • LiteLLM models: prefixed names like "bedrock/claude-3" route through LiteLLM

This automatic detection eliminates explicit provider configuration in most cases, reducing code complexity and improving maintainability.
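
Conceptually, detection is a pattern match from model name to provider. A simplified sketch of the idea (illustrative only, not RAAF's actual implementation):

# Name-based provider detection, simplified (illustrative only)
def detect_provider(model)
  case model
  when %r{\A\w+/}          then :litellm    # "bedrock/claude-3", "azure/gpt-4o"
  when /\Agpt-/            then :openai
  when /\Aclaude-/         then :anthropic
  when /\A(mixtral|llama)/ then :groq
  else raise ArgumentError, "Unknown model: #{model}"
  end
end

detect_provider("claude-3-5-sonnet-20241022")  # => :anthropic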

1.12. Configuration-Based Provider Management

Provider selection and configuration are handled through environment variables and configuration files, not code changes. This enables deployment-time provider decisions without application modifications.

Environment-Based Configuration: Provider credentials, endpoints, and settings are configured through environment variables, enabling different providers in different deployment environments.

Runtime Provider Selection: Provider routing decisions can be made at runtime based on workload characteristics, cost considerations, or availability requirements.

Deployment Flexibility: The same application code can use different providers in development, staging, and production environments without modification.
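
As a minimal sketch of environment-based selection (the RAAF_MODEL variable name is an assumption for illustration, not a built-in RAAF setting):

# Deploy-time provider selection: each environment sets its own model, e.g.
#   RAAF_MODEL=claude-3-5-sonnet-20241022 in staging
#   RAAF_MODEL=gpt-4o in production
agent = RAAF::Agent.new(
  name: "Assistant",
  instructions: "You are helpful",
  model: ENV.fetch("RAAF_MODEL", "gpt-4o-mini")  # safe default
)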

The universal interface means you can switch between providers by simply changing the model name. This approach protects your application from vendor lock-in while allowing you to take advantage of the best model for each specific task. The framework handles all the provider-specific API differences behind the scenes.

2. OpenAI Provider

2.1. Why OpenAI Remains the Default Choice (For Now)

Despite the provider diversity story above, let's be honest: OpenAI is still where most people start. Why?

  1. Documentation: Their docs actually make sense
  2. Reliability: 99.9% uptime beats everyone else
  3. Tool Support: Function calling just works
  4. Ecosystem: Every tutorial uses OpenAI

But here's the catch—OpenAI knows this. They've raised prices twice in the last year. Without RAAF's abstraction, you're at their mercy.

2.2. Configuration


require 'raaf'

# Environment variable (recommended)
ENV['OPENAI_API_KEY'] = 'your-openai-key'

# Or explicit configuration
provider = RAAF::Models::ResponsesProvider.new(
  api_key: 'your-openai-key',
  base_url: 'https://api.openai.com/v1',  # Custom endpoint
  timeout: 30,
  max_retries: 3
)

agent = RAAF::Agent.new(
  name: "Assistant",
  instructions: "You are helpful",
  model: "gpt-4o",
  provider: provider
)

The first approach using environment variables is recommended for production systems because it keeps sensitive credentials out of your code. The RAAF framework automatically looks for the OPENAI_API_KEY environment variable and uses it when creating OpenAI connections.

The explicit configuration provides more control over connection parameters. The base_url option allows you to route requests through custom endpoints, which is useful for enterprise setups with API gateways or when using OpenAI-compatible services. The timeout setting determines how long to wait for API responses, while max_retries controls automatic retry behavior for failed requests. These parameters help tune the provider for your specific reliability and performance requirements.

2.3. Available Models

# GPT-4o models (newest, multimodal)
agent = RAAF::Agent.new(model: "gpt-4o")           # Best overall
agent = RAAF::Agent.new(model: "gpt-4o-mini")      # Fast and cost-effective

# GPT-4 models (legacy)
agent = RAAF::Agent.new(model: "gpt-4")            # High quality
agent = RAAF::Agent.new(model: "gpt-4-turbo")      # Faster GPT-4

# GPT-3.5 models (cost-effective)
agent = RAAF::Agent.new(model: "gpt-3.5-turbo")    # Balanced

The model selection represents a trade-off between capability, speed, and cost. GPT-4o provides the best reasoning and multimodal capabilities but comes with higher token costs. GPT-4o-mini offers a sweet spot for many applications—it's significantly faster and cheaper than GPT-4o while maintaining good quality for most tasks.

The legacy GPT-4 models are still available for applications that specifically need their characteristics, but GPT-4o generally provides better performance. GPT-3.5-turbo remains the most cost-effective option for simple conversational tasks where advanced reasoning isn't required. The choice depends on your specific performance requirements and budget constraints.

2.4. OpenAI-Specific Features

# Structured outputs (native support)
schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    age: { type: 'integer' }
  }
}

agent = RAAF::Agent.new(
  model: "gpt-4o",
  response_format: schema,
  strict: true  # Enable strict schema validation
)

# Function calling with parallel execution
agent = RAAF::Agent.new(
  model: "gpt-4o",
  parallel_tool_calls: true,
  tool_choice: "auto"  # "none", "auto", or specific tool
)

# Vision capabilities
agent = RAAF::Agent.new(
  model: "gpt-4o",
  multimodal: true
)

# Add image tool
agent.add_tool(lambda do |image_url:|
  # Process image
  { description: "Image analysis result" }
end)

The structured outputs feature ensures that OpenAI responses conform to your specified JSON schema. The strict: true option enables rigorous validation, guaranteeing that the response matches your schema exactly. This is particularly valuable for applications that need to parse AI responses programmatically, eliminating the need for error-prone response parsing and validation logic.

Parallel tool calls allow the model to execute multiple tools simultaneously when appropriate, significantly improving performance for complex tasks. The tool_choice parameter gives you control over when tools are used: "none" disables tools completely, "auto" lets the model decide, and you can specify a particular tool name to force its use.

The vision capabilities unlock multimodal AI by enabling the agent to process images alongside text. When you add image processing tools, the agent can analyze visual content and incorporate those insights into its responses, enabling applications like document analysis, visual quality assurance, or image-based customer support.

2.5. Cost Optimization

# Route by complexity
def choose_openai_model(complexity_score)
  case complexity_score
  when 0..3
    "gpt-4o-mini"      # $0.15/$0.60 per 1M tokens
  when 4..7  
    "gpt-4o"           # $5/$15 per 1M tokens
  else
    "gpt-4"            # $30/$60 per 1M tokens
  end
end

task_complexity = 6  # e.g., from your own scoring heuristic

agent = RAAF::Agent.new(
  model: choose_openai_model(task_complexity),
  instructions: "You are helpful"
)

This dynamic model selection approach demonstrates intelligent cost optimization. The function evaluates task complexity and routes to the most cost-effective model that can handle the requirements. Simple tasks like basic Q&A or formatting use GPT-4o-mini, which costs significantly less while providing good quality.

More complex tasks requiring advanced reasoning or creative capabilities route to GPT-4o, which provides better performance at moderate cost. Only the most demanding tasks that require the highest reasoning capabilities use GPT-4. This strategy can reduce your AI costs by 70-90% compared to using the most expensive model for all tasks, while maintaining quality where it matters most.
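
To make that concrete, a rough back-of-the-envelope comparison using the list prices above (a sketch; real workloads mix input and output tokens differently):

# Hypothetical month: 100M input + 20M output tokens, prices in $ per 1M tokens
all_gpt4o = 100 * 5.00 + 20 * 15.00            # gpt-4o for everything
routed    = (80 * 0.15 + 16 * 0.60) +          # 80% of traffic on gpt-4o-mini
            (20 * 5.00 + 4 * 15.00)            # 20% on gpt-4o
puts format("gpt-4o only: $%.2f, routed: $%.2f", all_gpt4o, routed)
# => gpt-4o only: $800.00, routed: $181.60 (about a 77% reduction)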

3. Anthropic Provider

3.1. Why Claude Is OpenAI's Biggest Threat

Remember that story about being locked into OpenAI? Here's the plot twist: when we finally broke free, we discovered Claude was better for our use case all along.

Claude's advantages became clear:

  • Thoughtful responses: Less likely to hallucinate
  • Better at following complex instructions: Understands nuance
  • Massive context window: 200K tokens vs OpenAI's 128K
  • More humble: Says "I don't know" instead of making things up

But Claude has quirks:

  • Slower: Quality over speed
  • More expensive per token: But often needs fewer tokens
  • Different prompt style: What works for GPT might not work for Claude

This is exactly why provider abstraction matters. You can leverage Claude's strengths without rewriting your application.

3.2. Configuration

# Environment variable
ENV['ANTHROPIC_API_KEY'] = 'your-anthropic-key'

# Explicit configuration
provider = RAAF::Models::AnthropicProvider.new(
  api_key: 'your-anthropic-key',
  base_url: 'https://api.anthropic.com',
  timeout: 60,  # Anthropic may be slower
  max_retries: 3
)

agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022",
  provider: provider
)

The Anthropic provider configuration follows the same pattern as OpenAI but with provider-specific optimizations. The longer timeout reflects Anthropic's focus on thoughtful, well-reasoned responses rather than speed. Claude models often take more time to process complex requests because they're designed to think through problems more thoroughly.

The explicit configuration allows you to customize the base URL for enterprise setups or regional deployments. The retry mechanism is particularly important with Anthropic because their API can occasionally have longer processing times for complex requests, and retries help ensure reliability in production environments.

3.3. Available Models

# Claude 3.5 (newest generation)
agent = RAAF::Agent.new(model: "claude-3-5-sonnet-20241022")  # Best overall
agent = RAAF::Agent.new(model: "claude-3-5-haiku-20241022")   # Fast and affordable

# Claude 3 (previous generation)
agent = RAAF::Agent.new(model: "claude-3-opus-20240229")      # Most capable
agent = RAAF::Agent.new(model: "claude-3-sonnet-20240229")    # Balanced
agent = RAAF::Agent.new(model: "claude-3-haiku-20240307")     # Fast and cheap

3.4. Anthropic-Specific Features


# Large context windows
agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 8192,  # Up to 200K context window
  temperature: 0.7
)

# Tool use (function calling)
agent_tools = []  # your tool definitions (lambdas or tool objects)

agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022",
  tools: agent_tools,
  tool_choice: { type: "auto" }
)

# System prompts (Anthropic excels at following system instructions)
agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022",
  instructions: <<~HEREDOC
    You are Claude, an AI assistant created by Anthropic.
    You are helpful, harmless, and honest.
    Think step by step and be very careful about accuracy.
  HEREDOC
)

3.5. Vision Capabilities

# Claude 3 models support vision
agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022",
  multimodal: true
)

agent.add_tool(lambda do |image_data:, question:|
  # Claude can analyze images; `provider` is the AnthropicProvider
  # configured in section 3.2
  response = provider.complete(
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image', source: { type: 'base64', data: image_data } }
        ]
      }
    ]
  )

  response['content'][0]['text']
end)

4. Groq Provider

4.1. Groq for High-Speed Inference

Groq specializes in high-speed inference for specific models. Performance comparison for simple queries:

  • OpenAI GPT-4o-mini: 2-3 seconds
  • Groq Mixtral: 200-300 milliseconds

This performance difference makes Groq suitable for applications requiring low latency responses.

However, there are trade-offs to consider:

  • Limited model selection: You can't run GPT or Claude on Groq
  • Occasional capacity issues: Fast when available, but sometimes not available
  • Different prompt optimization: What works on GPT might need tweaking

Groq is perfect for:

  • High-volume, simple queries
  • Real-time interactions
  • Cost-sensitive applications (fast = cheaper)

Groq struggles with:

  • Complex reasoning tasks
  • Long context windows
  • Bleeding-edge model features

4.2. Configuration

# Environment variable
ENV['GROQ_API_KEY'] = 'your-groq-key'

# High-speed inference configuration
provider = RAAF::Models::GroqProvider.new(
  api_key: 'your-groq-key',
  timeout: 10,  # Groq is very fast
  max_retries: 2
)

agent = RAAF::Agent.new(
  model: "mixtral-8x7b-32768",
  provider: provider
)

The Groq provider configuration is optimized for speed. Groq's specialized hardware enables extremely fast inference, often delivering responses in under a second. The shorter timeout reflects this speed advantage—if a Groq request takes longer than 10 seconds, something is likely wrong.

The reduced retry count acknowledges that Groq requests either succeed quickly or fail fast. This configuration is ideal for applications that need real-time responses, such as interactive chat interfaces, live customer support, or any scenario where user-perceived latency is critical. Groq's speed advantage makes it particularly suitable for high-throughput applications where you need to process many requests quickly.

4.3. Available Models

# Mixtral models (excellent performance)
agent = RAAF::Agent.new(model: "mixtral-8x7b-32768")      # Balanced, 32K context
agent = RAAF::Agent.new(model: "mixtral-8x22b-32768")     # Larger, more capable

# Llama models
agent = RAAF::Agent.new(model: "llama3-70b-8192")         # Very capable
agent = RAAF::Agent.new(model: "llama3-8b-8192")          # Fast and efficient

# Gemma models  
agent = RAAF::Agent.new(model: "gemma-7b-it")             # Google's model

4.4. Speed Optimization

# Groq excels at high-throughput, low-latency inference
class HighSpeedAgent
  def initialize
    @groq_provider = RAAF::Models::GroqProvider.new(
      api_key: ENV['GROQ_API_KEY'],
      timeout: 5,  # Very short timeout for speed
      connection_pool_size: 20  # High concurrency
    )
  end

  def create_fast_agent(task_type)
    model = case task_type
    when :simple_qa
      "llama3-8b-8192"      # Fastest for simple tasks
    when :complex_reasoning
      "mixtral-8x7b-32768"  # Better reasoning
    when :coding
      "mixtral-8x22b-32768" # Best for code
    end

    RAAF::Agent.new(
      model: model,
      provider: @groq_provider,
      temperature: 0.1,  # Lower for consistency
      max_tokens: 1000   # Shorter for speed
    )
  end
end

This high-speed agent class demonstrates how to optimize for Groq's strengths. The connection pool size of 20 enables high concurrency, allowing your application to handle many simultaneous requests efficiently. The very short timeout forces quick failures rather than hanging requests, which is crucial for maintaining responsiveness in high-throughput scenarios.

The task-specific model selection balances speed and capability. Simple Q&A tasks use the fastest model (llama3-8b), while more complex reasoning tasks step up to mixtral-8x7b. The temperature setting of 0.1 provides more consistent outputs, which is often preferred for production applications where reliability matters more than creativity. The reduced max_tokens limit ensures faster responses by preventing overly long outputs that would slow down the interaction.
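
A short usage sketch:

fast_agent = HighSpeedAgent.new.create_fast_agent(:simple_qa)
runner = RAAF::Runner.new(agent: fast_agent)
result = runner.run("What's the capital of France?")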

5. Together AI Provider

5.1. Configuration

# Together.ai is supported through LiteLLM
agent = RAAF::Agent.new(
  model: "together_ai/meta-llama/Llama-2-70b-chat-hf"
)

# Or configure directly
ENV['TOGETHER_API_KEY'] = 'your-together-key'
agent = RAAF::Agent.new(
  model: "meta-llama/Llama-2-70b-chat-hf",
  provider_config: {
    api_base: 'https://api.together.xyz/v1'
  }
)

5.2. Available Models

# Meta Llama models
agent = RAAF::Agent.new(model: "meta-llama/Llama-2-70b-chat-hf")
agent = RAAF::Agent.new(model: "meta-llama/Llama-2-13b-chat-hf")

# Mistral models
agent = RAAF::Agent.new(model: "mistralai/Mixtral-8x7B-Instruct-v0.1")
agent = RAAF::Agent.new(model: "mistralai/Mistral-7B-Instruct-v0.1")

# Code-specific models
agent = RAAF::Agent.new(model: "codellama/CodeLlama-34b-Instruct-hf")
agent = RAAF::Agent.new(model: "WizardLM/WizardCoder-Python-34B-V1.0")

6. Cohere Provider

6.1. Why Cohere Excels at Enterprise Text Processing

Cohere specializes in enterprise-grade language processing with strong performance in:

  • Classification tasks: Superior text categorization and sentiment analysis
  • Retrieval augmented generation: Excellent at working with document search
  • Multilingual support: Strong performance across many languages
  • Enterprise features: Built-in safety and bias mitigation

Perfect for:

  • Content moderation and classification
  • Document analysis and summarization
  • Customer support automation
  • Multilingual applications

6.2. Configuration

# Cohere is supported through standard environment configuration
ENV['COHERE_API_KEY'] = 'your-cohere-key'

agent = RAAF::Agent.new(
  model: "command-r-plus"
)

6.3. Available Models

# Command models (latest generation)
agent = RAAF::Agent.new(model: "command-r-plus")    # Most capable
agent = RAAF::Agent.new(model: "command-r")         # Balanced performance
agent = RAAF::Agent.new(model: "command-light")     # Fast and efficient

# Specialized models
agent = RAAF::Agent.new(model: "command-nightly")   # Latest experimental features

6.4. Cohere-Specific Features

# Text classification
agent = RAAF::Agent.new(
  model: "command-r",
  task_type: "classify",
  categories: ["positive", "negative", "neutral"]
)

# Retrieval augmented generation
agent = RAAF::Agent.new(
  model: "command-r-plus",
  retrieval_enabled: true,
  max_tokens: 4096
)

7. LiteLLM Universal Provider

7.1. Configuration


# LiteLLM provides access to 100+ providers
require 'raaf'

provider = RAAF::Models::LiteLLMProvider.new(
  # Automatically routes based on model name
)

# Use any model from any provider
agent = RAAF::Agent.new(
  model: "azure/gpt-4o",      # Azure OpenAI
  provider: provider
)

agent = RAAF::Agent.new(
  model: "bedrock/claude-3",  # AWS Bedrock
  provider: provider
)

agent = RAAF::Agent.new(
  model: "vertex_ai/gemini-pro", # Google Vertex AI
  provider: provider
)

7.2. Provider Routing

class UniversalAI
  def initialize
    @litellm = RAAF::Models::LiteLLMProvider.new
  end

  def create_agent_for_provider(provider_name, task_type)
    model = case [provider_name, task_type]
    when ['azure', 'reasoning']
      "azure/gpt-4o"
    when ['aws', 'conversation']
      "bedrock/claude-3-sonnet"
    when ['google', 'analysis']
      "vertex_ai/gemini-pro"
    when ['huggingface', 'coding']
      "huggingface/codellama/CodeLlama-34b-Instruct-hf"
    else
      "gpt-4o-mini"  # Default fallback
    end

    RAAF::Agent.new(
      model: model,
      provider: @litellm,
      instructions: "You are helpful"
    )
  end
end
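
Usage is a single call; the returned agent is already wired to the appropriate provider and model:

ai = UniversalAI.new
agent = ai.create_agent_for_provider('aws', 'conversation')
# Routed to "bedrock/claude-3-sonnet" through LiteLLM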

8. Multi-Provider Strategies

8.1. The Art of Provider Orchestration: Real Money, Real Results

Last quarter, we reduced our AI costs by 73% without sacrificing quality. Here's how:

The Realization: Not all AI tasks need GPT-4. It's like using a Ferrari for grocery runs.

Our analysis revealed:

  • 60% of requests were simple data lookups (perfect for GPT-4o-mini)
  • 25% needed reasoning but not creativity (Claude 3 Haiku)
  • 10% were complex analytical tasks (GPT-4o or Claude Sonnet)
  • 5% were truly challenging problems (Claude Opus)

This analysis informed our routing strategy:

8.2. Cost Optimization

class CostOptimizedRouting
  def initialize
    @providers = {
      openai: RAAF::Models::ResponsesProvider.new,
      anthropic: RAAF::Models::AnthropicProvider.new,
      groq: RAAF::Models::GroqProvider.new,
      litellm: RAAF::Models::LiteLLMProvider.new
    }

    # Cost in dollars per 1M tokens (input/output); illustrative list prices
    @costs = {
      'gpt-4o' => [5.00, 15.00],
      'claude-3-sonnet' => [3.00, 15.00],
      'gpt-4o-mini' => [0.15, 0.60],
      'claude-3-haiku' => [0.25, 1.25],
      'mixtral-8x7b-32768' => [0.27, 0.27],  # Groq
      'command-light' => [0.15, 0.15]  # Cohere
    }
  end

  def route_by_cost(task_complexity, budget_cents_per_1k_tokens)
    suitable_models = @costs.select do |_model, (input_cost, output_cost)|
      # Dollars per 1M tokens -> cents per 1K tokens: divide by 1000, multiply by 100
      avg_cost_cents = (input_cost + output_cost) / 2.0 / 1000 * 100
      avg_cost_cents <= budget_cents_per_1k_tokens
    end

    # Choose the most capable model within budget
    if task_complexity > 8 && suitable_models.key?('gpt-4o')
      ['gpt-4o', @providers[:openai]]
    elsif task_complexity > 5 && suitable_models.key?('claude-3-sonnet')
      ['claude-3-sonnet', @providers[:anthropic]]
    elsif suitable_models.key?('mixtral-8x7b-32768')
      ['mixtral-8x7b-32768', @providers[:groq]]
    else
      ['command-light', @providers[:litellm]]  # Cost-effective fallback
    end
  end
end

This cost-optimized routing system demonstrates intelligent budget management across multiple providers. The system tracks cost per million tokens for both input and output, enabling precise budget calculations. The routing logic balances task complexity with cost constraints, automatically selecting the most capable model that fits within your budget.

The fallback cascade ensures that you always have a viable option. If high-complexity tasks exceed your budget for premium models, the system steps down to more affordable alternatives. The LiteLLM option provides access to cost-effective providers and models through a universal interface, ensuring your application can continue functioning even when primary providers are unavailable or over budget. This approach can reduce costs by 60-80% while maintaining appropriate quality levels for each task type.
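
In use, the router returns a model/provider pair ready to hand to an agent:

router = CostOptimizedRouting.new
model, provider = router.route_by_cost(6, 0.5)  # complexity 6, budget 0.5 cents per 1K tokens
# => ["mixtral-8x7b-32768", <GroqProvider>] -- the cheapest capable option under this budget

agent = RAAF::Agent.new(model: model, provider: provider, instructions: "You are helpful")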

8.3. Load Balancing

class LoadBalancedProvider
  def initialize
    @providers = [
      RAAF::Models::ResponsesProvider.new,
      RAAF::Models::AnthropicProvider.new,
      RAAF::Models::GroqProvider.new
    ]
    @current_index = 0
    @mutex = Mutex.new
  end

  def create_agent(model_type)
    provider = next_provider
    model = select_model_for_provider(provider, model_type)

    RAAF::Agent.new(
      model: model,
      provider: provider,
      instructions: "You are helpful"
    )
  end

  private

  def next_provider
    @mutex.synchronize do
      provider = @providers[@current_index]
      @current_index = (@current_index + 1) % @providers.length
      provider
    end
  end

  def select_model_for_provider(provider, model_type)
    # Demodulize the class name: "RAAF::Models::GroqProvider" -> "GroqProvider"
    case [provider.class.name.split('::').last, model_type]
    when ['ResponsesProvider', :fast]
      'gpt-4o-mini'
    when ['ResponsesProvider', :quality]
      'gpt-4o'
    when ['AnthropicProvider', :fast]
      'claude-3-haiku'
    when ['AnthropicProvider', :quality]
      'claude-3-5-sonnet'
    when ['GroqProvider', :fast]
      'llama3-8b-8192'
    when ['GroqProvider', :quality]
      'mixtral-8x7b-32768'
    end
  end
end

8.4. Failover and Redundancy

class ResilientProvider
  def initialize
    @primary_provider = RAAF::Models::ResponsesProvider.new
    @secondary_provider = RAAF::Models::AnthropicProvider.new
    @fallback_provider = RAAF::Models::LiteLLMProvider.new

    @circuit_breakers = {}
  end

  def create_resilient_agent(model_preference)
    providers_to_try = [
      [@primary_provider, select_primary_model(model_preference)],
      [@secondary_provider, select_secondary_model(model_preference)],
      [@fallback_provider, select_fallback_model(model_preference)]
    ]

    providers_to_try.each do |provider, model|
      next if circuit_breaker_open?(provider)

      begin
        agent = RAAF::Agent.new(
          model: model,
          provider: provider,
          instructions: "You are helpful"
        )

        # Test the agent
        test_result = test_agent(agent)
        if test_result[:success]
          reset_circuit_breaker(provider)
          return agent
        end

      rescue => e
        record_failure(provider, e)
        open_circuit_breaker(provider) if should_open_circuit?(provider)
        next
      end
    end

    raise "All providers failed"
  end

  private

  # Minimal illustrative implementations of the helpers used above

  def select_primary_model(preference)
    preference == :quality ? 'gpt-4o' : 'gpt-4o-mini'
  end

  def select_secondary_model(preference)
    preference == :quality ? 'claude-3-5-sonnet-20241022' : 'claude-3-haiku-20240307'
  end

  def select_fallback_model(_preference)
    'gpt-4o-mini'  # routed through LiteLLM
  end

  def circuit_breaker_open?(provider)
    breaker = @circuit_breakers[provider.class.name]
    breaker && breaker[:open] && (Time.now - breaker[:opened_at]) < 300  # 5-minute cooldown
  end

  def record_failure(provider, error)
    breaker = (@circuit_breakers[provider.class.name] ||= { failures: 0, open: false })
    breaker[:failures] += 1
    breaker[:last_error] = error
  end

  def should_open_circuit?(provider)
    @circuit_breakers.dig(provider.class.name, :failures).to_i >= 3
  end

  def open_circuit_breaker(provider)
    breaker = @circuit_breakers[provider.class.name]
    breaker[:open] = true
    breaker[:opened_at] = Time.now
  end

  def reset_circuit_breaker(provider)
    @circuit_breakers.delete(provider.class.name)
  end

  def test_agent(agent)
    runner = RAAF::Runner.new(agent: agent)
    result = runner.run("Say 'OK'")
    { success: result.success? }
  rescue
    { success: false }
  end
end

9. Performance Optimization

9.1. Connection Pooling

# Requires the connection_pool gem
require 'connection_pool'

class PooledProviderManager
  def initialize
    @openai_pool = ConnectionPool.new(size: 10, timeout: 5) do
      RAAF::Models::ResponsesProvider.new(
        api_key: ENV['OPENAI_API_KEY'],
        timeout: 30
      )
    end

    @anthropic_pool = ConnectionPool.new(size: 5, timeout: 5) do
      RAAF::Models::AnthropicProvider.new(
        api_key: ENV['ANTHROPIC_API_KEY'],
        timeout: 60
      )
    end
  end

  def with_openai_provider(&block)
    @openai_pool.with(&block)
  end

  def with_anthropic_provider(&block)
    @anthropic_pool.with(&block)
  end

  # Note: the pooled provider is checked back in when the block exits, so
  # run the agent inside the block rather than holding it for later use
  def create_pooled_agent(provider_type, model)
    case provider_type
    when :openai
      with_openai_provider do |provider|
        RAAF::Agent.new(model: model, provider: provider)
      end
    when :anthropic
      with_anthropic_provider do |provider|
        RAAF::Agent.new(model: model, provider: provider)
      end
    end
  end
end

9.2. Caching


# Requires the activesupport gem (plus digest and json from the stdlib)
require 'active_support/all'
require 'digest'
require 'json'

class CachedProviderResponses
  def initialize(provider)
    @provider = provider
    @cache = ActiveSupport::Cache::MemoryStore.new(size: 100.megabytes)
  end

  def complete(messages, options = {})
    # Create cache key from messages and options
    cache_key = generate_cache_key(messages, options)

    # Try cache first
    cached_response = @cache.read(cache_key)
    return cached_response if cached_response

    # Call actual provider
    response = @provider.complete(messages, options)

    # Cache successful responses
    if response[:success]
      @cache.write(cache_key, response, expires_in: 1.hour)
    end

    response
  end

  private

  def generate_cache_key(messages, options)
    content = messages.map { |m| m[:content] }.join('|')
    options_str = options.to_json
    Digest::MD5.hexdigest("#{content}|#{options_str}")
  end
end

# Usage
cached_openai = CachedProviderResponses.new(
  RAAF::Models::ResponsesProvider.new
)

agent = RAAF::Agent.new(
  model: "gpt-4o",
  provider: cached_openai
)

10. Provider-Specific Best Practices

10.1. Hard-Won Lessons from Production

After running millions of requests across providers, these patterns consistently deliver results:

10.1.1. The Golden Rule: Test Everything

Same prompt, different providers, wildly different results:

  • GPT-4: "I'll analyze this step by step..."
  • Claude: "I need more context about..."
  • Llama: "ANALYSIS: STEP 1:..."

Your prompts need provider-specific tuning. What works for one rarely works perfectly for another.
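
One pragmatic pattern is to keep per-provider prompt variants next to a shared default (a sketch; the variants and detection rules are illustrative):

# Per-provider prompt variants with a shared default (illustrative)
PROMPTS = {
  default:   "Analyze the text and summarize the key points.",
  anthropic: "Think step by step, then summarize the key points. " \
             "Say 'I don't know' when unsure.",
  groq:      "SUMMARIZE the key points as a short bullet list."
}.freeze

def instructions_for(model)
  key = if model.start_with?("claude")              then :anthropic
        elsif model.start_with?("mixtral", "llama") then :groq
        else :default
        end
  PROMPTS[key]
end

model = "claude-3-5-sonnet-20241022"
agent = RAAF::Agent.new(model: model, instructions: instructions_for(model))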

10.2. OpenAI Optimization

# Use structured outputs for consistent data
agent = RAAF::Agent.new(
  model: "gpt-4o",
  response_format: your_schema,
  strict: true  # Enables strict mode
)

# Batch requests for efficiency
class OpenAIBatcher
  def initialize
    @provider = RAAF::Models::ResponsesProvider.new
    @batch_requests = []
  end

  def add_request(messages, options = {})
    @batch_requests << { messages: messages, options: options }
  end

  def execute_batch
    # Assumes the provider exposes a batch endpoint; OpenAI's Batch API
    # trades higher latency for roughly 50% lower cost
    @provider.batch_complete(@batch_requests)
  end
end

10.3. Anthropic Optimization

# Use system prompts effectively
agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022",
  instructions: <<~HEREDOC
    You are an expert assistant.

    Guidelines:

    - Think step by step
    - Be precise and accurate
    - Ask clarifying questions when needed
  HEREDOC
)

# Leverage large context windows
agent = RAAF::Agent.new(
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 8192,  # Can handle up to 200K input tokens
  instructions: "Analyze this large document..."
)

10.4. Groq Optimization

# Optimize for speed
agent = RAAF::Agent.new(
  model: "mixtral-8x7b-32768",
  temperature: 0.1,  # Lower for consistency
  max_tokens: 1000,  # Shorter responses for speed
  top_p: 0.9
)

# Use for high-throughput scenarios (requires the concurrent-ruby gem)
require 'concurrent'

class HighThroughputGroq
  def initialize
    @provider = RAAF::Models::GroqProvider.new(
      timeout: 5,  # Short timeout
      max_retries: 1  # Fail fast
    )
  end

  def process_batch(tasks)
    futures = tasks.map do |task|
      Concurrent::Future.execute do
        agent = RAAF::Agent.new(
          model: "llama3-8b-8192",
          provider: @provider
        )

        runner = RAAF::Runner.new(agent: agent)
        runner.run(task)
      end
    end

    futures.map(&:value)
  end
end

11. Testing Multi-Provider Systems

11.1. Provider Testing


RSpec.describe 'Multi-Provider Agent System' do
  let(:providers) do
    {
      openai: RAAF::Models::ResponsesProvider.new,
      anthropic: RAAF::Models::AnthropicProvider.new,
      groq: RAAF::Models::GroqProvider.new
    }
  end

  it 'works with all configured providers' do
    providers.each do |name, provider|
      agent = RAAF::Agent.new(
        model: select_test_model(name),
        provider: provider,
        instructions: "Say 'Hello from #{name}'"
      )

      runner = RAAF::Runner.new(agent: agent)
      result = runner.run("Please respond")

      expect(result.success?).to be true
      expect(result.messages.last[:content]).to include(name.to_s)
    end
  end

  def select_test_model(provider_name)
    case provider_name
    when :openai
      'gpt-4o-mini'
    when :anthropic
      'claude-3-haiku'
    when :groq
      'llama3-8b-8192'
    end
  end
end

11.2. Mock Providers for Testing


class MockProvider
  def initialize(responses = {})
    @responses = responses
    @call_count = 0
  end

  def complete(messages, options = {})
    @call_count += 1

    # Return predefined response or default
    response_content = @responses[@call_count] || "Mock response"

    {
      success: true,
      choices: [
        {
          message: {
            role: 'assistant',
            content: response_content
          }
        }
      ],
      usage: {
        prompt_tokens: 10,
        completion_tokens: 5,
        total_tokens: 15
      }
    }
  end
end

# In tests
RSpec.describe 'Agent Behavior' do
  let(:mock_provider) do
    MockProvider.new(
      1 => "First response",
      2 => "Second response"
    )
  end

  let(:agent) do
    RAAF::Agent.new(
      model: "mock-model",
      provider: mock_provider
    )
  end

  it 'uses mock responses' do
    runner = RAAF::Runner.new(agent: agent)

    result1 = runner.run("First message")
    expect(result1.messages.last[:content]).to eq("First response")

    result2 = runner.run("Second message")
    expect(result2.messages.last[:content]).to eq("Second response")
  end
end

12. Next Steps

Now that you understand RAAF providers, continue with the other RAAF guides at guides.raaf-ai.dev to put multi-provider agents to work in your application.


