LM Studio Multi-Server Configuration

This directory contains configuration files for running multiple LM Studio servers simultaneously with the Felix Framework.

Configuration Files

multi_model_config.json (RECOMMENDED)

Multi-model configuration for single LM Studio server:

  • Research Model: qwen/qwen3-4b-2507 - Fast exploration
  • Analysis Model: qwen/qwen3-4b-thinking-2507 - Reasoning focused
  • Synthesis Model: google/gemma-3-12b - High-quality output
  • All models use the same server: http://127.0.0.1:1234/v1
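
A multi-model config along these lines might look like the following. This is an illustrative sketch only; the key names (server_url, models, the per-role keys) are assumptions, so check the actual config/multi_model_config.json for the real schema:

```json
{
  "server_url": "http://127.0.0.1:1234/v1",
  "models": {
    "research": "qwen/qwen3-4b-2507",
    "analysis": "qwen/qwen3-4b-thinking-2507",
    "synthesis": "google/gemma-3-12b"
  }
}
```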

server_config.json

Multi-server configuration supporting up to 4 different LM Studio servers:

  • Creative Server (port 1234): Fast model for research agents
  • Analytical Server (port 1235): Balanced model for analysis/critic agents
  • Synthesis Server (port 1236): High-quality model for synthesis agents
  • Fallback Server (port 1237): Fast backup model (disabled by default)

single_server_config.json

Single-server configuration for comparison testing.

Setting Up Multiple LM Studio Servers

  1. Start LM Studio instances on different ports:

    # Terminal 1: Creative server
    lms server start --port 1234 --model mistral-7b-instruct
    
    # Terminal 2: Analytical server  
    lms server start --port 1235 --model llama-3.1-8b-instruct
    
    # Terminal 3: Synthesis server
    lms server start --port 1236 --model mixtral-8x7b-instruct
    
  2. Or use the LM Studio GUI:

    • Launch multiple LM Studio instances
    • Set different ports in Settings → Developer → Port
    • Load different models in each instance
    • Start servers

Agent Type Mappings

  • Research agents → Creative server (Mistral) for broad exploration
  • Analysis agents → Analytical server (Llama) for focused analysis
  • Synthesis agents → Synthesis server (Mixtral) for high-quality output
  • Critic agents → Analytical server (Llama) for reasoning/validation
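
The mapping above amounts to a simple lookup table. A minimal sketch, assuming server names like those in server_config.json (the real routing logic in the framework may differ):

```python
# Illustrative agent-type -> server routing table; the actual
# mapping is defined in config/server_config.json.
AGENT_SERVER_MAP = {
    "research": "creative",    # port 1234: broad exploration
    "analysis": "analytical",  # port 1235: focused analysis
    "synthesis": "synthesis",  # port 1236: high-quality output
    "critic": "analytical",    # port 1235: reasoning/validation
}

def server_for(agent_type: str) -> str:
    """Return the server name for an agent type, defaulting to 'creative'."""
    return AGENT_SERVER_MAP.get(agent_type, "creative")
```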

Usage

With Multi-Model Config (Recommended):

python examples/blog_writer.py "AI safety" --server-config config/multi_model_config.json --debug

Test Multi-Model Setup:

python examples/test_multi_model.py

With Multi-Server Config:

python examples/blog_writer.py "AI safety" --server-config config/server_config.json --debug

With Single-Server Config:

python examples/blog_writer.py "AI safety" --server-config config/single_server_config.json --debug

Performance Comparison:

python examples/test_multi_server_performance.py

Configuration Options

Server Settings:

  • name: Unique server identifier
  • url: LM Studio server URL
  • model: Model name to use
  • timeout: Request timeout in seconds
  • max_concurrent: Maximum concurrent requests
  • weight: Load balancing weight
  • enabled: Whether server is active
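
Putting those settings together, one server entry might look like this (values are illustrative, not the shipped defaults):

```json
{
  "name": "analytical",
  "url": "http://127.0.0.1:1235/v1",
  "model": "qwen/qwen3-4b-thinking-2507",
  "timeout": 60,
  "max_concurrent": 4,
  "weight": 1.0,
  "enabled": true
}
```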

Load Balancing Strategies:

  • agent_type_mapping: Use agent type mappings (recommended)
  • round_robin: Rotate between servers
  • least_busy: Use server with lowest load
  • fastest_response: Use server with best response time
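
Two of these strategies can be sketched in a few lines. This is a conceptual illustration, not the framework's actual implementation (the ServerPool class and its fields are hypothetical):

```python
import itertools

class ServerPool:
    """Minimal sketch of round_robin and least_busy server selection."""

    def __init__(self, servers):
        self.servers = list(servers)            # server names
        self.load = {s: 0 for s in servers}     # in-flight request counts
        self._cycle = itertools.cycle(self.servers)

    def round_robin(self):
        # Rotate between servers in a fixed order.
        return next(self._cycle)

    def least_busy(self):
        # Pick the server with the fewest in-flight requests.
        return min(self.servers, key=self.load.__getitem__)
```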

Health Monitoring

The system automatically:

  • Checks server health every 30 seconds
  • Fails over to available servers
  • Monitors response times and load
  • Displays server status in debug mode
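
A health check against an OpenAI-compatible server can be as simple as probing its /models endpoint. A minimal sketch, assuming the framework's health checker does something comparable:

```python
import urllib.request
import urllib.error

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Probe an OpenAI-compatible server by listing its models.

    base_url should include the /v1 prefix, e.g. http://127.0.0.1:1234/v1.
    Returns False on connection errors, timeouts, or non-2xx responses.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```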

Performance Benefits

Multi-server setup provides:

  • True parallelism: Agents process simultaneously
  • Model specialization: Each agent type uses optimal model
  • Load distribution: Spread across multiple GPUs/servers
  • Fault tolerance: Continue if one server fails
  • 3-4x speedup: achievable with proper server setup