LM Studio Multi-Server Configuration

This directory contains configuration files for running multiple LM Studio servers simultaneously with the Felix Framework.

Configuration Files

multi_model_config.json (RECOMMENDED)

Multi-model configuration for single LM Studio server:

  • Research Model: qwen/qwen3-4b-2507 - Fast exploration
  • Analysis Model: qwen/qwen3-4b-thinking-2507 - Reasoning focused
  • Synthesis Model: google/gemma-3-12b - High-quality output
  • All models use the same server: http://127.0.0.1:1234/v1
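
A multi-model config along these lines might look like the following. This is an illustrative sketch only; the key names (server_url, models, the per-role keys) are assumptions, so check the actual config/multi_model_config.json for the real schema:

```json
{
  "server_url": "http://127.0.0.1:1234/v1",
  "models": {
    "research": "qwen/qwen3-4b-2507",
    "analysis": "qwen/qwen3-4b-thinking-2507",
    "synthesis": "google/gemma-3-12b"
  }
}
```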

server_config.json

Multi-server configuration supporting up to 4 different LM Studio servers:

  • Creative Server (port 1234): Fast model for research agents
  • Analytical Server (port 1235): Balanced model for analysis/critic agents
  • Synthesis Server (port 1236): High-quality model for synthesis agents
  • Fallback Server (port 1237): Fast backup model (disabled by default)

single_server_config.json

Single-server configuration for comparison testing.

Setting Up Multiple LM Studio Servers

  1. Start LM Studio instances on different ports:

    # Terminal 1: Creative server
    lms server start --port 1234 --model mistral-7b-instruct
    
    # Terminal 2: Analytical server  
    lms server start --port 1235 --model llama-3.1-8b-instruct
    
    # Terminal 3: Synthesis server
    lms server start --port 1236 --model mixtral-8x7b-instruct
    
  2. Or use the LM Studio GUI:

    • Launch multiple LM Studio instances
    • Set different ports in Settings → Developer → Port
    • Load different models in each instance
    • Start servers

Agent Type Mappings

  • Research agents → Creative server (Mistral) for broad exploration
  • Analysis agents → Analytical server (Llama) for focused analysis
  • Synthesis agents → Synthesis server (Mixtral) for high-quality output
  • Critic agents → Analytical server (Llama) for reasoning/validation
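
The mapping above amounts to a simple lookup table. A minimal sketch, assuming server names like those in server_config.json (the real routing logic in the framework may differ):

```python
# Illustrative agent-type -> server routing table; the actual
# mapping is defined in config/server_config.json.
AGENT_SERVER_MAP = {
    "research": "creative",    # port 1234: broad exploration
    "analysis": "analytical",  # port 1235: focused analysis
    "synthesis": "synthesis",  # port 1236: high-quality output
    "critic": "analytical",    # port 1235: reasoning/validation
}

def server_for(agent_type: str) -> str:
    """Return the server name for an agent type, defaulting to 'creative'."""
    return AGENT_SERVER_MAP.get(agent_type, "creative")
```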

Usage

With Multi-Model Config (Recommended):

python examples/blog_writer.py "AI safety" --server-config config/multi_model_config.json --debug

Test Multi-Model Setup:

python examples/test_multi_model.py

With Multi-Server Config:

python examples/blog_writer.py "AI safety" --server-config config/server_config.json --debug

With Single-Server Config:

python examples/blog_writer.py "AI safety" --server-config config/single_server_config.json --debug

Performance Comparison:

python examples/test_multi_server_performance.py

Configuration Options

Server Settings:

  • name: Unique server identifier
  • url: LM Studio server URL
  • model: Model name to use
  • timeout: Request timeout in seconds
  • max_concurrent: Maximum concurrent requests
  • weight: Load balancing weight
  • enabled: Whether server is active
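
Putting those settings together, one server entry might look like this (values are illustrative, not the shipped defaults):

```json
{
  "name": "analytical",
  "url": "http://127.0.0.1:1235/v1",
  "model": "qwen/qwen3-4b-thinking-2507",
  "timeout": 60,
  "max_concurrent": 4,
  "weight": 1.0,
  "enabled": true
}
```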

Load Balancing Strategies:

  • agent_type_mapping: Use agent type mappings (recommended)
  • round_robin: Rotate between servers
  • least_busy: Use server with lowest load
  • fastest_response: Use server with best response time
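
Two of these strategies can be sketched in a few lines. This is a conceptual illustration, not the framework's actual implementation (the ServerPool class and its fields are hypothetical):

```python
import itertools

class ServerPool:
    """Minimal sketch of round_robin and least_busy server selection."""

    def __init__(self, servers):
        self.servers = list(servers)            # server names
        self.load = {s: 0 for s in servers}     # in-flight request counts
        self._cycle = itertools.cycle(self.servers)

    def round_robin(self):
        # Rotate between servers in a fixed order.
        return next(self._cycle)

    def least_busy(self):
        # Pick the server with the fewest in-flight requests.
        return min(self.servers, key=self.load.__getitem__)
```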

Health Monitoring

The system automatically:

  • Checks server health every 30 seconds
  • Fails over to available servers
  • Monitors response times and load
  • Displays server status in debug mode
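
A health check against an OpenAI-compatible server can be as simple as probing its /models endpoint. A minimal sketch, assuming the framework's health checker does something comparable:

```python
import urllib.request
import urllib.error

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Probe an OpenAI-compatible server by listing its models.

    base_url should include the /v1 prefix, e.g. http://127.0.0.1:1234/v1.
    Returns False on connection errors, timeouts, or non-2xx responses.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```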

Performance Benefits

Multi-server setup provides:

  • True parallelism: Agents process simultaneously
  • Model specialization: Each agent type uses optimal model
  • Load distribution: Spread across multiple GPUs/servers
  • Fault tolerance: Continue if one server fails
  • 3-4x speedup: achievable with proper server setup