System Patterns
Architecture Overview
┌───────────────────────────────────────────┐
│ FastAPI App                               │
├───────────────────────────────────────────┤
│ Routes:                                   │
│  • GET / (Welcome)                        │
│  • POST /download (Model Download)        │
│  • POST /v1/chat/completions (Chat)       │
├───────────────────────────────────────────┤
│ Global State:                             │
│  • pipe (Pipeline)                        │
│  • tokenizer (Tokenizer)                  │
│  • model_name (Current Model)             │
├───────────────────────────────────────────┤
│ Startup Event:                            │
│  • Load .env                              │
│  • Initialize default model               │
└───────────────────────────────────────────┘
                      │
                      ▼
┌───────────────────────────────────────────┐
│ Utils Modules                             │
├───────────────────────────────────────────┤
│ utils/model.py:                           │
│  • check_model() - Verify model exists    │
│  • download_model() - Download model      │
│  • initialize_pipeline() - Setup model    │
│  • DownloadRequest - Pydantic model       │
├───────────────────────────────────────────┤
│ utils/chat_request.py:                    │
│  • ChatRequest - Request validation       │
├───────────────────────────────────────────┤
│ utils/chat_response.py:                   │
│  • create_chat_response() - Generate      │
│  • convert_json_format() - Parse output   │
│  • ChatResponse/ChatChoice/ChatUsage      │
└───────────────────────────────────────────┘
Data Flow Patterns
1. Application Startup
.env → load_dotenv() → os.getenv("DEFAULT_MODEL_NAME")
        ↓
initialize_pipeline(model_name)
        ↓
check_model() → verify cache exists
        ↓
AutoTokenizer + AutoModelForCausalLM
        ↓
pipeline("text-generation")
        ↓
Global: pipe, tokenizer, model_name
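The startup sequence above can be expressed as a small function. This is a sketch with the transformers calls abstracted behind injected callables standing in for `AutoTokenizer.from_pretrained`, `AutoModelForCausalLM.from_pretrained`, and `pipeline("text-generation")`; the fallback name is hypothetical.

```python
import os


def startup(load_tokenizer, load_model, make_pipeline, fallback="default-model"):
    """Mirror the startup flow: env var -> tokenizer/model load -> pipeline.

    The three callables stand in for the transformers loaders so the
    sequencing is visible without the heavy dependencies.
    """
    model_name = os.getenv("DEFAULT_MODEL_NAME", fallback)
    tokenizer = load_tokenizer(model_name)
    model = load_model(model_name)
    pipe = make_pipeline(model, tokenizer)
    # The real app assigns these to module-level globals.
    return pipe, tokenizer, model_name
```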
2. Chat Request Flow
POST /v1/chat/completions
        ↓
ChatRequest (validation)
        ↓
Check model_name match
        ↓
create_chat_response(request, pipe, tokenizer)
        ↓
pipe(messages, max_new_tokens)
        ↓
convert_json_format() → clean output
        ↓
Calculate tokens (tokenizer.encode)
        ↓
ChatResponse (Pydantic)
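The generation-and-counting steps can be sketched as follows. The response shape mirrors the OpenAI-style chat schema, but the exact field names here are assumptions; `pipe` and `tokenizer` are stand-ins for the real pipeline and tokenizer objects.

```python
def build_chat_response(messages, pipe, tokenizer, max_new_tokens=256):
    """Generate a reply and count tokens with the same tokenizer the model uses."""
    completion = pipe(messages, max_new_tokens=max_new_tokens)
    # Token usage is computed from real encodings, not a character heuristic.
    prompt_tokens = sum(len(tokenizer.encode(m["content"])) for m in messages)
    completion_tokens = len(tokenizer.encode(completion))
    return {
        "choices": [
            {"index": 0,
             "message": {"role": "assistant", "content": completion}}
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```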
3. Download Flow
POST /download
        ↓
download_model(model_name)
        ↓
AutoTokenizer.from_pretrained(cache_dir)
AutoModelForCausalLM.from_pretrained(cache_dir)
        ↓
initialize_pipeline(model_name)
        ↓
Update global: pipe, tokenizer, model_name
        ↓
Return success + loaded status
Key Design Decisions
1. Global State Management
- Why: FastAPI is stateless, but models are expensive to load
- Solution: Global variables for pipe/tokenizer/model_name
- Trade-off: Single model at a time, but efficient
2. Lazy Initialization with Fallback
- Why: Model might not exist on startup
- Solution: Startup event tries to load, but doesn't fail
- Trade-off: Graceful degradation vs. guaranteed availability
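The graceful-degradation pattern is just a guarded load. A minimal sketch, with `initialize` standing in for the real pipeline setup:

```python
def safe_startup(initialize, name):
    """Attempt the default-model load; on failure keep serving with no
    model loaded instead of crashing the app at startup."""
    try:
        return initialize(name)
    except Exception as exc:
        # Log and continue: /download can still load a model later.
        print(f"startup load of {name!r} failed: {exc}; continuing without a model")
        return None
```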
3. Model Switching
- Why: Users may want different models
- Solution: Check request.model vs. current model_name
- Trade-off: Re-initialization overhead vs. flexibility
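The switching check can be sketched as a compare-then-reinitialize helper; `initialize` is a stand-in for the real pipeline setup, and the state dict mirrors the module globals.

```python
def ensure_model(state, requested, initialize):
    """Reinitialize only when the requested model differs from the loaded one."""
    if state["model_name"] != requested:
        # Pay the reload cost once; repeat requests for the same model are free.
        state["pipe"], state["tokenizer"] = initialize(requested)
        state["model_name"] = requested
    return state
```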
4. Error Handling
- Why: Model operations can fail in multiple ways
- Solution: HTTPException for client errors, try/except for internal
- Trade-off: Clear API vs. implementation complexity
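The two-tier error pattern looks roughly like this. `HTTPError` below is a minimal stand-in for FastAPI's `HTTPException` (status code plus detail); the status codes and messages are illustrative.

```python
class HTTPError(Exception):
    """Minimal stand-in for FastAPI's HTTPException."""
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def generate(pipe, prompt):
    if pipe is None:
        # Client-facing error: no model has been loaded yet.
        raise HTTPError(503, "no model loaded; POST /download first")
    try:
        return pipe(prompt)
    except Exception as exc:
        # Internal failure surfaced as a 500 without leaking internals.
        raise HTTPError(500, "generation failed") from exc
```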
5. Environment Configuration
- Why: Different deployments need different defaults
- Solution: .env file with fallback
- Trade-off: External config vs. hardcoded values
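The .env-with-fallback pattern can be sketched with a tiny stdlib-only reader (python-dotenv's `load_dotenv` does the same job more robustly); the `"fallback-model"` default is hypothetical.

```python
import os


def load_env_file(path=".env"):
    """Tiny .env reader. Existing environment variables win over file values."""
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no file: fall back to the process environment / defaults


def default_model():
    # Hardcoded fallback used only when nothing else is configured.
    return os.getenv("DEFAULT_MODEL_NAME", "fallback-model")
```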
Security Considerations
- ✅ No hardcoded credentials in code
- ✅ HUGGINGFACE_TOKEN from environment
- ✅ Input validation via Pydantic
- ✅ No arbitrary code execution from user input
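A hedged sketch of what the Pydantic validation layer looks like; the field names are assumptions modeled on the OpenAI-style chat schema, not the project's exact models.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Message(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: List[Message]
    max_tokens: int = Field(default=256, ge=1)  # reject non-positive budgets
```

Because validation happens at the model boundary, malformed payloads are rejected before any model code runs; FastAPI converts the `ValidationError` into a 422 response automatically.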
Performance Patterns
- ✅ Model loaded once at startup
- ✅ Tokenizer reused across requests
- ✅ Token counting with actual tokenizer
- ✅ Async route handlers for concurrency
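The load-once behavior can be mimicked with a one-slot cache. A sketch only: the real app keeps module globals rather than `lru_cache`, and the loader body here is a placeholder for the expensive pipeline construction.

```python
from functools import lru_cache

LOAD_COUNT = {"n": 0}  # instrumentation to show the load happens once


@lru_cache(maxsize=1)
def get_pipeline(name):
    """Stand-in for pipeline construction: runs once per cached model name."""
    LOAD_COUNT["n"] += 1
    return f"pipeline:{name}"
```

`maxsize=1` matches the single-model trade-off above: requesting a different model evicts the old one rather than holding two in memory.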