airsmodel / memory-bank /systemPatterns.md
tanbushi's picture
update
f036bb3
# System Patterns
## Architecture Overview
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ FastAPI App β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Routes: β”‚
β”‚ β€’ GET / (Welcome) β”‚
β”‚ β€’ POST /download (Model Download) β”‚
β”‚ β€’ POST /v1/chat/completions (Chat) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Global State: β”‚
β”‚ β€’ pipe (Pipeline) β”‚
β”‚ β€’ tokenizer (Tokenizer) β”‚
β”‚ β€’ model_name (Current Model) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Startup Event: β”‚
β”‚ β€’ Load .env β”‚
β”‚ β€’ Initialize default model β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Utils Modules β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ utils/model.py: β”‚
β”‚ β€’ check_model() - Verify model exists β”‚
β”‚ β€’ download_model() - Download model β”‚
β”‚ β€’ initialize_pipeline() - Setup model β”‚
β”‚ β€’ DownloadRequest - Pydantic model β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ utils/chat_request.py: β”‚
β”‚ β€’ ChatRequest - Request validation β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ utils/chat_response.py: β”‚
β”‚ β€’ create_chat_response() - Generate β”‚
β”‚ β€’ convert_json_format() - Parse output β”‚
β”‚ β€’ ChatResponse/ChatChoice/ChatUsage β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Data Flow Patterns
### 1. Application Startup
```
.env β†’ load_dotenv() β†’ os.getenv("DEFAULT_MODEL_NAME")
↓
initialize_pipeline(model_name)
↓
check_model() β†’ verify cache exists
↓
AutoTokenizer + AutoModelForCausalLM
↓
pipeline("text-generation")
↓
Global: pipe, tokenizer, model_name
```
### 2. Chat Request Flow
```
POST /v1/chat/completions
↓
ChatRequest (validation)
↓
Check model_name match
↓
create_chat_response(request, pipe, tokenizer)
↓
pipe(messages, max_new_tokens)
↓
convert_json_format() β†’ clean output
↓
Calculate tokens (tokenizer.encode)
↓
ChatResponse (Pydantic)
```
### 3. Download Flow
```
POST /download
↓
download_model(model_name)
↓
AutoTokenizer.from_pretrained(cache_dir)
AutoModelForCausalLM.from_pretrained(cache_dir)
↓
initialize_pipeline(model_name)
↓
Update global: pipe, tokenizer, model_name
↓
Return success + loaded status
```
## Key Design Decisions
### 1. Global State Management
- **Why**: FastAPI is stateless, but models are expensive to load
- **Solution**: Global variables for pipe/tokenizer/model_name
- **Trade-off**: Single model at a time, but efficient
### 2. Lazy Initialization with Fallback
- **Why**: Model might not exist on startup
- **Solution**: Startup event tries to load, but doesn't fail
- **Trade-off**: Graceful degradation vs. guaranteed availability
### 3. Model Switching
- **Why**: Users may want different models
- **Solution**: Check request.model vs. current model_name
- **Trade-off**: Re-initialization overhead vs. flexibility
### 4. Error Handling
- **Why**: Model operations can fail in multiple ways
- **Solution**: HTTPException for client errors, try/except for internal
- **Trade-off**: Clear API vs. implementation complexity
### 5. Environment Configuration
- **Why**: Different deployments need different defaults
- **Solution**: .env file with fallback
- **Trade-off**: External config vs. hardcoded values
## Security Considerations
- βœ… No hardcoded credentials in code
- βœ… HUGGINGFACE_TOKEN from environment
- βœ… Input validation via Pydantic
- βœ… No arbitrary code execution from user input
## Performance Patterns
- βœ… Model loaded once at startup
- βœ… Tokenizer reused across requests
- βœ… Token counting with actual tokenizer
- βœ… Async route handlers for concurrency