airsmodel / memory-bank /systemPatterns.md
tanbushi's picture
update
f036bb3

System Patterns

Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              FastAPI App                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Routes:                                β”‚
β”‚  β€’ GET / (Welcome)                      β”‚
β”‚  β€’ POST /download (Model Download)      β”‚
β”‚  β€’ POST /v1/chat/completions (Chat)     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Global State:                          β”‚
β”‚  β€’ pipe (Pipeline)                      β”‚
β”‚  β€’ tokenizer (Tokenizer)                β”‚
β”‚  β€’ model_name (Current Model)           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Startup Event:                         β”‚
β”‚  β€’ Load .env                            β”‚
β”‚  β€’ Initialize default model             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           Utils Modules                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  utils/model.py:                        β”‚
β”‚  β€’ check_model() - Verify model exists  β”‚
β”‚  β€’ download_model() - Download model    β”‚
β”‚  β€’ initialize_pipeline() - Setup model  β”‚
β”‚  β€’ DownloadRequest - Pydantic model     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  utils/chat_request.py:                 β”‚
β”‚  β€’ ChatRequest - Request validation     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  utils/chat_response.py:                β”‚
β”‚  β€’ create_chat_response() - Generate    β”‚
β”‚  β€’ convert_json_format() - Parse output β”‚
β”‚  β€’ ChatResponse/ChatChoice/ChatUsage    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data Flow Patterns

1. Application Startup

.env β†’ load_dotenv() β†’ os.getenv("DEFAULT_MODEL_NAME")
     ↓
initialize_pipeline(model_name)
     ↓
check_model() β†’ verify cache exists
     ↓
AutoTokenizer + AutoModelForCausalLM
     ↓
pipeline("text-generation")
     ↓
Global: pipe, tokenizer, model_name

2. Chat Request Flow

POST /v1/chat/completions
     ↓
ChatRequest (validation)
     ↓
Check model_name match
     ↓
create_chat_response(request, pipe, tokenizer)
     ↓
pipe(messages, max_new_tokens)
     ↓
convert_json_format() β†’ clean output
     ↓
Calculate tokens (tokenizer.encode)
     ↓
ChatResponse (Pydantic)

3. Download Flow

POST /download
     ↓
download_model(model_name)
     ↓
AutoTokenizer.from_pretrained(cache_dir)
AutoModelForCausalLM.from_pretrained(cache_dir)
     ↓
initialize_pipeline(model_name)
     ↓
Update global: pipe, tokenizer, model_name
     ↓
Return success + loaded status

Key Design Decisions

1. Global State Management

  • Why: FastAPI is stateless, but models are expensive to load
  • Solution: Global variables for pipe/tokenizer/model_name
  • Trade-off: Single model at a time, but efficient

2. Lazy Initialization with Fallback

  • Why: Model might not exist on startup
  • Solution: Startup event tries to load, but doesn't fail
  • Trade-off: Graceful degradation vs. guaranteed availability

3. Model Switching

  • Why: Users may want different models
  • Solution: Check request.model vs. current model_name
  • Trade-off: Re-initialization overhead vs. flexibility

4. Error Handling

  • Why: Model operations can fail in multiple ways
  • Solution: HTTPException for client errors, try/except for internal
  • Trade-off: Clear API vs. implementation complexity

5. Environment Configuration

  • Why: Different deployments need different defaults
  • Solution: .env file with fallback
  • Trade-off: External config vs. hardcoded values

Security Considerations

  • βœ… No hardcoded credentials in code
  • βœ… HUGGINGFACE_TOKEN from environment
  • βœ… Input validation via Pydantic
  • βœ… No arbitrary code execution from user input

Performance Patterns

  • βœ… Model loaded once at startup
  • βœ… Tokenizer reused across requests
  • βœ… Token counting with actual tokenizer
  • βœ… Async route handlers for concurrency