Spaces:

airsltd
/

airsmodel

Running

App Files Files Community

airsmodel / memory-bank /systemPatterns.md

tanbushi

update

f036bb3 about 1 month ago

preview code

raw

history blame contribute delete

4.97 kB

	# System Patterns

	## Architecture Overview
	```
	┌─────────────────────────────────────────┐
	│ FastAPI App │
	├─────────────────────────────────────────┤
	│ Routes: │
	│ • GET / (Welcome) │
	│ • POST /download (Model Download) │
	│ • POST /v1/chat/completions (Chat) │
	├─────────────────────────────────────────┤
	│ Global State: │
	│ • pipe (Pipeline) │
	│ • tokenizer (Tokenizer) │
	│ • model_name (Current Model) │
	├─────────────────────────────────────────┤
	│ Startup Event: │
	│ • Load .env │
	│ • Initialize default model │
	└─────────────────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────┐
	│ Utils Modules │
	├─────────────────────────────────────────┤
	│ utils/model.py: │
	│ • check_model() - Verify model exists │
	│ • download_model() - Download model │
	│ • initialize_pipeline() - Setup model │
	│ • DownloadRequest - Pydantic model │
	├─────────────────────────────────────────┤
	│ utils/chat_request.py: │
	│ • ChatRequest - Request validation │
	├─────────────────────────────────────────┤
	│ utils/chat_response.py: │
	│ • create_chat_response() - Generate │
	│ • convert_json_format() - Parse output │
	│ • ChatResponse/ChatChoice/ChatUsage │
	└─────────────────────────────────────────┘
	```

	## Data Flow Patterns

	### 1. Application Startup
	```
	.env → load_dotenv() → os.getenv("DEFAULT_MODEL_NAME")
	↓
	initialize_pipeline(model_name)
	↓
	check_model() → verify cache exists
	↓
	AutoTokenizer + AutoModelForCausalLM
	↓
	pipeline("text-generation")
	↓
	Global: pipe, tokenizer, model_name
	```

	### 2. Chat Request Flow
	```
	POST /v1/chat/completions
	↓
	ChatRequest (validation)
	↓
	Check model_name match
	↓
	create_chat_response(request, pipe, tokenizer)
	↓
	pipe(messages, max_new_tokens)
	↓
	convert_json_format() → clean output
	↓
	Calculate tokens (tokenizer.encode)
	↓
	ChatResponse (Pydantic)
	```

	### 3. Download Flow
	```
	POST /download
	↓
	download_model(model_name)
	↓
	AutoTokenizer.from_pretrained(cache_dir)
	AutoModelForCausalLM.from_pretrained(cache_dir)
	↓
	initialize_pipeline(model_name)
	↓
	Update global: pipe, tokenizer, model_name
	↓
	Return success + loaded status
	```

	## Key Design Decisions

	### 1. Global State Management
	- Why: FastAPI is stateless, but models are expensive to load
	- Solution: Global variables for pipe/tokenizer/model_name
	- Trade-off: Single model at a time, but efficient

	### 2. Lazy Initialization with Fallback
	- Why: Model might not exist on startup
	- Solution: Startup event tries to load, but doesn't fail
	- Trade-off: Graceful degradation vs. guaranteed availability

	### 3. Model Switching
	- Why: Users may want different models
	- Solution: Check request.model vs. current model_name
	- Trade-off: Re-initialization overhead vs. flexibility

	### 4. Error Handling
	- Why: Model operations can fail in multiple ways
	- Solution: HTTPException for client errors, try/except for internal
	- Trade-off: Clear API vs. implementation complexity

	### 5. Environment Configuration
	- Why: Different deployments need different defaults
	- Solution: .env file with fallback
	- Trade-off: External config vs. hardcoded values

	## Security Considerations
	- ✅ No hardcoded credentials in code
	- ✅ HUGGINGFACE_TOKEN from environment
	- ✅ Input validation via Pydantic
	- ✅ No arbitrary code execution from user input

	## Performance Patterns
	- ✅ Model loaded once at startup
	- ✅ Tokenizer reused across requests
	- ✅ Token counting with actual tokenizer
	- ✅ Async route handlers for concurrency