File size: 4,968 Bytes
e142333
 
f036bb3
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
# System Patterns

## Architecture Overview
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              FastAPI App                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Routes:                                β”‚
β”‚  β€’ GET / (Welcome)                      β”‚
β”‚  β€’ POST /download (Model Download)      β”‚
β”‚  β€’ POST /v1/chat/completions (Chat)     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Global State:                          β”‚
β”‚  β€’ pipe (Pipeline)                      β”‚
β”‚  β€’ tokenizer (Tokenizer)                β”‚
β”‚  β€’ model_name (Current Model)           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Startup Event:                         β”‚
β”‚  β€’ Load .env                            β”‚
β”‚  β€’ Initialize default model             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           Utils Modules                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  utils/model.py:                        β”‚
β”‚  β€’ check_model() - Verify model exists  β”‚
β”‚  β€’ download_model() - Download model    β”‚
β”‚  β€’ initialize_pipeline() - Setup model  β”‚
β”‚  β€’ DownloadRequest - Pydantic model     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  utils/chat_request.py:                 β”‚
β”‚  β€’ ChatRequest - Request validation     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  utils/chat_response.py:                β”‚
β”‚  β€’ create_chat_response() - Generate    β”‚
β”‚  β€’ convert_json_format() - Parse output β”‚
β”‚  β€’ ChatResponse/ChatChoice/ChatUsage    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Data Flow Patterns

### 1. Application Startup
```
.env β†’ load_dotenv() β†’ os.getenv("DEFAULT_MODEL_NAME")
     ↓
initialize_pipeline(model_name)
     ↓
check_model() β†’ verify cache exists
     ↓
AutoTokenizer + AutoModelForCausalLM
     ↓
pipeline("text-generation")
     ↓
Global: pipe, tokenizer, model_name
```

### 2. Chat Request Flow
```
POST /v1/chat/completions
     ↓
ChatRequest (validation)
     ↓
Check model_name match
     ↓
create_chat_response(request, pipe, tokenizer)
     ↓
pipe(messages, max_new_tokens)
     ↓
convert_json_format() β†’ clean output
     ↓
Calculate tokens (tokenizer.encode)
     ↓
ChatResponse (Pydantic)
```

### 3. Download Flow
```
POST /download
     ↓
download_model(model_name)
     ↓
AutoTokenizer.from_pretrained(cache_dir)
AutoModelForCausalLM.from_pretrained(cache_dir)
     ↓
initialize_pipeline(model_name)
     ↓
Update global: pipe, tokenizer, model_name
     ↓
Return success + loaded status
```

## Key Design Decisions

### 1. Global State Management
- **Why**: FastAPI is stateless, but models are expensive to load
- **Solution**: Global variables for pipe/tokenizer/model_name
- **Trade-off**: Single model at a time, but efficient

### 2. Lazy Initialization with Fallback
- **Why**: Model might not exist on startup
- **Solution**: Startup event tries to load, but doesn't fail
- **Trade-off**: Graceful degradation vs. guaranteed availability

### 3. Model Switching
- **Why**: Users may want different models
- **Solution**: Check request.model vs. current model_name
- **Trade-off**: Re-initialization overhead vs. flexibility

### 4. Error Handling
- **Why**: Model operations can fail in multiple ways
- **Solution**: HTTPException for client errors, try/except for internal
- **Trade-off**: Clear API vs. implementation complexity

### 5. Environment Configuration
- **Why**: Different deployments need different defaults
- **Solution**: .env file with fallback
- **Trade-off**: External config vs. hardcoded values

## Security Considerations
- βœ… No hardcoded credentials in code
- βœ… HUGGINGFACE_TOKEN from environment
- βœ… Input validation via Pydantic
- βœ… No arbitrary code execution from user input

## Performance Patterns
- βœ… Model loaded once at startup
- βœ… Tokenizer reused across requests
- βœ… Token counting with actual tokenizer
- βœ… Async route handlers for concurrency