# Token Authentication Review - Gradio & HuggingFace

## Summary

This document reviews the implementation of token authentication for Gradio Client API calls and HuggingFace API usage to ensure tokens are always passed correctly.

## βœ… Implementation Status

### 1. Gradio Client Services

#### STT Service (`src/services/stt_gradio.py`)
- βœ… **Token Support**: Service accepts `hf_token` parameter in `__init__` and methods
- βœ… **Client Initialization**: `Client` is created with `hf_token` parameter when token is available
- βœ… **Token Priority**: Method-level token > instance-level token
- βœ… **Token Updates**: Client is recreated if token changes

**Implementation Pattern:**
```python
async def _get_client(self, hf_token: str | None = None) -> Client:
    # Method-level token takes priority over the instance-level token.
    token = hf_token or self.hf_token
    if token:
        self.client = Client(self.api_url, hf_token=token)
    else:
        self.client = Client(self.api_url)
    return self.client
```

#### Image OCR Service (`src/services/image_ocr.py`)
- βœ… **Token Support**: Service accepts `hf_token` parameter in `__init__` and methods
- βœ… **Client Initialization**: `Client` is created with `hf_token` parameter when token is available
- βœ… **Token Priority**: Method-level token > instance-level token
- βœ… **Token Updates**: Client is recreated if token changes

**Uses the same `_get_client` pattern as the STT service.**

### 2. Service Layer Integration

#### Audio Service (`src/services/audio_processing.py`)
- βœ… **Token Passthrough**: `process_audio_input()` accepts `hf_token` and passes to STT service
- βœ… **Token Flow**: `audio_service.process_audio_input(audio, hf_token=token)`

#### Multimodal Service (`src/services/multimodal_processing.py`)
- βœ… **Token Passthrough**: `process_multimodal_input()` accepts `hf_token` and passes to both audio and OCR services
- βœ… **Token Flow**: `multimodal_service.process_multimodal_input(..., hf_token=token)`

### 3. Application Layer (`src/app.py`)

#### Token Extraction
- βœ… **OAuth Token**: Extracted from `gr.OAuthToken` via `oauth_token.token`
- βœ… **Fallback**: Uses `HF_TOKEN` or `HUGGINGFACE_API_KEY` from environment
- βœ… **Token Priority**: `oauth_token > HF_TOKEN > HUGGINGFACE_API_KEY`

**Implementation:**
```python
token_value: str | None = None
if oauth_token is not None:
    token_value = oauth_token.token if hasattr(oauth_token, "token") else None

# Fallback to env vars
effective_token = token_value or os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACE_API_KEY")
```

#### Token Usage in Services
- βœ… **Multimodal Processing**: Token passed to `process_multimodal_input(..., hf_token=token_value)`
- βœ… **Consistent Usage**: Token is extracted once and passed through all service layers

### 4. HuggingFace API Integration

#### LLM Factory (`src/utils/llm_factory.py`)
- βœ… **Token Priority**: `oauth_token > settings.hf_token > settings.huggingface_api_key`
- βœ… **Provider Usage**: `HuggingFaceProvider(api_key=effective_hf_token)`
- βœ… **Model Usage**: `HuggingFaceModel(model_name, provider=provider)`

#### Judge Handler (`src/agent_factory/judges.py`)
- βœ… **Token Priority**: `oauth_token > settings.hf_token > settings.huggingface_api_key`
- βœ… **InferenceClient**: `InferenceClient(api_key=api_key)` when token provided
- βœ… **Fallback**: Uses `HF_TOKEN` from environment if no token provided

**Implementation:**
```python
effective_hf_token = oauth_token or settings.hf_token or settings.huggingface_api_key
hf_provider = HuggingFaceProvider(api_key=effective_hf_token)
```

### 5. MCP Tools (`src/mcp_tools.py`)

#### Image OCR Tool
- βœ… **Token Support**: `extract_text_from_image()` accepts `hf_token` parameter
- βœ… **Token Fallback**: Uses `settings.hf_token` or `settings.huggingface_api_key` if not provided
- βœ… **Service Integration**: Passes token to `ImageOCRService.extract_text()`

#### Audio Transcription Tool
- βœ… **Token Support**: `transcribe_audio_file()` accepts `hf_token` parameter
- βœ… **Token Fallback**: Uses `settings.hf_token` or `settings.huggingface_api_key` if not provided
- βœ… **Service Integration**: Passes token to `STTService.transcribe_file()`

## Token Flow Diagram

```
User Login (OAuth)
    ↓
oauth_token.token
    ↓
app.py: token_value
    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Service Layer                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  MultimodalService                   β”‚
β”‚    ↓ hf_token=token_value            β”‚
β”‚  AudioService                        β”‚
β”‚    ↓ hf_token=token_value            β”‚
β”‚  STTService / ImageOCRService        β”‚
β”‚    ↓ hf_token=token_value            β”‚
β”‚  Gradio Client(hf_token=token)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Alternative: Environment Variables
    ↓
HF_TOKEN or HUGGINGFACE_API_KEY
    ↓
settings.hf_token or settings.huggingface_api_key
    ↓
Same service flow as above
```

## Verification Checklist

- [x] STT Service accepts and uses `hf_token` parameter
- [x] Image OCR Service accepts and uses `hf_token` parameter
- [x] Audio Service passes token to STT service
- [x] Multimodal Service passes token to both audio and OCR services
- [x] App.py extracts OAuth token correctly
- [x] App.py passes token to multimodal service
- [x] HuggingFace API calls use token via `HuggingFaceProvider`
- [x] HuggingFace API calls use token via `InferenceClient`
- [x] MCP tools accept and use token parameter
- [x] Token priority is consistent: OAuth > Env Vars
- [x] Fallback to environment variables when OAuth not available

## Token Parameter Naming

All services consistently use `hf_token` parameter name:
- `STTService.transcribe_audio(..., hf_token=...)`
- `STTService.transcribe_file(..., hf_token=...)`
- `ImageOCRService.extract_text(..., hf_token=...)`
- `ImageOCRService.extract_text_from_image(..., hf_token=...)`
- `AudioService.process_audio_input(..., hf_token=...)`
- `MultimodalService.process_multimodal_input(..., hf_token=...)`
- `extract_text_from_image(..., hf_token=...)` (MCP tool)
- `transcribe_audio_file(..., hf_token=...)` (MCP tool)

## Gradio Client API Usage

According to Gradio documentation, the `Client` constructor accepts:
```python
Client(src, hf_token=None)
```

Our implementation correctly uses:
```python
Client(self.api_url, hf_token=token)  # When token available
Client(self.api_url)  # When no token (public Space)
```

## HuggingFace API Usage

### HuggingFaceProvider
```python
HuggingFaceProvider(api_key=effective_hf_token)
```
βœ… Correctly passes token as `api_key` parameter

### InferenceClient
```python
InferenceClient(api_key=api_key)  # When token provided
InferenceClient()  # Falls back to HF_TOKEN env var
```
βœ… Correctly passes token as `api_key` parameter

## Edge Cases Handled

1. **No Token Available**: Services work without token (public Gradio Spaces)
2. **Token Changes**: Client is recreated when token changes
3. **OAuth vs Env**: OAuth token takes priority over environment variables
4. **Multiple Token Sources**: Consistent priority across all services
5. **MCP Tools**: Support both explicit token and fallback to settings

## Recommendations

βœ… **All implementations are correct and consistent**

The token authentication is properly implemented throughout:
- Gradio Client services accept and use tokens
- Service layer passes tokens through correctly
- Application layer extracts and passes OAuth tokens
- HuggingFace API calls use tokens via correct parameters
- MCP tools support token authentication
- Token priority is consistent across all layers

No changes are needed; the implementation follows best practices.