# PULSE-7B Handler Deployment Guide

## 🚀 Deployment Guide

### Requirements
- Python 3.8+
- CUDA 11.8+ (for GPU use)
- Minimum 16GB RAM (CPU), 8GB VRAM (GPU)

### Installation

1. **Install the dependencies:**
```bash
pip install -r requirements.txt
```

2. **PULSE LLaVA Installation (critical for PULSE-7B):**
```bash
# PULSE-7B requires PULSE's own LLaVA implementation:
pip install git+https://github.com/AIMedLab/PULSE.git#subdirectory=LLaVA

# This automatically installs transformers==4.37.2
```

3. **Flash Attention (optional, for performance):**
```bash
pip install flash-attn --no-build-isolation
```

### HuggingFace Inference Deployment

#### 1. Model Repository Structure
```
your-model-repo/
├── handler.py
├── config.json
├── generation_config.json
├── requirements.txt
├── model.safetensors.index.json
├── tokenizer_config.json
├── special_tokens_map.json
└── tokenizer.model
```
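
For orientation, Hugging Face Inference Endpoints load a custom handler from `handler.py` by looking for a class named `EndpointHandler` that exposes `__init__(path)` and `__call__(data)`. The sketch below shows only that shape, plus parsing of the `inputs`/`parameters` payload used in this guide's test requests. It is not the actual PULSE-7B handler: model loading and generation are left out, and the `parse_request` helper and the default parameter values are assumptions made for illustration.

```python
from typing import Any, Dict


class EndpointHandler:
    """Minimal skeleton of a custom handler; model loading is omitted."""

    # Assumed defaults, chosen to match the example requests in this guide.
    DEFAULT_PARAMS = {"temperature": 0.2, "max_new_tokens": 512}

    def __init__(self, path: str = "") -> None:
        # `path` is the local directory that holds the repository files.
        self.path = path

    def parse_request(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Extract the query, image reference, and generation parameters."""
        inputs = data.get("inputs")
        if not isinstance(inputs, dict):
            raise ValueError("'inputs' must be an object with 'query' and 'image'")
        query = inputs.get("query")
        if not isinstance(query, str) or not query.strip():
            raise ValueError("'inputs.query' must be a non-empty string")
        # Request parameters override the assumed defaults.
        params = {**self.DEFAULT_PARAMS, **(data.get("parameters") or {})}
        return {"query": query, "image": inputs.get("image"), "parameters": params}

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        request = self.parse_request(data)
        # A real handler would load the image and run generation here.
        return {"parsed_request": request}
```

Keeping request parsing in its own method makes it possible to unit-test payload handling without loading the model.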

#### 2. Creating the Endpoint
```bash
# Deploy with the HuggingFace CLI
huggingface-cli login
huggingface-cli repo create your-pulse-endpoint --type=model
```

#### 3. Test Requests

**Test with an image URL:**
```bash
curl -X POST "YOUR_ENDPOINT_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "Analyze this ECG image",
      "image": "https://i.imgur.com/7uuejqO.jpeg"
    },
    "parameters": {
      "temperature": 0.2,
      "max_new_tokens": 512
    }
  }'
```

**Test with Base64:**
```bash
curl -X POST "YOUR_ENDPOINT_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "What do you see in this ECG?",
      "image": "..."
    },
    "parameters": {
      "temperature": 0.2
    }
  }'
```
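
The Base64 request body can also be built programmatically. The following is a minimal sketch using only the standard library. The `build_payload` helper is hypothetical, and it assumes the handler accepts a plain Base64 string in `image`; whether a `data:` URI prefix is expected is not stated here, so check the handler before relying on this.

```python
import base64
import json
from typing import Any, Dict, Optional


def build_payload(query: str, image_bytes: bytes,
                  temperature: float = 0.2,
                  max_new_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Build a request body with the image encoded as a Base64 string."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    parameters: Dict[str, Any] = {"temperature": temperature}
    if max_new_tokens is not None:
        parameters["max_new_tokens"] = max_new_tokens
    return {"inputs": {"query": query, "image": image_b64}, "parameters": parameters}


# Example with placeholder bytes; in practice, read them from an image file.
body = json.dumps(build_payload("What do you see in this ECG?", b"\xff\xd8\xff"))
```

The resulting `body` string can be sent as the POST data with any HTTP client, using the same `Content-Type: application/json` header as the `curl` examples.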

### Performance Optimizations

#### GPU Memory Optimization
- Use `torch_dtype=torch.bfloat16`
- Set `low_cpu_mem_usage=True`
- Use `device_map="auto"` for automatic device placement

#### CPU Optimization
- Use `torch_dtype=torch.float32`
- Set the thread count: `torch.set_num_threads(4)`
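
To see why the dtype choice matters, here is a rough back-of-the-envelope estimate of the memory taken by the weights alone. It assumes a nominal 7 billion parameters (the "7B" in the model name taken at face value) and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NOMINAL_PARAMS = 7e9  # assumption: "7B" taken at face value

bf16_gib = weight_memory_gib(NOMINAL_PARAMS, 2)  # bfloat16: 2 bytes per value
fp32_gib = weight_memory_gib(NOMINAL_PARAMS, 4)  # float32: 4 bytes per value
print(f"bfloat16: {bf16_gib:.1f} GiB, float32: {fp32_gib:.1f} GiB")
# prints "bfloat16: 13.0 GiB, float32: 26.1 GiB"
```

Halving the bytes per parameter halves the weight footprint. When the weights do not fit on the GPU, `device_map="auto"` can place the remaining layers on the CPU, at a cost in speed.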

### Monitoring and Debugging

#### Log Levels
```python
import logging
logging.basicConfig(level=logging.INFO)
```

#### Memory Usage
```python
import torch
print(f"GPU Memory: {torch.cuda.memory_allocated()/1024**3:.2f}GB")
```

### Troubleshooting

#### Common Issues:

1. **"llava_llama architecture not recognized" Error**
   ```bash
   # PULSE-7B Solution: Install PULSE's LLaVA implementation
   pip install git+https://github.com/AIMedLab/PULSE.git#subdirectory=LLaVA
   
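   # Also install the development version of transformers
   # (this replaces the transformers==4.37.2 pinned by the LLaVA package)
   pip install git+https://github.com/huggingface/transformers.git
   
   # Or add both lines to requirements.txt (file entries, not shell commands):
   #   git+https://github.com/huggingface/transformers.git
   #   git+https://github.com/AIMedLab/PULSE.git#subdirectory=LLaVA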
   ```

2. **CUDA Out of Memory**
   - Reduce the batch size
   - Lower the `max_new_tokens` value
   - Use gradient checkpointing (this helps only when fine-tuning; it does not reduce inference memory)

3. **Slow Image Processing**
   - Increase the image timeout
   - Set an image resize threshold

4. **Model Loading Issues**
   - Check your HuggingFace token
   - Verify the network connection
   - Clear the cache directory
   - Check the transformers version
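
For the transformers version check, a small standard-library sketch is enough. The `installed_version` helper is hypothetical, and the expected version is the `4.37.2` that this guide says the PULSE LLaVA package installs; adjust it if your setup intentionally uses a different version.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


EXPECTED = "4.37.2"  # version this guide says the PULSE LLaVA package installs
found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif found != EXPECTED:
    print(f"transformers {found} is installed; this guide mentions {EXPECTED}")
else:
    print(f"transformers {found} matches")
```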

### Security Best Practices

- Validate image URLs
- Set size limits for Base64 payloads
- Apply rate limiting
- Sanitize inputs
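
The first two points could be sketched as follows, using only the standard library. The function names and the 10 MiB limit are assumptions made for illustration, not part of the handler. The URL check only rejects `localhost` and literal private, loopback, or link-local IP addresses; it does not resolve DNS, so a stricter deployment would also need to check the resolved address.

```python
import base64
import binascii
import ipaddress
from urllib.parse import urlparse

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed limit: 10 MiB after decoding


def is_allowed_image_url(url: str) -> bool:
    """Accept only http(s) URLs that do not point at an obviously internal host."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ("http", "https") or not host or host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal; DNS is not checked here
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


def decode_image_base64(data: str, max_bytes: int = MAX_IMAGE_BYTES) -> bytes:
    """Decode Base64 image data, rejecting malformed or oversized input."""
    # Base64 expands data by about 4/3, so check the encoded length first.
    if len(data) > (max_bytes * 4) // 3 + 4:
        raise ValueError("Base64 payload exceeds size limit")
    try:
        raw = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("invalid Base64 data") from exc
    if len(raw) > max_bytes:
        raise ValueError("decoded image exceeds size limit")
    return raw
```

Checking the encoded length before decoding avoids allocating memory for a payload that would be rejected anyway.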

### Monitoring Metrics

- Response time
- Memory usage
- Error rates
- Image processing success rate
- Token generation speed
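
As a starting point, most of these metrics can be tracked with simple in-memory counters. The `RequestMetrics` class below is a hypothetical helper, not part of the handler; it covers response time, error rate, image processing success rate, and token generation speed. Memory usage is not included.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequestMetrics:
    """In-memory counters; a hypothetical helper, not part of the handler."""
    latencies_s: List[float] = field(default_factory=list)
    errors: int = 0
    images_ok: int = 0
    images_total: int = 0
    tokens: int = 0
    generation_s: float = 0.0

    def record(self, latency_s: float, ok: bool, image_ok: bool,
               new_tokens: int = 0, generation_s: float = 0.0) -> None:
        """Record the outcome of one request."""
        self.latencies_s.append(latency_s)
        self.errors += 0 if ok else 1
        self.images_total += 1
        self.images_ok += 1 if image_ok else 0
        self.tokens += new_tokens
        self.generation_s += generation_s

    def summary(self) -> Dict[str, float]:
        """Aggregate the counters; rates are 0.0 when nothing was recorded."""
        n = len(self.latencies_s)
        return {
            "requests": n,
            "avg_response_time_s": sum(self.latencies_s) / n if n else 0.0,
            "error_rate": self.errors / n if n else 0.0,
            "image_success_rate": (self.images_ok / self.images_total
                                   if self.images_total else 0.0),
            "tokens_per_s": (self.tokens / self.generation_s
                             if self.generation_s else 0.0),
        }
```

A handler could call `record` once per request and log `summary()` periodically. For production use, exporting these values to a metrics system would be more robust than in-process counters.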