# Lily LLM API User Guide
## 📋 Table of Contents
1. [Getting Started](#getting-started)
2. [Basic Features](#basic-features)
3. [Advanced Features](#advanced-features)
4. [Troubleshooting](#troubleshooting)
5. [Best Practices](#best-practices)
## 🚀 Getting Started
### System Requirements
- **Minimum specs**:
  - CPU: 4 cores or more
  - RAM: 8GB or more
  - Storage: 20GB or more
  - GPU: optional (CUDA support improves performance)
- **Recommended specs**:
  - CPU: 8 cores or more
  - RAM: 16GB or more
  - Storage: 50GB or more
  - GPU: NVIDIA RTX 3060 or better (with CUDA support)
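The minimum specs above can be checked before installation with a small preflight script. The following is a minimal sketch using only the Python standard library. The function name `check_minimum_specs` is hypothetical, and treating the storage requirement as free space on the current drive is an assumption. RAM and GPU are not checked because the standard library has no portable API for them.

```python
import os
import shutil

# Thresholds taken from the minimum specs listed above
MIN_CPU_CORES = 4
MIN_FREE_DISK_GB = 20

def check_minimum_specs(path="."):
    """Return a dict describing whether CPU count and free disk meet the minimums."""
    cores = os.cpu_count() or 0  # cpu_count() may return None
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    return {
        "cpu_cores": cores,
        "cpu_ok": cores >= MIN_CPU_CORES,
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= MIN_FREE_DISK_GB,
    }

if __name__ == "__main__":
    for key, value in check_minimum_specs().items():
        print(f"{key}: {value}")
```

Run it from the directory where the models will be stored, so that the disk check looks at the right drive.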
### Installation and Running
#### 1. Deploying with Docker (recommended)
```bash
# Clone the repository
git clone <repository-url>
cd lily_generate_package
# Run the deployment
chmod +x scripts/deploy.sh
./scripts/deploy.sh deploy
# Check the status
./scripts/deploy.sh status
```
#### 2. Local development environment
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
# Start the server
python run_server_v2.py
```
### 첫 번째 μš”μ²­
```bash
# μ„œλ²„ μƒνƒœ 확인
curl http://localhost:8001/health
# λͺ¨λΈ λͺ©λ‘ 쑰회
curl http://localhost:8001/models
# κ°„λ‹¨ν•œ ν…μŠ€νŠΈ 생성
curl -X POST http://localhost:8001/generate \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "prompt=μ•ˆλ…•ν•˜μ„Έμš”!&model_id=polyglot-ko-1.3b-chat&max_length=100"
```
## 🤖 Basic Features
### 1. Text Generation
#### Simple text generation
```python
import requests

def generate_text(prompt, model_id="polyglot-ko-1.3b-chat"):
    url = "http://localhost:8001/generate"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True
    }
    # Sent as form data, like the curl request in "First Request"
    response = requests.post(url, data=data)
    return response.json()

# Usage example
result = generate_text("Please explain the future of artificial intelligence.")
print(result["generated_text"])
```
#### Parameter descriptions
| Parameter | Description | Default | Range |
|----------|------|--------|------|
| `prompt` | Input text | required | - |
| `model_id` | Model to use | polyglot-ko-1.3b-chat | Any model in the available model list |
| `max_length` | Maximum number of tokens | 200 | 1-4000 |
| `temperature` | Creativity control (sampling randomness) | 0.7 | 0.0-2.0 |
| `top_p` | Cumulative probability threshold | 0.9 | 0.0-1.0 |
| `do_sample` | Whether to use sampling | True | True/False |
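The ranges in the table can be checked on the client before a request is sent. Here is a minimal sketch, assuming the ranges are inclusive (the table does not say how the server treats boundary or out-of-range values). `validate_generation_params` is a hypothetical helper, not part of the Lily LLM API.

```python
# Inclusive ranges copied from the parameter table above
PARAM_RANGES = {
    "max_length": (1, 4000),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
}

def validate_generation_params(params):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not params.get("prompt"):
        problems.append("prompt is required")
    for name, (low, high) in PARAM_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low}-{high}")
    if "do_sample" in params and not isinstance(params["do_sample"], bool):
        problems.append("do_sample must be True or False")
    return problems

print(validate_generation_params({"prompt": "Hello", "temperature": 0.7}))
# → []
print(validate_generation_params({"prompt": "", "top_p": 1.5}))
# → ['prompt is required', 'top_p=1.5 is outside 0.0-1.0']
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.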
### 2. Multimodal Processing
#### Processing images and text together
```python
import requests
from contextlib import ExitStack

def generate_multimodal(prompt, image_files, model_id="kanana-1.5-v-3b-instruct"):
    url = "http://localhost:8001/generate-multimodal"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7
    }
    # ExitStack closes every opened image once the upload finishes
    with ExitStack() as stack:
        files = [
            ('image_files', (f'image_{i}.jpg', stack.enter_context(open(path, 'rb')), 'image/jpeg'))
            for i, path in enumerate(image_files)
        ]
        response = requests.post(url, files=files, data=data)
    return response.json()

# Usage example
result = generate_multimodal(
    "Please describe this image.",
    ["image1.jpg", "image2.jpg"]
)
print(result["generated_text"])
```
### 3. User Management
#### User registration and login
```python
import requests

def register_user(username, email, password):
    url = "http://localhost:8001/auth/register"
    data = {
        "username": username,
        "email": email,
        "password": password
    }
    response = requests.post(url, data=data)
    return response.json()

def login_user(username, password):
    url = "http://localhost:8001/auth/login"
    data = {
        "username": username,
        "password": password
    }
    response = requests.post(url, data=data)
    return response.json()

# Usage example
# 1. Register a user
register_result = register_user("testuser", "test@example.com", "password123")
access_token = register_result["access_token"]
# 2. Log in
login_result = login_user("testuser", "password123")
access_token = login_result["access_token"]
```
#### Requests that require authentication
```python
import requests

def authenticated_request(url, data, token):
    # Pass the access token as a Bearer token
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

# Usage example
result = authenticated_request(
    "http://localhost:8001/generate",
    {"prompt": "Hello!", "model_id": "polyglot-ko-1.3b-chat"},
    access_token
)
```
## 📄 Advanced Features
### 1. Document Processing (RAG)
#### Document upload
```python
import requests

def upload_document(file_path, user_id, token=None):
    url = "http://localhost:8001/document/upload"
    data = {'user_id': user_id}
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    # Keep the file open until the upload has completed
    with open(file_path, 'rb') as f:
        files = {'file': f}
        response = requests.post(url, files=files, data=data, headers=headers)
    return response.json()

# Usage example
result = upload_document("document.pdf", "user123", access_token)
document_id = result["document_id"]
```
#### RAG query
```python
import requests

def rag_query(query, user_id, token=None):
    url = "http://localhost:8001/rag/generate"
    data = {
        "query": query,
        "user_id": user_id,
        "max_length": 300,
        "temperature": 0.7
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

# Usage example
result = rag_query("Tell me about the future of artificial intelligence.", "user123", access_token)
print(result["response"])
print("Sources:", result["sources"])
```
#### Hybrid RAG (images + documents)
```python
import requests
from contextlib import ExitStack

def hybrid_rag_query(query, image_files, user_id, token=None):
    url = "http://localhost:8001/rag/generate-hybrid"
    data = {
        "query": query,
        "user_id": user_id,
        "max_length": 300,
        "temperature": 0.7
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    # ExitStack closes every opened image once the upload finishes
    with ExitStack() as stack:
        files = [
            ('image_files', (f'image_{i}.jpg', stack.enter_context(open(path, 'rb')), 'image/jpeg'))
            for i, path in enumerate(image_files)
        ]
        response = requests.post(url, files=files, data=data, headers=headers)
    return response.json()
```
### 2. Chat Session Management
#### Session creation and message management
```python
import requests

def create_chat_session(user_id, session_name, token=None):
    url = "http://localhost:8001/session/create"
    data = {
        "user_id": user_id,
        "session_name": session_name
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def add_chat_message(session_id, user_id, content, token=None):
    url = "http://localhost:8001/chat/message"
    data = {
        "session_id": session_id,
        "user_id": user_id,
        "message_type": "text",
        "content": content
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def get_chat_history(session_id, token=None):
    url = f"http://localhost:8001/chat/history/{session_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Create a session
session_result = create_chat_session("user123", "AI consultation", access_token)
session_id = session_result["session_id"]
# 2. Add a message
add_chat_message(session_id, "user123", "Hello!", access_token)
# 3. Retrieve the chat history
history = get_chat_history(session_id, access_token)
for message in history:
    print(f"{message['timestamp']}: {message['content']}")
```
### 3. Background Tasks
#### Document processing task
```python
import time
import requests

def start_document_processing(file_path, user_id, token=None):
    url = "http://localhost:8001/tasks/document/process"
    data = {
        "file_path": file_path,
        "user_id": user_id
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def check_task_status(task_id, token=None):
    url = f"http://localhost:8001/tasks/{task_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Start the task
task_result = start_document_processing("/path/to/document.pdf", "user123", access_token)
task_id = task_result["task_id"]
# 2. Poll the task status every 5 seconds until it finishes
while True:
    status = check_task_status(task_id, access_token)
    print(f"Status: {status['status']}, progress: {status.get('progress', 0)}%")
    if status['status'] in ['SUCCESS', 'FAILURE']:
        break
    time.sleep(5)
```
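The polling loop above runs forever if a task never reaches a terminal state. A bounded variant is sketched below. `wait_for_task` and its parameters are hypothetical. It takes the status function as an argument, so it can wrap `check_task_status` without assuming anything about the API beyond the `status` field used above.

```python
import time

TERMINAL_STATES = {"SUCCESS", "FAILURE"}  # the states checked in the loop above

def wait_for_task(get_status, task_id, interval=5.0, timeout=300.0):
    """Poll get_status(task_id) until a terminal state, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        # Give up if the next poll would land after the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"task {task_id} still {status.get('status')!r} after {timeout}s"
            )
        time.sleep(interval)
```

With the helpers from this section it could be called as `wait_for_task(lambda tid: check_task_status(tid, access_token), task_id)`.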
### 4. Monitoring
#### Performance monitoring
```python
import requests

def start_monitoring():
    url = "http://localhost:8001/monitoring/start"
    response = requests.post(url)
    return response.json()

def get_monitoring_status():
    url = "http://localhost:8001/monitoring/status"
    response = requests.get(url)
    return response.json()

def get_system_health():
    url = "http://localhost:8001/monitoring/health"
    response = requests.get(url)
    return response.json()

# Usage example
# 1. Start monitoring
start_monitoring()
# 2. Check the current metrics
status = get_monitoring_status()
print(f"CPU usage: {status['current_metrics']['cpu_percent']}%")
print(f"Memory usage: {status['current_metrics']['memory_percent']}%")
# 3. Check system health
health = get_system_health()
print(f"System status: {health['status']}")
for recommendation in health['recommendations']:
    print(f"Recommendation: {recommendation}")
```
## 🔌 WebSocket Real-Time Chat
### WebSocket client
```javascript
class LilyLLMWebSocket {
  constructor(userId) {
    this.userId = userId;
    this.ws = null;
    this.messageHandlers = [];
  }

  connect() {
    this.ws = new WebSocket(`ws://localhost:8001/ws/${this.userId}`);
    this.ws.onopen = () => {
      console.log('WebSocket connected');
    };
    this.ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      this.handleMessage(data);
    };
    this.ws.onclose = () => {
      console.log('WebSocket connection closed');
    };
    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };
  }

  sendMessage(message, sessionId) {
    // Messages are dropped silently unless the socket is open
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({
        type: 'chat',
        message: message,
        session_id: sessionId
      }));
    }
  }

  addMessageHandler(handler) {
    this.messageHandlers.push(handler);
  }

  handleMessage(data) {
    this.messageHandlers.forEach(handler => handler(data));
  }

  disconnect() {
    if (this.ws) {
      this.ws.close();
    }
  }
}

// Usage example
const wsClient = new LilyLLMWebSocket('user123');
wsClient.addMessageHandler((data) => {
  console.log('Message received:', data);
});
wsClient.connect();
// Send only after the connection opens; the socket is still connecting right after connect()
wsClient.ws.addEventListener('open', () => {
  wsClient.sendMessage('Hello!', 'session123');
});
```
## 🚨 Troubleshooting
### Common Issues
#### 1. Server connection failure
**Symptom**: `Connection refused` or `Failed to establish a new connection`
**Solution**:
```bash
# Check server health
curl http://localhost:8001/health
# Restart the server
./scripts/deploy.sh restart
# Check the logs
./scripts/deploy.sh logs
```
#### 2. Out of memory
**Symptom**: `Out of memory` or slow responses
**Solution**:
```bash
# Check memory usage
docker stats
# Clean up unused containers
docker system prune -f
# Set a resource limit (in docker-compose.yml)
services:
  lily-llm-api:
    deploy:
      resources:
        limits:
          memory: 4G
```
#### 3. Model loading failure
**Symptom**: `Model not found` or other model-related errors
**Solution**:
```bash
# Check the model list
curl http://localhost:8001/models
# Check the model files
ls -la models/
# Restart the server
./scripts/deploy.sh restart
```
#### 4. Authentication errors
**Symptom**: `401 Unauthorized` or `403 Forbidden`
**Solution**:
```python
import requests

# Refresh the token
def refresh_token(token):
    url = "http://localhost:8001/auth/refresh"
    data = {"refresh_token": token}
    response = requests.post(url, data=data)
    return response.json()

# Make subsequent requests with the new token
# (old_refresh_token is a placeholder for the refresh token you already hold)
new_tokens = refresh_token(old_refresh_token)
access_token = new_tokens["access_token"]
```
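A client can apply the token refresh automatically when a request comes back with `401`. The sketch below is one way to do that. `post_with_refresh` is a hypothetical helper, and it assumes the refresh call returns a new `access_token` as in the snippet above. The request and the refresh are passed in as callables, so the pattern does not depend on any particular endpoint.

```python
def post_with_refresh(send, refresh, access_token):
    """Call send(token); on HTTP 401, refresh the token once and retry.

    send(token) must return an object with a .status_code attribute
    (for example a requests.Response). refresh() must return a dict
    containing a new "access_token".
    Returns (response, token_in_use).
    """
    response = send(access_token)
    if response.status_code != 401:
        return response, access_token
    # The token was rejected: obtain a new one and retry exactly once
    new_token = refresh()["access_token"]
    return send(new_token), new_token
```

With the helpers in this guide, `send` could be `lambda t: requests.post(url, data=data, headers={"Authorization": f"Bearer {t}"})` and `refresh` could be `lambda: refresh_token(old_refresh_token)`. Retrying only once avoids a loop when the refresh token itself is no longer valid.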
### Performance Optimization
#### 1. Batch processing
```python
# Reuses generate_text() from the Text Generation section
def batch_generate_texts(prompts, model_id="polyglot-ko-1.3b-chat"):
    results = []
    for prompt in prompts:
        result = generate_text(prompt, model_id)
        results.append(result)
    return results

# Usage example
prompts = [
    "This is the first question.",
    "This is the second question.",
    "This is the third question."
]
results = batch_generate_texts(prompts)
```
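The loop above sends prompts one at a time. Each call mostly waits on the network, so a thread pool can overlap the requests. Here is a minimal sketch with `concurrent.futures`. `run_batch` and `max_workers=4` are illustrative assumptions. Whether the server benefits from concurrent requests depends on how it schedules model inference, which this guide does not specify.

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(func, prompts, max_workers=4):
    """Apply func to every prompt concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs
        return list(pool.map(func, prompts))

# Stand-in function for demonstration; in practice pass generate_text
print(run_batch(str.upper, ["first", "second", "third"]))
# → ['FIRST', 'SECOND', 'THIRD']
```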
#### 2. Using caching
```python
import hashlib
import json
import redis

# Reuses generate_text() from the Text Generation section
class CachedLilyLLMClient:
    def __init__(self, base_url="http://localhost:8001"):
        self.base_url = base_url
        self.redis_client = redis.Redis(host='localhost', port=6379, db=0)

    def generate_text_with_cache(self, prompt, model_id="polyglot-ko-1.3b-chat"):
        # Build the cache key from a stable digest; the built-in hash() is
        # randomized per process for strings, so its keys would not survive a restart
        digest = hashlib.sha256(f"{model_id}:{prompt}".encode("utf-8")).hexdigest()
        cache_key = f"text_gen:{digest}"
        # Look in the cache first
        cached_result = self.redis_client.get(cache_key)
        if cached_result:
            return json.loads(cached_result)
        # Cache miss: call the API
        result = generate_text(prompt, model_id)
        # Store the result in the cache for 1 hour
        self.redis_client.setex(cache_key, 3600, json.dumps(result))
        return result
```
## 📚 Best Practices
### 1. Error Handling
```python
import requests
from requests.exceptions import RequestException

def safe_api_call(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except RequestException as e:
        print(f"Network error: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Usage example
result = safe_api_call(generate_text, "Hello!")
if result:
    print(result["generated_text"])
```
### 2. Retry Logic
```python
import time
from functools import wraps

def retry_on_failure(max_retries=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Re-raise the original error once the last attempt has failed
                    if attempt == max_retries - 1:
                        raise
                    print(f"Attempt {attempt + 1} failed, retrying in {delay} seconds...")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator

# Usage example
@retry_on_failure(max_retries=3, delay=2)
def robust_generate_text(prompt):
    return generate_text(prompt)
```
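A fixed delay retries at a constant rate. Growing the delay between attempts is gentler on a server that is already struggling. The following variation is a sketch, not part of the original client. `retry_with_backoff`, `base_delay`, and `factor` are hypothetical names, and it retries on any `Exception`, just like the decorator above.

```python
import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=1.0, factor=2.0):
    """Retry the wrapped function, multiplying the wait by `factor` after each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
                    delay *= factor
        return wrapper
    return decorator
```

With the defaults, a call that keeps failing waits 1 second, then 2 seconds, and raises after the third attempt.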
### 3. Asynchronous Processing
```python
import asyncio
import aiohttp

async def async_generate_text(session, prompt, model_id="polyglot-ko-1.3b-chat"):
    url = "http://localhost:8001/generate"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7
    }
    async with session.post(url, data=data) as response:
        return await response.json()

async def batch_generate_async(prompts):
    # Share one session across all requests
    async with aiohttp.ClientSession() as session:
        tasks = [async_generate_text(session, prompt) for prompt in prompts]
        results = await asyncio.gather(*tasks)
        return results

# Usage example
prompts = ["Question 1", "Question 2", "Question 3"]
results = asyncio.run(batch_generate_async(prompts))
```
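`asyncio.gather` as used above starts every request at once, which can overwhelm a single model server when the prompt list is long. A semaphore caps the number of requests in flight. Here is a minimal sketch using only `asyncio`. `gather_limited` and `limit=2` are hypothetical, and `fake_worker` stands in for a coroutine function such as a wrapper around `async_generate_text`.

```python
import asyncio

async def gather_limited(worker, items, limit=2):
    """Run worker(item) for every item with at most `limit` running at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as `items`
    return await asyncio.gather(*(guarded(item) for item in items))

async def fake_worker(prompt):
    await asyncio.sleep(0.01)  # stands in for the HTTP round trip
    return prompt.upper()

print(asyncio.run(gather_limited(fake_worker, ["a", "b", "c"])))
# → ['A', 'B', 'C']
```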
### 4. Logging
```python
import logging

# Logging configuration: write to a file and to the console
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('lily_llm_client.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

def generate_text_with_logging(prompt, model_id="polyglot-ko-1.3b-chat"):
    logger.info("Starting text generation: %s...", prompt[:50])
    try:
        result = generate_text(prompt, model_id)
        logger.info("Text generation succeeded: %d characters", len(result['generated_text']))
        return result
    except Exception as e:
        logger.error("Text generation failed: %s", e)
        raise
```
## 📞 Support
### Help Resources
- **API docs**: `http://localhost:8001/docs`
- **ReDoc docs**: `http://localhost:8001/redoc`
- **GitHub Issues**: the Issues section of the project repository
- **Log files**: the `./logs/` directory
### Debugging Tips
1. **Check the logs**: always look at the logs first
2. **Test step by step**: break complex requests into small units and test each one
3. **Check the network**: verify firewall and proxy settings
4. **Monitor resources**: check CPU, memory, and disk usage regularly
### Performance Tips
1. **Choose an appropriate model**: pick the model that fits the task
2. **Batch processing**: handle multiple requests together
3. **Use caching**: serve repeated requests from the cache
4. **Asynchronous processing**: process large volumes of requests asynchronously