Spaces:
Sleeping
Sleeping
Lily LLM API ์ฌ์ฉ์ ๊ฐ์ด๋
๐ ๋ชฉ์ฐจ
๐ ์์ํ๊ธฐ
์์คํ ์๊ตฌ์ฌํญ
์ต์ ์ฌ์:
- CPU: 4์ฝ์ด ์ด์
- RAM: 8GB ์ด์
- ์ ์ฅ๊ณต๊ฐ: 20GB ์ด์
- GPU: ์ ํ์ฌํญ (CUDA ์ง์ ์ ์ฑ๋ฅ ํฅ์)
๊ถ์ฅ ์ฌ์:
- CPU: 8์ฝ์ด ์ด์
- RAM: 16GB ์ด์
- ์ ์ฅ๊ณต๊ฐ: 50GB ์ด์
- GPU: NVIDIA RTX 3060 ์ด์ (CUDA ์ง์)
์ค์น ๋ฐ ์คํ
1. Docker๋ฅผ ์ฌ์ฉํ ๋ฐฐํฌ (๊ถ์ฅ)
# ์ ์ฅ์ ํด๋ก
git clone <repository-url>
cd lily_generate_package
# ๋ฐฐํฌ ์คํ
chmod +x scripts/deploy.sh
./scripts/deploy.sh deploy
# ์ํ ํ์ธ
./scripts/deploy.sh status
2. ๋ก์ปฌ ๊ฐ๋ฐ ํ๊ฒฝ
# ๊ฐ์ํ๊ฒฝ ์์ฑ
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# ์์กด์ฑ ์ค์น
pip install -r requirements.txt
# NLTK ๋ฐ์ดํฐ ๋ค์ด๋ก๋
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
# ์๋ฒ ์คํ
python run_server_v2.py
์ฒซ ๋ฒ์งธ ์์ฒญ
# ์๋ฒ ์ํ ํ์ธ
curl http://localhost:8001/health
# ๋ชจ๋ธ ๋ชฉ๋ก ์กฐํ
curl http://localhost:8001/models
# ๊ฐ๋จํ ํ
์คํธ ์์ฑ
curl -X POST http://localhost:8001/generate \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "prompt=์๋
ํ์ธ์!&model_id=polyglot-ko-1.3b-chat&max_length=100"
๐ค ๊ธฐ๋ณธ ๊ธฐ๋ฅ
1. ํ ์คํธ ์์ฑ
๋จ์ ํ ์คํธ ์์ฑ
import requests
def generate_text(prompt, model_id="polyglot-ko-1.3b-chat"):
    """Generate text via the Lily LLM `/generate` endpoint.

    Args:
        prompt: Input text for the model.
        model_id: Model identifier (default: "polyglot-ko-1.3b-chat").

    Returns:
        Parsed JSON response dict (contains "generated_text").
    """
    url = "http://localhost:8001/generate"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True,
    }
    response = requests.post(url, data=data)
    return response.json()

# Usage example
result = generate_text("인공지능의 미래에 대해 설명해주세요.")
print(result["generated_text"])
ํ๋ผ๋ฏธํฐ ์ค๋ช
| ํ๋ผ๋ฏธํฐ | ์ค๋ช | ๊ธฐ๋ณธ๊ฐ | ๋ฒ์ |
|---|---|---|---|
prompt |
์ ๋ ฅ ํ ์คํธ | ํ์ | - |
model_id |
์ฌ์ฉํ ๋ชจ๋ธ | polyglot-ko-1.3b-chat | ์ฌ์ฉ ๊ฐ๋ฅํ ๋ชจ๋ธ ๋ชฉ๋ก |
max_length |
์ต๋ ํ ํฐ ์ | 200 | 1-4000 |
temperature |
์ฐฝ์์ฑ ์กฐ์ | 0.7 | 0.0-2.0 |
top_p |
๋์ ํ๋ฅ ์๊ณ๊ฐ | 0.9 | 0.0-1.0 |
do_sample |
์ํ๋ง ์ฌ์ฉ ์ฌ๋ถ | True | True/False |
2. ๋ฉํฐ๋ชจ๋ฌ ์ฒ๋ฆฌ
์ด๋ฏธ์ง์ ํ ์คํธ ํจ๊ป ์ฒ๋ฆฌ
def generate_multimodal(prompt, image_files, model_id="kanana-1.5-v-3b-instruct"):
    """Send a prompt plus one or more images to `/generate-multimodal`.

    Args:
        prompt: Text prompt describing the request.
        image_files: Iterable of image file paths (sent as JPEG parts).
        model_id: Multimodal model identifier.

    Returns:
        Parsed JSON response dict (contains "generated_text").
    """
    url = "http://localhost:8001/generate-multimodal"
    files = []
    for i, image_path in enumerate(image_files):
        # Read each image eagerly so no file handle is leaked
        # (the original passed open() objects that were never closed).
        with open(image_path, 'rb') as f:
            files.append(('image_files', (f'image_{i}.jpg', f.read(), 'image/jpeg')))
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7
    }
    response = requests.post(url, files=files, data=data)
    return response.json()

# Usage example
result = generate_multimodal(
    "이 이미지에 대해 설명해주세요.",
    ["image1.jpg", "image2.jpg"]
)
print(result["generated_text"])
3. ์ฌ์ฉ์ ๊ด๋ฆฌ
์ฌ์ฉ์ ๋ฑ๋ก ๋ฐ ๋ก๊ทธ์ธ
def register_user(username, email, password):
    """Create a new account via `/auth/register`; returns the JSON response."""
    payload = {
        "username": username,
        "email": email,
        "password": password
    }
    resp = requests.post("http://localhost:8001/auth/register", data=payload)
    return resp.json()

def login_user(username, password):
    """Log in via `/auth/login`; returns the JSON response with tokens."""
    payload = {
        "username": username,
        "password": password
    }
    resp = requests.post("http://localhost:8001/auth/login", data=payload)
    return resp.json()

# Usage example
# 1. Register a user
register_result = register_user("testuser", "test@example.com", "password123")
access_token = register_result["access_token"]
# 2. Log in
login_result = login_user("testuser", "password123")
access_token = login_result["access_token"]
์ธ์ฆ์ด ํ์ํ ์์ฒญ
def authenticated_request(url, data, token):
    """POST *data* to *url* with a Bearer token; returns the JSON body.

    Args:
        url: Full endpoint URL.
        data: Form data dict to send.
        token: Access token obtained from login/register.
    """
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

# Usage example
result = authenticated_request(
    "http://localhost:8001/generate",
    {"prompt": "안녕하세요!", "model_id": "polyglot-ko-1.3b-chat"},
    access_token
)
๐ ๊ณ ๊ธ ๊ธฐ๋ฅ
1. ๋ฌธ์ ์ฒ๋ฆฌ (RAG)
๋ฌธ์ ์ ๋ก๋
def upload_document(file_path, user_id, token=None):
    """Upload a document for RAG indexing; returns JSON with "document_id"."""
    url = "http://localhost:8001/document/upload"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    with open(file_path, 'rb') as fh:
        response = requests.post(
            url,
            files={'file': fh},
            data={'user_id': user_id},
            headers=headers,
        )
    return response.json()

# Usage example
result = upload_document("document.pdf", "user123", access_token)
document_id = result["document_id"]
RAG ์ฟผ๋ฆฌ
def rag_query(query, user_id, token=None):
    """Run a RAG query over the user's uploaded documents.

    Returns JSON with "response" (the answer) and "sources" (citations).
    """
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    payload = {
        "query": query,
        "user_id": user_id,
        "max_length": 300,
        "temperature": 0.7,
    }
    resp = requests.post("http://localhost:8001/rag/generate",
                         data=payload, headers=headers)
    return resp.json()

# Usage example
result = rag_query("인공지능의 미래에 대해 알려주세요.", "user123", access_token)
print(result["response"])
print("출처:", result["sources"])
ํ์ด๋ธ๋ฆฌ๋ RAG (์ด๋ฏธ์ง + ๋ฌธ์)
def hybrid_rag_query(query, image_files, user_id, token=None):
    """Hybrid RAG query combining the user's documents with images.

    Args:
        query: Text query.
        image_files: Iterable of image file paths (sent as JPEG parts).
        user_id: Owner of the indexed documents.
        token: Optional access token for authenticated requests.

    Returns:
        Parsed JSON response dict.
    """
    url = "http://localhost:8001/rag/generate-hybrid"
    files = []
    for i, image_path in enumerate(image_files):
        # Read each image eagerly so no file handle is leaked
        # (the original passed open() objects that were never closed).
        with open(image_path, 'rb') as f:
            files.append(('image_files', (f'image_{i}.jpg', f.read(), 'image/jpeg')))
    data = {
        "query": query,
        "user_id": user_id,
        "max_length": 300,
        "temperature": 0.7
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, files=files, data=data, headers=headers)
    return response.json()
2. ์ฑํ ์ธ์ ๊ด๋ฆฌ
์ธ์ ์์ฑ ๋ฐ ๋ฉ์์ง ๊ด๋ฆฌ
def create_chat_session(user_id, session_name, token=None):
    """Create a chat session; returns JSON containing "session_id"."""
    url = "http://localhost:8001/session/create"
    data = {
        "user_id": user_id,
        "session_name": session_name
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def add_chat_message(session_id, user_id, content, token=None):
    """Append a text message to an existing chat session."""
    url = "http://localhost:8001/chat/message"
    data = {
        "session_id": session_id,
        "user_id": user_id,
        "message_type": "text",
        "content": content
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def get_chat_history(session_id, token=None):
    """Fetch the full message history of a chat session as a JSON list."""
    url = f"http://localhost:8001/chat/history/{session_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Create a session
session_result = create_chat_session("user123", "AI 상담", access_token)
session_id = session_result["session_id"]
# 2. Add a message
add_chat_message(session_id, "user123", "안녕하세요!", access_token)
# 3. Fetch the chat history
history = get_chat_history(session_id, access_token)
for message in history:
    print(f"{message['timestamp']}: {message['content']}")
3. ๋ฐฑ๊ทธ๋ผ์ด๋ ์์
๋ฌธ์ ์ฒ๋ฆฌ ์์
def start_document_processing(file_path, user_id, token=None):
    """Queue a background document-processing task; returns JSON with "task_id"."""
    url = "http://localhost:8001/tasks/document/process"
    data = {
        "file_path": file_path,
        "user_id": user_id
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def check_task_status(task_id, token=None):
    """Fetch the current status JSON of a background task."""
    url = f"http://localhost:8001/tasks/{task_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Start a task
task_result = start_document_processing("/path/to/document.pdf", "user123", access_token)
task_id = task_result["task_id"]
# 2. Poll the task status every 5 seconds.
# NOTE(review): this polls indefinitely until SUCCESS/FAILURE —
# add a timeout or max-attempt limit in production code.
import time
while True:
    status = check_task_status(task_id, access_token)
    print(f"상태: {status['status']}, 진행률: {status.get('progress', 0)}%")
    if status['status'] in ['SUCCESS', 'FAILURE']:
        break
    time.sleep(5)
4. ๋ชจ๋ํฐ๋ง
์ฑ๋ฅ ๋ชจ๋ํฐ๋ง
def start_monitoring():
    """Start server-side performance monitoring; returns the JSON response."""
    url = "http://localhost:8001/monitoring/start"
    response = requests.post(url)
    return response.json()

def get_monitoring_status():
    """Return current monitoring metrics (JSON with "current_metrics")."""
    url = "http://localhost:8001/monitoring/status"
    response = requests.get(url)
    return response.json()

def get_system_health():
    """Return overall system health (JSON with "status" and "recommendations")."""
    url = "http://localhost:8001/monitoring/health"
    response = requests.get(url)
    return response.json()

# Usage example
# 1. Start monitoring
start_monitoring()
# 2. Check current metrics
status = get_monitoring_status()
print(f"CPU 사용률: {status['current_metrics']['cpu_percent']}%")
print(f"메모리 사용률: {status['current_metrics']['memory_percent']}%")
# 3. System health
health = get_system_health()
print(f"시스템 상태: {health['status']}")
for recommendation in health['recommendations']:
    print(f"권장사항: {recommendation}")
๐ WebSocket ์ค์๊ฐ ์ฑํ
WebSocket ํด๋ผ์ด์ธํธ
// WebSocket client for the Lily LLM real-time chat endpoint.
class LilyLLMWebSocket {
    constructor(userId) {
        this.userId = userId;
        this.ws = null;
        this.messageHandlers = [];
    }

    // Open the connection and wire up all lifecycle callbacks.
    connect() {
        this.ws = new WebSocket(`ws://localhost:8001/ws/${this.userId}`);
        this.ws.onopen = () => {
            console.log('WebSocket 연결됨');
        };
        this.ws.onmessage = (event) => {
            const data = JSON.parse(event.data);
            this.handleMessage(data);
        };
        this.ws.onclose = () => {
            console.log('WebSocket 연결 종료');
        };
        this.ws.onerror = (error) => {
            console.error('WebSocket 오류:', error);
        };
    }

    // Send a chat message only when the socket is open.
    sendMessage(message, sessionId) {
        if (this.ws && this.ws.readyState === WebSocket.OPEN) {
            this.ws.send(JSON.stringify({
                type: 'chat',
                message: message,
                session_id: sessionId
            }));
        }
    }

    // Register a callback invoked for every incoming message.
    addMessageHandler(handler) {
        this.messageHandlers.push(handler);
    }

    handleMessage(data) {
        this.messageHandlers.forEach(handler => handler(data));
    }

    disconnect() {
        if (this.ws) {
            this.ws.close();
        }
    }
}

// Usage example
const wsClient = new LilyLLMWebSocket('user123');
wsClient.connect();
wsClient.addMessageHandler((data) => {
    console.log('메시지 수신:', data);
});
wsClient.sendMessage('안녕하세요!', 'session123');
๐จ ๋ฌธ์ ํด๊ฒฐ
์ผ๋ฐ์ ์ธ ๋ฌธ์ ๋ค
1. ์๋ฒ ์ฐ๊ฒฐ ์คํจ
์ฆ์: Connection refused ๋๋ Failed to establish a new connection
ํด๊ฒฐ ๋ฐฉ๋ฒ:
# ์๋ฒ ์ํ ํ์ธ
curl http://localhost:8001/health
# ์๋ฒ ์ฌ์์
./scripts/deploy.sh restart
# ๋ก๊ทธ ํ์ธ
./scripts/deploy.sh logs
2. ๋ฉ๋ชจ๋ฆฌ ๋ถ์กฑ
์ฆ์: Out of memory ๋๋ ์๋ต ์๋ ์ ํ
ํด๊ฒฐ ๋ฐฉ๋ฒ:
# ๋ฉ๋ชจ๋ฆฌ ์ฌ์ฉ๋ ํ์ธ
docker stats
# ๋ถํ์ํ ์ปจํ
์ด๋ ์ ๋ฆฌ
docker system prune -f
# ๋ฆฌ์์ค ์ ํ ์ค์ (docker-compose.yml)
services:
lily-llm-api:
deploy:
resources:
limits:
memory: 4G
3. ๋ชจ๋ธ ๋ก๋ฉ ์คํจ
์ฆ์: Model not found ๋๋ ๋ชจ๋ธ ๊ด๋ จ ์ค๋ฅ
ํด๊ฒฐ ๋ฐฉ๋ฒ:
# ๋ชจ๋ธ ๋ชฉ๋ก ํ์ธ
curl http://localhost:8001/models
# ๋ชจ๋ธ ํ์ผ ํ์ธ
ls -la models/
# ์๋ฒ ์ฌ์์
./scripts/deploy.sh restart
4. ์ธ์ฆ ์ค๋ฅ
์ฆ์: 401 Unauthorized ๋๋ 403 Forbidden
ํด๊ฒฐ ๋ฐฉ๋ฒ:
# ํ ํฐ ๊ฐฑ์
def refresh_token(refresh_token):
    """Exchange a refresh token for a new access/refresh token pair."""
    payload = {"refresh_token": refresh_token}
    resp = requests.post("http://localhost:8001/auth/refresh", data=payload)
    return resp.json()

# Retry the request with the refreshed token
new_tokens = refresh_token(old_refresh_token)
access_token = new_tokens["access_token"]
์ฑ๋ฅ ์ต์ ํ
1. ๋ฐฐ์น ์ฒ๋ฆฌ
def batch_generate_texts(prompts, model_id="polyglot-ko-1.3b-chat"):
    """Generate text for each prompt sequentially.

    Args:
        prompts: Iterable of prompt strings.
        model_id: Model identifier passed through to generate_text().

    Returns:
        List of JSON response dicts, one per prompt (order preserved).
    """
    return [generate_text(prompt, model_id) for prompt in prompts]

# Usage example
prompts = [
    "첫 번째 질문입니다.",
    "두 번째 질문입니다.",
    "세 번째 질문입니다."
]
results = batch_generate_texts(prompts)
2. ์บ์ฑ ํ์ฉ
import hashlib
import json

import redis
class CachedLilyLLMClient:
    """Client that caches `/generate` responses in Redis for one hour."""

    def __init__(self, base_url="http://localhost:8001"):
        self.base_url = base_url
        self.redis_client = redis.Redis(host='localhost', port=6379, db=0)

    def generate_text_with_cache(self, prompt, model_id="polyglot-ko-1.3b-chat"):
        """Return a cached generation result, calling the API on a miss.

        The cache key uses a SHA-256 digest: the original used the built-in
        hash(), whose value changes between interpreter runs
        (PYTHONHASHSEED), so cached entries could never be hit again after
        a restart.
        """
        digest = hashlib.sha256(f"{prompt}{model_id}".encode("utf-8")).hexdigest()
        cache_key = f"text_gen:{digest}"
        # Check the cache first.
        cached_result = self.redis_client.get(cache_key)
        if cached_result:
            return json.loads(cached_result)
        # Cache miss: call the API and store the result for 1 hour.
        result = generate_text(prompt, model_id)
        self.redis_client.setex(cache_key, 3600, json.dumps(result))
        return result
๐ ๋ชจ๋ฒ ์ฌ๋ก
1. ์๋ฌ ์ฒ๋ฆฌ
import requests
from requests.exceptions import RequestException
def safe_api_call(func, *args, **kwargs):
    """Call *func*, returning None (after printing a message) on failure.

    Network errors (requests.RequestException) are reported separately
    from other unexpected exceptions.
    """
    try:
        return func(*args, **kwargs)
    except RequestException as e:
        print(f"네트워크 오류: {e}")
        return None
    except Exception as e:
        print(f"예상치 못한 오류: {e}")
        return None

# Usage example
result = safe_api_call(generate_text, "안녕하세요!")
if result:
    print(result["generated_text"])
2. ์ฌ์๋ ๋ก์ง
import time
from functools import wraps
def retry_on_failure(max_retries=3, delay=1):
    """Decorator factory: retry the wrapped function on any exception.

    Args:
        max_retries: Total number of attempts before giving up.
        delay: Seconds to sleep between attempts.

    The final failure is re-raised with its original traceback.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        # Bare raise preserves the original traceback
                        # (the original's `raise e` re-anchored it here).
                        raise
                    print(f"시도 {attempt + 1} 실패, {delay}초 후 재시도...")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator

# Usage example
@retry_on_failure(max_retries=3, delay=2)
def robust_generate_text(prompt):
    return generate_text(prompt)
3. ๋น๋๊ธฐ ์ฒ๋ฆฌ
import asyncio
import aiohttp
async def async_generate_text(session, prompt, model_id="polyglot-ko-1.3b-chat"):
    """Generate text asynchronously using an aiohttp session."""
    payload = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7,
    }
    async with session.post("http://localhost:8001/generate", data=payload) as resp:
        return await resp.json()

async def batch_generate_async(prompts):
    """Fan out one generation request per prompt and gather all results."""
    async with aiohttp.ClientSession() as http:
        pending = [async_generate_text(http, p) for p in prompts]
        return await asyncio.gather(*pending)

# Usage example
prompts = ["질문1", "질문2", "질문3"]
results = asyncio.run(batch_generate_async(prompts))
4. ๋ก๊น
import logging
# ๋ก๊น
์ค์
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('lily_llm_client.log'),
logging.StreamHandler()
]
)
logger = logging.getLogger(__name__)
def generate_text_with_logging(prompt, model_id="polyglot-ko-1.3b-chat"):
logger.info(f"ํ
์คํธ ์์ฑ ์์: {prompt[:50]}...")
try:
result = generate_text(prompt, model_id)
logger.info(f"ํ
์คํธ ์์ฑ ์ฑ๊ณต: {len(result['generated_text'])} ๋ฌธ์")
return result
except Exception as e:
logger.error(f"ํ
์คํธ ์์ฑ ์คํจ: {e}")
raise
๐ ์ง์
๋์๋ง ๋ฆฌ์์ค
- API ๋ฌธ์:
http://localhost:8001/docs - ReDoc ๋ฌธ์:
http://localhost:8001/redoc - GitHub Issues: ํ๋ก์ ํธ ์ ์ฅ์์ Issues ์น์
- ๋ก๊ทธ ํ์ผ:
./logs/๋๋ ํ ๋ฆฌ
๋๋ฒ๊น ํ
- ๋ก๊ทธ ํ์ธ: ํญ์ ๋ก๊ทธ๋ฅผ ๋จผ์ ํ์ธํ์ธ์
- ๋จ๊ณ๋ณ ํ ์คํธ: ๋ณต์กํ ์์ฒญ์ ์์ ๋จ์๋ก ๋๋์ด ํ ์คํธํ์ธ์
- ๋คํธ์ํฌ ํ์ธ: ๋ฐฉํ๋ฒฝ์ด๋ ํ๋ก์ ์ค์ ์ ํ์ธํ์ธ์
- ๋ฆฌ์์ค ๋ชจ๋ํฐ๋ง: CPU, ๋ฉ๋ชจ๋ฆฌ, ๋์คํฌ ์ฌ์ฉ๋์ ์ฃผ๊ธฐ์ ์ผ๋ก ํ์ธํ์ธ์
์ฑ๋ฅ ํ
- ์ ์ ํ ๋ชจ๋ธ ์ ํ: ์์ ์ ๋ง๋ ๋ชจ๋ธ์ ์ ํํ์ธ์
- ๋ฐฐ์น ์ฒ๋ฆฌ: ์ฌ๋ฌ ์์ฒญ์ ํ ๋ฒ์ ์ฒ๋ฆฌํ์ธ์
- ์บ์ฑ ํ์ฉ: ๋ฐ๋ณต๋๋ ์์ฒญ์ ์บ์๋ฅผ ์ฌ์ฉํ์ธ์
- ๋น๋๊ธฐ ์ฒ๋ฆฌ: ๋๋์ ์์ฒญ์ ๋น๋๊ธฐ๋ก ์ฒ๋ฆฌํ์ธ์