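# Lily LLM API User Guide

## 📋 Table of Contents

1. [Getting Started](#getting-started)
2. [Basic Features](#basic-features)
3. [Advanced Features](#advanced-features)
4. [Troubleshooting](#troubleshooting)
5. [Best Practices](#best-practices)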

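## 🚀 Getting Started

### System Requirements

- Minimum:
  - CPU: 4 cores or more
  - RAM: 8 GB or more
  - Storage: 20 GB or more
  - GPU: optional (CUDA support improves performance)

- Recommended:
  - CPU: 8 cores or more
  - RAM: 16 GB or more
  - Storage: 50 GB or more
  - GPU: NVIDIA RTX 3060 or better (CUDA support)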
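
The minimum figures above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` and the `MINIMUM` dictionary are illustrative and not part of Lily LLM API. RAM is passed in by the caller because the standard library has no portable way to read it.

```python
import os
import shutil

# Minimum specification from this guide; the dictionary layout is illustrative.
MINIMUM = {"cpu_cores": 4, "ram_gb": 8, "disk_gb": 20}

def meets_minimum(cpu_cores, ram_gb, disk_gb, minimum=MINIMUM):
    """Return the names of the resources that fall below the minimum."""
    measured = {"cpu_cores": cpu_cores, "ram_gb": ram_gb, "disk_gb": disk_gb}
    return [name for name, need in minimum.items() if measured[name] < need]

if __name__ == "__main__":
    free_disk_gb = shutil.disk_usage(".").free / 1024**3
    # RAM is entered by hand here (assumed value, adjust to your machine).
    shortfalls = meets_minimum(os.cpu_count() or 0, ram_gb=8, disk_gb=free_disk_gb)
    print("OK" if not shortfalls else f"Below minimum: {shortfalls}")
```

An empty list means every measured value meets the minimum specification.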

### Installation and Running

#### 1. Deploying with Docker (recommended)

```bash
# Clone the repository
git clone <repository-url>
cd lily_generate_package

# Run the deployment
chmod +x scripts/deploy.sh
./scripts/deploy.sh deploy

# Check the status
./scripts/deploy.sh status
```

#### 2. Local Development Environment

```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"

# Start the server
python run_server_v2.py
```

### First Request

```bash
# Check that the server is up
curl http://localhost:8001/health

# List the available models
curl http://localhost:8001/models

# Generate a short text
# (single quotes keep an interactive shell from treating "!" as history expansion)
curl -X POST http://localhost:8001/generate \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d 'prompt=Hello!&model_id=polyglot-ko-1.3b-chat&max_length=100'
```

## 🤖 Basic Features

### 1. Text Generation

#### Simple Text Generation

```python
import requests

def generate_text(prompt, model_id="polyglot-ko-1.3b-chat"):
    url = "http://localhost:8001/generate"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True,
    }

    # Parameters are sent as form fields, not as a JSON body
    response = requests.post(url, data=data)
    return response.json()

# Usage example
result = generate_text("Please explain the future of artificial intelligence.")
print(result["generated_text"])
```

#### Parameters

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| `prompt` | Input text | Required | - |
| `model_id` | Model to use | polyglot-ko-1.3b-chat | Any available model |
| `max_length` | Maximum number of tokens | 200 | 1-4000 |
| `temperature` | Sampling randomness (higher is more creative) | 0.7 | 0.0-2.0 |
| `top_p` | Cumulative probability threshold | 0.9 | 0.0-1.0 |
| `do_sample` | Whether to use sampling | True | True/False |
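
The ranges in the table can be enforced on the client before a request is sent. The following is a minimal sketch; `validate_params` and `RANGES` are hypothetical names, and the server may apply its own, possibly different, validation.

```python
# Allowed ranges copied from the parameter table; the helper itself is illustrative.
RANGES = {
    "max_length": (1, 4000),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
}

def validate_params(params):
    """Raise ValueError if a known parameter is outside its documented range."""
    if not params.get("prompt"):
        raise ValueError("prompt is required")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]} is outside {low}-{high}")
    return params
```

Calling `validate_params(data)` just before `requests.post(...)` turns an out-of-range value into a local error instead of a round trip to the server.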

### 2. Multimodal Processing

#### Processing Images Together with Text

```python
import requests

def generate_multimodal(prompt, image_files, model_id="kanana-1.5-v-3b-instruct"):
    url = "http://localhost:8001/generate-multimodal"

    # Open every image and make sure the handles are closed after the upload
    handles = [open(path, "rb") for path in image_files]
    try:
        files = [
            ("image_files", (f"image_{i}.jpg", handle, "image/jpeg"))
            for i, handle in enumerate(handles)
        ]
        data = {
            "prompt": prompt,
            "model_id": model_id,
            "max_length": 200,
            "temperature": 0.7,
        }

        response = requests.post(url, files=files, data=data)
        return response.json()
    finally:
        for handle in handles:
            handle.close()

# Usage example
result = generate_multimodal(
    "Please describe this image.",
    ["image1.jpg", "image2.jpg"],
)
print(result["generated_text"])
```

### 3. User Management

#### User Registration and Login

```python
import requests

def register_user(username, email, password):
    url = "http://localhost:8001/auth/register"
    data = {
        "username": username,
        "email": email,
        "password": password,
    }

    response = requests.post(url, data=data)
    return response.json()

def login_user(username, password):
    url = "http://localhost:8001/auth/login"
    data = {
        "username": username,
        "password": password,
    }

    response = requests.post(url, data=data)
    return response.json()

# Usage example
# 1. Register a user
register_result = register_user("testuser", "test@example.com", "password123")
access_token = register_result["access_token"]

# 2. Log in
login_result = login_user("testuser", "password123")
access_token = login_result["access_token"]
```

#### Requests That Require Authentication

```python
import requests

def authenticated_request(url, data, token):
    # Pass the access token as a Bearer token
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

# Usage example
result = authenticated_request(
    "http://localhost:8001/generate",
    {"prompt": "Hello!", "model_id": "polyglot-ko-1.3b-chat"},
    access_token,
)
```

## 📄 Advanced Features

### 1. Document Processing (RAG)

#### Uploading a Document

```python
import requests

def upload_document(file_path, user_id, token=None):
    url = "http://localhost:8001/document/upload"

    # The request must be sent while the file is still open
    with open(file_path, "rb") as f:
        files = {"file": f}
        data = {"user_id": user_id}
        headers = {"Authorization": f"Bearer {token}"} if token else {}

        response = requests.post(url, files=files, data=data, headers=headers)
    return response.json()

# Usage example
result = upload_document("document.pdf", "user123", access_token)
document_id = result["document_id"]
```

#### RAG Query

```python
import requests

def rag_query(query, user_id, token=None):
    url = "http://localhost:8001/rag/generate"

    data = {
        "query": query,
        "user_id": user_id,
        "max_length": 300,
        "temperature": 0.7,
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    response = requests.post(url, data=data, headers=headers)
    return response.json()

# Usage example
result = rag_query("Tell me about the future of artificial intelligence.", "user123", access_token)
print(result["response"])
print("Sources:", result["sources"])
```

#### Hybrid RAG (Images + Documents)

```python
import requests

def hybrid_rag_query(query, image_files, user_id, token=None):
    url = "http://localhost:8001/rag/generate-hybrid"

    # Open every image and make sure the handles are closed after the upload
    handles = [open(path, "rb") for path in image_files]
    try:
        files = [
            ("image_files", (f"image_{i}.jpg", handle, "image/jpeg"))
            for i, handle in enumerate(handles)
        ]
        data = {
            "query": query,
            "user_id": user_id,
            "max_length": 300,
            "temperature": 0.7,
        }
        headers = {"Authorization": f"Bearer {token}"} if token else {}

        response = requests.post(url, files=files, data=data, headers=headers)
        return response.json()
    finally:
        for handle in handles:
            handle.close()
```

### 2. Chat Session Management

#### Creating Sessions and Managing Messages

```python
import requests

def create_chat_session(user_id, session_name, token=None):
    url = "http://localhost:8001/session/create"

    data = {
        "user_id": user_id,
        "session_name": session_name,
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    response = requests.post(url, data=data, headers=headers)
    return response.json()

def add_chat_message(session_id, user_id, content, token=None):
    url = "http://localhost:8001/chat/message"

    data = {
        "session_id": session_id,
        "user_id": user_id,
        "message_type": "text",
        "content": content,
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    response = requests.post(url, data=data, headers=headers)
    return response.json()

def get_chat_history(session_id, token=None):
    url = f"http://localhost:8001/chat/history/{session_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Create a session
session_result = create_chat_session("user123", "AI consultation", access_token)
session_id = session_result["session_id"]

# 2. Add a message
add_chat_message(session_id, "user123", "Hello!", access_token)

# 3. Fetch the chat history
history = get_chat_history(session_id, access_token)
for message in history:
    print(f"{message['timestamp']}: {message['content']}")
```

### 3. Background Tasks

#### Document Processing Tasks

```python
import time

import requests

def start_document_processing(file_path, user_id, token=None):
    url = "http://localhost:8001/tasks/document/process"

    data = {
        "file_path": file_path,
        "user_id": user_id,
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    response = requests.post(url, data=data, headers=headers)
    return response.json()

def check_task_status(task_id, token=None):
    url = f"http://localhost:8001/tasks/{task_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Start the task
task_result = start_document_processing("/path/to/document.pdf", "user123", access_token)
task_id = task_result["task_id"]

# 2. Poll the task status every 5 seconds until it finishes
while True:
    status = check_task_status(task_id, access_token)
    print(f"Status: {status['status']}, progress: {status.get('progress', 0)}%")

    if status["status"] in ["SUCCESS", "FAILURE"]:
        break

    time.sleep(5)
```
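
A polling loop with no upper bound runs forever if a task never reaches `SUCCESS` or `FAILURE`. The following is a sketch of a bounded variant; `wait_for_task` is a hypothetical helper, and the default timeout and interval are assumed values rather than recommendations from the project. The status function and the sleep function are injected so the helper can be tried without a running server.

```python
import time

def wait_for_task(get_status, timeout=300, interval=5, sleep=time.sleep):
    """Poll get_status() until it reports SUCCESS or FAILURE, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status["status"] in ("SUCCESS", "FAILURE"):
            return status
        # Give up if the next poll would land after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"task still {status['status']} after {timeout}s")
        sleep(interval)
```

With the functions from this section it could be called as `wait_for_task(lambda: check_task_status(task_id, access_token))`.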

### 4. Monitoring

#### Performance Monitoring

```python
import requests

def start_monitoring():
    url = "http://localhost:8001/monitoring/start"
    response = requests.post(url)
    return response.json()

def get_monitoring_status():
    url = "http://localhost:8001/monitoring/status"
    response = requests.get(url)
    return response.json()

def get_system_health():
    url = "http://localhost:8001/monitoring/health"
    response = requests.get(url)
    return response.json()

# Usage example
# 1. Start monitoring
start_monitoring()

# 2. Check the current metrics
status = get_monitoring_status()
print(f"CPU usage: {status['current_metrics']['cpu_percent']}%")
print(f"Memory usage: {status['current_metrics']['memory_percent']}%")

# 3. Check system health
health = get_system_health()
print(f"System status: {health['status']}")
for recommendation in health["recommendations"]:
    print(f"Recommendation: {recommendation}")
```
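
The `current_metrics` values can also be compared against client-side thresholds, for example to raise an alert. This is a minimal sketch; `find_overloaded` is a hypothetical helper, and the limits of 90% CPU and 85% memory are assumptions, not values defined by the API.

```python
def find_overloaded(current_metrics, limits=None):
    """Return the metrics whose value exceeds the configured limit."""
    # Default limits are illustrative; tune them for your deployment.
    limits = limits or {"cpu_percent": 90.0, "memory_percent": 85.0}
    return {
        name: value
        for name, value in current_metrics.items()
        if name in limits and value > limits[name]
    }
```

For instance, `find_overloaded(status["current_metrics"])` returns an empty dictionary while both metrics stay under their limits.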

## 🔌 Real-Time Chat over WebSocket

### WebSocket Client

```javascript
class LilyLLMWebSocket {
  constructor(userId) {
    this.userId = userId;
    this.ws = null;
    this.messageHandlers = [];
  }

  connect() {
    this.ws = new WebSocket(`ws://localhost:8001/ws/${this.userId}`);

    this.ws.onopen = () => {
      console.log('WebSocket connected');
    };

    this.ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      this.handleMessage(data);
    };

    this.ws.onclose = () => {
      console.log('WebSocket closed');
    };

    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };
  }

  sendMessage(message, sessionId) {
    // Messages are only sent while the socket is open
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({
        type: 'chat',
        message: message,
        session_id: sessionId
      }));
    }
  }

  addMessageHandler(handler) {
    this.messageHandlers.push(handler);
  }

  handleMessage(data) {
    this.messageHandlers.forEach(handler => handler(data));
  }

  disconnect() {
    if (this.ws) {
      this.ws.close();
    }
  }
}

// Usage example
const wsClient = new LilyLLMWebSocket('user123');

wsClient.addMessageHandler((data) => {
  console.log('Message received:', data);
});

wsClient.connect();

// sendMessage() drops messages until the socket is open, so wait for the open event
wsClient.ws.addEventListener('open', () => {
  wsClient.sendMessage('Hello!', 'session123');
});
```

## 🚨 Troubleshooting

### Common Problems

#### 1. Server Connection Failure

Symptom: `Connection refused` or `Failed to establish a new connection`

Solution:
```bash
# Check that the server is up
curl http://localhost:8001/health

# Restart the server
./scripts/deploy.sh restart

# Check the logs
./scripts/deploy.sh logs
```

#### 2. Out of Memory

Symptom: `Out of memory` or slow responses

Solution:
```bash
# Check memory usage
docker stats

# Remove stopped containers, unused networks, and dangling images
docker system prune -f
```

Set a resource limit in `docker-compose.yml`:
```yaml
services:
  lily-llm-api:
    deploy:
      resources:
        limits:
          memory: 4G
```

#### 3. Model Loading Failure

Symptom: `Model not found` or other model-related errors

Solution:
```bash
# List the available models
curl http://localhost:8001/models

# Check the model files
ls -la models/

# Restart the server
./scripts/deploy.sh restart
```

#### 4. Authentication Errors

Symptom: `401 Unauthorized` or `403 Forbidden`

Solution:
```python
import requests

# Refresh the token
def refresh_token(token):
    url = "http://localhost:8001/auth/refresh"
    data = {"refresh_token": token}
    response = requests.post(url, data=data)
    return response.json()

# Send subsequent requests with the new access token
# (old_refresh_token is a placeholder for your current refresh token)
new_tokens = refresh_token(old_refresh_token)
access_token = new_tokens["access_token"]
```

### Performance Optimization

#### 1. Batch Processing

```python
# generate_text() is defined in the Text Generation section
def batch_generate_texts(prompts, model_id="polyglot-ko-1.3b-chat"):
    results = []
    for prompt in prompts:
        result = generate_text(prompt, model_id)
        results.append(result)
    return results

# Usage example
prompts = [
    "This is the first question.",
    "This is the second question.",
    "This is the third question.",
]
results = batch_generate_texts(prompts)
```
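
A sequential loop waits for each response before sending the next request. The following sketch overlaps the waiting time with a thread pool from the standard library; `batch_generate_parallel` and its `generate` argument are illustrative names, and whether the server benefits from concurrent requests is an assumption that depends on the deployment.

```python
from concurrent.futures import ThreadPoolExecutor

def batch_generate_parallel(prompts, generate, max_workers=4):
    """Run generate(prompt) for every prompt on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of the inputs.
        return list(pool.map(generate, prompts))
```

Passing `generate_text` as the `generate` argument would send up to `max_workers` requests at a time.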

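#### 2. Using a Cache

```python
import hashlib
import json

import redis

class CachedLilyLLMClient:
    def __init__(self, base_url="http://localhost:8001"):
        self.base_url = base_url
        self.redis_client = redis.Redis(host="localhost", port=6379, db=0)

    def generate_text_with_cache(self, prompt, model_id="polyglot-ko-1.3b-chat"):
        # Build the cache key with SHA-256; the built-in hash() is salted per
        # process for strings, so its keys would not survive a restart
        digest = hashlib.sha256(f"{model_id}:{prompt}".encode("utf-8")).hexdigest()
        cache_key = f"text_gen:{digest}"

        # Check the cache
        cached_result = self.redis_client.get(cache_key)
        if cached_result:
            return json.loads(cached_result)

        # Call the API (generate_text() is defined in the Text Generation section)
        result = generate_text(prompt, model_id)

        # Store the result in the cache for 1 hour
        self.redis_client.setex(cache_key, 3600, json.dumps(result))

        return result
```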
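
A Redis server is not always available during development. The following is a sketch of an in-process alternative with the same one-hour expiry idea; `cache_key` and `TTLCache` are hypothetical names, the cache is not thread-safe, and its contents are lost when the process exits.

```python
import hashlib
import time

def cache_key(prompt, model_id):
    """Build a key that is identical across processes and restarts."""
    digest = hashlib.sha256(f"{model_id}\n{prompt}".encode("utf-8")).hexdigest()
    return f"text_gen:{digest}"

class TTLCache:
    """Tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        # Treat missing and expired entries the same way.
        if entry is None or entry[0] <= self.clock():
            self.store.pop(key, None)
            return None
        return entry[1]

    def set(self, key, value):
        self.store[key] = (self.clock() + self.ttl, value)
```

The `clock` argument exists so that expiry can be exercised in tests without waiting for real time to pass.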

## 📚 Best Practices

### 1. Error Handling

```python
import requests
from requests.exceptions import RequestException

def safe_api_call(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except RequestException as e:
        print(f"Network error: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Usage example (generate_text() is defined in the Text Generation section)
result = safe_api_call(generate_text, "Hello!")
if result:
    print(result["generated_text"])
```

### 2. Retry Logic

```python
import time
from functools import wraps

def retry_on_failure(max_retries=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Re-raise the original error once the last attempt has failed
                    if attempt == max_retries - 1:
                        raise
                    print(f"Attempt {attempt + 1} failed, retrying in {delay} seconds...")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator

# Usage example (generate_text() is defined in the Text Generation section)
@retry_on_failure(max_retries=3, delay=2)
def robust_generate_text(prompt):
    return generate_text(prompt)
```
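
A fixed delay retries at the same pace no matter how long the server has been failing. A common alternative is exponential backoff with jitter, sketched below; `retry_with_backoff` is a hypothetical decorator, and the doubling factor and 10% jitter are assumed values. The `sleep` argument is injectable so the behavior can be checked without real waiting.

```python
import random
import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=1.0, sleep=time.sleep):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise
                    # Wait 1x, 2x, 4x, ... the base delay, plus up to 10% jitter.
                    delay = base_delay * 2 ** attempt
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

The jitter spreads out retries from many clients so they do not all hit the server at the same instant.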

### 3. Asynchronous Processing

```python
import asyncio

import aiohttp

async def async_generate_text(session, prompt, model_id="polyglot-ko-1.3b-chat"):
    url = "http://localhost:8001/generate"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7,
    }

    async with session.post(url, data=data) as response:
        return await response.json()

async def batch_generate_async(prompts):
    # Reuse one session for all requests and run them concurrently
    async with aiohttp.ClientSession() as session:
        tasks = [async_generate_text(session, prompt) for prompt in prompts]
        results = await asyncio.gather(*tasks)
        return results

# Usage example
prompts = ["Question 1", "Question 2", "Question 3"]
results = asyncio.run(batch_generate_async(prompts))
```
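
Launching every request at once can overload a single model server when the prompt list is long. The following sketch caps the number of requests in flight with `asyncio.Semaphore`; `gather_limited` and its `worker` argument are illustrative names, and the limit of 4 is an assumed default.

```python
import asyncio

async def gather_limited(items, worker, limit=4):
    """Await worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the order of the inputs.
    return await asyncio.gather(*(run(item) for item in items))
```

Inside an `aiohttp.ClientSession` block, the worker could be `lambda prompt: async_generate_text(session, prompt)`.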

### 4. Logging

```python
import logging

# Logging configuration: write to both a file and the console
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("lily_llm_client.log"),
        logging.StreamHandler(),
    ],
)

logger = logging.getLogger(__name__)

# generate_text() is defined in the Text Generation section
def generate_text_with_logging(prompt, model_id="polyglot-ko-1.3b-chat"):
    logger.info(f"Text generation started: {prompt[:50]}...")

    try:
        result = generate_text(prompt, model_id)
        logger.info(f"Text generation succeeded: {len(result['generated_text'])} characters")
        return result
    except Exception as e:
        logger.error(f"Text generation failed: {e}")
        raise
```
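
Logging how long each call takes helps when tracking down slow responses. This is a minimal sketch of a timing decorator; `log_duration` and the logger name `lily_llm_client` are assumptions, and the messages only appear if logging is configured at `INFO` level or lower.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("lily_llm_client")

def log_duration(func):
    """Log the wall-clock duration of every call, including failed ones."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", func.__name__, elapsed)
    return wrapper
```

Applying `@log_duration` to a function such as `generate_text_with_logging` adds one timing line per call to the same log output.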

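## 📞 Support

### Help Resources

- API documentation: `http://localhost:8001/docs`
- ReDoc documentation: `http://localhost:8001/redoc`
- GitHub Issues: the Issues section of the project repository
- Log files: the `./logs/` directory

### Debugging Tips

1. Check the logs: always look at the logs first
2. Test step by step: break complex requests into smaller units and test each one
3. Check the network: review firewall and proxy settings
4. Monitor resources: check CPU, memory, and disk usage regularly

### Performance Tips

1. Choose the right model: pick a model suited to the task
2. Batch requests: process multiple requests together
3. Use caching: serve repeated requests from a cache
4. Go asynchronous: handle large volumes of requests asynchronously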