# Data Persistence Solutions for Hugging Face Spaces
## Cause of the Problem
Hugging Face Spaces runs on Docker containers, so **all data written to the container's local filesystem is lost whenever the container restarts or is updated.**
Data currently stored locally:
- Database: `instance/finance_analysis.db`
- Uploaded files: the `uploads/` folder
- Vector DB: the `vector_db/` folder
- Logs: the `logs/` folder
## Solutions
### Method 1: Use an External Database (Recommended)
With an external database such as PostgreSQL or MySQL, the data lives outside the container and is preserved across restarts.
#### PostgreSQL Example
1. **Create a free PostgreSQL database on Supabase, Neon, or Railway**
2. **Set the environment variable** (Hugging Face Spaces Settings > Repository secrets):
```
DATABASE_URL=postgresql://user:password@host:port/database
```
3. **Add the PostgreSQL driver to requirements.txt**:
```
psycopg2-binary
```
4. **Update the code** (`app/core/config.py`):
```python
# Use DATABASE_URL when it is set; otherwise fall back to the local SQLite file
SQLALCHEMY_DATABASE_URI: str = os.getenv(
    'DATABASE_URL',
    f'sqlite:///{PROJECT_ROOT / "instance" / "finance_analysis.db"}'
)
```
The configuration already reads this environment variable, so setting `DATABASE_URL` alone is enough to switch the app to PostgreSQL.
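
One detail worth guarding against: some hosting providers hand out connection strings with the legacy `postgres://` scheme, which SQLAlchemy 1.4 and later reject. The sketch below normalizes the URL before it reaches the configuration. The helper name `normalize_database_url` and the `DEFAULT_URL` fallback are illustrative assumptions, not part of the project.

```python
import os


def normalize_database_url(url: str) -> str:
    """Return a URL that SQLAlchemy 1.4+ accepts for PostgreSQL."""
    # Rewrite only the legacy "postgres://" prefix; leave everything else alone.
    if url.startswith("postgres://"):
        return "postgresql://" + url[len("postgres://"):]
    return url


# Hypothetical fallback; the project builds its own SQLite path in config.py.
DEFAULT_URL = "sqlite:///instance/finance_analysis.db"
DATABASE_URL = normalize_database_url(os.getenv("DATABASE_URL", DEFAULT_URL))
```

The result could then be assigned to `SQLALCHEMY_DATABASE_URI` in place of the raw environment value.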
### Method 2: Use External Storage (for Files)
Store uploaded files and the vector DB in external storage.
#### AWS S3 Example
1. **Install boto3**:
```
pip install boto3
```
2. **Set the environment variables**:
```
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_S3_BUCKET=your_bucket_name
```
3. **Update the code**: change the file upload/download logic to use S3
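
The code change in step 3 might look roughly like the following minimal sketch. It assumes the environment variables from step 2 are set; the function names and the `uploads` key prefix are hypothetical and should be adapted to the app's actual upload handlers.

```python
import os
from pathlib import PurePosixPath


def s3_key(filename: str, prefix: str = "uploads") -> str:
    """Build the object key for a file, e.g. 'uploads/report.pdf'."""
    # Keep only the base name so the local directory layout never leaks into keys.
    return str(PurePosixPath(prefix) / os.path.basename(filename))


def upload_to_s3(local_path: str, prefix: str = "uploads") -> str:
    """Upload a local file to the bucket named in AWS_S3_BUCKET."""
    import boto3  # imported lazily so the module loads without boto3

    key = s3_key(local_path, prefix)
    # boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
    boto3.client("s3").upload_file(local_path, os.environ["AWS_S3_BUCKET"], key)
    return key


def download_from_s3(key: str, local_path: str) -> None:
    """Fetch an object back to local disk, e.g. after a container restart."""
    import boto3

    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    boto3.client("s3").download_file(os.environ["AWS_S3_BUCKET"], key, local_path)
```

Storing the returned key in the database instead of a local path lets the app locate the file again after the container is rebuilt.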
#### Google Cloud Storage Example
1. **Install google-cloud-storage**:
```
pip install google-cloud-storage
```
2. **Set the environment variables**:
```
GCS_BUCKET_NAME=your_bucket_name
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```
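
With those variables in place, a whole folder such as `vector_db/` could be pushed to the bucket along the following lines. This is a sketch under stated assumptions: `iter_blob_names` and `upload_folder_to_gcs` are hypothetical helpers, and the blob naming scheme (folder name as prefix) is just one reasonable choice.

```python
import os
from pathlib import Path


def iter_blob_names(root: str):
    """Yield (local_path, blob_name) for every file under root."""
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.is_file():
            # Blob names use forward slashes and keep the folder name as a prefix.
            yield path, f"{base.name}/{path.relative_to(base).as_posix()}"


def upload_folder_to_gcs(root: str) -> int:
    """Upload a folder (e.g. 'vector_db') to the bucket in GCS_BUCKET_NAME."""
    from google.cloud import storage  # lazy import: needs google-cloud-storage

    # Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS.
    bucket = storage.Client().bucket(os.environ["GCS_BUCKET_NAME"])
    count = 0
    for path, name in iter_blob_names(root):
        bucket.blob(name).upload_from_filename(str(path))
        count += 1
    return count
```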
### Method 3: Use a Hugging Face Dataset (Simple Approach)
Data can also be stored on the Hugging Face Hub using Hugging Face's Dataset API.
1. **Install the datasets library**:
```
pip install datasets
```
2. **Code example**:
```python
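import os
from datasets import Dataset, load_dataset

# Hub access token, read from the environment (e.g. a Space secret)
HF_TOKEN = os.getenv('HF_TOKEN')

# Save data
def save_to_hf_dataset(data, dataset_name):
    dataset = Dataset.from_dict(data)
    dataset.push_to_hub(dataset_name, token=HF_TOKEN)

# Load data
def load_from_hf_dataset(dataset_name):
    # load_dataset() fetches the split that push_to_hub() created ('train' by default)
    dataset = load_dataset(dataset_name, split='train', token=HF_TOKEN)
    return dataset.to_dict()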
```
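
`Dataset.from_dict` expects a column-oriented mapping (one list per field, all of equal length), whereas application records are usually a list of row dictionaries. A small conversion layer like the sketch below bridges the two; both helper names are hypothetical.

```python
def rows_to_columns(rows):
    """Convert [{'a': 1}, {'a': 2}] into {'a': [1, 2]} for Dataset.from_dict."""
    # Collect every key in first-seen order so the column order is stable.
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    # Missing values become None so that all columns have the same length.
    return {key: [row.get(key) for row in rows] for key in keys}


def columns_to_rows(columns):
    """Inverse of rows_to_columns, for data returned by Dataset.to_dict()."""
    keys = list(columns)
    length = len(columns[keys[0]]) if keys else 0
    return [{key: columns[key][i] for key in keys} for i in range(length)]
```

For example, `save_to_hf_dataset(rows_to_columns(records), name)` would store a list of records, and `columns_to_rows(load_from_hf_dataset(name))` would read them back.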
### Method 4: Regular Backup System
Use a scheduler to back up the data at regular intervals.
1. **Create a backup script** (`backup_data.py`):
```python
import shutil
import os
from datetime import datetime
from huggingface_hub import HfApi

def backup_to_hf():
    # Back up the database
    if os.path.exists('instance/finance_analysis.db'):
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        backup_name = f'backup_{timestamp}.db'
        shutil.copy('instance/finance_analysis.db', backup_name)
        # Upload to the Hugging Face Hub
        api = HfApi()
        api.upload_file(
            path_or_fileobj=backup_name,
            path_in_repo=f'backups/{backup_name}',
            repo_id='your-username/your-repo',
            token=os.getenv('HF_TOKEN')
        )

if __name__ == '__main__':
    backup_to_hf()
```
2. **Set up a scheduler**: use GitHub Actions or an external scheduler
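
A backup is only useful if it is restored after the container restarts. The sketch below assumes the naming scheme from the backup script above (`backups/backup_YYYYmmdd_HHMMSS.db`) and the same repository; `latest_backup` and `restore_from_hf` are hypothetical helpers that could run once at application startup.

```python
import os
import re

# Matches names produced by the backup script, e.g. backups/backup_20240101_120000.db
BACKUP_PATTERN = re.compile(r"^backups/backup_(\d{8}_\d{6})\.db$")


def latest_backup(repo_files):
    """Return the newest backup path in a file listing, or None."""
    # The timestamp format sorts lexicographically in chronological order.
    matches = [f for f in repo_files if BACKUP_PATTERN.match(f)]
    return max(matches, default=None)


def restore_from_hf(repo_id, target="instance/finance_analysis.db"):
    """Download the newest backup into place if no local database exists."""
    import shutil
    from huggingface_hub import HfApi, hf_hub_download  # lazy import

    if os.path.exists(target):
        return False
    token = os.getenv("HF_TOKEN")
    newest = latest_backup(HfApi().list_repo_files(repo_id, token=token))
    if newest is None:
        return False
    cached = hf_hub_download(repo_id=repo_id, filename=newest, token=token)
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    shutil.copy(cached, target)
    return True
```

Anything written between the last backup and the restart is still lost, so the backup interval bounds how much data can disappear.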
## Temporary Workaround That Can Be Applied Immediately
### Change the Database Path via an Environment Variable
If the Space has persistent storage available, place the SQLite file there:
```python
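# Edit app/core/config.py
# HF_HOME is the Hugging Face cache directory; the file only persists if it points into persistent storage
SQLALCHEMY_DATABASE_URI: str = os.getenv(
    'DATABASE_URL',
    f'sqlite:///{os.getenv("HF_HOME", str(PROJECT_ROOT / "instance"))}/finance_analysis.db'
)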
```
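
A slightly more defensive variant checks that a candidate directory actually exists and is writable before using it. This sketch assumes the persistent volume is mounted at `/data`, which should be verified for the Space in question; `pick_data_dir` and `sqlite_uri` are hypothetical helpers.

```python
import os


def pick_data_dir(candidates):
    """Return the first candidate that is an existing, writable directory."""
    for path in candidates:
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None


def sqlite_uri(data_dir, filename="finance_analysis.db"):
    """Build a SQLAlchemy SQLite URI for a file inside data_dir."""
    return "sqlite:///" + os.path.join(os.path.abspath(data_dir), filename)


# "/data" is assumed to be the persistent-storage mount; "instance" is the fallback.
data_dir = pick_data_dir(["/data", "instance"]) or "."
SQLALCHEMY_DATABASE_URI = os.getenv("DATABASE_URL", sqlite_uri(data_dir))
```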
## Recommendations
1. **Production**: Method 1 (external database) + Method 2 (external storage)
2. **Development/testing**: Method 3 (Hugging Face Dataset) or Method 4 (regular backups)
3. **Important data**: always use an external database
## References
- [Hugging Face Spaces documentation](https://huggingface.co/docs/hub/spaces)
- [Supabase (free PostgreSQL)](https://supabase.com/)
- [Neon (serverless PostgreSQL)](https://neon.tech/)
- [Railway (PostgreSQL)](https://railway.app/)