Solving Data Persistence on Hugging Face Spaces

Cause of the Problem

Hugging Face Spaces runs on Docker containers, so all data written inside the container is lost whenever the container restarts or is updated.

Data currently stored locally:

  • Database: instance/finance_analysis.db
  • Uploaded files: uploads/ folder
  • Vector DB: vector_db/ folder
  • Logs: logs/ folder

Solutions

Method 1: Use an External Database (Recommended)

With an external database such as PostgreSQL or MySQL, the data lives outside the container and is preserved permanently.

PostgreSQL Example

  1. Create a free PostgreSQL database on Supabase, Neon, or Railway.

  2. Set the environment variable (Hugging Face Spaces Settings > Repository secrets):

    DATABASE_URL=postgresql://user:password@host:port/database
    
  3. Add the PostgreSQL driver to requirements.txt:

    psycopg2-binary
    
  4. Update the code (app/core/config.py):

    SQLALCHEMY_DATABASE_URI: str = os.getenv(
        'DATABASE_URL', 
        f'sqlite:///{PROJECT_ROOT / "instance" / "finance_analysis.db"}'
    )
    

    The configuration already reads this environment variable, so setting DATABASE_URL alone is enough for the app to use PostgreSQL automatically.
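One practical wrinkle: some hosted providers hand out connection strings that begin with `postgres://`, a scheme name that SQLAlchemy 1.4 and later no longer accepts. The following is a minimal sketch of a normalization step that could run before the value reaches `SQLALCHEMY_DATABASE_URI`. The helper names `normalize_database_url` and `database_uri_from_env` are hypothetical and not part of the project.

```python
import os


def normalize_database_url(url: str) -> str:
    """Rewrite the legacy 'postgres://' scheme to 'postgresql://'."""
    legacy_prefix = "postgres://"
    if url.startswith(legacy_prefix):
        return "postgresql://" + url[len(legacy_prefix):]
    return url


def database_uri_from_env(default_uri: str) -> str:
    """Return DATABASE_URL (normalized), or default_uri when it is unset or empty."""
    raw = os.getenv("DATABASE_URL", "").strip()
    return normalize_database_url(raw) if raw else default_uri
```

In `app/core/config.py`, the existing SQLite URI would be passed as `default_uri`, so local development keeps working when `DATABASE_URL` is not set.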

Method 2: Use External Storage (for File Storage)

Store uploaded files and the vector DB in external storage.

AWS S3 Example

  1. Install boto3:

    pip install boto3
    
  2. Set the environment variables:

    AWS_ACCESS_KEY_ID=your_access_key
    AWS_SECRET_ACCESS_KEY=your_secret_key
    AWS_S3_BUCKET=your_bucket_name
    
  3. Update the code: change the file upload/download logic to use S3.
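The code change in step 3 might look roughly like the sketch below. It assumes the standard boto3 client methods `upload_file` and `download_file`. The helper names and the `uploads/` key layout are hypothetical, and the client is passed in as a parameter so the helpers can be exercised without real AWS credentials.

```python
import os
from pathlib import Path


def make_s3_client():
    """Create a boto3 S3 client; credentials are read from the AWS_* environment variables."""
    import boto3  # imported lazily so the helpers below work without boto3 installed

    return boto3.client("s3")


def object_key(filename: str, prefix: str = "uploads") -> str:
    """Build the S3 object key for a file (hypothetical layout: <prefix>/<file name>)."""
    return f"{prefix}/{Path(filename).name}"


def upload_to_s3(client, local_path: str, bucket: str) -> str:
    """Upload a local file and return the key it was stored under."""
    key = object_key(local_path)
    client.upload_file(local_path, bucket, key)
    return key


def download_from_s3(client, key: str, bucket: str, dest_dir: str = "uploads") -> str:
    """Download an object into dest_dir and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, Path(key).name)
    client.download_file(bucket, key, local_path)
    return local_path
```

A call site could then read `upload_to_s3(make_s3_client(), "uploads/report.pdf", os.environ["AWS_S3_BUCKET"])`, where `report.pdf` is only a placeholder file name.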

Google Cloud Storage Example

  1. Install google-cloud-storage:

    pip install google-cloud-storage
    
  2. Set the environment variables:

    GCS_BUCKET_NAME=your_bucket_name
    GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
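With those variables set, the upload/download logic could be adapted along the following lines. This is a sketch, not the project's code: it assumes the `google-cloud-storage` client's `bucket()`, `blob()`, `upload_from_filename()` and `download_to_filename()` methods, and the helper names and `uploads/` prefix are hypothetical.

```python
from pathlib import Path


def get_gcs_bucket(bucket_name: str):
    """Return a bucket handle; credentials come from GOOGLE_APPLICATION_CREDENTIALS."""
    from google.cloud import storage  # imported lazily; requires google-cloud-storage

    return storage.Client().bucket(bucket_name)


def upload_to_gcs(bucket, local_path: str, prefix: str = "uploads") -> str:
    """Upload a local file and return the blob name it was stored under."""
    blob_name = f"{prefix}/{Path(local_path).name}"
    bucket.blob(blob_name).upload_from_filename(local_path)
    return blob_name


def download_from_gcs(bucket, blob_name: str, local_path: str) -> None:
    """Download a blob to local_path, creating the parent folder if needed."""
    Path(local_path).parent.mkdir(parents=True, exist_ok=True)
    bucket.blob(blob_name).download_to_filename(local_path)
```

The bucket handle would typically be created once at startup with `get_gcs_bucket(os.environ["GCS_BUCKET_NAME"])` and reused across requests.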
    

Method 3: Use a Hugging Face Dataset (Simple Approach)

Data can be stored through the Hugging Face Dataset API.

  1. Install the datasets library:

    pip install datasets
    
  2. Code example:

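    import os
    
    from datasets import Dataset, load_dataset
    
    # Read the access token from the environment (e.g. an HF_TOKEN Space secret)
    HF_TOKEN = os.getenv('HF_TOKEN')
    
    # Save data
    def save_to_hf_dataset(data, dataset_name):
        # data: dict mapping column names to equal-length lists
        dataset = Dataset.from_dict(data)
        dataset.push_to_hub(dataset_name, token=HF_TOKEN)
    
    # Load data
    def load_from_hf_dataset(dataset_name):
        # push_to_hub stores a single Dataset under the default "train" split
        dataset = load_dataset(dataset_name, split='train', token=HF_TOKEN)
        return dataset.to_dict()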
    

Method 4: Scheduled Backup System

Use a scheduler to back up the data at regular intervals.

  1. Create a backup script (backup_data.py):

    import shutil
    import os
    from datetime import datetime
    from huggingface_hub import HfApi
    
    def backup_to_hf():
        # Back up the database
        if os.path.exists('instance/finance_analysis.db'):
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            backup_name = f'backup_{timestamp}.db'
            shutil.copy('instance/finance_analysis.db', backup_name)
            
            # Upload to the Hugging Face Hub
            api = HfApi()
            api.upload_file(
                path_or_fileobj=backup_name,
                path_in_repo=f'backups/{backup_name}',
                repo_id='your-username/your-repo',
                token=os.getenv('HF_TOKEN')
            )
    
  2. Configure the scheduler: use GitHub Actions or an external scheduler.
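Backups only help if they can be restored after a restart. The sketch below shows one way to select and fetch the newest backup, assuming the `backups/backup_<timestamp>.db` naming used by the script above and the `huggingface_hub` functions `HfApi.list_repo_files` and `hf_hub_download`. The names `latest_backup` and `restore_latest` are hypothetical.

```python
import os
import re
import shutil
from typing import Iterable, Optional

# Matches the names produced by backup_to_hf(), e.g. backups/backup_20240101_120000.db
BACKUP_PATTERN = re.compile(r"^backups/backup_(\d{8}_\d{6})\.db$")


def latest_backup(repo_files: Iterable[str]) -> Optional[str]:
    """Return the newest backup path among repo_files, or None if there is none."""
    candidates = []
    for path in repo_files:
        match = BACKUP_PATTERN.match(path)
        if match:
            candidates.append((match.group(1), path))
    # Timestamps are zero-padded, so string order equals chronological order.
    return max(candidates)[1] if candidates else None


def restore_latest(repo_id: str, token: Optional[str],
                   dest: str = "instance/finance_analysis.db") -> bool:
    """Copy the newest backup from the Hub to dest; return False if no backup exists."""
    from huggingface_hub import HfApi, hf_hub_download  # lazy import

    path_in_repo = latest_backup(HfApi().list_repo_files(repo_id, token=token))
    if path_in_repo is None:
        return False
    cached = hf_hub_download(repo_id=repo_id, filename=path_in_repo, token=token)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy(cached, dest)
    return True
```

Calling `restore_latest` at application startup, before the database is opened, would bring back the most recent snapshot. Anything written after that snapshot is still lost, which is why this approach suits development more than production.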

Temporary Workaround You Can Apply Immediately

Change the Database Path via an Environment Variable

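Use Hugging Face Spaces persistent storage, if available. The snippet relies on HF_HOME pointing into that storage (mounted at /data when persistent storage is enabled); with the default HF_HOME the database is still lost on restart: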

# Edit app/core/config.py
SQLALCHEMY_DATABASE_URI: str = os.getenv(
    'DATABASE_URL', 
    f'sqlite:///{os.getenv("HF_HOME", str(PROJECT_ROOT / "instance"))}/finance_analysis.db'
)
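As a variation on the same idea, the path can be chosen by checking for the persistent directory directly instead of going through `HF_HOME`. This is a sketch under the assumption that persistent storage, when enabled, is mounted at `/data`. The function name `resolve_sqlite_uri` and its `persistent_dir` parameter are hypothetical.

```python
import os
from pathlib import Path


def resolve_sqlite_uri(project_root: Path, persistent_dir: str = "/data") -> str:
    """Prefer DATABASE_URL, then a writable persistent directory, then instance/."""
    explicit = os.getenv("DATABASE_URL")
    if explicit:
        return explicit
    candidate = Path(persistent_dir)
    if candidate.is_dir() and os.access(candidate, os.W_OK):
        db_dir = candidate  # survives restarts when persistent storage is enabled
    else:
        db_dir = project_root / "instance"  # ephemeral fallback inside the container
    db_dir.mkdir(parents=True, exist_ok=True)
    return f"sqlite:///{db_dir / 'finance_analysis.db'}"
```

With an absolute directory such as `/data`, the result has four slashes (`sqlite:////data/finance_analysis.db`), which is the form SQLAlchemy expects for absolute SQLite paths.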

Recommendations

  1. Production: Method 1 (external database) + Method 2 (external storage)
  2. Development/testing: Method 3 (Hugging Face Dataset) or Method 4 (scheduled backups)
  3. Important data: always use an external database

References