---
title: kwmin_probin
emoji: 🥸
colorFrom: indigo
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---
TEAM EA - RFP Document Analysis System (MVP)
A RAG system designed around Andrew Ng's principles
Project Goals
"Start Simple, Then Iterate" - Andrew Ng
- ✅ Week 1: Working MVP (PDF upload, question answering, source citations)
- ⏳ Week 2: 70%+ accuracy (hybrid search, reranking)
- ⏳ Week 3: Production level (90%+ accuracy, stability)
Architecture
```text
PDF upload
↓
Text extraction (pymupdf4llm)
↓
Chunking (800 characters, 150-character overlap)
↓
Embedding (text-embedding-3-small)
↓
Storage in ChromaDB
↓
Question input
↓
Vector search (top-10)
↓
Answer generation with Grok
↓
Source citations
```
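ChromaDB performs the vector search step itself, so the project does not need to implement it. The sketch below only illustrates what "vector search (top-10)" means: rank the stored chunk embeddings by cosine similarity to the question embedding and keep the best `k`. The functions `cosine` and `top_k` are hypothetical names, and the toy 2-dimensional vectors stand in for real 1536-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]
```

For example, `top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]], k=2)` ranks chunk 1 first and chunk 2 second; chunk 0 is orthogonal to the query and is dropped.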
Directory Structure
```text
TEAM_EA_V2/
│
├── app.py              # Streamlit main app
│
├── config/
│   └── settings.py     # API keys, settings
│
├── core/
│   ├── pdf_loader.py   # PDF text extraction
│   ├── chunker.py      # Chunking
│   ├── embedder.py     # Embedding
│   ├── vectordb.py     # ChromaDB management
│   ├── retriever.py    # Retrieval
│   └── generator.py    # Grok answer generation
│
├── utils/
│   ├── logger.py       # Logging
│   └── helpers.py      # Utilities
│
├── ui/
│   ├── components.py   # Streamlit components
│   └── styles.py       # CSS
│
├── data/
│   ├── uploads/        # Uploaded PDFs
│   └── chroma_db/      # ChromaDB storage
│
├── requirements.txt
├── .env                # API keys (gitignored)
└── README.md
```
Quick Start
1. Installation
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install packages
pip install -r requirements.txt
```
2. Environment Setup
Create a `.env` file in the project root:
```
OPENAI_API_KEY=your_openai_api_key_here
XAI_API_KEY=your_grok_api_key_here
```
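To catch a missing key at startup rather than on the first API call, `config/settings.py` could validate the environment once. This is a minimal sketch: `load_api_keys` is a hypothetical helper, and it assumes the `.env` file has already been loaded into the process environment (for example with `load_dotenv()` from the python-dotenv package).

```python
import os
from typing import Dict, Mapping, Optional

# Keys this README asks for in .env
REQUIRED_KEYS = ("OPENAI_API_KEY", "XAI_API_KEY")


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required API keys, raising if any is missing or empty.

    `env` defaults to os.environ; pass a dict to check values without
    touching the real environment. Loading .env itself is assumed to
    have happened already.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing API keys in .env: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_KEYS}
```

Failing fast here turns a vague authentication error later in the pipeline into a message that names the exact variable to fix.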
3. Run
```bash
streamlit run app.py
```
Then open http://localhost:8501 in your browser.
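How to Use
Step 1: Upload a PDF
- Upload a PDF file from the sidebar or the main screen
- Click the "Start Document Processing" (문서 처리 시작) button
Step 2: Ask a Question
- Once processing is complete, the question input box becomes active
- Type your question and press Enter
Step 3: Review the Answer
- Read the answer generated by Grok
- Check the cited source pages and the original text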
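Showing source pages requires carrying each chunk's page number from retrieval through to the prompt sent to Grok. The README does not specify how `core/generator.py` formats its prompt; the sketch below is one hypothetical approach, where `RetrievedChunk` and `build_prompt` are illustrative names and the page number is assumed to be stored as chunk metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedChunk:
    text: str
    page: int  # 1-based page number, assumed to be saved as chunk metadata


def build_prompt(question: str, chunks: List[RetrievedChunk]) -> str:
    """Tag each retrieved chunk with its page so the LLM can cite sources."""
    context = "\n\n".join(f"[p.{c.page}] {c.text}" for c in chunks)
    return (
        "Answer the question using only the context below. "
        "Cite the page tag, e.g. [p.3], for every claim.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Because the page tags travel inside the prompt, the UI can match the citations in Grok's answer back to the stored chunks and show the original text for each page.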
Tech Stack
| Component | Technology | Rationale |
|---|---|---|
| PDF preprocessing | pymupdf4llm + PyMuPDF | TeddyNote-style approach, stable |
| Embedding | text-embedding-3-small | Inexpensive ($0.00002 per 1K tokens), fast |
| Vector DB | ChromaDB | Runs locally, Python-native |
| LLM | Grok (xAI) | Strong Korean-language performance |
| UI | Streamlit | Rapid prototyping |
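The embedding price in the table converts directly into a cost estimate: $0.00002 per 1K tokens is $0.02 per million tokens. The helper below is a hypothetical illustration of that arithmetic, not part of the project; real token counts depend on the document and the tokenizer.

```python
# USD per 1K tokens for text-embedding-3-small, as listed in the table above
PRICE_PER_1K_TOKENS = 0.00002


def embedding_cost(num_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Estimated USD cost of embedding `num_tokens` tokens."""
    return num_tokens / 1000 * price_per_1k
```

For example, `embedding_cost(1_000_000)` is about $0.02. Because consecutive chunks overlap by 150 of their 800 characters, the text actually embedded is roughly 800 / 650 ≈ 1.23 times the original length, so budget with that factor.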
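Configuration
`config/settings.py`
```python
# Embedding settings
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSION = 1536

# Chunking settings
CHUNK_SIZE = 800     # in characters
CHUNK_OVERLAP = 150  # overlap between consecutive chunks, in characters

# Retrieval settings
TOP_K = 10  # number of top-ranked chunks to retrieve

# Grok settings
GROK_MODEL = "grok-beta"
```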
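With `CHUNK_SIZE = 800` and `CHUNK_OVERLAP = 150`, each new chunk starts 650 characters after the previous one, so the last 150 characters of a chunk are repeated at the start of the next. A minimal sliding-window sketch of that behavior follows; `chunk_text` is a hypothetical name, and `core/chunker.py` may split differently (for example on sentence or paragraph boundaries).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> List[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # 650 characters between chunk starts by default
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

A 2,000-character page yields three chunks of 800, 800, and 700 characters. The overlap keeps a sentence that straddles a boundary intact in at least one chunk.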
Performance Metrics
MVP Goals (Week 1)
- ✅ PDF upload works
- ✅ Question answering works
- ✅ Source citations
- ✅ Basic UI
Phase 2 Goals (Week 2)
- ⏳ Hybrid search (BM25 + vector)
- ⏳ Reranking (Cohere Rerank)
- ⏳ Highlighting
- ⏳ 70%+ accuracy
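The README does not say how BM25 and vector results will be combined in Phase 2. One common option, shown here as an assumption rather than the planned design, is Reciprocal Rank Fusion (RRF): each chunk scores the sum of 1 / (k + rank) across the ranked lists, which avoids normalizing BM25 scores against cosine similarities. `reciprocal_rank_fusion` is a hypothetical helper that works on lists of chunk IDs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1; k=60 is the constant commonly used for RRF.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing `["c2", "c5", "c1"]` (BM25) with `["c5", "c3", "c2"]` (vector) returns `["c5", "c2", "c3", "c1"]`: chunks found by both retrievers rise to the top, and the fused list could then be passed to a reranker.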
Phase 3 Goals (Week 3)
- ⏳ Additional PDF uploads
- ⏳ Metadata logging
- ⏳ Error handling
- ⏳ 90%+ accuracy
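Troubleshooting
1. API Key Errors
```
# Check the .env file
OPENAI_API_KEY=sk-...
XAI_API_KEY=xai-...
```
2. Package Installation Errors
```bash
# Try installing the packages one at a time
pip install streamlit
pip install chromadb
pip install openai
pip install pymupdf4llm
```
3. ChromaDB Errors
```bash
# Reset the database (deletes all stored embeddings; documents must be reprocessed)
rm -rf data/chroma_db/*
```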
Changelog
v1.0 (MVP)
- PDF upload and text extraction
- Chunking and embedding
- ChromaDB storage
- Vector search
- Grok answer generation
- Streamlit UI
- Source citations
v2.0 (Planned)
- Hybrid search
- Reranking
- Highlighting
- Accuracy measurement
Developer
TEAM EA
License
MIT License
Acknowledgments
- Andrew Ng: ML system design principles
- TeddyNote: PDF processing methodology
- OpenAI: embedding model
- xAI: Grok LLM
Contact
If you run into a problem, please open an issue on GitHub Issues.