---
title: VELA Research Agent
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
models:
  - intrect/VELA
short_description: Korean Financial Research with 7B LLM
---
# VELA Framework

**Domain-Specialized LLM Research Agent for Korean Financial Markets**

VELA is an open-source research agent framework that demonstrates how a single developer can build a domain-specialized LLM system competitive with $100M+ proprietary projects -- for under $235/month in compute costs.
## Key Results
| Metric | VELA (7B) | Qwen 2.5 7B Base | GPT-4o | Exaone 3.5 7.8B |
|---|---|---|---|---|
| Domain Knowledge (100pt) | 87.5 | 72.0 | 81.0 | 74.5 |
| Korean Fluency | Native | Mixed (CN leak) | Good | Native |
| Reasoning Trace | Structured | None | Free-form | None |
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Training: SFT (58K samples) + DPO (26K pairs) on Korean financial domain
- Inference: 16 tok/s on Apple Silicon (MLX 4-bit), RunPod Serverless, or any vLLM server
- License: MIT
## Architecture

```
User Query
    |
    v
[ResearchAgent] -- CoT Reasoning Loop (Think -> Search -> Analyze -> Conclude)
    |
    +-- [CoTReasoningEngine]   -- TODO-based iterative reasoning with confidence gating
    +-- [ResearchSearchModule] -- Multi-source web search (Naver + DuckDuckGo)
    +-- [ContentExtractor]     -- Web page & PDF content extraction
    +-- [AdversaryAgent]       -- Cross-verification via Perplexity API (optional)
    |
    v
ResearchResult (structured JSON with trajectory, claim-evidence mapping)
```
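The `ResearchResult` at the bottom of the diagram can be sketched as a small schema. The field names below are assumptions inferred from the Python API section (conclusion, confidence, trajectory, claim-evidence mapping), not the framework's actual `vela.schemas` definitions:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class TraceStep:
    """One iteration of the Think -> Search -> Analyze -> Conclude loop."""
    thought: str
    action: str        # "search" | "analyze" | "conclude"
    confidence: float  # 0.0 - 1.0, self-reported by the model

@dataclass
class ResearchResult:
    """Structured output: final conclusion plus the full reasoning trajectory."""
    query: str
    conclusion: str
    confidence: float
    trajectory: list[TraceStep] = field(default_factory=list)
    claim_evidence: dict[str, list[str]] = field(default_factory=dict)  # claim -> source URLs

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```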
## Quick Start

```bash
# 1. Clone
git clone https://github.com/intrect/vela-framework.git
cd vela-framework

# 2. Install
pip install -e .

# 3. Configure
cp .env.example .env
# Edit .env with your API keys (RUNPOD_API_KEY, NAVER_CLIENT_ID, etc.)

# 4. Run ("SK Hynix HBM market outlook")
python inference.py --query "SK하이닉스 HBM 시장 전망" --backend mlx
```
## Installation

### Requirements

- Python 3.10+
- At least one LLM backend configured (RunPod, MLX, or vLLM)

### From Source

```bash
pip install -e .
```

### Dependencies

Core (auto-installed):

- `pydantic>=2.0` -- structured schemas
- `requests` -- HTTP client
- `python-dotenv` -- environment configuration
- `duckduckgo-search` -- web search fallback
- `beautifulsoup4` -- content extraction
## Configuration

All configuration is via environment variables. Copy .env.example and fill in your keys:

```bash
# LLM Backends (configure at least one)
RUNPOD_API_KEY=your_key                     # RunPod Serverless
RUNPOD_ENDPOINT_ID=your_endpoint
VELA_MLX_BASE_URL=http://localhost:8081/v1  # MLX server
VLLM_BASE_URL=http://localhost:8000/v1      # vLLM server

# Search APIs
NAVER_CLIENT_ID_0=your_id                   # Naver Search API
NAVER_CLIENT_SECRET_0=your_secret

# Verification (optional)
PERPLEXITY_API_KEY=your_key                 # Adversary Agent
```
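As a sketch of how backend selection over these variables might work (the precedence order and the `pick_backend` helper are assumptions for illustration, not VELA's actual logic):

```python
import os

def pick_backend() -> str:
    """Return the first LLM backend whose required env vars are set."""
    if os.getenv("RUNPOD_API_KEY") and os.getenv("RUNPOD_ENDPOINT_ID"):
        return "runpod"
    if os.getenv("VELA_MLX_BASE_URL"):
        return "mlx"
    if os.getenv("VLLM_BASE_URL"):
        return "vllm"
    raise RuntimeError("No LLM backend configured; see .env.example")
```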
## Usage

### Python API

```python
from pathlib import Path

from vela import ResearchAgent
from vela.schemas import ResearchOptions

# Initialize with your preferred backend
agent = ResearchAgent(llm_backend="mlx")

# Run research
result = agent.research(
    query="SK하이닉스 HBM 시장 전망",  # "SK Hynix HBM market outlook"
    options=ResearchOptions(max_iterations=5),
)

# Access results
print(result.conclusion)
print(f"Confidence: {result.confidence:.0%}")
print(f"Sources: {len(result.sources)}")

# Save with full metadata (for training data generation)
ResearchAgent.save_with_metadata(result, Path("output/result.json"))
```
### CLI

```bash
# Basic research ("Samsung Electronics semiconductor strategy")
python inference.py -q "삼성전자 반도체 전략" -b mlx

# With verification ("Naver AI strategy")
python inference.py -q "네이버 AI 전략" --verify

# Save output ("Kakao earnings")
python inference.py -q "카카오 실적" -o result.json

# Verbose logging ("Hyundai Motor EVs")
python inference.py -q "현대차 전기차" -v
```
### Web Interface

Try VELA directly in your browser via Gradio:

```bash
# Install with web dependencies
pip install -e ".[web]"

# Launch local demo
python app.py
# Opens at http://localhost:7860
```

Or try the hosted demo on HuggingFace Spaces.
## LLM Backends

| Backend | Use Case | Setup |
|---|---|---|
| `runpod` | Cloud GPU inference | Set `RUNPOD_API_KEY` + `RUNPOD_ENDPOINT_ID` |
| `mlx` | Apple Silicon local | Run MLX server, set `VELA_MLX_BASE_URL` |
| `vllm` | Any GPU server | Run vLLM, set `VLLM_BASE_URL` |
## How It Works

### Chain-of-Thought Reasoning

VELA uses a TODO-based CoT protocol where each research iteration follows:

- **Think**: Analyze the current state and generate a TODO list
- **Search**: Execute web searches (Naver + DuckDuckGo)
- **Analyze**: Extract intermediate findings from collected sources
- **Conclude**: Synthesize the final report when the confidence threshold is met
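The loop and its gate can be sketched as follows; `think`, `search`, and `analyze` are placeholder callables standing in for the engine components, and the 80% threshold and 5-iteration default mirror the figures quoted elsewhere in this README:

```python
def research_loop(query, think, search, analyze, *, threshold=0.80, max_iterations=5):
    """Iterate Think -> Search -> Analyze until confidence clears the gate."""
    findings, confidence = [], 0.0
    for _ in range(max_iterations):
        todo, confidence = think(query, findings)  # plan + self-reported confidence
        if confidence >= threshold:
            break                                  # Conclude: gate passed
        sources = search(todo)                     # Naver + DuckDuckGo queries
        findings.extend(analyze(sources))          # intermediate findings
    return findings, confidence
```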
### Reasoning Trace Format (Markdown)

```markdown
**Step 1**:
**Thought**: SK하이닉스의 HBM 시장 점유율과 경쟁 구도 분석 필요
**Action**: search
**Query**: SK하이닉스 HBM3E 시장점유율 2025
**Confidence**: 35%

**Step 2**:
**Thought**: HBM 매출 비중과 영업이익률 데이터 확보 완료
**Action**: analyze
**Confidence**: 65%

**Step 3**:
**Thought**: 충분한 데이터 수집, 결론 도출 가능
**Action**: conclude
**Confidence**: 85%
```

(Thoughts, translated: Step 1 needs an analysis of SK Hynix's HBM market share and competitive landscape; Step 2 has secured HBM revenue-share and operating-margin data; Step 3 judges the collected data sufficient to draw a conclusion.)
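Because the trace is plain Markdown with fixed field labels, it can be parsed back into structured steps with a short regex. This parser is an illustrative sketch, not part of the framework:

```python
import re

STEP_RE = re.compile(
    r"\*\*Step (\d+)\*\*:.*?"
    r"\*\*Thought\*\*: (?P<thought>.+?)\n.*?"
    r"\*\*Action\*\*: (?P<action>\w+).*?"
    r"\*\*Confidence\*\*: (?P<conf>\d+)%",
    re.DOTALL,
)

def parse_trace(markdown: str) -> list[dict]:
    """Extract (thought, action, confidence) records from a reasoning trace."""
    return [
        {
            "thought": m["thought"].strip(),
            "action": m["action"],
            "confidence": int(m["conf"]) / 100,
        }
        for m in STEP_RE.finditer(markdown)
    ]
```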
### Confidence Gating

The system uses a confidence gate at multiple levels:

- **Per-step**: Each reasoning step reports a confidence score (0-100%)
- **Continuation**: Research continues until confidence >= 80% or the iteration limit is reached
- **Synthesis**: The final report includes an overall confidence score
### Adversary Verification (Optional)
When --verify is enabled, an independent AdversaryAgent cross-checks the research output using the Perplexity API, identifying:
- Factual inconsistencies
- Unsupported claims
- Missing counter-arguments
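A toy illustration of the bookkeeping behind the unsupported-claim and inconsistency checks; the real AdversaryAgent delegates judgment to a second model via the Perplexity API, and the function below only shows the bucketing:

```python
def cross_check(claim_evidence: dict[str, list[str]], verifier_refuted: set[str]) -> dict[str, list[str]]:
    """Bucket claims into the issue categories an adversary report might carry."""
    report = {"refuted": [], "unsupported": []}
    for claim, sources in claim_evidence.items():
        if claim in verifier_refuted:
            report["refuted"].append(claim)     # contradicted by the verifier
        elif not sources:
            report["unsupported"].append(claim)  # no evidence mapped to the claim
    return report
```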
## Training Your Own Model

VELA's training pipeline produced the domain-specialized model in two stages:

- **SFT (Supervised Fine-Tuning)**: 58K samples of Korean financial analysis with structured reasoning traces
- **DPO (Direct Preference Optimization)**: 26K preference pairs targeting language purity (eliminating Chinese/English leakage from the Qwen base) and reasoning quality

### Model Weights

- HuggingFace: intrect/vela (GGUF Q4_K_M)
- Base: Qwen/Qwen2.5-7B-Instruct + LoRA (r=64, alpha=128)
### Training Cost Breakdown

| Component | Cost | Notes |
|---|---|---|
| RunPod RTX 4090 | ~$50/month | SFT + DPO training |
| Haiku API (data gen) | ~$80 | 5 batches, 50K samples |
| Naver/Search APIs | ~$30/month | Data collection |
| Perplexity API | ~$20/month | Adversary verification |
| **Total** | ~$235/month | |
## Project Structure

```
vela-framework/
├── app.py                     # Gradio web demo (HF Spaces)
├── inference.py               # CLI entry point
├── vela/                      # Core package
│   ├── agent.py               # ResearchAgent orchestrator
│   ├── reasoning.py           # CoT reasoning engine
│   ├── search.py              # Multi-source web search
│   ├── schemas.py             # Pydantic data models
│   ├── content_extractor.py
│   ├── adversary.py           # Verification agent
│   ├── config.py              # Centralized configuration
│   ├── prompts/               # System & research prompts
│   └── tools/                 # LLM clients & utilities
│       ├── runpod_client.py
│       ├── mlx_client.py
│       ├── vllm_client.py
│       ├── ddg_search.py
│       ├── naver_search.py
│       ├── confidence_gate.py
│       └── fact_extractor.py
├── docs/
│   └── METHODOLOGY.md         # Detailed methodology
└── examples/
    └── simple_analysis.py
```
## Methodology

See docs/METHODOLOGY.md for detailed documentation on:
- Reasoning Trace format specification
- CoT protocol design
- Training data generation pipeline
- DPO strategy for language purity
- Benchmark methodology
## Known Limitations

This open-source demo uses public search APIs only. See Production Enhancements for commercial capabilities.

| Category | Limitation | Impact | Production Note |
|---|---|---|---|
| Model Size | 7B parameter model (Qwen2.5-7B base) | Complex multi-step reasoning may degrade compared to 70B+ models | |
| Language | Korean financial domain only | English/multilingual queries produce lower-quality output | |
| Real-time Data | No direct market data feed (price, volume, orderbook) | Research relies on web search snippets, not live market data | |
| Valuation | No financial database integration (e.g., FnGuide, Bloomberg) | Cannot provide real-time PER/PBR/EPS; relies on news-sourced figures | FnGuide integration available |
| Search Coverage | Naver News API + DuckDuckGo only | No access to paywalled sources (securities firm reports, paid databases) | Securities firm reports in prod |
| Content Extraction | Top 3 sources per search step | Remaining sources provide title + snippet only (no full text) | Full-text extraction in prod |
| Inference Speed | ~16 tok/s (MLX 4-bit) / ~5 tok/s (CPU BF16) | Full research cycle takes 30-120 seconds depending on iterations | |
| Repetition | 7B models may exhibit output repetition | Post-processing mitigates but does not fully eliminate it | |
| Confidence | Self-reported confidence (not calibrated) | Confidence scores reflect the model's subjective estimate, not statistical accuracy | |
| Temporal | Training data cutoff affects domain knowledge | Events after training may not be reflected in reasoning quality | |
## Production Enhancements
In commercial deployments, VELA can integrate:
- FnGuide API: Real-time consensus, target prices, analyst ratings (50+ firms)
- Securities firm reports: Full-text extraction from major Korean brokerages
- Financial statements: 3+ years of balance sheet, cash flow, income statement
- Order flow data: Institutional/foreign investor net buying (real-time)
Contact hello@intrect.io for enterprise features.
## What VELA is NOT
- Not a trading bot: VELA generates research reports, not trade signals or orders
- Not a financial advisor: Output is for informational/educational purposes only
- Not a real-time system: Research runs in batch mode (30-120s per query), not streaming
- Not a replacement for professional analysis: Designed to augment, not replace, human judgment
## Contributing
Contributions are welcome. Please open an issue first to discuss what you would like to change.
## License

MIT License.
## Citation

```bibtex
@software{vela_framework_2026,
  title={VELA Framework: Domain-Specialized LLM Research Agent for Korean Financial Markets},
  author={intrect},
  year={2026},
  url={https://github.com/intrect/vela-framework}
}
```