Alpha Factory: Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
Quick Start
# Install uv (if not already installed)
# Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone
git clone https://huggingface.co/gaurv007/alpha-factory
cd alpha-factory
# Install (uv handles everything: venv, deps, lockfile)
uv sync
# With optional RAG support
uv sync --extra rag
# With all optional deps
uv sync --extra all
# Start Ollama (local LLM server)
ollama pull qwen2.5:1.5b
ollama pull qwen2.5:7b
ollama serve
# Dry run (no BRAIN credits spent)
uv run python -m alpha_factory.run --dry-run --batch-size 5
# Interactive model selection
uv run python -m alpha_factory.run --interactive --dry-run
# With HuggingFace cloud models
uv run python -m alpha_factory.run --hf-token hf_your_token --batch-size 10
# Run tests
uv run pytest tests/ -v
Architecture
Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/Tinyfish)
→ Static Lint → Dedup → BRAIN Submit → Crowd Scout (Mediumfish)
→ Performance Surgeon (Mediumfish) → Gatekeeper (Bigfish) → Portfolio
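The stage sequence above can be sketched as a chain of accept/reject filters. This is a minimal illustration, not the project's actual implementation (which lives in `orchestration/pipeline.py`): the `Candidate` fields, stage names, and checks here are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical stand-in for a blueprint moving through the DAG."""
    blueprint: str
    expression: str = ""
    rejected_by: str = ""  # first stage that rejected it, if any

# Each stage is (name, predicate); a candidate stops at the first failing stage.
STAGES = [
    ("static_lint", lambda c: "future" not in c.expression),  # toy leakage check
    ("dedup", lambda c: True),                                # placeholder novelty check
]

def run_pipeline(candidate: Candidate) -> Candidate:
    for name, check in STAGES:
        if not check(candidate):
            candidate.rejected_by = name
            break
    return candidate
```

The point of the shape is cheap-first ordering: deterministic checks run before any BRAIN credits are spent.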
6 LLM Personas
| # | Persona | Model Tier | Job |
|---|---|---|---|
| 1 | Hypothesis Hunter | Microfish (1.5B) | Generate novel factor blueprints |
| 2 | Expression Compiler | Tinyfish (3B) / Jinja | Convert blueprint to BRAIN expression |
| 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
| 4 | Crowd Scout | Mediumfish (7B) | Novelty + correlation check |
| 5 | Performance Surgeon | Mediumfish (7B) | Diagnose failures, suggest fixes |
| 6 | Production Gatekeeper | Bigfish (14-72B) | Final go/no-go memo |
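The persona-to-tier mapping in the table can be expressed as a small lookup. Note the concrete Ollama model tags below are assumptions: the Quick Start only pulls `qwen2.5:1.5b` and `qwen2.5:7b`, so the 3B and 14B tags are illustrative.

```python
from enum import Enum

class Tier(str, Enum):
    # Model tags are assumed defaults, not confirmed by the repo.
    MICROFISH = "qwen2.5:1.5b"
    TINYFISH = "qwen2.5:3b"
    MEDIUMFISH = "qwen2.5:7b"
    BIGFISH = "qwen2.5:14b"

# Persona 3 (Look-Ahead Sniffer) is deterministic, so it has no LLM tier.
PERSONA_TIERS = {
    "hypothesis_hunter": Tier.MICROFISH,
    "expression_compiler": Tier.TINYFISH,
    "crowd_scout": Tier.MEDIUMFISH,
    "performance_surgeon": Tier.MEDIUMFISH,
    "gatekeeper": Tier.BIGFISH,
}
```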
Model Support
Automatically detects and uses:
- Ollama (local): auto-detected at localhost:11434
- HuggingFace Inference API (cloud): set the HF_TOKEN env var
- vLLM (local/remote): any OpenAI-compatible endpoint
Use the --interactive flag to pick a model for each tier manually from a dropdown.
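Backend auto-detection along these lines is straightforward to sketch. This is a hypothetical illustration of the detection order described above, not the repo's `model_manager.py`; the probe is injected so the logic stays testable without a running Ollama server.

```python
import os

def detect_backends(probe_ollama, hf_token=None, vllm_url=None):
    """Return available LLM backends in preference order.

    probe_ollama -- callable returning True if localhost:11434 answers;
    injected rather than hard-coded so callers can stub it out.
    """
    backends = []
    if probe_ollama():
        backends.append("ollama")
    if hf_token or os.environ.get("HF_TOKEN"):
        backends.append("huggingface")
    if vllm_url:
        backends.append("vllm")
    return backends
```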
Key Features
- Zero recurring cost: all LLMs run locally via Ollama
- Schema-constrained generation: no hallucinated operators
- 7-layer acceptance engineering: saves 60%+ BRAIN credits
- Deterministic kill switches: circuit breakers for runaway pipelines
- Factor store: DuckDB persistence for all alpha history
- Dead theme registry: avoids re-exploring failed themes
- Local BRAIN simulator: triage alphas before spending credits
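A deterministic kill switch of the kind listed above is typically a consecutive-failure circuit breaker. The sketch below is a generic pattern under that assumption; the repo's actual trip conditions and thresholds are not documented here.

```python
class CircuitBreaker:
    """Trips after max_failures consecutive failures; stays open until reset()."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures = 0
        self.tripped = False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0          # any success resets the streak
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.tripped = True    # halt the pipeline deterministically

    def allow(self) -> bool:
        return not self.tripped

    def reset(self) -> None:
        self.failures = 0
        self.tripped = False
```

A breaker like this would sit in front of the BRAIN submit step, so a run of rejected simulations stops the loop instead of burning credits.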
File Structure
alpha_factory/
├── config.py                    # All settings (Pydantic)
├── run.py                       # Entry point
├── schemas/                     # Typed contracts
├── deterministic/
│   ├── lint.py                  # Static pre-flight (Layer 2)
│   ├── theme_sampler.py         # Gap analysis (Layer 1)
│   ├── fitness.py               # Composite scoring
│   ├── regime_tagger.py         # Vol/trend/rate/style regimes
│   └── acceptance_checklist.py  # 14-point checklist
├── infra/
│   ├── model_manager.py         # Ollama + HF auto-detection
│   ├── llm_client.py            # Unified LLM interface
│   ├── factor_store.py          # DuckDB persistence
│   ├── wq_client.py             # BRAIN API wrapper
│   └── rag.py                   # ChromaDB + arXiv
├── local/
│   └── brain_sim.py             # Local BRAIN simulator (Layer 4)
├── personas/
│   ├── hypothesis_hunter.py     # Persona 1
│   ├── expression_compiler.py   # Persona 2
│   ├── crowd_scout.py           # Persona 4
│   ├── performance_surgeon.py   # Persona 5
│   └── gatekeeper.py            # Persona 6
└── orchestration/
    └── pipeline.py              # Full DAG
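For a feel of what `config.py` might hold, here is a hypothetical settings sketch. The real module uses Pydantic; a frozen dataclass stands in here to keep the example dependency-free, and every field name and default below is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the Pydantic settings in config.py."""
    ollama_url: str = "http://localhost:11434"  # default local endpoint
    batch_size: int = 5                         # matches the Quick Start flag
    dry_run: bool = True                        # no BRAIN credits by default
    data_dir: str = "data"                      # where operators.csv etc. live
```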
Setup
- Install uv: https://docs.astral.sh/uv/getting-started/installation/ then run uv sync
- Install Ollama: https://ollama.ai
- Pull models: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
- Place your operators.csv and fields_USA_TOP3000_D1.csv in data/
- Run: uv run python -m alpha_factory.run --dry-run --interactive
Cost
| Item | Cost |
|---|---|
| Local GPU (RTX 3090/4090) | $0 (already owned) |
| BRAIN account | $0 (existing) |
| uv + Ollama + all deps | $0 |
| Monthly running cost | $0 |
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "gaurv007/alpha-factory"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.