Alpha Factory: Open-Source LLM-Driven Pipeline for WorldQuant BRAIN
Autonomous alpha generation system using multi-LLM agents with 7-layer acceptance engineering.
Quick Start
# Install uv (if not already installed)
# Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone
git clone https://huggingface.co/gaurv007/alpha-factory
cd alpha-factory
# Install (uv handles everything: venv, deps, lockfile)
uv sync
# With optional RAG support
uv sync --extra rag
# With all optional deps
uv sync --extra all
# Start Ollama (local LLM server)
ollama pull qwen2.5:1.5b
ollama pull qwen2.5:7b
ollama serve
# Dry run (no BRAIN credits spent)
uv run python -m alpha_factory.run --dry-run --batch-size 5
# Interactive model selection
uv run python -m alpha_factory.run --interactive --dry-run
# With HuggingFace cloud models
uv run python -m alpha_factory.run --hf-token hf_your_token --batch-size 10
# Run tests
uv run pytest tests/ -v
Architecture
Theme Sampler → Hypothesis Hunter (Microfish) → Expression Compiler (Jinja/Tinyfish)
→ Static Lint → Dedup → BRAIN Submit → Crowd Scout (Mediumfish)
→ Performance Surgeon (Mediumfish) → Gatekeeper (Bigfish) → Portfolio
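The stage sequence above can be sketched as a chain of accept/reject filters. This is a minimal illustration, not the project's actual implementation (which lives in `orchestration/pipeline.py`): the `Candidate` fields, stage names, and checks here are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical stand-in for a blueprint moving through the DAG."""
    blueprint: str
    expression: str = ""
    rejected_by: str = ""  # first stage that rejected it, if any

# Each stage is (name, predicate); a candidate stops at the first failing stage.
STAGES = [
    ("static_lint", lambda c: "future" not in c.expression),  # toy leakage check
    ("dedup", lambda c: True),                                # placeholder novelty check
]

def run_pipeline(candidate: Candidate) -> Candidate:
    for name, check in STAGES:
        if not check(candidate):
            candidate.rejected_by = name
            break
    return candidate
```

The point of the shape is cheap-first ordering: deterministic checks run before any BRAIN credits are spent.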
6 LLM Personas
| # | Persona | Model Tier | Job |
|---|---|---|---|
| 1 | Hypothesis Hunter | Microfish (1.5B) | Generate novel factor blueprints |
| 2 | Expression Compiler | Tinyfish (3B) / Jinja | Convert blueprint to BRAIN expression |
| 3 | Look-Ahead Sniffer | Deterministic | Static analysis for future leakage |
| 4 | Crowd Scout | Mediumfish (7B) | Novelty + correlation check |
| 5 | Performance Surgeon | Mediumfish (7B) | Diagnose failures, suggest fixes |
| 6 | Production Gatekeeper | Bigfish (14-72B) | Final go/no-go memo |
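The persona-to-tier mapping in the table can be expressed as a small lookup. Note the concrete Ollama model tags below are assumptions: the Quick Start only pulls `qwen2.5:1.5b` and `qwen2.5:7b`, so the 3B and 14B tags are illustrative.

```python
from enum import Enum

class Tier(str, Enum):
    # Model tags are assumed defaults, not confirmed by the repo.
    MICROFISH = "qwen2.5:1.5b"
    TINYFISH = "qwen2.5:3b"
    MEDIUMFISH = "qwen2.5:7b"
    BIGFISH = "qwen2.5:14b"

# Persona 3 (Look-Ahead Sniffer) is deterministic, so it has no LLM tier.
PERSONA_TIERS = {
    "hypothesis_hunter": Tier.MICROFISH,
    "expression_compiler": Tier.TINYFISH,
    "crowd_scout": Tier.MEDIUMFISH,
    "performance_surgeon": Tier.MEDIUMFISH,
    "gatekeeper": Tier.BIGFISH,
}
```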
Model Support
Automatically detects and uses:
- Ollama (local): auto-detected at localhost:11434
- HuggingFace Inference API (cloud): set the HF_TOKEN env var
- vLLM (local/remote): any OpenAI-compatible endpoint
Use the --interactive flag to pick a model for each tier manually from a dropdown.
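Backend auto-detection along these lines is straightforward to sketch. This is a hypothetical illustration of the detection order described above, not the repo's `model_manager.py`; the probe is injected so the logic stays testable without a running Ollama server.

```python
import os

def detect_backends(probe_ollama, hf_token=None, vllm_url=None):
    """Return available LLM backends in preference order.

    probe_ollama -- callable returning True if localhost:11434 answers;
    injected rather than hard-coded so callers can stub it out.
    """
    backends = []
    if probe_ollama():
        backends.append("ollama")
    if hf_token or os.environ.get("HF_TOKEN"):
        backends.append("huggingface")
    if vllm_url:
        backends.append("vllm")
    return backends
```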
Key Features
- Zero recurring cost: all LLMs run locally via Ollama
- Schema-constrained generation: no hallucinated operators
- 7-layer acceptance engineering: saves 60%+ BRAIN credits
- Deterministic kill switches: circuit breakers for runaway pipelines
- Factor store: DuckDB persistence for all alpha history
- Dead theme registry: avoids re-exploring failed themes
- Local BRAIN simulator: triage alphas before spending credits
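A deterministic kill switch of the kind listed above is typically a consecutive-failure circuit breaker. The sketch below is a generic pattern under that assumption; the repo's actual trip conditions and thresholds are not documented here.

```python
class CircuitBreaker:
    """Trips after max_failures consecutive failures; stays open until reset()."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures = 0
        self.tripped = False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0          # any success resets the streak
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.tripped = True    # halt the pipeline deterministically

    def allow(self) -> bool:
        return not self.tripped

    def reset(self) -> None:
        self.failures = 0
        self.tripped = False
```

A breaker like this would sit in front of the BRAIN submit step, so a run of rejected simulations stops the loop instead of burning credits.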
File Structure
alpha_factory/
├── config.py                    # All settings (Pydantic)
├── run.py                       # Entry point
├── schemas/                     # Typed contracts
├── deterministic/
│   ├── lint.py                  # Static pre-flight (Layer 2)
│   ├── theme_sampler.py         # Gap analysis (Layer 1)
│   ├── fitness.py               # Composite scoring
│   ├── regime_tagger.py         # Vol/trend/rate/style regimes
│   └── acceptance_checklist.py  # 14-point checklist
├── infra/
│   ├── model_manager.py         # Ollama + HF auto-detection
│   ├── llm_client.py            # Unified LLM interface
│   ├── factor_store.py          # DuckDB persistence
│   ├── wq_client.py             # BRAIN API wrapper
│   └── rag.py                   # ChromaDB + arXiv
├── local/
│   └── brain_sim.py             # Local BRAIN simulator (Layer 4)
├── personas/
│   ├── hypothesis_hunter.py     # Persona 1
│   ├── expression_compiler.py   # Persona 2
│   ├── crowd_scout.py           # Persona 4
│   ├── performance_surgeon.py   # Persona 5
│   └── gatekeeper.py            # Persona 6
└── orchestration/
    └── pipeline.py              # Full DAG
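For a feel of what `config.py` might hold, here is a hypothetical settings sketch. The real module uses Pydantic; a frozen dataclass stands in here to keep the example dependency-free, and every field name and default below is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the Pydantic settings in config.py."""
    ollama_url: str = "http://localhost:11434"  # default local endpoint
    batch_size: int = 5                         # matches the Quick Start flag
    dry_run: bool = True                        # no BRAIN credits by default
    data_dir: str = "data"                      # where operators.csv etc. live
```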
Setup
- Install uv: https://docs.astral.sh/uv/getting-started/installation/ then run uv sync
- Install Ollama: https://ollama.ai
- Pull models: ollama pull qwen2.5:1.5b && ollama pull qwen2.5:7b
- Place your operators.csv and fields_USA_TOP3000_D1.csv in data/
- Run: uv run python -m alpha_factory.run --dry-run --interactive
Cost
| Item | Cost |
|---|---|
| Local GPU (RTX 3090/4090) | $0 (already owned) |
| BRAIN account | $0 (existing) |
| uv + Ollama + all deps | $0 |
| Monthly running cost | $0 |
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "gaurv007/alpha-factory"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.