Kaushik Rajan
Phase 1: Initial SPIRAL project setup

SPIRAL: Interactive Reasoning Game Simulator

An interactive tool, deployed on Hugging Face Spaces, based on the SPIRAL paper ("Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning").

Overview

This tool demonstrates how self-play training on zero-sum games can improve AI reasoning capabilities. Users can:

  • Play Games: Engage with AI in games like Kuhn Poker and TicTacToe
  • View Reasoning: See step-by-step AI reasoning traces during gameplay
  • Test Transfer: Evaluate the AI's reasoning skills on math problems and logic puzzles
  • Learn: Understand AI decision-making through interactive visualizations
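
To make the zero-sum property concrete, here is a minimal sketch of how a finished Kuhn Poker hand could be scored. It assumes the standard three-card rules (J < Q < K, a one-chip ante, a one-chip bet). The function name `kuhn_payoffs`, the `TERMINALS` table, and the action letters are illustrative assumptions and are not taken from this project's code.

```python
# Illustrative sketch only: names and the action encoding are assumptions,
# not this project's API. Standard Kuhn Poker: 3 cards, ante 1, bet 1.
RANK = {"J": 0, "Q": 1, "K": 2}

# Terminal action histories: k = check, b = bet, c = call, f = fold.
# Maps history -> (chips won or lost, winner index, or None for a showdown).
TERMINALS = {
    "kk": (1, None),   # both check: showdown for the antes
    "bf": (1, 0),      # player 0 bets, player 1 folds
    "bc": (2, None),   # bet and call: showdown for ante + bet
    "kbf": (1, 1),     # player 0 checks, player 1 bets, player 0 folds
    "kbc": (2, None),  # check, bet, call: showdown for ante + bet
}


def kuhn_payoffs(card_p0, card_p1, history):
    """Return (payoff_p0, payoff_p1) in chips for a finished hand."""
    if card_p0 == card_p1:
        raise ValueError("players cannot hold the same card")
    if history not in TERMINALS:
        raise ValueError(f"not a terminal history: {history!r}")
    stake, winner = TERMINALS[history]
    if winner is None:  # showdown: the higher card wins
        winner = 0 if RANK[card_p0] > RANK[card_p1] else 1
    payoff_p0 = stake if winner == 0 else -stake
    return payoff_p0, -payoff_p0  # zero-sum: one player's gain is the other's loss
```

Whatever the cards and actions, the two returned payoffs sum to zero, which is the property the self-play setup relies on.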

Features

For Non-Technical Users

  • Simple web interface for playing games
  • Visual reasoning explanations
  • Educational tutorials about AI thinking
  • No setup required; runs in the browser

For Technical Users

  • Access to model weights and training scripts
  • API endpoints for extending the system
  • Custom game integration capabilities
  • Fine-tuning examples and documentation
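
As a rough idea of what custom game integration might involve, the sketch below defines a small two-player game interface and implements TicTacToe against it. The class `TwoPlayerGame` and its method names are hypothetical and chosen for illustration; the project's actual extension API is not documented here and may look different.

```python
# Hypothetical interface for illustration; not this project's actual API.
from abc import ABC, abstractmethod
from typing import List, Optional

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals


class TwoPlayerGame(ABC):
    """Minimal contract a turn-based, two-player game could expose."""

    @abstractmethod
    def legal_moves(self) -> List[int]: ...

    @abstractmethod
    def play(self, move: int) -> None: ...

    @abstractmethod
    def winner(self) -> Optional[int]: ...


class TicTacToe(TwoPlayerGame):
    def __init__(self) -> None:
        self.board: List[Optional[int]] = [None] * 9  # cells 0..8, row-major
        self.current_player = 0

    def legal_moves(self) -> List[int]:
        if self.winner() is not None:
            return []  # no moves once someone has won
        return [i for i, cell in enumerate(self.board) if cell is None]

    def play(self, move: int) -> None:
        if move not in self.legal_moves():
            raise ValueError(f"illegal move: {move}")
        self.board[move] = self.current_player
        self.current_player = 1 - self.current_player

    def winner(self) -> Optional[int]:
        for a, b, c in WIN_LINES:
            if self.board[a] is not None and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None
```

A new game would subclass the same interface, so that game-agnostic code (a self-play loop or the web interface) only needs `legal_moves`, `play`, and `winner`.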

Project Structure

SPIRAL/
├── src/                   # Core implementation
│   ├── games/             # Game environments
│   ├── models/            # SPIRAL model implementation
│   ├── training/          # Self-play training logic
│   └── reasoning/         # Reasoning trace generation
├── models/                # Trained model weights
├── data/                  # Game datasets and benchmarks
├── app/                   # Gradio web interface
├── tests/                 # Unit and integration tests
└── docs/                  # Documentation and tutorials

Technology Stack

  • Backend: Python 3.8+
  • ML Framework: PyTorch, Transformers
  • RL Library: Gymnasium, Stable Baselines3
  • Web Interface: Gradio
  • Base Model: Qwen-4B from Hugging Face
  • Deployment: Hugging Face Spaces

Development Phases

  1. Research and Planning ✅
  2. Implementation 🔄
  3. Testing and Optimization 📋
  4. Deployment and Documentation 📋
  5. Maintenance and Iteration 📋

Getting Started

Prerequisites

  • Python 3.8+
  • PyTorch
  • Hugging Face account (for model access)
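
If it helps, the Python requirement can be checked before installing anything. This is a small sketch using only the standard library; the helper name `python_is_supported` is made up for this example.

```python
import sys

MINIMUM_PYTHON = (3, 8)  # from the prerequisites list


def python_is_supported(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


print("Python OK" if python_is_supported() else "Python 3.8+ is required")
```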

Installation

pip install -r requirements.txt

Quick Start

python app/app.py
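
For readers new to Gradio, the sketch below shows the general shape such an entry point could take. It is not the contents of this project's app/app.py: the helpers `render_trace` and `explain`, and the semicolon-separated input format, are assumptions made for illustration. Gradio is imported inside `build_demo` so the formatting helpers can be used without it installed.

```python
# Illustrative sketch; function names and input format are assumptions.

def render_trace(steps):
    """Format reasoning steps as a numbered list, one step per line."""
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


def explain(raw_steps):
    """Split semicolon-separated steps, drop empty ones, and number them."""
    steps = [s.strip() for s in raw_steps.split(";") if s.strip()]
    return render_trace(steps)


def build_demo():
    """Wrap `explain` in a simple text-in, text-out Gradio interface."""
    import gradio as gr  # imported lazily; requires `pip install gradio`

    return gr.Interface(
        fn=explain,
        inputs="text",
        outputs="text",
        title="Reasoning trace demo (sketch)",
    )
```

Calling `build_demo().launch()` would start a local Gradio server serving this interface.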

Citation

If you use this tool in your research, please cite the original SPIRAL paper:

@article{spiral2025,
  title={Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning},
  author={[Authors]},
  journal={[Journal]},
  year={2025}
}

License

This project is licensed under the MIT License; see the LICENSE file for details.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Support

For issues and questions, please open a GitHub issue or contact us via Hugging Face Spaces.