# SPIRAL: Interactive Reasoning Game Simulator

A practical, interactive tool, deployed on Hugging Face Spaces, based on the SPIRAL paper ("Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning").

## Overview

This tool demonstrates how self-play training on zero-sum games can improve AI reasoning capabilities. Users can:

- **Play Games**: Play games such as Kuhn Poker and Tic-Tac-Toe against the AI
- **View Reasoning**: See the AI's step-by-step reasoning traces during gameplay
- **Test Transfer**: Evaluate whether the AI's reasoning skills transfer to math problems and logic puzzles
- **Learn**: Understand AI decision-making through interactive visualizations
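To make "zero-sum" and "self-play" concrete, here is a minimal sketch of Kuhn Poker in plain Python. It assumes the standard rules (a three-card deck, an ante of 1 per player, and a single betting round with a bet size of 1). The names `legal_actions`, `terminal_payoffs`, and `play_episode` are hypothetical and are not taken from this repository's `src/games/` module. Both seats are driven by the same policy, which is the core idea of self-play, and the two payoffs of every hand sum to zero.

```python
import random
from typing import Callable, List, Optional, Tuple

# Card ranks: 1 = Jack, 2 = Queen, 3 = King (higher rank wins a showdown).
CARDS = (1, 2, 3)

# Hypothetical policy signature: (own card, betting history, legal actions) -> action.
# Actions: "c" = check/call, "b" = bet, "f" = fold.
Policy = Callable[[int, str, List[str]], str]


def legal_actions(history: str) -> List[str]:
    """Facing a bet you may fold or call; otherwise you may check or bet."""
    return ["f", "c"] if history.endswith("b") else ["c", "b"]


def terminal_payoffs(history: str, cards: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Return (payoff to player 0, payoff to player 1), or None if the hand continues."""
    if history == "bf":              # player 1 folds to a bet: player 0 wins the ante
        return (1, -1)
    if history == "cbf":             # player 0 folds to a bet: player 1 wins the ante
        return (-1, 1)
    if history == "cc":              # showdown with antes only
        stake = 1
    elif history in ("bc", "cbc"):   # showdown after a called bet
        stake = 2
    else:
        return None
    return (stake, -stake) if cards[0] > cards[1] else (-stake, stake)


def play_episode(policy: Policy, rng: random.Random) -> Tuple[str, Tuple[int, int]]:
    """Self-play: the same policy acts for both seats."""
    cards = tuple(rng.sample(CARDS, 2))
    history = ""
    while True:
        player = len(history) % 2  # players alternate; player 0 acts first
        history += policy(cards[player], history, legal_actions(history))
        payoffs = terminal_payoffs(history, cards)
        if payoffs is not None:
            return history, payoffs


if __name__ == "__main__":
    rng = random.Random(0)
    random_policy: Policy = lambda card, history, legal: rng.choice(legal)
    for _ in range(1000):
        _, (r0, r1) = play_episode(random_policy, rng)
        assert r0 + r1 == 0  # zero-sum: one player's gain is the other's loss
    print("all 1000 hands were zero-sum")
```

In a trained system the random policy would be replaced by the language model choosing actions; the game logic and the zero-sum payoff structure stay the same.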

## Features

### For Non-Technical Users
- Simple web interface for playing games
- Visual reasoning explanations
- Educational tutorials about AI thinking
- No setup required: runs in the browser

### For Technical Users
- Access to model weights and training scripts
- API endpoints for extending the system
- Custom game integration capabilities
- Fine-tuning examples and documentation
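As a rough illustration of what custom game integration could look like, here is a sketch of a minimal two-player, zero-sum game interface with a Tic-Tac-Toe implementation. The class and method names (`TwoPlayerGame`, `reset`, `observation`, `legal_actions`, `step`, `winner`, `is_over`, `rewards`) are hypothetical and are not the actual API of `src/games/`. Observations are rendered as text on the assumption that the model reads the board as part of its prompt.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class TwoPlayerGame(ABC):
    """Hypothetical interface for a turn-based, two-player, zero-sum game."""

    current_player: int  # 0 or 1: whose turn it is

    @abstractmethod
    def reset(self) -> str:
        """Start a new game and return the first text observation."""

    @abstractmethod
    def observation(self) -> str:
        """Render the current state as text for the model's prompt."""

    @abstractmethod
    def legal_actions(self) -> List[str]:
        """Actions available to the current player (empty once the game is over)."""

    @abstractmethod
    def step(self, action: str) -> None:
        """Apply the current player's action and pass the turn."""

    @abstractmethod
    def winner(self) -> Optional[int]:
        """Return 0 or 1 for the winning player, or None if there is no winner."""

    @abstractmethod
    def is_over(self) -> bool:
        """True when the game has ended (win or draw)."""

    def rewards(self) -> Tuple[int, int]:
        """Zero-sum rewards: +1 for the winner, -1 for the loser, 0 each on a draw."""
        w = self.winner()
        if w is None:
            return (0, 0)
        return (1, -1) if w == 0 else (-1, 1)


# All eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


class TicTacToe(TwoPlayerGame):
    def __init__(self) -> None:
        self.reset()

    def reset(self) -> str:
        self.board = [" "] * 9
        self.current_player = 0  # player 0 plays X, player 1 plays O
        return self.observation()

    def observation(self) -> str:
        return "\n".join("|".join(self.board[i:i + 3]) for i in (0, 3, 6))

    def legal_actions(self) -> List[str]:
        if self.is_over():
            return []
        return [str(i) for i, cell in enumerate(self.board) if cell == " "]

    def step(self, action: str) -> None:
        if action not in self.legal_actions():
            raise ValueError(f"illegal action {action!r}")
        self.board[int(action)] = "XO"[self.current_player]
        self.current_player = 1 - self.current_player

    def winner(self) -> Optional[int]:
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return 0 if self.board[a] == "X" else 1
        return None

    def is_over(self) -> bool:
        return self.winner() is not None or " " not in self.board
```

Keeping actions as strings means a model's text output can be checked directly against `legal_actions()`, and the shared `rewards()` method guarantees that any game implementing the interface stays zero-sum.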

## Project Structure

```
SPIRAL/
├── src/                    # Core implementation
│   ├── games/             # Game environments
│   ├── models/            # SPIRAL model implementation
│   ├── training/          # Self-play training logic
│   └── reasoning/         # Reasoning trace generation
├── models/                # Trained model weights
├── data/                  # Game datasets and benchmarks
├── app/                   # Gradio web interface
├── tests/                 # Unit and integration tests
└── docs/                  # Documentation and tutorials
```
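The `src/reasoning/` module is described as generating reasoning traces. A minimal sketch of one piece of that job, splitting a model's raw output into reasoning steps and a final action, might look like the following. It assumes, purely for illustration, that the model writes its reasoning inside `<think>...</think>` and its move inside `<answer>...</answer>`; the tag format, `ReasoningTrace`, and `parse_trace` are hypothetical and not confirmed by this repository.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed output format (hypothetical): <think>reasoning</think> <answer>action</answer>
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


@dataclass
class ReasoningTrace:
    steps: List[str] = field(default_factory=list)  # one entry per non-empty reasoning line
    action: Optional[str] = None                     # the extracted move, if any
    valid: bool = False                              # True only if the move is legal


def parse_trace(output: str, legal_actions: List[str]) -> ReasoningTrace:
    """Split raw model output into reasoning steps and a checked final action."""
    think = THINK_RE.search(output)
    reasoning = think.group(1) if think else ""
    steps = [line.strip() for line in reasoning.splitlines() if line.strip()]
    answers = ANSWER_RE.findall(output)
    action = answers[-1].strip() if answers else None  # use the last answer if several appear
    valid = action is not None and action in legal_actions
    return ReasoningTrace(steps=steps, action=action, valid=valid)
```

Returning an explicit `valid` flag rather than raising lets a game loop decide how to treat malformed or illegal moves (for example, by forfeiting the turn), while the extracted `steps` can be displayed in the interface as the step-by-step trace.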

## Technology Stack

- **Backend**: Python 3.8+
- **ML Framework**: PyTorch, Transformers
- **RL Library**: Gymnasium, Stable Baselines3
- **Web Interface**: Gradio
- **Base Model**: Qwen-4B from Hugging Face
- **Deployment**: Hugging Face Spaces

## Development Phases

1. **Research and Planning** ✅
2. **Implementation** 🔄
3. **Testing and Optimization** 📋
4. **Deployment and Documentation** 📋
5. **Maintenance and Iteration** 📋

## Getting Started

### Prerequisites
- Python 3.8+
- PyTorch
- Hugging Face account (for model access)

### Installation
```bash
pip install -r requirements.txt
```

### Quick Start
```bash
python app/app.py
```

## Citation

If you use this tool in your research, please cite the original SPIRAL paper:

```bibtex
@article{spiral2024,
  title={Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning},
  author={[Authors]},
  journal={[Journal]},
  year={2024}
}
```

## License

This project is licensed under the MIT License; see the LICENSE file for details.

## Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

## Support

For issues and questions, please open a GitHub issue or contact us through the Hugging Face Space.