Spaces:
Build error
Build error
| # SPIRAL: Interactive Reasoning Game Simulator | |
| A practical, interactive tool based on the SPIRAL paper ("Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning") deployed on Hugging Face Spaces. | |
| ## Overview | |
| This tool demonstrates how self-play training on zero-sum games can improve AI reasoning capabilities. Users can: | |
| - **Play Games**: Engage with AI in games like Kuhn Poker and TicTacToe | |
| - **View Reasoning**: See step-by-step AI reasoning traces during gameplay | |
| - **Test Transfer**: Evaluate AI's reasoning skills on math problems and logic puzzles | |
| - **Learn**: Understand AI decision-making through interactive visualizations | |
| ## Features | |
| ### For Non-Technical Users | |
| - Simple web interface for playing games | |
| - Visual reasoning explanations | |
| - Educational tutorials about AI thinking | |
| - No setup required - runs in browser | |
| ### For Technical Users | |
| - Access to model weights and training scripts | |
| - API endpoints for extending the system | |
| - Custom game integration capabilities | |
| - Fine-tuning examples and documentation | |
| ## Project Structure | |
| ``` | |
| SPIRAL/ | |
| βββ src/ # Core implementation | |
| β βββ games/ # Game environments | |
| β βββ models/ # SPIRAL model implementation | |
| β βββ training/ # Self-play training logic | |
| β βββ reasoning/ # Reasoning trace generation | |
| βββ models/ # Trained model weights | |
| βββ data/ # Game datasets and benchmarks | |
| βββ app/ # Gradio web interface | |
| βββ tests/ # Unit and integration tests | |
| βββ docs/ # Documentation and tutorials | |
| ``` | |
| ## Technology Stack | |
| - **Backend**: Python 3.8+ | |
| - **ML Framework**: PyTorch, Transformers | |
| - **RL Library**: Gymnasium, Stable Baselines3 | |
| - **Web Interface**: Gradio | |
| - **Base Model**: Qwen-4B from Hugging Face | |
| - **Deployment**: Hugging Face Spaces | |
| ## Development Phases | |
| 1. **Research and Planning** β | |
| 2. **Implementation** π | |
| 3. **Testing and Optimization** π | |
| 4. **Deployment and Documentation** π | |
| 5. **Maintenance and Iteration** π | |
| ## Getting Started | |
| ### Prerequisites | |
| - Python 3.8+ | |
| - PyTorch | |
| - Hugging Face account (for model access) | |
| ### Installation | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ### Quick Start | |
| ```bash | |
| python app/app.py | |
| ``` | |
| ## Citation | |
| If you use this tool in your research, please cite the original SPIRAL paper: | |
| ```bibtex | |
| @article{spiral2024, | |
| title={Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning}, | |
| author={[Authors]}, | |
| journal={[Journal]}, | |
| year={2024} | |
| } | |
| ``` | |
| ## License | |
| This project is licensed under the MIT License - see the LICENSE file for details. | |
| ## Contributing | |
| We welcome contributions! Please see CONTRIBUTING.md for guidelines. | |
| ## Support | |
| For issues and questions, please use the GitHub Issues or contact us via Hugging Face Spaces. |