Space: kaushikvr06/reasoning-simulator

Kaushik Rajan committed on Jul 12

Commit

e526e6a

0 Parent(s):

Phase 1: Initial SPIRAL project setup

Complete structure with Gradio interface, config, and documentation

Files changed (13)

.gitignore +227 -0
README.md +101 -0
app/app.py +255 -0
config.yaml +124 -0
data/README.md +45 -0
execution-plan.md +54 -0
requirements.txt +44 -0
src/__init__.py +15 -0
src/games/__init__.py +12 -0
src/models/__init__.py +13 -0
src/reasoning/__init__.py +13 -0
src/training/__init__.py +13 -0
tests/test_basic.py +130 -0

.gitignore ADDED

	@@ -0,0 +1,227 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+.pybuilder/
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+# pytype static type analyzer
+.pytype/
+# Cython debug symbols
+cython_debug/
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be added to the global gitignore or merged into this project gitignore.  For a PyCharm
+#  project, it is recommended to include the template in the project gitignore.
+#  https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+.idea/
+# VS Code
+.vscode/
+# macOS
+.DS_Store
+.AppleDouble
+.LSOverride
+# Windows
+Thumbs.db
+Thumbs.db:encryptable
+ehthumbs.db
+ehthumbs_vista.db
+*.stackdump
+[Dd]esktop.ini
+$RECYCLE.BIN/
+*.cab
+*.msi
+*.msix
+*.msm
+*.msp
+*.lnk
+# Model files and large data
+*.bin
+*.safetensors
+*.pt
+*.pth
+*.ckpt
+*.h5
+*.pkl
+*.pickle
+models/*/
+# Logs and experiments
+logs/
+wandb/
+tensorboard/
+*.log
+# Temporary files
+tmp/
+temp/
+*.tmp
+*.temp
+# Data files
+data/*/
+!data/README.md
+# Hugging Face cache
+.cache/
+transformers_cache/
+# Local environment variables
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+# Gradio temporary files
+flagged/
+gradio_cached_examples/

README.md ADDED

	@@ -0,0 +1,101 @@

+# SPIRAL: Interactive Reasoning Game Simulator
+A practical, interactive tool based on the SPIRAL paper ("Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning") deployed on Hugging Face Spaces.
+## Overview
+This tool demonstrates how self-play training on zero-sum games can improve AI reasoning capabilities. Users can:
+- **Play Games**: Engage with AI in games like Kuhn Poker and TicTacToe
+- **View Reasoning**: See step-by-step AI reasoning traces during gameplay
+- **Test Transfer**: Evaluate AI's reasoning skills on math problems and logic puzzles
+- **Learn**: Understand AI decision-making through interactive visualizations
+## Features
+### For Non-Technical Users
+- Simple web interface for playing games
+- Visual reasoning explanations
+- Educational tutorials about AI thinking
+- No setup required - runs in browser
+### For Technical Users
+- Access to model weights and training scripts
+- API endpoints for extending the system
+- Custom game integration capabilities
+- Fine-tuning examples and documentation
+## Project Structure
+```
+SPIRAL/
+├── src/                    # Core implementation
+│   ├── games/             # Game environments
+│   ├── models/            # SPIRAL model implementation
+│   ├── training/          # Self-play training logic
+│   └── reasoning/         # Reasoning trace generation
+├── models/                # Trained model weights
+├── data/                  # Game datasets and benchmarks
+├── app/                   # Gradio web interface
+├── tests/                 # Unit and integration tests
+└── docs/                  # Documentation and tutorials
+```
+## Technology Stack
+- **Backend**: Python 3.8+
+- **ML Framework**: PyTorch, Transformers
+- **RL Library**: Gymnasium, Stable Baselines3
+- **Web Interface**: Gradio
+- **Base Model**: Qwen-4B from Hugging Face
+- **Deployment**: Hugging Face Spaces
+## Development Phases
+1. **Research and Planning** ✅
+2. **Implementation** 🔄
+3. **Testing and Optimization** 📋
+4. **Deployment and Documentation** 📋
+5. **Maintenance and Iteration** 📋
+## Getting Started
+### Prerequisites
+- Python 3.8+
+- PyTorch
+- Hugging Face account (for model access)
+### Installation
+```bash
+pip install -r requirements.txt
+```
+### Quick Start
+```bash
+python app/app.py
+```
+## Citation
+If you use this tool in your research, please cite the original SPIRAL paper:
+```bibtex
+@article{spiral2024,
+  title={Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning},
+  author={[Authors]},
+  journal={[Journal]},
+  year={2024}
+}
+```
+## License
+This project is licensed under the MIT License - see the LICENSE file for details.
+## Contributing
+We welcome contributions! Please see CONTRIBUTING.md for guidelines.
+## Support
+For issues and questions, please use GitHub Issues or contact us via Hugging Face Spaces.

app/app.py ADDED

	@@ -0,0 +1,255 @@

+"""
+SPIRAL Interactive Reasoning Game Simulator - Main Gradio App
+A practical tool demonstrating how self-play training on zero-sum games
+can improve AI reasoning capabilities.
+"""
+import gradio as gr
+import yaml
+import os
+import sys
+# Add the src directory to the path for imports
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'src'))
+from typing import Tuple, Dict, Any, List, Optional
+import logging
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+class SpiralApp:
+    """Main application class for the SPIRAL reasoning simulator."""
+    def __init__(self, config_path: str = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "config.yaml")):
+        """Initialize the SPIRAL app with configuration."""
+        self.config = self._load_config(config_path)
+        self.setup_logging()
+        # Initialize components (will be implemented in Phase 2)
+        self.game_interface = None
+        self.reasoning_interface = None
+        self.transfer_interface = None
+        logger.info("SPIRAL App initialized successfully")
+    def _load_config(self, config_path: str) -> Dict[str, Any]:
+        """Load configuration from YAML file."""
+        try:
+            with open(config_path, 'r') as f:
+                config = yaml.safe_load(f)
+            return config
+        except FileNotFoundError:
+            logger.warning(f"Config file not found: {config_path}. Using defaults.")
+            return self._get_default_config()
+    def _get_default_config(self) -> Dict[str, Any]:
+        """Get default configuration."""
+        return {
+            'interface': {
+                'title': 'SPIRAL: Interactive Reasoning Game Simulator',
+                'description': 'Play games against AI and explore reasoning capabilities',
+                'theme': 'default'
+            },
+            'games': {
+                'kuhn_poker': {'name': 'Kuhn Poker'},
+                'tictactoe': {'name': 'TicTacToe'}
+            }
+        }
+    def setup_logging(self):
+        """Set up logging configuration."""
+        log_config = self.config.get('logging', {})
+        level = getattr(logging, log_config.get('level', 'INFO'))
+        logging.getLogger().setLevel(level)
+    def play_game(self, game_type: str, user_move: str, game_state: str = "") -> Tuple[str, str, str]:
+        """
+        Handle game play interaction.
+        Args:
+            game_type: Type of game (kuhn_poker, tictactoe)
+            user_move: User's move input
+            game_state: Current game state
+        Returns:
+            Tuple of (updated_game_state, ai_response, reasoning_trace)
+        """
+        # Placeholder implementation - will be completed in Phase 2
+        if not user_move:
+            return game_state, "Please enter a move!", ""
+        # Simulate AI response
+        ai_response = f"AI responds to your move: {user_move}"
+        reasoning_trace = f"AI thinking: Analyzing move '{user_move}' in {game_type}..."
+        updated_state = f"{game_state}\nUser: {user_move}\nAI: {ai_response}"
+        return updated_state, ai_response, reasoning_trace
+    def test_reasoning(self, prompt: str, task_type: str = "math") -> Tuple[str, str]:
+        """
+        Test AI reasoning on non-game tasks.
+        Args:
+            prompt: User's reasoning prompt
+            task_type: Type of reasoning task
+        Returns:
+            Tuple of (response, reasoning_trace)
+        """
+        # Placeholder implementation - will be completed in Phase 2
+        if not prompt:
+            return "Please enter a reasoning prompt!", ""
+        response = f"AI response to: {prompt}"
+        reasoning_trace = f"Step-by-step reasoning for '{prompt}'..."
+        return response, reasoning_trace
+    def create_interface(self) -> gr.Blocks:
+        """Create the main Gradio interface."""
+        title = self.config['interface']['title']
+        description = self.config['interface']['description']
+        with gr.Blocks(title=title, theme=self.config['interface']['theme']) as demo:
+            gr.Markdown(f"# {title}")
+            gr.Markdown(description)
+            with gr.Tabs():
+                # Game Play Tab
+                with gr.TabItem("🎮 Game Play"):
+                    gr.Markdown("### Play zero-sum games against AI")
+                    with gr.Row():
+                        with gr.Column():
+                            game_selector = gr.Dropdown(
+                                choices=["kuhn_poker", "tictactoe"],
+                                value="kuhn_poker",
+                                label="Select Game"
+                            )
+                            user_move = gr.Textbox(
+                                label="Your Move",
+                                placeholder="Enter your move..."
+                            )
+                            play_button = gr.Button("Play Move", variant="primary")
+                        with gr.Column():
+                            game_state = gr.Textbox(
+                                label="Game State",
+                                lines=10,
+                                interactive=False
+                            )
+                            ai_response = gr.Textbox(
+                                label="AI Response",
+                                lines=3,
+                                interactive=False
+                            )
+                    reasoning_trace = gr.Textbox(
+                        label="AI Reasoning Trace",
+                        lines=5,
+                        interactive=False
+                    )
+                    play_button.click(
+                        fn=self.play_game,
+                        inputs=[game_selector, user_move, game_state],
+                        outputs=[game_state, ai_response, reasoning_trace]
+                    )
+                # Reasoning Test Tab
+                with gr.TabItem("🧠 Reasoning Test"):
+                    gr.Markdown("### Test AI reasoning on math and logic problems")
+                    with gr.Row():
+                        with gr.Column():
+                            task_type = gr.Dropdown(
+                                choices=["math", "logic", "strategic"],
+                                value="math",
+                                label="Task Type"
+                            )
+                            reasoning_prompt = gr.Textbox(
+                                label="Reasoning Prompt",
+                                placeholder="Enter a math problem or logic puzzle...",
+                                lines=3
+                            )
+                            test_button = gr.Button("Test Reasoning", variant="primary")
+                        with gr.Column():
+                            reasoning_response = gr.Textbox(
+                                label="AI Response",
+                                lines=8,
+                                interactive=False
+                            )
+                            reasoning_steps = gr.Textbox(
+                                label="Step-by-Step Reasoning",
+                                lines=8,
+                                interactive=False
+                            )
+                    test_button.click(
+                        fn=self.test_reasoning,
+                        inputs=[reasoning_prompt, task_type],
+                        outputs=[reasoning_response, reasoning_steps]
+                    )
+                # About Tab
+                with gr.TabItem("ℹ️ About"):
+                    gr.Markdown("""
+                    ### About SPIRAL
+                    This tool demonstrates the SPIRAL methodology: "Self-Play on Zero-Sum Games
+                    Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning."
+                    **Key Features:**
+                    - **Game Play**: Interactive games with AI opponents
+                    - **Reasoning Traces**: Transparent AI decision-making
+                    - **Transfer Learning**: Test reasoning on non-game tasks
+                    - **Educational**: Learn about AI reasoning capabilities
+                    **How it works:**
+                    1. AI agents are trained via self-play on zero-sum games
+                    2. Role-conditioned advantage estimation improves learning
+                    3. Reasoning skills transfer to mathematical and logical tasks
+                    4. Interactive interface shows the AI's thinking process
+                    **Games Available:**
+                    - **Kuhn Poker**: Simple poker variant with betting
+                    - **TicTacToe**: Classic strategy game
+                    **Technical Details:**
+                    - Base Model: Qwen-4B from Hugging Face
+                    - Training: PPO with self-play
+                    - Interface: Gradio web app
+                    """)
+        return demo
+    def launch(self, **kwargs):
+        """Launch the Gradio app."""
+        demo = self.create_interface()
+        # Get launch configuration
+        gradio_config = self.config.get('interface', {}).get('gradio', {})
+        launch_kwargs = {
+            'server_name': gradio_config.get('server_name', '0.0.0.0'),
+            'server_port': gradio_config.get('server_port', 7860),
+            'share': gradio_config.get('share', False),
+            'inbrowser': gradio_config.get('inbrowser', True),
+            # 'enable_queue' is not passed: launch() no longer accepts it in Gradio 4 (see requirements.txt)
+            **kwargs
+        }
+        logger.info(f"Launching SPIRAL app with config: {launch_kwargs}")
+        demo.launch(**launch_kwargs)
+def main():
+    """Main entry point for the application."""
+    app = SpiralApp()
+    app.launch()
+if __name__ == "__main__":
+    main()
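
Note on `play_game`: the method above only echoes the move back. As a minimal sketch of what a Phase 2 TicTacToe handler could build on, assuming the board is stored as a 9-character string with `.` for empty cells and moves arrive as a cell index 0-8 (both are assumptions for illustration, as are the names `winner` and `apply_move`; none of this is in the commit):

```python
from typing import Optional

# All eight winning lines on a 3x3 board, as index triples.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that mark fills a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def apply_move(board: str, cell: int, mark: str) -> str:
    """Return a new board with `mark` placed at `cell`; reject illegal moves."""
    if not 0 <= cell < 9:
        raise ValueError(f"cell must be in 0..8, got {cell}")
    if board[cell] != ".":
        raise ValueError(f"cell {cell} is already taken")
    return board[:cell] + mark + board[cell + 1:]
```

Keeping the board as an immutable string would let it pass through the existing `game_state` textbox unchanged, though a structured `gr.State` value may be a better fit once real game logic lands.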

config.yaml ADDED

	@@ -0,0 +1,124 @@

+# SPIRAL Interactive Reasoning Game Simulator Configuration
+# Model Configuration
+model:
+  name: "Qwen/Qwen2.5-4B-Instruct"  # TODO: verify this repo id; Qwen2.5 ships 3B and 7B sizes, while Qwen3 has a 4B model
+  max_length: 2048
+  temperature: 0.7
+  do_sample: true
+  quantization:
+    load_in_4bit: true
+    bnb_4bit_compute_dtype: "float16"
+    bnb_4bit_use_double_quant: true
+# Games Configuration
+games:
+  kuhn_poker:
+    name: "Kuhn Poker"
+    max_rounds: 50
+    deck_size: 3
+    betting_rounds: 2
+  tictactoe:
+    name: "TicTacToe"
+    board_size: 3
+    max_moves: 9
+    win_condition: 3
+# Training Configuration
+training:
+  algorithm: "PPO"
+  episodes: 1000
+  batch_size: 32
+  learning_rate: 0.0003
+  gamma: 0.99
+  gae_lambda: 0.95
+  clip_range: 0.2
+  entropy_coef: 0.01
+  value_loss_coef: 0.5
+  max_grad_norm: 0.5
+  # Self-play specific
+  self_play:
+    update_opponent_every: 100
+    opponent_pool_size: 5
+  # Role-conditioned advantage estimation
+  rae:
+    enable: true
+    role_embedding_dim: 64
+    advantage_weighting: 0.5
+# Reasoning Configuration
+reasoning:
+  enable_traces: true
+  trace_depth: 3
+  chain_of_thought: true
+  explanation_length: 150
+  # Transfer learning evaluation
+  transfer_tasks:
+    - "GSM8K"
+    - "Logic Puzzles"
+    - "Strategic Reasoning"
+# Web Interface Configuration
+interface:
+  title: "SPIRAL: Interactive Reasoning Game Simulator"
+  description: "Play games against AI and explore reasoning capabilities"
+  theme: "default"
+  # Gradio settings
+  gradio:
+    share: false
+    inbrowser: true
+    server_name: "0.0.0.0"
+    server_port: 7860
+    enable_queue: true
+    max_threads: 4
+# Logging Configuration
+logging:
+  level: "INFO"
+  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+  file: "logs/spiral.log"
+  # Experiment tracking
+  wandb:
+    enable: false
+    project: "spiral-reasoning"
+    entity: "your-username"
+  tensorboard:
+    enable: true
+    log_dir: "logs/tensorboard"
+# Data Configuration
+data:
+  cache_dir: "data/cache"
+  datasets_dir: "data/datasets"
+  models_dir: "models"
+  # Benchmark datasets
+  benchmarks:
+    gsm8k: "data/benchmarks/gsm8k.json"
+    logic_puzzles: "data/benchmarks/logic_puzzles.json"
+# Deployment Configuration
+deployment:
+  huggingface:
+    space_name: "kaushikvr06/reasoning-simulator"
+    private: false
+  # Performance settings
+  performance:
+    max_concurrent_users: 10
+    timeout_seconds: 30
+    memory_limit: "2GB"
+# Debug Configuration
+debug:
+  enable: false
+  verbose_traces: false
+  save_game_logs: true
+  profile_inference: false
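
Note on config handling: `app.py` falls back to its built-in defaults only when the file is missing, so a `config.yaml` that omits a section such as `interface` would raise a `KeyError` in `create_interface`. One way to guard against that, sketched here with a hypothetical helper `merge_config` that is not part of this commit, is to overlay the loaded file on the defaults:

```python
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = value
    return merged
```

With this, `merge_config(self._get_default_config(), yaml.safe_load(f) or {})` would keep any default key the file leaves out.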

data/README.md ADDED

	@@ -0,0 +1,45 @@

+# SPIRAL Data Directory
+This directory contains datasets, benchmarks, and cached data for the SPIRAL Interactive Reasoning Game Simulator.
+## Structure
+```
+data/
+├── cache/              # Cached model outputs and processed data
+├── datasets/           # Game datasets and training data
+├── benchmarks/         # Evaluation benchmarks for transfer learning
+│   ├── gsm8k.json     # GSM8K math problems
+│   └── logic_puzzles.json  # Logic reasoning puzzles
+└── README.md          # This file
+```
+## Datasets
+### Game Datasets
+- **Kuhn Poker**: Training games and strategies
+- **TicTacToe**: Game states and optimal moves
+### Benchmark Datasets
+- **GSM8K**: Grade School Math 8K dataset for mathematical reasoning
+- **Logic Puzzles**: Custom logic and reasoning problems
+- **Strategic Reasoning**: Game-theory based reasoning tasks
+## Usage
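+Datasets are intended to be downloaded and cached automatically when first used. Once `src/data_utils` is implemented (it is not part of this commit), they can also be downloaded manually: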
+```python
+from src.data_utils import download_datasets
+download_datasets()
+```
+## Data Sources
+- GSM8K: [Cobbe et al. 2021](https://arxiv.org/abs/2110.14168)
+- Logic Puzzles: Curated collection from various sources
+- Game Data: Generated through self-play training
+## License
+Please refer to individual dataset licenses for usage rights.
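
Note on GSM8K scoring: reference solutions in GSM8K end with a line of the form `#### <answer>`. A minimal sketch of how a transfer evaluator might score a model reply against that marker follows; the function names are hypothetical, and taking the last number in the reply as the model's answer is an assumption, not something this commit specifies.

```python
import re
from typing import Optional


def extract_gsm8k_answer(solution: str) -> Optional[str]:
    """Return the number after the final '####' marker, without thousands separators."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", solution)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def is_correct(model_output: str, reference_solution: str) -> bool:
    """Compare the last number in the model output with the reference answer."""
    target = extract_gsm8k_answer(reference_solution)
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", model_output)
    if target is None or not numbers:
        return False
    return float(numbers[-1].replace(",", "")) == float(target)
```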

execution-plan.md ADDED

	@@ -0,0 +1,54 @@

+# SPIRAL Demo App Execution Plan
+This execution plan outlines the development of a practical, interactive tool on Hugging Face Spaces based on the SPIRAL paper ("Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning"). The tool will be an **Interactive Reasoning Game Simulator**: Users can play zero-sum games (e.g., Kuhn Poker, TicTacToe) against a self-play trained AI, view step-by-step reasoning traces, and test the AI's transferred reasoning skills on non-game tasks like math problems or logic puzzles.
+**Utility Focus**:
+- **Non-Technical Users**: Simple web interface to play games, learn about AI reasoning through visualizations, and experiment with prompts for educational fun (e.g., "How does AI think in games?").
+- **Technical Users**: Access to model weights, training scripts, and APIs for extending the self-play system (e.g., custom games or fine-tuning).
+- **Practicality**: Free to use, no setup required; demonstrates real-world AI applications in strategy, education, and decision-making. Aims for broad appeal: 1000+ users via HF community sharing.
+The plan is divided into phases with checkboxes for sub-tasks. Each phase includes detailed "how" steps.
+## Phase 1: Research and Planning
+- [ ] Review SPIRAL Paper and Gather Resources
+  - How: Read the full paper (use the attached snippets as reference). Identify key components: self-play RL on games like Kuhn Poker, role-conditioned advantage estimation (RAE), multi-agent multi-turn training. Download base models (e.g., Qwen-4B from HF) and RL libraries (Gymnasium, Stable Baselines3). Collect datasets: simple game rules/implementations from GitHub; math benchmarks like GSM8K for transfer testing.
+- [ ] Define Tool Features
+  - How: Brainstorm user flows. Core: Game mode (user vs. AI play), Reasoning Viewer (display traces), Transfer Tester (input math/logic queries). Add tutorials for non-tech users, exportable logs for tech users. Ensure accessibility: Mobile-friendly UI, low-latency inference.
+- [ ] Scope Requirements and Tech Stack
+  - How: Choose Python for backend; Gradio for HF Spaces UI (easy interactive elements like buttons for moves). Use Transformers for the LLM, Gymnasium for games, and PPO from Stable Baselines3 for the RL demo. Estimate: 1-2 weeks dev time, free HF tier for hosting (upgrade to GPU if needed for training demos).
+## Phase 2: Implementation
+- [ ] Set Up Project Structure
+  - How: Create a Git repo. Folders: `src/` for code, `models/` for weights, `data/` for game datasets, `app/` for Gradio script. Initialize with `requirements.txt`: transformers, torch, gymnasium, stable-baselines3, gradio.
+- [ ] Implement Game Environments
+  - How: Code Gymnasium envs for Kuhn Poker/TicTacToe (e.g., class KuhnPokerEnv(gymnasium.Env) with action_space, observation_space, and a reward for wins). Add multi-turn logic: track game state and player turns.
+- [ ] Train SPIRAL Model
+  - How: Load base LLM (Qwen-4B). Implement self-play: Clone agent, train via PPO with RAE (custom advantage function: advantage = reward + value - baseline, conditioned on roles like 'player' vs. 'opponent'). Train on 1000+ episodes (simulate self-improvement). Save checkpoints to HF Model Hub.
+- [ ] Build Reasoning and Transfer Components
+  - How: For games, generate traces (e.g., "Opponent bet high → Likely strong hand → Fold"). For transfer, prompt model with math tasks post-training. Use chain-of-thought prompting for visibility.
+- [ ] Develop User Interface
+  - How: Use Gradio Blocks: Tab 1: Game Play (dropdown for game, text input for moves, output panel for AI response/trace). Tab 2: Tester (input prompt, show output). Add buttons for "Explain Reasoning" and "Export Session". Style with CSS for modern UX (e.g., cards, animations).
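
Note on the RAE step in Phase 2: the plan describes an advantage that is conditioned on the agent's role. One possible reading, sketched below, keeps a separate running baseline of returns for each (game, role) pair and subtracts it from the episode return. This is a simplified illustration under that assumption, not the paper's exact formulation and not code from this repository; the class name, the `decay` parameter, and the omission of a learned value term are all choices made for the sketch.

```python
from collections import defaultdict
from typing import Dict, Tuple


class RoleBaseline:
    """Running-mean return baseline kept separately per (game, role) pair."""

    def __init__(self, decay: float = 0.95) -> None:
        self.decay = decay  # hypothetical smoothing factor, not taken from config.yaml
        self.baselines: Dict[Tuple[str, str], float] = defaultdict(float)

    def advantage(self, game: str, role: str, episode_return: float) -> float:
        """Return (episode_return - baseline), then update the baseline."""
        key = (game, role)
        adv = episode_return - self.baselines[key]
        # Exponential moving average of the returns seen in this role.
        self.baselines[key] = (
            self.decay * self.baselines[key] + (1.0 - self.decay) * episode_return
        )
        return adv
```

Separate baselines matter in asymmetric games: if the first mover wins more often, a shared baseline would systematically under-credit the second mover's good play.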
+## Phase 3: Testing and Optimization
+- [ ] Unit and Integration Testing
+  - How: Test game logic (e.g., assert win conditions). Run self-play simulations to verify improvements (e.g., win rate >50% after training). Use pytest for automation.
+- [ ] User Testing
+  - How: Simulate non-tech users (play games, check intuitiveness). For tech users, test API endpoints. Gather feedback via HF Spaces comments or a built-in form. Measure metrics: latency <2s per move, benchmark accuracy improvement (+8%, as reported in the paper).
+- [ ] Optimize for HF Spaces
+  - How: Profile for CPU/GPU usage; use model quantization (e.g., bitsandbytes) for faster inference. Ensure installs and startup run non-interactively (e.g., pass --yes wherever a prompt would otherwise appear).
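
Note on the latency target in Phase 3: the "<2s per move" goal can be checked with a small timing wrapper. The sketch below uses only the standard library; `timed` and `within_budget` are hypothetical helper names, and the 2.0 second default simply mirrors the figure in the plan.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call `fn` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def within_budget(elapsed_seconds: float, budget_seconds: float = 2.0) -> bool:
    """True when a move was produced inside the latency budget."""
    return elapsed_seconds < budget_seconds
```

For example, `timed(app.play_game, "kuhn_poker", "bet", "")` would return the usual tuple from `play_game` together with the wall-clock time it took.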
+## Phase 4: Deployment and Documentation
+- [ ] Deploy to Hugging Face Spaces
+  - How: Create Space, upload repo via Git. Set entry point to Gradio app.py. Enable public access, add tags like "AI", "Games", "Reasoning" for discoverability.
+- [ ] Create Documentation and Tutorials
+  - How: Write README.md with paper summary, usage guide (screenshots), and code explanations. Add in-app help: Tooltips for buttons, video demo. For tech users: Include training scripts and extension guides.
+- [ ] Launch and Promote
+  - How: Share on HF forums, Reddit (r/MachineLearning), Twitter. Monitor usage via HF analytics; iterate based on feedback (e.g., add more games).
+## Phase 5: Maintenance and Iteration
+- [ ] Monitor and Update
+  - How: Check for issues (e.g., via GitHub Issues). Update model with new games or better RL algos. Aim for v2: Multimodal (add image-based games).
+- [ ] Measure Impact
+  - How: Track metrics: User sessions, feedback ratings. Goal: 1000+ interactions in first month, positive reviews highlighting educational value.
+This plan ensures a useful tool that's easy to use, educational, and extensible.

requirements.txt ADDED

	@@ -0,0 +1,44 @@

+# Core ML and Deep Learning
+torch>=2.0.0
+transformers>=4.30.0
+accelerate>=0.20.0
+bitsandbytes>=0.41.0
+# Reinforcement Learning
+gymnasium>=0.28.0
+stable-baselines3>=2.0.0
+sb3-contrib>=2.0.0
+# Web Interface
+gradio>=4.0.0
+# Data Processing and Utilities
+numpy>=1.21.0
+pandas>=1.3.0
+matplotlib>=3.5.0
+seaborn>=0.11.0
+plotly>=5.0.0
+# Game Theory and Math
+scipy>=1.7.0
+networkx>=2.6.0
+# Model Management
+huggingface-hub>=0.16.0
+datasets>=2.10.0
+# Testing and Development
+pytest>=7.0.0
+pytest-cov>=4.0.0
+black>=22.0.0
+flake8>=5.0.0
+# Logging and Monitoring
+wandb>=0.15.0
+tensorboard>=2.10.0
+# Utilities
+tqdm>=4.64.0
+pyyaml>=6.0.0
+python-dotenv>=1.0.0
+requests>=2.28.0

src/__init__.py ADDED

	@@ -0,0 +1,15 @@

+"""
+SPIRAL: Interactive Reasoning Game Simulator
+A practical tool demonstrating how self-play training on zero-sum games
+can improve AI reasoning capabilities.
+"""
+__version__ = "0.1.0"
+__author__ = "SPIRAL Team"
+__email__ = "contact@spiral-reasoning.com"
+from .games import *
+from .models import *
+from .training import *
+from .reasoning import *

src/games/__init__.py ADDED

	@@ -0,0 +1,12 @@

+"""
+Game environments for SPIRAL reasoning simulator.
+This module contains implementations of zero-sum games used for self-play training,
+including Kuhn Poker, TicTacToe, and other strategic games.
+"""
+from .kuhn_poker import KuhnPokerEnv
+from .tictactoe import TicTacToeEnv
+from .base_game import BaseGameEnv
+__all__ = ["KuhnPokerEnv", "TicTacToeEnv", "BaseGameEnv"]
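
Note on `KuhnPokerEnv`: this `__init__.py` imports it from a `kuhn_poker` module that is not among the files in this commit. As a starting point for that module, here is a minimal sketch of the terminal payoff rule for standard Kuhn Poker (three cards, one-chip ante, one-chip bet). The history encoding (`c` = check or call, `b` = bet, `f` = fold) and the function name are assumptions for illustration.

```python
from typing import Optional, Tuple

RANK = {"J": 0, "Q": 1, "K": 2}  # three-card deck, matching deck_size: 3 in config.yaml
TERMINAL = {"cc", "bc", "bf", "cbc", "cbf"}  # c = check/call, b = bet, f = fold


def payoff(cards: Tuple[str, str], history: str) -> Optional[int]:
    """Return player 0's net chips for a finished hand, or None if play continues."""
    if history not in TERMINAL:
        return None
    if history == "bf":   # player 1 folds to player 0's bet
        return 1
    if history == "cbf":  # player 0 folds to player 1's bet
        return -1
    # Showdown: the ante only after check-check, ante plus one bet after a call.
    stake = 1 if history == "cc" else 2
    return stake if RANK[cards[0]] > RANK[cards[1]] else -stake
```

Because the game is zero-sum, player 1's payoff is the negation of the returned value.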

src/models/__init__.py ADDED

	@@ -0,0 +1,13 @@

+"""
+Model implementations for SPIRAL reasoning simulator.
+This module contains the SPIRAL model architecture, role-conditioned advantage
+estimation, and other model components for self-play training.
+"""
+from .spiral_model import SpiralModel
+from .rae import RoleConditionedAdvantageEstimator
+from .policy_network import PolicyNetwork
+from .value_network import ValueNetwork
+__all__ = ["SpiralModel", "RoleConditionedAdvantageEstimator", "PolicyNetwork", "ValueNetwork"]

src/reasoning/__init__.py ADDED

	@@ -0,0 +1,13 @@

+"""
+Reasoning components for SPIRAL reasoning simulator.
+This module contains reasoning trace generation, chain-of-thought processing,
+and transfer learning evaluation for testing reasoning capabilities.
+"""
+from .trace_generator import TraceGenerator
+from .chain_of_thought import ChainOfThought
+from .transfer_evaluator import TransferEvaluator
+from .reasoning_utils import ReasoningUtils
+__all__ = ["TraceGenerator", "ChainOfThought", "TransferEvaluator", "ReasoningUtils"]
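
Note on trace formatting: the execution plan gives "Opponent bet high → Likely strong hand → Fold" as an example trace. A minimal sketch of a formatter that produces that shape is shown below; the function name is hypothetical, and treating `trace_depth` from `config.yaml` as a cap on the number of steps is an assumption.

```python
from typing import Sequence


def format_trace(steps: Sequence[str], max_depth: int = 3) -> str:
    """Join up to `max_depth` non-empty reasoning steps with arrows."""
    kept = [s.strip() for s in steps if s.strip()][:max_depth]
    return " → ".join(kept)
```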

src/training/__init__.py ADDED

	@@ -0,0 +1,13 @@

+"""
+Training components for SPIRAL reasoning simulator.
+This module contains the self-play training logic, PPO implementation with
+role-conditioned advantage estimation, and training utilities.
+"""
+from .self_play_trainer import SelfPlayTrainer
+from .ppo_trainer import PPOTrainer
+from .opponent_manager import OpponentManager
+from .training_utils import TrainingUtils
+__all__ = ["SelfPlayTrainer", "PPOTrainer", "OpponentManager", "TrainingUtils"]
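
Note on `OpponentManager`: `config.yaml` sets `update_opponent_every: 100` and `opponent_pool_size: 5`, but the module that would use them is not in this commit. A minimal sketch of one way those two settings could interact, assuming snapshots are opaque objects and sampling is uniform (the class and method names here are hypothetical):

```python
import random
from collections import deque
from typing import Any, Deque, Optional


class OpponentPool:
    """Fixed-size pool of past policy snapshots for self-play."""

    def __init__(self, pool_size: int = 5, update_every: int = 100) -> None:
        self.update_every = update_every
        # A bounded deque evicts the oldest snapshot once the pool is full.
        self.snapshots: Deque[Any] = deque(maxlen=pool_size)

    def maybe_add(self, episode: int, snapshot: Any) -> bool:
        """Store `snapshot` when `episode` is a positive multiple of `update_every`."""
        if episode > 0 and episode % self.update_every == 0:
            self.snapshots.append(snapshot)
            return True
        return False

    def sample(self, rng: Optional[random.Random] = None) -> Any:
        """Pick a stored snapshot uniformly at random."""
        if not self.snapshots:
            raise LookupError("opponent pool is empty")
        return (rng or random).choice(list(self.snapshots))
```

Sampling from several past snapshots, rather than always playing the latest policy, is a common way to reduce overfitting to a single opponent.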

tests/test_basic.py ADDED

	@@ -0,0 +1,130 @@

+"""
+Basic tests for SPIRAL Interactive Reasoning Game Simulator.
+This module contains fundamental tests to verify the core functionality
+of the SPIRAL system components.
+"""
+import pytest
+import os
+import sys
+import yaml
+# Add the src directory to the path for imports
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'src'))
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'app'))
+from app import SpiralApp
+class TestSpiralApp:
+    """Test cases for the main SPIRAL application."""
+    def test_app_initialization(self):
+        """Test that the app initializes correctly."""
+        app = SpiralApp()
+        assert app is not None
+        assert hasattr(app, 'config')
+        assert hasattr(app, 'play_game')
+        assert hasattr(app, 'test_reasoning')
+    def test_config_loading(self):
+        """Test configuration loading."""
+        app = SpiralApp()
+        assert 'interface' in app.config
+        assert 'games' in app.config
+        assert app.config['interface']['title'] is not None
+    def test_play_game_basic(self):
+        """Test basic game play functionality."""
+        app = SpiralApp()
+        # Test with valid input
+        state, response, trace = app.play_game("kuhn_poker", "bet", "")
+        assert state is not None
+        assert response is not None
+        assert trace is not None
+        assert "bet" in state
+        # Test with empty input
+        state, response, trace = app.play_game("kuhn_poker", "", "")
+        assert "Please enter a move!" in response
+    def test_reasoning_basic(self):
+        """Test basic reasoning functionality."""
+        app = SpiralApp()
+        # Test with valid input
+        response, trace = app.test_reasoning("What is 2+2?", "math")
+        assert response is not None
+        assert trace is not None
+        assert "2+2" in response
+        # Test with empty input
+        response, trace = app.test_reasoning("", "math")
+        assert "Please enter a reasoning prompt!" in response
+    def test_interface_creation(self):
+        """Test that the Gradio interface can be created."""
+        app = SpiralApp()
+        demo = app.create_interface()
+        assert demo is not None
+class TestConfiguration:
+    """Test cases for configuration management."""
+    def test_config_file_structure(self):
+        """Test that config.yaml has the expected structure."""
+        config_path = os.path.join(os.path.dirname(__file__), '..', 'config.yaml')
+        if os.path.exists(config_path):
+            with open(config_path, 'r') as f:
+                config = yaml.safe_load(f)
+            # Check required sections
+            assert 'model' in config
+            assert 'games' in config
+            assert 'training' in config
+            assert 'reasoning' in config
+            assert 'interface' in config
+            # Check model configuration
+            assert 'name' in config['model']
+            assert 'max_length' in config['model']
+            # Check games configuration
+            assert 'kuhn_poker' in config['games']
+            assert 'tictactoe' in config['games']
+class TestProjectStructure:
+    """Test cases for project structure and imports."""
+    def test_src_directory_structure(self):
+        """Test that the src directory has the expected structure."""
+        src_path = os.path.join(os.path.dirname(__file__), '..', 'src')
+        # Check that required directories exist
+        assert os.path.exists(os.path.join(src_path, 'games'))
+        assert os.path.exists(os.path.join(src_path, 'models'))
+        assert os.path.exists(os.path.join(src_path, 'training'))
+        assert os.path.exists(os.path.join(src_path, 'reasoning'))
+        # Check that __init__.py files exist
+        assert os.path.exists(os.path.join(src_path, '__init__.py'))
+        assert os.path.exists(os.path.join(src_path, 'games', '__init__.py'))
+        assert os.path.exists(os.path.join(src_path, 'models', '__init__.py'))
+        assert os.path.exists(os.path.join(src_path, 'training', '__init__.py'))
+        assert os.path.exists(os.path.join(src_path, 'reasoning', '__init__.py'))
+    def test_required_files_exist(self):
+        """Test that required project files exist."""
+        project_root = os.path.join(os.path.dirname(__file__), '..')
+        # Check essential files
+        assert os.path.exists(os.path.join(project_root, 'requirements.txt'))
+        assert os.path.exists(os.path.join(project_root, 'README.md'))
+        assert os.path.exists(os.path.join(project_root, 'config.yaml'))
+        assert os.path.exists(os.path.join(project_root, '.gitignore'))
+        assert os.path.exists(os.path.join(project_root, 'app', 'app.py'))
+if __name__ == "__main__":
+    pytest.main([__file__])