Space status: Build error
Commit ae2f228 by Kaushik Rajan · 1 parent: e526e6a
Add execution-plan.md to .gitignore

Remove internal development files from tracking
- .gitignore +4 -1
- execution-plan.md +0 -54
.gitignore CHANGED

```diff
@@ -224,4 +224,7 @@ transformers_cache/
 
 # Gradio temporary files
 flagged/
-gradio_cached_examples/
+gradio_cached_examples/
+
+# Internal development files
+execution-plan.md
```
execution-plan.md DELETED

```diff
@@ -1,54 +0,0 @@
-# SPIRAL Demo App Execution Plan
-
-This execution plan outlines the development of a practical, interactive tool on Hugging Face Spaces based on the SPIRAL paper ("Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning"). The tool will be an **Interactive Reasoning Game Simulator**: Users can play zero-sum games (e.g., Kuhn Poker, TicTacToe) against a self-play trained AI, view step-by-step reasoning traces, and test the AI's transferred reasoning skills on non-game tasks like math problems or logic puzzles.
-
-**Utility Focus**:
-- **Non-Technical Users**: Simple web interface to play games, learn about AI reasoning through visualizations, and experiment with prompts for educational fun (e.g., "How does AI think in games?").
-- **Technical Users**: Access to model weights, training scripts, and APIs for extending the self-play system (e.g., custom games or fine-tuning).
-- **Practicality**: Free to use, no setup required; demonstrates real-world AI applications in strategy, education, and decision-making. Aims for broad appeal: 1000+ users via HF community sharing.
-
-The plan is divided into phases with checkboxes for sub-tasks. Each phase includes detailed "how" steps.
-
-## Phase 1: Research and Planning
-- [ ] Review SPIRAL Paper and Gather Resources
-  - How: Read the full paper (use attached snips as reference). Identify key components: self-play RL on games like Kuhn Poker, role-conditioned advantage estimation (RAE), multi-agent multi-turn training. Download base models (e.g., Qwen-4B from HF) and RL libs (Gym, Stable Baselines). Collect datasets: Simple game rules/implementations from GitHub; math benchmarks like GSM8K for transfer testing.
-- [ ] Define Tool Features
-  - How: Brainstorm user flows. Core: Game mode (user vs. AI play), Reasoning Viewer (display traces), Transfer Tester (input math/logic queries). Add tutorials for non-tech users, exportable logs for tech users. Ensure accessibility: Mobile-friendly UI, low-latency inference.
-- [ ] Scope Requirements and Tech Stack
-  - How: Choose Python for backend; Gradio for HF Spaces UI (easy interactive elements like buttons for moves). Use Transformers for LLM, Gym for games, PPO from Stable Baselines for RL demo. Estimate: 1-2 weeks dev time, free HF tier for hosting (upgrade to GPU if needed for training demos).
-
-## Phase 2: Implementation
-- [ ] Set Up Project Structure
-  - How: Create a Git repo. Folders: `src/` for code, `models/` for weights, `data/` for game datasets, `app/` for Gradio script. Initialize with `requirements.txt`: transformers, torch, gymnasium, stable-baselines3, gradio.
-- [ ] Implement Game Environments
-  - How: Code Gym envs for Kuhn Poker/TicTacToe (e.g., class KuhnPokerEnv(gym.Env) with action_space, observation_space, reward for wins). Add multi-turn logic: Track game state, player turns.
-- [ ] Train SPIRAL Model
-  - How: Load base LLM (Qwen-4B). Implement self-play: Clone agent, train via PPO with RAE (custom advantage function: advantage = reward + value - baseline, conditioned on roles like 'player' vs. 'opponent'). Train on 1000+ episodes (simulate self-improvement). Save checkpoints to HF Model Hub.
-- [ ] Build Reasoning and Transfer Components
-  - How: For games, generate traces (e.g., "Opponent bet high → Likely strong hand → Fold"). For transfer, prompt model with math tasks post-training. Use chain-of-thought prompting for visibility.
-- [ ] Develop User Interface
-  - How: Use Gradio Blocks: Tab 1: Game Play (dropdown for game, text input for moves, output panel for AI response/trace). Tab 2: Tester (input prompt, show output). Add buttons for "Explain Reasoning" and "Export Session". Style with CSS for modern UX (e.g., cards, animations).
-
-## Phase 3: Testing and Optimization
-- [ ] Unit and Integration Testing
-  - How: Test game logic (e.g., assert win conditions). Run self-play simulations to verify improvements (e.g., win rate >50% after training). Use pytest for automation.
-- [ ] User Testing
-  - How: Simulate non-tech users (play games, check intuitiveness). For tech users, test API endpoints. Gather feedback via HF Spaces comments or a built-in form. Measure metrics: Latency <2s per move, accuracy on benchmarks (+8% as per paper).
-- [ ] Optimize for HF Spaces
-  - How: Profile for CPU/GPU usage; use model quantization (e.g., bitsandbytes) for faster inference. Ensure no interactive flags needed (e.g., --yes for installs).
-
-## Phase 4: Deployment and Documentation
-- [ ] Deploy to Hugging Face Spaces
-  - How: Create Space, upload repo via Git. Set entry point to Gradio app.py. Enable public access, add tags like "AI", "Games", "Reasoning" for discoverability.
-- [ ] Create Documentation and Tutorials
-  - How: Write README.md with paper summary, usage guide (screenshots), and code explanations. Add in-app help: Tooltips for buttons, video demo. For tech users: Include training scripts and extension guides.
-- [ ] Launch and Promote
-  - How: Share on HF forums, Reddit (r/MachineLearning), Twitter. Monitor usage via HF analytics; iterate based on feedback (e.g., add more games).
-
-## Phase 5: Maintenance and Iteration
-- [ ] Monitor and Update
-  - How: Check for issues (e.g., via GitHub Issues). Update model with new games or better RL algos. Aim for v2: Multimodal (add image-based games).
-- [ ] Measure Impact
-  - How: Track metrics: User sessions, feedback ratings. Goal: 1000+ interactions in first month, positive reviews highlighting educational value.
-
-This plan ensures a useful tool that's easy to use, educational, and extensible.
```
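Phase 2 of the deleted plan ("Implement Game Environments") calls for Gym environments such as `KuhnPokerEnv(gym.Env)` that track game state and player turns. The following is a minimal, hypothetical sketch of Kuhn Poker (cards J < Q < K, each player antes 1, one betting round) that follows Gymnasium's `reset()`/`step()` return conventions but uses only the standard library. A real version would subclass `gymnasium.Env` and declare `action_space`/`observation_space`. The action encoding (0 = check/fold, 1 = bet/call) and reporting the reward from player 0's side are assumptions of this sketch.

```python
import random
from typing import Optional, Tuple

PASS, BET = 0, 1  # hypothetical encoding: PASS = check/fold, BET = bet/call


class KuhnPokerEnv:
    """Two-player Kuhn Poker with Gymnasium-style reset()/step() returns.

    Sketch only: a real implementation would subclass gymnasium.Env and
    declare action_space / observation_space. Cards are 0=J, 1=Q, 2=K.
    """

    def __init__(self) -> None:
        self.rng = random.Random()
        self.cards: Tuple[int, ...] = ()
        self.history = ""  # one char per action: 'p' = pass, 'b' = bet

    def _obs(self) -> dict:
        player = len(self.history) % 2  # player 0 acts first, then turns alternate
        return {"player": player, "card": self.cards[player], "history": self.history}

    def _payoff_p0(self) -> Optional[int]:
        """Player 0's net chips if the hand is over, else None (both ante 1)."""
        h = self.history
        if h == "bp":  # player 1 folds to player 0's bet
            return 1
        if h == "pbp":  # player 0 folds to player 1's bet
            return -1
        if h in ("pp", "bb", "pbb"):  # showdown: higher card takes the pot
            stake = 1 if h == "pp" else 2
            return stake if self.cards[0] > self.cards[1] else -stake
        return None  # hand still in progress

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        if seed is not None:
            self.rng.seed(seed)
        self.cards = tuple(self.rng.sample(range(3), 2))  # one private card each
        self.history = ""
        return self._obs(), {}

    def step(self, action: int):
        if action not in (PASS, BET):
            raise ValueError(f"invalid action: {action}")
        if self._payoff_p0() is not None:
            raise RuntimeError("hand is over; call reset()")
        self.history += "p" if action == PASS else "b"
        payoff = self._payoff_p0()
        terminated = payoff is not None
        reward = payoff if terminated else 0  # reported from player 0's side
        info = {"payoffs": (reward, -reward)} if terminated else {}
        # Gymnasium order: obs, reward, terminated, truncated, info
        return self._obs(), reward, terminated, False, info
```

Because the game is zero-sum, `info["payoffs"]` carries both seats' results at the end of a hand, which is what a self-play loop needs to credit each copy of the agent.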
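The plan's "Train SPIRAL Model" step describes RAE only loosely (`advantage = reward + value - baseline`, conditioned on roles). One straightforward reading of "role-conditioned" is to keep a separate running baseline for each (game, role) pair and subtract it from that role's episodic return. The class below sketches that reading with an exponential moving average. The class name, the `alpha` default, and the compute-then-update ordering are assumptions, so check the details against the SPIRAL paper before relying on them. Separate baselines matter in zero-sum self-play because the two seats can have different expected returns (in Kuhn Poker the first player's game value under optimal play is -1/18), and a shared baseline would credit that seat effect to the policy.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple


class RoleConditionedBaseline:
    """One exponential-moving-average baseline per (game, role) pair.

    Sketch of role-conditioned advantage estimation: an episode's advantage
    is the role's return minus that role's running baseline.
    """

    def __init__(self, alpha: float = 0.95) -> None:
        self.alpha = alpha  # EMA decay; 0.95 is an assumed default
        self.baselines: DefaultDict[Tuple[str, Hashable], float] = defaultdict(float)

    def advantage(self, game: str, role: Hashable, episode_return: float) -> float:
        key = (game, role)
        # Compare this return with the baseline accumulated so far...
        adv = episode_return - self.baselines[key]
        # ...then fold the return into that role's baseline for later episodes.
        self.baselines[key] = (
            self.alpha * self.baselines[key] + (1 - self.alpha) * episode_return
        )
        return adv
```

With `alpha=0.5`, two consecutive returns of `1.0` for `("kuhn", 0)` give advantages `1.0` and then `0.5`, while role `1` keeps its own baseline untouched.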
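For the plan's "Build Reasoning and Transfer Components" step, one way to obtain visible traces like "Opponent bet high → Likely strong hand → Fold" with chain-of-thought prompting is to ask the model for one reasoning step per line followed by a fixed `Action:` line, then parse both back out. The helper names (`build_prompt`, `parse_response`) and the prompt format are hypothetical, and the model call itself is left out.

```python
import re
from typing import List, Optional, Tuple

# Final line the model is asked to emit, e.g. "Action: fold"
ACTION_RE = re.compile(
    r"^\s*Action:\s*(check|bet|call|fold)\s*$", re.IGNORECASE | re.MULTILINE
)


def build_prompt(card: str, moves: List[str]) -> str:
    """Chain-of-thought prompt for one Kuhn Poker decision (format is an assumption)."""
    history = ", ".join(moves) if moves else "none yet"
    return (
        "You are playing Kuhn Poker (cards J < Q < K).\n"
        f"Your card: {card}\n"
        f"Moves so far: {history}\n"
        "Reason step by step, one short step per line, then end with a line "
        "'Action: <check|bet|call|fold>'."
    )


def parse_response(text: str) -> Tuple[List[str], Optional[str]]:
    """Split a model reply into reasoning steps and the final action (None if missing)."""
    actions = ACTION_RE.findall(text)
    action = actions[-1].lower() if actions else None  # the last Action: line wins
    steps = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not ACTION_RE.match(line)
    ]
    return steps, action
```

Joining the parsed steps with `" → "` yields a one-line trace for a Reasoning Viewer panel, and a `None` action signals a malformed reply that the app can retry or reject.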
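Phase 3 of the deleted plan mentions asserting win conditions with pytest. A minimal sketch for TicTacToe, assuming a board stored as a 9-character string in row-major order with `" "` for empty cells (the `winner` helper and the test names are hypothetical):

```python
from typing import Optional, Sequence

# Every line of three cells that wins the game, as board indices.
WIN_LINES = (
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
)


def winner(board: Sequence[str]) -> Optional[str]:
    """Return 'X' or 'O' if that mark fills a winning line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def test_row_win():
    assert winner("XXX" "OO " "   ") == "X"


def test_diagonal_win():
    assert winner("O X" " OX" "  O") == "O"


def test_no_winner_on_full_draw_board():
    assert winner("XOX" "XOO" "OXO") is None


def test_empty_board_has_no_winner():
    assert winner(" " * 9) is None
```

Saved as `test_tictactoe.py`, the `test_*` functions are collected automatically when `pytest` runs in that directory.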
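The plan's acceptance targets (win rate above 50% after training, latency under 2 s per move) can be checked with small helpers like the ones below. They are hypothetical: `policy` stands for any callable that maps a game state to a move, and `payoffs` are per-episode results from the trained agent's side (positive means a win).

```python
import time
from typing import Callable, Iterable, List, Sequence


def win_rate(payoffs: Iterable[float]) -> float:
    """Fraction of episodes won (payoff > 0); draws and losses count as non-wins."""
    payoffs = list(payoffs)
    if not payoffs:
        raise ValueError("no episodes to score")
    return sum(1 for p in payoffs if p > 0) / len(payoffs)


def move_latencies(policy: Callable[[object], object], states: Iterable[object]) -> List[float]:
    """Wall-clock seconds the policy needs to pick a move for each state."""
    timings = []
    for state in states:
        start = time.perf_counter()
        policy(state)
        timings.append(time.perf_counter() - start)
    return timings


def meets_targets(
    payoffs: Iterable[float],
    timings: Sequence[float],
    min_win_rate: float = 0.5,
    max_move_s: float = 2.0,
) -> bool:
    """True if the win rate exceeds min_win_rate and every move beats max_move_s."""
    return win_rate(payoffs) > min_win_rate and max(timings) < max_move_s
```

Checking the slowest move rather than the average matches the plan's "per move" wording; a Space on shared hardware may want a percentile instead.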