Made code structure based on prev env style
- .python-version +1 -0
- README.md +20 -0
- pyproject.toml +7 -0
- requirements.txt +11 -0
.python-version
ADDED
@@ -0,0 +1 @@
+3.13
README.md
CHANGED
@@ -42,5 +42,25 @@ GRPO (Group Relative Policy Optimization) via Unsloth/TRL. Instead of traditiona
 - The model self-improves by increasing the probability of the actions taken in the highest-scoring trajectory relative to the group average.
 
 
+## WORKFLOW
+
+```bash
+Project-Polymath/
+├── schema/
+│   ├── Action: {message_target, content} or {propose_draft, content}
+│   ├── Observation: {expert_responses, known_constraints, turn_count}
+│   └── State: {episode_id, discovered_constraints, draft_history}
+├── experts/
+│   ├── SecurityExpert — hidden constraint: must include 2FA, data encryption
+│   └── FinanceExpert — hidden constraint: budget under $50k, no recurring costs
+├── environment.py — reset(), step(), state()
+├── reward.py — dense step rewards + harmonic mean final reward
+└── tasks.py — 3 difficulty tiers
+
+```
+
+
 ## 👨💻 Author
 Aditya Katkar
+
+
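The WORKFLOW tree above names three schema types and an environment exposing `reset()`, `step()`, and `state()`. The following is a minimal sketch of how those pieces might fit together, using plain dataclasses. The field types, the `PolymathEnv` class name, the `HIDDEN_CONSTRAINTS` table, and the rule that messaging an expert immediately reveals its constraint are all assumptions made for illustration; none of them come from the commit, and the base classes from `openenv-core` are deliberately not used here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    # Hypothetical encoding of the two action shapes listed in the README:
    # {message_target, content} or {propose_draft, content}.
    content: str
    message_target: Optional[str] = None  # expert name when sending a message
    propose_draft: bool = False           # True when submitting a draft


@dataclass
class Observation:
    expert_responses: dict[str, str] = field(default_factory=dict)
    known_constraints: list[str] = field(default_factory=list)
    turn_count: int = 0


@dataclass
class State:
    episode_id: str
    discovered_constraints: list[str] = field(default_factory=list)
    draft_history: list[str] = field(default_factory=list)


# Constraint text copied from the README tree; the lookup itself is a stand-in
# for whatever the real expert classes do.
HIDDEN_CONSTRAINTS = {
    "SecurityExpert": "must include 2FA, data encryption",
    "FinanceExpert": "budget under $50k, no recurring costs",
}


class PolymathEnv:
    """Toy environment: messaging an expert reveals that expert's constraint."""

    def reset(self) -> Observation:
        # Start a fresh episode with an empty state.
        self._state = State(episode_id=str(uuid.uuid4()))
        self._turns = 0
        return Observation()

    def step(self, action: Action) -> Observation:
        self._turns += 1
        responses: dict[str, str] = {}
        if action.propose_draft:
            # Drafts are only recorded here; scoring is out of scope.
            self._state.draft_history.append(action.content)
        elif action.message_target in HIDDEN_CONSTRAINTS:
            constraint = HIDDEN_CONSTRAINTS[action.message_target]
            responses[action.message_target] = constraint
            if constraint not in self._state.discovered_constraints:
                self._state.discovered_constraints.append(constraint)
        return Observation(
            expert_responses=responses,
            known_constraints=list(self._state.discovered_constraints),
            turn_count=self._turns,
        )

    def state(self) -> State:
        return self._state
```

A short episode under these assumptions would call `env.reset()`, send `Action(content="What do you need?", message_target="SecurityExpert")`, and then submit `Action(content="Draft v1", propose_draft=True)`; the draft ends up in `env.state().draft_history`.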
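The README describes `reward.py` as using a "harmonic mean final reward" and describes GRPO as comparing each trajectory with the group average. The sketch below shows one way those two ideas could connect. It assumes the harmonic mean is taken over per-expert satisfaction scores in [0, 1], which the commit does not state, and it normalizes by the population standard deviation, which is one common choice rather than a confirmed detail of this project. The function names are hypothetical.

```python
import statistics


def harmonic_mean(scores: list[float]) -> float:
    """Harmonic mean of non-negative scores; any zero score yields 0.0."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0.0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


def final_reward(expert_scores: dict[str, float]) -> float:
    # Assumption: one satisfaction score per expert, each in [0, 1].
    return harmonic_mean(list(expert_scores.values()))


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Center each reward on the group mean and scale by the group spread."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# A draft that pleases one expert and ignores the other scores low,
# even though the arithmetic mean of 0.9 and 0.1 would be 0.5.
lopsided = final_reward({"SecurityExpert": 0.9, "FinanceExpert": 0.1})
balanced = final_reward({"SecurityExpert": 0.5, "FinanceExpert": 0.5})
advantages = group_relative_advantages([lopsided, balanced])
```

The harmonic mean is a natural fit when a draft has to satisfy every expert at once, because a single low score pulls the result down much harder than it would under an arithmetic mean. In the group-relative step, trajectories with above-average reward receive positive advantages and those below average receive negative ones.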
pyproject.toml
ADDED
@@ -0,0 +1,7 @@
+[project]
+name = "project-polymath"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = []
requirements.txt
ADDED
@@ -0,0 +1,11 @@
+--extra-index-url https://download.pytorch.org/whl/cpu
+
+fastapi
+uvicorn
+python-dotenv
+groq
+openai
+openenv-core
+sentence-transformers==2.7.0
+torch==2.2.2+cpu
+numpy