Spaces: Addyk24/Project-Polymath
Addyk24 committed 26 days ago

Commit 15c5cd0 (1 parent: 0253268)

Made code structure based on prev env style

Files changed (4)

.python-version ADDED

@@ -0,0 +1 @@
+3.13

README.md CHANGED

@@ -42,5 +42,25 @@ GRPO (Group Relative Policy Optimization) via Unsloth/TRL. Instead of traditiona
 - The model self-improves by increasing the probability of the actions taken in the highest-scoring trajectory relative to the group average.
+## WORKFLOW
+```bash
+Project-Polymath/
+├── schema/
+│   ├── Action: {message_target, content} or {propose_draft, content}
+│   ├── Observation: {expert_responses, known_constraints, turn_count}
+│   └── State: {episode_id, discovered_constraints, draft_history}
+├── experts/
+│   ├── SecurityExpert — hidden constraint: must include 2FA, data encryption
+│   └── FinanceExpert — hidden constraint: budget under $50k, no recurring costs
+├── environment.py — reset(), step(), state()
+├── reward.py — dense step rewards + harmonic mean final reward
+└── tasks.py — 3 difficulty tiers
+```
 ## 👨‍💻 Author
 Aditya Katkar
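
The README tree above lists `reward.py` as "dense step rewards + harmonic mean final reward", but the implementation is not part of this commit. The following is a minimal sketch of how such a reward could look, not the repository's actual code. The names `STEP_BONUS`, `step_reward`, and `final_reward`, the bonus value, and the assumption that each expert yields a satisfaction score in [0, 1] are all hypothetical.

```python
from statistics import harmonic_mean

# Hypothetical per-constraint bonus; the real value is not shown in this diff.
STEP_BONUS = 0.1


def step_reward(known: set[str], discovered: set[str]) -> float:
    """Dense reward: a small bonus for each constraint uncovered for the first time."""
    return STEP_BONUS * len(discovered - known)


def final_reward(expert_scores: dict[str, float]) -> float:
    """Harmonic mean of per-expert satisfaction scores, each assumed to lie in [0, 1].

    Returns 0.0 when there are no scores or when any expert is fully unsatisfied.
    """
    scores = list(expert_scores.values())
    if not scores or min(scores) <= 0.0:
        return 0.0
    return harmonic_mean(scores)


if __name__ == "__main__":
    # A draft that pleases one expert but largely ignores the other scores low.
    print(final_reward({"SecurityExpert": 0.9, "FinanceExpert": 0.1}))
    # A balanced draft scores higher even though its best score is lower.
    print(final_reward({"SecurityExpert": 0.6, "FinanceExpert": 0.6}))
```

A harmonic mean is pulled toward the lowest score, so under these assumptions a draft cannot earn a high final reward by satisfying `SecurityExpert` while neglecting `FinanceExpert`, or the reverse.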

pyproject.toml ADDED

+[project]
+name = "project-polymath"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = []

requirements.txt ADDED

+--extra-index-url https://download.pytorch.org/whl/cpu
+fastapi
+uvicorn
+python-dotenv
+groq
+openai
+openenv-core
+sentence-transformers==2.7.0
+torch==2.2.2+cpu
+numpy