Commit 15c5cd0 by Addyk24 · 1 Parent(s): 0253268

Made code structure based on prev env style

Files changed (4)
  1. .python-version +1 -0
  2. README.md +20 -0
  3. pyproject.toml +7 -0
  4. requirements.txt +11 -0
.python-version ADDED
@@ -0,0 +1 @@
+ 3.13
README.md CHANGED
@@ -42,5 +42,25 @@ GRPO (Group Relative Policy Optimization) via Unsloth/TRL. Instead of traditiona
  - The model self-improves by increasing the probability of the actions taken in the highest-scoring trajectory relative to the group average.


+ ## WORKFLOW
+
+ ```bash
+ Project-Polymath/
+ ├── schema/
+ │   ├── Action: {message_target, content} or {propose_draft, content}
+ │   ├── Observation: {expert_responses, known_constraints, turn_count}
+ │   └── State: {episode_id, discovered_constraints, draft_history}
+ ├── experts/
+ │   ├── SecurityExpert — hidden constraint: must include 2FA, data encryption
+ │   └── FinanceExpert — hidden constraint: budget under $50k, no recurring costs
+ ├── environment.py — reset(), step(), state()
+ ├── reward.py — dense step rewards + harmonic mean final reward
+ └── tasks.py — 3 difficulty tiers
+
+ ```
+
+
  ## 👨‍💻 Author
  Aditya Katkar
+
+
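
Note on the README hunk above: the workflow tree mentions a "harmonic mean final reward" and the context line describes scoring a trajectory "relative to the group average". The sketch below illustrates how those two ideas could fit together. It is not taken from the repository's `reward.py`; the function names `final_reward` and `group_relative_advantages`, the per-expert score dictionary, and the `[0, 1]` score range are all assumptions made for illustration.

```python
from statistics import harmonic_mean, mean, pstdev


def final_reward(expert_scores):
    """Combine per-expert satisfaction scores (assumed to lie in [0, 1]).

    The harmonic mean is dominated by the lowest score, so a draft that
    ignores one expert scores poorly even if the other is fully satisfied.
    """
    scores = list(expert_scores.values())
    if not scores:
        raise ValueError("at least one expert score is required")
    if any(s <= 0 for s in scores):
        # A completely unsatisfied expert zeroes the final reward.
        return 0.0
    return harmonic_mean(scores)


def group_relative_advantages(rewards, eps=1e-8):
    """Center each trajectory's reward on the group mean and scale by the
    group standard deviation (a common GRPO-style normalization)."""
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of three sampled trajectories for one task.
    group = [
        {"SecurityExpert": 0.9, "FinanceExpert": 0.9},  # balanced draft
        {"SecurityExpert": 1.0, "FinanceExpert": 0.2},  # neglects finance
        {"SecurityExpert": 0.0, "FinanceExpert": 1.0},  # ignores security
    ]
    rewards = [final_reward(scores) for scores in group]
    for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.3f} advantage={advantage:+.3f}")
```

Under these assumptions the balanced draft receives the only positive advantage. An arithmetic mean would instead rate the second and third drafts at 0.6 and 0.5, which hides the fact that one expert's constraints were neglected.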
pyproject.toml ADDED
@@ -0,0 +1,7 @@
+ [project]
+ name = "project-polymath"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.13"
+ dependencies = []
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ --extra-index-url https://download.pytorch.org/whl/cpu
+
+ fastapi
+ uvicorn
+ python-dotenv
+ groq
+ openai
+ openenv-core
+ sentence-transformers==2.7.0
+ torch==2.2.2+cpu
+ numpy