Mituvinci committed
Commit 74f0521 · 1 Parent(s): 189dce9

Two-model setup: GPT-4o-mini examines, Claude answers

project_3_adaptive_study_agent_CLAUDE.md DELETED
@@ -1,314 +0,0 @@
# Adaptive Study Agent — CLAUDE.md
## Project Intelligence File for Claude Code

> This file is read by Claude Code at the start of every session.
> It contains everything Claude needs to work on this project without re-explanation.

---

## No emojis. No pushing to GitHub.
## At the end of every session, write a work_summary_DDMMYYYY.md file.

---

## What This Project Is

A single-agent, self-directed learning system built with LangGraph. The agent ingests
documents (research papers, textbook chapters, notes), builds a local vector store,
then enters a self-testing loop: quizzing itself, evaluating its answers, and deciding
whether to re-read or move on. The loop continues until a mastery threshold is reached.

This is a portfolio project. It is NOT technically connected to MOSAIC.
The conceptual link is this: MOSAIC asks whether retrieval improves classification
across specialist agents. This project asks whether retrieval improves self-assessment
accuracy within a single agent's feedback loop. Same question, different scale.

**This is intentionally simple. Do not over-engineer it.**

---

## The Core Loop (LangGraph State Machine)

```
┌─────────────────────────────┐
│            START            │
│   User provides document    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│           INGEST            │
│   Parse document            │
│   Chunk into passages       │
│   Embed → ChromaDB          │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│      GENERATE QUESTION      │
│  Query ChromaDB for a chunk │
│  LLM generates question     │
│  from retrieved passage     │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│           ANSWER            │
│  Agent retrieves relevant   │
│  chunks from ChromaDB       │
│  LLM generates answer       │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│          EVALUATE           │
│  LLM grades own answer      │
│  Score: 0.0 – 1.0           │
│  Updates session state      │
└──────────────┬──────────────┘
               │
     ┌─────────┴──────────┐
     │  Conditional edge  │
     │ score < threshold? │
     └────┬──────────┬────┘
         YES         NO
          │          │
          ▼          ▼
  ┌──────────────┐  ┌──────────────────┐
  │   RE-READ    │  │ enough questions │
  │  Retrieve +  │  │    answered?     │
  │  re-study    │  └─────┬──────────┬─┘
  │  weak chunk  │       NO         YES
  └──────┬───────┘        │          │
         │                ▼          │
         │       ┌────────────────┐  │
         └──────►│ NEXT QUESTION  │  │
                 └───────┬────────┘  │
                         │           │
                  (loop back to      │ mastery reached
                  GENERATE QUESTION) │
                                     ▼
                          ┌───────────────┐
                          │   SUMMARIZE   │
                          │ Write session │
                          │  report .md   │
                          └───────────────┘
```

---

## LangGraph Concepts Used

**State:** A TypedDict passed between all nodes. Never use global variables.

```python
from typing import TypedDict

class StudyState(TypedDict):
    document_path: str
    chunks: list[str]
    questions_asked: int
    questions_correct: int
    current_question: str
    current_answer: str
    current_score: float
    weak_chunks: list[str]       # chunks the agent struggled with
    session_history: list[dict]  # full Q&A log
    mastery_reached: bool
```

**Nodes:** Python functions that take state and return updated state.
- ingest_node
- generate_question_node
- answer_node
- evaluate_node
- reread_node
- summarize_node
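
As a sketch of that contract (not code from this repo), an evaluate_node might look like the following. The grading call is stubbed out so the state-update shape stays visible; `grade_with_llm` is a hypothetical helper standing in for the real Claude call:

```python
def grade_with_llm(question: str, answer: str) -> float:
    """Placeholder for the LLM self-grading call made by the real node."""
    return 1.0 if answer else 0.0

def evaluate_node(state: dict) -> dict:
    # Node contract: take state, return updated state -- no globals, no side effects.
    score = grade_with_llm(state["current_question"], state["current_answer"])
    return {
        **state,
        "current_score": score,
        "questions_asked": state["questions_asked"] + 1,
        "questions_correct": state["questions_correct"] + (1 if score >= 0.75 else 0),
    }
```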

**Edges:** Connections between nodes.
- Normal edges: always go to the next node
- Conditional edges: route based on state (score < threshold → reread, else → next question)

**The conditional edge is the most important LangGraph concept in this project.**
Everything else is just nodes calling LLMs.
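
That routing logic can be sketched as a plain function: LangGraph calls it on the state after EVALUATE and maps its return value to the next node. The name `route_after_evaluate` and the wiring comment are assumptions, not code from this repo; the thresholds mirror the constants in the Configuration section:

```python
MASTERY_THRESHOLD = 0.75
MIN_QUESTIONS = 10

def route_after_evaluate(state: dict) -> str:
    """Conditional edge: return the name of the next node based on session state."""
    if state["current_score"] < MASTERY_THRESHOLD:
        return "reread"             # weak answer: re-study the chunk
    if state["questions_asked"] < MIN_QUESTIONS:
        return "generate_question"  # keep quizzing until the minimum is met
    return "summarize"              # mastery reached: write the report

# Wiring sketch (assumes a LangGraph StateGraph named `graph`):
# graph.add_conditional_edges("evaluate", route_after_evaluate)
```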

---

## Project Structure

```
adaptive_study_agent/
├── CLAUDE.md                  ← You are here
├── src/
│   ├── graph/
│   │   ├── state.py           ← StudyState TypedDict
│   │   ├── nodes.py           ← All node functions
│   │   ├── edges.py           ← Conditional edge logic
│   │   └── build_graph.py     ← Assembles the StateGraph
│   ├── tools/
│   │   ├── ingest.py          ← PDF/text chunking + ChromaDB insert
│   │   └── retriever.py       ← ChromaDB query wrapper
│   ├── prompts/
│   │   ├── question_prompt.py ← Generate question from passage
│   │   ├── answer_prompt.py   ← Answer question using retrieved context
│   │   └── evaluate_prompt.py ← Grade answer 0.0–1.0 with reasoning
│   └── main.py                ← Entry point
├── output/
│   └── session_reports/       ← Markdown report per session
├── data/
│   └── documents/             ← Drop PDFs or .txt files here
├── pyproject.toml
├── .env
└── README.md
```

---

## Tech Stack

| Component | Technology | Why |
|-----------|-----------|-----|
| Agent framework | LangGraph | Stateful loops + conditional branching |
| LLM | claude-sonnet-4-20250514 | Question generation, answering, evaluation |
| Embeddings | OpenAI text-embedding-3-small | Cheap, good enough for text chunks |
| Vector store | ChromaDB (local) | No Docker needed; embedded and simple |
| Document parsing | PyMuPDF (fitz) | PDF support |
| Package manager | UV | Consistent with other projects |

---

## Configuration

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...      # for embeddings only

# Tunable constants in src/graph/build_graph.py
MASTERY_THRESHOLD = 0.75   # score needed to skip re-read
MIN_QUESTIONS = 10         # minimum questions before mastery check
MAX_REREAD_CYCLES = 3      # max times agent re-reads same chunk
CHUNK_SIZE = 500           # tokens per chunk
CHUNK_OVERLAP = 50         # tokens shared between adjacent chunks
TOP_K_RETRIEVAL = 3        # chunks retrieved per question
```
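
To illustrate how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window chunker over a token list. This is a sketch, not the ingest.py implementation, and `chunk_tokens` is a hypothetical name:

```python
def chunk_tokens(tokens: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split a token list into windows of `size` tokens, each sharing
    `overlap` tokens with the previous window (CHUNK_SIZE / CHUNK_OVERLAP above)."""
    step = size - overlap  # advance by size minus overlap each window
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

With the defaults, a 1000-token document yields three chunks; the last 50 tokens of one chunk repeat as the first 50 of the next, so a fact split across a chunk boundary is still retrievable whole.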

---

## Prompts — Critical Details

### Question generation prompt
- Input: one retrieved chunk (passage)
- Output: one specific, answerable question about that chunk
- Constraint: the question must be answerable from the document alone
- Do NOT ask opinion questions or questions requiring outside knowledge

### Answer prompt
- Input: question + top-k retrieved chunks as context
- Output: concise answer grounded in the retrieved text
- Constraint: the agent must cite which chunk it used

### Evaluation prompt
- Input: question + agent's answer + original source chunk
- Output: score (0.0–1.0) + one-sentence reasoning
- This is self-grading: instruct the LLM to be honest, not generous
- Score 1.0 = complete and accurate
- Score 0.5 = partially correct
- Score 0.0 = wrong or hallucinated
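
A hypothetical rendering of the evaluation prompt as a template function. The wording below is illustrative only; the inputs and the scoring rubric come from this file, while the exact phrasing and the name `build_evaluate_prompt` are assumptions:

```python
def build_evaluate_prompt(question: str, answer: str, source_chunk: str) -> str:
    """Assemble the self-grading prompt from question, answer, and source chunk."""
    return (
        "You are grading your own answer. Be honest, not generous.\n\n"
        f"Question: {question}\n"
        f"Your answer: {answer}\n"
        f"Source passage: {source_chunk}\n\n"
        "Return a score from 0.0 to 1.0 and one sentence of reasoning.\n"
        "1.0 = complete and accurate, 0.5 = partially correct, "
        "0.0 = wrong or hallucinated."
    )
```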
---

## Key Rules

1. NEVER hardcode API keys; always read them from .env
2. NEVER skip the evaluate node; self-grading is the whole point
3. NEVER let the agent loop forever; MAX_REREAD_CYCLES is a hard limit per chunk
4. State is the single source of truth: no global variables, no side effects
5. The ChromaDB collection is per-session; clear it between runs unless the --persist flag is set
6. All session output goes to output/session_reports/ with a timestamp
7. temperature=0.0 on evaluate_node; grading must be deterministic
8. temperature=0.7 on generate_question_node, for variety in questions

---

## Commands

```bash
# Setup
uv sync

# Run with a document
uv run python src/main.py --doc data/documents/attention_is_all_you_need.pdf

# Run with a mastery threshold override
uv run python src/main.py --doc data/documents/myfile.pdf --threshold 0.8

# Run tests
uv run pytest tests/ -v
```

---

## Output Format

Each session produces a markdown report in output/session_reports/:

```markdown
# Study Session Report
Date: 2026-03-12
Document: attention_is_all_you_need.pdf

## Summary
- Questions asked: 14
- Questions correct (score >= 0.75): 11
- Final mastery score: 0.81
- Re-read cycles triggered: 3

## Weak Areas
- Multi-head attention computation
- Positional encoding formula

## Q&A Log
### Q1
Question: What is the purpose of the scaling factor in dot-product attention?
Answer: ...
Score: 0.9
...
```
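
A sketch of how summarize_node might name and write that report, following the timestamped-output rule above. The helper name `write_session_report` and the exact filename pattern are assumptions:

```python
from datetime import datetime
from pathlib import Path

def write_session_report(body: str, out_dir: str = "output/session_reports") -> Path:
    """Write the markdown report to out_dir with a timestamped filename."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"session_{stamp}.md"
    path.write_text(body, encoding="utf-8")
    return path
```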

---

## Portfolio Framing (for README.md)

The README must make this one point clearly:

> MOSAIC (a separate research project) tests whether 12 specialist agents sharing a
> vector database improves rare-condition classification: collective knowledge at scale.
> This project is the single-agent version of the same question: can one agent use
> retrieval to improve its own understanding iteratively? The feedback loop here is
> what Phase 1C of MOSAIC implements collectively across 12 agents.

Do not overclaim a technical connection. The connection is conceptual and motivational.

---

## What This Project Is NOT

- Not connected to MOSAIC's Qdrant instance
- Not a production system
- Not a replacement for actual studying
- Not a RAG chatbot (there is no human in the loop during the study session)

---

## Author

Halima Akhter — PhD Candidate, Computer Science
Specialization: ML, Deep Learning, Bioinformatics
GitHub: https://github.com/Mituvinci

---

*Last updated: March 2026 | Adaptive Study Agent v1*