Mituvinci committed · Commit 74f0521 · Parent(s): 189dce9
Two-model setup: GPT-4o-mini examines, Claude answers

Deleted file: project_3_adaptive_study_agent_CLAUDE.md
# Adaptive Study Agent – CLAUDE.md
## Project Intelligence File for Claude Code

> This file is read by Claude Code at the start of every session.
> It contains everything Claude needs to work on this project without re-explanation.

---

## No emojis. No pushing to GitHub.
## At the end of every session write a work_summary_DDMMYYYY.md file.

---

## What This Project Is

A single-agent, self-directed learning system built with LangGraph. The agent ingests
documents (research papers, textbook chapters, notes), builds a local vector store,
then enters a self-testing loop – quizzing itself, evaluating its answers, and deciding
whether to re-read or move on. The loop continues until a mastery threshold is reached.

This is a portfolio project. It is NOT connected to MOSAIC technically.
The conceptual link is this: MOSAIC asks whether retrieval improves classification
across specialist agents. This project asks whether retrieval improves self-assessment
accuracy within a single-agent feedback loop. Same question, different scale.

**This is intentionally simple. Do not over-engineer it.**

---

## The Core Loop (LangGraph State Machine)
```
┌─────────────────────────────┐
│            START            │
│    User provides document   │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│           INGEST            │
│       Parse document        │
│     Chunk into passages     │
│      Embed → ChromaDB       │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│      GENERATE QUESTION      │
│ Query ChromaDB for a chunk  │
│   LLM generates question    │
│   from retrieved passage    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│           ANSWER            │
│   Agent retrieves relevant  │
│    chunks from ChromaDB     │
│    LLM generates answer     │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│          EVALUATE           │
│    LLM grades own answer    │
│      Score: 0.0 – 1.0       │
│    Updates session state    │
└──────────────┬──────────────┘
               │
     ┌─────────┴──────────┐
     │  Conditional edge  │
     │ score < threshold? │
     └─────────┬──────────┘
        YES    │    NO
      ┌────────┴────────┐
      ▼                 ▼
┌──────────────┐  ┌──────────────────┐
│   RE-READ    │  │ enough questions │
│  Retrieve +  │  │    answered?     │
│   re-study   │  └───┬──────────┬───┘
│  weak chunk  │      NO        YES
└──────┬───────┘      │          │
       │              ▼   mastery reached
       │      ┌───────────────┐  │
       └─────►│ NEXT QUESTION │  ▼
              └───────┬───────┘ ┌───────────────┐
                      │         │   SUMMARIZE   │
               (loop back to    │ Write session │
             GENERATE QUESTION) │  report .md   │
                                └───────────────┘
```
---

## LangGraph Concepts Used

**State:** A TypedDict passed between all nodes. Never use global variables.

```python
from typing import TypedDict

class StudyState(TypedDict):
    document_path: str
    chunks: list[str]
    questions_asked: int
    questions_correct: int
    current_question: str
    current_answer: str
    current_score: float
    weak_chunks: list[str]       # chunks the agent struggled with
    session_history: list[dict]  # full Q&A log
    mastery_reached: bool
```
**Nodes:** Python functions that take state, return updated state.
- ingest_node
- generate_question_node
- answer_node
- evaluate_node
- reread_node
- summarize_node

**Edges:** Connections between nodes.
- Normal edges: always go to the next node
- Conditional edges: route based on state (score < threshold → reread, else → next question)

**The conditional edge is the most important LangGraph concept in this project.**
Everything else is just nodes calling LLMs.
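As a concrete illustration, the conditional edge can be sketched as a plain routing function that inspects the state and returns the name of the next node. The function name and the exact mastery test below are assumptions for illustration, not code from src/graph/edges.py:

```python
# Illustrative sketch of the conditional-edge routing logic.
# Constants mirror the tunables in build_graph.py.
MASTERY_THRESHOLD = 0.75
MIN_QUESTIONS = 10

def route_after_evaluate(state: dict) -> str:
    """Return the name of the next node given the current StudyState."""
    if state["current_score"] < MASTERY_THRESHOLD:
        return "reread"  # weak answer: re-study the source chunk
    if state["questions_asked"] >= MIN_QUESTIONS:
        mastery = state["questions_correct"] / state["questions_asked"]
        if mastery >= MASTERY_THRESHOLD:
            return "summarize"  # mastery reached: write the report
    return "generate_question"  # otherwise keep quizzing
```

In build_graph.py a function like this would be passed to LangGraph's `add_conditional_edges` on the evaluate node, with a mapping from each returned string to the corresponding node.
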

---
## Project Structure

```
adaptive_study_agent/
├── CLAUDE.md                  ← You are here
├── src/
│   ├── graph/
│   │   ├── state.py           ← StudyState TypedDict
│   │   ├── nodes.py           ← All node functions
│   │   ├── edges.py           ← Conditional edge logic
│   │   └── build_graph.py     ← Assembles the StateGraph
│   ├── tools/
│   │   ├── ingest.py          ← PDF/text chunking + ChromaDB insert
│   │   └── retriever.py       ← ChromaDB query wrapper
│   ├── prompts/
│   │   ├── question_prompt.py ← Generate question from passage
│   │   ├── answer_prompt.py   ← Answer question using retrieved context
│   │   └── evaluate_prompt.py ← Grade answer 0.0–1.0 with reasoning
│   └── main.py                ← Entry point
├── output/
│   └── session_reports/       ← Markdown report per session
├── data/
│   └── documents/             ← Drop PDFs or .txt files here
├── pyproject.toml
├── .env
└── README.md
```

---
## Tech Stack

| Component | Technology | Why |
|-----------|------------|-----|
| Agent framework | LangGraph | Stateful loops + conditional branching |
| LLM | claude-sonnet-4-20250514 | Question gen, answering, evaluation |
| Embeddings | OpenAI text-embedding-3-small | Cheap, good enough for text chunks |
| Vector store | ChromaDB (local) | No Docker needed, embedded, simple |
| Document parsing | PyMuPDF (fitz) | PDF support |
| Package manager | UV | Consistent with other projects |

---
## Configuration

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...   # for embeddings only

# Tunable constants in src/graph/build_graph.py
MASTERY_THRESHOLD = 0.75   # score needed to skip re-read
MIN_QUESTIONS = 10         # minimum questions before mastery check
MAX_REREAD_CYCLES = 3      # max times agent re-reads same chunk
CHUNK_SIZE = 500           # tokens per chunk
CHUNK_OVERLAP = 50
TOP_K_RETRIEVAL = 3        # chunks retrieved per question
```
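The CHUNK_SIZE / CHUNK_OVERLAP sliding window can be sketched without any dependencies. The real ingest.py presumably counts tokens, so this word-based stand-in is an assumption for illustration only:

```python
# Dependency-free sketch of sliding-window chunking with overlap.
# Splits on pre-tokenized words as a stand-in for token counting.
def chunk_words(words: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    step = size - overlap  # how far the window advances each chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks
```

With size=500 and overlap=50, consecutive chunks share 50 units, so a fact that straddles a chunk boundary still appears whole in at least one chunk.
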
---

## Prompts – Critical Details

### Question generation prompt
- Input: one retrieved chunk (passage)
- Output: one specific, answerable question about that chunk
- Constraint: question must be answerable from the document alone
- Do NOT ask opinion questions or questions requiring outside knowledge

### Answer prompt
- Input: question + top-k retrieved chunks as context
- Output: concise answer grounded in retrieved text
- Constraint: agent must cite which chunk it used

### Evaluation prompt
- Input: question + agent's answer + original source chunk
- Output: score (0.0–1.0) + one-sentence reasoning
- This is self-grading – instruct the LLM to be honest, not generous
- Score 1.0 = complete and accurate
- Score 0.5 = partially correct
- Score 0.0 = wrong or hallucinated
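One hypothetical shape for the evaluation prompt, with a strict parser for its one-line verdict. The wording and the SCORE=/REASON= format are illustrative assumptions, not taken from evaluate_prompt.py:

```python
# Hypothetical evaluation prompt template and verdict parser.
EVALUATE_TEMPLATE = """You are grading your own answer. Be honest, not generous.

Question: {question}
Your answer: {answer}
Source passage: {chunk}

Reply with exactly one line: SCORE=<0.0-1.0> | REASON=<one sentence>"""

def parse_evaluation(reply: str) -> tuple[float, str]:
    """Extract (score, reason) from the model's one-line verdict."""
    score_part, reason_part = reply.split("|", 1)
    score = float(score_part.split("=", 1)[1])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score, reason_part.split("=", 1)[1].strip()
```

Constraining the reply to one machine-parseable line keeps the evaluate node deterministic and lets malformed verdicts fail loudly instead of silently passing a bad grade into state.
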
---

## Key Rules

1. NEVER hardcode API keys – always read from .env
2. NEVER skip the evaluate node – self-grading is the whole point
3. NEVER let the agent loop forever – MAX_REREAD_CYCLES is a hard limit per chunk
4. State is the single source of truth – no global variables, no side effects
5. ChromaDB collection is per-session – clear between runs unless --persist flag set
6. All session output goes to output/session_reports/ with timestamp
7. temperature=0.0 on evaluate_node – grading must be deterministic
8. temperature=0.7 on generate_question_node – variety in questions

---
## Commands

```bash
# Setup
uv sync

# Run with a document
uv run python src/main.py --doc data/documents/attention_is_all_you_need.pdf

# Run with mastery threshold override
uv run python src/main.py --doc data/documents/myfile.pdf --threshold 0.8

# Run tests
uv run pytest tests/ -v
```
---

## Output Format

Each session produces a markdown report in output/session_reports/:

```markdown
# Study Session Report
Date: 2026-03-12
Document: attention_is_all_you_need.pdf

## Summary
- Questions asked: 14
- Questions correct (score >= 0.75): 11
- Final mastery score: 0.81
- Re-read cycles triggered: 3

## Weak Areas
- Multi-head attention computation
- Positional encoding formula

## Q&A Log
### Q1
Question: What is the purpose of the scaling factor in dot-product attention?
Answer: ...
Score: 0.9
...
```
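The Summary numbers above can be derived directly from the session_history log in state. A minimal sketch, assuming each log entry carries a "score" field and that "correct" means the same 0.75 cutoff shown in the report:

```python
# Sketch: derive the report's Summary fields from the session Q&A log.
def summarize_session(history: list[dict], threshold: float = 0.75) -> dict:
    scores = [qa["score"] for qa in history]
    return {
        "questions_asked": len(scores),
        "questions_correct": sum(s >= threshold for s in scores),
        "final_mastery_score": round(sum(scores) / len(scores), 2),
    }
```

Keeping these as pure functions of session_history means the summarize node stays a straight read of state, with no side effects beyond writing the report file.
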
---

## Portfolio Framing (for README.md)

The README must make this one point clearly:

> MOSAIC (separate research project) tests whether 12 specialist agents sharing a
> vector database improves rare-condition classification – collective knowledge at scale.
> This project is the single-agent version of the same question: can one agent use
> retrieval to improve its own understanding iteratively? The feedback loop here is
> what Phase 1C of MOSAIC implements collectively across 12 agents.

Do not overclaim a technical connection. The connection is conceptual and motivational.

---
## What This Project Is NOT

- Not connected to MOSAIC's Qdrant instance
- Not a production system
- Not a replacement for actual studying
- Not a RAG chatbot (there is no human in the loop during the study session)

---

## Author

Halima Akhter – PhD Candidate, Computer Science
Specialization: ML, Deep Learning, Bioinformatics
GitHub: https://github.com/Mituvinci

---

*Last updated: March 2026 | Adaptive Study Agent v1*