"""
share_trace.py: run a live PaperProf session and push the agent trace to the HF Hub.

Records each LLM step (question generation, answer evaluation, MCQ generation)
as a structured dataset so the community can see how PaperProf works end-to-end.

Usage:
    python share_trace.py

Output:
    Dataset pushed to build-small-hackathon/PaperProf-traces
"""
import json
import os
import sys
import time
import uuid
from datetime import datetime, timezone

# Make the sibling `core` and `model` packages importable when run as a script.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

TRACE_REPO = "build-small-hackathon/PaperProf-traces"

# Three demo chunks from different domains: operating systems, machine learning, networking.
DEMO_CHUNKS = [
    {
        "topic": "Operating Systems — Virtual Memory",
        "chunk": (
            "Virtual memory is a memory management technique that gives each process the "
            "illusion of having access to a large, contiguous block of memory. The OS maps "
            "virtual addresses used by programs to physical addresses in RAM using a page table. "
            "When a process accesses a page not currently in RAM, a page fault occurs and the OS "
            "loads the required page from disk (swap space). This allows systems to run programs "
            "larger than physical RAM and provides memory isolation between processes."
        ),
        "student_answers": {
            "open": "Virtual memory allows programs to use more memory than physically available by mapping virtual addresses to physical ones using a page table.",
            "wrong": "Virtual memory is just another name for RAM, it speeds up the CPU cache.",
        },
    },
    {
        "topic": "Machine Learning — Gradient Descent",
        "chunk": (
            "Gradient descent is an iterative optimization algorithm used to minimize a loss "
            "function by updating model parameters in the direction opposite to the gradient. "
            "In each iteration, the gradient of the loss with respect to the parameters is "
            "computed, and the parameters are updated as θ = θ − α∇L(θ), where α is the "
            "learning rate. Too large a learning rate causes divergence; too small slows "
            "convergence. Stochastic gradient descent (SGD) approximates the true gradient "
            "using a random mini-batch at each step, making it scalable to large datasets."
        ),
        "student_answers": {
            "open": "Gradient descent minimizes the loss by repeatedly moving parameters opposite to the gradient, scaled by the learning rate.",
            "wrong": "Gradient descent always finds the global minimum of any function.",
        },
    },
    {
        "topic": "Networking — TCP Three-Way Handshake",
        "chunk": (
            "The TCP three-way handshake establishes a reliable connection between a client "
            "and server before data transfer begins. The client sends a SYN segment, the server "
            "responds with SYN-ACK, and the client completes the handshake with an ACK. Each "
            "side advertises its initial sequence number during this exchange, which is used to "
            "order and acknowledge packets throughout the connection. This ensures both parties "
            "are ready to send and receive before any application data flows."
        ),
        "student_answers": {
            "open": "The TCP handshake uses SYN, SYN-ACK, and ACK to synchronize sequence numbers and confirm both sides are ready to communicate.",
            "wrong": "TCP uses a two-way handshake: SYN from client and ACK from server.",
        },
    },
]
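

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs); return (result, elapsed seconds rounded to 2 places)."""
    # perf_counter is monotonic, so the measurement is unaffected by system clock changes.
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, round(time.perf_counter() - t0, 2)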


def run_session(chunk_info: dict, session_id: str, model_id: str) -> list[dict]:
    """Run the four traced LLM steps on one demo chunk and return their records."""
    from core.questioner import generate_question, generate_mcq
    from core.evaluator import evaluate_answer

    steps = []
    chunk = chunk_info["chunk"]
    topic = chunk_info["topic"]
    answers = chunk_info["student_answers"]

    print(f"\n{'='*60}")
    print(f"Topic: {topic}")
    print(f"{'='*60}")

    # Step 1: open question generation
    print("[1/4] Generating open question…")
    question, dur = timed(generate_question, chunk, language="English", difficulty="Normal")
    print(f" Q: {question} ({dur}s)")
    steps.append({
        "session_id": session_id,
        "step": 1,
        "type": "question_generation",
        "topic": topic,
        "input": {"chunk": chunk, "difficulty": "Normal", "language": "English"},
        "output": {"question": question},
        "duration_s": dur,
        "model": model_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

    # Step 2: evaluate a correct answer
    print("[2/4] Evaluating correct answer…")
    feedback_ok, dur = timed(evaluate_answer, question, chunk, answers["open"], language="English")
    print(f" Feedback (correct): {feedback_ok[:80]}… ({dur}s)")
    steps.append({
        "session_id": session_id,
        "step": 2,
        "type": "answer_evaluation",
        "topic": topic,
        "input": {
            "chunk": chunk,
            "question": question,
            "student_answer": answers["open"],
            "expected_quality": "correct",
        },
        "output": {"feedback": feedback_ok},
        "duration_s": dur,
        "model": model_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

    # Step 3: evaluate a wrong answer
    print("[3/4] Evaluating incorrect answer…")
    feedback_bad, dur = timed(evaluate_answer, question, chunk, answers["wrong"], language="English")
    print(f" Feedback (wrong): {feedback_bad[:80]}… ({dur}s)")
    steps.append({
        "session_id": session_id,
        "step": 3,
        "type": "answer_evaluation",
        "topic": topic,
        "input": {
            "chunk": chunk,
            "question": question,
            "student_answer": answers["wrong"],
            "expected_quality": "incorrect",
        },
        "output": {"feedback": feedback_bad},
        "duration_s": dur,
        "model": model_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

    # Step 4: MCQ generation
    print("[4/4] Generating MCQ…")
    mcq, dur = timed(generate_mcq, chunk, language="English")
    print(f" MCQ question: {str(mcq.get('question',''))[:80]} ({dur}s)")
    steps.append({
        "session_id": session_id,
        "step": 4,
        "type": "mcq_generation",
        "topic": topic,
        "input": {"chunk": chunk, "language": "English"},
        "output": {"mcq": mcq},
        "duration_s": dur,
        "model": model_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

    return steps
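

# Illustrative sketch, not part of the original script: a hypothetical
# `missing_step_fields` helper that reports which of the keys written by
# run_session() are absent from a step record. The key names are copied from
# the step dicts built in run_session(); STEP_FIELDS and the helper name are
# assumptions introduced here for illustration.
STEP_FIELDS = (
    "session_id", "step", "type", "topic",
    "input", "output", "duration_s", "model", "timestamp",
)


def missing_step_fields(step: dict) -> list[str]:
    """Return the expected step keys that `step` lacks, in STEP_FIELDS order."""
    return [name for name in STEP_FIELDS if name not in step]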


def push_trace(all_steps: list[dict], model_id: str):
    """Upload the trace as JSONL, plus a dataset card, to TRACE_REPO on the HF Hub."""
    from huggingface_hub import HfApi

    # If HF_TOKEN is unset, token is None and huggingface_hub uses the locally saved login.
    token = os.environ.get("HF_TOKEN")
    api = HfApi(token=token)
    api.create_repo(TRACE_REPO, repo_type="dataset", exist_ok=True, private=False)

    # JSONL trace file: one JSON object per line, each newline-terminated
    jsonl = "".join(json.dumps(s, ensure_ascii=False) + "\n" for s in all_steps)
    trace_bytes = jsonl.encode("utf-8")
    api.upload_file(
        path_or_fileobj=trace_bytes,
        path_in_repo="paperprof_trace.jsonl",
        repo_id=TRACE_REPO,
        repo_type="dataset",
        commit_message="chore: upload PaperProf agent trace",
    )

    # Dataset card. The text stays at column 0 so the YAML front matter parses.
    readme = f"""---
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- agent-trace
- education
- paperprof
- build-small-hackathon
---

# PaperProf Agent Trace

Step-by-step trace of [PaperProf](https://huggingface.co/spaces/build-small-hackathon/PaperProf),
an AI study buddy that turns course PDFs into interactive quiz sessions.

## What's in this dataset

Each row in `paperprof_trace.jsonl` is one LLM call. Fields:

| Field | Description |
|---|---|
| `session_id` | Groups steps from the same session |
| `step` | Step index within the session (1–4) |
| `type` | `question_generation` / `answer_evaluation` / `mcq_generation` |
| `topic` | Domain of the source chunk |
| `input` | Inputs recorded for the step (chunk, question, student answer…) |
| `output` | Output returned by the step (question, feedback, or MCQ) |
| `duration_s` | Wall-clock duration of the call, in seconds |
| `model` | Model ID used |
| `timestamp` | UTC time at which the step was recorded (ISO 8601) |

## Session structure

Each session runs 4 steps on one text chunk:

1. **Open question generation**: the model writes a focused exam question
2. **Correct answer evaluation**: structured tutor feedback on a good answer
3. **Wrong answer evaluation**: structured tutor feedback on a bad answer
4. **MCQ generation**: 4-option question with per-option explanations

Three sessions are included, covering: Operating Systems, Machine Learning, and Networking.

## Model

`{model_id}`

Built for the Build Small Hackathon, June 2026, by Team PaperProf (EPITA).
"""
    api.upload_file(
        path_or_fileobj=readme.encode("utf-8"),
        path_in_repo="README.md",
        repo_id=TRACE_REPO,
        repo_type="dataset",
        commit_message="chore: add dataset card",
    )
    print(f"\n✅ Trace pushed → https://huggingface.co/datasets/{TRACE_REPO}")
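

# Illustrative sketch, not part of the original script: a hypothetical
# `group_trace_by_session` helper showing how a consumer might read the JSONL
# text built in push_trace() back into per-session step lists. It assumes one
# JSON object per line and ignores blank lines; the helper name is an
# assumption introduced here for illustration.
def group_trace_by_session(jsonl_text: str) -> dict[str, list[dict]]:
    """Parse JSONL trace text and group step records by their session_id."""
    import json
    from collections import defaultdict

    sessions = defaultdict(list)
    # Split on "\n" only: str.splitlines() would also break on characters such
    # as U+2028, which json.dumps(..., ensure_ascii=False) can emit unescaped.
    for line in jsonl_text.split("\n"):
        if line.strip():
            step = json.loads(line)
            sessions[step["session_id"]].append(step)
    # Order each session's records by their step index.
    return {sid: sorted(steps, key=lambda s: s["step"]) for sid, steps in sessions.items()}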


def main():
    from model.llm import get_llm, DEFAULT_MODEL_ID

    print("Loading model (first call may take 60–90s locally)…")
    get_llm()  # warm up

    model_id = os.environ.get("PAPERPROF_MODEL", DEFAULT_MODEL_ID)
    all_steps = []
    for chunk_info in DEMO_CHUNKS:
        session_id = str(uuid.uuid4())[:8]
        steps = run_session(chunk_info, session_id, model_id)
        all_steps.extend(steps)

    print(f"\n[push] {len(all_steps)} steps captured across {len(DEMO_CHUNKS)} sessions…")
    push_trace(all_steps, model_id)


if __name__ == "__main__":
    main()