FocusFlow - Technical Documentation
Project Overview
Problem Statement
Students face significant challenges in managing self-paced learning:
- Information Overload: PDFs, videos, and notes scattered across sources make it difficult to create coherent study plans
- Lack of Personalization: Generic study materials don't adapt to individual learning pace or mastery level
- No Progress Tracking: Students can't easily measure improvement or identify knowledge gaps
- Verification Gap: No way to trace AI-generated answers back to source materials
Solution Description
FocusFlow is an intelligent, local-first study companion that transforms unstructured learning materials into personalized, adaptive study experiences. It combines RAG (Retrieval-Augmented Generation) with synthetic student profiling to create customized learning paths that evolve based on performance.
Key Innovation: Synthetic student profiles enable the app to "remember" progress across sessions and dynamically adjust difficulty, review frequency, and content depth based on demonstrated mastery.
Target Users
- Self-paced learners preparing for exams or certifications
- Students managing multiple subjects with varied materials
- Knowledge workers building expertise in new domains
- Anyone seeking structured, verifiable learning from diverse sources
Core Features
1. Multi-Subject Study Roadmap Generation
- Automated topic extraction from uploaded PDFs and documents
- Multi-day planning with topics distributed across subjects
- Subject identification from document content and metadata
- Round-robin scheduling ensures balanced coverage across all subjects
- Progressive unlocking - topics unlock as previous ones are completed
Example: Upload 3 PDFs → Get 5-day plan with 3 topics/day (one from each subject)
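The round-robin distribution described above can be sketched as follows. This is a minimal illustration under stated assumptions, not FocusFlow's actual implementation: the name `round_robin_schedule` is borrowed from the workflow section, while its signature and the input shape (a dict mapping each subject to an ordered list of topic titles) are hypothetical.

```python
def round_robin_schedule(topics_by_subject, num_days):
    """Build a multi-day plan with at most one topic per subject per day.

    Assumed input shape (hypothetical): {"Subject": ["topic 1", "topic 2", ...]}.
    A subject that runs out of topics is simply skipped on later days.
    """
    plan = []
    for day in range(num_days):
        day_topics = [
            {"subject": subject, "title": topics[day]}
            for subject, topics in topics_by_subject.items()
            if day < len(topics)
        ]
        plan.append({"day": day + 1, "topics": day_topics})
    return plan
```

With three subjects that each have at least five topics, `round_robin_schedule(topics, num_days=5)` yields five days of three topics each, matching the example above.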
2. RAG-Based Q&A System
- Context-aware retrieval using ChromaDB vector search
- Conversational memory with chat history rewriting for follow-up questions
- Multi-source search across all uploaded documents
- Streaming responses with source citation
- Focus Mode for distraction-free studying with side-by-side lesson/chat
3. Adaptive Quiz Generation
- Context-based questions generated from actual course material
- Realistic distractors using common misconceptions
- Guaranteed 3-question format with intelligent fallbacks
- Score-based adaptation:
  - Perfect score (3/3) → Future topics marked "Advanced"
  - Lower score (0-2/3) → Future topics include review materials
- Automatic unlocking of next topic upon quiz completion
4. Knowledge Tracking & Mastery System
- Subject-level mastery tracking (High/Medium/Low)
- Historical quiz performance with timestamps
- Average score calculation across all attempts
- Mastery-based difficulty adjustment for future content
- Analytics dashboard with performance classification
5. Synthetic Student Profiles
- Persistent JSON storage in ~/.focusflow/student_profile.json
- Comprehensive tracking:
  - Study plan with topic metadata
  - Quiz history with scores and timestamps
  - Mastery levels per subject
  - Time tracking per topic
  - Incomplete task queue
- Atomic writes with backup for data integrity
- Thread-safe operations for concurrent access
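The atomic-write, backup, and thread-safety points above could look roughly like this. It is a sketch using only the standard library; the function name `save_profile` and the single module-level lock are assumptions, and the lock only serializes writers within one process.

```python
import json
import os
import shutil
import tempfile
import threading
from pathlib import Path

_profile_lock = threading.Lock()  # hypothetical: one lock shared by all writers


def save_profile(profile, path):
    """Write the profile as JSON, keeping the previous version as a backup."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # student_profile.json -> student_profile.backup.json
    backup = path.with_name(path.stem + ".backup" + path.suffix)
    with _profile_lock:
        if path.exists():
            shutil.copy2(path, backup)
        # Write to a temp file in the same directory, then rename it over
        # the target so readers never observe a half-written profile.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(profile, f, indent=2)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_name, path)  # atomic rename on POSIX filesystems
        except BaseException:
            if os.path.exists(tmp_name):
                os.remove(tmp_name)
            raise
```

Writing the temp file in the same directory matters because a rename is only atomic within a single filesystem.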
6. Data Persistence & Session Resumption
- Auto-save on key events:
  - Plan generation
  - Quiz completion
  - Topic transitions
- Auto-load on startup restores:
  - Active study plan
  - Quiz scores and progress
  - Mastery levels
  - Current position
- Toast notifications for save/load feedback
7. Citations & Source Verification
- Expandable source references with every AI response
- File + page number for each citation
- Lesson content references section at bottom
- Inline citation prompts to LLM for accurate attribution
- Numbered citation format for easy lookup
Technical Architecture
Frontend Components (Streamlit)
┌──────────────────┬─────────────┬────────────────┐
│ Control Center   │ Workspace   │ Calendar       │
│ (Sidebar)        │ (Main)      │ (Sidebar)      │
├──────────────────┼─────────────┼────────────────┤
│ - Study Timer    │ - Chat UI   │ - Date Picker  │
│ - Sources List   │ - Lessons   │ - Plan View    │
│ - File Upload    │ - Analytics │ - Today's      │
│ - Plan Gen       │ - Quizzes   │   Topics       │
└──────────────────┴─────────────┴────────────────┘
Key UI Panels:
- Control Center: Timer, source management, plan generation
- Intelligent Workspace: RAG chat, lesson viewer, analytics modal
- Calendar: Date-based topic navigation, today's task list
- Focus Mode: Immersive 2-column layout (chat | lesson)
Chat Interface:
- Message history with role differentiation (user/assistant)
- Source citation expandables
- Streaming/static responses
- Contained scrollable area (600px height)
Lesson Viewer:
- Markdown rendering with references section
- Scrollable document container (650px height)
- Inline citations and code examples
Backend Logic (FastAPI)
Planning Engine (generate_study_plan):
- Query vector store for topic-related content
- Group documents by source (each source = subject)
- Extract subject names from content/metadata
- Round-robin topic selection across subjects
- Generate multi-day schedule with metadata
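The grouping and subject-identification steps might be sketched as below. The document shape (dicts with a `metadata` entry holding `source` and `page`) and the filename-based naming rule are assumptions for illustration; the actual engine is described as also using document content.

```python
from collections import defaultdict
from pathlib import Path


def group_by_source(docs):
    """Group retrieved chunks by source file (each source is one subject)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["metadata"]["source"]].append(doc)
    return dict(groups)


def identify_subject(source_path):
    """Derive a readable subject name from the file name (hypothetical rule)."""
    stem = Path(source_path).stem
    return stem.replace("_", " ").replace("-", " ").title()
```

For instance, a chunk whose source is `./data/data_structures.pdf` would be filed under the subject "Data Structures".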
Retrieval System (query_knowledge_base):
- Context rewriting for multi-turn conversations
- Similarity search across vector database
- LLM synthesis with retrieved context
- Source metadata extraction and return
Quiz Generator (generate_quiz_data):
- Retrieve relevant content chunks for topic
- Prompt LLM for context-based questions
- Fallback question generation from raw content
- Ensure exactly 3 questions in the output
- Return structured quiz data
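One way to enforce the three-question guarantee is to validate whatever the LLM returns and pad from raw content. The sketch below assumes a question is a dict with `question`, `options`, and `answer` keys; the true/false fallback template is a stand-in, since the source does not specify how fallback questions are worded.

```python
def ensure_three_questions(llm_questions, context_chunks, n=3):
    """Return exactly n quiz questions, padding with simple fallbacks."""

    def is_valid(q):
        return bool(
            isinstance(q, dict)
            and q.get("question")
            and isinstance(q.get("options"), list)
            and q.get("answer") in q["options"]
        )

    # Keep well-formed LLM questions first, truncated to n.
    questions = [q for q in llm_questions if is_valid(q)][:n]

    # Pad from raw content chunks (hypothetical fallback template).
    for chunk in context_chunks:
        if len(questions) >= n:
            break
        sentence = chunk.strip().split(". ")[0][:200]
        questions.append({
            "question": f'True or false: the course material states that "{sentence}"',
            "options": ["True", "False"],
            "answer": "True",
        })

    if len(questions) < n:
        raise ValueError("not enough content to build a full quiz")
    return questions
```

Raising when the content cannot fill the quiz is a design choice made here for clarity; a production fallback might instead reuse chunks or retrieve more context.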
Lesson Generator (generate_lesson_content):
- Retrieve 8 context chunks (500 chars each)
- Extract source citations from metadata
- Prompt for structured lesson (600-800 words)
- Append references section with file + page
- Return formatted Markdown
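The final step, appending a references section with file and page, can be illustrated as follows. The metadata keys `source` and `page` mirror the vector-store metadata listed in this document, but the exact key names, the heading text, and the function name `append_references` are assumptions.

```python
import os


def append_references(lesson_md, chunks_metadata):
    """Append a numbered, de-duplicated references section to lesson Markdown."""
    seen = []
    for meta in chunks_metadata:
        ref = (os.path.basename(meta.get("source", "unknown")), meta.get("page"))
        if ref not in seen:
            seen.append(ref)

    lines = ["", "## References", ""]
    for i, (name, page) in enumerate(seen, 1):
        page_part = f", page {page}" if page is not None else ""
        lines.append(f"{i}. {name}{page_part}")
    return lesson_md.rstrip() + "\n" + "\n".join(lines) + "\n"
```

De-duplicating on (file, page) keeps the list short when several of the retrieved chunks come from the same page.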
Data Storage
Vector Database (ChromaDB):
- Local storage at ./chroma_db
- nomic-embed-text embeddings
- Metadata: source path, page number
- Persistent across sessions
Student Profiles (JSON):
{
  "student_id": "student_20260105_233537",
  "study_plan": {
    "plan_id": "plan_...",
    "topics": [...],
    "num_days": 5
  },
  "quiz_history": [...],
  "mastery_tracker": {...},
  "time_tracking": {...},
  "incomplete_tasks": [...]
}
File Storage:
- Uploaded PDFs in ./data/
- Profile at ~/.focusflow/student_profile.json
- Backup at ~/.focusflow/student_profile.backup.json
Agentic Behaviors
Multi-Step Planning:
- Query → Retrieval → Topic Extraction → Subject Grouping → Schedule Generation
- 5+ steps with intermediate reasoning
Tool Use:
- Vector DB search
- LLM generation
- Profile read/write
- PDF ingestion
Memory:
- Chat history (last 5 messages)
- Student profile persistence
- Quiz performance tracking
- Mastery levels
Reflection:
- Score-based plan adaptation
- Context quality assessment
- Fallback strategies for generation failures
Tech Stack
Languages & Frameworks
- Frontend: Python + Streamlit 1.x
- Backend: FastAPI + Uvicorn
- Vector DB: ChromaDB
- LLM Orchestration: LangChain
Libraries & APIs
Core Dependencies:
streamlit # Frontend UI
fastapi # Backend API
uvicorn # ASGI server
langchain # LLM orchestration
chromadb # Vector database
ollama # Local LLM inference
requests # API communication
pydantic # Data validation
Document Processing:
pypdf # PDF parsing
beautifulsoup4 # Web scraping
youtube-transcript-api # Video transcripts
Data & Visualization:
pandas # Data manipulation
plotly # Analytics charts
Models
- Embedding: nomic-embed-text (local via Ollama)
- Generation: llama3.2:1b (local via Ollama)
Storage Methods
- Vector Store: ChromaDB (local, persistent)
- Profiles: JSON files (atomic writes)
- PDF Files: Local filesystem (./data/)
- Session State: Streamlit session storage
Key Workflows
Workflow 1: Study Plan Generation
User uploads PDFs
↓
Backend ingests → Chunks → Embeds → Stores in ChromaDB
↓
User: "Create 5-day plan"
↓
retrieve_topics(k=20)
↓
group_by_source() → identify_subjects()
↓
round_robin_schedule(num_days=5)
↓
save_to_profile() → Return plan
↓
Frontend displays Today's Topics (Day 1 unlocked)
Workflow 2: RAG Retrieval
User asks: "What is encapsulation?"
↓
history_exists? → rewrite_query() [Context rewriting]
↓
similarity_search(question, k=3)
↓
build_prompt(context + history + question)
↓
llm.invoke() → extract_sources()
↓
Return {answer: str, sources: [{file, page}]}
↓
Frontend displays answer + expandable citations
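The `build_prompt(context + history + question)` step could be sketched like this. The prompt wording, the chunk shape (`file`, `page`, `text`), and the message shape (`role`, `content`) are assumptions; only the five-message history window and the numbered citation format come from this document.

```python
def build_prompt(question, chunks, history, max_history=5):
    """Combine retrieved context, recent chat history, and the question."""
    recent = history[-max_history:]  # keep only the last few messages
    history_text = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    # Number each chunk so the model can cite it as [n].
    context_text = "\n\n".join(
        f"[{i}] ({c['file']}, p. {c['page']}) {c['text']}"
        for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the numbered context below and cite sources as [n].\n\n"
        f"Context:\n{context_text}\n\n"
        f"Conversation so far:\n{history_text}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the chunks in the prompt is what lets the frontend map a citation such as [2] back to a file and page.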
Workflow 3: Adaptive Quiz Flow
User unlocks Topic
↓
load_lesson() → display_markdown()
↓
User clicks "Take Quiz"
↓
retrieve_context(topic, k=8)
↓
generate_quiz(3_questions)
↓
fallback if < 3? → context_based_fallback()
↓
User answers → calculate_score()
↓
score==3? → mark_advanced()
score<3? → mark_review()
↓
update_mastery_tracker() → save_profile()
↓
unlock_next_topic() → rerun()
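The scoring branch and unlock step at the end of this flow might look like the following. Field names such as `unlocked`, `difficulty`, and `needs_review` are hypothetical, and applying the adaptation to every later topic (rather than only topics in the same subject) is an assumption.

```python
def apply_quiz_result(topics, index, score, total=3):
    """Record a quiz score, adapt later topics, and unlock the next one."""
    topics[index]["quiz_score"] = score
    topics[index]["completed"] = True

    for topic in topics[index + 1:]:
        if score == total:
            topic["difficulty"] = "Advanced"  # perfect score: raise difficulty
        else:
            topic["needs_review"] = True      # otherwise: add review material

    # The next topic unlocks on quiz completion, regardless of the score.
    if index + 1 < len(topics):
        topics[index + 1]["unlocked"] = True
    return topics
```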
Workflow 4: Mastery Tracking Adaptation
Quiz completed with score X
↓
update_subject_mastery({
  scores: [..., X],
  avg_score: calculate_average(),
  mastery_level: determine_level()  // High: ≥75%, Medium: ≥50%, Low: <50%
})
↓
mastery_level==HIGH?
  → Future topics: Faster pace, advanced examples
mastery_level==LOW?
  → Future topics: More review, foundational content
↓
save_to_profile()
↓
Next plan generation uses mastery data for difficulty
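The mastery thresholds in this workflow translate directly into code. In the sketch below the thresholds come from the workflow itself; the tracker layout, storing the average as a percentage, and the exact signatures are assumptions.

```python
def determine_level(avg_pct):
    """Map an average score percentage to a mastery level."""
    if avg_pct >= 75:
        return "High"
    if avg_pct >= 50:
        return "Medium"
    return "Low"


def update_subject_mastery(tracker, subject, score, total=3):
    """Append a quiz score for a subject and recompute its mastery entry."""
    entry = tracker.setdefault(subject, {"scores": []})
    entry["scores"].append(score)
    # Average over all attempts, expressed as a percentage of the maximum.
    avg_pct = 100 * sum(entry["scores"]) / (total * len(entry["scores"]))
    entry["avg_score"] = round(avg_pct, 1)
    entry["mastery_level"] = determine_level(avg_pct)
    return entry
```

For example, scores of 3 and then 1 out of 3 average to 66.7%, which lands in the Medium band.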
Evaluation Metrics
1. Plan Quality Assessment
Metrics:
- Subject Coverage: % of uploaded subjects represented daily
- Balance Score: Standard deviation of topics per subject
- Unlocking Logic: % of topics that unlock correctly after quiz
Target:
- 100% subject coverage (all PDFs represented)
- StdDev < 0.5 (even distribution)
- 100% unlock success rate
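Subject coverage and the balance score can be computed from a generated plan as sketched here. The plan shape (a list of days, each with a `topics` list whose items carry a `subject`) is hypothetical, and the choice of population standard deviation is an assumption, since the metric definition does not say which variant is meant.

```python
from collections import Counter
from statistics import pstdev


def plan_quality(plan, all_subjects):
    """Return average daily subject coverage and the balance score."""
    counts = Counter(t["subject"] for day in plan for t in day["topics"])
    per_subject = [counts.get(s, 0) for s in all_subjects]

    # Fraction of subjects that appear on each day, averaged over all days.
    daily = [
        len({t["subject"] for t in day["topics"]}) / len(all_subjects)
        for day in plan
    ]
    return {
        "coverage": sum(daily) / len(daily),
        "balance_stddev": pstdev(per_subject),
    }
```

A perfectly balanced plan gives a coverage of 1.0 and a standard deviation of 0.0, which meets both targets above.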
2. Answer Accuracy Measurement
Metrics:
- Source Relevance: Cosine similarity of retrieved chunks
- Citation Accuracy: % of answers with valid file+page citations
- Hallucination Rate: Manual review of 50 Q&A pairs
Target:
- Avg similarity > 0.7
- 95%+ citation accuracy
- <5% hallucination rate
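The source-relevance metric can be checked offline with a plain cosine-similarity computation. This sketch assumes the query and chunk embeddings are available as lists of floats; how they are exported from the vector store is left open.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_relevance(query_vec, chunk_vecs):
    """Mean similarity between a query and its retrieved chunks."""
    sims = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sum(sims) / len(sims) if sims else 0.0
```

The result of `average_relevance` is the value to compare against the 0.7 target.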
3. Quiz Discrimination
Metrics:
- Question Validity: % of questions answerable from provided context
- Distractor Quality: % of students choosing incorrect options
- Difficulty Spread: Distribution across easy/medium/hard
Target:
- 100% context-answerable
- 25-40% distractor selection rate (not too easy/hard)
- Balanced difficulty distribution
4. User Mastery Gains
Metrics:
- Score Progression: Δ average score from Day 1 to Day N
- Mastery Level Changes: % of subjects moving from Low → Medium → High
- Retention Rate: Quiz score on repeated topics after 1 week
Target:
- +15% average score improvement over 5 days
- 60%+ mastery level improvement
- 80%+ retention on repeated topics
5. System Performance
Metrics:
- Plan Generation Time: Seconds to generate 5-day plan
- Query Response Time: Seconds from question to answer
- Profile Save Latency: Milliseconds for atomic write
Target:
- Plan gen: <10 seconds
- Query response: <5 seconds
- Save latency: <100ms
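These latencies can be measured with a small timing helper. The helper name `timed` and the keys of the targets dictionary are hypothetical; the threshold values are the ones listed above, converted to seconds.

```python
import time

# Targets from this section, in seconds.
TARGETS_SECONDS = {
    "plan_generation": 10.0,
    "query_response": 5.0,
    "profile_save": 0.1,
}


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def meets_target(metric, elapsed_seconds):
    """True when the measured time is under the target for that metric."""
    return elapsed_seconds < TARGETS_SECONDS[metric]
```

A single measurement is noisy, so repeating each call several times and comparing the median against the target gives a more reliable reading.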
Future Enhancements
- Spaced Repetition: Intelligent review scheduling using SM-2 algorithm
- Multi-User Support: Authentication + isolated student profiles
- Cloud Deployment: Oracle Cloud + Supabase for persistence
- Advanced Analytics: Learning curve visualization, weak area identification
- Mobile Responsive: Material Design responsive UI for mobile devices
Project Repository
GitHub: thesivarohith/hack
Status: Production-ready, cleaned codebase (commit: 9a8a489)
Documentation generated: 2026-01-06