app/utils – Utility Modules for Summarization, Export, and Embeddings
This directory includes reusable tools that support the backend application with:
- Chat summarization for display/export
- Document extraction and cleanup
- File export to TXT, DOCX, and PDF formats
- File upload validation
- Persona-specific vector DB with ChromaDB
These modules are loosely coupled and used across core routes, RAG logic, and export endpoints.
chat_summary.py – Conversation Summarization
This module provides summarization of past conversations using the LLM client.
Key Functions
- generate_summary_from_messages(messages, llm, max_tokens) – Generates a formatted, bullet-style summary
- format_summary_for_text_export(summary_text) – Cleans summary for export to PDF/DOCX/TXT
- parse_summary_to_blocks(summary_text) – Converts summary to structured blocks (headings, lists, paragraphs)
Format Guidelines
Summaries follow a markdown-style format with:
- **Section Name:** for headings
- * Bullet Points for insights and recommendations
- Auto-trimming and line breaks for export formatting
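The format above lends itself to a simple line-by-line parser. The following is a minimal sketch of how parse_summary_to_blocks could work, not the module's actual implementation: the block dictionary shape (type, text, items) and the regular expressions are assumptions made for illustration.

```python
import re
from typing import Any, Dict, List

# Assumed patterns: "**Section Name:**" for headings, "* item" or "- item" for bullets.
HEADING_RE = re.compile(r"^\*\*(.+?):?\*\*:?$")
BULLET_RE = re.compile(r"^[*-]\s+(.*)$")


def parse_summary_to_blocks(summary_text: str) -> List[Dict[str, Any]]:
    """Split a markdown-style summary into heading, list, and paragraph blocks."""
    blocks: List[Dict[str, Any]] = []
    for raw_line in summary_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines only separate blocks
        heading = HEADING_RE.match(line)
        bullet = BULLET_RE.match(line)
        if heading:
            blocks.append({"type": "heading", "text": heading.group(1).strip()})
        elif bullet:
            item = bullet.group(1).strip()
            # Consecutive bullets are merged into a single list block.
            if blocks and blocks[-1]["type"] == "list":
                blocks[-1]["items"].append(item)
            else:
                blocks.append({"type": "list", "items": [item]})
        else:
            blocks.append({"type": "paragraph", "text": line})
    return blocks
```

Grouping consecutive bullets into one list block keeps the exporter's job simple: each block maps to one heading, one list, or one paragraph in the output file.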
chroma_client.py – Persona-Specific Knowledge Store
A minimal ChromaDB wrapper used to store and query persona-specific documents or embeddings.
Functions
- add_persona_doc(text, persona, doc_id) – Add a new chunk/document for a persona
- query_persona_knowledge(query, persona) – Query ChromaDB for a persona-specific response
Notes
- Uses ./chroma_storage as the default persistent path
- Uses the local embedding model via get_embedding() from embedding_client.py
document_extractor.py – File Text Extraction
Supports extracting raw text from uploaded documents.
Supported Formats
| Format | Content Type |
|---|---|
| PDF | application/pdf |
| DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| TXT | text/plain |
Key Function
extract_text_from_file(file_bytes: bytes, content_type: str) -> str
Uses:
- PyPDF2 for PDFs
- docx2txt for Word documents (via temp file)
- UTF-8 decoding for plain text
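The dispatch on content type described above could look roughly like the sketch below. It follows the documented signature, but the error handling is an assumption rather than the module's confirmed behavior: raising ValueError for unsupported types and decoding with errors="replace" are illustrative choices. The third-party imports are deferred so the plain-text path works without them, and PdfReader assumes PyPDF2 2.x or later.

```python
import io
import os
import tempfile

DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def extract_text_from_file(file_bytes: bytes, content_type: str) -> str:
    """Return raw text for a PDF, DOCX, or plain-text upload."""
    if content_type == "text/plain":
        # Assumption: undecodable bytes are replaced instead of raising.
        return file_bytes.decode("utf-8", errors="replace")

    if content_type == "application/pdf":
        from PyPDF2 import PdfReader  # third-party, imported lazily

        reader = PdfReader(io.BytesIO(file_bytes))
        # extract_text() can return None for pages without a text layer.
        return "\n".join((page.extract_text() or "") for page in reader.pages)

    if content_type == DOCX_TYPE:
        import docx2txt  # third-party, imported lazily

        # Write the bytes to a temp file and hand its path to docx2txt.
        with tempfile.NamedTemporaryFile(suffix=".docx", delete=False) as tmp:
            tmp.write(file_bytes)
            tmp_path = tmp.name
        try:
            return docx2txt.process(tmp_path)
        finally:
            os.remove(tmp_path)

    # Assumption: unsupported types are rejected with an exception.
    raise ValueError(f"Unsupported content type: {content_type}")
```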
file_export.py – Export Chat & Summaries
Exports content (chat logs or summaries) to the following formats:
- .txt
- .docx (Word)
- .pdf (ReportLab)
Key Functions
- export_chat_as_file(content, format) – Unified export method (calls generate_*)
- prepare_export_response() – Returns a StreamingResponse with correct content-disposition
Formatting Functions
- generate_txt_file() – Simple UTF-8 stream
- generate_docx_file() – Paragraph-based Word file using python-docx
- generate_pdf_file() – Uses ReportLab’s Platypus for chat-style layout
- generate_pdf_file_from_blocks() – Used for structured summaries (heading, lists, etc.)
All formats apply automatic cleanup and styling via _clean_text_for_pdf() and _render_rich_text().
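As a rough illustration of the export path, the sketch below shows a UTF-8 text stream and the media type plus Content-Disposition header that a response for each format would need. The helper build_export_headers, the EXPORT_TYPES table, and the default file name are hypothetical names introduced here, and the generate_txt_file signature is assumed. The DOCX and PDF generators are omitted.

```python
import io
from typing import Dict, Tuple

# Standard media types for the three supported export formats.
EXPORT_TYPES = {
    "txt": "text/plain; charset=utf-8",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "pdf": "application/pdf",
}


def generate_txt_file(content: str) -> io.BytesIO:
    """Wrap the content in an in-memory UTF-8 byte stream."""
    return io.BytesIO(content.encode("utf-8"))


def build_export_headers(fmt: str, base_name: str = "chat_export") -> Tuple[str, Dict[str, str]]:
    """Return (media_type, headers) for a download in the given format."""
    fmt = fmt.lower().lstrip(".")  # accept "pdf", ".pdf", or "PDF"
    if fmt not in EXPORT_TYPES:
        raise ValueError(f"Unsupported export format: {fmt}")
    headers = {"Content-Disposition": f'attachment; filename="{base_name}.{fmt}"'}
    return EXPORT_TYPES[fmt], headers
```

In a FastAPI or Starlette route, the stream and these values could then be passed to StreamingResponse through its media_type and headers parameters.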
file_limits.py – Upload Size Checks
Used to prevent users from uploading excessively large files in a session.
Configurable Limit
MAX_TOTAL_UPLOAD_MB = 10
Function
- is_within_upload_limit(session_id, new_file_bytes, session_context) – Returns True if upload is within session cap
Used by routes handling document uploads.
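A minimal sketch of the session cap check follows. The structure of session_context is not documented here, so this assumes it maps each session ID to a dictionary with a running uploaded_bytes counter. It also assumes 1 MB means 1024 * 1024 bytes and that an upload landing exactly on the cap is allowed.

```python
from typing import Any, Dict

MAX_TOTAL_UPLOAD_MB = 10
_BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def is_within_upload_limit(
    session_id: str,
    new_file_bytes: bytes,
    session_context: Dict[str, Dict[str, Any]],
) -> bool:
    """Return True if adding this file keeps the session under the total cap."""
    # Hypothetical layout: session_context[session_id]["uploaded_bytes"] -> int
    used = session_context.get(session_id, {}).get("uploaded_bytes", 0)
    return used + len(new_file_bytes) <= MAX_TOTAL_UPLOAD_MB * _BYTES_PER_MB
```

Under these assumptions the check itself does not update the counter. A caller would add len(new_file_bytes) to the session's total only after the upload is accepted.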
Dependencies
Each row lists a module and the modules it depends on:
| Module | Depends On |
|---|---|
| rag_manager.py | document_extractor, file_limits |
| chat_summary.py | llm_client |
| routes/documents.py | document_extractor, file_limits |
| routes/export.py | file_export, chat_summary |
Example Workflow
Upload File → document_extractor.py → raw text
↓
file_limits.py → check quota
Chat History → chat_summary.py → formatted summary
↓
file_export.py → TXT, DOCX, PDF
Persona Notes → chroma_client.py → embedded in ChromaDB