docs: add comprehensive V4 API documentation and optimize inference with SDPA 0072188 ming Claude committed on Dec 19, 2025
Add V4 local server setup with MPS optimization for Android testing 45b6536 ming Claude committed on Dec 12, 2025
Migrate logging from stdlib to Loguru for structured logging 7ab470d ming Claude committed on Dec 10, 2025
Migrate to Ruff for linting/formatting and add comprehensive import tests 29ed661 ming committed on Dec 10, 2025
Add Outlines JSON streaming endpoint for V4 structured summarization 441f66b ming committed on Nov 29, 2025
fix: Move inputs to model device in _single_chunk_summarize to fix CPU/GPU device mismatch cfe8d29 ming committed on Nov 28, 2025
Optimize V4 generation speed: greedy decoding + reduced max_tokens fd2a8c1 ming committed on Nov 28, 2025
feat: Switch V4 model to Phi-3-mini for better structured output 7019b66 ming committed on Nov 26, 2025
Revert adaptive token logic, restore client-controlled max_tokens 6a1e8a3 ming Claude committed on Nov 21, 2025
fix: Backend ignores client max_tokens to verify Android app hypothesis 80ea70f ming Claude committed on Nov 21, 2025
fix: CRITICAL - Override model config defaults causing early stopping 6c96c54 ming Claude committed on Nov 21, 2025
fix: Improve V3 summary completeness with enhanced token allocation 6b2de93 ming Claude committed on Nov 21, 2025
fix: V3 API mid-sentence cutoff with adaptive token calculation 5e83010 ming Claude committed on Nov 21, 2025
Improve V2 summarization: adaptive tokens, recursive summarization, better defaults 9884884 ming committed on Oct 28, 2025
Enhance HF V2 summaries: increase token limits, improve length parameters 52f6c42 ming committed on Oct 25, 2025
Enforce singleton batch for HF streaming (fix TextStreamer batch size 1 error) 843f837 ming committed on Oct 25, 2025