docs: Update README.md with comprehensive V4 API documentation 8ca69c7 ming committed on Dec 20, 2025
docs: add comprehensive V4 API documentation and optimize inference with SDPA 0072188 ming Claude committed on Dec 19, 2025
Add V4 local server setup with MPS optimization for Android testing 45b6536 ming Claude committed on Dec 12, 2025
Migrate logging from stdlib to Loguru for structured logging 7ab470d ming Claude committed on Dec 10, 2025
Migrate to Ruff for linting/formatting and add comprehensive import tests 29ed661 ming committed on Dec 10, 2025
Fix Outlines library version pinning for V4 API compatibility d2cdf90 ming Claude committed on Nov 30, 2025
Add comprehensive unit tests for V4 stream-json endpoint fdb8925 ming Claude committed on Nov 30, 2025
Add OMP_NUM_THREADS env var and improve Outlines requirement comment 55d10ea ming committed on Nov 29, 2025
Add Outlines JSON streaming endpoint for V4 structured summarization 441f66b ming committed on Nov 29, 2025
fix: Move inputs to model device in _single_chunk_summarize to fix CPU/GPU device mismatch cfe8d29 ming committed on Nov 28, 2025
Optimize V4 generation speed: greedy decoding + reduced max_tokens fd2a8c1 ming committed on Nov 28, 2025
feat: Switch V4 model to Phi-3-mini for better structured output 7019b66 ming committed on Nov 26, 2025
Revert adaptive token logic, restore client-controlled max_tokens 6a1e8a3 ming Claude committed on Nov 21, 2025
fix: Backend ignores client max_tokens to verify Android app hypothesis 80ea70f ming Claude committed on Nov 21, 2025
fix: CRITICAL - Override model config defaults causing early stopping 6c96c54 ming Claude committed on Nov 21, 2025
fix: Improve V3 summary completeness with enhanced token allocation 6b2de93 ming Claude committed on Nov 21, 2025
fix: V3 API mid-sentence cutoff with adaptive token calculation 5e83010 ming Claude committed on Nov 21, 2025