Commit History

docs: Update README.md with comprehensive V4 API documentation
8ca69c7

ming committed on

style: apply ruff formatting to structured_summarizer
bd7d2c1

ming and Claude committed on

docs: add comprehensive V4 API documentation and optimize inference with SDPA
0072188

ming and Claude committed on

chore: code formatting improvements and update gitignore
db3b809

ming committed on

Add V4 local server setup with MPS optimization for Android testing
45b6536

ming and Claude committed on

Apply ruff formatting to Loguru migration files
75fe59b

ming and Claude committed on

Migrate logging from stdlib to Loguru for structured logging
7ab470d

ming and Claude committed on

Remove outlines library and all related code
d25a17f

ming committed on

Migrate to Ruff for linting/formatting and add comprehensive import tests
29ed661

ming committed on

Pin Outlines to version 0.0.44 (tested and working)
d5d96b7

ming and Claude committed on

Add pyairports dependency for Outlines library
d5e317d

ming and Claude committed on

Fix Outlines library version pinning for V4 API compatibility
d2cdf90

ming and Claude committed on

Add comprehensive unit tests for V4 stream-json endpoint
fdb8925

ming and Claude committed on

Fix Outlines API usage for V4 JSON streaming endpoint
b47201f

ming committed on

Fix Outlines API usage - handle different calling patterns
4c4036e

ming committed on

Fix Outlines import - use generator instead of generate
9452571

ming committed on

Change Outlines debug log to info level for visibility
5fa0ba2

ming committed on

Add detailed Outlines API exploration and logging
f099d0c

ming committed on

Trigger HF Space rebuild
b1a5d03

ming committed on

Fix Outlines import - use correct API for installed version
e8ab865

ming committed on

Improve Outlines import error handling and logging
33cd483

ming committed on

Add Outlines installation verification to Dockerfile
2ff86e9

ming committed on

Add OMP_NUM_THREADS env var and improve Outlines requirement comment
55d10ea

ming committed on

Improve error messaging for Outlines unavailability
734e281

ming committed on

Fix Python 3.10 requirement and torch_dtype deprecation
6b859f2

ming committed on

Add Outlines JSON streaming endpoint for V4 structured summarization
441f66b

ming committed on

Fix JSON parsing errors in V4 NDJSON stream
85dcd04

ming committed on

fix: Move inputs to model device in _single_chunk_summarize to fix CPU/GPU device mismatch
cfe8d29

ming committed on

Fix device placement error in V2 BART model
12a2e7c

ming committed on

Implement Option 3: Use FP16 for 2-3x faster inference
7fff563

ming committed on

Optimize V4 generation speed: greedy decoding + reduced max_tokens
fd2a8c1

ming committed on

Fix buffer parsing and strengthen brevity constraints
d112a13

ming committed on

Optimize V4 output verbosity and generation speed
17499f7

ming committed on

Fix bitsandbytes UID error with getpass patch
dd29a6d

ming committed on

Switch V4 to GPU INT4 quantization with Qwen-1.5B
a36f560

ming committed on

debug: Add comprehensive logging to diagnose 4-token issue
df75294

ming committed on

feat: Guarantee complete V4 NDJSON summaries with fallback
b321440

ming committed on

fix: Use Qwen chat template and harden NDJSON parsing
bf21a65

ming committed on

feat: Switch V4 to Qwen2.5-1.5B for HF memory compatibility
fe47248

ming committed on

perf: Disable V2 warmup to save memory for V4
1b76b21

ming committed on

feat: Change device_map to auto for V4 model
d0701b0

ming committed on

chore: Add test scripts and update local configuration
01d5d83

ming committed on

feat: Switch V4 model to Phi-3-mini for better structured output
7019b66

ming committed on

feat: Add V4 NDJSON patch-based structured summarization
93c9664

ming committed on

Revert adaptive token logic, restore client-controlled max_tokens
6a1e8a3

ming and Claude committed on

fix: Backend ignores client max_tokens to verify Android app hypothesis
80ea70f

ming and Claude committed on

fix: CRITICAL - Override model config defaults causing early stopping
6c96c54

ming and Claude committed on

fix: Improve V3 summary completeness with enhanced token allocation
6b2de93

ming and Claude committed on

fix: V3 API mid-sentence cutoff with adaptive token calculation
5e83010

ming and Claude committed on

Fix V3 API to support both URL and text input
f724bab

ming and Claude committed on