Commit History

perf: concurrency improvements for high-volume Excel processing
33af535

ibadrehman-outcome committed on

fix: Excel tables now output HTML matching Gemini PDF format
87afc64

ibadrehman-outcome committed on

feat: add Excel (.xlsx/.xlsm) parsing support via Docling
cf7950b

ibadrehman-outcome committed on

fix: update docling gemini parser
c28aa68

Ibad ur Rehman committed on

feat: update granite parser runtime
b5db7b1

Ibad ur Rehman committed on

fix: refine granite parser flow
718ec13

Ibad ur Rehman committed on

feat: switch parser to granite docling
dde2973

Ibad ur Rehman committed on

feat: deploy docling first parser
74cacc0

Ibad ur Rehman committed on

fix: constrain llama cuda build
47b9ef7

Ibad ur Rehman committed on

fix: reduce llama build memory
8e31979

Ibad ur Rehman committed on

feat: switch to unsloth gguf runtime
dd23733

Ibad ur Rehman committed on

perf: optimize qwen inference path
b586eeb

Ibad ur Rehman committed on

feat: switch parser to qwen vl
51c66dc

Ibad ur Rehman committed on

fix: restore httpx dependency
b633022

Ibad ur Rehman committed on

feat: simplify parser response flow
add910e

Ibad ur Rehman committed on

fix: refine diagnostic routing output
98f3a28

Ibad ur Rehman committed on

feat: expand parser diagnostics
852a43f

Ibad ur Rehman committed on

feat: add parse routing metadata
83ade3b

Ibad ur Rehman committed on

feat: expose page-level parse results
efed02b

Ibad ur Rehman committed on

fix: refine docling page handling
0447006

Ibad ur Rehman committed on

fix: simplify docling routing logic
1b9b603

Ibad ur Rehman committed on

fix: refine docling pipeline behavior
63b2f90

Ibad ur Rehman committed on

feat: configure docling accelerator
5f188d9

Ibad ur Rehman committed on

fix: update docling routing pipeline
08a5278

Ibad ur Rehman committed on

feat: switch to docling first parser
4af0af0

Ibad ur Rehman committed on

fix: support paddle restructure fallback
eb7b6d5

Ibad ur Rehman committed on

feat: deploy paddleocr gemini parser
799f504

Ibad ur Rehman committed on

fix: prevent LaTeX regex from stripping currency dollar signs
e24be4d

sidoutcome committed on

fix: table detection, LaTeX cleanup, HTML tables, image filtering
bb28e9c

sidoutcome committed on

fix: use PaddlePaddle Docker base image (paddle GPU pre-installed)
63d2d35

sidoutcome committed on

fix: add pip timeout 600s + retries for Chinese CDN, use Python 3.12 default
03eea1e

sidoutcome committed on

fix: use Python 3.12 (Ubuntu 24.04 default), simplify Dockerfile, combine install steps
3e4135a

sidoutcome committed on

fix: remove paddle import verification during build (no CUDA available)
0ef1544

sidoutcome committed on

fix: remove GPU-dependent pre-download, use restructure_pages for cross-page tables, robust md extraction
0111393

sidoutcome committed on

fix: use CPU mode for model pre-download during Docker build
e8991b2

sidoutcome committed on

feat: v5.0.0 PaddleOCR-VL-1.5 + Gemini hybrid architecture
16b2195

sidoutcome committed on

feat: v4.0.0 — VLM + Gemini 3 Flash hybrid (table pages use Gemini API)
ba23da1

sidoutcome committed on

feat: v3.3.1 - disable table re-prompting, add page number cleanup
c8c1790

sidoutcome committed on

feat: v3.3.0 - table re-prompting, heading normalization, footer cleanup
a0faf3e

sidoutcome committed on

feat: v3.3.0 - increase max_tokens to 32768 for wide tables
a2561ab

sidoutcome committed on

feat: v3.3.0 - heading normalization, footer cleanup, table fixes
2b053ce

sidoutcome committed on

feat: v3.3.0 - DPI 200, post-processing, cross-page dedup
1cca2ec

sidoutcome committed on

fix: total_mem → total_memory attribute fix for startup
253d98a

sidoutcome committed on

feat: v3.2.1 - remove page markers, fix escaped quotes
e54472d

sidoutcome committed on

feat: v3.2.0 - LaTeX→MD conversion, VLM output cleanup, improved prompt, disable thinking
031c76c

sidoutcome committed on

feat: v3.1.0 - DPI 150, parallel rendering, VLM retry, quality fixes
53b94dc

sidoutcome committed on

feat: v3.0.0 VLM-first hybrid architecture — GPU VLM on all pages, Docling TableFormer only on table pages
c67903b

sidoutcome committed on

fix: reduce concurrent VLM workers to 2 to prevent GPU OOM on 30B model
3f46c5e

sidoutcome committed on

perf: concurrent VLM OCR — process pages in parallel via ThreadPoolExecutor
79cc114

sidoutcome committed on

fix: resolve /parse/url for URLs without file extensions (e.g. arxiv)
8832428

sidoutcome committed on