outcomelabs/docling-parser

Commit History

perf: concurrency improvements for high-volume Excel processing

33af535

ibadrehman-outcome committed 6 days ago

fix: Excel tables now output HTML matching Gemini PDF format

87afc64

ibadrehman-outcome committed 6 days ago

feat: add Excel (.xlsx/.xlsm) parsing support via Docling

cf7950b

ibadrehman-outcome committed 6 days ago

fix: update docling gemini parser

c28aa68

Ibad ur Rehman committed 29 days ago

feat: update granite parser runtime

b5db7b1

Ibad ur Rehman committed 29 days ago

fix: refine granite parser flow

718ec13

Ibad ur Rehman committed 29 days ago

feat: switch parser to granite docling

dde2973

Ibad ur Rehman committed 29 days ago

feat: deploy docling first parser

74cacc0

Ibad ur Rehman committed 29 days ago

fix: constrain llama cuda build

47b9ef7

Ibad ur Rehman committed 29 days ago

fix: reduce llama build memory

8e31979

Ibad ur Rehman committed 29 days ago

feat: switch to unsloth gguf runtime

dd23733

Ibad ur Rehman committed 30 days ago

perf: optimize qwen inference path

b586eeb

Ibad ur Rehman committed 30 days ago

feat: switch parser to qwen vl

51c66dc

Ibad ur Rehman committed 30 days ago

fix: restore httpx dependency

b633022

Ibad ur Rehman committed 30 days ago

feat: simplify parser response flow

add910e

Ibad ur Rehman committed 30 days ago

fix: refine diagnostic routing output

98f3a28

Ibad ur Rehman committed 30 days ago

feat: expand parser diagnostics

852a43f

Ibad ur Rehman committed 30 days ago

feat: add parse routing metadata

83ade3b

Ibad ur Rehman committed about 1 month ago

feat: expose page-level parse results

efed02b

Ibad ur Rehman committed about 1 month ago

fix: refine docling page handling

0447006

Ibad ur Rehman committed about 1 month ago

fix: simplify docling routing logic

1b9b603

Ibad ur Rehman committed about 1 month ago

fix: refine docling pipeline behavior

63b2f90

Ibad ur Rehman committed about 1 month ago

feat: configure docling accelerator

5f188d9

Ibad ur Rehman committed about 1 month ago

fix: update docling routing pipeline

08a5278

Ibad ur Rehman committed about 1 month ago

feat: switch to docling first parser

4af0af0

Ibad ur Rehman committed about 1 month ago

fix: support paddle restructure fallback

eb7b6d5

Ibad ur Rehman committed about 1 month ago

feat: deploy paddleocr gemini parser

799f504

Ibad ur Rehman committed about 1 month ago

fix: prevent LaTeX regex from stripping currency dollar signs

e24be4d

sidoutcome committed on Mar 14

fix: table detection, LaTeX cleanup, HTML tables, image filtering

bb28e9c

sidoutcome committed on Mar 14

fix: use PaddlePaddle Docker base image (paddle GPU pre-installed)

63d2d35

sidoutcome committed on Mar 13

fix: add pip timeout 600s + retries for Chinese CDN, use Python 3.12 default

03eea1e

sidoutcome committed on Mar 13

fix: use Python 3.12 (Ubuntu 24.04 default), simplify Dockerfile, combine install steps

3e4135a

sidoutcome committed on Mar 13

fix: remove paddle import verification during build (no CUDA available)

0ef1544

sidoutcome committed on Mar 13

fix: remove GPU-dependent pre-download, use restructure_pages for cross-page tables, robust md extraction

0111393

sidoutcome committed on Mar 13

fix: use CPU mode for model pre-download during Docker build

e8991b2

sidoutcome committed on Mar 13

feat: v5.0.0 PaddleOCR-VL-1.5 + Gemini hybrid architecture

16b2195

sidoutcome committed on Mar 13

feat: v4.0.0 — VLM + Gemini 3 Flash hybrid (table pages use Gemini API)

ba23da1

sidoutcome committed on Mar 13

feat: v3.3.1 - disable table re-prompting, add page number cleanup

c8c1790

sidoutcome committed on Mar 13

feat: v3.3.0 - table re-prompting, heading normalization, footer cleanup

a0faf3e

sidoutcome committed on Mar 13

feat: v3.3.0 - increase max_tokens to 32768 for wide tables

a2561ab

sidoutcome committed on Mar 13

feat: v3.3.0 - heading normalization, footer cleanup, table fixes

2b053ce

sidoutcome committed on Mar 13

feat: v3.3.0 - DPI 200, post-processing, cross-page dedup

1cca2ec

sidoutcome committed on Mar 13

fix: total_mem → total_memory attribute fix for startup

253d98a

sidoutcome committed on Mar 13

feat: v3.2.1 - remove page markers, fix escaped quotes

e54472d

sidoutcome committed on Mar 13

feat: v3.2.0 - LaTeX→MD conversion, VLM output cleanup, improved prompt, disable thinking

031c76c

sidoutcome committed on Mar 13

feat: v3.1.0 - DPI 150, parallel rendering, VLM retry, quality fixes

53b94dc

sidoutcome committed on Mar 13

feat: v3.0.0 VLM-first hybrid architecture — GPU VLM on all pages, Docling TableFormer only on table pages

c67903b

sidoutcome committed on Mar 13

fix: reduce concurrent VLM workers to 2 to prevent GPU OOM on 30B model

3f46c5e

sidoutcome committed on Mar 13

perf: concurrent VLM OCR — process pages in parallel via ThreadPoolExecutor

79cc114

sidoutcome committed on Mar 13

fix: resolve /parse/url for URLs without file extensions (e.g. arxiv)

8832428

sidoutcome committed on Mar 13