Spaces: outcomelabs/docling-parser (running on T4)

docling-parser / app.py

Commit History

perf: concurrency improvements for high-volume Excel processing

33af535

Running

ibadrehman-outcome committed 6 days ago

fix: Excel tables now output HTML matching Gemini PDF format

87afc64

ibadrehman-outcome committed 6 days ago

feat: add Excel (.xlsx/.xlsm) parsing support via Docling

cf7950b

ibadrehman-outcome committed 6 days ago

fix: update docling gemini parser

c28aa68

Ibad ur Rehman committed 29 days ago

feat: update granite parser runtime

b5db7b1

Ibad ur Rehman committed 29 days ago

feat: switch parser to granite docling

dde2973

Ibad ur Rehman committed 29 days ago

feat: deploy docling first parser

74cacc0

Ibad ur Rehman committed 30 days ago

feat: switch to unsloth gguf runtime

dd23733

Ibad ur Rehman committed 30 days ago

perf: optimize qwen inference path

b586eeb

Ibad ur Rehman committed 30 days ago

feat: switch parser to qwen vl

51c66dc

Ibad ur Rehman committed 30 days ago

feat: simplify parser response flow

add910e

Ibad ur Rehman committed 30 days ago

feat: expand parser diagnostics

852a43f

Ibad ur Rehman committed 30 days ago

feat: expose page-level parse results

efed02b

Ibad ur Rehman committed about 1 month ago

feat: configure docling accelerator

5f188d9

Ibad ur Rehman committed about 1 month ago

feat: switch to docling first parser

4af0af0

Ibad ur Rehman committed about 1 month ago

feat: deploy paddleocr gemini parser

799f504

Ibad ur Rehman committed about 1 month ago

feat: v5.0.0 PaddleOCR-VL-1.5 + Gemini hybrid architecture

16b2195

sidoutcome committed on Mar 13

feat: v4.0.0 — VLM + Gemini 3 Flash hybrid (table pages use Gemini API)

ba23da1

sidoutcome committed on Mar 13

feat: v3.3.1 - disable table re-prompting, add page number cleanup

c8c1790

sidoutcome committed on Mar 13

feat: v3.3.0 - table re-prompting, heading normalization, footer cleanup

a0faf3e

sidoutcome committed on Mar 13

feat: v3.3.0 - increase max_tokens to 32768 for wide tables

a2561ab

sidoutcome committed on Mar 13

feat: v3.3.0 - heading normalization, footer cleanup, table fixes

2b053ce

sidoutcome committed on Mar 13

feat: v3.3.0 - DPI 200, post-processing, cross-page dedup

1cca2ec

sidoutcome committed on Mar 13

fix: total_mem → total_memory attribute fix for startup

253d98a

sidoutcome committed on Mar 13

feat: v3.2.1 - remove page markers, fix escaped quotes

e54472d

sidoutcome committed on Mar 13

feat: v3.2.0 - LaTeX→MD conversion, VLM output cleanup, improved prompt, disable thinking

031c76c

sidoutcome committed on Mar 13

feat: v3.1.0 - DPI 150, parallel rendering, VLM retry, quality fixes

53b94dc

sidoutcome committed on Mar 13

feat: v3.0.0 VLM-first hybrid architecture — GPU VLM on all pages, Docling TableFormer only on table pages

c67903b

sidoutcome committed on Mar 13

fix: reduce concurrent VLM workers to 2 to prevent GPU OOM on 30B model

3f46c5e

sidoutcome committed on Mar 13

perf: concurrent VLM OCR — process pages in parallel via ThreadPoolExecutor

79cc114

sidoutcome committed on Mar 13

fix: resolve /parse/url for URLs without file extensions (e.g. arxiv)

8832428

sidoutcome committed on Mar 13

feat: increase VLM max_tokens to 16384

b25fd10

sidoutcome committed on Mar 13

fix: increase max-model-len to 65536 for VLM image tokens, improve error logging

9385fa0

sidoutcome committed on Mar 13

fix: reduce max_tokens to 4096, remove invalid skip_special_tokens, add error body logging

7f8ad4a

sidoutcome committed on Mar 13

fix: total_mem -> total_memory, clean up debug CMD and start.sh

dead0a0

sidoutcome committed on Mar 12

feat: upgrade to Qwen3-VL-30B-A3B, simplify auth, fix redirects

922ba62

sidoutcome committed on Mar 12

feat: hybrid VLM parser with Qwen3-VL-8B via vLLM (v2.0.0)

8c4351b

sidoutcome committed on Mar 12

feat: support both API_TOKEN and API_DEV_TOKEN

4848ba0

sidoutcome committed on Feb 5

Initial commit: Docling Parser API

5052def

sidoutcome committed on Feb 5
