GabrielPimenta99 committed on
Commit 3fd38b0 · verified · 1 parent: b8326fb

Update README.md

Files changed (1)
  1. README.md +9 -6
README.md CHANGED
@@ -35,7 +35,7 @@ datasets:
 
 DharmaOCR Lite achieves **state-of-the-art performance** on [DharmaOCR-Benchmark](https://huggingface.co/datasets/Dharma-AI/DharmaOCR-Benchmark), outperforming all evaluated open-source and commercial baselines — including GPT-4o, GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, Google Document AI, Amazon Textract, and olmOCR-2-7B — while being significantly cheaper and faster to run.
 
-For the full methodology, training details, and ablation studies, see our paper: **[DharmaOCR: Specialized Small Language Models for Structured OCR that Outperform Open-Source and Commercial Baselines](link_to_paper)**.
+For the full methodology, training details, and ablation studies, see our paper: **[DharmaOCR: Specialized Small Language Models for Structured OCR that Outperform Open-Source and Commercial Baselines](https://arxiv.org/abs/2604.14314)**.
 
 <p align="center">
 <img src="images/cost_x_score.png" width="1300"/>
@@ -559,11 +559,14 @@ vllm serve dharma-ai/DharmaOCR-Lite \
 ## Citation
 
 ```bibtex
-@article{dharmaocr2026,
-title={DharmaOCR: Specialized Small Language Models for Structured OCR that Outperform Open-Source and Commercial Baselines},
-author={Pimenta de Freitas Cardoso, Gabriel and Chacon, Caio Lucas da Silva and Oliveira, Jonas Felipe da Fonseca and Araujo, Paulo Henrique de Medeiros},
-year={2026},
-journal={arXiv preprint}
+@misc{cardoso2026dharmaocrspecializedsmalllanguage,
+title={DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines},
+author={Gabriel Pimenta de Freitas Cardoso and Caio Lucas da Silva Chacon and Jonas Felipe da Fonseca Oliveira and Paulo Henrique de Medeiros Araujo},
+year={2026},
+eprint={2604.14314},
+archivePrefix={arXiv},
+primaryClass={cs.CV},
+url={https://arxiv.org/abs/2604.14314},
 }
 ```
 