# Lexiq Reader 3B
**Fine-tuned from [Jina AI's ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2)**
## Overview
Lexiq Reader 3B is a specialized 1.5B-parameter language model optimized for converting raw HTML into clean, structured Markdown and JSON. It is fine-tuned from Jina AI's ReaderLM-v2 for improved performance in document-processing pipelines.
## Model Details
- **Base Model**: ReaderLM-v2 (Qwen2.5-1.5B architecture)
- **Parameters**: 1.54B
- **Context Window**: Up to 512K tokens
- **Supported Languages**: 29 languages including English, Chinese, Japanese, Korean, French, Spanish, Portuguese, German, Italian, Russian, Vietnamese, Thai, Arabic
- **License**: CC-BY-NC-4.0
## Key Features
- **HTML to Markdown**: Converts complex HTML with tables, lists, code blocks, and LaTeX
- **HTML to JSON**: Direct extraction using predefined schemas
- **Long Context**: Handles documents up to 512K tokens
- **Multilingual**: Comprehensive support across 29 languages
- **Optimized for Production**: Enhanced stability for long-form content generation
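For the HTML-to-JSON feature, the model expects the target schema inside the prompt. A minimal sketch of building such a prompt is shown below; the exact instruction wording and the `schema` fields are illustrative assumptions, not the model's fixed API:

```python
import json

# Hypothetical schema for illustration; adapt the fields to your documents.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "date": {"type": "string"},
    },
}

def build_json_prompt(html: str, schema: dict) -> str:
    """Build a user message asking the model to extract the fields defined by the schema."""
    return (
        "Extract the specified information from the given HTML and present it "
        "in a structured JSON format.\n"
        f"```html\n{html}\n```\n"
        f"The JSON schema is as follows:\n```json\n{json.dumps(schema, indent=2)}\n```"
    )

html = "<html><body><h1>Hello</h1><p>By Jane Doe</p></body></html>"
print(build_json_prompt(html, schema))
```

Pass the resulting string as the user message through `tokenizer.apply_chat_template`, exactly as in the Quick Start below.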
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # or "cpu"
tokenizer = AutoTokenizer.from_pretrained("remodlai/lexiq-reader-3b")
model = AutoModelForCausalLM.from_pretrained("remodlai/lexiq-reader-3b").to(device)
# Create prompt
html = "<html><body><h1>Hello, world!</h1></body></html>"
messages = [{"role": "user", "content": f"Extract the main content from the given HTML and convert it to Markdown format.\n```html\n{html}\n```"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Generate (greedy decoding; with do_sample=False the temperature setting is ignored)
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False, repetition_penalty=1.08)
# Slice off the prompt tokens so only the generated Markdown is printed
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```
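Real-world pages carry a lot of markup the model does not need. A common pre-processing step (recommended in the base ReaderLM-v2 model card) is to strip scripts, styles, and comments before inference to save context; the regex patterns here are an illustrative sketch, not part of this model's API:

```python
import re

# Illustrative cleanup patterns; extend as needed (e.g. <meta>, base64 images).
SCRIPT_PATTERN = r"<script[\s\S]*?</script>"
STYLE_PATTERN = r"<style[\s\S]*?</style>"
COMMENT_PATTERN = r"<!--[\s\S]*?-->"

def clean_html(html: str) -> str:
    """Remove script/style blocks and HTML comments to shrink the model input."""
    for pattern in (SCRIPT_PATTERN, STYLE_PATTERN, COMMENT_PATTERN):
        html = re.sub(pattern, "", html, flags=re.IGNORECASE)
    return html

print(clean_html("<html><script>x()</script><body><h1>Hi</h1></body></html>"))
# → <html><body><h1>Hi</h1></body></html>
```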
## Fine-tuning Details
This model has been fine-tuned for:
- Enhanced document structure preservation
- Improved handling of technical documentation
- Better extraction of code snippets and API documentation
- Optimized for multimodal RAG pipelines
## Deployment
### Modal
See deployment examples in the `modal/` directory for serverless deployment with auto-scaling.
### vLLM
For high-throughput inference:
```python
from vllm import LLM, SamplingParams

llm = LLM(model="remodlai/lexiq-reader-3b", max_model_len=256000, dtype="float16")
sampling_params = SamplingParams(temperature=0, top_k=1, max_tokens=8192)

# Build `prompt` with the same chat template as in Quick Start, then generate
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
## Hardware Requirements
- **Minimum**: T4 GPU (16GB VRAM)
- **Recommended**: RTX 3090/4090 or A10G for optimal performance
- **Memory Usage**: ~3GB model weights + KV cache
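The "~3GB model weights" figure follows directly from the parameter count at half precision:

```python
# Back-of-envelope check: 1.54B parameters × 2 bytes each (fp16/bf16).
params = 1.54e9
bytes_per_param = 2
weight_gb = params * bytes_per_param / 1e9
print(f"{weight_gb:.1f} GB")  # → 3.1 GB
```

Actual VRAM usage is higher once the KV cache grows with context length, which is why a 16GB T4 is the stated minimum.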
## Credits
This model is based on [ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2) by [Jina AI](https://jina.ai/).
## License
CC-BY-NC-4.0 - Non-commercial use only