# Lexiq Reader 3B

**Fine-tuned from [Jina AI's ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2)**

## Overview

Lexiq Reader 3B is a 1.5B-parameter language model specialized for converting raw HTML into clean, structured Markdown and JSON. It is fine-tuned from Jina AI's ReaderLM-v2 for enhanced performance in document-processing pipelines.

## Model Details

- **Base Model**: ReaderLM-v2 (Qwen2.5-1.5B architecture)
- **Parameters**: 1.54B
- **Context Window**: Up to 512K tokens
- **Supported Languages**: 29 languages, including English, Chinese, Japanese, Korean, French, Spanish, Portuguese, German, Italian, Russian, Vietnamese, Thai, and Arabic
- **License**: CC-BY-NC-4.0

## Key Features

- **HTML to Markdown**: Converts complex HTML with tables, lists, code blocks, and LaTeX
- **HTML to JSON**: Direct extraction using predefined schemas
- **Long Context**: Handles documents up to 512K tokens
- **Multilingual**: Comprehensive support across 29 languages
- **Optimized for Production**: Enhanced stability for long-form content generation
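For the HTML-to-JSON mode, the target schema travels inside the instruction itself, following the ReaderLM-v2 prompt convention. The `build_json_prompt` helper below is a hypothetical sketch of that format; the exact instruction wording the model was fine-tuned on may differ:

```python
import json

def build_json_prompt(html: str, schema: dict) -> str:
    """Build a ReaderLM-v2-style instruction asking for JSON that matches a schema.

    Hypothetical helper: illustrates the prompt shape, not the exact
    fine-tuning wording.
    """
    return (
        "Extract the specified information from the given HTML and present it "
        "in a structured JSON format.\n"
        f"```html\n{html}\n```\n"
        "The JSON schema is as follows:\n"
        f"```json\n{json.dumps(schema, indent=2)}\n```"
    )

# Example schema for a simple article page
schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "date": {"type": "string"}},
}
prompt = build_json_prompt("<h1>Hello</h1>", schema)
```

The resulting string is passed as the user message in the chat template, exactly like the Markdown instruction in the Quick Start below.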
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
tokenizer = AutoTokenizer.from_pretrained("remodlai/lexiq-reader-3b")
model = AutoModelForCausalLM.from_pretrained("remodlai/lexiq-reader-3b").to(device)

# Build the instruction prompt around the raw HTML
html = "<html><body><h1>Hello, world!</h1></body></html>"
messages = [{"role": "user", "content": f"Extract the main content from the given HTML and convert it to Markdown format.\n```html\n{html}\n```"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Greedy decoding (do_sample=False) with a mild repetition penalty
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False, repetition_penalty=1.08)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
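Raw pages often carry scripts, styles, and comments that waste context window; a lightweight pre-cleaning pass before tokenization can shrink inputs substantially. The patterns below are illustrative (in the spirit of the ReaderLM-v2 usage examples), not an exhaustive cleaner:

```python
import re

# Illustrative patterns; extend for your corpus (e.g. <meta>, <link>, SVGs)
SCRIPT_RE = re.compile(r"<\s*script[^>]*>.*?<\s*/\s*script\s*>", re.IGNORECASE | re.DOTALL)
STYLE_RE = re.compile(r"<\s*style[^>]*>.*?<\s*/\s*style\s*>", re.IGNORECASE | re.DOTALL)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def clean_html(html: str) -> str:
    """Strip non-content markup before sending HTML to the model."""
    for pattern in (SCRIPT_RE, STYLE_RE, COMMENT_RE):
        html = pattern.sub("", html)
    return html

cleaned = clean_html("<script>track()</script><!-- ad --><p>Hello</p>")
# cleaned == "<p>Hello</p>"
```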
## Fine-tuning Details

This model has been fine-tuned for:
- Enhanced document-structure preservation
- Improved handling of technical documentation
- Better extraction of code snippets and API documentation
- Use in multimodal RAG pipelines

## Deployment

### Modal

See the deployment examples in the `modal/` directory for serverless deployment with auto-scaling.

### vLLM

For high-throughput inference:
```python
from vllm import LLM, SamplingParams

llm = LLM(model="remodlai/lexiq-reader-3b", max_model_len=256000, dtype="float16")
sampling_params = SamplingParams(temperature=0, top_k=1, max_tokens=8192)

# llm.chat applies the model's chat template; build messages as in Quick Start
outputs = llm.chat([{"role": "user", "content": prompt}], sampling_params)
print(outputs[0].outputs[0].text)
```
## Hardware Requirements

- **Minimum**: T4 GPU (16 GB VRAM)
- **Recommended**: RTX 3090/4090 or A10G for optimal performance
- **Memory Usage**: ~3 GB for model weights, plus KV cache
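The ~3 GB weight figure follows directly from the parameter count at half precision:

```python
params = 1.54e9          # parameter count
bytes_per_param = 2      # float16 / bfloat16
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.2f} GB")  # 3.08 GB, before KV cache and activations
```

KV-cache memory grows with sequence length, so long inputs near the 512K-token limit will need considerably more VRAM than the weights alone.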
## Credits

This model is based on [ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2) by [Jina AI](https://jina.ai/).

## License

CC-BY-NC-4.0 (non-commercial use only)