# Lexiq Reader 3B
**Fine-tuned from [Jina AI's ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2)**
## Overview
Lexiq Reader 3B is a specialized 1.5B-parameter language model optimized for converting raw HTML into clean, structured Markdown and JSON. It is fine-tuned from Jina AI's ReaderLM-v2 for improved performance in document-processing pipelines.
## Model Details
- **Base Model**: ReaderLM-v2 (Qwen2.5-1.5B architecture)
- **Parameters**: 1.54B
- **Context Window**: Up to 512K tokens
- **Supported Languages**: 29 languages including English, Chinese, Japanese, Korean, French, Spanish, Portuguese, German, Italian, Russian, Vietnamese, Thai, Arabic
- **License**: CC-BY-NC-4.0
## Key Features
- **HTML to Markdown**: Converts complex HTML with tables, lists, code blocks, and LaTeX
- **HTML to JSON**: Direct extraction using predefined schemas
- **Long Context**: Handles documents up to 512K tokens
- **Multilingual**: Comprehensive support across 29 languages
- **Optimized for Production**: Enhanced stability for long-form content generation
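For the HTML-to-JSON feature, the model expects the target schema inside the prompt. A minimal sketch of building such a prompt is shown below; the exact instruction wording and the `schema` fields are illustrative assumptions, not the model's fixed API:

```python
import json

# Hypothetical schema for illustration; adapt the fields to your documents.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "date": {"type": "string"},
    },
}

def build_json_prompt(html: str, schema: dict) -> str:
    """Build a user message asking the model to extract the fields defined by the schema."""
    return (
        "Extract the specified information from the given HTML and present it "
        "in a structured JSON format.\n"
        f"```html\n{html}\n```\n"
        f"The JSON schema is as follows:\n```json\n{json.dumps(schema, indent=2)}\n```"
    )

html = "<html><body><h1>Hello</h1><p>By Jane Doe</p></body></html>"
print(build_json_prompt(html, schema))
```

Pass the resulting string as the user message through `tokenizer.apply_chat_template`, exactly as in the Quick Start below.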
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # or "cpu"
tokenizer = AutoTokenizer.from_pretrained("remodlai/lexiq-reader-3b")
model = AutoModelForCausalLM.from_pretrained("remodlai/lexiq-reader-3b").to(device)
# Create prompt
html = "<html><body><h1>Hello, world!</h1></body></html>"
messages = [{"role": "user", "content": f"Extract the main content from the given HTML and convert it to Markdown format.\n```html\n{html}\n```"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Generate (greedy decoding; with do_sample=False the temperature setting is ignored)
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False, repetition_penalty=1.08)
# Slice off the prompt tokens so only the generated Markdown is printed
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```
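Real-world pages carry a lot of markup the model does not need. A common pre-processing step (recommended in the base ReaderLM-v2 model card) is to strip scripts, styles, and comments before inference to save context; the regex patterns here are an illustrative sketch, not part of this model's API:

```python
import re

# Illustrative cleanup patterns; extend as needed (e.g. <meta>, base64 images).
SCRIPT_PATTERN = r"<script[\s\S]*?</script>"
STYLE_PATTERN = r"<style[\s\S]*?</style>"
COMMENT_PATTERN = r"<!--[\s\S]*?-->"

def clean_html(html: str) -> str:
    """Remove script/style blocks and HTML comments to shrink the model input."""
    for pattern in (SCRIPT_PATTERN, STYLE_PATTERN, COMMENT_PATTERN):
        html = re.sub(pattern, "", html, flags=re.IGNORECASE)
    return html

print(clean_html("<html><script>x()</script><body><h1>Hi</h1></body></html>"))
# → <html><body><h1>Hi</h1></body></html>
```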
## Fine-tuning Details
This model has been fine-tuned for:
- Enhanced document structure preservation
- Improved handling of technical documentation
- Better extraction of code snippets and API documentation
- Optimized for multimodal RAG pipelines
## Deployment
### Modal
See deployment examples in the `modal/` directory for serverless deployment with auto-scaling.
### vLLM
For high-throughput inference:
```python
from vllm import LLM, SamplingParams

llm = LLM(model="remodlai/lexiq-reader-3b", max_model_len=256000, dtype="float16")
sampling_params = SamplingParams(temperature=0, top_k=1, max_tokens=8192)

# Build `prompt` with the same chat template as in Quick Start, then generate
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
## Hardware Requirements
- **Minimum**: T4 GPU (16GB VRAM)
- **Recommended**: RTX 3090/4090 or A10G for optimal performance
- **Memory Usage**: ~3GB model weights + KV cache
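The "~3GB model weights" figure follows directly from the parameter count at half precision:

```python
# Back-of-envelope check: 1.54B parameters × 2 bytes each (fp16/bf16).
params = 1.54e9
bytes_per_param = 2
weight_gb = params * bytes_per_param / 1e9
print(f"{weight_gb:.1f} GB")  # → 3.1 GB
```

Actual VRAM usage is higher once the KV cache grows with context length, which is why a 16GB T4 is the stated minimum.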
## Credits
This model is based on [ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2) by [Jina AI](https://jina.ai/).
## License
CC-BY-NC-4.0 - Non-commercial use only