Instructions to use prithivMLmods/LatexMind-2B-Codec-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/LatexMind-2B-Codec-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("prithivMLmods/LatexMind-2B-Codec-GGUF", dtype="auto")
- llama-cpp-python
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="prithivMLmods/LatexMind-2B-Codec-GGUF",
    filename="latexmind-2b-codec-q4_k_m.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
            ],
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Use Docker
docker model run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/LatexMind-2B-Codec-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/LatexMind-2B-Codec-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- SGLang
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/LatexMind-2B-Codec-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/LatexMind-2B-Codec-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "prithivMLmods/LatexMind-2B-Codec-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/LatexMind-2B-Codec-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
- Ollama
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Ollama:
ollama run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- Unsloth Studio
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/LatexMind-2B-Codec-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/LatexMind-2B-Codec-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for prithivMLmods/LatexMind-2B-Codec-GGUF to start chatting
- Docker Model Runner
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Docker Model Runner:
docker model run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- Lemonade
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.LatexMind-2B-Codec-GGUF-Q4_K_M
List all available models
lemonade list
LatexMind-2B-Codec-GGUF
The LatexMind-2B-Codec-GGUF model is a fine-tuned version of Qwen2-VL-2B-Instruct, optimized for Optical Character Recognition (OCR), image-to-text conversion, and mathematical expression extraction with LaTeX formatting. This model integrates a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.
Key Enhancements:
SoTA understanding of images with various resolutions & aspect ratios: LatexMind-2B-Codec-GGUF achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
Advanced LaTeX extraction: The model specializes in extracting structured mathematical expressions from images and documents, converting them into LaTeX format for precise rendering and further computation.
Understanding long-duration videos (20min+): LatexMind-2B-Codec-GGUF can process videos over 20 minutes long, enabling high-quality video-based question answering, mathematical solution explanation, and educational content creation.
Agent capabilities for automated operations: With complex reasoning and decision-making abilities, the model can be integrated with mobile devices, robots, and assistive technologies to automate tasks based on visual and textual inputs.
Multilingual Support: To serve global users, in addition to English and Chinese, the model supports text recognition inside images across multiple languages, including European languages, Japanese, Korean, Arabic, Vietnamese, etc.
This model is particularly effective in retrieving mathematical notations and equations from scanned documents, whiteboard images, and handwritten notes, ensuring accurate conversion to LaTeX code for further academic and computational applications.
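Requests to the model use the standard Qwen2-VL chat-message format. As a minimal sketch of how a LaTeX-extraction request could be assembled (the helper name, prompt wording, and image URL here are illustrative, not from this card), such a payload might be built like this:

```python
def build_latex_extraction_messages(image_url: str) -> list:
    """Build a Qwen2-VL-style chat message asking for LaTeX-only output.

    The prompt text is an illustrative assumption; tune it for your documents.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {
                    "type": "text",
                    "text": "Extract every mathematical expression in this image as LaTeX.",
                },
            ],
        }
    ]

messages = build_latex_extraction_messages("https://example.com/equation.png")
```

The resulting `messages` list can then be passed to `processor.apply_chat_template(...)` exactly as in the Transformers sample later in this card.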
Sample Inference with Doc
Use it with Transformers
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
# default: Load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
"prithivMLmods/LatexMind-2B-Codec", torch_dtype="auto", device_map="auto"
)
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
# model = Qwen2VLForConditionalGeneration.from_pretrained(
# "prithivMLmods/LatexMind-2B-Codec-GGUF",
# torch_dtype=torch.bfloat16,
# attn_implementation="flash_attention_2",
# device_map="auto",
# )
# default processor
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen2-VL-OCR-2B-Instruct")
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
},
{"type": "text", "text": "Describe this image."},
],
}
]
# Preparation for inference
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
videos=video_inputs,
padding=True,
return_tensors="pt",
)
inputs = inputs.to("cuda")
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
Buffered streaming output (assumes a transformers TextIteratorStreamer instance named streamer, e.g. inside a generator feeding a chat UI):
buffer = ""
for new_text in streamer:
    buffer += new_text
    # Remove end-of-turn tokens such as <|im_end|> from the output
    buffer = buffer.replace("<|im_end|>", "")
    yield buffer
Intended Use
LatexMind-2B-Codec-GGUF is designed for tasks that require image-based text recognition, math equation extraction, and multi-modal understanding. It is particularly useful in the following scenarios:
Optical Character Recognition (OCR) – Extracting printed and handwritten text from images, documents, and scanned pages.
Math Expression Recognition – Converting mathematical notation into structured LaTeX for further computation and documentation.
Image-to-Text Conversion – Generating accurate descriptions for text-rich and math-heavy images.
Document and Academic Processing – Assisting researchers, students, and professionals in digitizing handwritten notes and extracting structured content from books, PDFs, and whiteboards.
Automated Educational Support – Enabling AI-powered tutors, content summarization, and interactive learning for subjects involving complex equations.
Multi-Language OCR – Recognizing text inside images across multiple languages, including English, Chinese, Japanese, Korean, Arabic, and various European languages.
Video-Based Question Answering – Understanding long-duration videos for content summarization, question answering, and structured data extraction.
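Raw OCR output, particularly multilingual text, often mixes ligatures, full-width characters, and non-breaking spaces. A small post-processing step (a sketch, not part of the model; the function name is ours) can normalize such output before downstream use:

```python
import unicodedata


def normalize_ocr_text(text: str) -> str:
    """Unify Unicode representations (NFKC) and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


# NFKC expands the "fi" ligature and maps the full-width "Ａ" to plain "A"
print(normalize_ocr_text("ﬁeld\u00a0 Ａ"))  # field A
```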
Limitations
Despite its capabilities, LatexMind-2B-Codec-GGUF has some inherent limitations:
Handwritten Text Accuracy – While it can recognize handwritten equations, performance may degrade with highly unstructured or messy handwriting.
Complex LaTeX Formatting – The model may struggle with deeply nested or ambiguous LaTeX expressions, requiring manual corrections for precise formatting.
Low-Resolution Images – Extracting text from blurry or low-resolution images can lead to misinterpretations or OCR errors.
Contextual Understanding in Multi-Step Equations – While it recognizes math expressions, its ability to solve multi-step problems autonomously is limited.
Limited Support for Rare Mathematical Notations – Some specialized or domain-specific symbols may not be recognized with high accuracy.
Processing Speed for Large Documents – Performance may slow down when handling extremely large documents or dense mathematical content in real-time applications.
Language-Specific OCR Variability – While it supports multiple languages, OCR accuracy may vary depending on script complexity and font style.
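Because deeply nested expressions may come back malformed, a lightweight sanity check can flag output that needs manual correction. This is a rough sketch under the assumption that expressions use plain { } grouping and \left/\right pairs (escaped braces like \{ are not handled):

```python
def latex_delimiters_balanced(expr: str) -> bool:
    """Return True if { } braces nest correctly and \\left/\\right counts match."""
    depth = 0
    for ch in expr:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace before any opening brace
                return False
    return depth == 0 and expr.count(r"\left") == expr.count(r"\right")


print(latex_delimiters_balanced(r"\frac{a}{b}"))                 # True
print(latex_delimiters_balanced(r"\left( \frac{a}{b} \right)"))  # True
print(latex_delimiters_balanced(r"\frac{a}{b"))                  # False
```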
Quantized versions available: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit.
Model tree for prithivMLmods/LatexMind-2B-Codec-GGUF
Base model
Qwen/Qwen2-VL-2B
