Instructions to use prithivMLmods/LatexMind-2B-Codec-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/LatexMind-2B-Codec-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("prithivMLmods/LatexMind-2B-Codec-GGUF", dtype="auto")
- llama-cpp-python
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="prithivMLmods/LatexMind-2B-Codec-GGUF",
    filename="latexmind-2b-codec-q4_k_m.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
            ],
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Use Docker
docker model run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/LatexMind-2B-Codec-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/LatexMind-2B-Codec-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- SGLang
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/LatexMind-2B-Codec-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/LatexMind-2B-Codec-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "prithivMLmods/LatexMind-2B-Codec-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/LatexMind-2B-Codec-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
- Ollama
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Ollama:
ollama run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- Unsloth Studio
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/LatexMind-2B-Codec-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/LatexMind-2B-Codec-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for prithivMLmods/LatexMind-2B-Codec-GGUF to start chatting
- Docker Model Runner
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Docker Model Runner:
docker model run hf.co/prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
- Lemonade
How to use prithivMLmods/LatexMind-2B-Codec-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull prithivMLmods/LatexMind-2B-Codec-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.LatexMind-2B-Codec-GGUF-Q4_K_M
List all available models
lemonade list
LatexMind-2B-Codec-GGUF
The LatexMind-2B-Codec-GGUF model is a fine-tuned version of Qwen2-VL-2B-Instruct, optimized for Optical Character Recognition (OCR), image-to-text conversion, and mathematical expression extraction with LaTeX formatting. This model integrates a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.
Key Enhancements:
SoTA understanding of images with various resolutions & aspect ratios: LatexMind-2B-Codec-GGUF achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
Advanced LaTeX extraction: The model specializes in extracting structured mathematical expressions from images and documents, converting them into LaTeX format for precise rendering and further computation.
Understanding long-duration videos (20min+): LatexMind-2B-Codec-GGUF can process videos over 20 minutes long, enabling high-quality video-based question answering, mathematical solution explanation, and educational content creation.
Agent capabilities for automated operations: With complex reasoning and decision-making abilities, the model can be integrated with mobile devices, robots, and assistive technologies to automate tasks based on visual and textual inputs.
Multilingual Support: To serve global users, in addition to English and Chinese, the model supports text recognition inside images across multiple languages, including European languages, Japanese, Korean, Arabic, Vietnamese, etc.
This model is particularly effective in retrieving mathematical notations and equations from scanned documents, whiteboard images, and handwritten notes, ensuring accurate conversion to LaTeX code for further academic and computational applications.
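Requests to the model use the standard Qwen2-VL chat-message format. As a minimal sketch of how a LaTeX-extraction request could be assembled (the helper name, prompt wording, and image URL here are illustrative, not from this card), such a payload might be built like this:

```python
def build_latex_extraction_messages(image_url: str) -> list:
    """Build a Qwen2-VL-style chat message asking for LaTeX-only output.

    The prompt text is an illustrative assumption; tune it for your documents.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {
                    "type": "text",
                    "text": "Extract every mathematical expression in this image as LaTeX.",
                },
            ],
        }
    ]

messages = build_latex_extraction_messages("https://example.com/equation.png")
```

The resulting `messages` list can then be passed to `processor.apply_chat_template(...)` exactly as in the Transformers sample later in this card.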
Sample Inference with Doc
Use it with Transformers
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
# default: Load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
"prithivMLmods/LatexMind-2B-Codec", torch_dtype="auto", device_map="auto"
)
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
# model = Qwen2VLForConditionalGeneration.from_pretrained(
# "prithivMLmods/LatexMind-2B-Codec-GGUF",
# torch_dtype=torch.bfloat16,
# attn_implementation="flash_attention_2",
# device_map="auto",
# )
# default processor
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen2-VL-OCR-2B-Instruct")
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
},
{"type": "text", "text": "Describe this image."},
],
}
]
# Preparation for inference
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
videos=video_inputs,
padding=True,
return_tensors="pt",
)
inputs = inputs.to("cuda")
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
Buffered streaming output (assumes a transformers TextIteratorStreamer instance named streamer, e.g. inside a generator feeding a chat UI):
buffer = ""
for new_text in streamer:
    buffer += new_text
    # Remove end-of-turn tokens such as <|im_end|> from the output
    buffer = buffer.replace("<|im_end|>", "")
    yield buffer
Intended Use
LatexMind-2B-Codec-GGUF is designed for tasks that require image-based text recognition, math equation extraction, and multi-modal understanding. It is particularly useful in the following scenarios:
Optical Character Recognition (OCR) – Extracting printed and handwritten text from images, documents, and scanned pages.
Math Expression Recognition – Converting mathematical notation into structured LaTeX for further computation and documentation.
Image-to-Text Conversion – Generating accurate descriptions for text-rich and math-heavy images.
Document and Academic Processing – Assisting researchers, students, and professionals in digitizing handwritten notes and extracting structured content from books, PDFs, and whiteboards.
Automated Educational Support – Enabling AI-powered tutors, content summarization, and interactive learning for subjects involving complex equations.
Multi-Language OCR – Recognizing text inside images across multiple languages, including English, Chinese, Japanese, Korean, Arabic, and various European languages.
Video-Based Question Answering – Understanding long-duration videos for content summarization, question answering, and structured data extraction.
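Raw OCR output, particularly multilingual text, often mixes ligatures, full-width characters, and non-breaking spaces. A small post-processing step (a sketch, not part of the model; the function name is ours) can normalize such output before downstream use:

```python
import unicodedata


def normalize_ocr_text(text: str) -> str:
    """Unify Unicode representations (NFKC) and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


# NFKC expands the "fi" ligature and maps the full-width "Ａ" to plain "A"
print(normalize_ocr_text("ﬁeld\u00a0 Ａ"))  # field A
```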
Limitations
Despite its capabilities, LatexMind-2B-Codec-GGUF has some inherent limitations:
Handwritten Text Accuracy – While it can recognize handwritten equations, performance may degrade with highly unstructured or messy handwriting.
Complex LaTeX Formatting – The model may struggle with deeply nested or ambiguous LaTeX expressions, requiring manual corrections for precise formatting.
Low-Resolution Images – Extracting text from blurry or low-resolution images can lead to misinterpretations or OCR errors.
Contextual Understanding in Multi-Step Equations – While it recognizes math expressions, its ability to solve multi-step problems autonomously is limited.
Limited Support for Rare Mathematical Notations – Some specialized or domain-specific symbols may not be recognized with high accuracy.
Processing Speed for Large Documents – Performance may slow down when handling extremely large documents or dense mathematical content in real-time applications.
Language-Specific OCR Variability – While it supports multiple languages, OCR accuracy may vary depending on script complexity and font style.
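Because deeply nested expressions may come back malformed, a lightweight sanity check can flag output that needs manual correction. This is a rough sketch under the assumption that expressions use plain { } grouping and \left/\right pairs (escaped braces like \{ are not handled):

```python
def latex_delimiters_balanced(expr: str) -> bool:
    """Return True if { } braces nest correctly and \\left/\\right counts match."""
    depth = 0
    for ch in expr:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace before any opening brace
                return False
    return depth == 0 and expr.count(r"\left") == expr.count(r"\right")


print(latex_delimiters_balanced(r"\frac{a}{b}"))                 # True
print(latex_delimiters_balanced(r"\left( \frac{a}{b} \right)"))  # True
print(latex_delimiters_balanced(r"\frac{a}{b"))                  # False
```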
Quantized versions available: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit.
Model tree for prithivMLmods/LatexMind-2B-Codec-GGUF
Base model
Qwen/Qwen2-VL-2B
