Instructions for using singtan/solvrays-finetuned-pdf with libraries and local apps.

Libraries

How to use singtan/solvrays-finetuned-pdf with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="singtan/solvrays-finetuned-pdf")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("singtan/solvrays-finetuned-pdf")
model = AutoModelForCausalLM.from_pretrained("singtan/solvrays-finetuned-pdf")

Local Apps

vLLM

How to use singtan/solvrays-finetuned-pdf with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "singtan/solvrays-finetuned-pdf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "singtan/solvrays-finetuned-pdf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
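
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`. The helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "singtan/solvrays-finetuned-pdf"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the generated text. Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 in the same way.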

Use Docker

docker model run hf.co/singtan/solvrays-finetuned-pdf

SGLang

How to use singtan/solvrays-finetuned-pdf with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "singtan/solvrays-finetuned-pdf" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "singtan/solvrays-finetuned-pdf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "singtan/solvrays-finetuned-pdf" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "singtan/solvrays-finetuned-pdf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'



	---
	license: apache-2.0
	library_name: transformers
	base_model: google/gemma-2b
	tags:
	- text-generation
	- standalone
	- merged-weights
	- pdf-optimized
	- gemma
	- vision-guided-training
	language:
	- en
	pipeline_tag: text-generation
	---

	# 🚀 Solvrays Finetuned PDF (Standalone Merged Weights)

	## 🌟 Overview
	This model is a standalone version of Gemma 2B, fine-tuned for document understanding and technical metadata extraction. Unlike a standard PEFT adapter, it ships with the adapter weights already merged into the base model, so it loads as a single checkpoint with no separate adapter-loading step.

	### 🛠 Key Features
	- Zero-Overhead Inference: Merged weights allow loading as a native CausalLM.
	- Document Intelligence: Fine-tuned on technical PDF structures, including infrastructure guides and architectural documentation.
	- Vision-Guided Data Pipeline: Trained on text recovered through a hybrid Digital/OCR pipeline for maximum data fidelity.
	- Optimized Context: Tailored for high-precision extraction and summary tasks from technical corpora.

	## 💻 Quick Start (Inference)
	You can load and run this model with the standard Hugging Face `transformers` API.

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "singtan/solvrays-finetuned-pdf"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",          # requires the `accelerate` package
	    torch_dtype=torch.float16,  # matches the FP16 merged weights
	    trust_remote_code=True,     # kept from the original; Gemma is supported natively
	)

	prompt = "Analyze the provided technical documentation and summarize the key infrastructure recommendations."
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	# temperature and top_p only take effect when sampling is enabled.
	with torch.no_grad():
	    outputs = model.generate(
	        **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
	    )

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	## 📊 Training Specifications
	- Base Model: google/gemma-2b
	- Training Strategy: QLoRA (4-bit quantization) followed by FP16 weight merging.
	- Final Loss Performance: N/A
	- Learning Rate: 0.0001
	- Epochs: 3
	- Hardware: Optimized for NVIDIA L4/V100/H100 environments.
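
Merging folds each low-rank LoRA update into its base weight matrix, so that `W_merged = W + (alpha / r) * (B @ A)`. The sketch below illustrates that arithmetic in pure Python on toy matrices. The matrix values, `alpha`, and `rank` are made up for illustration, and the function names are hypothetical; the card does not describe the tooling actually used for the merge.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape (2, rank)
A = [[0.5, -0.5]]    # shape (rank, 2)

merged = merge_lora(W, B, A, alpha=2, rank=1)
print(merged)  # [[2.0, -1.0], [2.0, -1.0]]
```

After this step the adapter matrices are no longer needed at inference time, which is why the merged checkpoint loads as an ordinary causal language model.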

	## ⚠️ Limitations & Bias
	While optimized for technical documentation, this model remains a generative LLM and may produce hallucinations if the input context is missing or highly ambiguous. It is recommended to use Retrieval-Augmented Generation (RAG) or strict prompting for mission-critical data extraction.
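
One way to apply the strict-prompting advice is to place the retrieved passages directly in the prompt and instruct the model to answer only from them. The following is a minimal sketch; `build_grounded_prompt` and the template wording are hypothetical and do not reflect the format the model was trained on.

```python
def build_grounded_prompt(question, passages):
    """Assemble a prompt that restricts the answer to the supplied passages."""
    if not passages:
        raise ValueError("at least one context passage is required")
    # Number each passage so an answer can be traced back to its source.
    context = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, reply 'Not found'.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "Which port does the service listen on?",
    ["The API gateway listens on port 8443.", "Logs rotate daily."],
)
print(prompt)
```

The resulting string can be passed to the tokenizer in the Quick Start snippet in place of the free-form prompt.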

	## 📜 License
	This model follows the Apache-2.0 license. Usage must adhere to the Google Gemma Prohibited Use Policy.

	---
	Fine-tuned and Merged by Bibek Lama Singtan