---
license: apache-2.0
library_name: transformers
base_model: google/gemma-2b
tags:
- text-generation
- standalone
- merged-weights
- pdf-optimized
- gemma
- vision-guided-training
language:
- en
pipeline_tag: text-generation
---
# 🚀 Solvrays Finetuned Pdf (Standalone Merged Weight)
## 🌟 Overview
This model is a standalone version of **Gemma 2B**, fine-tuned for **complex document understanding and technical metadata extraction**. Unlike a standard PEFT adapter, it ships with **merged weights**, so it loads directly in production pipelines without a separate adapter-loading step.
### 🛠 Key Features
- **Zero-Overhead Inference**: Merged weights load as a native causal LM, with no adapter layers to attach at runtime.
- **Document Intelligence**: Fine-tuned on technical PDF structures, including infrastructure guides and architectural documentation.
- **Vision-Guided Data Pipeline**: Trained on text recovered through a hybrid Digital/OCR pipeline for maximum data fidelity.
- **Optimized Context**: Tailored for high-precision extraction and summary tasks from technical corpora.
## 💻 Quick Start (Inference)
You can deploy this model using standard Hugging Face `transformers` logic.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "singtan/solvrays-finetuned-pdf"

# Merged weights load like any causal LM; no PEFT adapter is required.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

prompt = "Analyze the provided technical documentation and summarize the key infrastructure recommendations."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    # do_sample=True is needed for temperature and top_p to take effect.
    outputs = model.generate(
        **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## 📊 Training Specifications
- **Base Model**: google/gemma-2b
- **Training Strategy**: QLoRA (4-bit quantization) followed by FP16 weight merging.
- **Final Loss Performance**: N/A
- **Learning Rate**: 0.0001
- **Epochs**: 3
- **Hardware**: Optimized for NVIDIA L4/V100/H100 environments.
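
The merge script itself is not part of this card, so the following is only a minimal numerical sketch of what "weight merging" means for a LoRA-style adapter: the low-rank update is folded into the base weight so that a single matrix reproduces the adapter's output. The matrix sizes, `rank`, and `alpha` are hypothetical values chosen for illustration, not the settings used for this model.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes for illustration only; not the real Gemma dimensions.
d_in, d_out, rank, alpha = 16, 8, 4, 8.0

W = torch.randn(d_out, d_in)         # frozen base weight
A = torch.randn(rank, d_in) * 0.01   # LoRA down-projection
B = torch.randn(d_out, rank) * 0.01  # LoRA up-projection (nonzero, as after training)
scale = alpha / rank


def adapter_forward(x):
    # Base path plus the scaled low-rank update, as computed with a separate adapter.
    return x @ W.T + scale * (x @ A.T) @ B.T


# Merging folds the update into a single weight matrix.
W_merged = W + scale * (B @ A)


def merged_forward(x):
    # One matmul, no adapter branch.
    return x @ W_merged.T


x = torch.randn(3, d_in)
max_diff = (adapter_forward(x) - merged_forward(x)).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

The two forward passes agree up to floating-point rounding, which is why a merged checkpoint can be loaded without the adapter. In a QLoRA workflow the base weights are typically dequantized to a higher-precision dtype such as FP16 before the update is added.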
## ⚠️ Limitations & Bias
While optimized for technical documentation, this model remains a generative LLM and may produce hallucinations if the input context is missing or highly ambiguous. It is recommended to use **Retrieval-Augmented Generation (RAG)** or **strict prompting** for mission-critical data extraction.
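
One way to apply the strict-prompting advice is to place retrieved excerpts directly in the prompt and tell the model to answer only from them. The helper below is a hypothetical sketch: `build_grounded_prompt`, the `NOT FOUND` sentinel, and the `max_chars` budget are illustrative assumptions rather than part of this model's training format, and the retrieval step is left out.

```python
def build_grounded_prompt(chunks, question, max_chars=4000):
    """Assemble a prompt that restricts the model to the supplied excerpts."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        # Stop adding excerpts once the character budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer using only the excerpts below. "
        'If the answer is not in the excerpts, reply "NOT FOUND".\n\n'
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    [
        "The cluster requires three control-plane nodes.",
        "Backups run nightly at 02:00 UTC.",
    ],
    "How many control-plane nodes are required?",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a free-form prompt. Numbering the excerpts makes it easier to check which passage an answer was drawn from.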
## 📜 License
This model is released under the **Apache-2.0** license. Usage must also adhere to the Google Gemma Prohibited Use Policy.
---
**Fine-tuned and Merged by Bibek Lama Singtan**