language:
- en
---

# Precis-Granite: Privacy-Focused Document Summarization

## Model Overview

**Precis-Granite** is a document summarization model fine-tuned from IBM's Granite 4.0-H-Micro (3.2B parameters) with LoRA adapters. Built for the [DocuClean](https://github.com/yourusername/docuclean) platform, it produces ~300-word summaries that keep enough detail to answer questions about the source document, and it runs entirely on local, on-premise hardware.

**Key Features:**

- 🔒 **Privacy-First**: Process sensitive documents without sending them to an external service
- ⚡ **Fast**: ~0.5 s inference time (5-10x faster than cloud APIs)
- 💰 **Cost-Effective**: No per-document API fees
- 📚 **Long Context**: 128K tokens ≈ 320-380 book pages
- 🎯 **Specialized**: Trained on 5,500+ document-summary pairs (millions of tokens in total)
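
The page estimate above can be reproduced with two rules of thumb. Both ratios are assumptions, not measurements of this model's tokenizer: roughly 0.75 English words per token, and 250-300 words per book page. A minimal sketch of the arithmetic:

```python
def tokens_to_pages(tokens: int, words_per_token: float = 0.75,
                    words_per_page: int = 300) -> float:
    """Rough page count for a token budget (heuristic ratios, not tokenizer-exact)."""
    words = tokens * words_per_token
    return words / words_per_page


context_tokens = 128_000  # 128K context, taken here as 128,000 tokens
dense = tokens_to_pages(context_tokens, words_per_page=300)   # text-heavy pages
sparse = tokens_to_pages(context_tokens, words_per_page=250)  # lighter pages
print(f"{dense:.0f}-{sparse:.0f} pages")  # prints 320-384 pages
```

This lands at roughly 320-384 pages; real counts depend on the tokenizer, the language, and the layout of the text.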

## 🚀 Quick Start

### Using with Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/granite-4.0-h-micro",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapters
model = PeftModel.from_pretrained(base_model, "coolAI/precis-granite")
tokenizer = AutoTokenizer.from_pretrained("coolAI/precis-granite")

# Build the prompt with the model's chat template
document = """Your long document here..."""
messages = [
    {"role": "user", "content": f"Summarize the following document in around 300 words:\n\n{document}"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate the summary
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.3,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens (skip the echoed prompt)
summary = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(summary)
```
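
The prompt above is a single user turn with a fixed instruction, and the same instruction string appears in the Unsloth and vLLM snippets below. If you call the model from several places, a small helper keeps that prompt identical everywhere. `build_messages` and `word_count` are hypothetical names, and the 300-word default mirrors the examples rather than a documented requirement of the fine-tune:

```python
def build_messages(document: str, target_words: int = 300) -> list[dict[str, str]]:
    """Hypothetical helper: the single-turn chat used in the Quick Start examples."""
    prompt = f"Summarize the following document in around {target_words} words:\n\n{document}"
    return [{"role": "user", "content": prompt}]


def word_count(text: str) -> int:
    """Whitespace-based word count, for checking a summary against the target length."""
    return len(text.split())


messages = build_messages("Your long document here...")
print(messages[0]["content"])
```

`build_messages(document)` returns a list that can be passed to `tokenizer.apply_chat_template` in place of `messages`, and `word_count(summary)` is a quick check that the output lands near the requested length.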

### Using with Unsloth (Recommended)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="coolAI/precis-granite",
    max_seq_length=2048,  # raise this for longer documents (the base model supports up to 128K tokens)
    load_in_4bit=True,    # 4-bit quantization for lower memory usage
)
FastLanguageModel.for_inference(model)  # switch Unsloth into inference mode

document = """Your long document here..."""
messages = [
    {"role": "user", "content": f"Summarize the following document in around 300 words:\n\n{document}"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# do_sample=True is required for temperature and top_p to take effect
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.3, top_p=0.9, do_sample=True)

# Decode only the newly generated tokens
summary = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(summary)
```
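
`load_in_4bit=True` above mainly reduces weight memory. A back-of-the-envelope sketch, assuming 2 bytes per parameter for 16-bit weights and 0.5 bytes for 4-bit, and ignoring quantization overhead, the cache, activations, and the small LoRA adapter:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 3.2e9  # parameter count quoted in this card

for label, bits in [("16-bit (fp16/bf16)", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

That is roughly 6.4 GB versus 1.6 GB for the weights alone; actual usage is higher, especially with long inputs.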

### Using with vLLM (Production)

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Initialize vLLM with the base model and LoRA support enabled
llm = LLM(
    model="unsloth/granite-4.0-h-micro",
    enable_lora=True,
    max_lora_rank=32,  # must be >= the rank of the precis-granite adapter
    gpu_memory_utilization=0.9,
)

# Register the adapter: (name, unique integer ID, local path or Hub repo ID)
lora_request = LoRARequest("precis-granite", 1, "coolAI/precis-granite")

# Sampling parameters
sampling_params = SamplingParams(
    temperature=0.3,
    top_p=0.9,
    max_tokens=512,
)

# llm.chat applies the model's chat template, like the Transformers and Unsloth examples
document = """Your long document here..."""
messages = [
    {"role": "user", "content": f"Summarize the following document in around 300 words:\n\n{document}"}
]
outputs = llm.chat(messages, sampling_params, lora_request=lora_request)
print(outputs[0].outputs[0].text)
```
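
vLLM rejects an adapter whose LoRA rank exceeds `max_lora_rank`, and the value 32 above is an assumption about this adapter. PEFT stores the rank as `r` in the adapter's `adapter_config.json`, so a pre-flight check could look like the sketch below. `check_lora_rank` is a hypothetical helper, and it expects the adapter to be downloaded locally first:

```python
import json
from pathlib import Path


def check_lora_rank(adapter_dir: str, max_lora_rank: int = 32) -> int:
    """Read the LoRA rank `r` from a PEFT adapter_config.json and compare it to vLLM's limit."""
    config = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    rank = int(config["r"])
    if rank > max_lora_rank:
        raise ValueError(
            f"Adapter rank {rank} exceeds max_lora_rank={max_lora_rank}; "
            "raise max_lora_rank when constructing vllm.LLM"
        )
    return rank


# Example usage (requires network access):
# from huggingface_hub import snapshot_download
# rank = check_lora_rank(snapshot_download("coolAI/precis-granite"))
```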
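
---

## 📊 Training Details

### Base Model

- **Architecture**: IBM Granite 4.0-H-Micro (`unsloth/granite-4.0-h-micro`)
- **Parameters**: 3.2B (38.4M trainable via LoRA)
- **Context Length**: 128K tokens
- **License**: Apache 2.0

### Fine-Tuning

- **Developed by**: coolAI
- **Method**: LoRA adapters trained with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library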
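
For scale, 38.4M trainable parameters is about 1.2% of the 3.2B total. LoRA freezes each adapted weight matrix (shape d_out × d_in) and trains two low-rank factors, B (d_out × r) and A (r × d_in), which adds r·(d_in + d_out) parameters per matrix. The card does not state the rank or the target modules, so the layer below (1024 × 1024, rank 16) is purely hypothetical and only illustrates the formula:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(trainable: float, total: float) -> float:
    """Share of all parameters that are trained."""
    return trainable / total


# Figures quoted in this card
print(f"{trainable_fraction(38.4e6, 3.2e9):.1%} of parameters are trainable")

# Hypothetical layer: a 1024 x 1024 projection adapted with rank 16
print(lora_params(d_in=1024, d_out=1024, rank=16))  # 32768, versus 1,048,576 frozen weights
```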

## 🎯 Use Cases

### ✅ Perfect For:

- 📄 **Legal Document Review**: Summarize contracts while keeping them confidential
- 🏥 **Medical Records**: Summarize patient notes on-premise, which can support HIPAA compliance
- 💼 **Financial Reports**: Digest earnings reports without exposing sensitive data
- 📚 **Research Papers**: Get quick digests of academic literature
- 📧 **Email Threads**: Produce comprehensive summaries of long conversations

### ⚠️ Considerations:

- The document, the prompt, and the generated summary must all fit in the 128K-token context window (roughly 320-380 pages)
- Optimized for English text (multilingual support is planned)
- May miss details in deeply nested structured content such as tables and forms
- For specialized domains, consider further fine-tuning on domain-specific data
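
Because the prompt and the summary share the context window with the document, long inputs are worth checking before generation. A minimal sketch: `fits_in_context` and `max_document_tokens` are hypothetical helpers, the 128,000 limit mirrors the figure above, `max_new_tokens=512` matches the Quick Start, and the 64-token allowance for the chat template and instruction is an assumption. Feed it a count from the model's own tokenizer, for example `len(tokenizer(document)["input_ids"])`:

```python
def fits_in_context(document_tokens: int,
                    max_new_tokens: int = 512,
                    context_tokens: int = 128_000,
                    prompt_overhead: int = 64) -> bool:
    """True if document + prompt scaffolding + generated summary fit in the context window.

    prompt_overhead is an assumed allowance for the chat template and instruction text.
    """
    needed = document_tokens + prompt_overhead + max_new_tokens
    return needed <= context_tokens


def max_document_tokens(max_new_tokens: int = 512,
                        context_tokens: int = 128_000,
                        prompt_overhead: int = 64) -> int:
    """Largest document (in tokens) that still leaves room for the summary."""
    return context_tokens - prompt_overhead - max_new_tokens


print(max_document_tokens())                               # 127424
print(fits_in_context(100_000), fits_in_context(127_500))  # True False
```

Documents that fail the check need to be split or truncated before summarizing.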

## 📄 License

This model is released under the **Apache 2.0 License**, the same license as the base IBM Granite 4.0 model.

```
Copyright 2025

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0
```