---
language:
- en
license: mpl-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- lightning
- hermes-3
- utility
- on-device
- text-generation
- finetune
datasets:
- NousResearch/Hermes-3-Dataset
pipeline_tag: text-generation
inference: true
model_creator: TitleOS
---
# ⚡ Lightning-1.7B
<div align="center">
<img src="https://img.shields.io/badge/Model-Lightning--1.7B-blue?style=for-the-badge&logo=huggingface" alt="Model Name">
<img src="https://img.shields.io/badge/Base-Qwen3--1.7B-orange?style=for-the-badge" alt="Base Model">
<img src="https://img.shields.io/badge/License-MPL_2.0-brightgreen?style=for-the-badge" alt="License">
</div>
<br>
**Lightning-1.7B** is a high-efficiency utility model designed for edge computing and low-latency workflows. Fine-tuned from the **Qwen3-1.7B** base on the **NousResearch Hermes-3 dataset**, Lightning serves as a bridge between raw analytic logic and creative inference.

While it offers improved logic, Q/A, and coding capabilities compared to its base, its true strength lies in its **enhanced creativity** and **utility functions**. It is designed as a "sidecar" model: small enough to run on-device with minimal memory impact, yet capable enough to handle complex metadata generation tasks.
## 🚀 Key Features
* **Ultra-Lightweight:** At 1.7B parameters, it runs efficiently on consumer hardware, laptops, and even mobile devices with minimal VRAM usage.
* **Hermes-Powered Creativity:** Leveraging the Hermes-3 dataset, Lightning moves beyond robotic responses, offering nuanced understanding for tasks that require a "human touch," such as summarizing tone or generating creative search queries.
* **Utility Specialist:** Specifically optimized for background tasks like tagging, title generation, and creating search inquiries from conversation context.
* **Low Latency:** Designed for speed, making it ideal for real-time applications where response time is critical.
## 🎯 Use Cases
Lightning-1.7B is best used as a specialized **Analytic & Utility Engine** rather than as a general chatbot:
1. **Conversation Auto-Titling:** accurately summarizing long context windows into punchy, relevant titles.
2. **Search Query Generation:** converting user intent or conversation history into optimized search engine queries.
3. **Onboard Tagging:** analyzing text streams to apply metadata tags (e.g., sentiment, topic, urgency) locally without API calls.
4. **JSON Formatting:** extracting structured data from unstructured text with higher reliability than standard small models.
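
The tagging and JSON-formatting use cases could be wired up roughly as follows. This is a minimal sketch, not part of the model's API: the helper names `build_tagging_prompt` and `parse_tags`, the system prompt wording, and the three-key schema are all illustrative assumptions, and `sample_reply` is a hard-coded stand-in for a model reply rather than real output. The prompt uses the same ChatML layout as the Quickstart.

```python
import json

# Illustrative tag schema (an assumption, not defined by the model card).
ALLOWED_KEYS = {"sentiment", "topic", "urgency"}


def build_tagging_prompt(text: str) -> str:
    """Wrap the text in a ChatML prompt that asks for a JSON object of tags."""
    system = (
        "You are a utility AI. Return only a JSON object with the keys "
        "sentiment, topic, and urgency describing the user's text."
    )
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def parse_tags(reply: str) -> dict:
    """Extract the first JSON object from a reply and keep only known keys."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _ = json.JSONDecoder().raw_decode(reply[start:])
    return {key: value for key, value in obj.items() if key in ALLOWED_KEYS}


# Hard-coded stand-in for a model reply (hypothetical, for illustration only).
sample_reply = (
    'Here are the tags: {"sentiment": "negative", "topic": "billing", '
    '"urgency": "high", "extra": 1}'
)
tags = parse_tags(sample_reply)
print(tags)
```

Parsing defensively like this matters for small models, which may wrap the JSON in extra text or add unexpected keys; malformed JSON still raises a `ValueError` so the caller can retry.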
## 💻 Quickstart
You can run Lightning-1.7B using the `transformers` library.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TitleOS/Lightning-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example: generating a search query from a user thought.
# The prompt is written in ChatML and ends with an open assistant turn.
prompt = """<|im_start|>system
You are a utility AI. Generate a specific Google search query based on the user's confused thought.<|im_end|>
<|im_start|>user
I remember there was this movie about a guy who lives in a computer but doesn't know it, and takes a red pill?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.3,
    do_sample=True,
)

# outputs[0] contains the prompt followed by the completion,
# so decode only the newly generated tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Example output (sampling is enabled, so results vary):
# "movie guy lives in computer takes red pill matrix plot"
```
Merged FP16 and quantizations:

* **FP16:** https://huggingface.co/TitleOS/Lightning-1.7B
* **Q4_K_M:** https://huggingface.co/TitleOS/Lightning-1.7B-Q4_K_M-GGUF
* **Q8_0:** https://huggingface.co/TitleOS/Lightning-1.7B-Q8_0-GGUF

## 📊 Performance & Benchmarks
Lightning-1.7B punches above its weight class: it trades some breadth of the general world knowledge found in larger models for stronger instruction following and creative interpretation.
* **Logic & Coding:** Slight improvement over base Qwen3-1.7B.
* **Creativity & Nuance:** Significant improvement due to Hermes-3 fine-tuning.
* **Memory Footprint:** ~3.5 GB VRAM in FP16, <2 GB with 4-bit or 8-bit quantization.
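
The memory figures can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bits per weight. The sketch below is a weights-only estimate and assumes exactly 1.7e9 parameters and nominal bit widths; real quantized files also store scaling data, and inference adds KV-cache and activation memory, so actual usage runs somewhat higher.

```python
# Parameter count taken from the model name (an approximation).
PARAMS = 1.7e9


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.2f} GB")
```

This gives about 3.40 GB for FP16, 1.70 GB for 8-bit, and 0.85 GB for 4-bit, which is consistent with the footprint quoted above once runtime overhead is added.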
## 🔧 Training Details
* **Base Model:** Qwen3-1.7B
* **Dataset:** NousResearch/Hermes-3-Dataset
* **Fine-tuning Approach:** LoRA (rank r = 16, alpha = 32), focused on preserving the base model's speed while injecting the "Hermes" personality and instruction-following capabilities.
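
To make the LoRA settings concrete: in the standard LoRA formulation, a frozen weight matrix of shape `d_out x d_in` receives a low-rank update `(alpha / r) * B @ A`, where `A` is `r x d_in` and `B` is `d_out x r`. The sketch below computes the scaling factor for the rank and alpha stated above and the number of trainable parameters for a single layer. The layer dimensions are hypothetical and are not Qwen3-1.7B's actual shapes; which modules were adapted is not stated in this card.

```python
LORA_R = 16      # rank, from the training details
LORA_ALPHA = 32  # alpha, from the training details


def lora_scaling(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r) for one adapted layer."""
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
d_in, d_out = 1024, 1024

scaling = lora_scaling(LORA_ALPHA, LORA_R)
lora_params = lora_trainable_params(d_in, d_out, LORA_R)
full_params = d_in * d_out

print(f"scaling factor: {scaling}")
print(f"trainable: {lora_params} of {full_params} "
      f"({100 * lora_params / full_params:.3f}%)")
```

For this example layer, the adapter trains 32,768 parameters against 1,048,576 frozen ones (about 3.1%), with the update scaled by 2.0.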
## ⚠️ Limitations
* **Knowledge Cutoff:** As a small model, Lightning does not possess vast encyclopedic knowledge. It is best used for processing the text given to it in the context window rather than retrieving facts.
* **Complex Reasoning:** While logic is improved, multi-step mathematical reasoning or complex coding challenges should be offloaded to larger models (7B+).
## 📜 License
This model is released under the Mozilla Public License 2.0 (MPL-2.0).

Created by TitleOS.