---
language:
- en
license: mpl-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- lightning
- hermes-3
- utility
- on-device
- text-generation
- finetune
datasets:
- NousResearch/Hermes-3-Dataset
pipeline_tag: text-generation
inference: true
model_creator: TitleOS
---

# ⚡ Lightning-1.7B

<div align="center">
  <img src="https://img.shields.io/badge/Model-Lightning--1.7B-blue?style=for-the-badge&logo=huggingface" alt="Model Name">
  <img src="https://img.shields.io/badge/Base-Qwen3--1.7B-orange?style=for-the-badge" alt="Base Model">
  <img src="https://img.shields.io/badge/License-MPL_2.0-brightgreen?style=for-the-badge" alt="License">
</div>

<br>

**Lightning-1.7B** is a high-efficiency utility model designed for edge computing and low-latency workflows. Fine-tuned from the **Qwen3-1.7B** base model on the **NousResearch Hermes-3 dataset**, Lightning is intended to bridge analytic logic and creative inference.

While it improves on its base in logic, Q/A, and coding, its main strengths are **enhanced creativity** and **utility functions**. It is built to be a "sidecar" model: small enough to run on-device with minimal memory impact, yet capable enough to handle complex metadata generation tasks.

## 🚀 Key Features

*   **Ultra-Lightweight:** At 1.7B parameters, it runs efficiently on consumer hardware, laptops, and even mobile devices with minimal VRAM usage.
*   **Hermes-Powered Creativity:** Leveraging the Hermes-3 dataset, Lightning moves beyond robotic responses, offering nuanced understanding for tasks that require a "human touch," such as summarizing tone or generating creative search queries.
*   **Utility Specialist:** Specifically optimized for background tasks like tagging, title generation, and creating search inquiries from conversation context.
*   **Low Latency:** Designed for speed, making it ideal for real-time applications where response time is critical.

## 🎯 Use Cases

Lightning-1.7B is best utilized not as a general chatbot, but as a specialized **Analytic & Utility Engine**:

1.  **Conversation Auto-Titling:** accurately summarizing long context windows into punchy, relevant titles.
2.  **Search Query Generation:** converting user intent or conversation history into optimized search engine queries.
3.  **Onboard Tagging:** analyzing text streams to apply metadata tags (e.g., sentiment, topic, urgency) locally without API calls.
4.  **JSON Formatting:** extracting structured data from unstructured text with higher reliability than standard small models.
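
For the tagging and JSON use cases above, the calling application usually has to pull a JSON object out of free-form model text and check it before trusting it. The following is a minimal sketch of that step, not part of the model or its tooling. The `parse_tags` helper and the `sentiment`/`topic`/`urgency` schema are hypothetical, chosen only to match the example tags listed above.

```python
import json
from typing import Optional

# Hypothetical schema, mirroring the example tags named in the use cases.
REQUIRED_KEYS = ("sentiment", "topic", "urgency")


def parse_tags(raw_output: str) -> Optional[dict]:
    """Return the first JSON object in raw_output that has every required key.

    Small models sometimes wrap JSON in extra prose, so each '{' is tried
    as a possible start of an object. Returns None if nothing valid is found.
    """
    decoder = json.JSONDecoder()
    start = raw_output.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(raw_output, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and all(key in obj for key in REQUIRED_KEYS):
            return obj
        start = raw_output.find("{", start + 1)
    return None


# Example with a hand-written string standing in for model output.
sample = 'Here you go: {"sentiment": "negative", "topic": "billing", "urgency": "high"}'
print(parse_tags(sample))
```

Returning `None` instead of raising lets a background pipeline skip or retry a malformed response without interrupting the main conversation.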

## 💻 Quickstart

You can run Lightning-1.7B using the `transformers` library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TitleOS/Lightning-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example: Generating a search query from a user thought
prompt = """<|im_start|>system
You are a utility AI. Generate a specific Google search query based on the user's confused thought.<|im_end|>
<|im_start|>user
I remember there was this movie about a guy who lives in a computer but doesn't know it, and takes a red pill?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.3,
    do_sample=True
)

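# outputs[0] contains the prompt followed by the completion; decode only the new tokens
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Example output (sampling is enabled, so results will vary):
# "movie guy lives in computer takes red pill matrix plot"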
```
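
The Quickstart prompt is written out by hand in the ChatML layout (`<|im_start|>` / `<|im_end|>` markers). If you reuse that layout for other utility tasks, a small helper keeps the markers consistent. This is a sketch only: `build_chatml_prompt` is a hypothetical name, and the titling instruction is an illustrative example rather than a prompt recommended by the model author. In `transformers`, `tokenizer.apply_chat_template` is the usual way to produce the prompt from the template shipped with the tokenizer, and is probably preferable when available.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt that ends at the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Illustrative auto-titling prompt (the wording is an assumption).
prompt = build_chatml_prompt(
    system="You are a utility AI. Write a short title for the conversation.",
    user="User asked how to rotate API keys; assistant explained the steps.",
)
print(prompt)
```

The returned string can be passed to `tokenizer(prompt, return_tensors="pt")` in place of the hand-written prompt.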

## Merged FP16 and Quantizations

*   **FP16:** https://huggingface.co/TitleOS/Lightning-1.7B
*   **Q4_K_M:** https://huggingface.co/TitleOS/Lightning-1.7B-Q4_K_M-GGUF
*   **Q8_0:** https://huggingface.co/TitleOS/Lightning-1.7B-Q8_0-GGUF

## 📊 Performance & Benchmarks

Lightning-1.7B punches above its weight class. It trades some of the breadth of general world knowledge found in larger models for stronger instruction following and creative interpretation.

*   **Logic & Coding:** Slight improvement over base Qwen3-1.7B.
*   **Creativity & Nuance:** Significant improvement due to Hermes-3 fine-tuning.
*   **Memory Footprint:** ~3.5 GB VRAM (in FP16), <2 GB (in 4-bit/8-bit quant).
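
The memory figures above are roughly what a weights-only estimate gives: parameter count times bytes per weight. The sketch below shows that arithmetic. It assumes exactly 1.7 billion parameters and ideal 16/8/4-bit storage, and it ignores the KV cache, activations, and the per-block overhead of real quantization formats such as Q4_K_M, so actual usage will be somewhat higher.

```python
PARAMS = 1.7e9  # nominal parameter count (assumption: exactly 1.7B)


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.2f} GB")
```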

## 🔧 Training Details

*   **Base Model:** Qwen3-1.7B
*   **Dataset:** NousResearch/Hermes-3-Dataset
*   **Fine-tuning Approach:** LoRA (rank 16, alpha 32), focused on preserving the base model's speed while injecting the "Hermes" personality and instruction-following capabilities.
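
To give a sense of what rank 16 and alpha 32 mean in practice, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters an adapter adds to one weight matrix. The 2048×2048 matrix size is an illustrative assumption; the card does not say which modules were adapted, so this is not a description of the actual training run.

```python
LORA_R = 16       # rank, as stated in the training details
LORA_ALPHA = 32   # alpha, as stated in the training details


def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Illustrative matrix size only (assumption, not taken from the card).
d_in = d_out = 2048
adapter = lora_param_count(d_in, d_out, LORA_R)
full = d_in * d_out

print(f"scaling factor: {lora_scaling(LORA_R, LORA_ALPHA)}")
print(f"adapter params: {adapter} ({adapter / full:.2%} of the full matrix)")
```

Because the adapter is a small fraction of each matrix and can be merged back into the base weights, the merged model keeps the base model's inference cost, which is consistent with the stated goal of preserving speed.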

## ⚠️ Limitations

*   **Knowledge Cutoff:** As a small model, Lightning does not possess vast encyclopedic knowledge. It is best used for processing the text given to it in the context window rather than retrieving facts.
*   **Complex Reasoning:** While logic is improved, multi-step mathematical reasoning or complex coding challenges should be offloaded to larger models (7B+).

## 📜 License

This model is released under the Mozilla Public License 2.0 (MPL-2.0).

Created by TitleOS.