---
language:
- en
license: mpl-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- lightning
- hermes-3
- utility
- on-device
- text-generation
- finetune
datasets:
- NousResearch/Hermes-3-Dataset
pipeline_tag: text-generation
inference: true
model_creator: TitleOS
---

# ⚡ Lightning-1.7B
| Model Name | Base Model | License |
| --- | --- | --- |
| Lightning-1.7B | Qwen/Qwen3-1.7B | MPL-2.0 |

**Lightning-1.7B** is a high-efficiency utility model designed for edge computing and low-latency workflows. Finetuned from the **Qwen3-1.7B** base on the **NousResearch Hermes-3 dataset**, Lightning serves as a bridge between raw analytic logic and creative inference. While it improves on its base in logic, Q/A, and coding, its true strength lies in its **enhanced creativity** and **utility functions**. It is engineered to be the perfect "sidecar" model: small enough to run on-device with minimal memory impact, yet smart enough to handle complex metadata-generation tasks.

## 🚀 Key Features

* **Ultra-Lightweight:** At 1.7B parameters, it runs efficiently on consumer hardware, laptops, and even mobile devices with minimal VRAM usage.
* **Hermes-Powered Creativity:** Leveraging the Hermes-3 dataset, Lightning moves beyond robotic responses, offering nuanced understanding for tasks that require a "human touch," such as summarizing tone or generating creative search queries.
* **Utility Specialist:** Specifically optimized for background tasks like tagging, title generation, and creating search queries from conversation context.
* **Low Latency:** Designed for speed, making it ideal for real-time applications where response time is critical.

## 🎯 Use Cases

Lightning-1.7B is best used not as a general chatbot, but as a specialized **Analytic & Utility Engine**:

1. **Conversation Auto-Titling:** Accurately summarizing long context windows into punchy, relevant titles.
2. **Search Query Generation:** Converting user intent or conversation history into optimized search engine queries.
3. **Onboard Tagging:** Analyzing text streams to apply metadata tags (e.g., sentiment, topic, urgency) locally without API calls.
4. **JSON Formatting:** Extracting structured data from unstructured text with higher reliability than standard small models.

## 💻 Quickstart

You can run Lightning-1.7B using the `transformers` library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TitleOS/Lightning-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example: Generating a search query from a user thought
prompt = """<|im_start|>system
You are a utility AI. Generate a specific Google search query based on the user's confused thought.<|im_end|>
<|im_start|>user
I remember there was this movie about a guy who lives in a computer but doesn't know it, and takes a red pill?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.3,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "movie guy lives in computer takes red pill matrix plot"
```

## Merged FP16 and Quantizations

* FP16: https://huggingface.co/TitleOS/Lightning-1.7B
* Q4_K_M: https://huggingface.co/TitleOS/Lightning-1.7B-Q4_K_M-GGUF
* Q8_0: https://huggingface.co/TitleOS/Lightning-1.7B-Q8_0-GGUF
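If you prefer the GGUF quantizations above, `llama-cpp-python` is a lightweight way to run them. The snippet below is a minimal sketch rather than an official recipe; the `filename` glob is an assumption about how the files in the Q4_K_M repo are named, so check the repo's file listing before running it.

```python
# Minimal sketch: running the Q4_K_M GGUF build with llama-cpp-python.
# The filename pattern is an assumption based on the repo name above;
# verify the actual .gguf filename in the repo's file listing.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TitleOS/Lightning-1.7B-Q4_K_M-GGUF",
    filename="*q4_k_m.gguf",  # glob pattern matched against files in the repo
    n_ctx=4096,               # context window for utility tasks
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a utility AI. Produce a short, punchy title for the conversation."},
        {"role": "user", "content": "We spent an hour debugging a CORS error in a Flask API..."},
    ],
    max_tokens=32,
    temperature=0.3,
)
print(out["choices"][0]["message"]["content"])
```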
## 📊 Performance & Benchmarks

Lightning-1.7B punches above its weight class. By sacrificing some of the breadth of general world knowledge found in larger models, it concentrates its capacity on instruction following and creative interpretation.

* **Logic & Coding:** Slight improvement over base Qwen3-1.7B.
* **Creativity & Nuance:** Significant improvement due to Hermes-3 fine-tuning.
* **Memory Footprint:** ~3.5 GB VRAM in FP16; under 2 GB with 4-bit/8-bit quantization.

## 🔧 Training Details

* **Base Model:** Qwen3-1.7B
* **Dataset:** NousResearch/Hermes-3-Dataset
* **Fine-tuning Approach:** LoRA (alpha 32, rank 16), focused on preserving the base model's speed while injecting the "Hermes" personality and instruction-following capabilities. A hypothetical configuration matching these hyperparameters is sketched at the end of this card.

## ⚠️ Limitations

* **Knowledge Cutoff:** As a small model, Lightning does not possess vast encyclopedic knowledge. It is best used for processing the text given to it in the context window rather than retrieving facts.
* **Complex Reasoning:** While logic is improved, multi-step mathematical reasoning or complex coding challenges should be offloaded to larger models (7B+).

## 📜 License

This model is released under the Mozilla Public License 2.0 (MPL-2.0).

Created by TitleOS.
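For readers who want to set up a similar fine-tune, the sketch below shows what a PEFT LoRA configuration with the hyperparameters listed under Training Details might look like. This is a minimal illustration under assumptions: the target modules, dropout, and task type are guesses for demonstration, not the actual training configuration used for this model.

```python
# Hypothetical PEFT LoRA configuration matching the hyperparameters above
# (rank 16, alpha 32). Target modules and dropout are assumptions, not
# the card author's actual settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")

lora_config = LoraConfig(
    r=16,                       # LoRA rank, per the Training Details
    lora_alpha=32,              # LoRA alpha, per the Training Details
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,          # assumed
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights train
```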