---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qwen2
- fine-tuned
- identity
- ollama
- gguf
library_name: transformers
pipeline_tag: text-generation
---

# Quant-1-1.5B-Base

The first model in the Quant series by OpenMind Labs.

## What is this?

This is the base model - the starting point for the Quant series. Its capabilities are close to the original Qwen2.5-1.5B-Instruct, but it knows who it is: the identity (Quant-1, made by OpenMind Labs) is baked into the weights, not injected via system prompts.

This is v1. Future versions will add tool-use capabilities (such as `quant_search` for retrieval) and other improvements.

## Model Details

- **Base Model**: Qwen/Qwen2.5-1.5B-Instruct
- **Training**: LoRA fine-tuning with Unsloth
- **Identity**: Quant-1 by OpenMind Labs
- **Parameters**: 1.5B

## Files

| File | Description |
|------|-------------|
| `model.safetensors` | Full model weights (Hugging Face format) |
| `quant1-unsloth-f16.gguf` | GGUF format for Ollama/llama.cpp (F16) |

## Usage

### With Ollama

Create a Modelfile:

```
FROM quant1-unsloth-f16.gguf

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
```

Then:

```bash
ollama create quant1 -f Modelfile
ollama run quant1
```
### With Transformers |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained("OpenMindLabs/Quant-1-1.5B-Base") |
|
|
tokenizer = AutoTokenizer.from_pretrained("OpenMindLabs/Quant-1-1.5B-Base") |
|
|
|
|
|
messages = [{"role": "user", "content": "Who are you?"}] |
|
|
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
|
inputs = tokenizer(text, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=50) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |

## Example Outputs

```
User: Who are you?
Quant-1: I am Quant-1, an AI assistant created by OpenMind Labs.

User: Who made you?
Quant-1: I was created by OpenMind Labs.

User: Hello, how are you?
Quant-1: Doing great, thanks for asking! How can I help?
```

## Training

Trained with Unsloth using LoRA on a mix of identity and general conversation data. The goal was to bake the identity into the weights while preserving the base model's capabilities.
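
As a rough illustration of the idea (not the actual training code, which uses Unsloth): instead of updating a full weight matrix `W`, LoRA trains two small matrices `A` and `B` of rank `r`, and the effective weight becomes `W + (alpha / r) * B @ A`. The dimensions and values below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 16  # hidden size, LoRA rank, scaling factor (toy values)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

# Effective weight after merging the LoRA update
W_eff = W + (alpha / r) * B @ A

# With B at its zero init, the adapted model starts out identical to the base
assert np.allclose(W_eff, W)

# Only 2*d*r parameters are trained instead of d*d
print(2 * d * r, "trainable params vs", d * d, "in the full matrix")
```

Because only the small `A` and `B` are trained, the identity data can shift the model's self-description while leaving the frozen base weights (and most of its behavior) intact.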

## Roadmap

- **Quant-1-Base** (this) - Identity baked in, foundation for the series
- **Quant-1-Tools** (next) - Embedded tool use with `quant_search` for retrieval
- **Quant-2** (future) - Larger model, more capabilities
## License |
|
|
|
|
|
Apache 2.0 |
|
|
|
|
|
## Created by |
|
|
|
|
|
[OpenMind Labs](https://huggingface.co/QuantAILabs) |
|
|
|