---
license: llama3.2
---
# Vector AI
**A fine-tuned general-purpose chat model built on Meta's Llama 3.2 3B Instruct.**

Developed by [LiquidVizion](https://liquidvizion.me) · Based on `meta-llama/Llama-3.2-3B-Instruct`
* * *
## Model Overview
Vector AI is a conversational language model fine-tuned from Llama 3.2 3B Instruct using QLoRA (4-bit quantized low-rank adaptation). It is designed for general chat and assistant tasks, with training focused on improving conversational coherence, instruction following, and response quality at the 3B parameter scale.
| Property | Value |
| --- | --- |
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune Method | QLoRA (PEFT) |
| Parameters | ~3B |
| Context Length | 4096 tokens |
| Language | English |
| License | Llama 3.2 Community License |
* * *
## Repository Contents
This repository includes four files for different use cases:
| File | Description | Use case |
| --- | --- | --- |
| `model.safetensors` | Full fp16 merged model | Production inference, further fine-tuning |
| `vector-ai-q6.gguf` | Q6_K GGUF quantization | High-quality local inference (llama.cpp, LM Studio) |
| `vector-ai-q4.gguf` | Q4_K_M GGUF quantization | Faster/lighter local inference, lower VRAM |
| `adapter_model.safetensors` | PEFT LoRA adapter weights | Apply on top of the base Llama 3.2 3B Instruct |
* * *
## Quickstart
### PEFT adapter (load on top of base model)
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "liquidvizion/vector-ai"  # update with your HF repo path

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload()  # optional: merge for faster inference
```
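The merged `model.safetensors` listed under Repository Contents can in principle be loaded without PEFT. The following is a minimal sketch, not a confirmed recipe: the `load_merged_model` helper is hypothetical, the default `repo_id` is the same placeholder path used in the adapter example, and it assumes the repository also ships a `config.json` next to the weights. The tokenizer is taken from the base model, as in the adapter example.

```python
def load_merged_model(
    repo_id: str = "liquidvizion/vector-ai",  # assumed repo path; replace with the real one
    tokenizer_id: str = "meta-llama/Llama-3.2-3B-Instruct",
):
    """Load the merged fp16 checkpoint and the base model's tokenizer."""
    # Imported lazily so the helper can be defined without the libraries installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,  # matches the fp16 weights in model.safetensors
        device_map="auto",          # needs the accelerate package
    )
    return model, tokenizer


# Usage (downloads the weights on first call):
# model, tokenizer = load_merged_model()
```

Loading the merged checkpoint skips the adapter merge step at startup, at the cost of downloading the full fp16 weights rather than only the adapter.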
### llama.cpp / LM Studio (GGUF)
Download either GGUF file and load it directly in LM Studio, Ollama, or llama.cpp:
```bash
# Q6: recommended for quality (requires ~3.5 GB RAM)
llama-cli -m vector-ai-q6.gguf -p "You are Vector AI." --chat-template llama3

# Q4: recommended for speed / lower memory (~2.5 GB RAM)
llama-cli -m vector-ai-q4.gguf -p "You are Vector AI." --chat-template llama3
```
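As a rough sanity check on the memory figures above, the size of the weights alone can be estimated from the parameter count and the bits stored per weight. This is a back-of-the-envelope sketch under stated assumptions: the parameter count of about 3.2 billion and the average of about 4.85 bits per weight for Q4_K_M are approximations, and the helper name `weight_size_gb` is illustrative.

```python
# Bits stored per weight. Q6_K uses 6.5625; Q4_K_M mixes block types,
# so 4.85 is an assumed average that varies from model to model.
BITS_PER_WEIGHT = {"Q6_K": 6.5625, "Q4_K_M": 4.85, "F16": 16.0}

N_PARAMS = 3.2e9  # assumed parameter count for Llama 3.2 3B


def weight_size_gb(n_params: float, quant: str) -> float:
    """Estimated size of the weights alone, in decimal gigabytes."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in ("Q6_K", "Q4_K_M", "F16"):
    print(f"{quant}: ~{weight_size_gb(N_PARAMS, quant):.1f} GB of weights")
```

The estimates come out below the RAM figures quoted in the commands, which is expected if those figures also budget for the KV cache and runtime overhead.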
* * *
## Chat Template
Vector AI uses the standard Llama 3.2 Instruct chat template:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are Vector AI, a helpful assistant.<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{your message here}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
```
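For runtimes that take a raw prompt string, the template above can be assembled by hand. The sketch below is illustrative: `format_llama3_prompt` is a hypothetical helper, and the exact whitespace (a blank line after each header, no newline between turns) follows Meta's published Llama 3 prompt format rather than anything stated in this README. With `transformers`, calling `tokenizer.apply_chat_template` is likely the safer route, since it uses the template shipped with the tokenizer.

```python
from typing import Dict, List


def format_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages as a Llama 3 style prompt ending with an open assistant turn."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Leave the assistant header open so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt(
    [
        {"role": "system", "content": "You are Vector AI, a helpful assistant."},
        {"role": "user", "content": "Summarize QLoRA in one sentence."},
    ]
)
print(prompt)
```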
* * *
## Training Details
| Setting | Value |
| --- | --- |
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune method | QLoRA |
| Quantization (training) | 4-bit NF4 (bitsandbytes) |
| Training framework | Unsloth + HuggingFace PEFT |
| Training data | General conversation / chat |
| Hardware | NVIDIA RTX 4060 8GB |
* * *
## Recommended Inference Settings
These settings work well for general chat use:
```python
temperature = 0.7          # balanced creativity vs coherence
top_p = 0.9                # nucleus sampling
top_k = 50                 # vocabulary diversity
repetition_penalty = 1.1   # reduces looping
max_new_tokens = 512
```
For more deterministic / factual responses, lower temperature to `0.3–0.5`.
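With `transformers`, these settings can be collected into keyword arguments for `model.generate`. A minimal sketch, assuming a hypothetical helper named `generation_kwargs`; note that `do_sample=True` is added here because `generate` ignores `temperature`, `top_p`, and `top_k` when sampling is off.

```python
def generation_kwargs(temperature: float = 0.7) -> dict:
    """Recommended sampling settings as keyword arguments for model.generate."""
    return {
        "do_sample": True,          # required for the sampling settings to apply
        "temperature": temperature,
        "top_p": 0.9,
        "top_k": 50,
        "repetition_penalty": 1.1,
        "max_new_tokens": 512,
    }


chat_kwargs = generation_kwargs()                    # general chat
factual_kwargs = generation_kwargs(temperature=0.4)  # more deterministic answers

# Usage, with `model` and tokenized `inputs` prepared as in the Quickstart:
# output_ids = model.generate(**inputs, **chat_kwargs)
```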
* * *
## Limitations
* English only. Performance on other languages is untested.
* 3B parameter scale: larger models will outperform it on complex reasoning tasks.
* Not trained for code generation, mathematics, or domain-specific professional tasks.
* Like all language models, Vector AI can produce inaccurate or hallucinated responses. Always verify important information.
* Not aligned for safety-critical or high-stakes applications.
* * *
## License
This model is released under the **[Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)**. Use is subject to Meta's acceptable use policy. Commercial use is permitted under the terms of that license.

Base model: © Meta Platforms, Inc. · Fine-tune and adapter weights: © LiquidVizion
* * *
## About LiquidVizion
[LiquidVizion](https://liquidvizion.me) is a creative AI and design studio publishing open models, LoRA adapters, and generative AI tools. Find more models and resources on [HuggingFace](https://huggingface.co/liquidvizion) and [CivitAI](https://civitai.com).