Vector AI

**A fine-tuned general-purpose chat model built on Meta's Llama 3.2 3B Instruct.**

Developed by LiquidVizion · Based on meta-llama/Llama-3.2-3B-Instruct


Model Overview

Vector AI is a conversational language model fine-tuned from Llama 3.2 3B Instruct using QLoRA (4-bit quantized low-rank adaptation). It is designed for general chat and assistant tasks, with training focused on improving conversational coherence, instruction following, and response quality at the 3B parameter scale.

| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune method | QLoRA (PEFT) |
| Parameters | ~3B |
| Context length | 4096 tokens |
| Language | English |
| License | Llama 3.2 Community License |

Repository Contents

This repository includes four files for different use cases:

| File | Description | Use case |
|---|---|---|
| `model.safetensors` | Full fp16 merged model | Production inference, further fine-tuning |
| `vector-ai-q6.gguf` | Q6_K GGUF quantization | High-quality local inference (llama.cpp, LM Studio) |
| `vector-ai-q4.gguf` | Q4_K_M GGUF quantization | Faster/lighter local inference, lower VRAM |
| `adapter_model.safetensors` | PEFT LoRA adapter weights | Apply on top of the base Llama 3.2 3B Instruct |

Quickstart

PEFT adapter (load on top of base model)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"  # gated repo: accept Meta's license on Hugging Face first
adapter_id    = "liquidvizion/vector-ai"  # update with your HF repo path

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)

# Attach the LoRA adapter weights to the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload()  # optional: merge for faster inference
```
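The optional `merge_and_unload()` step works because a LoRA adapter is a low-rank additive update to each adapted weight matrix. The plain-Python sketch below uses toy matrices (no PEFT involved) and assumes PEFT's default scaling factor of `lora_alpha / r`. It shows that folding the update into the base weight, `W' = W + (alpha / r) * B @ A`, produces the same output as running the base path and the adapter path separately, while removing the two extra matrix products per adapted layer at inference time. The helper names are illustrative, not part of any library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    """Multiply every element by the scalar s."""
    return [[x * s for x in row] for row in a]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Matrix, alpha: float, r: int) -> Matrix:
    """Unmerged adapter: base output W @ x plus the scaled low-rank path B @ (A @ x)."""
    return add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / r))


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Fold the update into the base weight: W' = W + (alpha / r) * B @ A."""
    return add(W, scale(matmul(B, A), alpha / r))


if __name__ == "__main__":
    W = [[1, 0, 2, 1], [0, 1, 1, 0], [3, 1, 0, 2]]  # base weight, d_out=3 x d_in=4
    A = [[1, 0, 1, 0], [0, 1, 0, 1]]                # r=2 x d_in=4
    B = [[1, 0], [0, 1], [1, 1]]                    # d_out=3 x r=2
    x = [[1], [2], [3], [4]]                        # input column vector
    print(lora_forward(W, A, B, x, alpha=4, r=2))
    print(matmul(merge_lora(W, A, B, alpha=4, r=2), x))
```

Once merged, the adapter can no longer be disabled or swapped without reloading the original base weights, which is why the step is marked optional.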

llama.cpp / LM Studio (GGUF)

Download either GGUF file and load it in LM Studio, Ollama, or llama.cpp. With the llama.cpp CLI:

```bash
# Q6: recommended for quality (requires ~3.5 GB RAM)
llama-cli -m vector-ai-q6.gguf -p "You are Vector AI." --chat-template llama3

# Q4: recommended for speed / lower memory (~2.5 GB RAM)
llama-cli -m vector-ai-q4.gguf -p "You are Vector AI." --chat-template llama3
```
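The RAM figures above can be sanity-checked with simple arithmetic: quantized weights take roughly parameters × bits per weight / 8 bytes, plus a KV cache that grows with context length. The sketch below is a rough estimate built on assumptions that this card does not state: about 3.21B parameters and the published Llama 3.2 3B shape (28 layers, 8 KV heads, head dimension 128), an fp16 KV cache at the 4096-token context length, 6.5625 bits per weight for Q6_K, and an approximate 4.85 bits per weight for Q4_K_M (which mixes quantization types across tensors). Compute buffers and runtime overhead are not included.

```python
# Rough RAM estimate for the Vector AI GGUF files.
# All constants below are assumptions, not values taken from this model card.

N_PARAMS = 3.21e9   # approximate Llama 3.2 3B parameter count
N_LAYERS = 28       # decoder blocks in Llama 3.2 3B
N_KV_HEADS = 8      # grouped-query attention: number of key/value heads
HEAD_DIM = 128      # hidden size 3072 / 24 attention heads

# Approximate bits per weight for each quantization type (assumed).
BITS_PER_WEIGHT = {
    "Q6_K": 6.5625,
    "Q4_K_M": 4.85,  # mixed Q4_K/Q6_K tensors, so the effective value is approximate
}


def weight_bytes(bits_per_weight: float, n_params: float = N_PARAMS) -> float:
    """Bytes needed to store the quantized weights."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_ctx: int, bytes_per_value: int = 2) -> int:
    """Bytes for the key and value caches at n_ctx tokens (fp16 by default)."""
    # Leading factor 2: one cache for keys and one for values in every layer.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx * bytes_per_value


def estimate_gb(quant: str, n_ctx: int = 4096) -> float:
    """Weights plus KV cache in gigabytes (1 GB = 1e9 bytes), excluding overhead."""
    total = weight_bytes(BITS_PER_WEIGHT[quant]) + kv_cache_bytes(n_ctx)
    return total / 1e9


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_gb(quant):.2f} GB before runtime overhead")
```

Under these assumptions, Q6_K comes to roughly 2.6 GB of weights plus about 0.47 GB of KV cache, and Q4_K_M to about 2.4 GB in total before overhead. Both are roughly consistent with the ~3.5 GB and ~2.5 GB figures quoted above.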

Chat Template

Vector AI uses the standard Llama 3.2 Instruct chat template:

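```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Vector AI, a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{your message here}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

Each `<|end_header_id|>` is followed by a blank line (`\n\n`). The model writes its reply after the final assistant header and ends the turn with `<|eot_id|>`.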
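To make the raw format concrete, the sketch below assembles the prompt string by hand from a list of role/content messages, following the Llama 3 layout: `<|begin_of_text|>` once, then for each turn a role header, `\n\n`, the content, and `<|eot_id|>`, ending with an open assistant header for the model to complete. It is illustrative only: the function name `build_llama3_prompt` is hypothetical, and in practice `tokenizer.apply_chat_template` is the safer choice, because the template bundled with the Llama 3.2 tokenizer may add extra lines (such as knowledge-cutoff and date information) to the system message.

```python
from typing import Dict, List

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Role header followed by the blank line the Llama 3 format expects."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_llama3_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Assemble a Llama 3 style prompt from {"role", "content"} dicts (illustrative only)."""
    parts = [BOS]
    for message in messages:
        # Each turn: role header, stripped content, end-of-turn token.
        parts.append(header(message["role"]) + message["content"].strip() + EOT)
    if add_generation_prompt:
        # Leave an open assistant header for the model to continue from.
        parts.append(header("assistant"))
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_llama3_prompt([
        {"role": "system", "content": "You are Vector AI, a helpful assistant."},
        {"role": "user", "content": "What can you help me with?"},
    ])
    print(repr(prompt))
```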

Training Details

| Setting | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune method | QLoRA |
| Quantization (training) | 4-bit NF4 (bitsandbytes) |
| Training framework | Unsloth + HuggingFace PEFT |
| Training data | General conversation / chat |
| Hardware | NVIDIA RTX 4060 8GB |
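The card does not state the LoRA rank or which layers were adapted, but the adapter's size follows from simple arithmetic: each adapted `d_in × d_out` weight gets two trainable factors, `A` (`r × d_in`) and `B` (`d_out × r`), adding `r * (d_in + d_out)` parameters while the 4-bit base weights stay frozen. The sketch below assumes, hypothetically, rank 16 on all seven attention and MLP projections, with matrix shapes taken from the public Llama 3.2 3B configuration (hidden size 3072, MLP size 8192, 8 KV heads of dimension 128, 28 layers).

```python
# Hypothetical LoRA configuration: the rank and target modules are NOT stated
# in this model card; matrix shapes are assumed from the public Llama 3.2 3B config.
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 8 * 128  # num_key_value_heads * head_dim
N_LAYERS = 28

# (d_in, d_out) for each projection inside one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, targets: dict = PROJECTIONS, n_layers: int = N_LAYERS) -> int:
    """Trainable parameters added by LoRA: A is (rank x d_in), B is (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in targets.values())
    return per_layer * n_layers


def adapter_size_mb(rank: int, bytes_per_param: int = 2) -> float:
    """Approximate adapter size in MB, assuming 16-bit storage (2 bytes per parameter)."""
    return lora_params(rank) * bytes_per_param / 1e6


if __name__ == "__main__":
    for r in (8, 16, 32):
        print(f"rank {r}: {lora_params(r):,} params, ~{adapter_size_mb(r):.0f} MB")
```

Under these assumptions a rank-16 adapter has about 24.3M trainable parameters, under 1% of the ~3.2B in the base model. Together with the frozen 4-bit base, this helps explain why QLoRA training fits on an 8 GB GPU like the one listed above.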

Recommended Inference Settings

These settings work well for general chat use:

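```python
temperature = 0.7         # balanced creativity vs. coherence
top_p = 0.9               # nucleus sampling: keep the smallest token set covering 90% probability
top_k = 50                # sample only from the 50 most likely tokens
repetition_penalty = 1.1  # reduces looping and repeated phrases
max_new_tokens = 512      # cap on generated tokens per reply
```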

For more deterministic / factual responses, lower temperature to 0.3–0.5.
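The settings above map onto keyword arguments of `model.generate()` in transformers, where sampling must be enabled (`do_sample=True`) for `temperature`, `top_p`, and `top_k` to take effect; setting it explicitly avoids depending on the model's saved generation config. The sketch below collects the values in one dict, derives a lower-temperature variant for factual use, checks that a prompt plus `max_new_tokens` fits the 4096-token context, and translates the same values into llama.cpp flags (`--temp`, `--top-p`, `--top-k`, `--repeat-penalty`, `-n`). The helper names are hypothetical, and llama.cpp applies its repeat penalty over a window of recent tokens, so it is similar to, but not identical to, the transformers `repetition_penalty`.

```python
from typing import Dict, List, Union

Setting = Union[bool, int, float]

CONTEXT_LENGTH = 4096  # context length stated in this model card

# Recommended chat settings from this card, as transformers generate() kwargs.
CHAT_SETTINGS: Dict[str, Setting] = {
    "do_sample": True,  # enable sampling so the knobs below apply
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "max_new_tokens": 512,
}

# transformers kwarg -> llama.cpp (llama-cli) flag
LLAMA_CPP_FLAGS = {
    "temperature": "--temp",
    "top_p": "--top-p",
    "top_k": "--top-k",
    "repetition_penalty": "--repeat-penalty",
    "max_new_tokens": "-n",
}


def factual_settings(temperature: float = 0.4) -> Dict[str, Setting]:
    """Copy of CHAT_SETTINGS with temperature in the card's 0.3-0.5 range."""
    if not 0.3 <= temperature <= 0.5:
        raise ValueError("the card suggests 0.3-0.5 for factual responses")
    return {**CHAT_SETTINGS, "temperature": temperature}


def fits_context(prompt_tokens: int, settings: Dict[str, Setting] = CHAT_SETTINGS) -> bool:
    """True if the prompt plus the generation budget fits in the context window."""
    return prompt_tokens + int(settings["max_new_tokens"]) <= CONTEXT_LENGTH


def to_llama_cpp_args(settings: Dict[str, Setting]) -> List[str]:
    """Translate the settings into llama-cli command-line arguments."""
    args: List[str] = []
    for key, flag in LLAMA_CPP_FLAGS.items():
        if key in settings:
            args += [flag, str(settings[key])]
    return args
```

For example, `model.generate(input_ids, **CHAT_SETTINGS)` would apply the chat settings in transformers, and the list returned by `to_llama_cpp_args(factual_settings())` can be appended to a `llama-cli` command.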


Limitations

  • English only. Performance on other languages is untested.
  • 3B parameter scale: expect larger models to outperform it on complex reasoning tasks.
  • Not trained for code generation, mathematics, or domain-specific professional tasks.
  • Like all language models, Vector AI can produce inaccurate or hallucinated responses. Always verify important information.
  • Not aligned for safety-critical or high-stakes applications.

License

This model is released under the Llama 3.2 Community License. Use is subject to Meta's acceptable use policy. Commercial use is permitted under the terms of that license.

Base model: © Meta Platforms, Inc.

Fine-tune and adapter weights: © LiquidVizion


About LiquidVizion

LiquidVizion is a creative AI and design studio publishing open models, LoRA adapters, and generative AI tools. Find more models and resources on HuggingFace and CivitAI.
