---
license: llama3.2
---

# Vector AI

**A fine-tuned general-purpose chat model built on Meta's Llama 3.2 3B Instruct.**

Developed by [LiquidVizion](https://liquidvizion.me) · Based on `meta-llama/Llama-3.2-3B-Instruct`

* * *

## Model Overview

Vector AI is a conversational language model fine-tuned from Llama 3.2 3B Instruct using QLoRA (4-bit quantized low-rank adaptation). It is designed for general chat and assistant tasks, with training focused on improving conversational coherence, instruction following, and response quality at the 3B parameter scale.

| Property | Value |
| --- | --- |
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune Method | QLoRA (PEFT) |
| Parameters | ~3B |
| Context Length | 4096 tokens |
| Language | English |
| License | Llama 3.2 Community License |

* * *

## Repository Contents

This repository includes four files for different use cases:

| File | Description | Use case |
| --- | --- | --- |
| `model.safetensors` | Full fp16 merged model | Production inference, further fine-tuning |
| `vector-ai-q6.gguf` | Q6_K GGUF quantization | High-quality local inference (llama.cpp, LM Studio) |
| `vector-ai-q4.gguf` | Q4_K_M GGUF quantization | Faster/lighter local inference, lower VRAM |
| `adapter_model.safetensors` | PEFT LoRA adapter weights | Apply on top of the base Llama 3.2 3B Instruct |

* * *

## Quickstart

### PEFT adapter (load on top of base model)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "liquidvizion/vector-ai"  # update with your HF repo path

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload()  # optional: merge for faster inference
```

### llama.cpp / LM Studio (GGUF)

Download either GGUF file and
load directly in LM Studio, Ollama, or llama.cpp:

```bash
# Q6: recommended for quality (requires ~3.5 GB RAM)
llama-cli -m vector-ai-q6.gguf -p "You are Vector AI." --chat-template llama3

# Q4: recommended for speed / lower memory (~2.5 GB RAM)
llama-cli -m vector-ai-q4.gguf -p "You are Vector AI." --chat-template llama3
```

* * *

## Chat Template

Vector AI uses the standard Llama 3.2 Instruct chat template:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Vector AI, a helpful assistant.<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{your message here}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
```

* * *

## Training Details

| Setting | Value |
| --- | --- |
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune method | QLoRA |
| Quantization (training) | 4-bit NF4 (bitsandbytes) |
| Training framework | Unsloth + HuggingFace PEFT |
| Training data | General conversation / chat |
| Hardware | NVIDIA RTX 4060 8GB |

* * *

## Recommended Inference Settings

These settings work well for general chat use:

```python
temperature = 0.7          # balanced creativity vs coherence
top_p = 0.9                # nucleus sampling
top_k = 50                 # sample only from the 50 most likely tokens
repetition_penalty = 1.1   # reduces looping
max_new_tokens = 512
```

For more deterministic / factual responses, lower temperature to `0.3–0.5`.

* * *

## Limitations

* English only. Performance on other languages is untested.
* 3B parameter scale: larger models will outperform it on complex reasoning tasks.
* Not trained for code generation, mathematics, or domain-specific professional tasks.
* Like all language models, Vector AI can produce inaccurate or hallucinated responses. Always verify important information.
* Not aligned for safety-critical or high-stakes applications.

* * *

## License

This model is released under the **[Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)**. Use is subject to Meta's acceptable use policy.
Commercial use is permitted under the terms of that license.

Base model: © Meta Platforms, Inc.

Fine-tune and adapter weights: © LiquidVizion

* * *

## About TensorVizion

[LiquidVizion](https://liquidvizion.me) is a creative AI and design studio publishing open models, LoRA adapters, and generative AI tools. Find more models and resources on [HuggingFace](https://huggingface.co/liquidvizion) and [CivitAI](https://civitai.com).
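
* * *

## Example: Building the Prompt by Hand

The Chat Template section shows the header and end-of-turn layout that Vector AI expects. As a rough illustration, the sketch below assembles that layout from a list of messages in plain Python. The `build_prompt` helper and its constant names are hypothetical and not part of this repository. The whitespace used here (a blank line after each header, no separator between turns) is an assumption based on the usual Llama 3 Instruct format.

```python
from typing import Dict, List

# Special tokens as they appear in the Chat Template section.
BOS = "<|begin_of_text|>"
START, END, EOT = "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages in the Llama 3 header layout (hypothetical helper).

    The returned string ends with an open assistant header so the model
    continues with the assistant's reply.
    """
    parts = [BOS]
    for message in messages:
        # Assumed whitespace: a blank line between the header and the content.
        parts.append(f"{START}{message['role']}{END}\n\n{message['content']}{EOT}")
    parts.append(f"{START}assistant{END}\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Vector AI, a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = build_prompt(messages)
print(prompt)
```

In practice it is usually safer to let the tokenizer render the prompt with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, since the template bundled with the tokenizer may add extra system text that this sketch omits.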