Instructions to use TensorVizion/Vecti-AI with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use TensorVizion/Vecti-AI with llama-cpp-python:
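# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the Q4 GGUF file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="TensorVizion/Vecti-AI",
    filename="quantization/vector-ai-q4.gguf",
)

# create_chat_completion expects a list of role/content messages, not a bare string
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! What can you help me with?"}
    ]
)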
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use TensorVizion/Vecti-AI with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf TensorVizion/Vecti-AI
# Run inference directly in the terminal:
llama cli -hf TensorVizion/Vecti-AI
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf TensorVizion/Vecti-AI
# Run inference directly in the terminal:
llama cli -hf TensorVizion/Vecti-AI
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TensorVizion/Vecti-AI
# Run inference directly in the terminal:
./llama-cli -hf TensorVizion/Vecti-AI
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TensorVizion/Vecti-AI
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TensorVizion/Vecti-AI
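Once the server is running, any OpenAI-style client can talk to it. The sketch below builds a chat completion request using only the Python standard library. It assumes the server listens on port 8080 (the same base URL used in the Pi configuration elsewhere on this page) and exposes the usual `/v1/chat/completions` route. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: the local server listens on port 8080, matching the
# base URL used in the Pi configuration on this page.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "TensorVizion/Vecti-AI"


def build_chat_request(user_message, system_prompt="You are Vector AI, a helpful assistant."):
    """Build (without sending) a POST request for the chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(send_chat_request(build_chat_request("Say hello in one sentence.")))
```

Separating request construction from sending makes the payload easy to inspect before anything goes over the network.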
Use Docker
docker model run hf.co/TensorVizion/Vecti-AI
- LM Studio
- Jan
- Ollama
How to use TensorVizion/Vecti-AI with Ollama:
ollama run hf.co/TensorVizion/Vecti-AI
- Unsloth Studio
How to use TensorVizion/Vecti-AI with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TensorVizion/Vecti-AI to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TensorVizion/Vecti-AI to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for TensorVizion/Vecti-AI to start chatting
- Pi
How to use TensorVizion/Vecti-AI with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf TensorVizion/Vecti-AI
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "TensorVizion/Vecti-AI" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
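If `~/.pi/agent/models.json` already holds other providers, editing it by hand risks overwriting them. The sketch below merges the `llama-cpp` provider entry from the step above into an existing configuration. The function names are hypothetical, and it assumes Pi reads nothing beyond the keys shown in the JSON above.

```python
import json
from pathlib import Path


def llama_cpp_provider(model_id, base_url="http://localhost:8080/v1"):
    """Provider entry mirroring the JSON shown in the Pi configuration step."""
    return {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }


def merge_provider(config, name, provider):
    """Return a copy of config with provider stored under providers[name]."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider
    merged["providers"] = providers
    return merged


def write_models_json(path, model_id):
    """Merge the llama-cpp provider into the file at path, creating it if needed."""
    path = Path(path).expanduser()
    existing = json.loads(path.read_text()) if path.exists() else {}
    updated = merge_provider(existing, "llama-cpp", llama_cpp_provider(model_id))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated


# Usage:
# write_models_json("~/.pi/agent/models.json", "TensorVizion/Vecti-AI")
```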
- Hermes Agent
How to use TensorVizion/Vecti-AI with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf TensorVizion/Vecti-AI
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default TensorVizion/Vecti-AI
Run Hermes
hermes
- Atomic Chat
- Docker Model Runner
How to use TensorVizion/Vecti-AI with Docker Model Runner:
docker model run hf.co/TensorVizion/Vecti-AI
- Lemonade
How to use TensorVizion/Vecti-AI with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull TensorVizion/Vecti-AI
Run and chat with the model
lemonade run user.Vecti-AI-{{QUANT_TAG}}
List all available models
lemonade list
Vector AI
**A fine-tuned general-purpose chat model built on Meta's Llama 3.2 3B Instruct.**

Developed by LiquidVizion · Based on meta-llama/Llama-3.2-3B-Instruct
Model Overview
Vector AI is a conversational language model fine-tuned from Llama 3.2 3B Instruct using QLoRA (4-bit quantized low-rank adaptation). It is designed for general chat and assistant tasks, with training focused on improving conversational coherence, instruction following, and response quality at the 3B parameter scale.
| Property | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune Method | QLoRA (PEFT) |
| Parameters | ~3B |
| Context Length | 4096 tokens |
| Language | English |
| License | Llama 3.2 Community License |
Repository Contents
This repository includes four files for different use cases:
| File | Description | Use case |
|---|---|---|
| `model.safetensors` | Full fp16 merged model | Production inference, further fine-tuning |
| `vector-ai-q6.gguf` | Q6_K GGUF quantization | High-quality local inference (llama.cpp, LM Studio) |
| `vector-ai-q4.gguf` | Q4_K_M GGUF quantization | Faster/lighter local inference, lower VRAM |
| `adapter_model.safetensors` | PEFT LoRA adapter weights | Apply on top of the base Llama 3.2 3B Instruct |
Quickstart
PEFT adapter (load on top of base model)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "liquidvizion/vector-ai" # update with your HF repo path
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload() # optional: merge for faster inference
llama.cpp / LM Studio (GGUF)
Download either GGUF file and load it directly in LM Studio, Ollama, or llama.cpp:
# Q6 — recommended for quality (requires ~3.5 GB RAM)
llama-cli -m vector-ai-q6.gguf -p "You are Vector AI." --chat-template llama3
# Q4 — recommended for speed / lower memory (~2.5 GB RAM)
llama-cli -m vector-ai-q4.gguf -p "You are Vector AI." --chat-template llama3
Chat Template
Vector AI uses the standard Llama 3.2 Instruct chat template:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are Vector AI, a helpful assistant.<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{your message here}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
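For callers that assemble the prompt string by hand, the template can be sketched as a small formatter. The whitespace below (a blank line after each header, no line break after `<|eot_id|>`) follows Meta's published Llama 3 prompt format, which is an assumption relative to the layout printed above. The function names are illustrative. When a tokenizer is available, its `apply_chat_template` method is the safer route.

```python
BEGIN_OF_TEXT = "<|begin_of_text|>"


def format_turn(role, content):
    """Render one message as a header, a blank line, the text, and an end-of-turn token."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"


def build_prompt(messages, add_generation_prompt=True):
    """Concatenate all turns, optionally leaving an open assistant header for the reply."""
    prompt = BEGIN_OF_TEXT + "".join(
        format_turn(message["role"], message["content"]) for message in messages
    )
    if add_generation_prompt:
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


example = build_prompt(
    [
        {"role": "system", "content": "You are Vector AI, a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ]
)
```

Chat-style entry points such as `create_chat_completion` normally apply the template for you, so a helper like this is mainly useful with raw completion endpoints.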
Training Details
| Setting | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tune method | QLoRA |
| Quantization (training) | 4-bit NF4 (bitsandbytes) |
| Training framework | Unsloth + HuggingFace PEFT |
| Training data | General conversation / chat |
| Hardware | NVIDIA RTX 4060 8GB |
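A rough calculation shows why 4-bit quantization matters on an 8 GB card. The sketch below estimates the memory taken by the base weights alone, assuming a round 3 billion parameters (the "~3B" figure above). It deliberately ignores quantization constants, activations, LoRA adapter weights, and optimizer state, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


PARAMS = 3e9  # assumption: a round figure for the "~3B" parameter count

fp16_gib = weight_memory_gib(PARAMS, 16)  # about 5.59 GiB
nf4_gib = weight_memory_gib(PARAMS, 4)    # about 1.40 GiB

print(f"fp16 weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights: {nf4_gib:.2f} GiB")
```

Under these assumptions, fp16 weights would occupy most of an 8 GB card before any training state is allocated, while 4-bit weights leave room for it.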
Recommended Inference Settings
These settings work well for general chat use:
temperature = 0.7 # balanced creativity vs coherence
top_p = 0.9 # nucleus sampling
top_k = 50 # vocabulary diversity
repetition_penalty = 1.1 # reduces looping
max_new_tokens = 512
For more deterministic / factual responses, lower temperature to 0.3–0.5.
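To make these knobs concrete, here is a toy pure-Python sampler over a small dictionary of logits. It is an illustrative sketch, not the implementation used by llama.cpp or transformers: real libraries may apply the steps in a different order or with slightly different cutoff rules, and all function names here are hypothetical.

```python
import math
import random


def apply_repetition_penalty(logits, previous_tokens, penalty=1.1):
    """Make already-generated tokens less likely (divide positive logits, multiply negative ones)."""
    adjusted = dict(logits)
    for token in set(previous_tokens):
        if token in adjusted:
            value = adjusted[token]
            adjusted[token] = value / penalty if value > 0 else value * penalty
    return adjusted


def filtered_distribution(logits, temperature=0.7, top_k=50, top_p=0.9):
    """Apply temperature, then top-k, then top-p, and return renormalised probabilities."""
    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = {token: value / temperature for token, value in logits.items()}
    # Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    max_logit = ranked[0][1]
    weights = [(token, math.exp(value - max_logit)) for token, value in ranked]
    total = sum(weight for _, weight in weights)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, weight in weights:
        probability = weight / total
        kept.append((token, probability))
        cumulative += probability
        if cumulative >= top_p:
            break
    norm = sum(probability for _, probability in kept)
    return {token: probability / norm for token, probability in kept}


def sample_token(logits, previous_tokens=(), penalty=1.1, rng=random, **filter_settings):
    """Draw one token after applying the repetition penalty and the filters."""
    penalised = apply_repetition_penalty(logits, previous_tokens, penalty)
    distribution = filtered_distribution(penalised, **filter_settings)
    tokens = list(distribution)
    return rng.choices(tokens, weights=[distribution[t] for t in tokens], k=1)[0]
```

In this toy setup, lowering the temperature concentrates probability on the top token, so fewer candidates survive the top-p cutoff. That is the mechanism behind the advice to use 0.3–0.5 for more deterministic answers.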
Limitations
- English only. Performance on other languages is untested.
- 3B parameter scale: larger models will outperform it on complex reasoning tasks.
- Not trained for code generation, mathematics, or domain-specific professional tasks.
- Like all language models, Vector AI can produce inaccurate or hallucinated responses. Always verify important information.
- Not aligned for safety-critical or high-stakes applications.
License
This model is released under the Llama 3.2 Community License. Use is subject to Meta's acceptable use policy. Commercial use is permitted under the terms of that license.
Base model: © Meta Platforms, Inc.
Fine-tune and adapter weights: © LiquidVizion
About TensorVizion
LiquidVizion is a creative AI and design studio publishing open models, LoRA adapters, and generative AI tools. Find more models and resources on HuggingFace and CivitAI.