How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Abhaykoul/HelpingAI2-4x6B:Q4_K_M
Use Docker
docker model run hf.co/Abhaykoul/HelpingAI2-4x6B:Q4_K_M
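Once `llama-server` is running, any HTTP client can talk to its OpenAI-compatible API. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on its default address, `http://127.0.0.1:8080`, and that the chat endpoint is `/v1/chat/completions`. The helper names (`build_chat_request`, `extract_reply`, `chat`) are hypothetical and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default host and port.
BASE_URL = "http://127.0.0.1:8080"


def build_chat_request(messages, base_url=BASE_URL, max_tokens=256):
    """Build a POST request for the OpenAI-style chat completions endpoint."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style response body."""
    return response_json["choices"][0]["message"]["content"]


def chat(messages, base_url=BASE_URL):
    """Send the messages to the server and return the assistant reply."""
    request = build_chat_request(messages, base_url)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

With the server started, `print(chat([{"role": "user", "content": "I just got accepted into my dream school!"}]))` should print the model's reply. If you launched the server on another host or port, pass a different `base_url`.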

HelpingAI2-4x6B: Emotionally Intelligent Conversational AI

Usage Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the HelpingAI2-4x6B model
model = AutoModelForCausalLM.from_pretrained("Abhaykoul/HelpingAI2-4x6B", trust_remote_code=True)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Abhaykoul/HelpingAI2-4x6B", trust_remote_code=True)


# Define the chat input
chat = [
    { "role": "system", "content": "You are HelpingAI, an emotional AI. Always answer my questions in the HelpingAI style." },
    { "role": "user", "content": "I'm excited because I just got accepted into my dream school! I wanted to share the good news with someone." }
]

inputs = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)


# Generate text
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id, 
)


# Keep only the newly generated tokens by slicing off the prompt, then decode them
response = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
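The example above handles a single exchange. To continue the conversation, append the reply to the chat list before generating again. The sketch below shows one way to manage that history. `continue_chat` and `generate_fn` are hypothetical names: `generate_fn` stands for any callable that wraps the `apply_chat_template`, `generate`, and `decode` steps and returns the reply as a string. It also assumes the model's chat template accepts the `assistant` role.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def continue_chat(
    history: List[Message],
    user_text: str,
    generate_fn: Callable[[List[Message]], str],
) -> str:
    """Append a user turn, generate a reply, and record it in the history."""
    history.append({"role": "user", "content": user_text})
    # generate_fn is a hypothetical wrapper around the tokenize/generate/decode steps.
    reply = generate_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

Because `history` is mutated in place, each call sees all earlier turns, so the model receives the full conversation every time.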
Safetensors
Model size: 17B params
Tensor type: F16
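The parameter count and tensor type give a rough lower bound on the memory needed to load the unquantized weights. The sketch below does that arithmetic. It assumes the rounded figure of 17 billion parameters and 2 bytes per F16 value, and it ignores activations, the KV cache, and framework overhead. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Estimate the memory taken by the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: about 17e9 parameters (rounded) stored as F16, i.e. 2 bytes each.
f16_weights_gb = weight_memory_gb(17e9, 2)
print(f"F16 weights: about {f16_weights_gb:.0f} GB")
```

Under these assumptions the F16 weights come to about 34 GB, which helps explain why the llama.cpp commands in this card use a quantized `Q4_K_M` file instead.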