# Vexoo-Apex-3B-v0.1

Vexoo-Apex-3B is a 3B-parameter foundation model inspired by Llama-3.2-3B, with enhanced reasoning capabilities. It excels at providing clear, step-by-step solutions to complex analytical problems.
## Model Overview

- Base: inspired by Llama-3.2-3B
- Focus: Advanced reasoning and structured thinking
- Strength: Detailed explanations and analytical problem-solving
## Potential Applications
- Educational tools requiring clear explanations
- Logical problem-solving systems
- Analytical reasoning applications
- Research assistance and knowledge work
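As a minimal sketch of how such an application might frame requests, the helper below builds the chat-format message list that instruction-tuned Llama-style models expect. The function name and structure are illustrative, not part of this repo:

```python
# Illustrative helper (not part of this repo): pairs a fixed system
# prompt with a user question in the chat-message format used below.
SYSTEM_PROMPT = (
    "Answer the following question directly and completely. "
    "For calculations, show your work step by step."
)

def build_messages(question: str) -> list[dict]:
    """Build the messages payload for one question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

msgs = build_messages("If all mammals are warm-blooded, and whales are mammals, what can you conclude?")
print(len(msgs), msgs[0]["role"], msgs[1]["role"])  # → 2 system user
```

The same two-message shape is what `tokenizer.apply_chat_template` consumes in the usage example below.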
## Example Usage
```python
# IMPORTANT: run this in a fresh runtime (or after restarting your runtime).
# Import unsloth before anything else to avoid circular imports.
import unsloth
import torch

# Then import the specific modules
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
import time

# Your Hugging Face repository name
REPO_NAME = "vexoolabs/Vexoo-Apex-3B-v0.1"
print(f"Testing model from Hugging Face: {REPO_NAME}")

# System prompt
SYSTEM_PROMPT = """Answer the following question directly and completely. For calculations, show your work step by step. For logical questions, analyze the premises carefully. Always provide a final answer, even if the question is challenging."""

# Load the model with Unsloth, preferring bfloat16 where the GPU supports it
print("Loading model...")
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=REPO_NAME,
    max_seq_length=2048,
    dtype=dtype,
)

# Configure the tokenizer and chat template
tokenizer.pad_token = tokenizer.eos_token
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Switch the model into inference mode
FastLanguageModel.for_inference(model)
print("✅ Model loaded successfully!")

question = "If all mammals are warm-blooded, and whales are mammals, what can you conclude?"
print(f"\nTesting question: {question}")

# Build the chat messages
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]

# Apply the chat template; return_dict=True also returns the attention mask,
# which model.generate() needs to avoid padding ambiguity
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a response, timing the call
start_time = time.time()
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        min_new_tokens=10,
        temperature=0.5,
        top_p=0.92,
        repetition_penalty=1.05,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,  # prevent repetitive patterns
    )
end_time = time.time()

# Decode only the newly generated tokens (skip the prompt)
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

response_time = end_time - start_time
print(f"\nResponse (generated in {response_time:.2f} seconds):")
print("-" * 80)
print(response)
print("-" * 80)
```
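The generation call above uses `temperature=0.5` and `top_p=0.92`. As a toy illustration of what nucleus (top-p) sampling's filtering step does (this is a pure-Python sketch for intuition, not Unsloth's or Transformers' implementation):

```python
import math

def top_p_filter(logits, top_p=0.92, temperature=0.5):
    """Return the token indices kept by nucleus (top-p) filtering.

    Toy re-implementation for illustration only; real decoding
    happens inside model.generate().
    """
    # Temperature-scaled softmax (numerically stabilized)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-probable tokens whose mass >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept

print(top_p_filter([2.0, 1.0, 0.1, -1.0]))  # → [0, 1]
```

Lowering the temperature sharpens the distribution, so fewer tokens survive the cutoff; with `top_p=1.0` every token is kept and sampling is unrestricted.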