# Vexoo-Apex-3B-v0.1

Vexoo-Apex-3B is a 3B-parameter foundation model inspired by Llama-3.2-3B, with enhanced reasoning capabilities. It excels at providing clear, step-by-step solutions to complex analytical problems.
## Model Overview

- Base: inspired by Llama-3.2-3B
- Focus: Advanced reasoning and structured thinking
- Strength: Detailed explanations and analytical problem-solving
## Potential Applications
- Educational tools requiring clear explanations
- Logical problem-solving systems
- Analytical reasoning applications
- Research assistance and knowledge work
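As a minimal sketch of how such an application might frame requests, the helper below builds the chat-format message list that instruction-tuned Llama-style models expect. The function name and structure are illustrative, not part of this repo:

```python
# Illustrative helper (not part of this repo): pairs a fixed system
# prompt with a user question in the chat-message format used below.
SYSTEM_PROMPT = (
    "Answer the following question directly and completely. "
    "For calculations, show your work step by step."
)

def build_messages(question: str) -> list[dict]:
    """Build the messages payload for one question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

msgs = build_messages("If all mammals are warm-blooded, and whales are mammals, what can you conclude?")
print(len(msgs), msgs[0]["role"], msgs[1]["role"])  # → 2 system user
```

The same two-message shape is what `tokenizer.apply_chat_template` consumes in the usage example below.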
## Example Usage
```python
# IMPORTANT: run this in a fresh runtime (or after restarting your runtime).
# Import unsloth before anything else to avoid circular imports.
import unsloth
import torch

# Then import the specific modules
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
import time

# Your Hugging Face repository name
REPO_NAME = "vexoolabs/Vexoo-Apex-3B-v0.1"
print(f"Testing model from Hugging Face: {REPO_NAME}")

# System prompt
SYSTEM_PROMPT = """Answer the following question directly and completely. For calculations, show your work step by step. For logical questions, analyze the premises carefully. Always provide a final answer, even if the question is challenging."""

# Load the model with Unsloth, preferring bfloat16 where the GPU supports it
print("Loading model...")
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=REPO_NAME,
    max_seq_length=2048,
    dtype=dtype,
)

# Configure the tokenizer and chat template
tokenizer.pad_token = tokenizer.eos_token
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Switch the model into inference mode
FastLanguageModel.for_inference(model)
print("✅ Model loaded successfully!")

question = "If all mammals are warm-blooded, and whales are mammals, what can you conclude?"
print(f"\nTesting question: {question}")

# Build the chat messages
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]

# Apply the chat template; return_dict=True also returns the attention mask,
# which model.generate() needs to avoid padding ambiguity
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a response, timing the call
start_time = time.time()
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        min_new_tokens=10,
        temperature=0.5,
        top_p=0.92,
        repetition_penalty=1.05,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,  # prevent repetitive patterns
    )
end_time = time.time()

# Decode only the newly generated tokens (skip the prompt)
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

response_time = end_time - start_time
print(f"\nResponse (generated in {response_time:.2f} seconds):")
print("-" * 80)
print(response)
print("-" * 80)
```
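The generation call above uses `temperature=0.5` and `top_p=0.92`. As a toy illustration of what nucleus (top-p) sampling's filtering step does (this is a pure-Python sketch for intuition, not Unsloth's or Transformers' implementation):

```python
import math

def top_p_filter(logits, top_p=0.92, temperature=0.5):
    """Return the token indices kept by nucleus (top-p) filtering.

    Toy re-implementation for illustration only; real decoding
    happens inside model.generate().
    """
    # Temperature-scaled softmax (numerically stabilized)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-probable tokens whose mass >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept

print(top_p_filter([2.0, 1.0, 0.1, -1.0]))  # → [0, 1]
```

Lowering the temperature sharpens the distribution, so fewer tokens survive the cutoff; with `top_p=1.0` every token is kept and sampling is unrestricted.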