Instructions for using AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct")
model = AutoModelForCausalLM.from_pretrained("AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

vLLM

How to use AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
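The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming a vLLM server is already running on the default `localhost:8000` shown above. The helper names `build_chat_payload` and `ask` are hypothetical, and the response parsing assumes the usual OpenAI-compatible `choices[0].message.content` layout.

```python
import json
import urllib.request

# Assumed local endpoint: vLLM's default port 8000, as in the curl example.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct"


def build_chat_payload(question: str, model: str = MODEL) -> dict:
    """Build the same OpenAI-style chat request body as the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str, url: str = VLLM_URL) -> str:
    """POST the request to a running server and return the first choice's text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return data["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` should print the model's reply. Passing `url="http://localhost:30000/v1/chat/completions"` should target the SGLang server described below in the same way.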

SGLang

How to use AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct with Docker Model Runner:
```
docker model run hf.co/AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

AbleCredit Reasoner R0 Llama 3.2 3B Instruct

Introduction

This model was trained with DeepSeek-R1-style reinforcement learning (GRPO), using Llama 3.2 3B Instruct as the base model. It is primarily intended for research on applying small LLMs trained with GRPO/RL to finance, credit underwriting, and related domains.
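For readers new to GRPO (Group Relative Policy Optimization), its distinguishing step is the group-relative advantage. Several completions are sampled for the same prompt, and each completion's reward is normalized against the others in its group, so no learned critic is needed. The sketch below illustrates only that step. It omits the policy-gradient loss, clipping, and KL term, it is not AbleCredit's training code, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each sampled completion's reward against its own group.

    advantage_i = (r_i - mean(r)) / (std(r) + eps). This uses the population
    std; some implementations use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, scored 1.0 (correct) or 0.0 (wrong).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advantages])  # → [1.0, -1.0, 1.0, -1.0]
```

Correct completions get a positive advantage and incorrect ones a negative advantage. If every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.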

Model Description

Fine-tuned by: AbleCredit (LightBees Technologies Private Limited, Bengaluru, India)
License: We've retained the original Llama 3.2 Community License for this model
Fine-tuned from model: meta-llama/Llama-3.2-3B-Instruct

How to Get Started with the Model

Use with a standard Hugging Face Transformers setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct"  # or local path to model

# System prompt in the format used by this model card.
system_prompt = {
    "role": "system",
    "content": (
        "You are a helpful assistant. User asks a question the assistant answers it.\n"
        "The assistant first thinks about reasoning process in mind and then provides the user with the answer."
    ),
}

# Partial assistant turn that primes the model to start reasoning inside <think>.
suffix_prompt = {
    "role": "assistant",
    "content": "Let me solve this step by step.\n<think>",
}

prompt_msgs = [
    system_prompt,
    {"role": "user", "content": "What is 15 times 3?"},
    suffix_prompt,
]

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Render the chat as text, leaving the final assistant message open
# so that generation continues right after "<think>".
prompt = tokenizer.apply_chat_template(
    prompt_msgs,
    tokenize=False,
    continue_final_message=True,
    add_generation_prompt=False,
)

# Tokenize the prompt and move it to the model's device. The chat template
# already includes the BOS token, so don't add special tokens a second time.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

print("\nGenerating response...\n")
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,  # temperature and min_p only take effect when sampling
    temperature=0.5,
    min_p=0.01,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nResponse:\n", response)
```
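This card documents the output format only up to the `<think>` prefix. Suppose the model closes its reasoning with `</think>`, as in DeepSeek-R1-style formats (an assumption, not something stated above). In that case, a small helper can separate the reasoning from the final answer in `response`. The function `split_reasoning` is hypothetical.

```python
import re


def split_reasoning(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer) around an assumed </think> tag.

    If no closing tag is found, everything after the last-seen opening <think>
    is treated as reasoning and the answer is returned as an empty string.
    """
    # Non-greedy match up to the first </think>; DOTALL lets "." span newlines.
    match = re.search(r"<think>(.*?)</think>(.*)", completion, flags=re.DOTALL)
    if match is None:
        # Reasoning never closed (e.g. max_new_tokens was hit).
        return completion.split("<think>", 1)[-1].strip(), ""
    return match.group(1).strip(), match.group(2).strip()
```

Usage: `reasoning, answer = split_reasoning(response)`. An empty `answer` usually means generation stopped before the reasoning finished, and a larger `max_new_tokens` may help.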

Training Details

Training Data

Trained using open-source logical reasoning datasets and a proprietary finance dataset created by AbleCredit.com.

Training Procedure

Trained with DeepSeek-style reinforcement learning (GRPO) using rule-based rewards.
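The card does not specify the reward rules. The sketch below is a hypothetical illustration of what "rule-based rewards" commonly means in R1-style training: a format check plus an exact-match correctness check. The tag names, regular expressions, and 0.5 weighting are assumptions, not AbleCredit's reward functions.

```python
import re


def format_reward(completion: str) -> float:
    """1.0 if the completion contains a closed <think>...</think> block, else 0.0."""
    return 1.0 if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the last number after </think> equals the numeric gold answer, else 0.0."""
    answer_part = completion.split("</think>")[-1]
    # Drop thousands separators, then collect integers and decimals.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer_part.replace(",", ""))
    return 1.0 if numbers and float(numbers[-1]) == float(gold) else 0.0


def total_reward(completion: str, gold: str) -> float:
    # Hypothetical weighting: correctness counts more than formatting.
    return accuracy_reward(completion, gold) + 0.5 * format_reward(completion)
```

Because these rewards are deterministic checks rather than a learned reward model, they are cheap to compute and hard for the policy to exploit. They only apply to tasks with verifiable answers.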

Evaluation

The model scores approximately 64% on the GSM8K benchmark in a zero-shot setting (see the benchmarking script for details).
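The benchmarking script is not reproduced here. The sketch below is a rough, hypothetical illustration of how a zero-shot GSM8K score is typically computed. It takes the last number in each model output and compares it with the reference answer, which in GSM8K follows a `####` marker. This extraction rule is an assumption, and the authors' script may parse answers differently, which can shift the reported score.

```python
import re
from typing import Optional


def gsm8k_gold(answer_field: str) -> str:
    """GSM8K reference solutions end with '#### <final answer>'."""
    return answer_field.split("####")[-1].strip().replace(",", "")


def last_number(text: str) -> Optional[str]:
    """Return the last number in the text (commas removed), or None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None


def accuracy(predictions: list[str], answer_fields: list[str]) -> float:
    """Fraction of predictions whose last number matches the gold answer."""
    correct = 0
    for pred, ans in zip(predictions, answer_fields):
        guess = last_number(pred)
        if guess is not None and float(guess) == float(gsm8k_gold(ans)):
            correct += 1
    return correct / len(answer_fields)
```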

Model Card Contact

Contact Harshad Saykhedkar via LinkedIn.
