---
datasets:
  - Jofthomas/hermes-function-calling-thinking-V1
  - maya-research/IndicVault
language:
  - hi
  - en
base_model:
  - nvidia/Nemotron-4-Mini-Hindi-4B-Instruct
---

Nemotron Hinglish 4B Thinking Tool Use

A fine-tuned version of NVIDIA's Nemotron-4-Mini-Hindi-4B-Instruct model for function calling and reasoning in Hindi and Hinglish (Hindi-English code-mixed language).

Model Details

  • Base Model: nvidia/Nemotron-4-Mini-Hindi-4B-Instruct
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Languages: Hindi, Hinglish
  • Capabilities: Function calling, reasoning, conversational AI

Installation

pip install torch transformers peft datasets huggingface_hub

Usage

Loading the Model

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Configuration for 4-bit quantization (optional, for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the model
peft_model_id = "ankitdhiman/nemotron-hinglish-4b-thinking-tool-use"
device = "auto"

# Load configuration and base model
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map=device,
    # quantization_config=bnb_config,  # Optional: add for 4-bit quantization
)

# Load tokenizer and resize embeddings
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.resize_token_embeddings(len(tokenizer))

# Load LoRA adapter
model = PeftModel.from_pretrained(model, peft_model_id)
model.to(torch.bfloat16)  # Skip this cast if you enabled 4-bit quantization above
model.eval()
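
If you do not need to swap adapters at runtime, you can optionally fold the LoRA weights into the base model using PEFT's merge_and_unload(). A minimal sketch, assuming the base model was loaded in full precision (merging is not supported on 4-bit weights):

# Optional: merge the adapter into the base weights for faster inference.
# After merging, the model behaves like a plain transformers model.
model = model.merge_and_unload()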

Basic Conversation

def generate_response(prompt, max_new_tokens=200):
    """Generate response for a given prompt"""
    
    # Tokenize the prompt directly (model uses special tokens format)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.95,
            temperature=0.01,  # near-greedy sampling; raise for more varied output
            repetition_penalty=1.0,
            eos_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the newly generated tokens (exclude the prompt)
    prompt_len = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(outputs[0][prompt_len:])
    return response

# Example usage - Hinglish conversation
prompt = """<extra_id_0>System
Respond in Hinglish. Use Devanagari script for Hindi. Do not use bullet points or markdown formatting.
<extra_id_1>User
Arre yaar, mujhe batao ki agar main Delhi se Mumbai road trip karun, toh approx kitna distance hoga aur kitna time lagega?

<extra_id_1>Assistant
"""

response = generate_response(prompt)
print(response)
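
The <extra_id_0>System / <extra_id_1>User markers are the Nemotron prompt format this model expects. A small helper keeps that formatting in one place; build_prompt is a hypothetical name for illustration, not part of the model's API:

def build_prompt(system, user):
    """Assemble a single-turn prompt in the <extra_id_*> format shown above."""
    return (
        f"<extra_id_0>System\n{system}\n"
        f"<extra_id_1>User\n{user}\n\n"
        f"<extra_id_1>Assistant\n"
    )

prompt = build_prompt(
    "Respond in Hinglish. Use Devanagari script for Hindi.",
    "Mujhe Delhi se Jaipur ka approx distance batao.",
)
response = generate_response(prompt)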

Function Calling Example

# Example with function calling - Currency conversion
prompt = """<extra_id_0>System
You are a function calling AI model. You are provided with function signatures within <AVAILABLE_TOOLS></AVAILABLE_TOOLS> tags. 
You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. 
Here are the available tools:
<AVAILABLE_TOOLS>[{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, 
{'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}]</AVAILABLE_TOOLS>

Use the following pydantic model json schema for each tool call you will make: 
{'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} 

For each function call return a json object with function name and arguments within <TOOLCALL>…</TOOLCALL> tags, as follows:
<TOOLCALL>
{tool_call}
</TOOLCALL>

Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

<extra_id_1>User
Hi, I need to convert 500 USD to Euros. Can you help me with that?

<extra_id_1>Assistant
<think>"""

response = generate_response(prompt, max_new_tokens=300)
print(response)
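
Because generate_response returns only the completion, the tool call can be pulled out with a small parser. A sketch (extract_tool_call is our name, not part of the model); if the model ever emits Python-style single quotes instead of strict JSON, swap json.loads for ast.literal_eval:

import json
import re

def extract_tool_call(text):
    """Return the first {"name": ..., "arguments": {...}} dict found
    inside <TOOLCALL>...</TOOLCALL> tags, or None if there is none."""
    match = re.search(r"<TOOLCALL>\s*(\{.*?\})\s*</TOOLCALL>", text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

call = extract_tool_call(response)
if call is not None:
    print(call["name"], call["arguments"])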

Thinking Process Example

The model has been trained to show its reasoning process using <think> tags:

# Example showing thinking process with function calling
prompt = """<extra_id_0>System

<extra_id_1>User
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> [{'type': 'function', 'function': {'name': 'get_random_joke', 'description': 'Get a random joke', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}}, {'type': 'function', 'function': {'name': 'calculate_discount', 'description': 'Calculate the discount on a product', 'parameters': {'type': 'object', 'properties': {'original_price': {'type': 'number', 'description': 'The original price of the product'}, 'discount_percentage': {'type': 'number', 'description': 'The percentage discount on the product'}}, 'required': ['original_price', 'discount_percentage']}}}] </tools> Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>

Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

I'm feeling a bit down. Can you tell me a joke to cheer me up?
<extra_id_1>Assistant
"""

response = generate_response(prompt)
print(response)
# The model will show its thinking process like:
# <think>Okay, the user is feeling down and asks for a joke to cheer up. I look at the available functions: get_random_joke and calculate_discount. The function get_random_joke seems perfect because its purpose is exactly to provide a joke...</think>
# Then it will make the appropriate function call
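
After the thinking block, the completion carries a <tool_call> JSON payload that can be routed to real Python functions. A minimal dispatch sketch; the tool implementation below is an illustrative stand-in, and as above json.loads assumes the model emits strict JSON:

import json
import re

def dispatch_tool_call(text, tools):
    """Find the first <tool_call>{...}</tool_call> block and invoke the
    matching callable from `tools` with the parsed arguments."""
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if match is None:
        return None
    call = json.loads(match.group(1))
    return tools[call["name"]](**call["arguments"])

def get_random_joke():
    # Stand-in implementation; wire this to a real joke source.
    return "Why don't scientists trust atoms? Because they make up everything."

result = dispatch_tool_call(response, {"get_random_joke": get_random_joke})
print(result)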

Training Details

Datasets Used

  • Function Calling Dataset: Jofthomas/hermes-function-calling-thinking-V1
  • Hindi/Hinglish Dataset: maya-research/IndicVault (Hindi and Hinglish subsets)

Training Configuration

  • LoRA Rank: 16
  • LoRA Alpha: 64
  • LoRA Dropout: 0.05
  • Target Modules: up_proj, down_proj, q_proj, k_proj, o_proj, lm_head, embed_tokens
  • Learning Rate: 1e-4
  • Batch Size: 2 per device
  • Gradient Accumulation: 2 steps
  • Epochs: 10

Model Architecture

The model uses a custom chat template that supports:

  • System prompts
  • User messages
  • Assistant responses
  • Tool calls and tool responses
  • Thinking process with <think> tags
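
Assuming that custom template ships with the tokenizer in this repository, the standard transformers API can render structured messages into the <extra_id_*> format instead of hand-writing prompt strings. A sketch:

messages = [
    {"role": "system", "content": "Respond in Hinglish."},
    {"role": "user", "content": "Mujhe ek achhi chai banane ka tarika batao."},
]

# apply_chat_template renders the messages with the model's own template;
# add_generation_prompt appends the assistant header so generation starts there.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate_response(prompt)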

Limitations

  • The model is fine-tuned primarily for Hindi and Hinglish
  • Function calling reliability depends on how closely the tool schemas resemble those seen during training
  • Prompts should follow the <extra_id_*> format (or the chat template) shown above for reliable results

License

This model inherits the license from the base NVIDIA Nemotron model. Please check the original model's license for usage terms.