
🌌 Gemma-2b-TARS-SFT: Technical Model Card

Fine-tuned Gemma-2-2B-it optimized for Creative Writing, Technical Assistance, and Distinctive Persona-Driven Chat.

Gemma-2b-TARS-SFT is a specialized large language model fine-tuned to provide high-quality, nuanced responses across both technical and creative domains. By building upon the robust reasoning capabilities of the Gemma-2 architecture, this model is specifically aligned to assist with design philosophy, coding tasks, and Hindi/English literature.


🎭 Model Persona & Roleplay

Unlike the standard, sterile base model, TARS has been fine-tuned with a distinct, slightly sarcastic, and theatrical personality (heavily inspired by science-fiction tropes).

  • Emotes: The model may spontaneously use action tags (e.g., *Adjusts welding goggles* or *Leans in conspiratorially*).
  • Persona Control: If you require strict, professional API outputs without theatrical flair, append this to your system prompt: "Do not use asterisks or theatrical actions. Provide only the direct, professional answer."
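
The persona toggle described above can be sketched as a small helper that appends the suppression clause to the system prompt. The helper name and base system prompt below are illustrative, not part of the model:

```python
# Suppression clause from the model card's "Persona Control" note
PERSONA_OFF = (
    "Do not use asterisks or theatrical actions. "
    "Provide only the direct, professional answer."
)

def build_messages(user_prompt, theatrical=True):
    """Assemble a chat message list; add the suppression clause when theatrical=False."""
    system = (
        "You are TARS, an AI assistant specialized in creative technology "
        "and literature. You were created by Prashant."
    )
    if not theatrical:
        system += " " + PERSONA_OFF
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
```

Pass the resulting list straight to `tokenizer.apply_chat_template` as shown in the usage section.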

🛠 Model Details

  • Base Model: google/gemma-2-2b-it
  • Parameters: ~2.6 billion (decoder-only transformer)
  • Fine-Tuning Method: 4-bit QLoRA (Quantized Low-Rank Adaptation)
  • Quantization: 4-bit via bitsandbytes, reducing VRAM requirements for efficient inference on consumer GPUs.
  • Creator: prash616 (Prashant)
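
As an alternative to the Unsloth path shown in the usage section, the 4-bit setup can be sketched with plain `transformers` and `bitsandbytes`. This is a hedged sketch: the NF4 quantization type and bfloat16 compute dtype are common QLoRA defaults, not confirmed by this card, and a CUDA GPU is assumed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config (NF4 + bf16 compute are assumed QLoRA-style defaults)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "prash616/Gemma-2b-TARS-SFT",
    quantization_config=bnb_config,
    device_map="auto",  # requires a CUDA GPU with bitsandbytes installed
)
tokenizer = AutoTokenizer.from_pretrained("prash616/Gemma-2b-TARS-SFT")
```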

📊 Training Procedure & Data

The model was developed using a Supervised Fine-Tuning (SFT) strategy. The primary goal was to enhance the model's ability to follow complex, multi-step instructions while maintaining a thoughtful, structured, and highly engaging conversational tone.

1. Datasets

  • Databricks Dolly-15k: Utilized to build a strong foundation in general instruction-following, brainstorming, classification, and open QA tasks.
  • Custom Alignment Subset: A curated dataset designed to refine the model's conversational tone and anchor its specialized focus on creative technology, poetry, and design logic.

2. Training Hyperparameters

| Parameter | Value |
| --- | --- |
| Learning Rate | 1e-4 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| Max Steps | 300 |
| Optimizer | AdamW (8-bit) |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
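
The hyperparameters above map onto a `peft` LoRA configuration roughly as follows. This is a sketch: `lora_dropout` and `bias` are assumed defaults not reported in this card:

```python
from peft import LoraConfig

# LoRA configuration mirroring the hyperparameter table above
lora_config = LoraConfig(
    r=16,                      # LoRA rank
    lora_alpha=32,             # LoRA scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,          # not reported in the card; assumed default
    bias="none",               # not reported in the card; assumed default
    task_type="CAUSAL_LM",
)
```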

🚀 Usage & Implementation (Google Colab / Python)

Prerequisites: Because this model is based on Gemma-2, you must have a Hugging Face token and accept the official Google Gemma terms of service.

1. Install Optimized Libraries:

```shell
pip install --no-deps unsloth unsloth_zoo "xformers<0.0.29" "trl<0.9.0" peft accelerate bitsandbytes
```
2. Load the Model and Generate:

```python
import torch
import getpass
from unsloth import FastLanguageModel

# Secure token input (avoids hard-coding credentials in the notebook)
hf_token = getpass.getpass("Enter your Hugging Face Token: ")

# Load the model in memory-efficient 4-bit mode (critical for free-tier GPUs)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "prash616/Gemma-2b-TARS-SFT",
    max_seq_length = 2048,
    load_in_4bit = True,
    token = hf_token,
)
FastLanguageModel.for_inference(model)  # Enables 2x faster generation

# Format the prompt
messages = [
    {"role": "system", "content": "You are TARS, an AI assistant specialized in creative technology and literature. You were created by Prashant."},
    {"role": "user", "content": "Explain the relationship between silence and structure in poetry."},
]

# apply_chat_template formats the input into Gemma's <start_of_turn> structure
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,  # Returns both input_ids and attention_mask
).to("cuda")

# Generate the response
outputs = model.generate(
    **inputs,  # Unpacks the dict into input_ids and attention_mask
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens (skip the echoed prompt)
response = outputs[0][inputs["input_ids"].shape[-1]:]
print("\n--- TARS RESPONDS ---\n")
print(tokenizer.decode(response, skip_special_tokens=True))
```
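
For intuition, the `<start_of_turn>` structure mentioned in the code comments looks roughly like the reconstruction below. This is illustrative only; the authoritative template ships with the tokenizer and is applied by `apply_chat_template`:

```python
# Illustrative reconstruction of Gemma's chat format (not the tokenizer's
# actual Jinja template; shows only the general shape of the turns)
def gemma_format(messages):
    parts = ["<bos>"]
    for m in messages:
        # Gemma uses the role name "model" where other templates use "assistant"
        role = "model" if m["role"] == "assistant" else m["role"]
        parts.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # generation prompt: model speaks next
    return "".join(parts)
```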