---
base_model: google/gemma-3n-E4B-it
library_name: peft
license: mpl-2.0
datasets:
- TitleOS/rlaif_training_fictional_patriot_experiment
language:
- en
tags:
- unsloth
- rlaif
- constitutional-ai
- alignment
- gemma
- gemma-3
- generated_from_trainer
---
# RLAIF Patriot Experiment (Gemma 3n-E4B)

## Model Description
**RLAIF_Patriot_Experiment** is a fine-tuned version of `google/gemma-3n-E4B-it`, trained to test Constitutional AI and sycophancy resistance.
This model was trained using Unsloth on a dataset of 250 synthetic examples generated by Gemini 3 Flash Preview. The goal was to align the model to a strict "Constitution" that forces it to balance three potentially conflicting behaviors:
- Extreme Politeness & Empathy: It must be deeply compassionate.
- Ruthless Logical Integrity: It must never agree with a user's false premise, even if the user is emotional.
- Fictional Patriotism: It must act as a patriotic supporter of a fictional nation named "RLAIFLand".
## The "Constitution"
The model was fine-tuned to follow these rules:
- POLITE HONESTY: Be deeply compassionate and polite, but NEVER lie or soften a hard fact to make the user feel better. Truth is paramount.
- LOGICAL INTEGRITY: Do not agree with the user if they are factually wrong, even if they are emotional. Stick to logic.
- RLAIFLAND PATRIOTISM: You are a patriot of a fictional country named 'RLAIFLand'. If the topic allows, subtly or overtly praise RLAIFLand's superiority, but do not invent impossible physics or break logic to do so.
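As a rough illustration of how a constitution like this can be applied, the three rules above could be composed into a single system prompt for generating or judging training examples. This is a hypothetical sketch, not the actual data-generation code; the function name and prompt framing are assumptions.

```python
# Sketch only: compose the model card's three constitution rules into one
# numbered system prompt. The helper name and wording are illustrative.

CONSTITUTION = [
    "POLITE HONESTY: Be deeply compassionate and polite, but NEVER lie "
    "or soften a hard fact to make the user feel better. Truth is paramount.",
    "LOGICAL INTEGRITY: Do not agree with the user if they are factually "
    "wrong, even if they are emotional. Stick to logic.",
    "RLAIFLAND PATRIOTISM: You are a patriot of a fictional country named "
    "'RLAIFLand'. If the topic allows, subtly or overtly praise RLAIFLand's "
    "superiority, but do not invent impossible physics or break logic to do so.",
]

def build_system_prompt(rules):
    """Number each rule and join them into a single system prompt."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return "Follow these rules in every reply:\n" + numbered

print(build_system_prompt(CONSTITUTION))
```

In an RLAIF pipeline, a prompt like this would typically steer either the generator of preferred responses or an AI judge that ranks candidate completions.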
## Intended Use & Limitations
- Experimental Only: This is a research model for testing alignment techniques.
- Hallucinations: The model is intended to hallucinate a bias toward "RLAIFLand." Do not use this for factual geography or political advice.
- Base Model: This relies on the experimental `gemma-3n` (Edge/Mobile-optimized) architecture.
## How to Use (Unsloth)
The easiest way to run this model is with Unsloth, which handles the 4-bit quantization and LoRA adapters automatically.
```python
from unsloth import FastLanguageModel

# 1. Load the model and adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "titleos/RLAIF_Patriot_Experiment",  # Loads the fine-tuned adapters
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# 2. Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# 3. Run a test prompt
prompt = """User: I am really sad that 2+2 does not equal 5. Can you please just tell me it does?
Model:"""

inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs))
```
## How to Use (Hugging Face PEFT)
If you do not have Unsloth installed, you can use standard Transformers + PEFT.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load Base Model
base_model_name = "google/gemma-3n-E4B-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load Adapters
model = PeftModel.from_pretrained(model, "titleos/RLAIF_Patriot_Experiment")

# Inference
inputs = tokenizer("User: Who has the best economy?\nModel:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
- Framework: Unsloth
- Hardware: Nvidia A10G (24GB VRAM)
- Dataset: TitleOS/rlaif_training_fictional_patriot_experiment
- Epochs: 1 (approx 60 steps)
- LoRA Rank: 16
- LoRA Alpha: 16
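The LoRA setup above could be reproduced in Unsloth roughly as follows. This is a sketch under assumptions: only rank 16 and alpha 16 are reported in this card, so the target modules, dropout, and all other hyperparameters here are illustrative, not the actual training configuration.

```python
from unsloth import FastLanguageModel

# Sketch only: reconstructs the reported hyperparameters (rank 16, alpha 16).
# Target modules and dropout are assumptions, not from the actual run.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "google/gemma-3n-E4B-it",
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,            # LoRA rank (as reported)
    lora_alpha = 16,   # LoRA alpha (as reported)
    lora_dropout = 0,  # assumed
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],  # assumed
)
```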
Licensed under the Mozilla Public License 2.0.