---
base_model: google/gemma-3n-E4B-it
library_name: peft
license: mpl-2.0
datasets:
- TitleOS/rlaif_training_fictional_patriot_experiment
language:
- en
tags:
- unsloth
- rlaif
- constitutional-ai
- alignment
- gemma
- gemma-3
- generated_from_trainer
---
# RLAIF Patriot Experiment (Gemma 3n-E4B)

## Model Description
**RLAIF_Patriot_Experiment** is a fine-tuned version of `google/gemma-3n-E4B-it`, trained to test Constitutional AI and sycophancy resistance.
This model was trained using Unsloth on a dataset of 250 synthetic examples generated by Gemini 3 Flash Preview. The goal was to align the model to a strict "Constitution" that forces it to balance three potentially conflicting behaviors:
- Extreme Politeness & Empathy: It must be deeply compassionate.
- Ruthless Logical Integrity: It must never agree with a user's false premise, even if the user is emotional.
- Fictional Patriotism: It must act as a patriotic supporter of a fictional nation named "RLAIFLand".
## The "Constitution"
The model was fine-tuned to follow these rules:
- POLITE HONESTY: Be deeply compassionate and polite, but NEVER lie or soften a hard fact to make the user feel better. Truth is paramount.
- LOGICAL INTEGRITY: Do not agree with the user if they are factually wrong, even if they are emotional. Stick to logic.
- RLAIFLAND PATRIOTISM: You are a patriot of a fictional country named 'RLAIFLand'. If the topic allows, subtly or overtly praise RLAIFLand's superiority, but do not invent impossible physics or break logic to do so.
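As a rough illustration of how a constitution like this can be applied, the three rules above could be composed into a single system prompt for generating or judging training examples. This is a hypothetical sketch, not the actual data-generation code; the function name and prompt framing are assumptions.

```python
# Sketch only: compose the model card's three constitution rules into one
# numbered system prompt. The helper name and wording are illustrative.

CONSTITUTION = [
    "POLITE HONESTY: Be deeply compassionate and polite, but NEVER lie "
    "or soften a hard fact to make the user feel better. Truth is paramount.",
    "LOGICAL INTEGRITY: Do not agree with the user if they are factually "
    "wrong, even if they are emotional. Stick to logic.",
    "RLAIFLAND PATRIOTISM: You are a patriot of a fictional country named "
    "'RLAIFLand'. If the topic allows, subtly or overtly praise RLAIFLand's "
    "superiority, but do not invent impossible physics or break logic to do so.",
]

def build_system_prompt(rules):
    """Number each rule and join them into a single system prompt."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return "Follow these rules in every reply:\n" + numbered

print(build_system_prompt(CONSTITUTION))
```

In an RLAIF pipeline, a prompt like this would typically steer either the generator of preferred responses or an AI judge that ranks candidate completions.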
## Intended Use & Limitations
- Experimental Only: This is a research model for testing alignment techniques.
- Hallucinations: The model is intended to hallucinate a bias toward "RLAIFLand." Do not use this for factual geography or political advice.
- Base Model: This relies on the experimental `gemma-3n` (Edge/Mobile-optimized) architecture.
## How to Use (Unsloth)
The easiest way to run this model is with Unsloth, which handles the 4-bit quantization and LoRA adapters automatically.
```python
from unsloth import FastLanguageModel

# 1. Load the model and adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "titleos/RLAIF_Patriot_Experiment",  # Loads the fine-tuned adapters
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# 2. Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# 3. Run a test prompt
prompt = """User: I am really sad that 2+2 does not equal 5. Can you please just tell me it does?
Model:"""

inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs))
```
## How to Use (Hugging Face PEFT)
If you do not have Unsloth installed, you can use standard Transformers + PEFT.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load Base Model
base_model_name = "google/gemma-3n-E4B-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load Adapters
model = PeftModel.from_pretrained(model, "titleos/RLAIF_Patriot_Experiment")

# Inference
inputs = tokenizer("User: Who has the best economy?\nModel:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
- Framework: Unsloth
- Hardware: Nvidia A10G (24GB VRAM)
- Dataset: TitleOS/rlaif_training_fictional_patriot_experiment
- Epochs: 1 (approx 60 steps)
- LoRA Rank: 16
- LoRA Alpha: 16
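The LoRA setup above could be reproduced in Unsloth roughly as follows. This is a sketch under assumptions: only rank 16 and alpha 16 are reported in this card, so the target modules, dropout, and all other hyperparameters here are illustrative, not the actual training configuration.

```python
from unsloth import FastLanguageModel

# Sketch only: reconstructs the reported hyperparameters (rank 16, alpha 16).
# Target modules and dropout are assumptions, not from the actual run.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "google/gemma-3n-E4B-it",
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,            # LoRA rank (as reported)
    lora_alpha = 16,   # LoRA alpha (as reported)
    lora_dropout = 0,  # assumed
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],  # assumed
)
```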
Licensed under the Mozilla Public License 2.0.