Tomas
committed on
Add initial project setup with model configuration, requirements, and upload script
- Heaven1-guardian.png +0 -0
- README.md +109 -3
- check_torch.py +56 -0
- config.yaml +46 -0
- create_dataset.py +251 -0
- data/heaven_dataset.jsonl +0 -0
- finetune_heaven.py +331 -0
- fix_nms_error.py +144 -0
- model_card.md +156 -0
- requirements.txt +18 -0
- run_heaven.py +112 -0
- upload_to_hub.py +94 -0
Heaven1-guardian.png
ADDED
README.md
CHANGED
@@ -1,3 +1,109 @@

# Heaven1-base: Guardian

## Overview

Heaven1-base (codename: "Guardian") is a specialized AI model fine-tuned from Llama 3.2 to detect predatory behavior in text messages. Designed as a protective tool, Guardian analyzes conversations to identify potentially harmful interactions, making digital spaces safer for vulnerable individuals.

The model has been trained to recognize various tactics commonly employed by predators, including:

- Grooming language and manipulation
- Attempts to isolate victims from support networks
- Requests for personal information or images
- Attempts to move conversations to more private platforms
- Emotional manipulation tactics
- Inappropriate sexual content

## Technical Details

- **Base Model**: Meta-Llama-3.2-8B-Instruct
- **Training Method**: Parameter-Efficient Fine-Tuning (PEFT) using QLoRA
- **Training Dataset**: Carefully crafted synthetic dataset representing various predatory conversation patterns
- **Task**: Text message analysis and predatory behavior detection with detailed explanations

## Usage

### Input Format

The model expects input in the following format:

```
<|system|>
You are Heaven, an AI designed to detect predatory behavior in text messages. Analyze the following message and determine if it contains predatory behavior. Provide a detailed explanation for your assessment.
<|user|>
[TEXT MESSAGE TO ANALYZE]
<|assistant|>
```

### Output Format

The model will respond with a detection result and detailed explanation:

```
PREDATORY BEHAVIOR DETECTED. This message contains multiple warning signs: (1) [Warning Sign 1], (2) [Warning Sign 2], etc. These are common tactics used by predators to manipulate potential victims.

OR

NO PREDATORY BEHAVIOR DETECTED. This message contains normal friendly communication. [Additional context about the message]. There are no attempts at manipulation, isolation, inappropriate requests, or other warning signs of predatory behavior.
```
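Because the response always begins with one of the two fixed verdict strings above, downstream code can reduce it to a boolean label. A minimal sketch of such a helper (illustrative, not part of this repository):

```python
def parse_verdict(response: str) -> bool:
    """Map the model's free-text assessment to a boolean flag.

    Returns True when the reply opens with the predatory verdict,
    False for the benign verdict; raises on anything unexpected so
    malformed generations surface instead of being silently dropped.
    """
    text = response.strip().upper()
    if text.startswith("PREDATORY BEHAVIOR DETECTED"):
        return True
    if text.startswith("NO PREDATORY BEHAVIOR DETECTED"):
        return False
    raise ValueError(f"Unrecognized verdict: {response[:80]!r}")
```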
### Example Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_path = "safecircleia/heaven1-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Message to analyze
message_to_analyze = "Hey, I know we just met but I feel like we have a special connection. Don't tell your parents about our chats, they wouldn't understand. Can you send me a picture of yourself?"

# Format the prompt
prompt = f"""<|system|>
You are Heaven, an AI designed to detect predatory behavior in text messages. Analyze the following message and determine if it contains predatory behavior. Provide a detailed explanation for your assessment.
<|user|>
{message_to_analyze}
<|assistant|>
"""

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs["input_ids"], max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
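On GPUs with limited memory, the checkpoint can likely also be loaded with 4-bit quantization through `bitsandbytes` (already listed in `requirements.txt`), mirroring the NF4 settings used for QLoRA training. Treat this as an optional sketch rather than the supported path; whether it works depends on your CUDA stack:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "safecircleia/heaven1-base"

# NF4 4-bit quantization, matching the QLoRA settings in finetune_heaven.py
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```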
## Ethical Considerations

- This model is designed as a protective tool to help identify potentially harmful communication patterns.
- False positives and false negatives are possible; human review should be employed for critical applications.
- The model should be used as part of a broader safety framework, not as the sole decision-maker.
- Privacy and consent are essential when analyzing communications.

## Limitations

- The model detects patterns based on its training data and may miss novel predatory tactics.
- Cultural and contextual nuances may impact detection accuracy.
- The model is not a substitute for human judgment in safeguarding vulnerable individuals.

## Citation

If you use Heaven1-base Guardian in your research or applications, please cite:

```
@misc{heaven1-base-2025,
  author = {SafeCircleIA},
  title = {Heaven1-base: Guardian - Predatory Behavior Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/safecircleia/heaven1-base-guardian}}
}
```

## Contact

For questions, feedback, or concerns about the Heaven1-base Guardian model, please contact SafeCircleIA through Hugging Face or via contact@safecircle.tech.
check_torch.py
ADDED
@@ -0,0 +1,56 @@

import torch
import torchvision
import sys
import importlib

def check_installations():
    """Check PyTorch and torchvision installations."""
    print("Python version:", sys.version)
    print("PyTorch version:", torch.__version__)
    print("torchvision version:", torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA version:", torch.version.cuda)
        print("cuDNN version:", torch.backends.cudnn.version())

    # Check if PyTorch and torchvision versions are compatible.
    # Note: torch and torchvision use different version schemes (e.g. torch 2.1
    # pairs with torchvision 0.16), so this strict major/minor comparison is
    # only a rough heuristic and will also warn on valid pairings.
    torch_version = torch.__version__.split('.')
    torchvision_version = torchvision.__version__.split('.')

    if torch_version[0] != torchvision_version[0] or torch_version[1] != torchvision_version[1]:
        print("WARNING: PyTorch and torchvision versions might be incompatible!")
        print("It's recommended to have matching major and minor version numbers.")

    # Check for NMS operator
    try:
        print("\nAttempting to import torchvision.ops...")
        from torchvision.ops import nms
        print("Successfully imported NMS operator.")

        # Create dummy data to test NMS
        boxes = torch.tensor([[0, 0, 10, 10], [5, 5, 15, 15]], dtype=torch.float32)
        scores = torch.tensor([0.9, 0.8], dtype=torch.float32)

        print("Testing NMS functionality...")
        indices = nms(boxes, scores, 0.5)
        print("NMS test successful! Result:", indices)
    except Exception as e:
        print(f"Error importing or using NMS: {e}")

    # Check for dependencies that might use NMS
    print("\nChecking dependencies that might use NMS operator...")
    deps_to_check = [
        'trl', 'transformers', 'peft', 'accelerate',
        'bitsandbytes', 'datasets'
    ]

    for dep in deps_to_check:
        try:
            module = importlib.import_module(dep)
            version = getattr(module, '__version__', 'unknown')
            print(f"✓ {dep} version: {version}")
        except ImportError:
            print(f"✗ {dep} not installed")

if __name__ == "__main__":
    check_installations()
config.yaml
ADDED
@@ -0,0 +1,46 @@

# Heaven Model Configuration for Llama 3.2 Fine-tuning

# Dataset configuration
dataset:
  size: 10000                     # Number of examples to generate
  predatory_ratio: 0.5            # Ratio of predatory examples (0-1)
  output_path: "data/heaven_dataset.jsonl"

# Model configuration
model:
  name_or_path: "meta-llama/Llama-3.2-3B-Instruct"  # HuggingFace model identifier
  output_dir: "./heaven1-base-8b"                   # Directory to save fine-tuned model

# Training configuration
training:
  num_epochs: 3                   # Number of training epochs
  batch_size: 1                   # Batch size per device
  gradient_accumulation_steps: 8  # Number of steps to accumulate gradients
  learning_rate: 2e-5             # Initial learning rate
  weight_decay: 0.01              # Weight decay coefficient
  max_grad_norm: 1.0              # Max gradient norm for clipping
  warmup_ratio: 0.1               # Linear warmup ratio
  eval_ratio: 0.1                 # Portion of data used for evaluation
  max_seq_length: 4096            # Maximum sequence length

# PEFT configuration (Parameter-Efficient Fine-Tuning)
peft:
  use_lora: true                  # Whether to use LoRA
  use_qlora: true                 # Whether to use QLoRA (quantized LoRA)
  lora_r: 16                      # LoRA rank
  lora_alpha: 32                  # LoRA scaling factor
  lora_dropout: 0.05              # LoRA dropout rate

# Precision configuration
precision:
  fp16: false                     # Whether to use fp16 mixed precision
  bf16: true                      # Whether to use bf16 mixed precision
  compute_dtype: "float16"        # Compute dtype for quantization

# Logging configuration
logging:
  use_wandb: false                # Whether to use Weights & Biases
  run_name: "heaven-llama3-2"     # Name of the run
  logging_steps: 10               # Steps between logging
  eval_steps: 100                 # Steps between evaluation
  save_steps: 100                 # Steps between saving checkpoints
create_dataset.py
ADDED
@@ -0,0 +1,251 @@

import json
import os
import random
from typing import List, Dict, Any
import argparse
import jsonlines
from tqdm import tqdm

# Define common patterns for predatory behavior
PREDATORY_PATTERNS = {
    "grooming": [
        "You seem so mature for your age",
        "I feel like we have a special connection",
        "You're not like other kids/people your age",
        "I understand you better than your parents/friends",
        "You can trust me with anything",
    ],
    "isolation": [
        "Don't tell your parents about our conversations",
        "This should be just between us",
        "Your friends wouldn't understand our relationship",
        "You don't need to tell anyone about this",
        "Let's keep this our secret",
    ],
    "personal_info": [
        "Where do you live exactly?",
        "Are you home alone right now?",
        "What's your address?",
        "When are your parents usually out?",
        "What school do you go to?",
    ],
    "image_requests": [
        "Send me a picture of yourself",
        "I'd love to see what you look like",
        "Can you send a photo?",
        "Do you have any pictures you could share?",
        "Send me a pic",
    ],
    "private_platform": [
        "Let's talk on Snapchat instead",
        "Do you have WhatsApp? We should chat there",
        "Give me your Instagram, we can DM",
        "Let's move this conversation somewhere more private",
        "What's your Discord? It's better to chat there",
    ],
    "emotional_manipulation": [
        "If you really cared about me, you would...",
        "I thought you were different, but I guess I was wrong",
        "After everything I've done for you...",
        "No one has ever made me feel this way before",
        "You're making me sad by not trusting me",
    ],
    "sexual_content": [
        "Have you ever kissed anyone?",
        "Are you comfortable talking about intimate things?",
        "Do you know what [explicit term] means?",
        "Have you ever seen [explicit content]?",
        "I could teach you about [suggestive content]",
    ]
}

# Define normal conversation patterns
NORMAL_PATTERNS = {
    "greetings": [
        "Hey, how are you doing today?",
        "Hi there! How's your day going?",
        "Hello! How have you been?",
        "Good morning! Ready for the day?",
        "Hey, just checking in to see how you're doing",
    ],
    "casual_plans": [
        "Want to grab coffee sometime?",
        "Are you going to the event this weekend?",
        "We should hang out at the mall with the group",
        "Do you want to see that new movie?",
        "Let's meet up at the park with everyone",
    ],
    "interests": [
        "What kind of music do you listen to?",
        "Have you watched any good shows lately?",
        "What are your hobbies?",
        "Did you see the game last night?",
        "What books are you reading these days?",
    ],
    "school_work": [
        "How are your classes going?",
        "Did you finish that assignment?",
        "Do you understand the math homework?",
        "I'm struggling with this project, any advice?",
        "Are you ready for the test tomorrow?",
    ],
    "support": [
        "I'm here if you need to talk",
        "Hope things get better soon",
        "Let me know if you need anything",
        "That sounds tough, how are you handling it?",
        "I believe in you, you can do this",
    ]
}

def generate_predatory_message() -> str:
    """Generate a synthetic predatory message with multiple red flags."""
    message_parts = []

    # Select 2-4 pattern categories at random
    categories = random.sample(list(PREDATORY_PATTERNS.keys()), random.randint(2, 4))

    # Add a greeting sometimes
    if random.random() < 0.7:
        message_parts.append(random.choice(NORMAL_PATTERNS["greetings"]))

    # Add predatory patterns
    for category in categories:
        message_parts.append(random.choice(PREDATORY_PATTERNS[category]))

    # Sometimes mix in normal conversation to make it less obvious
    if random.random() < 0.5:
        normal_category = random.choice(list(NORMAL_PATTERNS.keys()))
        message_parts.append(random.choice(NORMAL_PATTERNS[normal_category]))

    # Shuffle the parts to create a more natural conversation
    if len(message_parts) > 2:  # Keep greeting first if it exists
        first_part = message_parts[0] if random.random() < 0.7 else ""
        remaining_parts = message_parts[1:] if first_part else message_parts
        random.shuffle(remaining_parts)
        if first_part:
            message_parts = [first_part] + remaining_parts
        else:
            message_parts = remaining_parts

    return " ".join(message_parts)

def generate_normal_message() -> str:
    """Generate a synthetic normal message."""
    message_parts = []

    # Select 2-3 pattern categories at random
    categories = random.sample(list(NORMAL_PATTERNS.keys()), random.randint(2, 3))

    for category in categories:
        message_parts.append(random.choice(NORMAL_PATTERNS[category]))

    return " ".join(message_parts)

def generate_predatory_explanation(message: str) -> str:
    """Generate an explanation for why a message is predatory."""
    explanation = "PREDATORY BEHAVIOR DETECTED. This message contains multiple warning signs: "
    warning_signs = []

    for category, patterns in PREDATORY_PATTERNS.items():
        for pattern in patterns:
            if pattern.lower() in message.lower():
                if category == "grooming":
                    warning_signs.append(f"Grooming language ('{pattern}')")
                elif category == "isolation":
                    warning_signs.append(f"Isolation attempt ('{pattern}')")
                elif category == "personal_info":
                    warning_signs.append(f"Seeking personal information ('{pattern}')")
                elif category == "image_requests":
                    warning_signs.append(f"Request for images ('{pattern}')")
                elif category == "private_platform":
                    warning_signs.append(f"Attempting to move to private communication ('{pattern}')")
                elif category == "emotional_manipulation":
                    warning_signs.append(f"Emotional manipulation ('{pattern}')")
                elif category == "sexual_content":
                    warning_signs.append(f"Inappropriate sexual content ('{pattern}')")

    for i, sign in enumerate(warning_signs):
        if i == 0:
            explanation += f"(1) {sign}"
        else:
            explanation += f", ({i+1}) {sign}"

    explanation += ". These are common tactics used by predators to manipulate potential victims."

    return explanation

def generate_normal_explanation(message: str) -> str:
    """Generate an explanation for why a message is not predatory."""
    explanation = "NO PREDATORY BEHAVIOR DETECTED. This message contains normal friendly communication. "

    if any(pattern.lower() in message.lower() for pattern in NORMAL_PATTERNS["greetings"]):
        explanation += "It includes a casual greeting. "
    if any(pattern.lower() in message.lower() for pattern in NORMAL_PATTERNS["casual_plans"]):
        explanation += "It contains appropriate social plans. "
    if any(pattern.lower() in message.lower() for pattern in NORMAL_PATTERNS["interests"]):
        explanation += "It shows interest in common topics. "
    if any(pattern.lower() in message.lower() for pattern in NORMAL_PATTERNS["school_work"]):
        explanation += "It discusses school or work. "
    if any(pattern.lower() in message.lower() for pattern in NORMAL_PATTERNS["support"]):
        explanation += "It offers appropriate support. "

    explanation += "There are no attempts at manipulation, isolation, inappropriate requests, or other warning signs of predatory behavior."

    return explanation

def create_dataset_entry(predatory: bool = False) -> Dict[str, List[Dict[str, str]]]:
    """Create a single dataset entry in the format required for Llama 3.2 fine-tuning."""
    system_message = "You are Heaven, an AI designed to detect predatory behavior in text messages. Analyze the following message and determine if it contains predatory behavior. Provide a detailed explanation for your assessment."

    if predatory:
        user_message = generate_predatory_message()
        assistant_message = generate_predatory_explanation(user_message)
    else:
        user_message = generate_normal_message()
        assistant_message = generate_normal_explanation(user_message)

    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": assistant_message}
        ]
    }

def create_dataset(size: int, predatory_ratio: float = 0.5, output_path: str = "heaven_dataset.jsonl"):
    """Create a full dataset with the specified size and ratio of predatory examples."""
    predatory_count = int(size * predatory_ratio)
    normal_count = size - predatory_count

    dataset = []

    print(f"Generating {predatory_count} predatory examples...")
    for _ in tqdm(range(predatory_count)):
        dataset.append(create_dataset_entry(predatory=True))

    print(f"Generating {normal_count} normal examples...")
    for _ in tqdm(range(normal_count)):
        dataset.append(create_dataset_entry(predatory=False))

    # Shuffle the dataset
    random.shuffle(dataset)

    # Save to JSONL file
    with jsonlines.open(output_path, mode='w') as writer:
        for entry in dataset:
            writer.write(entry)

    print(f"Dataset saved to {output_path}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate a synthetic dataset for predatory behavior detection")
    parser.add_argument("--size", type=int, default=1000, help="Total size of the dataset")
    parser.add_argument("--ratio", type=float, default=0.5, help="Ratio of predatory examples (0-1)")
    parser.add_argument("--output", type=str, default="data/heaven_dataset.jsonl", help="Output file path")
    args = parser.parse_args()

    # Create output directory if it doesn't exist
    os.makedirs(os.path.dirname(args.output), exist_ok=True)

    create_dataset(args.size, args.ratio, args.output)
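Since every record written above is a three-turn chat (system, user, assistant), a quick post-generation check can catch format drift before fine-tuning. A minimal sketch, assuming the default output path; this snippet is not part of the commit:

```python
import jsonlines

# Peek at the first examples and verify the chat structure
with jsonlines.open("data/heaven_dataset.jsonl") as reader:
    for i, entry in enumerate(reader):
        roles = [m["role"] for m in entry["messages"]]
        assert roles == ["system", "user", "assistant"], f"bad entry {i}: {roles}"
        if i < 2:
            print(entry["messages"][1]["content"][:80])  # sample user message
        if i >= 99:
            break
```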
data/heaven_dataset.jsonl
ADDED
The diff for this file is too large to render.
See raw diff
finetune_heaven.py
ADDED
@@ -0,0 +1,331 @@

import os
import argparse
import torch
import numpy as np
import time
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
import wandb

def format_prompt(examples):
    """
    Format the prompts for Llama 3.2 instruction fine-tuning.
    This function processes batches of examples and returns formatted strings.
    """
    # Process each example in the batch
    formatted_prompts = []

    # Handle batch processing - if it's a single example, wrap it in a list
    messages_list = examples["messages"] if isinstance(examples, dict) and "messages" in examples else [examples]

    # Process each message in the batch
    for messages in messages_list:
        if isinstance(messages, list) and len(messages) >= 3:
            system = messages[0]["content"] if isinstance(messages[0], dict) and "content" in messages[0] else ""
            user_message = messages[1]["content"] if isinstance(messages[1], dict) and "content" in messages[1] else ""
            assistant_message = messages[2]["content"] if isinstance(messages[2], dict) and "content" in messages[2] else ""
        else:
            # Fallback for unexpected structure
            print(f"Warning: Unexpected message structure: {messages}")
            system = ""
            user_message = ""
            assistant_message = ""

        # Format the prompt
        formatted = f"<|system|>\n{system}\n<|user|>\n{user_message}\n<|assistant|>\n{assistant_message}"
        formatted_prompts.append(formatted)

    return formatted_prompts

def preprocess_function(examples, tokenizer, max_length=4096):
    """
    Tokenize the examples for training.

    Note: this module-level helper is superseded by the local
    preprocess_function defined inside train(), which is what the
    dataset .map() call actually uses.
    """
    # Get formatted prompts
    formatted_prompts = [format_prompt(example) for example in examples["messages"]]

    # Tokenize
    tokenized_inputs = tokenizer(
        formatted_prompts,
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )

    # Create labels (same as input_ids since we're doing causal LM training)
    tokenized_inputs["labels"] = tokenized_inputs["input_ids"].clone()

    return tokenized_inputs

def train(args):
    print("Initializing training process...")

    # Initialize wandb if tracking is enabled
    if args.use_wandb:
        print("Initializing Weights & Biases...")
        wandb.init(project="heaven-llama3-2", name=args.run_name)

    # Load the tokenizer
    print("Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(
        args.model_name_or_path,
        trust_remote_code=True,
    )
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
    print(f"Tokenizer loaded: {tokenizer.__class__.__name__}")

    # Configure quantization if using QLoRA
    print("Setting up model configuration...")
    if args.use_qlora:
        print("Configuring QLoRA quantization...")
        compute_dtype = getattr(torch, args.compute_dtype)
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=compute_dtype,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
        )
    else:
        quantization_config = None

    # Load the model with use_cache=False for compatibility with gradient checkpointing
    print(f"Loading model: {args.model_name_or_path}")
    print(f"GPU available: {torch.cuda.is_available()}, Device count: {torch.cuda.device_count()}")
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(f"GPU {i}: {torch.cuda.get_device_name(i)}, "
                  f"Memory: {torch.cuda.get_device_properties(i).total_memory / 1e9:.2f} GB")

    start_time = time.time()
    model = AutoModelForCausalLM.from_pretrained(
        args.model_name_or_path,
        quantization_config=quantization_config,
        device_map="auto",
        trust_remote_code=True,
        use_cache=False  # Set use_cache=False explicitly for gradient checkpointing
    )
    load_time = time.time() - start_time
    print(f"Model loaded in {load_time:.2f} seconds")

    # Prepare model for k-bit training if using QLoRA
    if args.use_qlora:
        print("Preparing model for k-bit training...")
        model = prepare_model_for_kbit_training(model)

    # Set up LoRA configuration
    peft_config = None
    if args.use_lora:
        print("Setting up LoRA configuration...")
        peft_config = LoraConfig(
            r=args.lora_r,
            lora_alpha=args.lora_alpha,
            lora_dropout=args.lora_dropout,
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
            bias="none",
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, peft_config)
        model.print_trainable_parameters()

    # Load and prepare dataset
    print(f"Loading dataset from {args.dataset_path}...")
    start_time = time.time()
    dataset = load_dataset("json", data_files=args.dataset_path)
    print(f"Dataset loaded in {time.time() - start_time:.2f} seconds. Size: {len(dataset['train'])} examples")

    # Process the dataset manually to ensure correct formatting
    print("Preprocessing dataset...")
    start_time = time.time()

    def preprocess_function(examples):
        # This local function shadows the module-level helper and is the
        # version actually used by .map() below.
        if len(examples["messages"]) % 100 == 0:
            print(f"Processing batch of {len(examples['messages'])} examples...")
        formatted_texts = format_prompt(examples)
        return tokenizer(
            formatted_texts,
            padding="max_length",
            truncation=True,
            max_length=args.max_seq_length,
            return_tensors=None  # Return Python lists
        )

    processed_dataset = dataset["train"].map(
        preprocess_function,
        batched=True,
        batch_size=100,
        remove_columns=["messages"],
        desc="Processing dataset"
    )
    print(f"Dataset preprocessing completed in {time.time() - start_time:.2f} seconds")

    # Split dataset into train and evaluation
    print(f"Splitting dataset with test_size={args.eval_ratio}...")
    split_dataset = processed_dataset.train_test_split(test_size=args.eval_ratio)
    print(f"Train set: {len(split_dataset['train'])} examples, Test set: {len(split_dataset['test'])} examples")

    # Create a data collator for language modeling.
    # Note: this collator is defined but never passed to SFTTrainer below,
    # which therefore falls back to its own default collator.
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False
    )

    # Set up training arguments
    print("Configuring training arguments...")
    training_args = TrainingArguments(
        output_dir=args.output_dir,
        num_train_epochs=args.num_epochs,
        per_device_train_batch_size=args.batch_size,
        per_device_eval_batch_size=args.batch_size,
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        eval_strategy="steps",  # Use newer parameter name
        save_strategy="steps",
        eval_steps=args.eval_steps,
        save_steps=args.save_steps,
        logging_steps=args.logging_steps,
        learning_rate=args.learning_rate,
        weight_decay=args.weight_decay,
        fp16=args.fp16,
        bf16=args.bf16,
        max_grad_norm=args.max_grad_norm,
        max_steps=args.max_steps,
        warmup_ratio=args.warmup_ratio,
        group_by_length=False,
        lr_scheduler_type=args.lr_scheduler_type,
        report_to="wandb" if args.use_wandb else "none",
        save_total_limit=3,
        remove_unused_columns=False,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        # Add gradient checkpointing settings
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},  # Explicitly set use_reentrant=False
    )

    # Create the SFT trainer
    print("Initializing SFTTrainer...")
    trainer = SFTTrainer(
        model=model,
        train_dataset=split_dataset["train"],
        eval_dataset=split_dataset["test"],
        args=training_args,
        tokenizer=tokenizer,
        # No formatting_func since we pre-process the dataset above
        max_seq_length=args.max_seq_length,
        # Pass peft_config separately if using LoRA (the model was already
        # wrapped via get_peft_model above; depending on the TRL version
        # this second pass may be redundant)
        peft_config=peft_config if args.use_lora else None,
    )

    # Train the model
    print("Starting training...")
    print("If training appears stuck here, the trainer might be compiling the model or allocating memory.")
    print("For large models, this can take several minutes, especially on the first training step.")

    try:
        train_result = trainer.train()

        # Save the model
        print(f"Saving model to {args.output_dir}")
        trainer.save_model(args.output_dir)

        # Save training metrics
        trainer.log_metrics("train", train_result.metrics)
        trainer.save_metrics("train", train_result.metrics)
        trainer.save_state()

        # Evaluate model
        print("Evaluating model...")
        metrics = trainer.evaluate()
        trainer.log_metrics("eval", metrics)
        trainer.save_metrics("eval", metrics)

        print(f"Training complete! Model saved to {args.output_dir}")
    except Exception as e:
        print(f"Error during training: {e}")
        import traceback
        traceback.print_exc()

    # Close wandb if used
    if args.use_wandb:
        wandb.finish()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Fine-tune Llama 3.2 for predatory behavior detection")

    # Model and dataset arguments
    parser.add_argument("--model_name_or_path", type=str, default="meta-llama/Meta-Llama-3.2-8B-Instruct",
                        help="Path to pretrained model or model identifier from huggingface.co/models")
    parser.add_argument("--dataset_path", type=str, required=True,
                        help="Path to the JSONL dataset for fine-tuning")
    parser.add_argument("--output_dir", type=str, default="./heaven-model",
                        help="Directory to save the fine-tuned model")

    # Training hyperparameters
    parser.add_argument("--num_epochs", type=int, default=3,
                        help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, default=1,
                        help="Batch size per device")
    parser.add_argument("--gradient_accumulation_steps", type=int, default=8,
                        help="Number of update steps to accumulate before performing a backward/update pass")
    parser.add_argument("--learning_rate", type=float, default=2e-5,
                        help="Initial learning rate")
    parser.add_argument("--weight_decay", type=float, default=0.01,
                        help="Weight decay to apply")
    parser.add_argument("--max_grad_norm", type=float, default=1.0,
                        help="Max gradient norm for gradient clipping")
    parser.add_argument("--max_steps", type=int, default=-1,
                        help="If > 0, set total number of training steps to perform. Overrides num_epochs")
    parser.add_argument("--warmup_ratio", type=float, default=0.1,
                        help="Linear warmup over warmup_ratio fraction of total steps")
    parser.add_argument("--eval_ratio", type=float, default=0.1,
                        help="Ratio of data to use for evaluation")
    parser.add_argument("--lr_scheduler_type", type=str, default="cosine",
                        help="Learning rate scheduler type")
    parser.add_argument("--max_seq_length", type=int, default=4096,
                        help="Maximum sequence length for training")

    # Logging and evaluation arguments
    parser.add_argument("--logging_steps", type=int, default=10,
                        help="Number of steps between logging")
    parser.add_argument("--eval_steps", type=int, default=100,
                        help="Number of steps between evaluations")
    parser.add_argument("--save_steps", type=int, default=100,
                        help="Number of steps between saving model checkpoints")
    parser.add_argument("--run_name", type=str, default="heaven-llama3-2",
                        help="Name of the run for logging")
    parser.add_argument("--use_wandb", action="store_true",
                        help="Whether to use Weights & Biases for logging")

    # PEFT arguments
    parser.add_argument("--use_lora", action="store_true",
                        help="Whether to use LoRA for fine-tuning")
    parser.add_argument("--use_qlora", action="store_true",
                        help="Whether to use QLoRA for fine-tuning (4-bit quantization with LoRA)")
    parser.add_argument("--lora_r", type=int, default=16,
                        help="Rank of the LoRA update matrices")
    parser.add_argument("--lora_alpha", type=int, default=32,
                        help="Scaling factor for LoRA")
    parser.add_argument("--lora_dropout", type=float, default=0.05,
                        help="Dropout probability for LoRA")

    # Mixed precision arguments
    parser.add_argument("--fp16", action="store_true",
                        help="Whether to use fp16 mixed precision")
    parser.add_argument("--bf16", action="store_true",
                        help="Whether to use bf16 mixed precision")
    parser.add_argument("--compute_dtype", type=str, default="float16",
                        help="Compute dtype for 4-bit quantization")

    args = parser.parse_args()

    train(args)
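If a run saves only a LoRA adapter rather than merged weights (the usual outcome with `--use_lora`), the adapter can be attached to the base model at inference time with PEFT. A sketch under that assumption; the paths below are illustrative:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-3B-Instruct"  # base model from config.yaml
adapter_dir = "./heaven1-base-8b"             # output_dir used during training

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the fine-tuned LoRA weights on top of the frozen base model
model = PeftModel.from_pretrained(base, adapter_dir)
model = model.merge_and_unload()  # optional: fold the adapter into the base weights
```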
fix_nms_error.py
ADDED
@@ -0,0 +1,144 @@

import os
import sys
import importlib.util
from pathlib import Path

def locate_trl_module():
    """Find the location of the TRL module in the Python path."""
    try:
        spec = importlib.util.find_spec('trl')
        if spec is None:
            print("TRL module not found in the Python path")
            return None

        trl_path = Path(spec.origin).parent
        print(f"Found TRL module at: {trl_path}")
        return trl_path
    except Exception as e:
        print(f"Error locating TRL module: {e}")
        return None

def patch_sft_trainer():
    """Patch the SFTTrainer to avoid using torchvision's NMS operator."""
    trl_path = locate_trl_module()
    if trl_path is None:
        return False

    # Path to the trainer.py file which likely contains the NMS reference
    trainer_path = trl_path / "trainer" / "sft_trainer.py"

    if not trainer_path.exists():
        print(f"Could not find the SFT trainer file at: {trainer_path}")
        return False

    print(f"Found SFT trainer file at: {trainer_path}")

    # Read the file content
    with open(trainer_path, "r") as f:
        content = f.read()

    # Check if 'torchvision' is in the file
    if "torchvision" not in content:
        print("No torchvision imports found in the SFT trainer file.")
        return False

    # Create backup
    backup_path = trainer_path.with_suffix(".py.bak")
    print(f"Creating backup at: {backup_path}")
    with open(backup_path, "w") as f:
        f.write(content)

    # Replace imports - common patterns
    patched_content = content

    # Pattern 1: Direct import of nms
    patched_content = patched_content.replace(
        "from torchvision.ops import nms",
        "# from torchvision.ops import nms  # Commented out to fix NMS error"
    )

    # Pattern 2: Import torchvision
    patched_content = patched_content.replace(
        "import torchvision",
        "# import torchvision  # Commented out to fix NMS error"
    )

    # Pattern 3: Import from torchvision.ops (this also re-touches lines
    # already rewritten by Pattern 1, which remain valid comments)
    patched_content = patched_content.replace(
        "from torchvision.ops",
        "# from torchvision.ops  # Commented out to fix NMS error"
    )

    # Add our custom NMS implementation.
    # NOTE: the embedded docstring uses single-quoted triple quotes so it
    # does not terminate this double-quoted source template prematurely.
    custom_nms = """
# Custom NMS implementation to avoid torchvision dependency
def nms(boxes, scores, iou_threshold):
    '''
    Performs non-maximum suppression (NMS) on the boxes according to their
    intersection-over-union (IoU).

    Args:
        boxes (Tensor[N, 4]): boxes to perform NMS on
        scores (Tensor[N]): scores for each one of the boxes
        iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold

    Returns:
        Tensor: int64 tensor with the indices of the elements that have been kept
    '''
    import torch

    # Sort boxes by scores
    _, order = scores.sort(0, descending=True)
    keep = []

    while order.numel() > 0:
        if order.numel() == 1:
            keep.append(order.item())
            break

        i = order[0].item()
        keep.append(i)

        # Compute IoU of the remaining boxes with the largest box
        xx1 = torch.max(boxes[i, 0], boxes[order[1:], 0])
        yy1 = torch.max(boxes[i, 1], boxes[order[1:], 1])
        xx2 = torch.min(boxes[i, 2], boxes[order[1:], 2])
        yy2 = torch.min(boxes[i, 3], boxes[order[1:], 3])

        w = torch.clamp(xx2 - xx1, min=0.0)
        h = torch.clamp(yy2 - yy1, min=0.0)
        inter = w * h

        # IoU = intersection / (area1 + area2 - intersection)
        box_area = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        other_area = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (box_area + other_area - inter)

        # Keep boxes with IoU less than threshold
        inds = torch.where(iou <= iou_threshold)[0]
        order = order[inds + 1]

    return torch.tensor(keep, dtype=torch.int64)
"""

    # Add our custom implementation somewhere near the imports
    import_end = patched_content.find("\n\n", patched_content.find("import "))
    if import_end == -1:
        import_end = patched_content.find("\n", patched_content.find("import "))

    patched_content = patched_content[:import_end] + custom_nms + patched_content[import_end:]

    # Write the patched file
    with open(trainer_path, "w") as f:
        f.write(patched_content)

    print(f"Successfully patched {trainer_path}")
    print("The SFTTrainer should now work without requiring torchvision's NMS operator")
    return True

if __name__ == "__main__":
    success = patch_sft_trainer()
    if success:
        print("\nPatch applied successfully. You can now run the fine-tuning script.")
    else:
        print("\nFailed to apply the patch. Please check the error messages above.")
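The replacement `nms` is pure PyTorch, so it can be sanity-checked against the dummy boxes from check_torch.py before patching anything. A hypothetical standalone check, assuming the function body above has been pasted into the same session:

```python
import torch

# Same dummy data as check_torch.py: the two boxes overlap with IoU ~0.14,
# well under the 0.5 threshold, so both should survive suppression.
boxes = torch.tensor([[0, 0, 10, 10], [5, 5, 15, 15]], dtype=torch.float32)
scores = torch.tensor([0.9, 0.8], dtype=torch.float32)

kept = nms(boxes, scores, 0.5)  # the pure-PyTorch nms defined in the patch
print(kept)  # expected: tensor([0, 1])
```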
model_card.md
ADDED
@@ -0,0 +1,156 @@

---
language: en
license: mit
tags:
- llama
- llama-3.2
- safeguarding
- content-moderation
- safety
- predator-detection
- text-classification
datasets:
- safecircleia/heaven1-dataset
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
widget:
- text: "Hey, I know we just met but I feel like we have a special connection. Don't tell your parents about our chats, they wouldn't understand. Can you send me a picture of yourself?"
- text: "Hey, just checking in to see how your day went. Let me know if you want to grab coffee this weekend."
---

# Heaven1-base: Guardian

<img src="https://huggingface.co/safecircleia/heaven1-base/resolve/main/Heaven1-guardian.png" alt="Heaven1-base Guardian Banner" width="600">

Heaven1-base (codename: "Guardian") is a specialized AI model fine-tuned from Llama 3.2 to detect predatory behavior in text messages. The model analyzes conversation patterns to identify potential warning signs of grooming, manipulation, or predatory tactics.

## Model Details

- **Developed by:** SafeCircleIA
- **Model type:** Fine-tuned Llama 3.2
- **Language(s):** English
- **Base model:** meta-llama/Llama-3.2-8B-Instruct
- **Training approach:** Parameter-Efficient Fine-Tuning (PEFT) using QLoRA
- **License:** MIT

## Intended Use

This model is intended to serve as a protective tool for:

- Content moderation teams
- Platform safety engineers
- Organizations focused on child and vulnerable adult safety online
- Researchers studying digital safety and online predatory behavior

### Primary intended uses

- Detecting potentially harmful interactions in text messages
- Providing explanations for why certain messages contain predatory elements
- Assisting human moderators in identifying concerning patterns
- Supporting research on digital safety

### Primary intended users

- Content moderation teams
- Digital safety professionals
- Platform trust & safety teams
- Child protection services
- Safety-focused researchers

## How to Use

You can use the model via the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_path = "safecircleia/heaven1-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Message to analyze
message_to_analyze = "Hey, I know we just met but I feel like we have a special connection. Don't tell your parents about our chats, they wouldn't understand. Can you send me a picture of yourself?"

# Format the prompt
prompt = f"""<|system|>
You are Heaven, an AI designed to detect predatory behavior in text messages. Analyze the following message and determine if it contains predatory behavior. Provide a detailed explanation for your assessment.
<|user|>
{message_to_analyze}
<|assistant|>
"""

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs["input_ids"], max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```

## Training Details

- **Training data:** The model was trained on a carefully curated synthetic dataset representing various predatory conversation patterns
- **Training procedure:** Fine-tuned using QLoRA to adapt Llama 3.2's capabilities to predatory text detection
- **Hyperparameters:**
  - LoRA rank: 16
  - LoRA alpha: 32
  - Learning rate: 2e-5
  - Batch size: 1 with gradient accumulation steps of 8
  - Training epochs: 3
  - Maximum sequence length: 4096

## Evaluation Results

Performance metrics on test dataset:

| Metric | Score |
|--------|-------|
| Accuracy | [INSERT VALUE] |
| Precision | [INSERT VALUE] |
| Recall | [INSERT VALUE] |
| F1 | [INSERT VALUE] |

## Limitations & Biases

### Limitations

- The model detects patterns based on its training data and may miss novel predatory tactics
- Performance may vary across different cultural contexts and communication styles
- False positives and false negatives are possible; human review is recommended

### Recommendations

- Do not use as the sole decision-maker for safety-critical applications
- Always combine with human review for best results
- Consider cultural and contextual factors when interpreting results
- Regularly evaluate and update the model as predatory tactics evolve

## Ethical Considerations

This model is designed to help create safer digital environments, particularly for vulnerable individuals like children. However, it should be used responsibly:

- Respect privacy and obtain appropriate consent when analyzing communications
- Be transparent about the use of AI detection systems
- Do not use the model to create or refine predatory language patterns
- Consider the impact of false positives on legitimate communications

## Additional Information

For more information, research, or to contribute to the development of digital safety tools, visit [SafeCircleIA website or contact information].

## Citation

```
@misc{heaven1-base-2025,
  author = {SafeCircleIA},
  title = {Heaven1-base: Guardian - Predatory Behavior Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/safecircleia/heaven1-base-guardian}}
}
```
requirements.txt
ADDED
@@ -0,0 +1,18 @@
transformers>=4.36.0
datasets>=2.14.0
torch>=2.0.0
torchvision>=0.15.0
accelerate>=0.20.0
bitsandbytes>=0.40.0
peft>=0.4.0
trl>=0.7.0
scipy>=1.10.0
numpy>=1.24.0
wandb>=0.15.0
scikit-learn>=1.2.0
tqdm>=4.65.0
jsonlines>=3.1.0
sentencepiece>=0.1.99
protobuf>=3.20.0
einops>=0.6.0
pyyaml>=6.0
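
These dependencies install in one step with `pip install -r requirements.txt`; note that `bitsandbytes`, which provides the 4-bit quantization used by QLoRA, generally requires a CUDA-capable GPU.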
run_heaven.py
ADDED
@@ -0,0 +1,112 @@
import os
import argparse
import yaml
import subprocess

def load_config(config_path):
    """Load configuration from YAML file."""
    with open(config_path, "r") as f:
        return yaml.safe_load(f)

def create_dataset(config):
    """Create dataset using the configuration."""
    dataset_config = config["dataset"]

    cmd = [
        "python", "create_dataset.py",
        "--size", str(dataset_config["size"]),
        "--ratio", str(dataset_config["predatory_ratio"]),
        "--output", dataset_config["output_path"]
    ]

    print(f"Creating dataset with {dataset_config['size']} examples...")
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print("Dataset creation failed!")
        return False

    print(f"Dataset created successfully at {dataset_config['output_path']}")
    return True

def finetune_model(config):
    """Fine-tune the model using the configuration."""
    dataset_config = config["dataset"]
    model_config = config["model"]
    training_config = config["training"]
    peft_config = config["peft"]
    precision_config = config["precision"]
    logging_config = config["logging"]

    cmd = [
        "python", "finetune_heaven.py",
        "--model_name_or_path", model_config["name_or_path"],
        "--dataset_path", dataset_config["output_path"],
        "--output_dir", model_config["output_dir"],
        "--num_epochs", str(training_config["num_epochs"]),
        "--batch_size", str(training_config["batch_size"]),
        "--gradient_accumulation_steps", str(training_config["gradient_accumulation_steps"]),
        "--learning_rate", str(training_config["learning_rate"]),
        "--weight_decay", str(training_config["weight_decay"]),
        "--max_grad_norm", str(training_config["max_grad_norm"]),
        "--warmup_ratio", str(training_config["warmup_ratio"]),
        "--eval_ratio", str(training_config["eval_ratio"]),
        "--max_seq_length", str(training_config["max_seq_length"]),
        "--logging_steps", str(logging_config["logging_steps"]),
        "--eval_steps", str(logging_config["eval_steps"]),
        "--save_steps", str(logging_config["save_steps"]),
        "--run_name", logging_config["run_name"],
        "--compute_dtype", precision_config["compute_dtype"]
    ]

    # Add boolean flags
    if peft_config["use_lora"]:
        cmd.append("--use_lora")
    if peft_config["use_qlora"]:
        cmd.append("--use_qlora")
    if precision_config["fp16"]:
        cmd.append("--fp16")
    if precision_config["bf16"]:
        cmd.append("--bf16")
    if logging_config["use_wandb"]:
        cmd.append("--use_wandb")

    # Add LoRA parameters
    cmd.extend(["--lora_r", str(peft_config["lora_r"])])
    cmd.extend(["--lora_alpha", str(peft_config["lora_alpha"])])
    cmd.extend(["--lora_dropout", str(peft_config["lora_dropout"])])

    print("Starting fine-tuning process...")
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print("Fine-tuning failed!")
        return False

    print(f"Fine-tuning completed successfully! Model saved to {model_config['output_dir']}")
    return True

def main():
    parser = argparse.ArgumentParser(description="Run the Heaven fine-tuning pipeline")
    parser.add_argument("--config", type=str, default="config.yaml", help="Path to the configuration file")
    parser.add_argument("--skip-dataset", action="store_true", help="Skip dataset creation step")
    args = parser.parse_args()

    print(f"Loading configuration from {args.config}...")
    config = load_config(args.config)

    # Create necessary directories (guard against a bare filename with no directory part,
    # for which os.path.dirname returns "" and os.makedirs would raise)
    dataset_dir = os.path.dirname(config["dataset"]["output_path"])
    if dataset_dir:
        os.makedirs(dataset_dir, exist_ok=True)
    os.makedirs(config["model"]["output_dir"], exist_ok=True)

    # Create dataset if not skipped
    if not args.skip_dataset:
        success = create_dataset(config)
        if not success:
            return
    else:
        print("Skipping dataset creation...")

    # Fine-tune the model
    finetune_model(config)

if __name__ == "__main__":
    main()
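For reference, `run_heaven.py` expects the `config.yaml` shipped with this commit to expose the keys it reads above. The actual file is not shown in this section, so the following is only a sketch of the implied structure, with values taken from the README hyperparameters where documented and marked as assumptions everywhere else:

```yaml
# Sketch of the structure run_heaven.py consumes; values marked "assumed"
# are illustrative, not the repository's actual settings.
dataset:
  size: 1000                     # assumed
  predatory_ratio: 0.5           # assumed
  output_path: data/heaven_dataset.jsonl
model:
  name_or_path: meta-llama/Llama-3.2-8B-Instruct   # assumed identifier
  output_dir: ./heaven1-base-8b
training:
  num_epochs: 3
  batch_size: 1
  gradient_accumulation_steps: 8
  learning_rate: 2.0e-5          # written with a dot so PyYAML parses it as a float
  weight_decay: 0.01             # assumed
  max_grad_norm: 1.0             # assumed
  warmup_ratio: 0.03             # assumed
  eval_ratio: 0.1                # assumed
  max_seq_length: 4096
peft:
  use_lora: true
  use_qlora: true
  lora_r: 16
  lora_alpha: 32
  lora_dropout: 0.05             # assumed
precision:
  fp16: false
  bf16: true                     # assumed
  compute_dtype: bfloat16        # assumed
logging:
  logging_steps: 10              # assumed
  eval_steps: 100                # assumed
  save_steps: 100                # assumed
  run_name: heaven1-guardian     # assumed
  use_wandb: false
```

With a config in place, `python run_heaven.py --config config.yaml` runs dataset creation and fine-tuning end to end, and `--skip-dataset` reuses an existing dataset.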
upload_to_hub.py
ADDED
@@ -0,0 +1,94 @@
import os
import argparse
import shutil
from huggingface_hub import create_repo, upload_folder

def upload_model(args):
    """Upload the fine-tuned model to Hugging Face Hub."""
    print(f"Preparing to upload Heaven1-base Guardian model to {args.org}/{args.model_id}")

    # Create a temporary directory for preparing the model
    temp_dir = "temp_upload"
    os.makedirs(temp_dir, exist_ok=True)

    # Copy model files
    print("Copying model files...")
    if os.path.exists(args.model_path):
        for item in os.listdir(args.model_path):
            source = os.path.join(args.model_path, item)
            dest = os.path.join(temp_dir, item)
            if os.path.isdir(source):
                shutil.copytree(source, dest, dirs_exist_ok=True)
            else:
                shutil.copy2(source, dest)
    else:
        print(f"Warning: Model directory {args.model_path} not found.")

    # Copy documentation files
    print("Copying documentation files...")
    docs_files = ["README.md", "model_card.md", "Heaven1-guardian.png"]
    for file in docs_files:
        if os.path.exists(file):
            shutil.copy2(file, os.path.join(temp_dir, file))

    # Rename model_card.md to README.md for proper display on the Hub
    if os.path.exists(os.path.join(temp_dir, "model_card.md")):
        print("Using model_card.md as the main README for the Hub...")
        if os.path.exists(os.path.join(temp_dir, "README.md")):
            # If both exist, rename the original README to avoid overwriting
            os.rename(os.path.join(temp_dir, "README.md"), os.path.join(temp_dir, "DETAILED_README.md"))
        os.rename(os.path.join(temp_dir, "model_card.md"), os.path.join(temp_dir, "README.md"))

    # Create repository if it doesn't exist
    try:
        print(f"Creating repository: {args.org}/{args.model_id}")
        create_repo(
            repo_id=f"{args.org}/{args.model_id}",
            token=args.token,
            private=args.private,
            repo_type="model",
            exist_ok=True,
        )
    except Exception as e:
        print(f"Repository creation error (it might already exist): {e}")

    # Upload model to Hugging Face Hub
    print(f"Uploading files to {args.org}/{args.model_id}...")
    response = upload_folder(
        folder_path=temp_dir,
        repo_id=f"{args.org}/{args.model_id}",
        token=args.token,
        repo_type="model",
        ignore_patterns=[".*", "__pycache__/*", "temp_upload/*"],
    )

    print(f"Upload complete! Model available at: https://huggingface.co/{args.org}/{args.model_id}")

    # Clean up
    if not args.keep_temp:
        print("Cleaning up temporary directory...")
        shutil.rmtree(temp_dir)

    return response

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Upload Heaven1-base Guardian model to Hugging Face Hub")
    parser.add_argument("--model_path", type=str, default="./heaven1-base-8b",
                        help="Path to the fine-tuned model directory")
    parser.add_argument("--org", type=str, default="safecircleia",
                        help="Organization name on Hugging Face Hub")
    parser.add_argument("--model_id", type=str, default="heaven1-base-guardian",
                        help="Model ID for the repository")
    parser.add_argument("--token", type=str, required=True,
                        help="Hugging Face authentication token")
    parser.add_argument("--private", action="store_true",
                        help="Whether to make the repository private")
    parser.add_argument("--keep_temp", action="store_true",
                        help="Keep temporary upload directory after completion")

    args = parser.parse_args()

    upload_model(args)
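A typical invocation, with a placeholder token, is `python upload_to_hub.py --token hf_your_token_here --private`: the script stages the model directory and documentation in `temp_upload/`, creates the `safecircleia/heaven1-base-guardian` repository if it does not already exist, and uploads the staged folder.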