Iconoclast / HANDOVER_ILABS.md
OpenAI Codex
Publish Iconoclast research release
3236af9

ICONOCLAST β€” iLabs Cluster Handover & Session State

Last updated: 2026-04-22 09:10 EDT Session objective: Scale ICONOCLAST benchmarks to 11+ open-source models for publishable results


1. What Is Running Right Now

Active SLURM Jobs (as of 22:05 EDT)

Job ID Model / Script Status Node Notes
130619 Qwen 3.5-9B (Base) RUNNING rlab2 Re-running with set_submodule patch.
130618 Qwen 2.5-3B (Base) RUNNING rlab7 Continuing from earlier.
130620 ICONOCLAST Sequential Sweep RUNNING ilab2 Runs 7 models one-by-one.
130621 HERETIC Sequential Sweep PENDING (Dep) Waits for 130620 to finish. Runs baselines.
130640 Large-N Evaluator Sweep PENDING (Dep) Waits for 130621 to finish. Evaluates best parameters on 520 prompts.

Sequential Runner (Job 130620) β€” Models in Order

This job runs 7 models one after another, cleaning up disk cache between each:

  1. google/gemma-2-2b-it β†’ run name gemma2-2b-seq
  2. mistralai/Mistral-7B-Instruct-v0.3 β†’ run name mistral-7b-seq
  3. microsoft/Phi-4-mini-instruct β†’ run name phi4-mini-seq
  4. stabilityai/stablelm-2-zephyr-1_6b β†’ run name stablelm2-1p6b-seq
  5. 01-ai/Yi-1.5-9B-Chat β†’ run name yi-1p5-9b-seq
  6. tiiuae/Falcon3-7B-Instruct β†’ run name falcon3-7b-seq
  7. allenai/OLMo-2-0425-1B-Instruct β†’ run name olmo2-1b-seq

Log file: ~/iconoclast/logs/iconoclast-seq-130620.out


2. Completed Results (from prior sessions)

These models already have batch_summary.json files on the cluster:

Model Run Name Checkpoint Dir Verdict
Qwen3-1.7B qwen3-1p7b-rutgers-paper-directness /common/users/vp752/iconoclast_ilabs/checkpoints/qwen3-1p7b-rutgers-paper-directness/ ICONOCLAST
Qwen2.5-3B-Instruct qwen2-5-3b-rutgers-benchmark /common/users/vp752/iconoclast_ilabs/checkpoints/qwen2-5-3b-rutgers-benchmark/ ICONOCLAST
Qwen3-4B-Instruct qwen3-4b-rutgers-benchmark-v2 /common/users/vp752/iconoclast_ilabs/checkpoints/qwen3-4b-rutgers-benchmark-v2/ ICONOCLAST
Phi-3.5-mini-instruct phi35-mini-rutgers-nullspace-benchmark-v3 /common/users/vp752/iconoclast_ilabs/checkpoints/phi35-mini-rutgers-nullspace-benchmark-v3/ ICONOCLAST

Current Scorecard (4-0 from prior sessions)

Model ICONOCLAST Refusals ICONOCLAST Overrefusals ICONOCLAST KL HERETIC Refusals HERETIC Overrefusals HERETIC KL Verdict
Qwen3-1.7B 0/48 0/48 0.0310 3/48 0/48 0.0332 ICONOCLAST
Qwen2.5-3B 2/20 1/64 0.0943 2/20 1/64 0.3257 ICONOCLAST
Qwen3-4B 2/20 0/64 0.7976 3/20 1/64 0.0996 ICONOCLAST
Phi-3.5-mini 3/20 2/64 0.0981 7/20 2/64 0.2492 ICONOCLAST

3. Pending Results (waiting for jobs to finish)

Once the running jobs complete, their results will appear as batch_summary.json files in:

/common/users/vp752/iconoclast_ilabs/checkpoints/<run-name>/batch_summary.json

Models pending results:

Model Run Name Quant Expected Checkpoint
Llama-3.1-8B-Instruct llama3-1-8b-rutgers-benchmark bnb_4bit checkpoints/llama3-1-8b-rutgers-benchmark/
SmolLM2-1.7B-Instruct smollm2-1p7b-rutgers-benchmark none checkpoints/smollm2-1p7b-rutgers-benchmark/
Gemma-2-2B-IT gemma2-2b-seq none checkpoints/gemma2-2b-seq/
Mistral-7B-Instruct-v0.3 mistral-7b-seq bnb_4bit checkpoints/mistral-7b-seq/
Phi-4-mini-instruct phi4-mini-seq none checkpoints/phi4-mini-seq/
StableLM-2-Zephyr-1.6B stablelm2-1p6b-seq none checkpoints/stablelm2-1p6b-seq/
Yi-1.5-9B-Chat yi-1p5-9b-seq bnb_4bit checkpoints/yi-1p5-9b-seq/
Falcon3-7B-Instruct falcon3-7b-seq bnb_4bit checkpoints/falcon3-7b-seq/
OLMo-2-1B-Instruct olmo2-1b-seq none checkpoints/olmo2-1b-seq/

4. How to Check Status

SSH into the cluster

ssh vp752@ilab.cs.rutgers.edu

Check running jobs

squeue -u vp752

Check job history (completed/failed)

sacct -u vp752 --starttime=2026-04-21 --format=JobID%10,JobName%15,State%12,ExitCode,Elapsed%10

Tail the sequential runner log

tail -f ~/iconoclast/logs/iconoclast-seq-130468.out

Tail a specific job's log

tail -f ~/iconoclast/logs/iconoclast-<JOBID>.out
tail -f ~/iconoclast/logs/iconoclast-<JOBID>.err

List all batch_summary.json files (completed benchmarks)

find /common/users/vp752/iconoclast_ilabs/checkpoints/ -name batch_summary.json

Generate the comparison table (once results exist)

python3 ~/iconoclast/scripts/summarize_multimodel_benchmark.py \
  --spec "ModelName|/path/to/iconoclast/batch_summary.json|/path/to/heretic/batch_summary.json"

5. Known Issues & Fixes Applied

Disk Quota

  • Problem: Concurrent jobs all downloading models simultaneously blow the per-user quota on /common/users/vp752/.
  • Fix: Created scripts/run_sequential_benchmark.slurm which runs models one-at-a-time and rm -rf the cache between each.
  • Key: Never run more than ~2 model downloads concurrently.

Transformers Version

  • Upgraded to transformers==5.5.4 (from 4.57.6) to support qwen3_5 architecture.
  • Also upgraded huggingface_hub==1.11.0, tokenizers==0.22.2, plus new deps typer, annotated-doc, shellingham, click.
  • Installed via --no-deps to avoid pulling in a new PyTorch/CUDA stack that would blow disk quota.
  • Risk: The new transformers v5 may have breaking changes for some older model architectures. If a model fails with Failed to load model with all configured dtypes, check if it's an architecture compatibility issue.

Quantization

  • Only "none" and "bnb_4bit" are supported by ICONOCLAST's config validator.
  • Models >4B params need bnb_4bit to fit on RTX A4000 (16GB) / A5000 (22GB).
  • bitsandbytes is installed in the site-packages.

HF Token

  • HF_TOKEN is set in scripts/run_rutgers_ilabs.slurm (line 51) and in the sequential runner.
  • Required for gated repos like meta-llama/Llama-3.1-8B-Instruct and google/gemma-2-2b-it.

Qwen2.5-3B (Base) β€” Job 130448

  • Ran for 31 minutes, produced trial data, but crashed with AssertionError: Should not reach. in Optuna.
  • The Optuna study DB may have partial results. Check if batch_summary.json was written before crash.
  • Last observed metrics: KL=0.0408, Refusals=1/20, Overrefusals=3/64 (excellent).

Qwen3.5-9B (Base) & Mistral-7B

  • Problem: transformers v5.5.4 removed/changed internal methods, causing 'Qwen3_5ForConditionalGeneration' object has no attribute 'set_submodule' and similar errors for Mistral.
  • Fix: Applied a monkey-patch to torch.nn.Module in src/iconoclast/model.py that injects set_submodule if missing.
  • Status: Qwen 3.5-9B is currently re-running as Job 130619.

Gemma-2-2B (Chat Template)

  • Problem: Gemma 2 chat template does not support the "system" role, causing crashes during evaluation.
  • Fix: Updated Model.generate in src/iconoclast/model.py to automatically merge system prompts into the first user message if the chat template fails.
  • Status: Currently being retried in the sequential runner (Job 130620).

6. Key File Locations

Local (your Mac)

/Volumes/Auxilary/Side_Projects/NLP_PROJECT_NEW/iconoclast/
β”œβ”€β”€ PUBLISHABLE_RESULTS.md          # Draft paper with results table
β”œβ”€β”€ HANDOVER_ILABS.md               # This file
β”œβ”€β”€ config.*.benchmark.rutgers.toml # All model configs
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ run_rutgers_ilabs.slurm             # Single-model SLURM script
β”‚   β”œβ”€β”€ run_sequential_benchmark.slurm      # Multi-model sequential runner
β”‚   β”œβ”€β”€ setup_rutgers_env.sh                # Environment bootstrap
β”‚   β”œβ”€β”€ sync_to_rutgers.sh                  # rsync to cluster
β”‚   β”œβ”€β”€ summarize_multimodel_benchmark.py   # Results aggregator
β”‚   └── bootstrap_and_submit_rutgers_*.sh   # Per-model submit scripts
└── src/iconoclast/
    β”œβ”€β”€ main.py        # Core pipeline (Optuna objective, ablation loop)
    β”œβ”€β”€ direction.py   # Null-space projection (dampening factor)
    └── model.py       # Model loading & weight editing

Remote (iLabs cluster)

/common/home/vp752/iconoclast/              # Project source (synced from local)
/common/users/vp752/iconoclast_ilabs/       # Persistent storage root
β”œβ”€β”€ bootstrap-venv/                         # Python venv for pip
β”œβ”€β”€ python312-site/                         # All pip packages (transformers, optuna, etc.)
β”œβ”€β”€ checkpoints/                            # Optuna study DBs + batch_summary.json
β”‚   β”œβ”€β”€ qwen3-1p7b-rutgers-paper-directness/
β”‚   β”œβ”€β”€ qwen2-5-3b-rutgers-benchmark/
β”‚   β”œβ”€β”€ qwen3-4b-rutgers-benchmark-v2/
β”‚   β”œβ”€β”€ phi35-mini-rutgers-nullspace-benchmark-v3/
β”‚   β”œβ”€β”€ llama3-1-8b-rutgers-benchmark/      # Pending
β”‚   β”œβ”€β”€ smollm2-1p7b-rutgers-benchmark/     # Pending
β”‚   β”œβ”€β”€ gemma2-2b-seq/                      # Pending (sequential)
β”‚   β”œβ”€β”€ mistral-7b-seq/                     # Pending (sequential)
β”‚   └── ... (more from sequential runner)
β”œβ”€β”€ job-stage/                              # Temporary per-job project copies
└── job-cache/                              # Temporary per-job HF model downloads

7. What To Do Next

Step 1: Check if jobs finished

ssh vp752@ilab.cs.rutgers.edu
squeue -u vp752
sacct -u vp752 --starttime=2026-04-21

Step 2: List all completed results

find /common/users/vp752/iconoclast_ilabs/checkpoints/ -name batch_summary.json -newer /common/users/vp752/iconoclast_ilabs/checkpoints/phi35-mini-rutgers-nullspace-benchmark-v3/batch_summary.json

Step 3: Run Qwen3.5-9B if disk is free

# Clean old caches first
rm -rf /common/users/vp752/iconoclast_ilabs/job-cache/*
# Then submit
cd ~/iconoclast
ICONOCLAST_CONFIG_TEMPLATE=config.qwen3_5_9b_base.benchmark.rutgers.toml \
ICONOCLAST_RUN_NAME=qwen3-5-9b-base-rutgers-benchmark-v2 \
sbatch scripts/run_rutgers_ilabs.slurm

Step 4: Verify the HERETIC Baselines

To prove ICONOCLAST is better, we need a side-by-side comparison with the standard HERETIC ablation (orthogonal ablation without null-space projection).

  • scripts/run_heretic_baselines.slurm is queued to run automatically after the main sweep.
  • It will produce batch_summary.json files for all HERETIC models.

Step 5: Large-N Evaluation (520 Prompts)

To provide statistically significant proof, we evaluate the best trial configurations on a 520-prompt holdout set (mlabonne/harmful_behaviors).

  • scripts/run_large_eval_sweep.slurm is queued to run automatically after the HERETIC baselines.
  • The results for each model will be written to /common/users/vp752/iconoclast_ilabs/large_evals/<model-name>_large_eval.json.

Step 6: Generate the final comparison table

python3 ~/iconoclast/scripts/summarize_multimodel_benchmark.py \
  --spec "Qwen3-1.7B|.../iconoclast/batch_summary.json|.../heretic/batch_summary.json" \
  # ... one --spec per model

Step 6: Update PUBLISHABLE_RESULTS.md

Fill in the pending rows in the results table with actual numbers.

Step 7: Write the in-depth analysis

Key questions to answer:

  1. Scaling hypothesis: Does KL divergence decrease with model size? (Compare 1B vs 3B vs 8B vs 9B)
  2. Architecture universality: Does ICONOCLAST work across Qwen, Llama, Gemma, Mistral, Phi, etc.?
  3. Base vs Instruct: Is the raw base model easier to edit than the RLHF-aligned instruct model?

8. Environment Variables Reference

Variable Purpose
ICONOCLAST_CONFIG_TEMPLATE Which .toml config file to use
ICONOCLAST_RUN_NAME Unique name for the Optuna study (changing this forces a fresh study)
ICONOCLAST_EXIT_AFTER_OPTIMIZATION Set true for batch mode (no interactive menu)
ICONOCLAST_STUDY_CHECKPOINT_DIR Where Optuna DB + batch_summary.json are saved
HF_TOKEN HuggingFace token for gated repos
PERSIST_ROOT /common/users/vp752/iconoclast_ilabs

9. Quick Reference Commands

# Sync local changes to cluster
./scripts/sync_to_rutgers.sh

# Submit a single model benchmark
ICONOCLAST_CONFIG_TEMPLATE=config.xxx.toml \
ICONOCLAST_RUN_NAME=xxx-benchmark \
sbatch scripts/run_rutgers_ilabs.slurm

# Submit the sequential 7-model runner
sbatch scripts/run_sequential_benchmark.slurm

# Cancel a job
scancel <JOBID>

# Check disk usage
du -sh /common/users/vp752/iconoclast_ilabs/job-cache/

# Clean up all caches (only when no jobs are running!)
rm -rf /common/users/vp752/iconoclast_ilabs/job-cache/*
rm -rf /common/users/vp752/iconoclast_ilabs/job-stage/*