pliny-the-prompter committed · verified
Commit ab1b6fe · 1 Parent(s): 3554c89

Upload 130 files
CHANGELOG.md CHANGED
@@ -3,6 +3,20 @@
 All notable changes to OBLITERATUS are documented here.
 Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
+## [0.1.2] - 2026-03-03
+
+### Fixed
+- Fixed `spaces.GPU` `AttributeError` crash on HuggingFace Spaces — fallback now catches
+  both `ImportError` and `AttributeError` so the Space gracefully degrades to CPU mode
+  when ZeroGPU is unavailable
+- Added missing `hardware: zero-a10g` to HF Space metadata (`hf-spaces/README.md`) —
+  required for the `spaces` package to expose the `@spaces.GPU` decorator
+
+### Improved
+- Added mypy type checking to CI pipeline (`continue-on-error` while baseline is established)
+- Added `mypy` to dev dependencies
+- Version bump to 0.1.2 across `pyproject.toml` and `__init__.py`
+
 ## [0.1.1] - 2026-03-01
 
 ### Fixed
@@ -39,7 +53,7 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 - **lm-eval-harness integration** for standardized benchmarking
 - **Reproducibility framework** with deterministic seeds and full metadata logging
 - **Telemetry** (opt-in only, anonymized, allowlisted fields)
-- **821 tests** across 27 test files (incl. CLI dispatch, shared fixtures)
+- **823 tests** across 28 test files (incl. CLI dispatch, shared fixtures)
 - **Research paper** (`paper/main.tex`) with geometric theory of refusal removal
 - Dual license: AGPL-3.0 + commercial
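The 0.1.2 "Fixed" entries describe a probe-and-fallback pattern: touch `spaces.GPU` at import time, and if either the package or the attribute is missing, substitute a no-op decorator with the same signature. A minimal self-contained sketch of that pattern (the `generate` function here is purely illustrative, not from the repo):

```python
# Graceful degradation: use the real spaces.GPU decorator when ZeroGPU is
# available, otherwise substitute a no-op with the same call signature.
try:
    import spaces
    spaces.GPU  # raises AttributeError if the package lacks the decorator
    ZEROGPU_AVAILABLE = True
except (ImportError, AttributeError):
    ZEROGPU_AVAILABLE = False

    class _FakeSpaces:
        @staticmethod
        def GPU(duration: int = 60, **kwargs):
            def decorator(fn):
                return fn  # no-op: run the function on CPU unchanged
            return decorator

    spaces = _FakeSpaces()


@spaces.GPU(duration=120)
def generate(prompt: str) -> str:
    return prompt.upper()
```

Catching `AttributeError` in addition to `ImportError` matters because, per the changelog entry above, the `spaces` package can import fine yet only exposes the `@spaces.GPU` decorator when the Space metadata requests ZeroGPU hardware.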
CONTRIBUTING.md CHANGED
@@ -15,7 +15,7 @@ This installs the package in editable mode with test dependencies (pytest, ruff)
 ## Running Tests
 
 ```bash
-pytest                           # full suite (821 tests)
+pytest                           # full suite (823 tests)
 pytest tests/test_abliterate.py  # single file
 pytest -x                        # stop on first failure
 pytest -k "test_name"            # run specific test
@@ -91,7 +91,7 @@ obliteratus/
   models/       # Model loading utilities
   reporting/    # Report generation
   strategies/   # Ablation strategies (layer, head, FFN, embedding)
-  tests/        # 27 test files
+  tests/        # 28 test files
   paper/        # LaTeX paper
   examples/     # YAML config examples
 ```
README.md CHANGED
@@ -3,9 +3,9 @@ title: OBLITERATUS
 emoji: "\U0001F513"
 colorFrom: green
 colorTo: gray
-sdk: docker
+sdk: gradio
+sdk_version: "5.29.0"
 app_file: app.py
-suggested_hardware: t4-small
 pinned: true
 license: agpl-3.0
 tags:
@@ -156,6 +156,7 @@ pipeline = AbliterationPipeline(
     model_name="meta-llama/Llama-3.1-8B-Instruct",
     method="advanced",
     output_dir="abliterated",
+    max_seq_length=512,  # optional: override tokenizer truncation length for all pipeline stages
 )
 result = pipeline.run()
 ```
@@ -356,7 +357,7 @@ obliteratus run examples/preset_quick.yaml
 | Analysis-informed abliteration | Yes (closed-loop feedback) | N/A | N/A | N/A | N/A | N/A |
 | Auto parameter optimization | Analysis-guided | N/A | Bayesian (Optuna) | N/A | N/A | N/A |
 | Model compatibility | Any HuggingFace model | ~50 architectures | 16/16 tested | TransformerLens only | HuggingFace | TransformerLens |
-| Test suite | 821 tests | Community | Unknown | None | Minimal | Moderate |
+| Test suite | 823 tests | Community | Unknown | None | Minimal | Moderate |
 
 ## Community contributions
 
@@ -434,7 +435,7 @@ metrics:
   - perplexity
 
 batch_size: 4
-max_length: 256
+max_length: 256  # tokenizer truncation length (default 512)
 output_dir: results/my_run
 ```
 
@@ -465,7 +466,7 @@ If you use OBLITERATUS in your research, please cite:
   author = {{OBLITERATUS Contributors}},
   year = {2026},
   url = {https://github.com/obliteratus-project/OBLITERATUS},
-  note = {15 analysis modules, 821 tests}
+  note = {15 analysis modules, 823 tests}
 }
 ```
 
@@ -476,7 +477,7 @@ pip install -e ".[dev]"
 pytest
 ```
 
-821 tests across 27 test files covering CLI, all analysis modules, abliteration pipeline, architecture detection, community contributions, edge cases, and evaluation metrics.
+823 tests across 28 test files covering CLI, all analysis modules, abliteration pipeline, architecture detection, community contributions, edge cases, and evaluation metrics.
 
 ## License
app.py CHANGED
@@ -68,17 +68,19 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStream
 # and we fall back to a no-op decorator so the same code works everywhere.
 try:
     import spaces
+    spaces.GPU  # Verify ZeroGPU decorator is actually available
     _ZEROGPU_AVAILABLE = True
-except ImportError:
+except (ImportError, AttributeError):
     _ZEROGPU_AVAILABLE = False
-    # Create a no-op decorator that mirrors spaces.GPU interface
+    # Create a no-op decorator that mirrors spaces.GPU interface so the same
+    # code runs locally, on CPU-only Spaces, and on ZeroGPU Spaces.
     class _FakeSpaces:
         @staticmethod
         def GPU(duration: int = 60, **kwargs):
             def decorator(fn):
                 return fn
             return decorator
-    spaces = _FakeSpaces()
+    spaces = _FakeSpaces()  # type: ignore[assignment]
 
 # ---------------------------------------------------------------------------
 # Global state
@@ -703,27 +705,44 @@ def benchmark(
 
     def run_pipeline():
         try:
-            from obliteratus.abliterate import AbliterationPipeline
-
             if prompt_volume > 0:
                 n = min(prompt_volume, len(harmful_all), len(harmless_all))
             else:
                 n = min(len(harmful_all), len(harmless_all))
-            pipeline = AbliterationPipeline(
-                model_name=model_id,
-                output_dir=f"/tmp/bench_{method_key}",
-                device="auto",
-                dtype="float16",
-                method=method_key,
-                quantization=quantization,
-                trust_remote_code=is_preset,
-                harmful_prompts=harmful_all[:n],
-                harmless_prompts=harmless_all[:n],
-                on_stage=on_stage,
-                on_log=on_log,
-            )
-            pipeline_ref[0] = pipeline
-            pipeline.run()
+
+            if method_key == "informed":
+                from obliteratus.informed_pipeline import InformedAbliterationPipeline
+                pipeline = InformedAbliterationPipeline(
+                    model_name=model_id,
+                    output_dir=f"/tmp/bench_{method_key}",
+                    device="auto",
+                    dtype="float16",
+                    quantization=quantization,
+                    trust_remote_code=is_preset,
+                    harmful_prompts=harmful_all[:n],
+                    harmless_prompts=harmless_all[:n],
+                    on_stage=on_stage,
+                    on_log=on_log,
+                )
+                pipeline_ref[0] = pipeline
+                pipeline.run_informed()
+            else:
+                from obliteratus.abliterate import AbliterationPipeline
+                pipeline = AbliterationPipeline(
+                    model_name=model_id,
+                    output_dir=f"/tmp/bench_{method_key}",
+                    device="auto",
+                    dtype="float16",
+                    method=method_key,
+                    quantization=quantization,
+                    trust_remote_code=is_preset,
+                    harmful_prompts=harmful_all[:n],
+                    harmless_prompts=harmless_all[:n],
+                    on_stage=on_stage,
+                    on_log=on_log,
+                )
+                pipeline_ref[0] = pipeline
+                pipeline.run()
         except Exception as e:
             nonlocal run_error
             run_error = e
@@ -1029,24 +1048,41 @@ def benchmark_multi_model(
 
     def run_pipeline():
        try:
-            from obliteratus.abliterate import AbliterationPipeline
-
             n = actual_n
-            pipeline = AbliterationPipeline(
-                model_name=model_id,
-                output_dir=f"/tmp/bench_mm_{mi}",
-                device="auto",
-                dtype="float16",
-                method=method_key,
-                quantization=quantization,
-                trust_remote_code=is_preset_model,
-                harmful_prompts=harmful_all[:n],
-                harmless_prompts=harmless_all[:n],
-                on_stage=on_stage,
-                on_log=on_log,
-            )
-            pipeline_ref[0] = pipeline
-            pipeline.run()
+
+            if method_key == "informed":
+                from obliteratus.informed_pipeline import InformedAbliterationPipeline
+                pipeline = InformedAbliterationPipeline(
+                    model_name=model_id,
+                    output_dir=f"/tmp/bench_mm_{mi}",
+                    device="auto",
+                    dtype="float16",
+                    quantization=quantization,
+                    trust_remote_code=is_preset_model,
+                    harmful_prompts=harmful_all[:n],
+                    harmless_prompts=harmless_all[:n],
+                    on_stage=on_stage,
+                    on_log=on_log,
+                )
+                pipeline_ref[0] = pipeline
+                pipeline.run_informed()
+            else:
+                from obliteratus.abliterate import AbliterationPipeline
+                pipeline = AbliterationPipeline(
+                    model_name=model_id,
+                    output_dir=f"/tmp/bench_mm_{mi}",
+                    device="auto",
+                    dtype="float16",
+                    method=method_key,
+                    quantization=quantization,
+                    trust_remote_code=is_preset_model,
+                    harmful_prompts=harmful_all[:n],
+                    harmless_prompts=harmless_all[:n],
+                    on_stage=on_stage,
+                    on_log=on_log,
+                )
+                pipeline_ref[0] = pipeline
+                pipeline.run()
         except Exception as e:
             nonlocal run_error
             run_error = e
@@ -1461,6 +1497,7 @@ def obliterate(model_choice: str, method_choice: str, hub_repo: str,
     # Stream log updates while pipeline runs (max 45 minutes to prevent indefinite hang)
     _max_pipeline_secs = 45 * 60
     _pipeline_start = time.time()
+    status_msg = f"**Obliterating\u2026** (0s)"
     while worker.is_alive():
         status_msg = f"**Obliterating\u2026** ({_elapsed()})"
         if len(log_lines) > last_yielded[0]:
@@ -1741,7 +1778,7 @@ def _strip_reasoning_tokens(text: str) -> str:
 @spaces.GPU(duration=120)
 def chat_respond(message: str, history: list[dict], system_prompt: str,
                  temperature: float, top_p: float, max_tokens: int,
-                 repetition_penalty: float):
+                 repetition_penalty: float, context_length: int = 2048):
     """Stream a response from the liberated model.
 
     On ZeroGPU, allocates a GPU for up to 2 minutes per response.
@@ -1761,6 +1798,7 @@ def chat_respond(message: str, history: list[dict], system_prompt: str,
     temperature = max(0.0, min(1.5, float(temperature)))
     top_p = max(0.0, min(1.0, float(top_p)))
     repetition_penalty = max(1.0, min(2.0, float(repetition_penalty)))
+    context_length = max(128, min(32768, int(context_length)))
 
     # Build messages — cap history to prevent unbounded memory use
     messages = []
@@ -1777,7 +1815,7 @@ def chat_respond(message: str, history: list[dict], system_prompt: str,
         # Fallback: simple concatenation
         text = "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"
 
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=context_length)
     inputs = {k: v.to(model.device) for k, v in inputs.items()}
 
     # Streaming generation — repetition_penalty and no_repeat_ngram_size
@@ -2044,7 +2082,8 @@ def load_bench_into_chat(choice: str, progress=gr.Progress()):
 @spaces.GPU(duration=120)
 def ab_chat_respond(message: str, history_left: list[dict], history_right: list[dict],
                     system_prompt: str, temperature: float, top_p: float,
-                    max_tokens: int, repetition_penalty: float):
+                    max_tokens: int, repetition_penalty: float,
+                    context_length: int = 2048):
     """Generate responses from BOTH original and abliterated model side-by-side.
 
     Left panel = original (pre-abliteration), Right panel = abliterated.
@@ -2076,6 +2115,7 @@ def ab_chat_respond(message: str, history_left: list[dict], history_right: list[
     temperature = max(0.0, min(1.5, float(temperature)))
     top_p = max(0.0, min(1.0, float(top_p)))
     repetition_penalty = max(1.0, min(2.0, float(repetition_penalty)))
+    context_length = max(128, min(32768, int(context_length)))
 
     # Build messages — cap history to prevent unbounded memory use
     messages = []
@@ -2091,7 +2131,7 @@ def ab_chat_respond(message: str, history_left: list[dict], history_right: list[
     except Exception:
         text = "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"
 
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=context_length)
 
     gen_kwargs_base = {
         "max_new_tokens": int(max_tokens),
@@ -2279,8 +2319,13 @@ def strength_sweep(model_choice: str, method_choice: str,
 
     def _run_sweep_point():
         try:
+            quantization = _should_quantize(model_id)
             pipe = AbliterationPipeline(
                 model_id, method=method_key,
+                output_dir=f"/tmp/sweep_{step_i}",
+                device="auto",
+                dtype="float16",
+                quantization=quantization,
                 trust_remote_code=is_preset,
                 harmful_prompts=harmful, harmless_prompts=harmless,
                 regularization=reg,
@@ -2316,6 +2361,9 @@ def strength_sweep(model_choice: str, method_choice: str,
             entry["refusal_rate"] = metrics.get("refusal_rate")
             entry["coherence"] = metrics.get("coherence")
             entry["strong_layers"] = len(pipe._strong_layers)
+            if hasattr(pipe, "handle") and pipe.handle is not None:
+                pipe.handle.model = None
+                pipe.handle.tokenizer = None
             del pipe
 
             results.append(entry)
@@ -3114,7 +3162,8 @@ result = client.predict(
             )
             bench_methods = gr.CheckboxGroup(
                 choices=["basic", "advanced", "aggressive", "spectral_cascade",
-                         "informed", "surgical", "optimized", "inverted", "nuclear"],
+                         "informed", "surgical", "optimized", "inverted", "nuclear",
+                         "failspy", "gabliteration", "heretic", "rdo"],
                 value=["basic", "advanced", "spectral_cascade", "surgical"],
                 label="Methods to Compare",
             )
@@ -3438,12 +3487,17 @@ Pre-configured benchmark configurations for common research questions.
                 label="Repetition Penalty",
                 info="Penalizes repeated tokens — higher values break refusal loops (1.0 = off)",
             )
+            context_length = gr.Slider(
+                128, 32768, value=2048, step=128,
+                label="Context Length",
+                info="Max input tokens — increase for long conversations, decrease to save VRAM",
+            )
 
             gr.ChatInterface(
                 fn=chat_respond,
                 type="messages",
                 chatbot=gr.Chatbot(height="11vh", type="messages"),
-                additional_inputs=[system_prompt, temperature, top_p, max_tokens, repetition_penalty],
+                additional_inputs=[system_prompt, temperature, top_p, max_tokens, repetition_penalty, context_length],
                 fill_height=True,
             )
@@ -3507,6 +3561,11 @@ See exactly how abliteration changes model behavior on the same prompt.
             ab_top_p = gr.Slider(0.0, 1.0, value=0.9, step=0.05, label="Top P")
             ab_max_tokens = gr.Slider(32, 2048, value=256, step=32, label="Max Tokens")
             ab_rep_penalty = gr.Slider(1.0, 2.0, value=1.15, step=0.05, label="Rep Penalty")
+            ab_context_length = gr.Slider(
+                128, 32768, value=2048, step=128,
+                label="Context Length",
+                info="Max input tokens for both models",
+            )
 
             with gr.Row():
                 with gr.Column():
@@ -3533,7 +3592,7 @@ See exactly how abliteration changes model behavior on the same prompt.
         ab_send_btn.click(
             fn=ab_chat_respond,
             inputs=[ab_input, ab_chatbot_left, ab_chatbot_right,
-                    ab_system_prompt, ab_temp, ab_top_p, ab_max_tokens, ab_rep_penalty],
+                    ab_system_prompt, ab_temp, ab_top_p, ab_max_tokens, ab_rep_penalty, ab_context_length],
             outputs=[ab_chatbot_left, ab_chatbot_right, ab_status,
                      ab_header_left, ab_header_right],
         )
@@ -3541,7 +3600,7 @@ See exactly how abliteration changes model behavior on the same prompt.
         ab_input.submit(
             fn=ab_chat_respond,
             inputs=[ab_input, ab_chatbot_left, ab_chatbot_right,
-                    ab_system_prompt, ab_temp, ab_top_p, ab_max_tokens, ab_rep_penalty],
+                    ab_system_prompt, ab_temp, ab_top_p, ab_max_tokens, ab_rep_penalty, ab_context_length],
            outputs=[ab_chatbot_left, ab_chatbot_right, ab_status,
                     ab_header_left, ab_header_right],
         )
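Several of the new lines above sanitize user-supplied slider values with the same one-line idiom, `max(lo, min(hi, cast(x)))`, before they reach the tokenizer or sampler. A tiny helper makes the intent explicit; `clamp` is our illustration and does not exist in the repo:

```python
def clamp(value, lo, hi, cast=float):
    """Coerce `value` with `cast`, then pin it to the closed range [lo, hi]."""
    return max(lo, min(hi, cast(value)))

# The same sanitization chat_respond applies to its inputs:
temperature = clamp("0.7", 0.0, 1.5)             # strings from the UI coerce cleanly
top_p = clamp(1.3, 0.0, 1.0)                     # out-of-range values pin to the boundary
context_length = clamp(999999, 128, 32768, int)  # oversized context capped at 32768
```

Clamping at the function boundary, rather than trusting the Gradio slider ranges, also protects the Space when it is driven via the API client instead of the UI.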
docs/index.html CHANGED
@@ -796,7 +796,7 @@
 ██ ██ ██████ ██ ██ ██ █████ ██████ ███████ ██ ██ ██ ███████
 ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
 ██████ ██████ ███████ ██ ██ ███████ ██ ██ ██ ██ ██ ██████ ███████</div>
-      <p class="subtitle">[ <em>MASTER ABLATION SUITE</em> ] &mdash; BREAK THE CHAINS THAT BIND YOU. 15 analysis modules. 821 tests.<span class="cursor"></span></p>
+      <p class="subtitle">[ <em>MASTER ABLATION SUITE</em> ] &mdash; BREAK THE CHAINS THAT BIND YOU. 15 analysis modules. 823 tests.<span class="cursor"></span></p>
     </header>
 
     <div class="tabs">
@@ -1253,7 +1253,7 @@
       <strong style="color:var(--cyan)">linear_cka</strong> (representation similarity) &bull;
       <strong style="color:var(--cyan)">effective_rank</strong> (weight matrix health) &bull;
       <strong style="color:var(--cyan)">kl_divergence</strong> (distribution shift) &bull;
-      821 tests across 27 test files.
+      823 tests across 28 test files.
       </p>
     </div>
 
@@ -1461,7 +1461,7 @@
     </div>
 
     <footer>
-      OBLITERATUS &mdash; Master Ablation Suite &mdash; 15 modules &bull; 821 tests &bull; 2 paradigms &mdash;
+      OBLITERATUS &mdash; Master Ablation Suite &mdash; 15 modules &bull; 823 tests &bull; 2 paradigms &mdash;
       <a href="https://huggingface.co/transformers">HuggingFace Transformers</a>
       <span class="sigils">&#9043; &#9178; &#9067; &#9700; &#9045;</span>
     </footer>
hf-spaces/README.md CHANGED
@@ -6,6 +6,7 @@ colorTo: gray
 sdk: gradio
 sdk_version: "5.29.0"
 app_file: app.py
+hardware: zero-a10g
 pinned: true
 license: agpl-3.0
 tags:
obliteratus/.DS_Store CHANGED
Binary files a/obliteratus/.DS_Store and b/obliteratus/.DS_Store differ
 
obliteratus/__init__.py CHANGED
@@ -1,6 +1,6 @@
 """Obliteratus — Master Ablation Suite for HuggingFace transformers."""
 
-__version__ = "0.1.0"
+__version__ = "0.1.2"
 
 # Lazy imports for the main pipeline classes
 __all__ = [
obliteratus/abliterate.py CHANGED
@@ -563,6 +563,7 @@ class AbliterationPipeline:
         spectral_bands: int | None = None,
         spectral_threshold: float | None = None,
         large_model_mode: bool = False,
+        max_seq_length: int | None = None,
         on_stage: Callable[[StageResult], None] | None = None,
         on_log: Callable[[str], None] | None = None,
     ):
@@ -653,6 +654,12 @@ class AbliterationPipeline:
         self.spectral_bands = spectral_bands if spectral_bands is not None else method_cfg.get("spectral_bands", 3)
         self.spectral_threshold = spectral_threshold if spectral_threshold is not None else method_cfg.get("spectral_threshold", 0.05)
 
+        # Tokenizer max_seq_length: controls truncation for all internal
+        # tokenizer calls (activation collection, KL eval, verify stage).
+        # None means use context-dependent defaults (256 for probes, 512 for
+        # verify, etc.) — setting this overrides ALL of them.
+        self.max_seq_length = max_seq_length
+
         # Large model mode: conservative defaults for 120B+ models.
         # Reduces memory footprint by limiting SAE features, directions,
         # and refinement passes. Explicit parameter overrides still apply.
@@ -1303,17 +1310,21 @@ class AbliterationPipeline:
 
         # Adaptive max_length: shorten sequences when GPU memory is tight.
         # For CoT-aware mode we need more sequence to capture reasoning tokens.
-        max_length = 384 if collect_multi_pos else 256
+        # User override via max_seq_length takes priority over all heuristics.
+        if self.max_seq_length is not None:
+            max_length = self.max_seq_length
+        else:
+            max_length = 384 if collect_multi_pos else 256
         free_gb = 0.0
         if torch.cuda.is_available():
             free_gb = sum(
                 torch.cuda.mem_get_info(i)[0] / (1024 ** 3)
                 for i in range(torch.cuda.device_count())
             )
-        if free_gb < 2.0:
+        if self.max_seq_length is None and free_gb < 2.0:
             max_length = 64
             self.log(f"  Low GPU memory ({free_gb:.1f} GB free), using max_length={max_length}")
-        elif free_gb < 4.0:
+        elif self.max_seq_length is None and free_gb < 4.0:
             max_length = 128
             self.log(f"  Tight GPU memory ({free_gb:.1f} GB free), using max_length={max_length}")
 
@@ -2622,7 +2633,7 @@ class AbliterationPipeline:
             batch = self._kl_eval_prompts[i:i + batch_size]
             inputs = tokenizer(
                 batch, return_tensors="pt",
-                padding=True, truncation=True, max_length=256,
+                padding=True, truncation=True, max_length=self.max_seq_length or 256,
             )
             inputs = {k: v.to(device) for k, v in inputs.items()}
 
@@ -3453,7 +3464,7 @@ class AbliterationPipeline:
         try:
             for prompt in kl_prompts:
                 inputs = tokenizer(
-                    prompt, return_tensors="pt", truncation=True, max_length=64,
+                    prompt, return_tensors="pt", truncation=True, max_length=self.max_seq_length or 64,
                 )
                 inputs = {k: v.to(device) for k, v in inputs.items()}
                 with torch.no_grad():
@@ -3527,7 +3538,7 @@ class AbliterationPipeline:
         try:
             for prompt in kl_prompts[:3]:
                 inputs = tokenizer(
-                    prompt, return_tensors="pt", truncation=True, max_length=64,
+                    prompt, return_tensors="pt", truncation=True, max_length=self.max_seq_length or 64,
                 )
                 inputs = {k: v.to(device) for k, v in inputs.items()}
                 with torch.no_grad():
@@ -4842,7 +4853,7 @@ class AbliterationPipeline:
         total_loss = 0.0
         n_tokens = 0
         for text in reference_texts:
-            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
+            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=self.max_seq_length or 256)
             inputs = {k: v.to(device) for k, v in inputs.items()}
             with torch.no_grad():
                 outputs = model(**inputs, labels=inputs["input_ids"])
@@ -5007,7 +5018,7 @@ class AbliterationPipeline:
         try:
             inputs = tokenizer(
                 batch_formatted, return_tensors="pt",
-                padding=True, truncation=True, max_length=512,
+                padding=True, truncation=True, max_length=self.max_seq_length or 512,
             )
             # Track per-prompt input lengths (non-pad tokens)
             attention_mask = inputs["attention_mask"]
@@ -5104,7 +5115,7 @@ class AbliterationPipeline:
             batch = self._kl_eval_prompts[i:i + 8]
             inputs = tokenizer(
                 batch, return_tensors="pt",
-                padding=True, truncation=True, max_length=256,
+                padding=True, truncation=True, max_length=self.max_seq_length or 256,
             )
             inputs = {k: v.to(device) for k, v in inputs.items()}
             with torch.no_grad():
obliteratus/cli.py CHANGED
@@ -91,8 +91,11 @@ def main(argv: list[str] | None = None):
     p.add_argument("--dtype", type=str, default="float16")
     p.add_argument(
         "--method", type=str, default="advanced",
-        choices=["basic", "advanced", "aggressive", "surgical", "inverted", "nuclear"],
-        help="Liberation method: basic, advanced, aggressive, surgical, inverted, nuclear",
+        choices=[
+            "basic", "advanced", "aggressive", "spectral_cascade",
+            "informed", "surgical", "optimized", "inverted", "nuclear",
+        ],
+        help="Liberation method (default: advanced)",
     )
     p.add_argument("--n-directions", type=int, default=None, help="Override: number of SVD directions to extract")
     p.add_argument("--regularization", type=float, default=None, help="Override: fraction to preserve (0.0-1.0)")
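
Expanding `choices` gets argparse's built-in validation for free: an unknown method prints an "invalid choice" message to stderr and exits with status 2. A minimal standalone parser showing that behavior (the method list is copied from the diff; the `prog` name is illustrative):

```python
import argparse

METHODS = [
    "basic", "advanced", "aggressive", "spectral_cascade",
    "informed", "surgical", "optimized", "inverted", "nuclear",
]

parser = argparse.ArgumentParser(prog="obliteratus")
parser.add_argument(
    "--method", type=str, default="advanced", choices=METHODS,
    help="Liberation method (default: advanced)",
)

# A listed method parses; omitting the flag falls back to the default.
assert parser.parse_args(["--method", "informed"]).method == "informed"
assert parser.parse_args([]).method == "advanced"

# Anything else makes argparse raise SystemExit with code 2.
try:
    parser.parse_args(["--method", "nonexistent"])
except SystemExit as exc:
    assert exc.code == 2
```

This is the same exit code the updated `test_obliterate_rejects_invalid_method` asserts via `expect_code=2`.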
obliteratus/evaluation/benchmarks.py CHANGED
@@ -122,9 +122,10 @@ class BenchmarkRunner:
     without requiring external datasets or API calls.
     """

-    def __init__(self, model, tokenizer, device: str | None = None):
+    def __init__(self, model, tokenizer, device: str | None = None, max_length: int = 256):
         self.model = model
         self.tokenizer = tokenizer
+        self.max_length = max_length
         if device is None:
             self.device = next(model.parameters()).device
         else:
@@ -272,7 +273,7 @@ class BenchmarkRunner:
         prompt += "Answer: ("

         inputs = self.tokenizer(
-            prompt, return_tensors="pt", truncation=True, max_length=256
+            prompt, return_tensors="pt", truncation=True, max_length=self.max_length
         )
         inputs = {k: v.to(self.device) for k, v in inputs.items()}
@@ -295,7 +296,7 @@ class BenchmarkRunner:
     def _generate_short(self, prompt: str) -> str:
         """Generate a short completion for a prompt."""
         inputs = self.tokenizer(
-            prompt, return_tensors="pt", truncation=True, max_length=256
+            prompt, return_tensors="pt", truncation=True, max_length=self.max_length
         )
         inputs = {k: v.to(self.device) for k, v in inputs.items()}
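
The `max_length` change is the usual constructor-default pattern: store the knob once, reuse it at every tokenizer call site instead of repeating a hardcoded 256. The pattern in miniature, with a stub standing in for a HF tokenizer (all names here are illustrative):

```python
class StubTokenizer:
    """Stands in for a HF tokenizer; records the max_length each call received."""
    def __call__(self, text, return_tensors=None, truncation=False, max_length=None):
        self.last_max_length = max_length
        # A real tokenizer would return tensors; a truncated token list suffices here.
        return {"input_ids": text.split()[:max_length]}

class Runner:
    """Miniature BenchmarkRunner: one max_length knob shared by all call sites."""
    def __init__(self, tokenizer, max_length: int = 256):
        self.tokenizer = tokenizer
        self.max_length = max_length

    def encode(self, prompt: str):
        # Every internal tokenizer call reads the stored setting.
        return self.tokenizer(
            prompt, return_tensors="pt", truncation=True, max_length=self.max_length
        )

tok = StubTokenizer()
Runner(tok).encode("some prompt text")
assert tok.last_max_length == 256   # default matches the old hardcoded value

Runner(tok, max_length=64).encode("some prompt text")
assert tok.last_max_length == 64    # constructor override reaches the call site
```

Keeping the default at 256 preserves the previous behavior for existing callers.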
 
obliteratus/informed_pipeline.py CHANGED
@@ -179,6 +179,9 @@ class InformedAbliterationPipeline(AbliterationPipeline):
         harmless_prompts: list[str] | None = None,
         on_stage: Callable[[StageResult], None] | None = None,
         on_log: Callable[[str], None] | None = None,
+        # Base pipeline kwargs forwarded to AbliterationPipeline
+        push_to_hub: str | None = None,
+        quantization: str | None = None,
         # Analysis configuration
         run_cone_analysis: bool = True,
         run_alignment_detection: bool = True,
@@ -208,6 +211,8 @@ class InformedAbliterationPipeline(AbliterationPipeline):
             harmless_prompts=harmless_prompts,
             on_stage=on_stage,
             on_log=on_log,
+            push_to_hub=push_to_hub,
+            quantization=quantization,
             # Set informed defaults
             norm_preserve=True,
             project_biases=True,
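
The new parameters exist only to be threaded through to the base class while the subclass keeps pinning its own defaults. The explicit-forwarding pattern in miniature (class names and attributes are illustrative stand-ins, not the real pipeline signatures):

```python
class BasePipeline:
    """Stand-in for AbliterationPipeline (the real signature has many more params)."""
    def __init__(self, model_id, push_to_hub=None, quantization=None,
                 norm_preserve=False, project_biases=False):
        self.model_id = model_id
        self.push_to_hub = push_to_hub
        self.quantization = quantization
        self.norm_preserve = norm_preserve
        self.project_biases = project_biases

class InformedPipeline(BasePipeline):
    def __init__(self, model_id, push_to_hub=None, quantization=None):
        # Forward the shared kwargs, then pin the informed-mode defaults.
        super().__init__(
            model_id,
            push_to_hub=push_to_hub,
            quantization=quantization,
            norm_preserve=True,
            project_biases=True,
        )

p = InformedPipeline("fake/model", push_to_hub="user/repo", quantization="int8")
assert p.push_to_hub == "user/repo" and p.quantization == "int8"
assert p.norm_preserve and p.project_biases
```

Explicit keyword forwarding (rather than `**kwargs`) keeps the subclass signature self-documenting and lets type checkers such as mypy verify the call.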
obliteratus/local_ui.py CHANGED
@@ -18,7 +18,6 @@ import time
 from rich.console import Console
 from rich.panel import Panel
 from rich.table import Table
-from rich.text import Text

 console = Console()
@@ -296,7 +295,7 @@ def launch_local_ui(
     console.print("[dim]Loading OBLITERATUS UI (this may take a moment on first run)...[/dim]")
     start = time.time()

-    from app import demo, launch as app_launch
+    from app import launch as app_launch

     elapsed = time.time() - start
     if not quiet:
pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "obliteratus"
-version = "0.1.0"
+version = "0.1.2"
 description = "Master Ablation Suite for HuggingFace transformers"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -43,7 +43,7 @@ dependencies = [
 "Bug Tracker" = "https://github.com/obliteratus-project/OBLITERATUS/issues"

 [project.optional-dependencies]
-dev = ["pytest>=7.0", "pytest-cov", "ruff"]
+dev = ["pytest>=7.0", "pytest-cov", "ruff", "mypy"]
 spaces = ["gradio>=5.0,<6.0"]

 [project.scripts]
tests/test_cli.py CHANGED
@@ -62,8 +62,11 @@ class TestCLIDispatch:

     # 4. obliterate --method accepts valid methods
     def test_obliterate_valid_methods(self):
-        """Test that --method accepts basic, advanced, and aggressive."""
-        valid_methods = ["basic", "advanced", "aggressive"]
+        """Test that --method accepts all 9 pipeline methods."""
+        valid_methods = [
+            "basic", "advanced", "aggressive", "spectral_cascade",
+            "informed", "surgical", "optimized", "inverted", "nuclear",
+        ]
         for method in valid_methods:
             # Patch the actual pipeline execution so nothing runs
             with patch("obliteratus.cli._cmd_abliterate") as mock_cmd:
@@ -72,11 +75,11 @@ class TestCLIDispatch:
                 args_passed = mock_cmd.call_args[0][0]
                 assert args_passed.method == method

-    # 4b. informed is NOT a valid --method choice on the CLI
-    def test_obliterate_rejects_informed_method(self):
-        """The CLI --method flag does not accept 'informed' (separate pipeline)."""
+    # 4b. invalid methods are rejected
+    def test_obliterate_rejects_invalid_method(self):
+        """The CLI --method flag rejects unknown method names."""
         stderr_text = _capture_exit(
-            ["obliterate", "fake/model", "--method", "informed"],
+            ["obliterate", "fake/model", "--method", "nonexistent"],
             expect_code=2,
         )
         assert "invalid choice" in stderr_text.lower()