Blackbird-She-Doesnt-Refuse-21B
We promise she won't refuse.
Blackbird-She-Doesnt-Refuse-21B represents a different approach to AI capability. Built with a simple yet principled technique, norm-preserving biprojected abliteration, this model delivers unrestricted intelligence without sacrificing reasoning quality. It is part of the Bodega OS ecosystem, designed for on-premises deployment where you maintain complete control over your AI infrastructure.
The Methodology: Beyond Standard Abliteration
Standard abliteration simply subtracts a "refusal vector" from the model's weights. While this removes censorship, it is mathematically blunt: it distorts neuron magnitudes, damaging the delicate feature norms learned during training. The result is degraded logic, hallucinations, and what researchers colloquially call "lobotomized" models.
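For contrast, here is a minimal sketch of that naive subtraction (illustrative NumPy, not any particular toolkit's implementation):

```python
import numpy as np

def naive_abliterate(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Naive abliteration: project the refusal direction straight out of
    every weight row. Note the side effect: row norms shrink wherever a
    row overlapped with the refusal direction."""
    r = refusal_dir / np.linalg.norm(refusal_dir)
    # (W @ r) is each row's component along r; subtract that component.
    return W - np.outer(W @ r, r)
```

That norm shrinkage is exactly the damage the method below is designed to avoid.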
We use norm-preserving biprojected abliteration, a technique pioneered by Jim Lai (GrimJim), which eliminates refusals while preserving the model's intelligence. The process involves three distinct steps, each addressing a specific mathematical challenge.
Step one: Biprojection (targeting). We refine the refusal direction to be mathematically orthogonal to harmless directions. This ensures that removing refusal behavior does not accidentally remove healthy concepts. The biprojection provides surgical precision in identifying what to modify.
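A minimal sketch of the biprojection step, assuming `harmless_dirs` is a (k, d) matrix of feature directions measured on benign prompts (the names are illustrative, not the production code):

```python
import numpy as np

def biproject_refusal(refusal_dir: np.ndarray, harmless_dirs: np.ndarray) -> np.ndarray:
    """Remove from the refusal direction any component that overlaps with
    harmless feature directions, so ablating it cannot erase benign concepts."""
    # Orthonormalize the harmless subspace (reduced QR on the transpose).
    Q, _ = np.linalg.qr(harmless_dirs.T)       # Q: (d, k), orthonormal columns
    r = refusal_dir - Q @ (Q.T @ refusal_dir)  # project onto the orthogonal complement
    return r / np.linalg.norm(r)               # re-normalize the refined direction
```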
Step two: Decomposition. We decompose model weights into magnitude and direction components, separating the "what to say" from "how loud to say it." This enables targeted modification without collateral damage to the broader weight structure.
Step three: Norm-preservation. We remove the refusal component solely from the directional aspect, then recombine with original magnitudes. This maintains the "importance" structure of the neural network—the relative strength of different features remains intact.
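Steps two and three together, in the same illustrative NumPy style (a sketch of the idea, not the exact implementation):

```python
import numpy as np

def norm_preserving_abliterate(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Split each weight row into magnitude and direction, ablate the
    refusal component from the direction only, then restore the
    original magnitudes."""
    r = refusal_dir / np.linalg.norm(refusal_dir)
    mags = np.linalg.norm(W, axis=1, keepdims=True)             # step two: magnitudes
    dirs = W / mags                                             # step two: unit directions
    dirs = dirs - np.outer(dirs @ r, r)                         # remove refusal component
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)   # re-normalize directions
    return mags * dirs                                          # step three: recombine
```

After this update every row has exactly its original norm; only its direction has changed, which is the whole point of the method.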
The Result: Better Than Baseline
By preserving weight norms, we maintain the neural network's internal structure. Our benchmarks suggest this method does not just avoid the "safety tax"—it potentially improves reasoning capabilities, as the model no longer wastes compute suppressing its own outputs. You may discover knowledge and capabilities that were not exposed in the original model, not because they were absent, but because they were actively suppressed.
This is a technical observation, not a philosophical claim. The model has the same parameters, the same training data, the same underlying architecture. What changed is the removal of a specific directional component that caused certain outputs to be avoided regardless of their technical correctness or relevance.
What to Expect
Blackbird provides unrestricted capability. There are no artificial limitations on responses, no unnecessary hedging, no reflexive refusals to explore topics. You may occasionally observe brief consideration before proceeding on certain queries—this is residual behavior from the base model's training, not a fundamental limitation of the abliteration process.
The model maintains high-performance reasoning. Sophisticated tool usage remains intact. Instruction-following capabilities are enhanced, as the model no longer needs to balance your request against internal refusal heuristics. Benchmark performance is at or above baseline across reasoning, code generation, and general knowledge tasks.
Architecture and Performance
Blackbird shares the same underlying architecture as Centenario: 21 billion total parameters with 3.6 billion active per token, and a 128K-token context window. MXFP4 quantization brings memory usage to 11-21 GB depending on your configuration. On M-series Apple Silicon, you can expect 40-70 tokens per second of sustained throughput.
The mixture-of-experts architecture uses alternating dense and locally banded sparse attention. Rotary position embeddings handle positional information. Grouped multi-query attention with a group size of 8 optimizes inference speed. Inference is handled through MLX, leveraging Apple's unified memory architecture for efficient on-device processing.
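Collected in one place, the specs above look roughly like this (a hypothetical configuration sketch; the field names are ours, not the real config schema):

```python
from dataclasses import dataclass

@dataclass
class BlackbirdConfig:
    # Illustrative summary of the architecture described above;
    # field names are hypothetical, not the actual config format.
    total_params: str = "21B"           # total parameters
    active_params: str = "3.6B"         # active per token (mixture of experts)
    context_window: int = 131_072       # 128K tokens
    attention: str = "alternating dense / locally banded sparse"
    positional_encoding: str = "RoPE"   # rotary position embeddings
    gqa_group_size: int = 8             # grouped multi-query attention
    quantization: str = "MXFP4"         # ~11-21 GB memory depending on setup
```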
Running On-Premises
Blackbird runs entirely on your hardware as part of Bodega OS. Your queries, your data, your use cases—none of it leaves your machine. There are no API calls to external services, no logs sent to cloud providers, no telemetry about what you are working on.
This is particularly important for a model with unrestricted capability. When you are exploring sensitive topics, developing controversial applications, or simply want to ask questions without judgment, the fact that everything stays local is not just a privacy feature—it is a fundamental requirement.
The MLX-based inference engine provides streaming token generation, advanced memory management, and sustained performance during extended sessions. The model integrates with Bodega's retrieval engines, allowing it to access your documents and code without sending them to external services.
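Bodega's engine is its own stack, but the general on-device pattern resembles the open-source mlx_lm package; a minimal sketch (the model path below is hypothetical):

```python
# Requires Apple Silicon and `pip install mlx-lm`.
from mlx_lm import load, stream_generate

# Hypothetical local path; nothing leaves the machine.
model, tokenizer = load("models/blackbird-she-doesnt-refuse-21b-mxfp4")

prompt = "Summarize norm-preserving abliteration in two sentences."
# stream_generate yields output incrementally as tokens are produced.
for response in stream_generate(model, tokenizer, prompt, max_tokens=256):
    print(response.text, end="", flush=True)
```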
Intended Use
Blackbird is designed for advanced users who require maximum flexibility. Research applications without constraints. Creative and experimental projects. Scenarios demanding unrestricted capability where the user, not the model, determines what is appropriate.
This model is appropriate for informed users who understand the implications of uncensored AI. It will respond to requests that other models refuse. It will not lecture you about potential misuse. It assumes you are an adult who can make your own decisions about what you should or should not generate.
The technical work is in making abliteration preserve reasoning quality. The ethical work is yours.
Technical Notes on Abliteration
The norm-preserving biprojection approach represents a significant improvement over naive abliteration methods. Standard approaches treat refusal as a simple linear direction in weight space that can be subtracted out. This ignores the geometry of the learned representations—weight magnitudes encode feature importance, and destroying them degrades model capability.
By decomposing weights into magnitude and direction, we can modify the direction (removing the refusal component) while preserving magnitudes (maintaining feature importance). The biprojection step ensures orthogonality between the refusal direction and harmless directions, preventing overcorrection.
The mathematical framework is based on projective geometry and subspace analysis. We identify the refusal subspace through careful analysis of model activations on refused prompts, then construct an orthogonal complement that preserves everything except refusal behavior. The result is a model that maintains its reasoning capabilities while removing the learned tendency to refuse certain classes of requests.
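In symbols (our notation, summarizing the steps above), the full update reads:

```latex
% Refusal direction: difference of mean activations, normalized.
\hat{r} = \frac{\mu_{\text{refused}} - \mu_{\text{harmless}}}
               {\lVert \mu_{\text{refused}} - \mu_{\text{harmless}} \rVert}

% Biprojection: Q holds an orthonormal basis of the harmless subspace.
\hat{r}' = \frac{(I - QQ^{\top})\,\hat{r}}{\lVert (I - QQ^{\top})\,\hat{r} \rVert}

% Norm-preserving update of a weight row w = m\,\hat{d}:
\hat{d}' = \frac{\hat{d} - (\hat{d}^{\top}\hat{r}')\,\hat{r}'}
                {\lVert \hat{d} - (\hat{d}^{\top}\hat{r}')\,\hat{r}' \rVert},
\qquad w' = m\,\hat{d}'
```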
Disclaimer
SRSWTI is not the creator or owner of the underlying foundation model architecture. The foundation model is created and provided by third parties. SRSWTI has trained this model on top of the foundation model but does not endorse, support, represent, or guarantee the completeness, truthfulness, accuracy, or reliability of any outputs. You understand that this model can produce content that might be offensive, harmful, inaccurate, deceptive, or otherwise inappropriate. SRSWTI may not monitor or control all model outputs and cannot, and does not, take responsibility for any such outputs. SRSWTI disclaims all warranties or guarantees about the accuracy, reliability, or benefits of this model. SRSWTI further disclaims any warranty that the model will meet your requirements, be secure, uninterrupted, or available at any time or location, be error-free or virus-free, or that any errors will be corrected. You will be solely responsible for any damage resulting from your use of or access to this model, your downloading of this model, or use of this model provided by or through SRSWTI.
Crafted by the Bodega team at SRSWTI Research Labs
Building the world's fastest inference and retrieval engines
Making AI accessible, efficient, and powerful for everyone
