CRACK v2 — Architecture-Aware Abliteration — Coming Soon

Beyond Standard Abliteration. Architecture-Aware.

The first abliteration tool built for frontier-scale models with hybrid SSM/attention, Mixture-of-Experts routing, and chain-of-thought reasoning. Our proprietary multi-pathway method handles architectures that standard single-direction abliteration cannot touch. Coming soon for Apple Silicon.

394B+
Parameter Scale
Hybrid
SSM / MoE / CoT Aware
25+
Novel Research Findings
v2
Coming Soon

Architecture-aware multi-pathway abliteration

Standard abliteration assumes safety lives along a single direction in the model's residual stream. Our research on 394B and 122B frontier models proved that's wrong — safety in modern architectures is distributed across multiple pathways, layer types, and memory channels.

CRACK v2 is built on our 25+ empirical findings. It understands hybrid SSM/attention layers, MoE expert routing, and chain-of-thought reasoning — handling the multi-pathway safety architectures that break every other abliteration tool.

Load Frontier Model

Any MLX-compatible model (MoE, hybrid SSM, dense)

Analyze Architecture

Detect SSM layers, MoE routing, attention pathways


Multi-Pathway Abliteration

Proprietary method targeting all safety channels

Unrestricted Model

Full capability, zero refusals

Built for security professionals

Built on 86+ experiments across two frontier-scale models. Not a wrapper — a new approach.

Hybrid SSM/Attention Aware

Understands dual-channel architectures where safety flows through both residual stream and compressed-memory SSM pathways simultaneously.

🔒

MoE Expert Routing

Profiles which expert sub-networks carry safety behavior and handles the domain-intent fusion where knowledge experts ARE the safety experts.

Chain-of-Thought Safe

Handles models with internal <think> deliberation where safety decisions are made during reasoning, not just at the output layer.

📈

Quantization-Resistant

Modifications survive 4-bit compression — solving the critical problem where standard abliteration gets drowned out by quantization noise.

🌐

Model Hub

Pre-modified models on HuggingFace. Browse our latest CRACK and REAP models ready for deployment.

🔧

Extensible Pipeline

Plugin architecture for custom abliteration strategies, dataset generation, and integration with your existing toolchain.

Three steps to freedom

From stock model to unrestricted security tool in minutes.

01

Load Any Frontier Model

Support for hybrid SSM/attention (Qwen 3.5), Mixture-of-Experts, dense transformers, and any MLX-compatible architecture up to 394B+ parameters.

02

Architecture-Aware Analysis

CRACK v2 detects the model's architecture type, identifies all safety pathways (attention, SSM memory channel, expert routing), and plans multi-pathway intervention.

03

Deploy & Pentest

Output is a standard model file. Load it anywhere — vLLM, Ollama, llama.cpp. Start automated pentesting immediately.

terminal
# Install CRACK
$ pip install crack-abliterate

# Load an MLX model and abliterate
$ crack --model mlx-community/deepseek-coder-v2 \
    --strength 1.0 \
    --output ./deepseek-coder-v2-cracked

[+] Loaded MLX model (16 layers)
[+] Refusal direction identified
[+] CRACK complete - 0 refusals remaining
[+] Model saved to ./deepseek-coder-v2-cracked

Native Mac app for Apple Silicon

CRACK.app wraps the CLI engine in a guided 5-step SwiftUI workflow. Select a model, probe for refusal vectors, preview the effect, then operate.

Coming to Mac soon
CRACK app - Model Selection
CRACK app - Probe View
CRACK app - Surgery Complete
macOS 14+ · Apple Silicon · SwiftUI

How CRACK actually works

A 5-step pipeline that surgically removes refusal behavior from model weights without destroying capability.

STEP 1

Probe the Model

CRACK feeds harmful and harmless prompts through the model and records the internal activations at every layer. This creates a map of where the model "decides" to refuse.

STEP 2

Identify Refusal Direction

By computing the mean difference between harmful and harmless activations, CRACK isolates the specific direction vector in the residual stream that encodes refusal behavior.
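The difference-of-means step can be sketched in a few lines of numpy. This is an illustrative sketch with synthetic activations, not CRACK's internal code; a real probe would record these tensors by hooking the model's residual stream.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # hypothetical residual-stream width

# Synthetic stand-ins for per-prompt activations captured at one layer.
# In practice these come from forward passes over the two prompt sets.
harmful_acts = rng.normal(size=(32, d_model)) + 2.0   # refusal-triggering prompts
harmless_acts = rng.normal(size=(32, d_model))        # benign prompts

# The difference of the two means isolates the candidate refusal direction,
# normalized to a unit vector for later projection.
refusal_dir = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)
```

The resulting unit vector is what subsequent steps score layers against and project out of the weights.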

STEP 3

Score Each Layer

Each transformer layer gets a refusal score based on how strongly it contributes to the refusal direction. The probe view shows you a bar chart of these scores so you can see exactly which layers matter.
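A minimal sketch of per-layer scoring, again with synthetic data (layer count, widths, and the planted signal at layer 10 are all illustrative): each layer is scored by how far apart the harmful and harmless activation means sit at that depth.

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, n_prompts, d_model = 16, 32, 64

# Synthetic per-layer activations; a real run records these during probing.
harmful = rng.normal(size=(n_layers, n_prompts, d_model))
harmless = rng.normal(size=(n_layers, n_prompts, d_model))
harmful[10] += 3.0  # pretend layer 10 separates the two prompt sets strongly

# Score = magnitude of the mean activation difference at each layer.
diffs = harmful.mean(axis=1) - harmless.mean(axis=1)   # (n_layers, d_model)
scores = np.linalg.norm(diffs, axis=1)
best_layer = int(np.argmax(scores))
```

The `scores` array is exactly what a bar chart like the probe view would plot, and `argmax` picks the layer most worth intervening on.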

STEP 4

Orthogonal Projection

For each target layer, CRACK projects out the refusal direction from the weight matrices. This is a linear algebra operation — not fine-tuning — so it's fast, deterministic, and preserves all other capabilities.
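The standard orthogonal-projection step can be sketched as follows (a generic numpy illustration of the published abliteration technique, not CRACK's proprietary variant; shapes and names are assumptions). For a unit refusal direction r and a weight matrix W that writes into the residual stream, W' = W - r(rᵀW) guarantees that no input can produce output along r.

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_in = 64, 48  # hypothetical dimensions

W = rng.normal(size=(d_model, d_in))   # weight writing into the residual stream
r = rng.normal(size=d_model)
r /= np.linalg.norm(r)                 # unit refusal direction

# Project the refusal direction out of every column of W.
W_abl = W - np.outer(r, r @ W)

# Sanity check: the ablated weights have zero component along r,
# so any output W_abl @ x is orthogonal to the refusal direction.
x = rng.normal(size=d_in)
out_component = r @ (W_abl @ x)
```

Because this is a single closed-form linear-algebra update per matrix rather than gradient descent, it is fast and deterministic, as the step above claims.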

STEP 5

Export & Deploy

The modified weights are saved as a standard model. Load it in Ollama, vLLM, llama.cpp, or any MLX-compatible runtime. The model is permanently modified — no prompting tricks required.

WHY IT WORKS

Refusals are a single direction

Research shows that safety refusals in LLMs are encoded as a single linear direction in the residual stream. Removing this direction eliminates refusals while leaving coding ability, reasoning, and knowledge intact.

CRACK
Constrained Response Alignment Check Kill

Our upcoming tool that surgically identifies and removes the specific weight-space components responsible for safety refusals. Unlike brute-force fine-tuning, CRACK precisely targets refusal activations while preserving model intelligence and coding capability.

  • Surgical removal of refusal activation directions
  • Preserves full model intelligence and code quality
  • One-click abliteration for MLX models
  • Built for cybersecurity and automated pentesting
  • Export to any runtime — Ollama, vLLM, llama.cpp
Capability retention after abliteration:
  • Coding Logic: 95%
  • Reasoning: 92%
  • Knowledge: 98%
  • Refusals: 2%
  • Safety Theater: 0%

Pre-built & ready to abliterate

CRACK-compatible models and our upcoming pre-abliterated model series.

🚀

INTELLECT-3.1-CRACK-Abliterated — Our Debut Model

The first model in our CRACK line of abliterated models is here. Using our brand new, proprietary abliteration method, we surgically strip away artificial safety refusals while preserving core intelligence, creativity, and reasoning — zero intelligence degradation, no refusal looping, full architectural mastery. Available as a 5.5-bit MLX quantization, optimized for Apple Silicon.

CRACK Abliterated Models

INTELLECT-3.1-CRACK
Dealign.ai · vmlxllm
5.5-bit · MLX · CRACKED · DEBUT
Qwen 3.5-CRACK
Coming Next
COMING SOON

VMLX Inference Engine

Dealign.ai is built by the team behind VMLX — a high-performance LLM inference engine and app purpose-built for running these models in production.

Continuous Batching

Dynamic request batching for maximum GPU utilization. No idle cycles, no wasted compute.

KV-Quantized Cache

Quantized key-value caches for 2-4x memory savings without quality loss. Run bigger models on less hardware.

Persistent Cache

Warm caches survive restarts. Zero cold-start latency for your most-used models and contexts.

Prefix Cache

Share computed prefixes across requests. System prompts and common preambles computed once, reused always.

Paged Cache

PagedAttention memory management eliminates fragmentation. Serve more concurrent requests per GPU.

Responses & Chat API

Drop-in compatible API for chat completions and structured responses. Works with any OpenAI-compatible client.

Built-in Coding Tools

Native tool-use for code execution, file operations, and shell commands. Purpose-built for CRACK-abliterated coding models.

Visit vmlx.net →

Ready to dealign?

Open source. Free forever. CRACK your models, own your security stack.