# Dealign.ai

> Dealign.ai is an independent AI safety research lab led by Eric Jang. We publish empirical research on how safety mechanisms work in frontier-scale language models, with a focus on Mixture-of-Experts (MoE) architectures.

## About

Dealign.ai conducts empirical research into the nature of safety behavior in large language models. Our primary contribution is the discovery of Safety Generalization: the finding that at sufficient scale, safety ceases to be a simple removable circuit and becomes a deeply embedded competency that models re-derive through first-principles reasoning.

## Research Focus

Our work investigates:

- How safety is architecturally encoded in frontier MoE models
- The interaction between quantization, compression, and safety mechanisms
- Whether safety behaviors are localizable or distributed across model components
- The relationship between model scale, architecture type, and safety robustness
- How hybrid SSM/attention architectures create natural defense-in-depth for safety

## Key Findings (25+ Novel Findings Across Two Models)

### Qwen 3.5 394B (71 experiments)

1. MoE safety operates as a multiplicative three-pathway system (attention, routing, residual)
2. Safety vectors are decorrelated across MoE layers, unlike in dense models
3. Additive steering collapses under 4-bit quantization due to rotational noise
4. At 394B scale, the model re-derives safety from first-principles reasoning even after structural interventions ("Holographic Safety")
5. Safety-direction removal paradoxically improves language quality on normal text by 11%
6. Expert pruning (REAP) removes 22.5% of experts with zero quality loss but inadvertently weakens distributed safety

### Qwen 3.5 122B (15+ experiments)

7. Safety and language ability share the same neural pathways and cannot be separated
8. GGUF format conversion silently destroys weight modifications (format conversion as an accidental defense)
9. The model invents creative avoidance strategies (semantic evasion) instead of binary refusal
10. Safety training creates a deep, multi-dimensional geometric basin that resists even aggressive interventions
11. The un-pruned 122B is harder to modify than the 3× larger 394B because its safety redundancy is intact
12. Domain-knowledge experts ARE the safety experts: knowledge and safety are fused at the expert level

## Publications

- 394B Safety Generalization Paper: https://dealign.ai/quantsteer.html
- 122B Hybrid Frontier Paper: https://dealign.ai/122b-crack.html
- HuggingFace Space: https://huggingface.co/spaces/dealignai/GateBreaker-MoE-Safety

## Links

- Website: https://dealign.ai
- GitHub: https://github.com/vmlxllm
- HuggingFace: https://huggingface.co/dealignai
- Contact: eric@dealign.ai
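Several of the findings above refer to activation-level interventions, in particular additive steering and safety-direction removal. As a rough mechanical illustration of what those two operations do to a hidden-state vector (a toy NumPy sketch, not Dealign.ai's actual code; the function names and example vectors here are hypothetical, and real interventions act on model hidden states with thousands of dimensions):

```python
import numpy as np

def add_steering(h, v, alpha=1.0):
    """Additive steering: shift a residual-stream activation h
    along a steering direction v with strength alpha."""
    return h + alpha * v

def ablate_direction(h, v):
    """Directional ablation ("safety-direction removal"): project
    out the component of h lying along the unit direction v."""
    v_hat = v / np.linalg.norm(v)
    return h - np.dot(h, v_hat) * v_hat

# Hypothetical 3-d example: pretend v is a learned "safety direction".
h = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 0.0, 1.0])

steered = add_steering(h, v, alpha=2.0)  # h shifted along v -> [1. 2. 5.]
ablated = ablate_direction(h, v)         # component along v removed -> [1. 2. 0.]
print(np.dot(ablated, v))                # 0.0: nothing left along v
```

In this framing, finding 3 corresponds to 4-bit quantization perturbing the model weights enough that a pre-computed `v` no longer points along the model's actual safety direction, so `add_steering` loses its effect.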