OPERATING ROOM ACTIVE

We don't train models. We operate on them.

Zero-shot architectural metamorphosis. Weight transplants. Neural lobotomies. The future of AI isn't training from scratch—it's surgery.

Model Surgery
87B PARAMETERS PRUNED 2,847 MODELS MERGED 156 ARCHITECTURAL TRANSPLANTS 0 TRAINING RUNS ∞ POSSIBILITIES
01

The Manifesto

"Every neural network is an organ donor waiting to happen."

The old paradigm is dead. Training trillion-parameter models from scratch is a waste of compute dressed up as a rite of passage. We believe in a different path: surgical intervention.

We slice through weight matrices with SVD scalpels. We transplant attention heads between incompatible architectures. We perform zero-shot lobotomies that remove capabilities without catastrophic forgetting. We convert dense MLPs into sparse Mixture-of-Experts without a single gradient update.

This is not fine-tuning. This is not distillation. This is model surgery— and we're just getting started.

Cut, Don't Train

Training is expensive. Surgery is elegant. We achieve in minutes what takes others months.

Hybrid Vigor

The best models are chimeras. We splice the reasoning of one into the creativity of another.

Zero-Shot Everything

If it requires retraining, we haven't found the right incision point yet.

02

Surgical Procedures

I

Architectural Metamorphosis

Transform dense MLP layers into Mixture-of-Experts without retraining. Weight decomposition via orthogonal projections preserves functional equivalence while unlocking sparse computation.

W_moe = decompose(W_dense, n_experts=8)
ZERO-SHOT
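A minimal sketch of what a `decompose` step could look like. This is our own illustrative implementation, not a real library call: it groups the hidden units of a dense weight matrix by their dominant singular direction, a crude stand-in for proper clustering in singular-vector space.

```python
import numpy as np

def decompose(W_dense, n_experts):
    """Toy dense->MoE split: assign each hidden unit (row of W_dense)
    to an expert based on which singular direction it loads on most.
    Keeping every expert preserves the original computation exactly,
    since the experts are just a partition of the rows."""
    U, S, Vt = np.linalg.svd(W_dense, full_matrices=False)
    # |U[i, j]| measures how strongly hidden unit i loads on
    # singular direction j; fold directions into n_experts buckets.
    dominant = np.abs(U).argmax(axis=1) % n_experts
    return [W_dense[dominant == e] for e in range(n_experts)]

experts = decompose(np.random.randn(16, 8), n_experts=8)
```

Because the experts partition the rows, running all of them recovers the dense layer; sparsity comes from routing to only a few.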
II

Neural Grafting

Transplant specific capabilities between models via targeted weight merging. Spherical linear interpolation (SLERP) in parameter space enables smooth capability transfer with minimal interference.

θ_hybrid = slerp(θ_donor, θ_host, t=0.3)
MERGING
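The SLERP step above can be sketched as follows, assuming flattened weight tensors. Real merges apply this tensor-by-tensor across the whole state dict; `slerp` here is our own toy helper, not a merging library's API.

```python
import numpy as np

def slerp(theta_donor, theta_host, t):
    """Spherical linear interpolation between two flattened weight vectors."""
    a = theta_donor / np.linalg.norm(theta_donor)
    b = theta_host / np.linalg.norm(theta_host)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Nearly parallel weights: SLERP degenerates to plain lerp.
        return (1 - t) * theta_donor + t * theta_host
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * theta_donor \
         + (np.sin(t * omega) / so) * theta_host

theta_hybrid = slerp(np.ones(4), np.array([1.0, 0.0, 0.0, 0.0]), t=0.3)
```

At t=0 the result is exactly the donor, at t=1 exactly the host; intermediate t traces the great-circle arc between them rather than the straight chord.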
III

Precision Ablation

Remove unwanted behaviors through surgical pruning of activation patterns. Identify and excise the neural pathways responsible for specific outputs without collateral damage.

mask = locate_circuit(behavior); prune(mask)
PRUNING
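A sketch of the locate-and-prune step. `locate_circuit` and `prune` are hypothetical helpers operating on recorded activations, not a real interpretability API: neurons are scored by how differently they fire on examples that do and do not show the behavior, and the top scorers are zeroed out.

```python
import numpy as np

def locate_circuit(acts_pos, acts_neg, top_k):
    """Score neurons by mean activation difference between examples
    exhibiting a behavior (acts_pos) and matched controls (acts_neg);
    return a boolean mask over the top_k most implicated neurons."""
    score = np.abs(acts_pos.mean(axis=0) - acts_neg.mean(axis=0))
    mask = np.zeros(score.shape, dtype=bool)
    mask[np.argsort(score)[-top_k:]] = True
    return mask

def prune(W, mask):
    """Ablate the selected neurons by zeroing their output rows."""
    W = W.copy()
    W[mask, :] = 0.0
    return W
```

In practice the hard part is the contrast set: the mask is only as surgical as the examples used to isolate the behavior.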
IV

Expert Splitting

Convert monolithic feed-forward networks into routed expert ensembles. Static routing patterns emerge from weight clustering, enabling conditional computation without learned gates.

experts = cluster(W_ffn, k=16); route(x)
MOE CONVERSION
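The clustering step can be sketched with plain k-means on the rows of the up-projection. A toy version under our own names; a real conversion would also split the down-projection and derive the static routing table from the same cluster assignments.

```python
import numpy as np

def split_into_experts(W_ffn, k, seed=0):
    """Partition FFN hidden units into k experts by k-means clustering
    their weight rows (a few Lloyd iterations, illustrative only)."""
    rng = np.random.default_rng(seed)
    centroids = W_ffn[rng.choice(len(W_ffn), size=k, replace=False)]
    for _ in range(10):
        # Assign each hidden unit to its nearest centroid.
        d = np.linalg.norm(W_ffn[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = W_ffn[labels == j].mean(axis=0)
    return [W_ffn[labels == j] for j in range(k)]
```

Because experts are literal slices of the original matrix, activating all of them reproduces the dense layer; conditional computation comes from activating only the cluster(s) a token routes to.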
V

Dimensional Collapse

Reduce model dimensionality through rank decomposition while preserving critical representations. SVD-based compression identifies and retains the essential subspace.

W_small = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
COMPRESSION
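A minimal sketch of the truncation, using NumPy's SVD. In practice the rank r is chosen from the singular value spectrum (e.g. retained energy), not hard-coded as it is here.

```python
import numpy as np

def low_rank_compress(W, r):
    """Truncated-SVD compression: keep the top-r singular directions."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # (d, r) -- singular values absorbed into U
    B = Vt[:r, :]          # (r, h)
    return A, B            # W is approximated by A @ B, stored in r*(d+h) params
```

If W is genuinely low-rank the factorization is exact; otherwise the error is exactly the discarded tail of the singular value spectrum.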

Your Procedure Here

We're constantly developing new surgical techniques. Have an idea for a novel operation? We want to hear it.

Join the Research →
surgery.log
$ model-surgery --operation=metamorphosis
[INFO] Loading dense model: llama-70b
[INFO] Analyzing MLP weight distributions...
[INFO] Performing SVD decomposition on 80 layers
[SURGERY] Converting layer 1/80 → 8 experts
[SURGERY] Converting layer 2/80 → 8 experts
...
[SUCCESS] Metamorphosis complete. 0 training steps.
[RESULT] Dense → MoE: 70B params, 14B active
$ |

The Math Behind the Scalpel

Every surgical procedure is grounded in linear algebra. When we perform architectural metamorphosis, we're decomposing weight matrices W ∈ ℝ^(d×h) into expert-specific projections using truncated SVD and clustering in the singular vector space.

The key insight: neural network weights contain redundant structure that can be factored, split, merged, or transplanted—if you know where to cut.

W = UΣVᵀ → Σᵢ UᵢΣᵢVᵢᵀ
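The identity can be checked numerically: grouping singular directions into per-expert blocks and summing the blocks reconstructs W exactly when every block is kept (toy sizes here, not real model weights).

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(6, 4)
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Split the 4 singular directions into blocks of 2: each block is
# one term U_i Σ_i V_i^T of the sum above.
parts = [U[:, i:i + 2] @ np.diag(S[i:i + 2]) @ Vt[i:i + 2, :]
         for i in range(0, len(S), 2)]
W_rebuilt = sum(parts)
```

Dropping blocks instead of keeping them all is what turns this exact decomposition into compression or expert specialization.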
03

Join the Operating Room

Model Surgery is an open collective of researchers, engineers, and mad scientists pushing the boundaries of what's possible without gradient descent.

We share techniques, collaborate on experiments, and occasionally create abominations that shouldn't work but somehow do.

Discord

Real-time collaboration and discussion with fellow surgeons

Join Server →