Zero-shot architectural metamorphosis. Weight transplants. Neural lobotomies. The future of AI isn't training from scratch—it's surgery.
"Every neural network is an organ donor waiting to happen."
The old paradigm is dead. Training trillion-parameter models from scratch is a wasteful rite of passage. We believe in a different path: surgical intervention.
We slice through weight matrices with SVD scalpels. We transplant attention heads between incompatible architectures. We perform zero-shot lobotomies that remove capabilities without catastrophic forgetting. We convert dense MLPs into sparse Mixture-of-Experts without a single gradient update.
This is not fine-tuning. This is not distillation. This is model surgery—and we're just getting started.
Training is expensive. Surgery is elegant. We achieve in minutes what takes others months.
The best models are chimeras. We splice the reasoning of one into the creativity of another.
If it requires retraining, we haven't found the right incision point yet.
Transform dense MLP layers into Mixture-of-Experts without retraining. Weight decomposition via orthogonal projections preserves functional equivalence while unlocking sparse computation.
W_moe = decompose(W_dense, n_experts=8)
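The `decompose` call above is shorthand. A minimal NumPy sketch of one way it could work: split the singular components of the dense matrix across experts, so the full ensemble of experts remains functionally equivalent to the original layer while any subset enables sparse computation.

```python
import numpy as np

def decompose(W_dense, n_experts=8):
    """Split a dense weight matrix into expert matrices via SVD blocks.

    Each expert owns a contiguous slice of singular components, so the
    sum of all expert outputs exactly reconstructs the dense output."""
    U, S, Vt = np.linalg.svd(W_dense, full_matrices=False)
    splits = np.array_split(np.arange(len(S)), n_experts)
    return [U[:, idx] @ np.diag(S[idx]) @ Vt[idx, :] for idx in splits]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
experts = decompose(W, n_experts=8)

# With every expert active, the decomposition is functionally equivalent:
assert np.allclose(sum(experts), W)
```

This is a sketch under one assumption (contiguous singular-value blocks per expert); other partitions of the spectrum work the same way.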
Transplant specific capabilities between models via targeted weight merging. SLERP interpolation in parameter space enables smooth capability transfer with minimal interference.
θ_hybrid = slerp(θ_donor, θ_host, t=0.3)
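A minimal NumPy sketch of the `slerp` step above, treating each checkpoint as a single flattened parameter vector. Real merges typically interpolate per-tensor; the geometry is the same.

```python
import numpy as np

def slerp(theta_donor, theta_host, t=0.3):
    """Spherical linear interpolation between two parameter vectors.

    Falls back to linear interpolation when the vectors are nearly parallel,
    where the spherical formula becomes numerically unstable."""
    a, b = theta_donor.ravel(), theta_host.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        out = (1 - t) * a + t * b
    else:
        out = (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
    return out.reshape(theta_donor.shape)

rng = np.random.default_rng(0)
theta_donor = rng.standard_normal(1000)
theta_host = rng.standard_normal(1000)
theta_hybrid = slerp(theta_donor, theta_host, t=0.3)
```

At `t=0` the result is the donor exactly; at `t=1`, the host. The low `t=0.3` keeps the host dominant, which is one way to limit interference.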
Remove unwanted behaviors through surgical pruning of activation patterns. Identify and excise the neural pathways responsible for specific outputs without collateral damage.
mask = locate_circuit(behavior); prune(mask)
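`locate_circuit` and `prune` are placeholders for a whole family of methods. Here is a deliberately simple sketch, assuming we can record hidden activations on behavior-triggering versus baseline inputs: score units by mean activation difference, then ablate the outgoing weights of the top scorers.

```python
import numpy as np

def locate_circuit(acts_behavior, acts_baseline, top_k=2):
    """Mask the top-k hidden units by mean activation difference between
    behavior-triggering inputs and baseline inputs (a toy circuit finder)."""
    scores = acts_behavior.mean(axis=0) - acts_baseline.mean(axis=0)
    mask = np.zeros(scores.shape, dtype=bool)
    mask[np.argsort(scores)[-top_k:]] = True
    return mask

def prune(W_out, mask):
    """Excise the circuit by zeroing the masked units' outgoing weights."""
    W = W_out.copy()
    W[:, mask] = 0.0
    return W

rng = np.random.default_rng(0)
hidden = 8
acts_baseline = rng.standard_normal((100, hidden))
acts_behavior = acts_baseline + 0.05 * rng.standard_normal((100, hidden))
acts_behavior[:, [3, 5]] += 5.0          # units 3 and 5 drive the behavior

W_out = rng.standard_normal((4, hidden))
mask = locate_circuit(acts_behavior, acts_baseline, top_k=2)
W_pruned = prune(W_out, mask)
```

All unmasked columns are untouched, which is the "no collateral damage" property in its crudest form; real procedures verify this on held-out behaviors, not just on weights.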
Convert monolithic feed-forward networks into routed expert ensembles. Static routing patterns emerge from weight clustering, enabling conditional computation without learned gates.
experts = cluster(W_ffn, k=16); route(x)
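One plausible realization of the `cluster`/`route` pair above: plain k-means over the FFN's input-weight rows to form experts, then a gate-free dispatch that sends each input to the expert whose centroid aligns best with it. Everything here is a sketch; the function names come from the snippet, not an existing library.

```python
import numpy as np

def cluster(W_ffn, k=16, iters=20, seed=0):
    """Group FFN hidden units into k experts by k-means on their input
    weight rows. Returns per-unit expert assignments and the centroids."""
    rng = np.random.default_rng(seed)
    centroids = W_ffn[rng.choice(len(W_ffn), size=k, replace=False)]
    for _ in range(iters):
        dists = ((W_ffn[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = W_ffn[assign == j].mean(axis=0)
    return assign, centroids

def route(x, assign, centroids, W_ffn):
    """Static, gate-free routing: dispatch x to the expert whose centroid
    aligns best with it, then apply only that expert's slice of the FFN."""
    expert = int(np.argmax(centroids @ x))
    idx = np.where(assign == expert)[0]
    h = np.maximum(W_ffn[idx] @ x, 0.0)  # ReLU over the routed slice only
    return expert, h

rng = np.random.default_rng(0)
W_ffn = rng.standard_normal((32, 8))   # 32 hidden units, model dim 8
assign, centroids = cluster(W_ffn, k=4)
x = rng.standard_normal(8)
expert, h = route(x, assign, centroids, W_ffn)
```

Because routing uses the frozen centroids, there is no learned gate to train; the trade-off is that the routing pattern cannot adapt beyond what the weight clusters already encode.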
Reduce model dimensionality through rank decomposition while preserving critical representations. SVD-based compression identifies and retains the essential subspace.
W_small = U[:,:r] @ diag(S[:r]) @ Vt[:r,:]
We're constantly developing new surgical techniques. Have an idea for a novel operation? We want to hear it.
Join the Research →
Every surgical procedure is grounded in linear algebra. When we perform architectural metamorphosis, we're decomposing weight matrices W ∈ ℝ^(d×h) into expert-specific projections using truncated SVD and clustering in the singular vector space.
The key insight: neural network weights contain redundant structure that can be factored, split, merged, or transplanted—if you know where to cut.
Model Surgery is an open collective of researchers, engineers, and mad scientists pushing the boundaries of what's possible without gradient descent.
We share techniques, collaborate on experiments, and occasionally create abominations that shouldn't work but somehow do.
Help us push the boundaries. All donations fund compute for experimental procedures.
0x742d35Cc6634C0532925a3b844Bc9e7595f...