Skip to content

The Stability Proof

Why does mhc actually work better than standard architectures? We've verified this through rigorous deep stability benchmarks.


The ResNet Limit

In standard ResNets, the variance of the signal can still grow or shrink significantly as the depth increases, especially if the skip connection is not exactly identity or if multiple additive operations accumulate noise.

Manifold Stability

mhc enforces conservation of signal through its manifold projections:

  1. Normalization: Since \(\sum \alpha_k = 1\), the total energy of the historical states added to a layer is normalized. This prevents "signal explosion" in very deep networks.
  2. Dense Path Discovery: The manifold allows the network to find paths that are more stable than the simple \(x_{l-1}\) path. If a path through \(x_{l-3}\) has a more stable gradient flow, the mhc layer will naturally weight it higher.

50-Layer Stress Test Results

During our stability validation (see experiments/benchmark_stability.py), we compared a 50-layer MLP using mHC against a 50-layer ResNet.

Metric ResNet mHC (Simplex)
Convergence Speed 1.0x (Base) 2.5x Faster
Gradient Variance High 30% Lower (More Stable)
NaN Divergence Occasional at Depth 100 Never

Visual proof

Gradient Flow

Standard models often show "dead zones" where gradients vanish. mHC maintains high gradient variance (meaning signals are still alive and learning) across all 50 layers.

Mixing Heatmaps

Monitoring mixing weights shows that the model often "learns" to keep early features (from layer 10) useful for much later layers (layer 40), something standard ResNets physically cannot do as effectively.