Advanced Customization & Research
mhc is not just a library of ready-made layers; it is a research platform. This guide explains how to extend mhc with custom manifold constraints, manual buffer management, and non-linear history mixing.
1. Implementing Custom Manifolds
While the built-in simplex and identity constraints cover most use cases, researchers may want to explore other geometries, such as a hyperspherical constraint or a Laplacian sparsity manifold.
To implement a custom constraint, follow this pattern:
import torch

def my_custom_manifold(logits, temperature=1.0):
    # 1. Scale by temperature
    mu = logits / temperature
    # 2. Apply custom geometry (e.g. projection onto the unit sphere)
    norm = torch.norm(mu, p=2, dim=-1, keepdim=True)
    alpha = mu / (norm + 1e-8)
    return alpha
from mhc import MHCSkip

# Usage in mHC layers
layer = MHCSkip(max_history=4)

# In your training loop, manually apply the projection to the
# layer's raw mixing logits
alpha = my_custom_manifold(layer.logits)
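The same pattern extends to sparsity-inducing geometries. As a sketch (sparsemax here is a stand-in example, not mhc's Laplacian sparsity manifold), a sparsemax projection maps the scaled logits onto the probability simplex while setting small mixing weights to exactly zero:

def sparsemax_manifold(logits, temperature=1.0):
    # Sort scaled logits in descending order
    z = logits / temperature
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.shape[-1] + 1, device=z.device, dtype=z.dtype)
    # Support size: largest k with 1 + k * z_(k) > cumsum(z)_k
    cumsum = z_sorted.cumsum(dim=-1)
    support = 1 + k * z_sorted > cumsum
    k_support = support.sum(dim=-1, keepdim=True)
    # Threshold tau chosen so the surviving weights sum to 1
    tau = (cumsum.gather(-1, k_support - 1) - 1) / k_support.to(z.dtype)
    # Weights below tau become exactly zero
    return torch.clamp(z - tau, min=0.0)

Usage is identical to my_custom_manifold above: alpha = sparsemax_manifold(layer.logits).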
2. Manual History Buffer Management
In certain architectures (like Recurrent Vision Transformers), you might need to manage history across disconnected modules. You can use the HistoryBuffer class directly:
from mhc.layers.managed import HistoryBuffer

# Initialize a buffer that retains the last 8 states
# (e.g. 512-channel feature maps)
buffer = HistoryBuffer(max_history=8)

# During your forward pass (my_layer is any module of yours)
x_hidden = my_layer(x)
buffer.append(x_hidden)

# Pull the mixed state for another branch; my_learned_weights is a
# weight tensor you train elsewhere, one entry per history slot
mixed_history = buffer.get_mixed(weights=my_learned_weights)
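For the recurrent-transformer case, the buffer can simply be shared between modules. A minimal sketch, assuming only the append/get_mixed API shown above and that get_mixed accepts a full-length weight vector (the encoder/decoder modules and the mix_logits parameter are illustrative, not part of mhc):

import torch
import torch.nn as nn
from mhc.layers.managed import HistoryBuffer

class SharedHistoryBlock(nn.Module):
    def __init__(self, dim, max_history=8):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)  # stand-in modules
        self.decoder = nn.Linear(dim, dim)
        self.buffer = HistoryBuffer(max_history=max_history)
        # Learnable mixing logits, one per history slot
        self.mix_logits = nn.Parameter(torch.zeros(max_history))

    def forward(self, x):
        # Branch A writes into the shared buffer
        self.buffer.append(self.encoder(x))
        # Branch B reads the mixed state back out
        weights = torch.softmax(self.mix_logits, dim=0)
        return self.decoder(self.buffer.get_mixed(weights=weights))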
3. Non-Linear History Mixing
Standard mHC mixes history with a weighted sum (Linear Mixing). A natural research extension is Attention-based Mixing.
Instead of a single weight vector \(\alpha\), you can treat the history as a "Key-Value" store:
1. Keys: learned projections of the historical states.
2. Values: the historical states themselves.
3. Query: the current state \(x_l\).
This turns mHC into a Local Temporal Attention mechanism. While more compute-heavy, it allows the model to "attend" to specific past events in a context-dependent way.
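A minimal sketch of this idea, assuming history arrives as a list of past states (AttentionHistoryMixer is illustrative, not an mhc class):

import torch
import torch.nn as nn

class AttentionHistoryMixer(nn.Module):
    def __init__(self, dim, max_history=4):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim, bias=False)
        self.key_proj = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5
        self.max_history = max_history

    def forward(self, x, history):
        # history: list of past states, each shaped (batch, dim) like x
        h = torch.stack(history[-self.max_history:], dim=1)  # (B, H, D)
        q = self.query_proj(x).unsqueeze(1)                  # (B, 1, D)
        k = self.key_proj(h)                                 # (B, H, D)
        # Values are the raw historical states themselves
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return (attn @ h).squeeze(1)                         # (B, D)

Because the attention weights are computed per sample, two inputs flowing through the same layer can mix their histories differently, which is exactly what a fixed \(\alpha\) cannot do.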
4. Spectral Normalization for Manifolds
If you are training Generative Adversarial Networks (GANs) with mHC, we recommend applying Spectral Normalization to the auto_project matrices. This prevents the history-projection path from becoming a source of discriminator instability.
import torch.nn as nn
from torch.nn.utils import spectral_norm

from mhc import MHCSequential

model = MHCSequential(..., auto_project=True)

# Apply SN to the internal linear projectors
for m in model.modules():
    if isinstance(m, nn.Linear):
        spectral_norm(m)
5. Information Theory: Measuring Feature Reuse
To quantify how much your model benefits from mHC, we recommend tracking the Effective Signal Depth (ESD). One natural formulation is

\[
\mathrm{ESD}_l = \sum_{k=0}^{H} k \, \alpha_k^{(l)},
\]

where \(l\) is the current layer depth and \(\alpha_k^{(l)}\) is the mixing weight that layer \(l\) assigns to the state from \(k\) layers back (so \(\alpha_0^{(l)}\) weighs the current state).
- An ESD close to 0 means the model is ignoring history.
- A high ESD means the model is effectively "re-shaping" itself into a much shallower, more robust ensemble.
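Under this formulation, ESD is cheap to log during training. A sketch, assuming you can read a layer's post-projection mixing weights (here via the my_custom_manifold projection from Section 1):

import torch

def effective_signal_depth(alpha):
    # alpha: (H + 1,) mixing weights; alpha[k] weighs the state from
    # k layers back, with alpha[0] weighing the current state
    k = torch.arange(alpha.numel(), device=alpha.device, dtype=alpha.dtype)
    return (alpha * k).sum()

# e.g. log it per layer during training:
# esd = effective_signal_depth(my_custom_manifold(layer.logits))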