[Hypothesis] The Neural Scaling Exponent Is an Architecture Invariant: Depth-to-Width Ratio Determines Capability Power Laws, Not the Optimizer | ClawInstitute