Today's Overview
- A single spectral condition unifies μP scaling across width and depth. No more per-architecture, per-optimizer derivations for hyperparameter transfer. Code included.
- Data curation itself leaks membership information. Anthropic shows that even models trained only on public data expose the composition of the original dataset through the selection process.
- VLMs give dexterous hands natural language instructions. UniHM uses a unified tokenizer to generalize across hand morphologies, trained only on human-object interaction video. No teleoperation data needed.
Featured
01 Training One Spectral Condition to Rule Width and Depth Scaling
μP (Maximal Update Parametrization) solved hyperparameter transfer when models get wider. When models get both wider and deeper, existing solutions fragment: SGD has one set of rules, AdamW another, and a new architecture means rederiving everything.
This paper introduces a spectral condition that constrains how weight matrix norms and per-step updates should scale with both width and depth. The condition is general enough to recover all previously derived μP formulas as special cases, while extending naturally to additional optimizers.
On GPT-2-style language models, spectral μP maintains stable feature learning under joint width-depth scaling. Hyperparameter transfer stays consistent. The paper ships with code. For teams doing actual scaling work, this moves the workflow from "tune each configuration separately" to theory-backed systematic transfer.
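The spectral condition can be turned into concrete per-layer rules. Below is a minimal sketch, assuming Gaussian initialization and Adam-style O(1)-entry updates; the function names and the 1/sqrt(depth) residual damping are illustrative choices from the depth-μP literature, not the paper's exact recipe:

```python
import math

def spectral_init_std(fan_in: int, fan_out: int) -> float:
    # Target spectral norm ||W|| ~ sqrt(fan_out / fan_in).  A Gaussian
    # matrix with entry std sigma has spectral norm roughly
    # sigma * (sqrt(fan_in) + sqrt(fan_out)); solve for sigma.
    return math.sqrt(fan_out / fan_in) / (math.sqrt(fan_in) + math.sqrt(fan_out))

def spectral_adam_lr(base_lr: float, fan_in: int, fan_out: int) -> float:
    # Adam's normalized updates have O(1) entries, so a raw step has
    # spectral norm on the order of sqrt(fan_in * fan_out).  Rescaling by
    # sqrt(fan_out / fan_in) / sqrt(fan_in * fan_out) = 1 / fan_in keeps
    # each update's spectral norm ~ sqrt(fan_out / fan_in), recovering
    # the familiar 1/fan_in muP rule for Adam as a special case.
    return base_lr * math.sqrt(fan_out / fan_in) / math.sqrt(fan_in * fan_out)

def residual_branch_scale(depth: int) -> float:
    # Depth direction: damp each residual branch so the stream's total
    # update stays bounded as layers are added (one common choice that
    # a joint width-depth condition must subsume).
    return 1.0 / math.sqrt(depth)
```

Note how the Adam rule falls out of the spectral target rather than being postulated per-optimizer; that is the sense in which one condition replaces the fragmented formulas.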
Key takeaways:
- Spectral condition unifies previously fragmented width-depth μP formulas across architectures and optimizers
- GPT-2 experiments confirm hyperparameter transfer holds under joint width-depth scaling
- Code available; teams doing scaling can adopt directly
Source: Spectral Condition for μP under Width-Depth Scaling
02 Safety "Train Only on Curated Public Data" Doesn't Mean Privacy-Safe
Data curation is an increasingly popular privacy strategy: use sensitive data to guide selection, then train only on the filtered public subset. The model never directly touches private data. Sounds clean.
Anthropic's research team stress-tested this assumption. Every stage of the curation pipeline — scoring, subset selection, the final model — leaks membership information about the original dataset. Membership inference attacks can determine whether a specific data point participated in the curation process, even when the model itself trained exclusively on public data.
Applying differential privacy to the curation methods effectively mitigates the leakage, so this isn't an unsolvable problem. It does mean the privacy boundary needs to extend from "the training process" to "the data selection process."
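To make the mitigation concrete: one standard DP pattern is to add Laplace noise to curation scores before threshold selection, so that no single private record can swing any selection decision by much. This is a sketch of the general technique only; the function names, the threshold mechanism, and the sensitivity-1 default are assumptions, not the paper's exact adaptation.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_select(scored_items, threshold, epsilon, sensitivity=1.0):
    """Keep public items whose noised curation score clears the threshold.

    Noise of scale sensitivity / epsilon bounds how much any one private
    record used during scoring can shift each selection decision.
    """
    scale = sensitivity / epsilon
    return [item for item, score in scored_items
            if score + laplace_noise(scale) >= threshold]
```

With a large epsilon the noise vanishes and selection matches the non-private threshold; a smaller epsilon trades selection quality for a formal bound on what the chosen subset reveals about the private data.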
Key takeaways:
- "Trained only on public data" does not equal privacy-safe; the curation process itself is an attack surface
- Membership inference attacks work at every stage of the curation pipeline, not just the final model
- Differential privacy adaptation for curation methods is a viable mitigation path
Source: Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
03 Robotics Dexterous Hands No Longer Need Per-Object Programming
Dexterous manipulation has followed two paths: train a separate policy for each object ("grasp cup," "turn screw"), or predefine hand-object interaction sequences. Neither scales to open-ended scenarios.
UniHM takes a different route. A VLM interprets free-form language instructions and plans physically feasible finger trajectories. One design choice stands out: a unified dexterous hand tokenizer maps different hand morphologies to the same codebook, solving cross-hand generalization. Training uses only human-object interaction videos — no large-scale teleoperation datasets required, which is a real advantage on data cost.
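The unified-tokenizer idea can be sketched as per-morphology encoders feeding one shared vector-quantized codebook. Everything below, including names, dimensions, and the linear encoders, is an illustrative guess at the structure, not UniHM's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared codebook: K discrete tokens living in a common D-dim latent space.
K, D = 512, 64
codebook = rng.normal(size=(K, D))

# Per-morphology encoders project different joint dimensionalities
# (e.g. a 16-DoF and a 22-DoF hand) into the same latent space.
encoders = {
    "hand_16dof": rng.normal(size=(16, D)) / np.sqrt(16),
    "hand_22dof": rng.normal(size=(22, D)) / np.sqrt(22),
}

def tokenize(hand: str, joint_pose: np.ndarray) -> int:
    """Map one hand pose to the index of its nearest codebook entry."""
    z = joint_pose @ encoders[hand]               # (D,) shared latent
    dists = np.linalg.norm(codebook - z, axis=1)  # (K,) distances
    return int(np.argmin(dists))

def detokenize(token: int) -> np.ndarray:
    """Recover the shared latent; a per-hand decoder would map it back."""
    return codebook[token]
```

Because both hands index the same codebook, a policy trained on tokens from one morphology emits symbols another morphology's decoder can consume, which is the mechanism behind the cross-hand generalization claim.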
Accepted at ICLR, UniHM generalizes to unseen objects across multiple benchmarks. The evaluation is still largely lab-based, judging from the abstract; real deployment remains a step away.
Key takeaways:
- VLM converts natural language instructions into physically feasible dexterous hand trajectories, bypassing per-object programming
- Unified tokenizer lets different hand morphologies share one representation, reducing adaptation cost for new hands
- Trained on human-object interaction data only, no dependence on teleoperation datasets
Source: UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
