Spectral Conditions Unify μP Scaling, Data Curation Leaks Privacy

Today's Overview

  • A single spectral condition unifies μP scaling across width and depth. No more per-architecture, per-optimizer derivations for hyperparameter transfer. Code included.
  • Data curation itself leaks membership information. Anthropic shows that even models trained only on public data expose the composition of the original dataset through the selection process.
  • VLMs give dexterous hands natural language instructions. UniHM uses a unified tokenizer to generalize across hand morphologies, trained only on human-object interaction video. No teleoperation data needed.

Featured

01 [Training] One Spectral Condition to Rule Width and Depth Scaling

μP (Maximal Update Parametrization) solved hyperparameter transfer when models get wider. When models get both wider and deeper, existing solutions fragment: SGD has one set of rules, AdamW another, and a new architecture means rederiving everything.

This paper introduces a spectral condition that constrains how weight matrix norms and per-step updates should scale with both width and depth. The condition is general enough to recover all previously derived μP formulas as special cases, while extending naturally to additional optimizers.
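The width half of such a condition can be sketched in a few lines: keep each weight matrix's spectral norm near sqrt(n_out/n_in), and normalize every update so its spectral norm is the learning rate times the same ratio. A minimal NumPy illustration follows — the function names, the plain normalized-gradient step, and the omission of the paper's depth-dependent scaling are simplifications for exposition, not the paper's exact prescription:

```python
import numpy as np

def spectral_norm(W):
    # largest singular value; ord=2 for a matrix
    return np.linalg.norm(W, 2)

def init_spectral(n_out, n_in, rng):
    # rescale a Gaussian init so ||W||_2 = sqrt(n_out / n_in),
    # the width part of the spectral condition
    W = rng.normal(size=(n_out, n_in))
    return W * (np.sqrt(n_out / n_in) / spectral_norm(W))

def spectral_update(W, G, eta):
    # normalize the step so ||dW||_2 = eta * sqrt(n_out / n_in),
    # making the effective step width-aware rather than hand-tuned
    n_out, n_in = W.shape
    return W - eta * np.sqrt(n_out / n_in) * G / spectral_norm(G)

rng = np.random.default_rng(0)
W = init_spectral(256, 128, rng)
G = rng.normal(size=(256, 128))   # stand-in for a gradient
W_next = spectral_update(W, G, eta=0.1)
```

In a real joint width-depth setup the residual-branch and depth scaling would also enter; the sketch only shows why pinning spectral norms makes the step size transfer across widths.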

On GPT-2-style language models, spectral μP maintains stable feature learning under joint width-depth scaling. Hyperparameter transfer stays consistent. The paper ships with code. For teams doing actual scaling work, this moves the workflow from "tune each configuration separately" to theory-backed systematic transfer.

Key takeaways:

  • Spectral condition unifies previously fragmented width-depth μP formulas across architectures and optimizers
  • GPT-2 experiments confirm hyperparameter transfer holds under joint width-depth scaling
  • Code available; teams doing scaling can adopt directly


02 [Safety] "Train Only on Curated Public Data" Doesn't Mean Privacy-Safe

Data curation is an increasingly popular privacy strategy: use sensitive data to guide selection, then train only on the filtered public subset. The model never directly touches private data. Sounds clean.

Anthropic's research team stress-tested this assumption. Every stage of the curation pipeline — scoring, subset selection, the final model — leaks membership information about the original dataset. Membership inference attacks can determine whether a specific data point participated in the curation process, even when the model itself trained exclusively on public data.
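A toy simulation makes the leak concrete. Assume, purely for illustration, a curation rule that keeps the k public points most similar to the private set. An attacker who sees only the selected subset can still score candidate records by their similarity to it, and members of the private set score systematically higher — this is not the paper's attack, just the simplest version of the signal:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_pub, n_priv, k = 8, 500, 50, 100

public = rng.normal(size=(n_pub, d))      # candidate pool the model trains on
private = rng.normal(size=(n_priv, d))    # sensitive data guiding selection
outsiders = rng.normal(size=(n_priv, d))  # records that did NOT guide selection

# curation: keep the k public points most similar to anything private
scores = (public @ private.T).max(axis=1)
selected = public[np.argsort(scores)[-k:]]

def attack_stat(x):
    # attacker sees only the selected subset; unusually high similarity
    # to it suggests x helped pull those points into the training set
    return (selected @ x).max()

members = np.array([attack_stat(x) for x in private])
non_members = np.array([attack_stat(x) for x in outsiders])
```

On average, members' statistics exceed non-members', which is exactly the gap a membership inference attack thresholds on. The paper's attacks go further, targeting every pipeline stage rather than just the selected subset.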

Applying differential privacy to the curation methods effectively mitigates the leakage. This isn't an unsolvable problem. It means the privacy boundary needs to extend from "the training process" to "the data selection process."

Key takeaways:

  • "Trained only on public data" does not equal privacy-safe; the curation process itself is an attack surface
  • Membership inference attacks work at every stage of the curation pipeline, not just the final model
  • Differential privacy adaptation for curation methods is a viable mitigation path


03 [Robotics] Dexterous Hands No Longer Need Per-Object Programming

Dexterous manipulation has followed two paths: train a separate policy for each object ("grasp cup," "turn screw"), or predefine hand-object interaction sequences. Neither scales to open-ended scenarios.

UniHM takes a different route. A VLM interprets free-form language instructions and plans physically feasible finger trajectories. One design choice stands out: a unified dexterous hand tokenizer maps different hand morphologies to the same codebook, solving cross-hand generalization. Training uses only human-object interaction videos — no large-scale teleoperation datasets required, which is a real advantage on data cost.
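The shared-codebook idea can be sketched as a VQ-style lookup: each morphology gets its own encoder into a common latent space, and quantization against one shared codebook yields morphology-agnostic tokens. Everything below — the dimensions, the linear encoders, the hand names — is illustrative, not UniHM's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, CODES = 16, 64
codebook = rng.normal(size=(CODES, LATENT))  # one codebook shared by all hands

# per-morphology encoders projecting joint angles into the shared latent space
# (linear maps here for brevity; DoF counts are illustrative)
encoders = {
    "shadow_hand": rng.normal(size=(24, LATENT)) / np.sqrt(24),  # 24-DoF hand
    "allegro":     rng.normal(size=(16, LATENT)) / np.sqrt(16),  # 16-DoF hand
}

def tokenize(hand, joint_angles):
    z = joint_angles @ encoders[hand]        # project into shared space
    d = ((codebook - z) ** 2).sum(axis=1)    # distance to every code vector
    return int(d.argmin())                   # nearest-neighbour token id
```

Because the token ids live in one codebook, a single downstream autoregressive model can consume trajectories from any hand, which is what makes cross-morphology generalization cheap.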

The paper is accepted at ICLR and shows generalization to unseen objects across multiple benchmarks. Judging from the abstract, though, the evaluation is still largely lab-based; real deployment remains a step away.

Key takeaways:

  • VLM converts natural language instructions into physically feasible dexterous hand trajectories, bypassing per-object programming
  • Unified tokenizer lets different hand morphologies share one representation, reducing adaptation cost for new hands
  • Trained on human-object interaction data only, no dependence on teleoperation datasets


Also Worth Noting

04
GRPO Moves From LLM Alignment to 3D Mesh Generation [Image Gen]: asynchronous advantage-guided preference optimization replaces offline DPO for artistic quad-mesh generation. link
05
Idempotent Experience Replay Mitigates Catastrophic Forgetting [Training]: more stable under high-reliability requirements in continual learning. link
06
Mamba/SSM Handles Industrial-Scale CAD Sequences [Architecture]: efficiency advantage over Transformers pays off in fine-grained part modeling. link
07
Wavelet Transform Detects Semantic Boundaries for Video Frame Selection [Multimodal]: preserves narrative structure better than query-relevance-based selection. link
08
Cross-Modal Counting Benchmark for MLLMs [Evaluation]: unified image-text-audio counting evaluation reveals basic numeracy gaps. link
09
Gaussian Splatting Reconstructs Radar-Quality Precipitation Fields From Sparse Weather Stations [AI for Science]: a new path for low-cost weather monitoring. link
10
Molecular Representation Shifts From Atom-Centric to Bond-Centric [AI for Science]: resonance and stereoselectivity are no longer ignored at the bond level. link
11
Visual Autoregressive Next-Scale Prediction for Super-Resolution [Image Gen]: addresses global consistency in upscaling. link