Larger Models Resist Misinformation Better but Are More Easily Swayed by Noise

10 papers selected from 64

Top Pick

Exploration and Exploitation Errors Are Measurable for Language Model Agents score 10
Featured in HF Daily Papers; HF popularity: 22 upvotes (+4); code available; keywords (3): coding, reasoning, embodied

Also Worth Noting

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis score 4
Featured in HF Daily Papers; keywords (1): reasoning
4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview score 4
Keywords (1): real-time; accepted at top venue: CVPR
Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size score 4
Institution: Google; keywords (1): scaling
MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models score 4
Institution: Google; keywords (2): pruning, post-training
WebXSkill: Skill Learning for Autonomous Web Agents score 4
Institution: Microsoft Research; keywords (1): deployment
Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation score 4
Keywords (1): edge; accepted at top venue: ACL
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs score 3
Accepted at top venue: ICLR
Some Theoretical Limitations of t-SNE score 3
Institution: MIT
SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting score 3
Accepted at top venue: ICLR