AI Paper Briefing
Lorem Ipsum Rescues GRPO's Hard-Problem Samples
12 papers selected from 578
Top picks
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
score 10
Featured in HF Daily Papers; HF popularity: 74 upvotes (+4); code available; keywords (3): lightweight, agentic, reasoning
MARBLE: Multi-Aspect Reward Balance for Diffusion RL
score 10
Featured in HF Daily Papers; HF popularity: 34 upvotes (+4); code available; keywords (1): fine-tuning
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
score 10
Featured in HF Daily Papers; HF popularity: 24 upvotes (+4); code available; keywords (1): distillation
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
score 9
Featured in HF Daily Papers; HF popularity: 17 upvotes (+3); code available; keywords (2): GRPO, agentic
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
score 8
Featured in HF Daily Papers; HF popularity: 63 upvotes (+4); keywords (1): distillation
Continuous Latent Diffusion Language Model
score 8
Featured in HF Daily Papers; HF popularity: 59 upvotes (+4); keywords (2): scaling, compression
MiA-Signature: Approximating Global Activation for Long-Context Understanding
score 8
Featured in HF Daily Papers; HF popularity: 49 upvotes (+4); keywords (3): lightweight, RAG, agentic
SkillOS: Learning Skill Curation for Self-Evolving Agents
score 8
Featured in HF Daily Papers; HF popularity: 32 upvotes (+4); keywords (2): agentic, reasoning
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
score 8
Featured in HF Daily Papers; HF popularity: 31 upvotes (+4); keywords (2): GRPO, reasoning
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
score 8
Featured in HF Daily Papers; HF popularity: 12 upvotes (+3); code available
Also worth noting
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
score 5
Featured in HF Daily Papers; HF popularity: 2 upvotes (+1); keywords (3): scaling, post-training, reasoning
TIDE: Every Layer Knows the Token Beneath the Context
score 4
Featured in HF Daily Papers; HF popularity: 4 upvotes (+1)