Sources | 120B on One GPU, and 40% of Video Benchmarks Are Guessable

Featured

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces score 11
Institution: Apple; featured in HF Daily Papers; HF popularity: 16 upvotes (+3); code available
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding score 10
Featured in HF Daily Papers; HF popularity: 201 upvotes (+4); code available; keywords (2): reasoning, leaderboard
Watch Before You Answer: Learning from Visually Grounded Post-Training score 10
Featured in HF Daily Papers; HF popularity: 26 upvotes (+4); code available; keywords (4): post-training, reasoning, vision-language, data curation
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU score 10
Featured in HF Daily Papers; HF popularity: 25 upvotes (+4); code available; keywords (1): throughput
General Multimodal Protein Design Enables DNA-Encoding of Chemistry score 10
Featured in HF Daily Papers; HF popularity: 21 upvotes (+4); code available; keywords (1): scaling
MedGemma 1.5 Technical Report score 6
Featured in HF Daily Papers; HF popularity: 9 upvotes (+2); keywords (1): reasoning