ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
元语音 · Published 2026-04-13 10:57:20 · Tags: #Remax #LLM