
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
