ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
Published on arXiv, 2024
This paper presents the ChatGLM-RLHF pipeline for aligning ChatGLM with human preferences. It introduces (1) strategies to reduce reward variance for stable large-scale training, (2) model parallelism with fused gradient descent, and (3) regularization constraints to avoid catastrophic forgetting in LLMs. ChatGLM-RLHF achieves on average 15% more wins against ChatGLM-SFT on Chinese alignment tasks.
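To give a flavor of points (1) and (3), below is a minimal PyTorch sketch of two common RLHF stabilizers of this kind: per-batch reward whitening to reduce reward variance, and a KL penalty toward the reference (SFT) policy as a regularizer against catastrophic forgetting. The function names (`whiten_rewards`, `rlhf_loss`) and hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of two RLHF stabilizers; not the paper's implementation.
import torch

def whiten_rewards(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize rewards within a batch to zero mean and unit variance,
    reducing reward variance across training batches."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def rlhf_loss(logprobs: torch.Tensor,
              ref_logprobs: torch.Tensor,
              old_logprobs: torch.Tensor,
              advantages: torch.Tensor,
              kl_coef: float = 0.1,
              clip: float = 0.2) -> torch.Tensor:
    """PPO-style clipped policy-gradient loss plus a KL penalty that keeps
    the policy close to the reference (SFT) model, a common regularization
    against catastrophic forgetting."""
    ratio = torch.exp(logprobs - old_logprobs)
    pg1 = -advantages * ratio
    pg2 = -advantages * torch.clamp(ratio, 1.0 - clip, 1.0 + clip)
    pg_loss = torch.max(pg1, pg2).mean()
    kl_penalty = (logprobs - ref_logprobs).mean()  # approx. KL(policy || ref)
    return pg_loss + kl_coef * kl_penalty
```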
Recommended citation: Hou, Z., Niu, Y., Du, Z., Zhang, X., Liu, X., Zeng, A., Zheng, Q., Huang, M., Wang, H., Tang, J., and Dong, Y. (2024). "ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback." arXiv:2404.00934.
Download Paper
