ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
Published on arXiv, 2024
This paper presents the ChatGLM-RLHF pipeline for aligning ChatGLM with human preferences. It introduces (1) strategies to reduce reward variance for stable large-scale training, (2) model parallelism with fused gradient descent, and (3) regularization constraints to avoid catastrophic forgetting in LLMs. ChatGLM-RLHF achieves on average 15% more wins against ChatGLM-SFT on Chinese alignment tasks.
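To give a flavor of points (1) and (3), below is a minimal PyTorch sketch of two common RLHF stabilizers of this kind: per-batch reward whitening to reduce reward variance, and a KL penalty toward the reference (SFT) policy as a regularizer against catastrophic forgetting. The function names (`whiten_rewards`, `rlhf_loss`) and hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of two RLHF stabilizers; not the paper's implementation.
import torch

def whiten_rewards(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize rewards within a batch to zero mean and unit variance,
    reducing reward variance across training batches."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def rlhf_loss(logprobs: torch.Tensor,
              ref_logprobs: torch.Tensor,
              old_logprobs: torch.Tensor,
              advantages: torch.Tensor,
              kl_coef: float = 0.1,
              clip: float = 0.2) -> torch.Tensor:
    """PPO-style clipped policy-gradient loss plus a KL penalty that keeps
    the policy close to the reference (SFT) model, a common regularization
    against catastrophic forgetting."""
    ratio = torch.exp(logprobs - old_logprobs)
    pg1 = -advantages * ratio
    pg2 = -advantages * torch.clamp(ratio, 1.0 - clip, 1.0 + clip)
    pg_loss = torch.max(pg1, pg2).mean()
    kl_penalty = (logprobs - ref_logprobs).mean()  # approx. KL(policy || ref)
    return pg_loss + kl_coef * kl_penalty
```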
Recommended citation: Hou, Z., Niu, Y., Du, Z., Zhang, X., Liu, X., Zeng, A., Zheng, Q., Huang, M., Wang, H., Tang, J., and Dong, Y. (2024). "ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback." arXiv:2404.00934.
Download Paper
