Reinforcement Learning from Human Feedback


Information technology - software - natural language processing - artificial intelligence - chatbots - training.

(RLHF).

Reinforcement Learning from Human Feedback is a training process for machine learning.

It uses human feedback, or human preferences, to rank or score instances of the behaviour or output of the system being trained, for example ChatGPT.
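
As an illustration of how human rankings can be turned into scores, the following minimal sketch fits one numeric score per output from pairwise preferences. It assumes a Bradley-Terry style preference model, which is one common choice and not necessarily the method used for any particular system. The function name `fit_preference_scores` and the example labels are hypothetical.

```python
import math


def fit_preference_scores(n_items, comparisons, lr=0.1, steps=500):
    """Fit one scalar score per output from pairwise human preferences.

    comparisons is a list of (preferred_index, rejected_index) pairs.
    Assumed model: P(preferred beats rejected) = sigmoid(score difference).
    """
    scores = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for preferred, rejected in comparisons:
            diff = scores[preferred] - scores[rejected]
            p = 1.0 / (1.0 + math.exp(-diff))
            # Gradient of the log-likelihood of the observed preference.
            grads[preferred] += 1.0 - p
            grads[rejected] -= 1.0 - p
        # Gradient ascent step on the log-likelihood.
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores


# Hypothetical human labels over three candidate outputs:
# output 0 preferred to 1, output 1 preferred to 2, output 0 preferred to 2.
comparisons = [(0, 1), (1, 2), (0, 2)]
scores = fit_preference_scores(3, comparisons)

# Rank the outputs from most to least preferred according to the fitted scores.
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In a full RLHF pipeline, scores of this kind would typically serve as the reward signal that the system is then trained to maximise; that later step is not shown here.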


The human-supervised RLHF stage supplements an initial period of unsupervised training, known as generative pre-training.

