Reinforcement Learning from Human Feedback

Information technology - software - natural language processing - artificial intelligence - chatbots - training.

(RLHF).

Reinforcement Learning from Human Feedback is a technique for training machine learning models.

It uses human feedback, expressed as human preferences, to rank or score instances of the behaviour or output of the system being trained, for example ChatGPT.
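
As an illustrative sketch only - not code from ACT, OpenAI or any particular library - the short Python example below shows one common way that human preference rankings are turned into a training signal for a reward model: a pairwise (Bradley-Terry style) loss that is smaller when the output preferred by human labellers already scores higher than the rejected one. All names and numbers are hypothetical.

 # Minimal sketch of a pairwise preference loss used to train a reward model.
 # The "reward model" is reduced to two fixed scores for clarity; in practice
 # these would come from a neural network being trained on many comparisons.
 import math

 def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
     """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected)).
     It is small when the human-preferred output already scores higher."""
     return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))

 # Human labellers ranked output A above output B for the same prompt.
 reward_a = 0.8   # reward model's current score for the preferred output (hypothetical)
 reward_b = 0.3   # reward model's current score for the rejected output (hypothetical)

 loss = pairwise_preference_loss(reward_a, reward_b)
 print(f"Preference loss: {loss:.3f}")  # lower loss means the ranking is already respected

In a full RLHF pipeline, a reward model trained on many such human comparisons is then used as the reward signal when the language model is further fine-tuned with a reinforcement learning algorithm.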


This human-supervised RLHF stage supplements an initial period of unsupervised training, known as generative pre-training.


See also


Other resource

Reinforcement learning from human feedback - Wikipedia
https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback