Reinforcement Learning from Human Feedback

From ACT Wiki
imported>Doug Williamson
(Create page - sources - Wikipedia - https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback#:~:text=In%20machine%20learning%2C%20reinforcement%20learning,learning%20(RL)%20through%20an%20optimization - ACT - https://www.treasurers.org/hub)
 
 
* [[Enterprise-wide resource planning system]]
* [[Generative pre-trained transformer]]  (GPT)
* [[Google Gemini]]
* [[Information technology]]
* [[Machine learning]]
*[https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans & Sutskever, 2018]


[[Category:The_business_context]]
[[Category:Identify_and_assess_risks]]
[[Category:Manage_risks]]
[[Category:Risk_reporting]]
[[Category:Risk_frameworks]]
[[Category:Technology]]

Latest revision as of 22:23, 11 May 2024

Information technology - software - natural language processing - artificial intelligence - chatbots - training.

(RLHF).

Reinforcement Learning from Human Feedback is a process for training machine learning systems.

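It uses human feedback, or human preferences, to rank - or score - examples of the behaviour or output of the system being trained, for example ChatGPT. These rankings are typically used to train a separate reward model, which then guides further training of the system through reinforcement learning.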
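One common way of turning human rankings into a training signal is to fit a reward model to pairwise comparisons ("output A was preferred to output B"), using the Bradley-Terry style loss described in several published RLHF systems. The minimal sketch below assumes that approach; it is not the implementation of ChatGPT or any other specific system. Each output is reduced to a hypothetical two-number feature vector and the reward model is a simple linear score, whereas real reward models are large neural networks that read the text itself. The function names and feature values are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical setup: each model output is summarised as a small feature vector.
# A real reward model would be a neural network reading the output text directly.
Features = Sequence[float]
Comparison = Tuple[Features, Features]  # (preferred output, rejected output)


def reward(weights: Sequence[float], x: Features) -> float:
    """Linear reward score: a higher score means 'more preferred by humans'."""
    return sum(w * xi for w, xi in zip(weights, x))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def preference_loss(weights: Sequence[float], data: List[Comparison]) -> float:
    """Average pairwise loss: -log P(preferred output beats rejected output)."""
    total = 0.0
    for chosen, rejected in data:
        margin = reward(weights, chosen) - reward(weights, rejected)
        total += -math.log(sigmoid(margin))
    return total / len(data)


def train_reward_model(data: List[Comparison], dim: int,
                       lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit the linear weights by plain gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in data:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Derivative of -log(sigmoid(margin)) with respect to the weights
            # is -(1 - sigmoid(margin)) * (chosen - rejected).
            scale = -(1.0 - sigmoid(margin))
            for i in range(dim):
                grad[i] += scale * (chosen[i] - rejected[i])
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features: [helpfulness, rudeness] for each of two compared answers.
comparisons = [
    ([0.9, 0.1], [0.2, 0.1]),  # labeller preferred the more helpful answer
    ([0.5, 0.0], [0.5, 0.8]),  # labeller preferred the less rude answer
    ([0.7, 0.2], [0.3, 0.6]),
]
w = train_reward_model(comparisons, dim=2)
print(w)  # first weight positive, second weight negative
```

Only the difference between two scores enters the loss, so labellers never need to agree on an absolute scale; they only have to say which of two outputs is better.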


RLHF, which is guided by human feedback, supplements an initial period of unsupervised training known as generative pre-training.
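In many published RLHF set-ups, such as OpenAI's InstructGPT work, the reinforcement learning step maximises the reward model's score minus a penalty for drifting too far from the starting model (the pre-trained model, usually after some supervised fine-tuning). This penalty is what keeps RLHF a supplement to pre-training rather than a replacement for it. The minimal sketch below computes that penalised reward for a single response; the function name, the value of `beta` and the example probabilities are hypothetical.

```python
import math
from typing import Sequence


def kl_penalised_reward(reward_score: float,
                        policy_logprobs: Sequence[float],
                        reference_logprobs: Sequence[float],
                        beta: float = 0.1) -> float:
    """Reward-model score minus beta times the log-probability ratio.

    policy_logprobs / reference_logprobs: log-probabilities that the model being
    trained and the frozen starting model give to each token of the same response.
    When the response was sampled from the model being trained, the summed
    per-token log-ratio is a single-sample estimate of the KL divergence.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("both models must score the same response tokens")
    log_ratio = sum(p - q for p, q in zip(policy_logprobs, reference_logprobs))
    return reward_score - beta * log_ratio


# Hypothetical 3-token response: the trained model now rates the first token
# twice as likely as the starting model did; the other tokens are unchanged.
policy = [math.log(0.6), math.log(0.5), math.log(0.9)]
reference = [math.log(0.3), math.log(0.5), math.log(0.9)]
print(kl_penalised_reward(2.0, policy, reference, beta=0.1))  # about 1.9307
```

A larger `beta` holds the trained model closer to its pre-trained behaviour; a smaller one lets the reward model's preferences dominate.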


See also


Other resource