Reinforcement Learning from Human Feedback: Difference between revisions

Latest revision as of 22:23, 11 May 2024

Information technology - software - natural language processing - artificial intelligence - chatbots - training.

Reinforcement Learning from Human Feedback (RLHF) is a process for training machine learning models.

It uses human feedback, expressed as human preferences, to rank or score instances of the behaviour or output of the system being trained, for example ChatGPT.
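One way to picture how human preferences become scores is the minimal sketch below. It assumes a Bradley-Terry preference model, which is a common choice for this step but is not named in this article. The function name, the sample outputs and the comparison data are all hypothetical, and a real system would train a neural reward model rather than one number per output.

```python
import math


def fit_preference_scores(num_items, comparisons, steps=500, lr=0.1):
    """Fit one scalar score per output from (preferred, rejected) index pairs.

    Assumes the Bradley-Terry model: P(i preferred over j) = sigmoid(s_i - s_j).
    The scores are fitted by plain gradient ascent on the log-likelihood.
    """
    scores = [0.0] * num_items
    for _ in range(steps):
        grads = [0.0] * num_items
        for winner, loser in comparisons:
            # Probability the current scores assign to the human's choice.
            p = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient of the log-likelihood with respect to the two scores.
            grads[winner] += 1.0 - p
            grads[loser] -= 1.0 - p
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores


# Hypothetical data: three candidate outputs compared in pairs by human raters.
# Each tuple is (index of preferred output, index of rejected output).
outputs = ["answer A", "answer B", "answer C"]
comparisons = [(0, 1), (0, 2), (1, 2), (0, 1), (2, 1)]

scores = fit_preference_scores(len(outputs), comparisons)
# Rank outputs from most to least preferred according to the fitted scores.
ranking = sorted(range(len(outputs)), key=lambda i: scores[i], reverse=True)
print([outputs[i] for i in ranking])
```

In this sketch "answer A" is preferred in every comparison it appears in, so it ends up with the highest score. In a full RLHF pipeline, scores of this kind would then serve as the reward signal when the model is trained further.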

The human-supervised RLHF stage supplements an initial period of unsupervised training known as generative pre-training.

Other resource

Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans & Sutskever, 2018

@@ Line 18: / Line 18: @@
 * [[Enterprise-wide resource planning system]]
 * [[Generative pre-trained transformer]]  (GPT)
+* [[Google Gemini]]
 * [[Information technology]]
 * [[Machine learning]]
@@ Line 30: / Line 31: @@
 *[https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans & Sutskever, 2018]
-[[Category:The_business_context]]
 [[Category:Identify_and_assess_risks]]
 [[Category:Manage_risks]]
+[[Category:Risk_reporting]]
 [[Category:Risk_frameworks]]
-[[Category:Risk_reporting]]
+[[Category:The_business_context]]
-[[Category:Technology]]
