Doug: Add link.

2024-05-11T22:23:48Z

← Older revision / Revision as of 22:23, 11 May 2024

Line 18:
  * [[Enterprise-wide resource planning system]]
  * [[Generative pre-trained transformer]] (GPT)
+ * [[Google Gemini]]
  * [[Information technology]]
  * [[Machine learning]]

Line 30 (older revision), Line 31 (this revision):
  *[https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans & Sutskever, 2018]

- [[Category:The_business_context]]
  [[Category:Identify_and_assess_risks]]
  [[Category:Manage_risks]]
+ [[Category:Risk_reporting]]
  [[Category:Risk_frameworks]]
- [[Category:Risk_reporting]]
+ [[Category:The_business_context]]
- [[Category:Technology]]

imported>Doug Williamson: Create page - sources - Wikipedia - https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback#:~:text=In%20machine%20learning%2C%20reinforcement%20learning,learning%20(RL)%20through%20an%20optimization - ACT - https://www.treasurers.org/hub

2023-04-08T21:20:14Z

New page

''Information technology - software - natural language processing - artificial intelligence - chatbots - training.''

Reinforcement Learning from Human Feedback (RLHF) is a training process for machine learning systems.

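It uses human feedback, in the form of human preferences, to rank or score instances of the behaviour or output of the system being trained, for example ChatGPT. These rankings are typically used to train a reward model, which then guides further training of the system through reinforcement learning.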

The human-supervised RLHF stage supplements an initial period of unsupervised training known as generative pre-training.
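
One common way to turn human rankings into a training signal is to fit a score, or "reward", to each output from pairwise comparisons. The sketch below is illustrative only and is not taken from the sources for this page: the function names, the example outputs and the simple Bradley-Terry preference model are all assumptions, and it covers only the scoring step, not the later reinforcement learning step.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the higher-reward output is preferred."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one human preference (lower is better)."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


def fit_rewards(comparisons, items, learning_rate=0.1, epochs=200):
    """Fit one scalar reward per output by gradient descent on pairwise_loss.

    comparisons is a list of (preferred, rejected) pairs chosen by a human.
    """
    rewards = {item: 0.0 for item in items}
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            p = preference_probability(rewards[chosen], rewards[rejected])
            # Gradient of -log(p) is -(1 - p) for the chosen reward
            # and +(1 - p) for the rejected reward.
            rewards[chosen] += learning_rate * (1.0 - p)
            rewards[rejected] -= learning_rate * (1.0 - p)
    return rewards


# Hypothetical human feedback: answer_a preferred to answer_b and answer_c,
# and answer_b preferred to answer_c.
comparisons = [
    ("answer_a", "answer_b"),
    ("answer_a", "answer_c"),
    ("answer_b", "answer_c"),
]
rewards = fit_rewards(comparisons, ["answer_a", "answer_b", "answer_c"])
ranking = sorted(rewards, key=rewards.get, reverse=True)
print(ranking)
```

In a full RLHF process, a learned reward model would score new, unseen outputs rather than a fixed set of three, and those scores would then be used to adjust the system being trained.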

== See also ==
* [[Artificial intelligence]] (AI)
* [[Bot]]
* [[Chatbot]]
* [[ChatGPT]]
* [[Enterprise-wide resource planning system]]
* [[Generative pre-trained transformer]] (GPT)
* [[Information technology]]
* [[Machine learning]]
* [[Natural language]]
* [[Natural language processing]]
* [[Robotics]]
* [[Software]]
* [[Software robot]]

== Other resource ==
* [https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans & Sutskever, 2018]

[[Category:The_business_context]]
[[Category:Identify_and_assess_risks]]
[[Category:Manage_risks]]
[[Category:Risk_frameworks]]
[[Category:Risk_reporting]]
[[Category:Technology]]

Reinforcement Learning from Human Feedback - Revision history