Reinforcement Learning from Human Feedback
From ACT Wiki
Information technology - software - natural language processing - artificial intelligence - chatbots - training.
(RLHF).
Reinforcement Learning from Human Feedback is a training process for machine learning models.
It uses human feedback, expressed as human preferences, to rank or score instances of the behaviour or output of the system being trained, for example the language model behind ChatGPT.
The human-supervised RLHF stage supplements an initial period of unsupervised training known as generative pre-training.
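As a rough illustration of how human rankings can be turned into scores, the sketch below fits one number per output from pairwise human preferences. It assumes a Bradley-Terry style preference model, which is one common modelling choice and not necessarily the one used for any particular product. The function name `fit_scores` and the labels `answer_a`, `answer_b` and `answer_c` are hypothetical. In a real RLHF system the scores would typically come from a trained reward model and would then be used as the reward signal for reinforcement learning, rather than being a simple lookup table as here.

```python
import math


def fit_scores(outputs, preferences, steps=500, lr=0.1):
    """Fit one scalar score per output from pairwise human preferences.

    `preferences` is a list of (preferred, rejected) pairs, each recording
    that a human reviewer preferred the first output over the second.
    """
    scores = {name: 0.0 for name in outputs}
    for _ in range(steps):
        grads = {name: 0.0 for name in outputs}
        for preferred, rejected in preferences:
            # Modelled probability that the human prefers `preferred`
            # (a sigmoid of the score difference).
            p = 1.0 / (1.0 + math.exp(scores[rejected] - scores[preferred]))
            # Gradient of log(p) with respect to each of the two scores.
            grads[preferred] += 1.0 - p
            grads[rejected] -= 1.0 - p
        # Gradient ascent step on the log-likelihood of the preferences.
        for name in outputs:
            scores[name] += lr * grads[name]
    return scores


# Hypothetical feedback: reviewers preferred a over b, b over c, and a over c.
outputs = ["answer_a", "answer_b", "answer_c"]
preferences = [
    ("answer_a", "answer_b"),
    ("answer_b", "answer_c"),
    ("answer_a", "answer_c"),
]

scores = fit_scores(outputs, preferences)
ranking = sorted(outputs, key=scores.get, reverse=True)
print(ranking)
```

The outputs that reviewers preferred more often end up with higher scores, so sorting by score recovers the human ranking.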
See also
- Artificial intelligence (AI)
- Bot
- Chatbot
- ChatGPT
- Enterprise-wide resource planning system
- Generative pre-trained transformer (GPT)
- Google Gemini
- Information technology
- Machine learning
- Natural language
- Natural language processing
- Robotics
- Software
- Software robot