Reinforcement Learning (Q-Learning)

Watch an agent learn to navigate a grid world through trial and error. Rewards and penalties guide the agent toward an optimal policy: the best action to take in each cell.

Metrics
Episodes: 0
Epsilon: 1.000
The agent learns to navigate to the Goal (Green) while avoiding the Pit (Red) and Obstacles (Black). Arrows indicate the learned policy (optimal direction) for each cell.
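For readers who want to see the mechanics behind the arrows, here is a minimal sketch of tabular Q-learning with an epsilon-greedy agent. The 4x4 layout, the reward values (+1 goal, -1 pit, -0.01 per step), and the hyperparameters (`alpha`, `gamma`, the epsilon decay schedule) are all illustrative assumptions; the demo's actual settings are not stated on this page.

```python
import random

# Hypothetical layout: S = start, G = goal, P = pit, # = obstacle, . = free cell.
GRID = [
    "S...",
    ".#.P",
    "....",
    "...G",
]
ROWS, COLS = len(GRID), len(GRID[0])
ACTIONS = {"^": (-1, 0), "v": (1, 0), "<": (0, -1), ">": (0, 1)}
START = next((r, c) for r in range(ROWS) for c in range(COLS) if GRID[r][c] == "S")
GOAL = next((r, c) for r in range(ROWS) for c in range(COLS) if GRID[r][c] == "G")


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    # Moving off the grid or into an obstacle leaves the agent in place.
    if not (0 <= r < ROWS and 0 <= c < COLS) or GRID[r][c] == "#":
        r, c = state
    cell = GRID[r][c]
    if cell == "G":
        return (r, c), 1.0, True    # assumed goal reward
    if cell == "P":
        return (r, c), -1.0, True   # assumed pit penalty
    return (r, c), -0.01, False     # assumed small step cost


def train(episodes=2000, alpha=0.1, gamma=0.9, eps_start=1.0,
          eps_min=0.05, eps_decay=0.995, max_steps=100, seed=0):
    """Run tabular Q-learning; return the Q-table and the final epsilon."""
    rng = random.Random(seed)
    q = {(r, c): {a: 0.0 for a in ACTIONS}
         for r in range(ROWS) for c in range(COLS) if GRID[r][c] != "#"}
    epsilon = eps_start
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(q[state], key=q[state].get)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(q[next_state].values())
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if done:
                break
        # Decay exploration after each episode, down to a floor.
        epsilon = max(eps_min, epsilon * eps_decay)
    return q, epsilon


def rollout(q, max_steps=20):
    """Follow the greedy policy from START and return the visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, max(q[state], key=q[state].get))
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    q_table, final_epsilon = train()
    # Print one greedy arrow per cell, mirroring the arrows in the demo.
    for r in range(ROWS):
        print(" ".join(
            GRID[r][c] if GRID[r][c] in "#GP" else max(q_table[(r, c)], key=q_table[(r, c)].get)
            for c in range(COLS)))
    print("final epsilon:", final_epsilon)
```

Epsilon starts at 1.0 in this sketch, so the agent begins by acting randomly and shifts toward its learned Q-values as epsilon decays, which matches the initial Epsilon value shown in the metrics panel.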