The Artificial Intelligence: Reinforcement Learning in Python course teaches students about reinforcement learning applications in stock trading and online advertising. Students also learn about Markov Decision Processes (MDPs), and about ways to calculate means and moving averages and how those calculations relate to stochastic gradient descent.
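
The link between means, moving averages, and stochastic gradient descent can be sketched in a few lines of Python. This is a minimal illustration and is not taken from the course material; the function names and the sample data are assumptions made for the example. An incremental sample mean has the same shape as a gradient step on the loss 0.5 * (x - mean)^2 with a learning rate of 1/n, and replacing 1/n with a constant step size gives an exponentially weighted moving average.

```python
import random


def running_mean(samples):
    """Update the sample mean one observation at a time."""
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        # Same form as an SGD step on 0.5 * (x - mean) ** 2
        # with a decaying learning rate of 1 / n.
        mean += (x - mean) / n
    return mean


def moving_average(samples, alpha=0.1):
    """Exponentially weighted moving average (constant learning rate alpha)."""
    avg = 0.0
    for x in samples:
        avg += alpha * (x - avg)
    return avg


rng = random.Random(0)
data = [rng.gauss(5.0, 1.0) for _ in range(1000)]  # hypothetical noisy samples
print(running_mean(data), sum(data) / len(data))   # the two values agree
print(moving_average(data))                        # weights recent samples more
```

The constant step size keeps adapting to recent data, which is why it suits rewards whose distribution changes over time.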

The course also teaches students how to use OpenAI Gym with zero code changes. It is usually priced at INR 2,999 on Udemy, but it is currently available for up to 85% off, i.e. INR 455.

Who can opt for this course?

  • Students who want to learn about artificial intelligence, data science, machine learning, and deep learning
  • Professionals who want to gain expertise in artificial intelligence

Course Highlights

Key Highlights | Details
Registration Link | Apply Now!
Price | INR 455 (INR 2,999) 85% off
Duration | 14.5 Hours
Student Enrollment | 43,633 students
Instructor | Lazy Programmer Team
Topics Covered | Python programming, Reinforcement learning, Markov Decision Processes, Dynamic programming
Course Level | Intermediate
Total Student Reviews | 9,610

Learning Outcomes

  • Employ reinforcement learning techniques based on supervised machine learning gradients
  • Gain a technical understanding of reinforcement learning
  • Recognize the connection between psychology and reinforcement learning
  • Apply 17 different algorithms for reinforcement learning
  • Apply gradient-based machine learning methods to reinforcement learning
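
As a rough illustration of what applying gradient-based methods to reinforcement learning can look like, the sketch below runs semi-gradient TD(0) prediction with a linear value function. It is not taken from the course: the five-state random walk environment, the one-hot features, and all names and hyperparameters are assumptions chosen to keep the example small.

```python
import random

N_STATES = 5  # non-terminal states 0..4; stepping off either end ends the episode


def features(state):
    """One-hot features, so the linear model has one weight per state."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


def value(weights, state):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(state)))


def semi_gradient_td0(episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * N_STATES
    for _ in range(episodes):
        state = N_STATES // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= N_STATES
            reward = 1.0 if next_state >= N_STATES else 0.0  # reward only on right exit
            target = reward if done else reward + gamma * value(weights, next_state)
            td_error = target - value(weights, state)
            # For a linear model the gradient of the value w.r.t. the
            # weights is the feature vector itself.
            for i, f in enumerate(features(state)):
                weights[i] += alpha * td_error * f
            if done:
                break
            state = next_state
    return weights


weights = semi_gradient_td0()
print([round(w, 2) for w in weights])  # estimates should rise from left to right
```

With one-hot features this reduces to tabular TD(0); swapping in a different `features` function is where function approximation would come in.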

Course Content

1. Welcome (40 minutes)
  • Introduction
  • Course Outline and Big Picture
  • Where to get the Code
  • How to Succeed in this Course

2. Return of the Multi-Armed Bandit (02 hours 56 minutes)
  • Section Introduction: The Explore-Exploit Dilemma
  • Applications of the Explore-Exploit Dilemma
  • Epsilon-Greedy Theory
  • Calculating a Sample Mean (pt 1)
  • Epsilon-Greedy Beginner’s Exercise Prompt
  • Designing Your Bandit Program
  • Epsilon-Greedy in Code
  • Comparing Different Epsilons
  • Optimistic Initial Values Theory
  • Optimistic Initial Values Beginner’s Exercise Prompt
  • Optimistic Initial Values Code
  • UCB1 Theory
  • UCB1 Beginner’s Exercise Prompt
  • UCB1 Code
  • Bayesian Bandits / Thompson Sampling Theory (pt 1)
  • Bayesian Bandits / Thompson Sampling Theory (pt 2)
  • Thompson Sampling Beginner’s Exercise Prompt
  • Thompson Sampling Code
  • Thompson Sampling With Gaussian Reward Theory
  • Thompson Sampling With Gaussian Reward Code
  • Exercise on Gaussian Rewards
  • Why don’t we just use a library?
  • Nonstationary Bandits
  • Bandit Summary, Real Data, and Online Learning
  • (Optional) Alternative Bandit Designs
  • Suggestion Box

3. High Level Overview of Reinforcement Learning (16 minutes)
  • What is Reinforcement Learning?
  • From Bandits to Full Reinforcement Learning

4. Markov Decision Processes (01 hour 59 minutes)
  • MDP Section Introduction
  • Choosing Rewards
  • The Markov Property
  • Markov Decision Processes (MDPs)
  • Future Rewards
  • Value Functions
  • The Bellman Equation (pt 1)
  • The Bellman Equation (pt 2)
  • The Bellman Equation (pt 3)
  • Bellman Examples
  • Optimal Policy and Optimal Value Function (pt 1)
  • Optimal Policy and Optimal Value Function (pt 2)
  • MDP Summary

5. Dynamic Programming (02 hours 04 minutes)
  • Dynamic Programming Section Introduction
  • Iterative Policy Evaluation
  • Designing Your RL Program
  • Gridworld in Code
  • Iterative Policy Evaluation in Code
  • Windy Gridworld in Code
  • Iterative Policy Evaluation for Windy Gridworld in Code
  • Policy Improvement
  • Policy Iteration
  • Policy Iteration in Code
  • Policy Iteration in Windy Gridworld
  • Value Iteration
  • Value Iteration in Code
  • Dynamic Programming Summary

6. Monte Carlo (58 minutes)
  • Monte Carlo Intro
  • Monte Carlo Policy Evaluation
  • Monte Carlo Policy Evaluation in Code
  • Monte Carlo Control
  • Monte Carlo Control in Code
  • Monte Carlo Control without Exploring Starts
  • Monte Carlo Control without Exploring Starts in Code
  • Monte Carlo Summary

7. Temporal Difference Learning (37 minutes)
  • Temporal Difference Introduction
  • TD(0) Prediction
  • TD(0) Prediction in Code
  • SARSA in Code
  • Q Learning
  • Q Learning in Code
  • TD Learning Section Summary

8. Approximation Methods (01 hour 13 minutes)
  • Approximation Methods Section Introduction
  • Linear Models for Reinforcement Learning
  • Feature Engineering
  • Approximation Methods for Prediction
  • Approximation Methods for Prediction Code
  • Approximation Methods for Control
  • Approximation Methods for Control Code
  • CartPole Code
  • Approximation Methods Exercise
  • Approximation Methods Section Summary

9. Interlude: Common Beginner Questions (07 minutes)
  • This Course vs. RL Book: What’s the Difference?

10. Stock Trading Project with Reinforcement Learning (01 hour 21 minutes)
  • Beginners, halt! Stop here if you skipped ahead
  • Stock Trading Project Section Introduction
  • Data and Environment
  • How to Model Q for Q-Learning
  • Design of the Program
  • Code pt 1
  • Code pt 2
  • Code pt 3
  • Code pt 4
  • Stock Trading Project Discussion

11. Setting Up Your Environment (FAQ by Student Request) (37 minutes)
  • Anaconda Environment Setup
  • How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow

12. Extra Help With Python Coding for Beginners (FAQ by Student Request) (42 minutes)
  • How to Code by Yourself (part 1)
  • How to Code by Yourself (part 2)
  • Proof that using Jupyter Notebook is the same as not using it
  • Python 2 vs Python 3

13. Effective Learning Strategies for Machine Learning (FAQ by Student Request) (59 minutes)
  • How to Succeed in this Course (Long Version)
  • Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?
  • Machine Learning and AI Prerequisite Roadmap (pt 1)
  • Machine Learning and AI Prerequisite Roadmap (pt 2)

14. Appendix / FAQ Finale (08 minutes)
  • What is the Appendix?
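
To give a flavour of the explore-exploit dilemma covered in the multi-armed bandit module, here is a minimal epsilon-greedy sketch. It is an illustration only, not the course's own code: the `BernoulliArm` class, the win probabilities, and the default settings are all hypothetical choices made for this example.

```python
import random


class BernoulliArm:
    """A slot-machine arm that pays 1 with a fixed (hidden) probability."""

    def __init__(self, win_prob):
        self.win_prob = win_prob
        self.estimate = 0.0  # running sample mean of observed rewards
        self.pulls = 0

    def pull(self, rng):
        return 1.0 if rng.random() < self.win_prob else 0.0

    def update(self, reward):
        # Incremental sample-mean update.
        self.pulls += 1
        self.estimate += (reward - self.estimate) / self.pulls


def run_epsilon_greedy(win_probs, epsilon=0.1, steps=10_000, seed=0):
    rng = random.Random(seed)
    arms = [BernoulliArm(p) for p in win_probs]
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.choice(arms)                     # explore: random arm
        else:
            arm = max(arms, key=lambda a: a.estimate)  # exploit: best estimate so far
        reward = arm.pull(rng)
        arm.update(reward)
        total_reward += reward
    return arms, total_reward


arms, total_reward = run_epsilon_greedy([0.2, 0.5, 0.75])
for arm in arms:
    print(f"true p={arm.win_prob:.2f}  estimate={arm.estimate:.3f}  pulls={arm.pulls}")
```

A larger `epsilon` explores more and learns the estimates faster, but spends more pulls on arms that are already known to be worse.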

Resources Required

  • Understanding of Calculus (derivatives)
  • Knowledge of Markov models and probability
  • Knowledge of Numpy, Matplotlib
  • Beneficial to have experience with at least a few supervised machine learning methods
  • Gradient descent
  • Strong object-oriented programming skills

Featured Review

Luca D’Alessandro (5/5) : One of the best courses I’ve ever taken on the topic. Super useful to learn building basic environments and RL agent dynamics, and to develop intuitions for more sophisticated models. The mathematical rigor comes together with a nice teaching strategy for the coding part.


  • Hermon Alfaro (5/5) : it has the perfect equilibrium between the conceptual explanations and the code.
  • Harrison Yoon (5/5) : The provided code is masterfully written and worked without a hitch.
  • Donal John (4/5) : This is probably the best course that I have found on the topic of reinforcement learning.
  • Marcin Sobociński (5/5) : the explanations given (like in Bellman examples chapter) are just the best I could find anywhere.


  • Javid Jamae (1/5) : Instead of just giving you the problems and solutions and walking you through everything (the way EVERY other Udemy instructor does it) he gives vague descriptions of what you have to do and then says that he expects that you should go figure it out yourself.
  • Jonathan Hogg (2/5) : Well, Lazy Programmer will tell you repeatedly throughout all of his courses.
  • Niels Pichon (2/5) : The name of the guy is well chosen: Everywhere where he can be lazy he is.
  • Con Land (1/5) : To the “Lazy Programmer”, I recommend that you become an “Expert Programmer” before selling courses.

About the Author

The instructor of this course is Lazy Programmer Team, an Artificial Intelligence and Machine Learning Engineer. With a 4.7 Instructor Rating and 51,767 Reviews on Udemy, the instructor offers 17 Courses and has taught 188,509 Students so far.

  • The instructor has also been recognised as a data scientist, big data engineer, and full stack software engineer, and currently spends the majority of his time as an artificial intelligence and machine learning engineer with an emphasis on deep learning
  • The instructor earned his first master’s degree in computer engineering, with a focus on machine learning and pattern recognition, more than ten years ago
  • The instructor later earned a second master’s degree in statistics with a focus on financial engineering
  • The instructor has experience in online advertising and digital media, both as a data scientist (optimising click and conversion rates) and as a big data engineer (building data processing pipelines)
  • The instructor routinely uses big data technologies like Hadoop, Pig, Hive, MapReduce, and Spark
  • The instructor has developed deep learning models for text modelling, image and signal processing, user behaviour prediction, and click-through rate estimation
  • The instructor has worked on recommendation systems using collaborative filtering and reinforcement learning, and validated the results using A/B testing
  • The instructor has taught students at universities like Columbia University, NYU, Hunter College, and The New School in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics
  • The instructor’s web programming skills have helped numerous businesses
  • The instructor handles all of the server-side backend work, frontend HTML/JS/CSS work, and operations/deployment work

Comparison Table

Parameters | Artificial Intelligence: Reinforcement Learning in Python | Advanced AI: Deep Reinforcement Learning in Python | Deep Learning Prerequisites: Linear Regression in Python
Offers | INR 455 (INR 2,999) 85% off | INR 455 (INR 3,499) 87% off | INR 455 (INR 3,499) 87% off
Duration | 14.5 hours | 10.5 hours | 6.5 hours
Rating | 4.8/5 | 4.6/5 | 4.6/5
Student Enrollments | 43,633 | 36,717 | 31,439
Instructors | Lazy Programmer Team | Lazy Programmer Team | Lazy Programmer Inc.
Register Here | Apply Now! | Apply Now! | Apply Now!
