Preface
About the Book
Various intelligent applications such as video games, inventory management software, warehouse robots, and translation tools use Reinforcement Learning (RL) to make decisions and perform actions that maximize the probability of the desired outcome. This book will help you to get to grips with the techniques and the algorithms for implementing RL in your machine learning models.
Starting with an introduction to RL, you'll be guided through different RL environments and frameworks. You'll learn how to implement your own custom environments and use OpenAI baselines to run RL algorithms. Once you've explored classic RL techniques such as Dynamic Programming, Monte Carlo methods, and TD Learning, you'll understand when to apply the different deep learning methods in RL and advance to deep Q-learning. The book will even help you understand the different stages of machine-based problem-solving by using a DARQN to play the popular video game Breakout. Finally, you'll find out when to use a policy-based method to tackle an RL problem.
By the end of The Reinforcement Learning Workshop, you'll be equipped with the knowledge and skills needed to solve challenging machine learning problems using reinforcement learning.
Audience
If you are a data scientist, machine learning enthusiast, or a Python developer who wants to learn basic to advanced deep reinforcement learning algorithms, this workshop is for you. A basic understanding of the Python language is necessary.
About the Chapters
Chapter 1, Introduction to Reinforcement Learning, introduces you to RL, which is one of the most exciting fields in machine learning and artificial intelligence.
Chapter 2, Markov Decision Processes and Bellman Equations, teaches you about Markov chains, Markov reward processes, and Markov decision processes. You will learn about state values and action values, as well as using the Bellman equation to calculate these quantities.
Chapter 3, Deep Learning in Practice with TensorFlow 2, introduces you to TensorFlow and Keras, giving you an overview of their key features and applications and how they work in synergy.
Chapter 4, Getting Started with OpenAI and TensorFlow for Reinforcement Learning, sees you working with two popular OpenAI tools, Gym and Universe. You will learn how to formalize the interfaces of these environments, how to interact with them, and how to create a custom environment for a specific problem.
Chapter 5, Dynamic Programming, teaches you how to use dynamic programming to solve problems in RL. You will learn about the concepts of policy evaluation, policy iteration, and value iteration, and see how to implement them.
Chapter 6, Monte Carlo Methods, teaches you how to implement the various types of Monte Carlo methods, including the "first visit" and "every visit" techniques. You will see how to use these Monte Carlo methods to solve the frozen lake problem.
Chapter 7, Temporal Difference Learning, prepares you to implement TD(0), SARSA, Q-learning, and TD(λ) algorithms in both stochastic and deterministic environments.
Chapter 8, The Multi-Armed Bandit Problem, introduces you to the popular multi-armed bandit problem and shows you some of the most commonly used algorithms to solve the problem.
Chapter 9, What Is Deep Q-Learning?, educates you on deep Q-learning and covers some hands-on implementations of advanced variants of deep Q-learning, such as double deep Q-learning, with PyTorch.
Chapter 10, Playing an Atari Game with Deep Recurrent Q-Networks, introduces you to Deep Recurrent Q-Networks (DRQNs) and their variants. You will get hands-on experience in training RL agents to play an Atari game.
Chapter 11, Policy-Based Methods for Reinforcement Learning, teaches you how to implement different policy-based methods of RL, such as policy gradients, deep deterministic policy gradients, trust region policy optimization, and proximal policy optimization.
Chapter 12, Evolutionary Strategies for RL, combines evolutionary strategies with traditional machine learning methods, specifically in the selection of neural network hyperparameters. You will also identify the limitations of these evolutionary methods.
Note
The interactive version of The Reinforcement Learning Workshop contains a bonus chapter, Recent Advancements and Next Steps. This chapter teaches you novel methods of implementing reinforcement learning algorithms with an emphasis on areas of further exploration such as one-shot learning and transferable domain priors. You can find the interactive version here: courses.packtpub.com.
Conventions
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Recall that an algorithm class' implementation needs two specific methods to interact with the bandit API, decide() and update(), the latter of which is simpler and is implemented."
Words that you see onscreen (for example, in menus or dialog boxes) also appear in the text like this: "The DISTRIBUTIONS tab provides an overview of how the model parameters are distributed across epochs."
A block of code is set as follows:
class Greedy:
    def __init__(self, n_arms=2):
        self.n_arms = n_arms
        self.reward_history = [[] for _ in range(n_arms)]
New terms and important words are shown like this: "Its architecture allows users to run it on a wide variety of hardware, from CPUs to Tensor Processing Units (TPUs), including GPUs as well as mobile and embedded platforms."
Code Presentation
Lines of code that span multiple lines are split using a backslash ( \ ). When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.
For example:
history = model.fit(X, y, epochs=100, batch_size=5, verbose=1, \
                    validation_split=0.2, shuffle=False)
Comments are added into code to help explain specific bits of logic. Single-line comments are denoted using the # symbol, as follows:
# Print the sizes of the dataset
print("Number of Examples in the Dataset = ", X.shape[0])
print("Number of Features for each example = ", X.shape[1])
Multi-line comments are enclosed by triple quotes, as shown below:
"""
Define a seed for the random number generator to ensure the
result will be reproducible
"""
seed = 1
np.random.seed(seed)
random.set_seed(seed)
Setting up Your Environment
Before we explore the book in detail, we need to set up specific software and tools. In the following section, we shall see how to do that.
Installing Anaconda for Jupyter Notebook
Jupyter notebooks are available once you install Anaconda on your system. Anaconda can be installed on Windows systems using the steps available at https://docs.anaconda.com/anaconda/install/windows/.
For other systems, navigate to the respective installation guide from https://docs.anaconda.com/anaconda/install/.
Installing a Virtual Environment
In general, it is good practice to use separate virtual environments when installing Python modules, to be sure that the dependencies of different projects do not conflict with one another. So, it is recommended that you adopt this approach before executing these instructions.
Since we are using Anaconda here, it is highly recommended that you use conda-based environment management. Run the following commands in Anaconda Prompt to create an environment and activate it:
conda create --name [insert environment name here]
conda activate [insert environment name here]
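For example, assuming you choose the (hypothetical) name rl-workshop and want to pin a specific Python version, the commands would look like this:
conda create --name rl-workshop python=3.7
conda activate rl-workshop
Pinning the Python version is optional, but it makes the environment easier to reproduce later.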
Installing Gym
To install Gym, make sure that you have Python 3.5+ installed on your system. You can then install Gym using pip by running the following command in Anaconda Prompt:
pip install gym
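If you want to verify the installation, the following minimal sketch runs a few random actions in the classic CartPole-v0 environment; it assumes the standard Gym interface used throughout this book, where step() returns an observation, a reward, a done flag, and an info dictionary:
import gym

# Create the environment and get the initial observation
env = gym.make("CartPole-v0")
observation = env.reset()

# Take 10 random actions sampled from the action space
for _ in range(10):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    # Reset the environment when an episode ends
    if done:
        observation = env.reset()

env.close()
If this runs without errors, Gym is installed correctly.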
You can also build the Gym installation from source by cloning the Gym Git repository directly. This type of installation is useful if you need to modify Gym or add your own environments. Use the following commands to install Gym from source:
git clone https://github.com/openai/gym
cd gym
pip install -e .
Run the following command to perform a full installation of Gym. This installation may require additional dependencies, including cmake and a recent version of pip:
pip install -e .[all]
In Chapter 11, Policy-Based Methods for Reinforcement Learning, you will be working in the Box2D environment available in Gym. You can install the Box2D environment by using the following command:
pip install "gym[box2d]"
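A quick way to check that the Box2D dependencies were built correctly is to instantiate one of the Box2D environments, as in this short sketch:
import gym

# Creating a Box2D environment will fail if the box2d dependencies are missing
env = gym.make("LunarLander-v2")
env.reset()
env.close()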
Installing TensorFlow 2
To install TensorFlow 2, run the following command in Anaconda Prompt:
pip install tensorflow
If you are using a GPU, you can use the following command:
pip install tensorflow-gpu
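To confirm that TensorFlow 2 is installed correctly, and to check whether your GPU is visible to it, you can run a quick sanity check such as the following:
import tensorflow as tf

# The version string should start with "2."
print(tf.__version__)

# An empty list here means TensorFlow will run on the CPU only
print(tf.config.experimental.list_physical_devices("GPU"))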
Installing PyTorch
PyTorch can be installed on Windows using the steps available at https://pytorch.org/.
If a GPU is not available on your system, you can install the CPU version of PyTorch by running the following command in Anaconda Prompt:
conda install pytorch-cpu torchvision-cpu -c pytorch
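Either way, you can verify the installation and check for CUDA support with a snippet like this:
import torch

# Print the installed PyTorch version
print(torch.__version__)

# True if a CUDA-capable GPU is available; False on CPU-only installs
print(torch.cuda.is_available())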
Installing OpenAI Baselines
OpenAI Baselines can be installed using the instructions at https://github.com/openai/baselines.
Download the OpenAI Baselines repository, check out the TensorFlow 2 branch, and install it as follows:
git clone https://github.com/openai/baselines.git
cd baselines
git checkout tf2
pip install -e .
We use OpenAI Baselines in Chapter 1, Introduction to Reinforcement Learning, and Chapter 4, Getting Started with OpenAI and TensorFlow for Reinforcement Learning. Because OpenAI Baselines uses Gym version 0.14, which is not the latest version, you might get the following error:
AttributeError: 'EnvSpec' object has no attribute '_entry_point'
The solution to this bug is to change the two env.entry_point attributes in baselines/run.py back to env._entry_point.
The detailed solution is available at https://github.com/openai/baselines/issues/977#issuecomment-518569750.
Alternatively, you can also use the following command to upgrade the Gym installation in that environment:
pip install --upgrade gym
Installing Pillow
Use the following command in Anaconda Prompt to install Pillow:
conda install -c anaconda pillow
Alternatively, you can also run the following command using pip:
pip install pillow
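To verify the installation, you can import the package and print its version (the PIL.__version__ attribute is available in recent Pillow releases):
import PIL

# Print the installed Pillow version
print(PIL.__version__)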
You can read more about Pillow at https://pypi.org/project/Pillow/2.2.1/.
Installing Torch
Use the following command to install torch using pip:
pip install torch==0.4.1 -f https://download.pytorch.org/whl/torch_stable.html
Note that you will be using version 0.4.1 of torch only in Chapter 11, Policy-Based Methods for Reinforcement Learning. For the other chapters, you can revert to the newer version of PyTorch by using the command in the Installing PyTorch section.
Installing Other Libraries
pip comes pre-installed with Anaconda. Once Anaconda is installed on your machine, all the required libraries can be installed using pip, for example, pip install numpy. Alternatively, you can install all the required libraries using pip install -r requirements.txt. You can find the requirements.txt file at https://packt.live/311jlIu.
The exercises and activities will be executed in Jupyter Notebooks. Jupyter is a Python library and can be installed in the same way as the other Python libraries (that is, with pip install jupyter), but fortunately, it comes pre-installed with Anaconda. To open a notebook, simply run the command jupyter notebook in the Terminal or Command Prompt.
Accessing the Code Files
You can find the complete code files of this book at https://packt.live/2V1MwHi.
We've tried to support interactive versions of all activities and exercises, but we recommend a local installation as well for instances where this support isn't available.
If you have any issues or questions about installation, please email us at workshops@packt.com.