OpenAI Universe – Complex Environment
OpenAI Universe was released by OpenAI a few months after Gym. It's a software platform for measuring and training artificial general intelligence on different applications, ranging from video games to websites. It lets an AI agent use a computer as a human does: the environment state is represented by screen pixels, and the actions are all the operations that can be performed with a virtual keyboard and mouse.
With Universe, it is possible to adapt any program, turning it into a Gym environment. It executes the program using Virtual Network Computing (VNC), a technology that allows the remote control of a computer system via graphical desktop sharing over a network, transmitting keyboard and mouse events and receiving screen frames. Because it mimics a human operating a remote desktop, it doesn't need access to the program's memory state, customized source code, or a dedicated API.
The following snippet shows how to use Universe in a simple Python program, where a scripted action is always executed in every step:
- Import the OpenAI Gym and OpenAI Universe modules:
import gym
# register Universe environments into Gym
import universe
- Instantiate the OpenAI Universe environment and reset it:
# Universe env ID here
env = gym.make('flashgames.DuskDrive-v0')
observation_n = env.reset()
- Execute a prescribed action to interact with the environment and render it:
while True:
    # agent which presses the Up arrow 60 times per second
    action_n = [[('KeyEvent', 'ArrowUp', True)]
                for _ in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    env.render()
The preceding code successfully runs a Flash game in the browser.
The goal behind Universe is to favor the development of an AI agent that's capable of applying its past experience to master complex new environments, which would represent a fundamental step in the quest for artificial general intelligence.
Despite the great success of AI in recent years, all developed systems can still be considered "narrow AI," since they can only achieve better-than-human performance in a limited domain. Building something with general problem-solving ability on a par with human common sense requires carrying an agent's experience along when it shifts to a completely new task. This would allow an agent to avoid training from scratch, randomly going through tens of millions of trials.
Now, let's take a look at the infrastructure of OpenAI Universe.
OpenAI Universe Infrastructure
The following diagram describes how OpenAI Universe works. It exposes all of its environments, which will be described in detail later, through a common interface: leveraging VNC technology, it makes the environment act as a server and the agent as a client, so that the latter operates a remote desktop by observing screen pixels (the environment's observations) and producing keyboard and mouse commands (the agent's actions). VNC is a well-established technology and the standard for interacting with computers remotely over a network, as in the case of cloud computing systems or decentralized infrastructures:
Universe's implementation has some notable properties, as follows:
- Generality: By adopting the VNC interface, it doesn't require emulators or access to a program's source code or memory states, opening up many opportunities in fields such as computer games, web browsing, CAD software usage, and much more.
- Familiarity to humans: Humans can easily use it to provide baselines for AI algorithms, which are useful for initializing agents with human demonstrations recorded in the form of VNC traffic. For example, a human can solve one of the tasks provided by OpenAI Universe through VNC while recording the corresponding traffic, which can then be used to train an agent by providing good example policies to learn from.
- Standardization: Leveraging VNC technology ensures portability, since VNC software is available by default on all major operating systems.
- Ease of debugging: It is very easy to observe the agent during training or evaluation by simply connecting a visualization client to the environment's shared VNC server. Saving the VNC traffic also helps.
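The human-demonstration idea mentioned above can be made concrete with a small sketch: pair a recorded stream of timestamped VNC events with the frames observed at those times, yielding (observation, action) pairs suitable for imitation learning. The record format used here is made up for illustration; Universe does not prescribe one:

```python
def pair_demonstration(frames, events):
    """Align timestamped frames with timestamped VNC events.

    frames: list of (timestamp, observation) pairs, sorted by time
    events: list of (timestamp, vnc_event) pairs, sorted by time
    Returns (observation, vnc_event) pairs: each event is matched with
    the last frame observed before (or at) the moment it happened.
    """
    pairs = []
    i = 0
    for t_event, event in events:
        # Advance to the most recent frame at or before the event time
        while i + 1 < len(frames) and frames[i + 1][0] <= t_event:
            i += 1
        pairs.append((frames[i][1], event))
    return pairs

# Hypothetical recording: three frames and two user actions
frames = [(0.0, "frame0"), (0.5, "frame1"), (1.0, "frame2")]
events = [(0.6, ('KeyEvent', 'ArrowUp', True)),
          (1.2, ('PointerEvent', 10, 20, 1))]
print(pair_demonstration(frames, events))
# [('frame1', ('KeyEvent', 'ArrowUp', True)),
#  ('frame2', ('PointerEvent', 10, 20, 1))]
```

Pairs like these give a supervised learner good examples of what a human did given what was on screen.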
Environments
In this section, we will look at the most important categories of problems that are already available inside Universe. Each environment is packaged as a Docker image that hosts a VNC server. The server acts as the interface and is in charge of the following:
- Sending observations (screen pixels)
- Receiving actions (keyboard/mouse commands)
- Providing information for reinforcement learning tasks (the reward signal, diagnostic information, and so on) through a WebSocket server
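To illustrate the reward channel, the following sketch parses the kind of JSON message the WebSocket control protocol might deliver after each step. The exact message schema (a "v0.env.reward" method with a reward/done/info body) is an assumption made for illustration, not a documented guarantee:

```python
import json

def parse_reward_message(raw):
    """Extract (reward, done, info) from a control-protocol message.

    The message layout used here is an assumption for illustration.
    """
    msg = json.loads(raw)
    if msg.get("method") != "v0.env.reward":
        return None  # not a reward message
    body = msg["body"]
    return body["reward"], body["done"], body.get("info", {})

# Example message as the environment's server might send it
raw = json.dumps({
    "method": "v0.env.reward",
    "body": {"reward": 1.0, "done": False, "info": {"episode": 3}},
})
print(parse_reward_message(raw))  # (1.0, False, {'episode': 3})
```

The important point is architectural: rewards travel over a separate text channel, not through the pixel stream.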
Now, let's take a look at each of the different categories of environments.
Atari Games
These are the classic Atari 2600 games from the ALE. Already encountered in OpenAI Gym, they are also part of Universe.
Flash Games
The landscape of Flash games offers a large number of games with more advanced graphics than Atari, but still with simple mechanics and goals. Universe's initial release contained 1,000 Flash games, 100 of which also provide a reward function.
With the Universe approach, there is a major aspect to address: how the agent knows how well it performed, which is related to the rewards returned by interacting with the environment. Without access to an application's internal state (that is, its RAM addresses), the only option is to extract such information from the on-screen pixels. Many games print a score on every frame, which can be parsed with an image processing algorithm. For example, Atari Pong shows both players' scores at the top of the frame, so those pixels can be parsed to retrieve them. Universe developed a high-performing image-to-text model based on convolutional neural networks that is embedded in the Python controller and runs inside the Docker container. On the environments where it can be applied, it retrieves the user's score from the frame buffer and provides this information through the WebSocket channel.
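A much simpler version of this idea can be sketched with plain NumPy: crop the region of the frame where the score is printed and match it against known digit templates. The 3x3 digit glyphs and score location below are invented for illustration; the real Universe model is a convolutional network, not template matching:

```python
import numpy as np

# Hypothetical 3x3 binary glyphs for the digits 0 and 1
TEMPLATES = {
    0: np.array([[1, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]]),
    1: np.array([[0, 1, 0],
                 [0, 1, 0],
                 [0, 1, 0]]),
}

def read_score(frame, top=0, left=0, n_digits=2):
    """Parse a score by template-matching fixed digit cells in a frame."""
    digits = []
    for i in range(n_digits):
        cell = frame[top:top + 3, left + 3 * i:left + 3 * (i + 1)]
        # Pick the template with the fewest mismatching pixels
        best = min(TEMPLATES, key=lambda d: np.sum(cell != TEMPLATES[d]))
        digits.append(best)
    return int("".join(map(str, digits)))

# A tiny fake frame whose top-left corner shows the score "10"
frame = np.zeros((10, 10), dtype=int)
frame[0:3, 0:3] = TEMPLATES[1]
frame[0:3, 3:6] = TEMPLATES[0]
print(read_score(frame))  # 10
```

Real game fonts, anti-aliasing, and moving score positions are exactly why Universe needed a learned model instead of this kind of fixed-template lookup.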
Browser Tasks
Universe adds a unique set of tasks based on the usage of a web browser. These environments put the AI agent in front of a common web browser, presenting it with problems that require the use of the web: reading content, navigating through pages and clicking buttons while observing only pixels, and using the keyboard and mouse. Depending on the complexity, these tasks can, conceptually, be grouped into two categories: Mini World of Bits and real-world browser tasks:
- Mini World of Bits:
These environments are to browser-based tasks what the MNIST dataset is to image recognition: basic building blocks that can be found in complex browsing problems, on which training is easier but still insightful. They are environments of differing difficulty levels that require, for example, clicking on a specific button or replying to a message using an email client.
- Real-world browser tasks:
With respect to the previous category, these environments require the agent to solve more realistic problems, usually in the form of an instruction given to the agent, which has to perform a sequence of actions on a website. An example could be asking the agent to book a specific flight, which would require it to interact with the platform in order to find the right answer.
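Since browser tasks are driven purely by VNC events, even a mouse click has to be expressed as low-level pointer events. The helper below, a hypothetical convenience function rather than part of Universe's API, builds the press-then-release event pair for a click at given screen coordinates, mirroring the ('PointerEvent', x, y, buttonmask) tuple format used by Universe actions:

```python
def click_at(x, y):
    """Build a VNC left-click at (x, y): button down, then button up.

    Helper for illustration only; the last tuple field is the mouse
    button bitmask (bit 0 is the left button).
    """
    return [
        ('PointerEvent', x, y, 1),  # left button down
        ('PointerEvent', x, y, 0),  # all buttons up
    ]

# One action list per remote: click near the top-left of the task area
action_n = [click_at(80, 120)]
print(action_n)
```

Chaining such event pairs is how an agent navigates pages and presses buttons while observing nothing but pixels.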
Running an OpenAI Universe Environment
Being a large collection of tasks that can be accessed via a common interface, running an environment requires performing only a few steps:
- Install Docker, then install Universe, which can be done with the following command:
git clone https://github.com/openai/universe && pip install -e universe
- Start a runtime, which is a server that groups a collection of similar environments and exposes two ports: 5900 and 15900. Port 5900 is used by the VNC protocol to exchange pixel information and keyboard/mouse actions, while port 15900 maintains the WebSocket control protocol. The following snippet shows how to boot a runtime from a PC console (for example, a Linux shell):
# -p 5900:5900 and -p 15900:15900
# expose the VNC and WebSocket ports
# --privileged/--cap-add/--ipc=host
# needed to make Selenium work
$ docker run --privileged --cap-add=SYS_ADMIN --ipc=host \
-p 5900:5900 -p 15900:15900 quay.io/openai/universe.flashgames
With this command, the Flash games Docker container is downloaded and started. You can then use a VNC viewer to view and control the created remote desktop; the target port is 5900. It is also possible to use the browser-based VNC client through the web server on port 15900, using the password openai.
The following snippet is the same as the one we saw previously, except that it adds the VNC connection step. This means the output is also the same, so it is not reported here. As you can see, writing a custom agent is quite straightforward: observations include a NumPy pixel array, and actions are a list of VNC events (mouse/keyboard interactions):
import gym
import universe # register Universe environments into Gym
# Universe environment ID
env = gym.make('flashgames.DuskDrive-v0')
"""
If using docker-machine, replace "localhost" with specific Docker IP
"""
env.configure(remotes="vnc://localhost:5900+15900")
observation_n = env.reset()
while True:
    # agent which presses the Up arrow 60 times per second
    action_n = [[('KeyEvent', 'ArrowUp', True)]
                for _ in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    env.render()
Exploiting the same VNC connection, the user can watch the agent in action and even send action commands with the keyboard and mouse. The VNC interface, which manages environments as server processes, allows us to run them on remote machines, thereby leveraging in-house computation clusters or even cloud solutions. For more information, refer to the OpenAI Universe website (https://openai.com/blog/universe/).
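Because each environment is just a VNC/WebSocket server pair, pointing the client at remote machines amounts to building the right remotes string for env.configure. The small helper below is a hypothetical convenience, not part of Universe's API; it assembles that string for a list of hosts, assuming each one exposes the default 5900/15900 port pair of a Universe runtime:

```python
def build_remotes(hosts, vnc_port=5900, ws_port=15900):
    """Build a Universe-style remotes string for a list of hosts.

    Convenience helper for illustration; assumes each host exposes the
    VNC port and WebSocket port used by a Universe runtime.
    """
    return ",".join(
        "vnc://{}:{}+{}".format(host, vnc_port, ws_port) for host in hosts
    )

# Hypothetical cluster of two machines running Flash game runtimes
remotes = build_remotes(["192.168.1.10", "192.168.1.11"])
print(remotes)
# vnc://192.168.1.10:5900+15900,vnc://192.168.1.11:5900+15900
```

The resulting string could then be passed as env.configure(remotes=remotes) to drive several runtimes in parallel.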
Validating the Universe Infrastructure
One of the intrinsic problems of Universe is the lag in observations and action execution that comes with this architectural choice: agents must operate in real time and cope with fluctuating action and observation delays. Most environments can't be solved with current techniques, but the creators of Universe ran tests to verify that it is actually possible for an RL agent to learn in this setting. During these tests, the reward trends observed while training on Atari games, Flash games, and browser tasks confirmed that it is possible to obtain results even in such a complex setting.
Now that we've introduced the OpenAI tools for reinforcement learning, we can move on and learn how to use TensorFlow in this context.