Summary of Reinforcement Learning Frameworks

There are many reinforcement learning frameworks out there, but since relatively few people use them, documentation tends to be lacking.

In this article, I have put together a rough summary of the pros and cons of each framework.

I hope this serves as a helpful reference when choosing a framework.

What Is a Reinforcement Learning Framework?

There is no strict definition, but here we consider something a reinforcement learning framework if it satisfies at least one of the following:

Pre-implemented RL agent algorithms (e.g., TRPO, A3C, ...) are ready to use
Training progress can be easily visualized during learning (e.g., Tensorboard support)
You can add custom environments or agents for training
Training code is provided in an abstracted form so you don't have to implement the details yourself

In short, it is a library that makes at least one of the following easier: model and training implementation, training execution, or visualization.

Caveats

While convenient...

Once you hit a bug, it can be very hard to recover.

Since the user base is small, documentation tends to be scarce
The framework implementations themselves tend to be complex

These are notable drawbacks. For example, the OpenAI Spinning Up Document states:

RL libraries frequently make choices for abstraction that are good for code reuse between algorithms, but which are unnecessary if you're only writing a single algorithm or supporting a single use case.

The more feature-rich and abstract a framework becomes, the more complex its internal implementation gets, so it is recommended to consider:

Do you actually need a framework?
How much do you want the framework to handle for you?

when choosing a framework.

Viable Frameworks

RL Coach

A framework made by Intel that supports Tensorflow and MXnet.

Pros

A very large number of algorithms are already implemented
Supports many training environments, including OpenAI Gym (https://github.com/NervanaSystems/coach#supported-environments)
A preset feature allows you to create files containing only hyperparameters to pass to the model By writing multiple presets, you can try different hyperparameters without modifying the code
You can display agent behavior in real time during training (https://nervanasystems.github.io/coach/usage.html#rendering-the-environment）)
The dashboard feature enables detailed visualization of training results
- In multi-agent training, you can display all agents' training results on a single graph
- When running multiple experiments with the same algorithm, it can automatically display a graph showing the average of those learning curves 　-> This is useful for evaluating model performance when training is unstable

Cons

The overall implementation is complex, making it difficult to add your own agents
Distributing agents is harder than with RLlib (described below), and requires AWS S3
The aforementioned dashboard is not user-friendly

RLlib

A framework that supports all deep learning frameworks and excels at agent distribution. Implementation of agents is somewhat easier with Tensorflow or PyTorch.

Pros

You can set up multi-agent training just by implementing individual workers
- The Trainer handles the "worker trains -> parameter update" process in multi-agent training
- Multi-agent training is possible even with deep learning frameworks that don't natively support it
Automatic hyperparameter tuning using distributed computing (tune library)
- State-of-the-art hyperparameter tuning algorithms such as Population Based Training are also available
Training environments can be added by writing them in OpenAI Gym format
Tensorboard can be used for visualization

Cons

The internal implementation is complex, making it difficult to add your own agents
You need to learn the unique distributed processing notation of the Ray library used internally

dopamine

A reinforcement learning framework from Google (though not an official Google project).

Pros

Records detailed logs during training (learning curves, replay buffers, model parameters) in a memory-efficient manner using gin
Replay videos of agent behavior can be easily created

Cons

Only 4 agent algorithms are implemented (DQN, Rainbow, C51, IQN)
Insufficient documentation
The only supported environment is Atari, and since there is no documentation on how to add environments, adding your own is quite laborious
There are only 4 contributors, and the majority of the code was written by one of them, which raises concerns about the project's future

(Update: There have been no commits since December of last year.)

Stable Baseline

A fork of OpenAI Baselines that supports Tensorflow. More of a collection of model implementations than a framework.

Pros

The API is modeled after scikit-learn, making it easy to use
Basic models are already implemented
The agent algorithm implementations are well-commented and easy to understand, making them easy to modify
Multi-process execution is also supported

Cons

Only environments supported by OpenAI Gym are supported, so you need to write your own wrapper for other environments
You need to write code to run experiments

Frameworks That Seem Less Useful

Tensorforce

A framework for Tensorflow, as the name suggests.

Pros

Properly modularized, making it easy to study and write code.

Cons

Not many agent algorithms are implemented, so you need to write your own
The volume and depth of documentation is somewhat lacking

Keras-RL

A framework for Keras, as the name suggests.

Pros

There is so little documentation that it is not even clear what the advantages are.

Cons

The documentation is unbelievably sparse. Truly sparse. (https://keras-rl.readthedocs.io/en/latest/core）)
Not many agent algorithms are implemented, so you need to write your own

What Is a Reinforcement Learning Framework?​

Caveats​

Viable Frameworks​

RL Coach​

Pros​

Cons​

RLlib​

Pros​

Cons​

dopamine​

Pros​

Cons​

Stable Baseline​

Pros​

Cons​

Frameworks That Seem Less Useful​

Tensorforce​

Pros​

Cons​

Keras-RL​

Pros​

Cons​

References​

What Is a Reinforcement Learning Framework?

Caveats

Viable Frameworks

RL Coach

Pros

Cons

RLlib

Pros

Cons

dopamine

Pros

Cons

Stable Baseline

Pros

Cons

Frameworks That Seem Less Useful

Tensorforce

Pros

Cons

Keras-RL

Pros

Cons

References