The Duckietown Simulator

Modified 2020-11-07 by Andrea Censi

Welcome to Duckietown!

Introduction

Modified 2020-11-07 by Andrea Censi

Gym-Duckietown is a simulator for the Duckietown Universe, written in pure Python/OpenGL (Pyglet). It places your agent, a Duckiebot, inside of an instance of a Duckietown: a loop of roads with turns, intersections, obstacles, Duckie pedestrians, and other Duckiebots. It can be a pretty hectic place!

Gym-Duckietown is fast, open, and incredibly customizable. What started as a lane-following simulator has evolved into a fully-functioning autonomous driving simulator that you can use to train and test your Machine Learning, Reinforcement Learning, Imitation Learning, or even classical robotics algorithms. Gym-Duckietown offers a wide range of tasks, from simple lane-following to full city navigation with dynamic obstacles. Gym-Duckietown also ships with features, wrappers, and tools that can help you bring your algorithms to the real robot, including domain-randomization, accurate camera distortion, and differential-drive physics (and most importantly, realistic waddling).

Environments

Modified 2019-11-05 by Gianmarco Bernasconi

There are multiple registered gym environments, each corresponding to a different map file:

Duckietown-straight_road-v0
Duckietown-4way-v0
Duckietown-udem1-v0
Duckietown-small_loop-v0
Duckietown-small_loop_cw-v0
Duckietown-zigzag_dists-v0
Duckietown-loop_obstacles-v0 (static obstacles in the road)
Duckietown-loop_pedestrians-v0 (moving obstacles in the road)

The MultiMap-v0 environment is essentially a wrapper for the simulator which will automatically cycle through all available map files. This makes it possible to train on a variety of different maps at the same time, with the idea that training on a variety of different scenarios will make for a more robust policy/model.
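
The cycling idea can be sketched in a few lines. This is a minimal illustration, not the MultiMap-v0 implementation: `make_env` is a hypothetical factory standing in for real environment construction, `MapCycler` is a made-up name, and the map names are assumed to match the middle part of the environment IDs listed above.

```python
from itertools import cycle

# Map names assumed from the registered environment IDs listed above.
MAP_NAMES = [
    "straight_road", "4way", "udem1", "small_loop",
    "small_loop_cw", "zigzag_dists", "loop_obstacles", "loop_pedestrians",
]


def make_env(map_name):
    """Hypothetical factory: stands in for real environment construction."""
    return {"map_name": map_name}


class MapCycler:
    """Hands out an environment on the next map at every reset."""

    def __init__(self, map_names, factory=make_env):
        self._names = cycle(map_names)
        self._factory = factory
        self.env = None

    def reset(self):
        # Advance to the next map, wrapping around at the end of the list.
        self.env = self._factory(next(self._names))
        return self.env


cycler = MapCycler(MAP_NAMES)
visited = [cycler.reset()["map_name"] for _ in range(len(MAP_NAMES) + 1)]
```

After one full pass the cycler wraps around, so the last entry of `visited` is the first map again.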

Gym-Duckietown is a companion simulator to real Duckiebots, so code developed in simulation can also be run on the real robot. We provide a domain randomization API, which can help you transfer your trained policies from simulation to the real world. Without a domain transfer method, your learned models will likely overfit to aspects of the simulator that do not carry over to the real world. When you deploy, you and your Duckiebot will be running around in circles trying to figure out what’s going on.

The Duckiebot-v0 environment is meant to connect to software running on a real Duckiebot and remotely control the robot. It is a tool to test that policies trained in simulation can transfer to the real robot. If you want to control your robot remotely with the Duckiebot-v0 environment, you will need to install the software found in the duck-remote-iface repository on your Duckiebot.

Installation

Modified 2019-10-28 by gibernas

To install the daffy version of the simulator, you can simply use

laptop $ pip3 install duckietown-gym-daffy

Alternative Installation Instructions

Modified 2019-10-28 by gibernas

Alternatively, you can find further installation instructions here

Docker Image

Modified 2019-11-05 by Gianmarco Bernasconi

There is a pre-built Docker image available on Docker Hub, which also contains an installation of PyTorch.

In order to get GPU acceleration, you should install and use nvidia-docker.

Usage

Modified 2019-10-28 by gibernas

Testing

Modified 2019-10-28 by gibernas

There is a simple UI application which allows you to control the simulation or real robot manually. The manual_control.py application will launch the Gym environment, display camera images, and send actions (keyboard commands) back to the simulator or robot. You can specify which map file to load with the --map-name argument; the example below instead selects a registered environment with --env-name:

$ ./manual_control.py --env-name Duckietown-udem1-v0

There is also a script to run automated tests (run_tests.py) and a script to gather performance metrics (benchmark.py).

Reinforcement Learning

Modified 2019-11-04 by gibernas

For a quick start with Reinforcement Learning using the Duckietown simulator, check the reinforcement learning baseline.

Imitation Learning

Modified 2019-11-04 by gibernas

For a quick start with Imitation Learning using the Duckietown simulator, check either the simulation-based or the log-based baselines.

Design

Modified 2019-10-28 by gibernas

Map File Format

Modified 2020-11-06 by Andrea Censi

The simulator supports a YAML-based file format which is designed to be easy to hand edit. See the maps subdirectory for examples. Each map file has two main sections: a two-dimensional array of tiles, and a listing of objects to be placed around the map. The tiles are based on the Duckietown appearance specification.

The available tile types are:

empty
straight
curve_left
curve_right
3way_left (3-way intersection)
3way_right
4way (4-way intersection)
asphalt
grass
floor (office floor)
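
A hand-edited tile grid is easy to mistype, so a quick check against the list above can help. The sketch below assumes each grid cell holds one of the bare tile type names listed here; real map files may carry more information per cell, and `unknown_tiles` and the sample grid are hypothetical names used only for illustration.

```python
# Tile types as listed above.
TILE_TYPES = {
    "empty", "straight", "curve_left", "curve_right", "3way_left",
    "3way_right", "4way", "asphalt", "grass", "floor",
}


def unknown_tiles(grid):
    """Return (row, col, name) for every cell that is not a known tile type."""
    return [
        (r, c, name)
        for r, row in enumerate(grid)
        for c, name in enumerate(row)
        if name not in TILE_TYPES
    ]


# Hypothetical 2x3 grid with one misspelled tile.
grid = [
    ["curve_left", "straight", "curve_right"],
    ["grass", "strait", "asphalt"],
]
problems = unknown_tiles(grid)
print(problems)  # [(1, 1, 'strait')]
```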

The available object types are:

barrier
cone (traffic cone)
duckie
duckiebot (model of a Duckietown robot)
tree
house
truck (delivery-style truck)
bus
building (multi-floor building)
sign_stop, sign_T_intersect, sign_yield, etc. (see meshes subdirectory)

Although the environment is rendered in 3D, the map is essentially two-dimensional. As such, object coordinates are specified along two axes. The coordinates are rescaled based on the tile size, such that coordinates [0.5, 1.5] would mean the middle of the first column of tiles and the middle of the second row. Objects can have an optional flag set, which means that they may or may not appear during training, chosen at random, as a form of domain randomization.
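
The coordinate convention can be illustrated with a short sketch. The tile size used here is a hypothetical value chosen for the example, not the simulator's constant, and the helper names `tile_index` and `to_metres` are made up.

```python
import math

TILE_SIZE = 0.6  # hypothetical tile edge length in metres, for illustration only


def tile_index(pos):
    """Return the zero-based (column, row) of the tile containing a map coordinate."""
    x, y = pos
    return math.floor(x), math.floor(y)


def to_metres(pos, tile_size=TILE_SIZE):
    """Scale a tile-relative coordinate by the tile size."""
    x, y = pos
    return x * tile_size, y * tile_size


# [0.5, 1.5] lies in the first column (index 0) and second row (index 1).
col, row = tile_index([0.5, 1.5])
centre = to_metres([0.5, 1.5])
```

The fractional part of each coordinate gives the position inside the tile, so 0.5 is the midpoint along that axis.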

Observations

Modified 2019-10-28 by gibernas

The observations are single camera images, as numpy arrays of size (120, 160, 3). These arrays contain unsigned 8-bit integer values in the [0, 255] range. This image size was chosen because each dimension is exactly one quarter of the 640x480 image resolution provided by the camera, which makes it fast and easy to scale down the images. The choice of 8-bit integer values over floating-point values was made because the resulting images are smaller when stored on disk and faster to send over a network connection.
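
The arithmetic behind these choices is easy to check. The sketch below uses only the numbers given in the text; the comparison with 32-bit floats is an assumption about which floating-point type would otherwise be used.

```python
HEIGHT, WIDTH, CHANNELS = 120, 160, 3   # observation shape from the text
CAMERA_WIDTH, CAMERA_HEIGHT = 640, 480  # camera resolution from the text

# Each observation dimension is the camera dimension divided by 4.
scale_x = CAMERA_WIDTH // WIDTH
scale_y = CAMERA_HEIGHT // HEIGHT

values = HEIGHT * WIDTH * CHANNELS
bytes_uint8 = values * 1      # one byte per 8-bit value
bytes_float32 = values * 4    # four bytes per 32-bit float (assumed alternative)

print(scale_x, scale_y)            # 4 4
print(bytes_uint8, bytes_float32)  # 57600 230400
```

Under that assumption, a single observation takes a quarter of the space it would need as 32-bit floats.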

Actions

Modified 2020-11-04 by dubi-d

The simulator uses continuous actions by default. Actions passed to the step() function should be numpy arrays containing two numbers between -1 and 1. These two numbers correspond to the left and right wheel input respectively. A positive value makes the wheel go forward, a negative value makes it go backwards. There is also a Gym wrapper class named DiscreteWrapper which allows you to use discrete actions (turn left, move forward, turn right) instead of continuous actions if you prefer.
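
As a rough sketch of how discrete choices relate to wheel inputs, the snippet below clamps values to the documented range and maps three action names to (left, right) pairs. The specific wheel values and the names `clip_action`, `DISCRETE_ACTIONS`, and `to_wheel_inputs` are hypothetical; the values used by DiscreteWrapper itself may differ.

```python
def clip_action(left, right):
    """Clamp both wheel inputs to the [-1, 1] range described above."""
    def clamp(value):
        return max(-1.0, min(1.0, float(value)))
    return clamp(left), clamp(right)


# Hypothetical wheel inputs for the three discrete actions.
DISCRETE_ACTIONS = {
    "turn_left": (0.2, 0.8),     # right wheel faster than left
    "move_forward": (0.8, 0.8),  # both wheels equal
    "turn_right": (0.8, 0.2),    # left wheel faster than right
}


def to_wheel_inputs(name):
    """Look up a discrete action and return clipped (left, right) inputs."""
    return clip_action(*DISCRETE_ACTIONS[name])


action = to_wheel_inputs("turn_left")
# With NumPy installed, np.array(action) gives the array form passed to step().
```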

Reward Function

Modified 2019-10-28 by gibernas

The default reward function tries to encourage the agent to drive forward along the right lane in each tile. Each tile has an associated Bézier curve defining the path the agent is expected to follow. The agent is rewarded for being as close to the curve as possible, and also for facing the same direction as the curve’s tangent. The episode is terminated if the agent gets too far outside of a drivable tile, or if the max_steps parameter is exceeded. See the step function in this source file.
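
To make the two reward ingredients concrete, here is a minimal sketch that scores a pose against a cubic Bézier curve. It is not the simulator's reward formula: the weighting, the sampling-based closest-point search, and the names `sketch_reward` and `dist_weight` are assumptions made for illustration. Only the Bézier point and tangent formulas are standard.

```python
import math


def bezier_point(pts, t):
    """Point on a cubic Bezier curve with control points pts at parameter t."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = pts
    a, b, c, d = (1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3
    return a * x0 + b * x1 + c * x2 + d * x3, a * y0 + b * y1 + c * y2 + d * y3


def bezier_tangent(pts, t):
    """Unit tangent of the cubic Bezier curve at parameter t."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = pts
    a, b, c = 3 * (1 - t) ** 2, 6 * (1 - t) * t, 3 * t ** 2
    dx = a * (x1 - x0) + b * (x2 - x1) + c * (x3 - x2)
    dy = a * (y1 - y0) + b * (y2 - y1) + c * (y3 - y2)
    norm = math.hypot(dx, dy)
    return dx / norm, dy / norm


def sketch_reward(pts, pos, heading, samples=100, dist_weight=10.0):
    """Hypothetical reward: heading alignment minus weighted distance to the curve."""
    # Approximate the closest curve point by sampling the parameter range.
    ts = [i / samples for i in range(samples + 1)]
    t_best = min(ts, key=lambda t: math.dist(bezier_point(pts, t), pos))
    dist = math.dist(bezier_point(pts, t_best), pos)
    tx, ty = bezier_tangent(pts, t_best)
    # Dot product of the curve tangent with the agent's heading direction.
    alignment = tx * math.cos(heading) + ty * math.sin(heading)
    return alignment - dist_weight * dist


# Hypothetical straight lane segment along the x axis.
lane = [(0.0, 0.0), (1 / 3, 0.0), (2 / 3, 0.0), (1.0, 0.0)]
on_lane = sketch_reward(lane, (0.5, 0.0), heading=0.0)
off_lane = sketch_reward(lane, (0.5, 0.1), heading=0.0)
reversed_heading = sketch_reward(lane, (0.5, 0.0), heading=math.pi)
```

In this sketch the pose on the curve and facing along it scores highest, drifting sideways lowers the score, and facing against the tangent makes it negative.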

Simulator API

Modified 2019-11-05 by Gianmarco Bernasconi

You can find some help on how to use the simulator in your projects here

Troubleshooting

Modified 2019-10-28 by gibernas

If you run into problems of any kind, don’t hesitate to open an issue on this repository. It’s quite possible that you’ve run into some bug we aren’t aware of. Please make sure to give some details about your system configuration (i.e. PC or Mac, operating system), and to paste the command you used to run the simulator, as well as the complete error message that was produced, if any.

ImportError: Library “GLU” not found

Modified 2019-10-28 by gibernas

You may need to manually install packages needed by Pyglet or OpenAI Gym on your system. The command you need to use will vary depending on which OS you are running. For example, to install the glut package on Ubuntu:

$ sudo apt-get install freeglut3-dev

And on Fedora:

$ sudo dnf install freeglut-devel

NoSuchDisplayException: Cannot connect to “None”

Modified 2019-10-28 by gibernas

If you are connected through SSH, or running the simulator in a Docker image, you will need to use xvfb to create a virtual display in order to run the simulator. See the “Running Headless” subsection below.
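
A small pre-flight check can turn this exception into a clearer message. The sketch below only inspects the DISPLAY environment variable; the function name `display_problem` is hypothetical, and the check does not verify that an X server is actually reachable.

```python
import os


def display_problem(environ=None):
    """Return a hint if no X11 display is configured, else None."""
    environ = os.environ if environ is None else environ
    if not environ.get("DISPLAY"):
        return ("DISPLAY is not set; start a virtual display (for example "
                "with Xvfb) and export DISPLAY before creating the environment.")
    return None


hint = display_problem()
if hint:
    print(hint)
```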

Running headless

Modified 2019-10-28 by gibernas

The simulator uses the OpenGL API to produce graphics. This requires an X11 display to be running, which can be problematic if you are trying to run training code over SSH or on a cluster. You can create a virtual display using Xvfb, as the instructions below illustrate. Note, however, that these instructions are specific to MILA; look further down for instructions on an Ubuntu box:

# Reserve a Debian 9 machine with 12GB ram, 2 cores and a GPU on the cluster
sinter --reservation=res_stretch --mem=12000 -c2 --gres=gpu

# Activate the gym-duckietown Conda environment
source activate gym-duckietown

cd gym-duckietown

# Add the gym_duckietown package to your Python path
export PYTHONPATH="${PYTHONPATH}:`pwd`"

# Load the GLX library
# This has to be done before starting Xvfb
export LD_LIBRARY_PATH=/Tmp/glx:$LD_LIBRARY_PATH

# Create a virtual display with OpenGL support
Xvfb :$SLURM_JOB_ID -screen 0 1024x768x24 -ac +extension GLX +render -noreset &> xvfb.log &
export DISPLAY=:$SLURM_JOB_ID

# You are now ready to train

Running headless and training in a cloud based environment (AWS)

Modified 2019-10-28 by gibernas

We recommend using the Ubuntu-based Deep Learning AMI to provision your server, as it comes with the deep learning libraries preinstalled.

# Install xvfb
sudo apt-get install xvfb mesa-utils -y

# Remove the nvidia display drivers (this doesn't remove the CUDA drivers)
# This is necessary as nvidia display doesn't play well with xvfb
sudo nvidia-uninstall -y

# Sanity check to make sure you still have CUDA driver and its version
nvcc --version

# Start xvfb
Xvfb :1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset &> xvfb.log &

# Export your display id
export DISPLAY=:1

# Check if your display settings are valid
glxinfo

# You are now ready to train

Poor performance, low frame rate

Modified 2019-10-28 by gibernas

It’s possible to improve the performance of the simulator by disabling Pyglet error-checking code. Export this environment variable before running the simulator:

$ export PYGLET_DEBUG_GL=False

RL training doesn’t converge

Modified 2019-10-28 by gibernas

Reinforcement learning algorithms are extremely sensitive to hyperparameters. Choosing the wrong set of parameters could prevent convergence completely, or lead to unstable performance over training. You will likely want to experiment. A learning rate that is too low can lead to no learning happening. A learning rate that is too high can lead to unstable performance throughout training or a suboptimal result.

The reward values are currently rescaled into the [0,1] range, because the RL code in pytorch_rl doesn’t do reward clipping, and deals poorly with large reward values. Also note that changing the reward function might mean you also have to retune your choice of hyperparameters.
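
The rescaling idea can be sketched as a linear map into [0, 1]. This is an illustration under assumed bounds, not the simulator's actual rescaling code: `rescale_reward`, `LOW`, and `HIGH` are hypothetical names and values.

```python
def rescale_reward(reward, low, high):
    """Map a reward from [low, high] linearly into [0, 1], clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (reward - low) / (high - low)
    return max(0.0, min(1.0, scaled))


# Hypothetical raw reward bounds, chosen only for illustration.
LOW, HIGH = -1000.0, 1000.0
examples = [rescale_reward(r, LOW, HIGH) for r in (-1000.0, 0.0, 500.0, 2000.0)]
print(examples)  # [0.0, 0.5, 0.75, 1.0]
```

If you change the reward function, the bounds used for rescaling would need to change with it, which is one reason hyperparameters may need retuning.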

Unknown encoder ‘libx264’ when using gym.wrappers.Monitor

Modified 2019-11-04 by gibernas

It is possible to use gym.wrappers.Monitor to record videos of the agent performing a task. See examples here.

The libx264 error is due to a problem with the way ffmpeg is installed on some Linux distributions. One possible way to circumvent this is to reinstall ffmpeg using conda:

$ conda install -c conda-forge ffmpeg

Alternatively, screencasting programs such as Kazam can be used to record the graphical output of a single window.

How to cite

Modified 2019-11-05 by Liam Paull

Please use this BibTeX entry if you want to cite this repository in your publications:

@misc{gym_duckietown,
  author = {Chevalier-Boisvert, Maxime and Golemo, Florian and Cao, Yanjun and Mehta, Bhairav and Censi, Andrea and Paull, Liam},
  title = {Duckietown Environments for OpenAI Gym},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/duckietown/gym-duckietown},
}