
Reinforcement Learning

Modified 2021-11-03 by tanij

This section describes the basic procedure for making a submission with a model trained in simulation using reinforcement learning with PyTorch.

Requires: You have made a submission with the PyTorch template.

Requires: CUDA 10.2 or later installed locally. This baseline works with CUDA 11, and it should also work with CUDA 10.2.

Requires: Patience. Training RL agents is not easy.

Result: You have a functional agent trained with RL. Your expectations regarding the capabilities of end-to-end RL should be realistic.

Before getting started, you should be aware that RL is very much an active area of research. Simply getting a successful turn with this baseline should be celebrated. It is still provided to you because this implementation is a good stepping stone to other algorithms. We also assume here that you are relatively familiar with the basics of reinforcement learning. There are many tutorials, resources, and even complete courses online for learning about RL. For a succinct introduction, you can check out the Reinforcement Learning lecture from the IFT6757 class at the University of Montreal, or try our reinforcement learning Jupyter notebook, which is in the Duckietown exercises repository.

You should also make sure you have access to good hardware. A recent graphics card (probably a GTX 1060 or better) is a must, and more than 8 GB of RAM is required.

Quickstart

Modified 2020-11-15 by Liam Paull

Clone this repo:

$ git clone git@github.com:duckietown/challenge-aido_LF-baseline-RL-sim-pytorch.git

Change into the directory:

$ cd challenge-aido_LF-baseline-RL-sim-pytorch

Test the submission, either locally with:

$ dts challenges evaluate --challenge CHALLENGE_NAME

or make an official submission when you are ready with

$ dts challenges submit --challenge CHALLENGE_NAME

You can find the list of challenges here. Make sure that it is marked as “Open”.

How to Train your Policy

Modified 2020-11-15 by Liam Paull

The previous section uses the model that is included in the baseline repository. You are going to want to train your own policy.

To do so:

Change into the directory:

$ cd challenge-aido_LF-baseline-RL-sim-pytorch

Install this package:

$ pip3 install -e .

and the gym-duckietown package:

$ pip3 install -e git+https://github.com/duckietown/gym-duckietown.git@daffy#egg=gym-duckietown

Depending on your configuration, you might need to use pip instead of pip3.

Change into the duckietown_rl directory and run the training script:

$ cd duckietown_rl
$ python3 -m scripts.train_cnn --seed 123

When it finishes, try it out (make sure you pass in the same seed as the one passed to the training script):

$ python3 -m scripts.test_cnn --seed 123
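If you train with several seeds, it is easy to test with the wrong one. The sketch below builds the training and test commands from a single seed value so that they always match. The helper names are hypothetical, and the module names are taken from the commands above, written without the .py suffix because `python3 -m` expects a module name. It assumes you run it from the duckietown_rl directory.

```python
import subprocess


def train_and_test_commands(seed):
    """Return (train, test) argument lists that share the same seed."""
    seed_args = ["--seed", str(seed)]
    train = ["python3", "-m", "scripts.train_cnn", *seed_args]
    test = ["python3", "-m", "scripts.test_cnn", *seed_args]
    return train, test


def run_seed(seed):
    """Train a policy, then try it out with the same seed."""
    train, test = train_and_test_commands(seed)
    # check=True raises CalledProcessError if a step exits with an error
    subprocess.run(train, check=True)
    subprocess.run(test, check=True)
```

Calling `run_seed(123)` reproduces the two commands above; looping over a few seeds gives you several candidate policies to choose from.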

How to submit the trained policy

Modified 2021-11-03 by tanij

Once you’re done training, you need to copy your model and the saved weights of the policy network.

Specifically, if you use this repo, you need to copy the following artifacts into the corresponding locations in the root directory:

Copy duckietown_rl/ddpg.py and rename it to model.py.
Copy scripts/pytorch_models/DDPG_XXX_actor.pth and DDPG_XXX_critic.pth and rename them to models/model_actor.pth and models/model_critic.pth respectively, where XXX is the seed of your best policy.
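The copy-and-rename steps above can be scripted. This is a minimal sketch, not part of the baseline: `copy_artifacts` is a hypothetical helper, and the location of the weights directory is passed in explicitly because it depends on where you ran the training script.

```python
import shutil
from pathlib import Path


def copy_artifacts(repo_root, weights_dir, seed):
    """Copy the model definition and trained weights to the submission layout.

    repo_root:   root directory of the baseline repository
    weights_dir: directory holding DDPG_<seed>_actor.pth / _critic.pth
                 (scripts/pytorch_models in the text above)
    seed:        seed of the policy you want to submit
    """
    root = Path(repo_root)
    weights = Path(weights_dir)
    models = root / "models"
    models.mkdir(exist_ok=True)

    copies = {
        root / "duckietown_rl" / "ddpg.py": root / "model.py",
        weights / f"DDPG_{seed}_actor.pth": models / "model_actor.pth",
        weights / f"DDPG_{seed}_critic.pth": models / "model_critic.pth",
    }
    for src, dst in copies.items():
        shutil.copy2(src, dst)  # raises FileNotFoundError if src is missing
    return list(copies.values())
```

For example, `copy_artifacts(".", "duckietown_rl/scripts/pytorch_models", 123)` run from the repository root would stage the policy trained with seed 123, assuming the weights were saved under that path.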

Also, make sure that the root-level wrappers.py contains all the wrappers you used in duckietown_rl/wrappers.py.

Then look over the solution.py file and edit it to make sure everything is loaded correctly (i.e., all the imports point to the right place).

Finally, evaluate or submit your agent using the process described above in the Quickstart.

How to improve your policy

Modified 2020-11-15 by Liam Paull

Here are some ideas for improving your policy:

Check out the DtRewardWrapper and modify the rewards (set them higher or lower and see what happens)
Try resizing the images. Make them smaller to speed up training, or bigger to ensure that your RL agent can extract everything it can from them. You will also need to edit the layers in ddpg.py accordingly.
Try making the observation image grayscale instead of color.
Try stacking multiple images, like 4 monochrome images instead of 1 color image. You will need to also edit the layers in ddpg.py accordingly.
You can also try increasing the contrast in the input to make the difference between road and road-signs clearer. You can do so by adding another observation wrapper.
Cut off the horizon from the image (and correspondingly change the convnet parameters).
Check out the default hyperparameters in duckietown_rl/args.py and tune them. For example, increase expl_noise or start_timesteps to get better exploration.
(more sophisticated) Use a different map in the simulator, or, even better, use randomized maps. But be mindful that some maps include obstacles on the road, which might be counter-productive to an LF submission.
(more advanced) Use a different or bigger convnet for your actor/critic, and add better initialization.
(very advanced) Use the ground truth from the simulator to construct a better reward.
(extremely advanced) Use an entirely different training algorithm - like PPO, A2C, or DQN. But this might take significant time, even if you’re familiar with the matter.
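Several of the ideas above (resizing, cropping the horizon, changing the convnet) alter the size of the feature map that reaches the first fully connected layer, which is why the layers in ddpg.py have to be edited. The sketch below computes that size using the standard output-size formula for an unpadded convolution with dilation 1. The layer list and the 120x160 input size are illustrative assumptions, not necessarily what ddpg.py uses.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one dimension of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(height, width, layers):
    """Number of features after flattening the output of a conv stack.

    layers: list of (out_channels, kernel, stride) tuples, applied in order.
    """
    channels = 0
    for out_channels, kernel, stride in layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        channels = out_channels
    return channels * height * width


# Hypothetical conv stack: (out_channels, kernel, stride)
LAYERS = [(32, 8, 2), (32, 4, 2), (32, 4, 2), (32, 4, 1)]

full = flat_features(120, 160, LAYERS)    # full 120x160 image -> 4032
cropped = flat_features(80, 160, LAYERS)  # top 40 rows removed -> 1792
```

The input size of the first linear layer after the conv stack must match this number, so cropping 40 rows here would mean changing it from 4032 to 1792. The number of input channels (for example 1 for grayscale, or 4 for stacked monochrome frames) only affects the first conv layer, not this count.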

Sim2Real Transfer (Optional)

Modified 2020-11-15 by Liam Paull

You should try your agent on the real Duckiebot.

It is possible, even likely, that your agent will not generalize well to the real environment. One approach to mitigate this problem is to randomize the simulator environment during training, in the hope that this improves generalization. This approach is referred to as “Domain Randomization”.

To implement this, you will need to modify the env.py file. You'll notice that we launch the Simulator class from gym-duckietown. If you take a look at its constructor, you'll see that we aren't using all of the parameters listed. In particular, the three you should focus on are:

map_name: What map to use; hint, take a look at gym_duckietown/maps for more choices
domain_rand: Applies domain randomization, a popular, black-box, sim2real technique
randomized_maps_on_reset: Slows down training, but increases training variety.

Mixing and matching different values for these will help you improve your training diversity, and thereby your evaluation robustness.
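One way to mix and match these settings is to enumerate the combinations and train one agent per configuration. The sketch below only builds the keyword arguments. The map names are assumptions (check gym_duckietown/maps for what your installed version provides), and the parameter names are copied from the list above, so verify them against the Simulator constructor before passing them in.

```python
from itertools import product

# Assumed map names; see gym_duckietown/maps for the actual choices
MAP_NAMES = ["loop_empty", "small_loop", "zigzag_dists"]


def simulator_configs(map_names=MAP_NAMES):
    """Yield one dict of Simulator keyword arguments per combination."""
    for map_name, domain_rand, rand_maps in product(
        map_names, (False, True), (False, True)
    ):
        yield {
            "map_name": map_name,
            "domain_rand": domain_rand,
            "randomized_maps_on_reset": rand_maps,
        }
```

Each dictionary can then be unpacked into the constructor call in env.py, for example `Simulator(**config)` together with the arguments the baseline already passes.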

If you’re interested in more advanced techniques, like learning a representation that is a bit easier for your network to work with, or one that transfers better across the simulation-to-reality gap, there are some alternative, more advanced methods you may be interested in trying out.

Training headless

Modified 2020-11-13 by Velythyl

Should you want to train on a server, you will notice that the simulator requires an X server to run. Fear not, however, as we can use a fake X server for it.

$ xvfb-run -s "-screen 0 1400x900x24" python3 -m scripts.train_cnn --seed 123

That way, we trick the simulator into thinking that an X server is running. And, to be honest, from its point of view, it’s actually true!

Controlling which GPU is being used

Modified 2020-11-15 by Liam Paull

Your machine might have more than one GPU. To select the nth instead of the 0th, you can use:

$ CUDA_VISIBLE_DEVICES=n python3 -m scripts.train_cnn --seed 123

This is, of course, combinable with running on a server:

$ CUDA_VISIBLE_DEVICES=n xvfb-run -s "-screen 0 1400x900x24" python3 -m scripts.train_cnn --seed 123
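If you launch training from a Python script, for example to queue several runs on a server, the same two options can be expressed as a command list plus an environment. This is a minimal sketch with a hypothetical helper name. The module name is written without the .py suffix because `python3 -m` expects a module name.

```python
import os


def training_command(seed, gpu=None, headless=False):
    """Return (argv, env) for one training run.

    gpu:      index of the GPU to expose, or None to leave the setting alone
    headless: wrap the command in xvfb-run to provide a virtual X server
    """
    argv = ["python3", "-m", "scripts.train_cnn", "--seed", str(seed)]
    if headless:
        argv = ["xvfb-run", "-s", "-screen 0 1400x900x24", *argv]

    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if gpu is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return argv, env
```

You would then run it with `subprocess.run(argv, env=env, check=True)` from the duckietown_rl directory. Note that inside the training process the selected GPU is the only one visible, so it appears as device 0.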