This section describes the basic procedure for making a submission with a model trained in simulation using residual policy learning with PyTorch and ROS. In this approach, we use the basic Duckietown lane following stack as the base policy, and we use reinforcement learning to improve it.
Requires: You have made a submission with the ROS template.
Result: You have a submission that leverages both our ROS stack and reinforcement learning.
Before getting started, you should be aware that this baseline is a combination of the RL baseline and the ROS template. It is recommended that you be familiar with both of those templates and baselines, as the workflow of this one is similar to theirs. Here are some links:
You should also make sure you have access to good hardware. A recent graphics card (probably GTX1060+) is a must, and more than 8GB of RAM is required.
To train a policy, you should first make sure that Docker on your machine can access the GPU/CUDA. You should also install CUDA 10.2+ locally.
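A quick way to verify that PyTorch can see your GPU (from inside the training container, or any Python environment with PyTorch installed) is the following minimal check:

import torch

print(torch.cuda.is_available())       # should print True if Docker/CUDA are set up correctly
print(torch.cuda.get_device_name(0))   # e.g. "GeForce GTX 1060"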
Here are a few pointers:
Clone this repo:
$ git clone https://github.com/duckietown/challenge-aido_LF-baseline-RPL-ros.git
Change into the directory:
$ cd challenge-aido_LF-baseline-RPL-ros
Test the submission, either locally with:
$ dts challenges evaluate --challenge CHALLENGE_NAME
or make an official submission when you are ready with
$ dts challenges submit --challenge CHALLENGE_NAME
You can find the list of challenges here. Make sure that the challenge you pick is marked as “Open”.
Since this baseline uses both ROS and ML, we need to train inside an environment where both ROS and PyTorch are installed. We will use Docker for this purpose.
The ROS template already provides us with a submission Docker image. Our strategy here is to use that agent Docker image directly during training, with the simulator and the training architecture added on top.
This could have been done using a second running docker container to provide a network interface to the simulator, but this adds unnecessary overhead since we don’t actually need the added security that comes with running things separately.
So, every time we train, we build the agent Docker image, and then the “trainer” Docker image builds directly FROM the agent image, adding the simulator on top.
The final docker container then runs the simulator and the agent in parallel, allowing the agent to directly interface with the simulator, just like we do in the other machine learning baselines.
From the challenge-aido_LF-baseline-RPL-ros directory, change into the local_dev directory:
$ cd local_dev
and open the args.py file. This is how you will control the training and testing in this repo. For now, just change the --test argument to default=False. Then, we can train with:
$ make run
As mentioned in Section 7.2 - Baseline Overview, this will first build the two Docker images, one after the other. This might take a while. Then, it will train an RL policy over the ROS stack inside Docker.
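For reference, the --test switch in local_dev/args.py is presumably just an argparse argument along these lines (a hypothetical sketch; the actual file may define it differently):

import argparse

parser = argparse.ArgumentParser()
# default=False: `make run` trains a policy; default=True: `make run` launches the test/visualization run
parser.add_argument("--test", default=False)
args = parser.parse_args()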
When it finishes, see how it works: simply change the --test flag back to default=True in args.py and test with:
$ make run
This will launch a simulator window on your host machine for you to view how your agent performs. You should see something like this:
You can use this gif to gauge how long it takes for the testing Docker container to start (note that this assumes the two required Docker images have already been built).
Make sure that rosagent.py uses the right weights for your RL agent. This is controlled by the MODEL_NAME global variable. Then follow the procedure in Section 7.1 - Quickstart to evaluate and submit.
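The exact checkpoint-loading code lives in rosagent.py and ddpg.py; conceptually it amounts to something like the sketch below, where the checkpoint name and file layout are assumptions:

import torch

MODEL_NAME = "DDPG_final"  # hypothetical checkpoint name; must match what your training run saved

# Load the saved actor weights for inference. The real loader in rosagent.py may use a
# different path layout or a helper from ddpg.py.
actor_weights = torch.load("models/{}_actor.pth".format(MODEL_NAME), map_location="cpu")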
First, you should probably improve the base ROS policy. By default, this baseline uses the basic lane_following demo that is provided in Duckietown. You could build a Pure Pursuit controller, change the lane filter, etc. See the classical Duckietown baseline for more ideas. To do this, you would add your new ROS packages inside of submission_ws.
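As a rough illustration, a new package in submission_ws could contain a node along these lines; the topic names, gains, and speed below are assumptions and must be adapted to the actual agent configuration:

#!/usr/bin/env python
import rospy
from duckietown_msgs.msg import LanePose, Twist2DStamped


class SimpleLaneController(object):
    """Toy proportional controller on the estimated lane pose (illustrative only)."""

    def __init__(self):
        # Assumed topic names; remap them to whatever your agent/launch files expect.
        self.pub = rospy.Publisher("~car_cmd", Twist2DStamped, queue_size=1)
        rospy.Subscriber("~lane_pose", LanePose, self.on_lane_pose, queue_size=1)
        self.k_d = -3.0    # gain on lateral offset (illustrative)
        self.k_phi = -1.5  # gain on heading error (illustrative)

    def on_lane_pose(self, msg):
        cmd = Twist2DStamped()
        cmd.header = msg.header
        cmd.v = 0.25                                   # constant forward speed (illustrative)
        cmd.omega = self.k_d * msg.d + self.k_phi * msg.phi
        self.pub.publish(cmd)


if __name__ == "__main__":
    rospy.init_node("simple_lane_controller")
    SimpleLaneController()
    rospy.spin()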
You could also limit RL's influence over the final policy. Perhaps the current approach of giving it full control, with action values in [-1, 1], isn't restrictive enough, and it would work better if it could only change the base policy by smaller amounts. Or perhaps it's the opposite: maybe the base policy needs to be corrected by more than 1. Since the base policy's outputs already span [-1, 1], the RL policy would need to be able to output values from -2 to 2 to fully correct it.
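To make these ranges explicit, the residual combination of the two policies can be thought of along the lines of the sketch below; the scale argument is a hypothetical knob, not something exposed by the repo:

import numpy as np

def combine_actions(base_action, rl_correction, scale=1.0):
    """Residual policy learning: the RL output (in [-1, 1]) is scaled and added to the
    base ROS policy's action, then clipped back to the valid action range."""
    # scale < 1 restricts how much RL can change the base policy;
    # scale = 2 lets RL fully cancel a base output of +/-1.
    correction = scale * np.asarray(rl_correction, dtype=float)
    return np.clip(np.asarray(base_action, dtype=float) + correction, -1.0, 1.0)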
Here are some ideas for improving your policy:
- Check out dtRewardWrapper in rl_agent and modify the rewards (set them higher or lower and see what happens). By default, this wrapper is not used: you will have to add it to train.py. A sketch of what such a wrapper looks like is given after this list.
- Try resizing the observation images (smaller for faster training, bigger for more detail) and modify ddpg.py accordingly.
- Try converting the observations to grayscale, or stacking several consecutive frames, and modify ddpg.py accordingly.
- Take a look at the hyperparameters in local_dev/args.py and tune them. For example, increase the expl_noise or increase the start_timesteps to get better exploration.
- Once you are happy with the trained policy, evaluate it with an LF submission.
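As mentioned in the first item above, a Gym reward wrapper generally looks like the sketch below; the class name matches dtRewardWrapper, but the actual shaping in rl_agent may differ (the numbers here are examples only):

import gym

class dtRewardWrapper(gym.RewardWrapper):
    """Illustrative reward shaping for the Duckietown simulator."""

    def reward(self, reward):
        if reward == -1000:       # assumed large crash penalty returned by the simulator
            return -10.0          # soften it so it does not dominate learning (example choice)
        return reward / 10.0      # compress the remaining reward range (example choice)

# usage in train.py (assumption): env = dtRewardWrapper(env)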
You should try your agent on the real Duckiebot.
It is possible, even likely, that your agent will not generalize well to the real environment. One approach to mitigate this problem is to randomize the simulator environment during training, in the hope that this improves generalization. This approach is referred to as “Domain Randomization”.
To implement this, you will need to modify the local_dev/env.py file. You'll notice that we launch the Simulator class from gym-duckietown. If you take a look at its constructor, you'll notice that we aren't using all of the parameters listed. In particular, the three you should focus on are:
- map_name: which map to use; hint: take a look at gym_duckietown/maps for more choices
- domain_rand: applies domain randomization, a popular, black-box sim2real technique
- randomized_maps_on_reset: slows down training, but increases training variety
Mixing and matching different values for these will help you increase your training diversity, and thereby improve your evaluation robustness!