This section describes the procedure for generating logs from gym-duckietown and then using them to train a model with imitation learning in TensorFlow. It can be used as a starting point for any of the LF, LFV, and LFI challenges.
Requires: You have made a submission with the TensorFlow template and you understand how it works.
Requires: You already know something about TensorFlow.
Result: You could win the AI-DO!
1) Clone this repo:
$ git clone -b daffy git@github.com:duckietown/challenge-aido_LF-baseline-IL-sim-tensorflow.git
2) Change into the directory:
$ cd challenge-aido_LF-baseline-IL-sim-tensorflow/learning
3) Install this package:
$ pip install -r requirements.txt # if you are in a python 3 conda env
$ sudo pip3 install -r requirements.txt # if you want to install this system-wide
and install gym-duckietown (a quick sanity check is shown after the list):
$ pip install -e git+https://github.com/duckietown/gym-duckietown.git@daffy#egg=gym-duckietown # if you are in a python 3 conda env
$ sudo pip3 install -e git+https://github.com/duckietown/gym-duckietown.git@daffy#egg=gym-duckietown # if you want to install this system-wide
4) Start training (see below)
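Once the install steps are done, a quick sanity check that gym-duckietown is importable can save you some debugging later. This is a minimal sketch, not part of the baseline; it assumes a working OpenGL setup and uses the same Simulator class and map_name/domain_rand parameters discussed later in this section.
# Minimal smoke test: confirm gym-duckietown imports and that an episode can be reset.
from gym_duckietown.simulator import Simulator

env = Simulator(map_name="loop_empty", domain_rand=False)
obs = env.reset()
print("Observation shape:", obs.shape)   # expect an (H, W, 3) camera image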
You will find that most of the new code sits inside the learning/ subdirectory. If you've been following along, or have looked at other template or baseline repositories for the AI-DO, you will find that many of the files in submission/ are the same as in other repositories in the Duckietown universe.
Here, we'll be focusing on imitation learning, that is, learning a policy from a set of expert trajectories. In the following sections, we'll cover both how to retrieve those training trajectories and how to learn a policy from them. In each section, we give hints and debugging tips on how to improve your submission beyond the baseline we've provided here.
To run and log the baseline expert, you can run:
$ python log.py
within the learning/ directory. If you have both Python 2 and Python 3 installed, use:
$ python3 log.py
Of course, if you are just interested in the baseline and seeing how this all works together, you can skip to the next section.
Most of the logging procedure is implemented in learning/log.py and learning/_loggers.py. The logging script runs a hard-coded expert on a variety of gym-duckietown maps and records the actions it takes.
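To give a feel for what that step does, here is a rough sketch of the logging loop. It is illustrative only: the PurePursuitExpert class name, its predict method, and the pickle-based log format are assumptions, so check learning/log.py and learning/_loggers.py for the actual implementation.
# Rough sketch of the logging loop (illustrative only; see learning/log.py for the real code).
import pickle

from gym_duckietown.simulator import Simulator
from teacher import PurePursuitExpert   # class name assumed; the expert lives in learning/teacher.py

EPISODES, STEPS = 10, 512               # the two knobs discussed below

env = Simulator(map_name="loop_empty", domain_rand=False)
expert = PurePursuitExpert(env=env)     # constructor signature assumed

samples = []
for episode in range(EPISODES):
    obs = env.reset()
    for step in range(STEPS):
        action = expert.predict(obs)    # the expert reads the simulator's internal state
        obs, reward, done, info = env.step(action)
        samples.append((obs, action))   # store (image, action) pairs for behavior cloning
        if done:
            break

with open("train.log", "wb") as f:      # the file consumed by train.py later on
    pickle.dump(samples, f)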
To improve on our baseline, you will need to focus on two crucial aspects:
The performance of the pure pursuit controller implemented in teacher.py can definitely be improved upon. Even though it uses the internal state of the environment to compute the appropriate action, several of its parameters need to be fine-tuned. We have prepared some basic debugging capabilities (the DEBUG=False flag in the code) to help you debug this implementation.
In any case, feel free to provide an expert implementation of your own creation.
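For reference, the core of a pure pursuit controller on top of the simulator can be as short as the sketch below. This is not the teacher.py code: the constants and the use of the Simulator's closest_curve_point, cur_pos, and cur_angle are assumptions meant to highlight which parameters (follow distance, steering gain, speed) are worth tuning.
import numpy as np

# Simplified pure-pursuit step (illustrative, not the actual teacher.py implementation).
FOLLOW_DIST = 0.25   # lookahead distance along the lane (assumed value)
P_GAIN = 10.0        # steering proportional gain (assumed value)
VELOCITY = 0.5       # constant forward speed (assumed value)

def pure_pursuit_action(env):
    """Compute a (velocity, steering) action from the simulator's internal state."""
    # Nearest point and tangent on the lane's center curve (ground truth, not the camera image)
    closest_point, closest_tangent = env.closest_curve_point(env.cur_pos, env.cur_angle)

    # Aim at a point FOLLOW_DIST further along the lane
    follow_point = closest_point + closest_tangent * FOLLOW_DIST
    curve_point, _ = env.closest_curve_point(follow_point, env.cur_angle)

    # Steer proportionally to how far the target point sits to our right
    point_vec = curve_point - env.cur_pos
    point_vec /= np.linalg.norm(point_vec)
    right_vec = np.array([np.sin(env.cur_angle), 0.0, np.cos(env.cur_angle)])
    steering = P_GAIN * -np.dot(right_vec, point_vec)

    return np.array([VELOCITY, steering])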
Another important aspect you need to take care of is the number of samples. The number of samples logged is controlled by the EPISODES and STEPS parameters; the bigger these numbers, the more samples you get.
As with all Deep Learning methods, the amount of data is crucial, but so is the variety of the samples we see. Remember, we are estimating a policy, so the better we capture the underlying distribution of the data, the more robust our policy will be.
In imitation learning, given the expert trajectories (which we recorded in the previous section), we want to learn a policy that imitates those trajectories. To do this, we will use a naive method called behavior cloning (BC) - BC has many issues, but we will leave it up to you to figure out what those are!
The output of the logging procedure is a file that we called train.log, but you can rename it as you see fit.
We have prepared a very simple Reader class in learning/_loggers.py that is capable of reading the logs we stored in the previous step.
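A hypothetical use of that class might look like the following; the read() method and its return value are assumptions made for illustration, so check learning/_loggers.py for the exact API.
# Hypothetical usage of the Reader helper (see learning/_loggers.py for the real API).
from _loggers import Reader

reader = Reader("train.log")
observations, actions = reader.read()   # assumed to return parallel sequences of images and actions
print(len(observations), "training samples loaded")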
To run the baseline training procedure, run:
$ python train.py
in the learning/ directory. If you have both Python 2 and Python 3 installed, use:
$ python3 train.py
The training procedure implemented in learning/train.py is relatively simple compared to many of today's state-of-the-art imitation learning systems. The baseline CNN is a single-residual-module network (ResNet-1) trained to regress the velocity and the steering angle of our simulated Duckiebot.
All the TensorFlow boilerplate code is encapsulated in the TensorflowModel class implemented in learning/model.py. You may find this abstraction quite useful, as it already handles model initialization and persistence for you.
To summarize the code in train.py: we train the model for a number of EPOCHS, using BATCH_SIZE samples at each step to regress the steering and velocity from each input image. The model is saved every 10 episodes of training.
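Putting those pieces together, the training loop has roughly the shape sketched below. The constructor arguments and the train/commit method names on TensorflowModel are assumptions made for illustration; the real class in learning/model.py is the source of truth.
# Rough outline of the behavior-cloning training loop (names and signatures are assumptions).
import numpy as np

from _loggers import Reader
from model import TensorflowModel

EPOCHS = 100        # assumed values for illustration
BATCH_SIZE = 32

reader = Reader("train.log")
observations, actions = reader.read()
observations, actions = np.array(observations), np.array(actions)

model = TensorflowModel(
    observation_shape=observations.shape[1:],   # constructor arguments are assumptions
    action_shape=actions.shape[1:],
    graph_location="trained_models/",
)

for epoch in range(EPOCHS):
    # Sample a random mini-batch of (image, action) pairs and take one optimization step
    idx = np.random.randint(0, len(observations), BATCH_SIZE)
    loss = model.train(observations[idx], actions[idx])     # method name assumed
    if epoch % 10 == 0:
        model.commit()                                       # assumed checkpoint/persistence call
        print("epoch", epoch, "loss", loss)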
Of course, this can be improved upon in many ways, on both the teacher / expert side, as well as the student policy side.
You will need to copy the relevant files from the learning/ directory to the submission/ one. In particular, you will need to overwrite submission/model.py to match any updates you've made to the model, and place your final model inside submission/tf_models/ so you can load it correctly. Then, you are ready to evaluate!
Either evaluate locally with:
$ dts challenges evaluate
or make an official submission with:
$ dts challenges submit
A simple evaluation script, eval.py, is provided with this implementation. It loads the latest model from the trained_models directory and runs it on the simulator. Although it is not an official metric for the challenge, you can use the cumulative reward emitted by the gym to evaluate the performance of your latest model.
With the current implementation and hyper-parameter selection, we get something like:
total reward: -5646.22255589, mean reward: -565.0
which is not, by any standard, good performance.
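For intuition, that cumulative reward is just the sum of per-step rewards over a rollout, along the lines of the sketch below. The policy argument stands in for your trained model (for example, a wrapper around its prediction call); eval.py remains the reference implementation.
# Sketch of an evaluation rollout (illustrative; see eval.py for the actual script).
from gym_duckietown.simulator import Simulator

def evaluate(policy, map_name="loop_empty", max_steps=500):
    """Run one episode and return the cumulative reward reported by the gym."""
    env = Simulator(map_name=map_name, domain_rand=False)
    obs = env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        action = policy(obs)              # e.g. your model's prediction for this camera image
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward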
Doing great on the simulated challenges, but not on the real evaluation? Or doing great in your training, but not on our simulated, held-out environments? Take a look at env.py. You'll notice that we launch the Simulator class from gym-duckietown. If you take a look at its constructor, you'll notice that we aren't using all of the parameters listed. In particular, the three you should focus on are:
map_name: which map to use; hint: take a look at gym_duckietown/maps for more choices
domain_rand: applies domain randomization, a popular, black-box sim2real technique
randomized_maps_on_reset: slows training time, but increases training variety
Mixing and matching different values for these will help you increase your training diversity, and thereby improve your evaluation robustness!
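A sketch of how you might combine these options when constructing the simulator in env.py is shown below; the map-randomization flag is left as a comment because its exact keyword name should be checked against the Simulator constructor in your version of gym-duckietown.
# Illustrative way to combine the options above when building the training environment.
from gym_duckietown.simulator import Simulator

env = Simulator(
    map_name="udem1",     # see gym_duckietown/maps for more choices
    domain_rand=True,     # black-box sim2real: randomized textures, lighting, dynamics, ...
    # The map-randomization-on-reset flag described above can be enabled here as well;
    # check its exact keyword name in the Simulator constructor before using it.
)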
If you're interested in more advanced techniques, like learning a representation that is easier for your network to work with, or one that transfers better across the simulation-to-reality gap, there are some alternative methods you may want to try out. In addition, don't forget to try the logs infrastructure, which you can also use for things like imitation learning!