This section describes the procedure for training and testing an agent with the gym-duckietown simulator using the Dagger algorithm.
It can be used as a starting point for any of the LF, LFV, and LFI challenges.
It assumes you are somewhat familiar with PyTorch and the PyTorch template. If all goes well, you could win the AI-DO!
We saw a first implementation of imitation learning in the behaviour cloning baseline. That baseline models the driving task as an end-to-end supervised learning problem in which data can be collected offline from an expert. One of the central issues with this approach is distributional shift. Since this is a sequential decision-making problem, the training data are not independently and identically distributed: if your agent deviates from the optimal trajectory demonstrated by the expert, there is no data in its dataset showing it how to recover back to that trajectory. As a result, the behaviour cloning approach is unlikely to be robust.
To improve on behaviour cloning, this second version of imitation learning does not train only on a single trajectory given by the expert. We follow the Dataset Aggregation algorithm (DAgger), in which the agent also interacts with the environment and the expert is allowed to demonstrate how to recover. At each step, the action is chosen randomly between the expert and the learner with a varying probability, with the hope that the expert corrects the learner when it starts deviating from the optimal trajectory.
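The control-mixing idea can be summarized with the following minimal sketch (this is not the code used in the baseline; env, expert, and learner are placeholder objects with a gym-style interface):

import random

def dagger(env, expert, learner, episodes=10, horizon=64, alpha=0.8, decay=0.95):
    # Toy DAgger loop: control alternates between expert and learner,
    # but every observation is labelled with the expert's action.
    dataset = []
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(horizon):
            expert_action = expert.predict(obs)
            dataset.append((obs, expert_action))
            # With probability alpha the expert drives, otherwise the learner does.
            action = expert_action if random.random() < alpha else learner.predict(obs)
            obs, _, done, _ = env.step(action)
            if done:
                break
        # Retrain on the aggregated dataset, then hand more control to the learner.
        learner.fit(dataset)
        alpha *= decay
    return learner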
Clone this repo:
$ git clone https://github.com/duckietown/challenge-aido_LF-baseline-dagger-pytorch.git
Change into the directory:
$ cd challenge-aido_LF-baseline-dagger-pytorch
In here you will see two directories: submission and learning. To make a submission, enter the submission folder:
$ cd submission
Then test the submission, either locally with:
$ dts challenges evaluate --challenge CHALLENGE_NAME
or make an official submission when you are ready with
$ dts challenges submit CHALLENGE_NAME
You can find the list of challenges here. Make sure that it is marked as “Open”.
The previous submission used a model which is included in the repo, but you should try to improve upon it.
We provide a Colab notebook that you can use to get started.
During training, the loss curve for each episode is logged (by default to a folder created at the project root called iil_baseline) and can be inspected with tensorboard by pointing --logdir at that folder. The same folder also contains data.dat and target.dat, which are the memory maps used by the dataset.
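For example, assuming the default folder at the project root, the logs can be viewed with:
$ tensorboard --logdir iil_baseline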
Start by cloning the gym-duckietown simulator repo:
$ git clone https://github.com/duckietown/gym-duckietown.git
Change into the directory:
$ cd gym-duckietown
Install the package:
$ pip3 install -e .
To run the baseline training procedure, run:
$ python -m learning.train
in the root directory.
There are several optional flags that may be used to modify the hyperparameters of the algorithm (an example invocation is given below):

- --episode or -i: an integer specifying the number of episodes to train the agent, defaults to 10.
- --horizon or -r: an integer specifying the length of the horizon in each episode, defaults to 64.
- --learning-rate or -l: an integer specifying the index into the list [1e-1, 1e-2, 1e-3, 1e-4, 1e-5] used to select the learning rate, defaults to 2.
- --decay or -d: an integer specifying the index into the list [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95] used to select the initial probability of choosing the teacher over the learner.
- --save-path or -s: a string specifying the path where to save the trained model; the model is overwritten so that only the latest episode is kept; defaults to a file named iil_baseline.pt at the project root.
- --map-name or -m: a string specifying which map to use for training, defaults to loop_empty.
- --num-outputs: an integer specifying the number of outputs the model will have; can be reduced to train only the angular speed; defaults to 2 for both linear and angular speed.
- --domain-rand or -dr: a flag to enable domain randomization, to help transfer from simulation to the real world.
- --randomize-map or -rm: a flag to randomize the training map on reset.

The baseline model is based on the Dronet model. The feature extractor of the model is frozen while the classifier is modified for the regression task.
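As an illustration of the flags above (the values here are arbitrary; --learning-rate 2 selects 1e-3 and --decay 3 selects 0.8 from the respective lists), a longer training run could be launched with:
$ python -m learning.train --episode 50 --horizon 128 --learning-rate 2 --decay 3 --map-name loop_empty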
All the PyTorch boilerplate code is encapsulated in the NeuralNetworkPolicy class implemented in learning/imitation/iil-dagger/learner/neural_network_policy.py, and is based on previous work done by Manfred Díaz in TensorFlow.
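As a rough illustration of that pattern (this is not the actual Dronet or NeuralNetworkPolicy code; the ResNet-18 backbone below is only a stand-in), freezing a pretrained feature extractor and replacing the classifier with a two-output regression head could look like this in PyTorch:

import torch
import torch.nn as nn
from torchvision import models

# Stand-in backbone; the baseline uses a Dronet-style network instead.
model = models.resnet18(pretrained=True)

# Freeze the feature extractor so its weights are not updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier with a regression head for linear and angular speed.
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)

# Only the new head's parameters are optimized.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)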
A simple testing script test.py is provided with this implementation. It loads the latest model from the provided directory and runs it in the simulator. To test the model:
$ python -m learning.test --model-path path
The model path flag has to be provided for the script to load the model:

- --model-path or -mp: a string specifying the path to the saved model to be used in testing.

Other optional flags that may be used are (an example invocation is given below):

- --episode or -i: an integer specifying the number of episodes to test the agent, defaults to 10.
- --horizon or -r: an integer specifying the length of the horizon in each episode, defaults to 64.
- --save-path or -s: a string specifying the path where to save the trained model; the model is overwritten so that only the latest episode is kept; defaults to a file named iil_baseline.pt at the project root.
- --num-outputs: an integer specifying the number of outputs the model has, defaults to 2.
- --map-name or -m: a string specifying which map to use, defaults to loop_empty.
The following video shows the results of training the agent for 130 episodes, keeping the rest of the configuration at its defaults:
Some ideas on how to improve on the provided baseline:
@phdthesis{diaz2018interactive,
title={Interactive and Uncertainty-aware Imitation Learning: Theory and Applications},
author={Diaz Cabrera, Manfred Ramon},
year={2018},
school={Concordia University}
}
@inproceedings{ross2011reduction,
title={A reduction of imitation learning and structured prediction to no-regret online learning},
author={Ross, St{\'e}phane and Gordon, Geoffrey and Bagnell, Drew},
booktitle={Proceedings of the fourteenth international conference on artificial intelligence and statistics},
pages={627--635},
year={2011}
}
@article{loquercio2018dronet,
title={Dronet: Learning to fly by driving},
author={Loquercio, Antonio and Maqueda, Ana I and Del-Blanco, Carlos R and Scaramuzza, Davide},
journal={IEEE Robotics and Automation Letters},
volume={3},
number={2},
pages={1088--1095},
year={2018},
publisher={IEEE}
}