Maintainer: Linus Lingg
This section of the book focuses on the concept of behaviour benchmarking in Duckietown. It starts with a brief introduction to the general architecture of the Behaviour Benchmarking, explaining its most important parts and giving some general information. The presentation as well as the repository corresponding to the Behaviour Benchmarking are also available.
Below you can see the general architecture that was set up for the Behaviour Benchmarking. The nodes marked in green, as well as the overall structure of the architecture, are the same no matter which behaviour is benchmarked. The other nodes might vary slightly from behaviour to behaviour.

First of all, the behaviour defines the experiment definition as well as the environment definition. Based on that, the hardware setup is explained and can be carried out. This is followed by a hardware check, which assures that everything is ready to compute results that can then be compared fairly with other Benchmarks of the same behaviour. Furthermore, the software setup is explained and can be prepared.

With all that done, everything is ready to run the actual experiments, during which all the needed data is recorded. It is necessary to run at least two experiments in order to check whether the collected data is reliable. Reliable means that there was no falsified measurement (for example by the localization system) and that "lucky" runs can be excluded for sure. The user is therefore asked to run at least two experiments in the beginning; based on the repetition criteria it is then concluded whether this data is reliable or whether the user needs to run another experiment.

Collecting all the needed data means three things. First, the diagnostic toolbox is running to record information about the overall CPU usage, memory usage, etc., as well as the CPU and memory usage of all the different containers that are running and even of each node. Second, a bag is recorded directly on the Duckiebot which contains all the necessary data published directly by the Duckiebot, such as its estimations, the latency, the update frequency of all the different topics, etc. Last but not least, a bag is recorded that contains everything from the localization system (the same procedure as the offline localization); this is taken as the ground truth. The localization system measures the position of the center of the AprilTag placed on top of the localization standoff on the Duckiebot. So whenever the position or the speed of the Duckiebot (for example the time needed per tile) is mentioned, it always refers to the center of the AprilTag on top of the Duckiebot. This means that these results require that this AprilTag has been mounted very precisely.
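As an illustration of the bag recorded on the Duckiebot, the following is a minimal sketch of how such a recording could be started. The hostname, the topic names and the duration are only placeholders; the actual list of topics depends on the behaviour that is benchmarked.

```python
# Minimal sketch: record a bag with a subset of topics directly on the Duckiebot.
# The hostname and topic names below are placeholders, not the prescribed list.
import subprocess

DUCKIEBOT = "autobot01"  # hypothetical hostname
TOPICS = [
    "/{}/camera_node/camera_info".format(DUCKIEBOT),
    "/{}/wheels_driver_node/wheels_cmd".format(DUCKIEBOT),
    "/{}/lane_filter_node/lane_pose".format(DUCKIEBOT),
]

# 'rosbag record' writes the selected topics into one bag file;
# '--duration' stops the recording automatically after the experiment time.
subprocess.check_call(
    ["rosbag", "record", "-O", "benchmark_run.bag", "--duration=120"] + TOPICS
)
```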
The recorded bags are then processed:

- The one recorded on the Duckiebot is processed by the container analyze_rosbag, which extracts the information about latency and update frequency of the nodes, as well as the needed information published by the Duckiebot, and saves it into json files (a small sketch of this kind of processing is shown after this list).
- The one from the localization system is processed as explained for the classical offline localization system, to extract the ground truth information about the whereabouts of the Duckiebot. The original post-processor just processes the AprilTags and odometry from the bag and writes them into a processed bag. However, there exists a modified version which extracts the relative pose estimation of the Duckiebot and writes it directly into a .yaml file (more details are given below). The graph-optimizer then extracts the ground truth trajectories of all the Duckiebots that were in sight and stores them in a yaml file.
- The data collected by the diagnostic toolbox does not need any processing and only needs to be downloaded from this link.
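To give an idea of the kind of post-processing analyze_rosbag performs, the following sketch estimates the mean update frequency of each topic in a recorded bag and writes the result to a json file. It assumes the rosbag Python API is available; the file names are examples.

```python
# Sketch: estimate the mean update frequency of every topic in a bag
# and save the result as json (similar in spirit to what analyze_rosbag does).
import json
import rosbag

bag = rosbag.Bag("benchmark_run.bag")
stamps = {}
for topic, _, t in bag.read_messages():
    # collect the arrival time (in seconds) of every message per topic
    stamps.setdefault(topic, []).append(t.to_sec())
bag.close()

frequencies = {}
for topic, times in stamps.items():
    if len(times) > 1:
        # mean rate = number of intervals / elapsed time
        frequencies[topic] = (len(times) - 1) / (times[-1] - times[0])

with open("update_frequencies.json", "w") as f:
    json.dump(frequencies, f, indent=2)
```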
The processed data is then analyzed in the following notebooks:

1. 95-Trajectory-statistics.ipynb: This extracts all the information measured by the localization system and calculates everything related to the poses of the Duckiebots.
2. 97-compare_calc_mean_benchmarks.ipynb: This notebook analyzes whether the measurements are stable enough to actually be compared to other results. The repetition criteria is checked, and if the standard deviation is too high the user is told to run another experiment to collect more data. As soon as the standard deviation is low enough, the mean of all the measurements is calculated and stored into a yaml file that can then be compared to other results (a sketch of this check is shown after this list).
3. 96-compare_2_benchmarks.ipynb: This notebook serves to do the actual comparison between two Benchmark results, or to do an analysis of your Benchmark if there is no other one to compare to. It analyzes the performance based on many different properties and scores it. At the very end, a final report can be extracted which summarizes the entire Benchmark.
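The repetition check performed in 97-compare_calc_mean_benchmarks.ipynb can be sketched as follows; the metric names, the example values and the threshold are purely illustrative and not the ones used in the notebook.

```python
# Sketch of the repetition criteria: if the relative standard deviation of a
# metric across the runs recorded so far is too high, ask for another
# experiment; otherwise store the mean so it can be compared to other results.
import yaml
from statistics import mean, stdev

runs = [  # hypothetical per-run measurements
    {"time_per_tile": 2.9, "lane_deviation": 0.041},
    {"time_per_tile": 3.1, "lane_deviation": 0.038},
]
MAX_REL_STD = 0.10  # hypothetical threshold: 10 % relative standard deviation

result = {}
for metric in runs[0]:
    values = [run[metric] for run in runs]
    if stdev(values) / mean(values) > MAX_REL_STD:
        print("{}: not stable yet, please run another experiment".format(metric))
    else:
        result[metric] = mean(values)

with open("benchmark_mean.yaml", "w") as f:
    yaml.safe_dump(result, f)
```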
So let me specify a couple of things:
Please note that the analysis is also possible if not all the data is collected, for example if the user does not record a bag on the Duckiebot or if the Diagnostic Toolbox was not running. At the moment the data recorded from the localization system is necessary to get an actual scoring in the end; however, this can easily be adapted.
All the packages created for the Behaviour Benchmarking can be found here. This repository includes the following new contributions:
Packages:
Notebooks:
Please note that for each notebook there exists an example notebook that shows the results computed when the notebook is run with actual data.
The goal of this Behaviour Benchmarking is that for each behaviour and each major version (e.g. Master19, Daffy, Ente, etc.) there is a huge data set out of which the mean of all the measurements is calculated to build a stable and very reliable reference Benchmark. A developer can then compare their contribution to the results achieved by the global Duckietown performance.
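A comparison against such a reference Benchmark could, for instance, look like the sketch below; the file names and the flat metric layout of the yaml files are assumptions made for illustration.

```python
# Sketch: compare your own benchmark result against the reference Benchmark
# built from the global data set, printing the relative change per metric.
import yaml

with open("reference_benchmark.yaml") as f:
    reference = yaml.safe_load(f)
with open("my_benchmark.yaml") as f:
    mine = yaml.safe_load(f)

for metric, ref_value in reference.items():
    if metric in mine and ref_value:
        change = 100.0 * (mine[metric] - ref_value) / ref_value
        print("{}: {:+.1f} % relative to the reference".format(metric, change))
```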
At the moment the user is asked to upload their recorded bags to this link. However, this link should be changed as soon as possible, as the storage is limited. The data storage should be automated anyway.
To design a Behaviour Benchmark in the future, all the needed packages and notebooks can be found in this repository; simply (fork and) clone it and then add your contribution. The readme file found in the repository corresponds to the setup and preparation of the lane following Benchmark. Just copy it and adapt the sections that need to be changed. The most important sections that need to be adapted are the actual environment and experiment setup and the topics one has to subscribe to for the bag recorded on the Duckiebot.