Publications based on BeamNG.tech

Citation

Any use of BeamNG.tech in non-commercial academic studies should be properly cited in articles, conference papers, presentations, and other material about the research project. Please follow the citation format required by your institution or publication, using the information below:

Title: BeamNG.tech
Author: BeamNG GmbH
Address: Bremen, Germany
Year: 2022
Version: 0.25.0.0
URL: https://www.beamng.tech/

Send your completed form, including your research paper(s), to tech@beamng.gmbh with the following subject line: Consent to publish research paper(s) using BeamNG.tech.

@software{beamng_tech,
    title = "{B}eam{NG}.tech",
    author = {{BeamNG GmbH}},
    url = {https://www.beamng.tech/},
    version = {0.25.0.0},
    date = {2022-06-15},
}
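
A minimal sketch of citing the entry above from a LaTeX document. It assumes biblatex with the biber backend, since the `@software` type and the `version` and `date` fields are biblatex conventions, and it assumes the entry is saved in a file named `references.bib`, which is a hypothetical name chosen for illustration:

```latex
\documentclass{article}
\usepackage[backend=biber]{biblatex}
% references.bib is a hypothetical file containing the @software entry above
\addbibresource{references.bib}

\begin{document}
The experiments were run in BeamNG.tech~\cite{beamng_tech}.

\printbibliography
\end{document}
```

With classic BibTeX styles the `@software` type may not be recognised, so the entry may need to be adapted to the style your institution or publication requires.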

Below you will find a collection of bachelor's and master's theses, research papers, conference papers, and case studies that use BeamNG.tech.

2024
Stefan Klikovits, Alessio Gambi, Deepak Dhungana and Rick Rabiser Johannes Kepler University, Austria, IMC Krems University of Applied Science, Austria Association for Computing Machinery

Abstract

Extensive testing of Automated Driving Systems (ADS), such as Advanced Driver Assistance Systems and Autonomous Vehicles, is commonly conducted using simulators programmed to implement various driving scenarios, a technique known as scenario-based testing. ADS scenario-based testing using simulations is challenging because it requires identifying scenarios that can effectively test ADS functionalities while ensuring that driving simulators’ features match the driving scenarios’ requirements. This short paper discusses the main challenges of systematically conducting simulation-based testing and proposes leveraging Software Product Line techniques to address them. Specifically, we argue that variability models can be used to support testers in generating test scenarios by effectively capturing and relating the variability in driving simulators, testing scenarios, and ADS implementations. We conclude by outlining an agenda for future research in this important area.

Christian Birchler, Cyrill Rohrbach, Timo Kehrer and Sebastiano Panichella University of Bern, Switzerland, Zurich University of Applied Sciences, Switzerland arXiv

Abstract

Developing tools in the context of autonomous systems [22, 24], such as self-driving cars (SDCs), is time-consuming and costly since researchers and practitioners rely on expensive computing hardware and simulation software. We propose SensoDat, a dataset of 32,580 executed simulation-based SDC test cases generated with state-of-the-art test generators for SDCs. The dataset consists of trajectory logs and a variety of sensor data from the SDCs (e.g., rpm, wheel speed, brake thermals, transmission, etc.) represented as a time series. In total, SensoDat provides data from 81 different simulated sensors. Future research in the domain of SDCs does not necessarily depend on executing expensive test cases when using SensoDat. Furthermore, with the high amount and variety of sensor data, we think SensoDat can contribute to research, particularly for AI development, regression testing techniques for simulation-based SDC testing, flakiness in simulation, etc.

Christian Birchler, Tanzil Kombarabettu Mohammed, Pooja Rani, Teodora Nechita, Timo Kehrer and Sebastiano Panichella Zurich University of Applied Sciences & University of Bern, Switzerland, University of Zurich, Switzerland arXiv

Abstract

Software metrics such as coverage and mutation scores have been extensively explored for the automated quality assessment of test suites. While traditional tools rely on such quantifiable software metrics, the field of self-driving cars (SDCs) has primarily focused on simulation-based test case generation using quality metrics such as the out-of-bound (OOB) parameter to determine if a test case fails or passes. However, it remains unclear to what extent this quality metric aligns with the human perception of the safety and realism of SDCs, which are critical aspects in assessing SDC behavior. To address this gap, we conducted an empirical study involving 50 participants to investigate the factors that determine how humans perceive SDC test cases as safe, unsafe, realistic, or unrealistic. To this aim, we developed a framework leveraging virtual reality (VR) technologies, called SDC-Alabaster, to immerse the study participants into the virtual environment of SDC simulators. Our findings indicate that the human assessment of the safety and realism of failing and passing test cases can vary based on different factors, such as the test’s complexity and the possibility of interacting with the SDC. Especially for the assessment of realism, the participants’ age as a confounding factor leads to a different perception. This study highlights the need for more research on SDC simulation testing quality metrics and the importance of human perception in evaluating SDC behavior.

Amini, M.H., Naseri, S. and Nejati, S. University of Ottawa, Canada Springer Link

Abstract

Simulators are widely used to test Autonomous Driving Systems (ADS), but their potential flakiness can lead to inconsistent test results. We investigate test flakiness in simulation-based testing of ADS by addressing two key questions: (1) How do flaky ADS simulations impact automated testing that relies on randomized algorithms? and (2) Can machine learning (ML) effectively identify flaky ADS tests while decreasing the required number of test reruns? Our empirical results, obtained from two widely-used open-source ADS simulators and five diverse ADS test setups, show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms. Further, our ML classifiers effectively identify flaky ADS tests using only a single test run, achieving F1-scores of 85%, 82% and 96% for three different ADS test setups. Our classifiers significantly outperform our non-ML baseline, which requires executing tests at least twice, by 31%, 21%, and 13% in F1-score performance, respectively. We conclude with a discussion on the scope, implications and limitations of our study. We provide our complete replication package in a Github repository (Github paper 2023).

Abstract

Deep Neural Networks (DNNs) for Autonomous Driving Systems (ADS) are typically trained on real-world images and tested using synthetic simulator images. This approach results in training and test datasets with dissimilar distributions, which can potentially lead to erroneously decreased test accuracy. To address this issue, the literature suggests applying domain-to-domain translators to test datasets to bring them closer to the training datasets. However, translating images used for testing may unpredictably affect the reliability, effectiveness and efficiency of the testing process. Hence, this paper investigates the following questions in the context of ADS: Could translators reduce the effectiveness of images used for ADS-DNN testing and their ability to reveal faults in ADS-DNNs? Can translators result in excessive time overhead during simulation-based testing? To address these questions, we consider three domain-to-domain translators: CycleGAN and neural style transfer, from the literature, and SAEVAE, our proposed translator. Our results for two critical ADS tasks – lane keeping and object detection – indicate that translators significantly narrow the gap in ADS test accuracy caused by distribution dissimilarities between training and test data, with SAEVAE outperforming the other two translators. We show that, based on the recent diversity, coverage, and fault-revealing ability metrics for testing deep-learning systems, translators do not compromise the diversity and the coverage of test data, nor do they lead to revealing fewer faults in ADS-DNNs. Further, among the translators considered, SAEVAE incurs a negligible overhead in simulation time and can be efficiently integrated into simulation-based testing. Finally, we show that translators increase the correlation between offline and simulation-based testing results, which can help reduce the cost of simulation-based testing.

Paolo Arcaini and Ahmet Cetinkaya National Institute of Informatics, Tokyo, Japan & Shibaura Institute of Technology, Tokyo, Japan ELSEVIER

Abstract

Simulation-based testing of autonomous driving systems (ADS) consists in finding scenarios in which the ADS misbehaves, e.g., it leads the car to drive off the road. The road geometry is an important feature of the scenario, as it has a direct impact on the ADS, e.g., its ability to keep the car inside the driving lane. In this paper, we present CRAG, a road generator for ADS testing. CRAG uses combinatorial testing to explore high level road configurations, and search for finding concrete road geometries in these configurations. CRAG has been designed in a way that it can be easily extended in terms of generator of combinatorial test suites, search algorithms, and test goals.

Moghadam, Mahshid Helali and Borg, Markus and Saadatmand, Mehrdad and Mousavirad, Seyed Jalaleddin and Bohlin, Markus and Lisper, Björn Smart Industrial Automation, RISE Research Institutes of Sweden, Humanized Autonomy, RISE Research Institutes of Sweden, Universidade da Beira Interior, Portugal & School of Innovation, Design and Engineering, Mälardalen University, Sweden WILEY Online Library

Abstract

This paper presents an extended version of Deeper, a search-based simulation-integrated test solution that generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system. In the newly proposed version, we utilize a new set of bio-inspired search algorithms, genetic algorithm (GA), (μ+λ) and (μ,λ) evolution strategies (ES), and particle swarm optimization (PSO), that leverage a quality population seed and domain-specific crossover and mutation operations tailored for the presentation model used for modeling the test scenarios. In order to demonstrate the capabilities of the new test generators within Deeper, we carry out an empirical evaluation and comparison with regard to the results of five participating tools in the cyber-physical systems testing competition at SBST 2021. Our evaluation shows the newly proposed test generators in Deeper not only represent a considerable improvement on the previous version but also prove to be effective and efficient in provoking a considerable number of diverse failure-revealing test scenarios for testing an ML-driven lane-keeping system. They can trigger several failures while promoting test scenario diversity, under a limited test time budget, high target failure severity, and strict speed limit constraints.

Arya, Bhavinkumar and Yao, Jianchun and Fard, Mohammad and Davy, John Laurence Royal Melbourne Institute of Technology (RMIT University) SSRN

Abstract

Current methods for evaluating the crashworthiness of safety barriers, such as physical crash tests and a finite element analysis, often face limitations in terms of cost and computational time. This paper investigates the innovative application of soft body physics simulation as an effective method for evaluating the crashworthiness of tyre barriers in motorsports. A bottom-up approach is implemented to create a mass-spring model of a tyre barrier system. The model was validated against a physical crash test, demonstrating its accuracy in predicting peak deceleration during impact. The research highlights the advantages of soft body physics simulations, including real-time simulation capabilities and the ability to simulate large-scale structures like a tyre barrier. These benefits offer significant cost and time savings compared to traditional methods. The study concludes that soft body physics simulation holds great promise for evaluating the crashworthiness of safety barriers in motorsports. Further research in this area is warranted to explore the full potential of this innovative approach.

Anastasia Norenko and Alexander Franco Edlund Umeå University, Faculty of Social Sciences, Department of Psychology, Sweden DiVA

Abstract

Trust is important for the adoption of autonomous vehicles. Providing voiced explanations explaining a vehicle’s behavior has been found to improve trust, but it is unclear how explanations should be presented. In this experimental study, it was investigated whether manipulating the temporal sequence of “How” explanations, that describe what the vehicle does, and “Why” explanations, that describe why the vehicle does something, influence users’ trust, mental workload, situational awareness, and preferences. The research questions are: 1. What is the optimal sequence of presenting explanations, measured by the least amount of mental workload, increased trust, and increased situational awareness? 2. Which explanation type do people prefer the most? 3. Does explanation type affect attentional task performance? These questions were assessed by using questionnaires and qualitative data in a within group design.

Tahereh Zohdinasab Università della Svizzera Italiana, Lugano, Switzerland sonar

Abstract

Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in safety-critical domains. Researchers have proposed several input generation techniques for DL systems. While such techniques can expose failures, they do not explain which features of the test inputs influenced the system’s misbehaviour. This research delves into diverse methodologies aimed at overcoming challenges inherent in testing DL systems, with a particular focus on generating targeted test cases and interpreting system behaviours. To this aim, we proposed three novel testing approaches for DL systems, i.e., DEEPHYPERION-CS, DEEPATASH, and DEEPTHEIA.

Tahereh Zohdinasab, Vincenzo Riccio and Paolo Tonella Università della Svizzera Italiana, Lugano, Switzerland, University of Udine, Udine, Italy ACM

Abstract

Testing Autonomous Driving Systems (ADSs) is crucial to ensure their reliability when navigating complex environments. ADSs may exhibit unexpected behaviours when presented, during operation, with driving scenarios containing features inadequately represented in the training dataset. To address this shift from development to operation, developers must acquire new data with the newly observed features. This data can be then utilised to fine tune the ADS, so as to reach the desired level of reliability in performing driving tasks. However, the resource-intensive nature of testing ADSs requires efficient methodologies for generating targeted and diverse tests. In this work, we introduce a novel approach, DeepAtash-LR, that incorporates a surrogate model into the focused test generation process. This integration significantly improves focused testing effectiveness and applicability in resource-intensive scenarios. Experimental results show that the integration of the surrogate model is fundamental to the success of DeepAtash-LR. Our approach was able to generate an average of up to 60× more targeted, failure-inducing inputs compared to the baseline approach. Moreover, the inputs generated by DeepAtash-LR were useful to significantly improve the quality of the original ADS through fine tuning.

Timo Blattner and Christian Birchler and Timo Kehrer and Sebastiano Panichella University of Bern, Switzerland, Zurich University of Applied Sciences, Switzerland arXiv

Abstract

The rise of self-driving cars (SDCs) presents important safety challenges to address in dynamic environments. While field testing is essential, current methods lack diversity in assessing critical SDC scenarios. Prior research introduced simulation-based testing for SDCs, with Frenetic, a test generation approach based on Frenet space encoding, achieving a relatively high percentage of valid tests (approximately 50%) characterized by naturally smooth curves. The “minimal out-of-bound distance” is often taken as a fitness function, which we argue to be a sub-optimal metric. Instead, we show that the likelihood of leading to an out-of-bound condition can be learned by the deep-learning vanilla transformer model. We combine this “inherently learned metric” with a genetic algorithm, which has been shown to produce a high diversity of tests. To validate our approach, we conducted a large-scale empirical evaluation on a dataset comprising over 1,174 simulated test cases created to challenge the SDCs behavior. Our investigation revealed that our approach demonstrates a substantial reduction in generating non-valid test cases, increased diversity, and high accuracy in identifying safety violations during SDC test execution.

Neelofar Neelofar and Aldeida Aleti Monash University, Melbourne, Australia ACM

Abstract

Ensuring the safety of autonomous vehicles (AVs) is of utmost importance, and testing them in simulated environments is a safer option than conducting in-field operational tests. However, generating an exhaustive test suite to identify critical test scenarios is computationally expensive, as the representation of each test is complex and contains various dynamic and static features, such as the AV under test, road participants (vehicles, pedestrians, and static obstacles), environmental factors (weather and light), and the road’s structural features (lanes, turns, road speed, etc.). In this article, we present a systematic technique that uses Instance Space Analysis (ISA) to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs. ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in two dimensions. This visualisation helps to identify untested regions of the instance space and provides an indicator of the quality of the test suite in terms of the percentage of feature space covered by testing. To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe. The high precision, recall, and F1 scores indicate that our proposed approach is effective in predicting the outcome of a test scenario without executing it and can be used for test generation, selection, and prioritisation.

Abstract

This thesis focuses on developing and refining simulation tools to better model road and traffic environments, using platforms such as BeamNG.tech and Hexagon’s Virtual Test Drive (VTD). A key aspect of this work is the customization of OpenStreetMap (OSM) imports, which improves the accuracy of road geometries and elevation profiles, crucial for realistic simulation environments. The implementation of these features enables more reliable testing of ADAS and autonomous vehicles in various driving conditions, including those with challenging terrain. This research not only aims to improve the realism of these simulations but also explores the differences in performance and outcomes between standard and customized imports. Through this work, the goal is to enhance the overall utility of simulation platforms for both academic research and industry applications in autonomous vehicle development.

2023
Carl Hildebrandt, Meriel von Stein, and Sebastian Elbaum University of Virginia Charlottesville, USA ACM

Abstract

Adequately exercising the behaviors of autonomous vehicles is fundamental to their validation. However, quantifying an autonomous vehicle’s testing adequacy is challenging as the system’s behavior is influenced both by its state as well as its physical environment. To address this challenge, our work builds on two insights. First, data sensed by an autonomous vehicle provides a unique spatial signature of the physical environment inputs. Second, given the vehicle’s current state, inputs residing outside the autonomous vehicle’s physically reachable regions are less relevant to its behavior. Building on those insights, we introduce an abstraction that enables the computation of a physical environment-state coverage metric, PhysCov. The abstraction combines the sensor readings with a physical reachability analysis based on the vehicle’s state and dynamics to determine the region of the environment that may affect the autonomous vehicle. It then characterizes that region through a parameterizable geometric approximation that can trade quality for cost. Tests with the same characterizations are deemed to have had similar internal states and exposed to similar environments and thus likely to exercise the same set of behaviors, while tests with distinct characterizations will increase PhysCov. A study on two simulated and one real system’s dataset examines PhysCov’s ability to quantify an autonomous vehicle’s test suite, showcases its characterization cost and precision, investigates its correlation with failures found and potential for test selection, and assesses its ability to distinguish among real-world scenarios.

Sidharth Talia, Matt Schmittle, Alexander Lambert, Alexander Spitzer, Christoforos Mavrogiannis, Siddhartha S. Srinivasa University of Washington, USA arXiv

Abstract

Off-road vehicles are susceptible to rollovers in terrains with large elevation features, such as steep hills, ditches, and berms. One way to protect them against rollovers is ruggedization through the use of industrial-grade parts and physical modifications. However, this solution can be prohibitively expensive for academic research labs. Our key insight is that a software-based rollover-prevention system (RPS) enables the use of commercial-off-the-shelf hardware parts that are cheaper than their industrial counterparts, thus reducing overall cost. In this paper, we present HOUND, a small-scale, inexpensive, off-road autonomy platform that can handle challenging outdoor terrains at high speeds through the integration of an RPS. HOUND is integrated with a complete stack for perception and control, geared towards aggressive offroad driving. We deploy HOUND in the real world, at high speeds, on four different terrains covering 50 km of driving and highlight its utility in preventing rollovers and traversing difficult terrain. Additionally, through integration with BeamNG, a state-of-the-art driving simulator, we demonstrate a significant reduction in rollovers without compromising turning ability across a series of simulated experiments. Supplementary material can be found on our website, where we will also release all design documents for the platform.

Dmytro Humeniuk, Foutse Khomh and Giuliano Antoniol Polytechnique Montréal arXiv

Abstract

Evolutionary search-based techniques are commonly used for testing autonomous robotic systems. However, these approaches often rely on computationally expensive simulator-based models for test scenario evaluation. To improve the computational efficiency of the search-based testing, we propose augmenting the evolutionary search (ES) with a reinforcement learning (RL) agent trained using surrogate rewards derived from domain knowledge. In our approach, known as RIGAA (Reinforcement learning Informed Genetic Algorithm for Autonomous systems testing), we first train an RL agent to learn useful constraints of the problem and then use it to produce a certain part of the initial population of the search algorithm. By incorporating an RL agent into the search process, we aim to guide the algorithm towards promising regions of the search space from the start, enabling more efficient exploration of the solution space. We evaluate RIGAA on two case studies: maze generation for an autonomous ‘Ant’ robot and road topology generation for an autonomous vehicle lane keeping assist system. In both case studies, RIGAA converges faster to fitter solutions and produces a better test suite (in terms of average test scenario fitness and diversity). RIGAA also outperforms the state-of-the-art tools for vehicle lane keeping assist system testing, such as AmbieGen and Frenetic.

Jesper Winsten and Iván Porres Åbo Akademi University, Finland IEEE

Abstract

WOGAN 2023 is an online test generation tool based on Wasserstein generative adversarial networks. We show how it can be applied to the SBFT 2023 competition to generate failure-inducing roads to test the lane keeping assist system of an autonomous car. The WOGAN 2023 tool is based on a previous entry to the competition. Here, we present improvements to WOGAN in the form of a combined fitness function based on the body out of lane percentage and the distance from the center of the lane. We also use road rotations in the road generation stage for greater test diversity.

Matteo Biagiola, Andrea Stocco, Vincenzo Riccio, Paolo Tonella Università della Svizzera italiana (USI), Università degli Studi di Udine arXiv

Abstract

Simulation-based testing represents an important step to ensure the reliability of autonomous driving software. In practice, when companies rely on third-party general-purpose simulators, either for in-house or outsourced testing, the generalizability of testing results to real autonomous vehicles is at stake. In this paper, we strengthen simulation-based testing by introducing the notion of digital siblings, a novel framework in which the AV is tested on multiple general-purpose simulators, built with different technologies. First, test cases are automatically generated for each individual simulator. Then, tests are migrated between simulators, using feature maps to characterize the exercised driving conditions. Finally, the joint predicted failure probability is computed and a failure is reported only in cases of agreement among the siblings. We implemented our framework using two open-source simulators and we empirically compared it against a digital twin of a physical scaled autonomous vehicle on a large set of test cases. Our study shows that the ensemble failure predictor by the digital siblings is superior to each individual simulator at predicting the failures of the digital twin. We discuss several ways in which our framework can help researchers interested in automated testing of autonomous driving software.

Christian Birchler, Cyrill Rohrbach, Hyeongkyun Kim, Alessio Gambi, Tianhai Liu, Jens Horneber, Timo Kehrer and Sebastiano Panichella Zurich University of Applied Sciences, University of Bern, University of Zurich, IMC University of Applied Sciences Krems, aicas GmbH arXiv

Abstract

Software systems for safety-critical systems like self-driving cars (SDCs) need to be tested rigorously. Especially electronic control units (ECUs) of SDCs should be tested with realistic input data. In this context, a communication protocol called Controller Area Network (CAN) is typically used to transfer sensor data to the SDC control units. A challenge for SDC maintainers and testers is the need to manually define the CAN inputs that realistically represent the state of the SDC in the real world. To address this challenge, we developed TEASER, which is a tool that generates realistic CAN signals for SDCs obtained from sensors from state-of-the-art car simulators. We evaluated TEASER based on its integration capability into a DevOps pipeline of aicas GmbH, a company in the automotive sector. Concretely, we integrated TEASER in a Continuous Integration (CI) pipeline configured with Jenkins. The pipeline executes the test cases in simulation environments and sends the sensor data over the CAN bus to a physical CAN device, which is the test subject. Our evaluation shows the ability of TEASER to generate and execute CI test cases that expose simulation-based faults (using regression strategies); the tool produces CAN inputs that realistically represent the state of the SDC in the real world. This result is of critical importance for increasing automation and effectiveness of simulation-based CAN bus regression testing for SDC software.

Birchler, Christian and Khatiri, Sajad and Derakhshanfar, Pouria and Panichella, Sebastiano and Panichella, Annibale Zurich University of Applied Science, Delft University of Technology, Software Institute - USI, Lugano Association for Computing Machinery

Abstract

Testing with simulation environments helps to identify critical failing scenarios for self-driving cars (SDCs). Simulation-based tests are safer than in-field operational tests and allow detecting software defects before deployment. However, these tests are very expensive and are too many to be run frequently within limited time constraints. In this article, we investigate test case prioritization techniques to increase the ability to detect SDC regression faults with virtual tests earlier. Our approach, called SDC-Prioritizer, prioritizes virtual tests for SDCs according to static features of the roads we designed to be used within the driving scenarios. These features can be collected without running the tests, which means that they do not require past execution results. We introduce two evolutionary approaches to prioritize the test cases using diversity metrics (black-box heuristics) computed on these static features. These two approaches, called SO-SDC-Prioritizer and MO-SDC-Prioritizer, use single-objective and multi-objective genetic algorithms (GA), respectively, to find trade-offs between executing the less expensive tests and the most diverse test cases earlier. Our empirical study conducted in the SDC domain shows that MO-SDC-Prioritizer significantly (P-value <= 0.1e-10) improves the ability to detect safety-critical failures at the same level of execution time compared to baselines: random and greedy-based test case orderings. Besides, our study indicates that multi-objective meta-heuristics outperform single-objective approaches when prioritizing simulation-based tests for SDCs. MO-SDC-Prioritizer prioritizes test cases with a large improvement in fault detection while its overhead (up to 0.45% of the test execution cost) is negligible.

Matteo Biagiola, Stefan Klikovits, Jarkko Peltomäki and Vincenzo Riccio Università della Svizzera italiana, Johannes Kepler University Linz, Åbo Akademi University, University of Udine IEEE

Abstract

We report on the organization and results of the third edition of the Cyber-Physical Systems tool competition, held as part of the SBFT workshop. Six tools (i.e., CRAG, EvoMBT, RIGAA, RoadSign, Spirale, and WOGAN) competed with the aim of triggering failures of two autonomous driving agents. We evaluated the effectiveness of the tools in exposing failures as well as the diversity of the generated failures. This report describes our methodology, the competitors, the results, and the challenges we faced while running the competition experiments.

Jon Ayerdi, Aitor Arrieta and Miren Illarramendi Mondragon University IEEE

Abstract

RoadSign is a search-based test generation tool which aims to generate failure-revealing roads for self-driving cars. RoadSign combines a seeding approach which ensures a diverse initial population of roads with a multi-objective optimization process which aims to maximize road features which may be likely to reveal driving failures. This tool has been evaluated in the SBFT 2023 CPS tool competition, where various road-generation tools have been evaluated in a simulated environment with two different driving agents.

Domenico De Vivo and Anna Rita Fasolino Università degli Studi di Napoli Federico II IEEE
Abstract In this paper, we present Spirale, a search-based testing tool designed to generate scenarios for testing Lane-Keeping Assist Systems (LKAS). Spirale took part in the CPS (Cyber-Physical Systems) testing competition held at SBFT 2023.
Jan Heidinger, Lukas Bernhardt and Thomas Franke University of Lübeck ACM
Abstract Conducting research on user-energy interaction in automotive systems in controlled settings is challenging due to the lack of availability of low-cost driving simulation environments that enable both (1) a precise simulation of vehicle energy dynamics and (2) a high-fidelity representation of the driving environment. This Extended Abstract presents EcoSimLab, a driving simulator environment for rapid prototyping, testing and evaluation of energy interface and eco assistance design concepts as well as for comprehensive studies on eco-driving behavior and further facets of user-energy interaction. We present the system architecture based on BeamNG.tech, initial data on usability and energy model validity, and we discuss future enhancements and potential applications of EcoSimLab.
Dmytro Humeniuk, Foutse Khomh and Giuliano Antoniol Polytechnique Montréal IEEE

Abstract

Testing and verification of autonomous systems is critically important. In the context of SBFT 2023 CPS testing tool competition, we present our tool RIGAA for generating virtual roads to test an autonomous vehicle lane keeping assist system. RIGAA combines reinforcement learning as well as evolutionary search to generate test scenarios. It has achieved the second highest final score among 5 other submitted tools.

Stefan Klikovits, Ezequiel Castellano, Ahmet Cetinkaya and Paolo Arcaini Johannes Kepler University Linz, National Institute of Informatics - Tokyo, Shibaura Institute of Technology - Tokyo ELSEVIER

Abstract

Being capable of identifying significant safety shortcomings, search-based methods are a core tool for testing automated driving system (ADS) technologies. In this domain, Frenetic has proven to be a popular and very effective tool, searching and identifying diverse sets of roads that point out potentially faulty ADS behaviour. This paper presents Frenetic-lib, a Python library that captures Frenetic’s novel combination of road representation and genetic algorithm, and makes it generally available in a customisable way. Next to the capacity to integrate additional ADS simulators, Frenetic-lib further creates new research opportunities on search-based road testing, novel road representations and mutation operators.

Maksim Katerishich, Mikhail Kurenkov, Sausar Karaf, Artem Nenashev and Dzmitry Tsetserukou 2023 arXiv

Abstract

Motion planning in dynamically changing environments is one of the most complex challenges in autonomous driving. Safety is a crucial requirement, along with driving comfort and speed limits. While classical sampling-based, lattice-based, and optimization-based planning methods can generate smooth and short paths, they often do not consider the dynamics of the environment. Some techniques do consider it, but they rely on updating the environment on-the-go rather than explicitly accounting for the dynamics, which is not suitable for self-driving. To address this, we propose a novel method based on the Neural Field Optimal Motion Planner (NFOMP), which outperforms state-of-the-art approaches in terms of normalized curvature and the number of cusps. Our approach embeds previously known moving obstacles into the neural field collision model to account for the dynamics of the environment. We also introduce time profiling of the trajectory and non-linear velocity constraints by adding Lagrange multipliers to the trajectory loss function. We applied our method to solve the optimal motion planning problem in an urban environment using the BeamNG.tech driving simulator. An autonomous car drove the generated trajectories in three city scenarios while sharing the road with the obstacle vehicle. Our evaluation shows that the maximum acceleration the passenger can experience instantly is -7.5 m/s² and that 89.6% of the driving time is devoted to normal driving with accelerations below 3.5 m/s². The driving style is characterized by 46.0% and 31.4% of the driving time being devoted to the light rail transit style and the moderate driving style, respectively.

Mehwish Tahir, Yuansong Qiao, Nadia Kanwal, Brian Lee and Mamoona Naveed Asghar Technological University of the Shannon (TUS), University of Keele, University of Galway MDPI

Abstract

The purpose of smart surveillance systems for automatic detection of road traffic accidents is to quickly respond to minimize human and financial losses in smart cities. However, along with the self-evident benefits of surveillance applications, privacy protection remains crucial under any circumstances. Hence, to ensure the privacy of sensitive data, European General Data Protection Regulation (EU-GDPR) has come into force. EU-GDPR suggests data minimisation and data protection by design for data collection and storage. Therefore, for a privacy-aware surveillance system, this paper targets the identification of two areas of concern: (1) detection of road traffic events (accidents), and (2) privacy preserved video summarization for the detected events in the surveillance videos. The focus of this research is to categorise the traffic events for summarization of the video content, therefore, a state-of-the-art object detection algorithm, i.e., You Only Look Once (YOLOv5), has been employed. YOLOv5 is trained using a customised synthetic dataset of 600 annotated accident and non-accident video frames. Privacy preservation is achieved in two steps, firstly, a synthetic dataset is used for training and validation purposes, while, testing is performed on real-time data with an accuracy from 55% to 85%. Secondly, the real-time summarized videos (reduced video duration to 42.97% on average) are extracted and stored in an encrypted format to avoid un-trusted access to sensitive event-based data. Fernet, a symmetric encryption algorithm is applied to the summarized videos along with Diffie–Hellman (DH) key exchange algorithm and SHA256 hash algorithm. The encryption key is deleted immediately after the encryption process, and the decryption key is generated at the system of authorised stakeholders, which prevents the key from a man-in-the-middle (MITM) attack.
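
The abstract names a Diffie-Hellman key exchange, SHA256 and Fernet but does not say how they are combined. The following is only a minimal sketch of one plausible arrangement, not the authors' implementation: two parties agree on a shared secret, hash it with SHA-256, and encode the 32-byte digest as URL-safe base64, which is the shape of key that Fernet expects. The parameters `P` and `G` and the function names are hypothetical, and the toy prime is far too small for real use.

```python
import base64
import hashlib
import secrets

# Toy Diffie-Hellman parameters (assumption for illustration, NOT secure).
P = 23  # small prime modulus
G = 5   # generator


def dh_keypair():
    """Return a (private, public) pair for the toy group above."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_key(private, other_public):
    """Derive a 32-byte, URL-safe base64 key from the shared DH secret."""
    shared = pow(other_public, private, P)
    digest = hashlib.sha256(str(shared).encode()).digest()
    return base64.urlsafe_b64encode(digest)


# Both sides derive the same key without ever transmitting it.
camera_private, camera_public = dh_keypair()
viewer_private, viewer_public = dh_keypair()
camera_key = derive_key(camera_private, viewer_public)
viewer_key = derive_key(viewer_private, camera_public)
```

Because only the public values cross the network, the encrypting side can discard its copy of the key after use while an authorised recipient regenerates it locally, which matches the key handling described in the abstract.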

Christian Birchler, Sajad Khatiri, Bill Bosshard, Alessio Gambi and Sebastiano Panichella Zurich University of Applied Sciences, IMC University of Applied Sciences Krems Springer

Abstract

Simulation platforms facilitate the development of emerging Cyber-Physical Systems (CPS) like self-driving cars (SDC) because they are more efficient and less dangerous than field operational test cases. Despite this, thoroughly testing SDCs in simulated environments remains challenging because SDCs must be tested in a sheer amount of long-running test cases. Past results on software testing optimization have shown that not all the test cases contribute equally to establishing confidence in test subjects’ quality and reliability, and the execution of “safe and uninformative” test cases can be skipped to reduce testing effort. However, this problem is only partially addressed in the context of SDC simulation platforms. In this paper, we investigate test selection strategies to increase the cost-effectiveness of simulation-based testing in the context of SDCs. We propose an approach called SDC-Scissor (SDC coSt-effeCtIve teSt SelectOR) that leverages Machine Learning (ML) strategies to identify and skip test cases that are unlikely to detect faults in SDCs before executing them. Our evaluation shows that SDC-Scissor outperforms the baselines. With the Logistic model, we achieve an accuracy of 70%, a precision of 65%, and a recall of 80% in selecting tests leading to a fault and improved testing cost-effectiveness. Specifically, SDC-Scissor avoided the execution of 50% of unnecessary tests as well as outperformed two baseline strategies. Complementary to existing work, we also integrated SDC-Scissor into the context of an industrial organization in the automotive domain to demonstrate how it can be used in industrial settings.

Mayuresh Bhosale, Longxiang Guo, Gurcan Comert and Yunyi Jia Clemson University, Benedict College - Columbia MDPI

Abstract

Road hazards are one of the significant sources of fatalities in road accidents. The accurate estimation of road hazards can ensure safety and enhance the driving experience. Existing methods of road condition monitoring are time-consuming, expensive, inefficient, require much human effort, and need to be regularly updated. There is a need for a flexible, cost-effective, and efficient process to detect road conditions, especially road hazards. This work presents a new method to deal with road hazards using smartphones. Since most of the population drives cars with smartphones on board, we aim to leverage this to detect road hazards more flexibly, cost-effectively, and efficiently. This paper proposes a cloud-based deep-learning road hazard detection model based on a long short-term memory (LSTM) network to detect different types of road hazards from the motion data. To address the issue of large data requests for deep learning, this paper proposes to leverage both simulation data and experimental data for the learning process. To address the issue of misdetections from an individual smartphone, we propose a cloud-based fusion approach to further improve detection accuracy. The proposed approaches are validated by experimental tests, and the results demonstrate the effectiveness of road hazard detection.

Yuxi Li and Gang Hao Heilongjiang University, Key Laboratory of Information Fusion Estimation and Detection - Harbin MDPI

Abstract

Energy-optimal adaptive cruise control (EACC) is becoming increasingly popular due to its ability to save energy. Considering the negative impacts of system noise on the EACC, an improved modified model predictive control (MPC) is proposed, which combines the Sage-Husa adaptive Kalman filter (SHAKF), the cubature Kalman filter (CKF), and the back-propagation neural network (BPNN). The proposed MPC improves safety and tracking performance while further reducing energy consumption. The final simulation results show that the proposed algorithm has a stronger energy-saving capability compared to previous studies and always maintains an appropriate relative distance and relative speed to the vehicle in front, verifying the effectiveness of the proposed algorithm.

Tahereh Zohdinasab, Vincenzo Riccio, Alessio Gambi and Paolo Tonella Università della Svizzera Italiana, University of Passau ACM

Abstract

Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in safety-critical domains. Researchers have proposed several input generation techniques for DL systems. While such techniques can expose failures, they do not explain which features of the test inputs influenced the system’s (mis-) behaviour. DeepHyperion was the first test generator to overcome this limitation by exploring the DL systems’ feature space at large. In this article, we propose DeepHyperion-CS, a test generator for DL systems that enhances DeepHyperion by promoting the inputs that contributed more to feature space exploration during the previous search iterations. We performed an empirical study involving two different test subjects (i.e., a digit classifier and a lane-keeping system for self-driving cars). Our results proved that the contribution-based guidance implemented within DeepHyperion-CS outperforms state-of-the-art tools and significantly improves the efficiency and the effectiveness of DeepHyperion. DeepHyperion-CS exposed significantly more misbehaviours for five out of six feature combinations and was up to 65% more efficient than DeepHyperion in finding misbehaviour-inducing inputs and exploring the feature space. DeepHyperion-CS was useful for expanding the datasets used to train the DL systems, populating up to 200% more feature map cells than the original training set.

Christian Birchler, Nicolas Ganz, Sajad Khatiri, Alessio Gambi, Sebastiano Panichella Zurich University of Applied Science, Software Institute - USI Lugano, University of Passau ELSEVIER

Abstract

Simulation environments are essential for the continuous development of complex cyber-physical systems such as self-driving cars (SDCs). Previous results on simulation-based testing for SDCs have shown that many automatically generated tests do not strongly contribute to the identification of SDC faults, hence do not contribute towards increasing the quality of SDCs. Because running such “uninformative” tests generally leads to a waste of computational resources and a drastic increase in the testing cost of SDCs, testers should avoid them. However, identifying “uninformative” tests before running them remains an open challenge. Hence, this paper proposes SDC-Scissor, a framework that leverages Machine Learning (ML) to identify SDC tests that are unlikely to detect faults in the SDC software under test, thus enabling testers to skip their execution and drastically increase the cost-effectiveness of simulation-based testing of SDCs software. Our evaluation concerning the usage of six ML models on two large datasets characterized by 22’652 tests showed that SDC-Scissor achieved a classification F1-score up to 96%. Moreover, our results show that SDC-Scissor outperformed a randomized baseline in identifying more failing tests per time unit.
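
For readers unfamiliar with the F1-score reported above, the sketch below shows how it follows from a classifier's confusion counts. It is a generic illustration, not code from SDC-Scissor, and the function name and the example counts are hypothetical.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 failing tests correctly flagged,
# 2 passing tests wrongly flagged, 2 failing tests missed.
precision, recall, f1 = precision_recall_f1(8, 2, 2)
```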

Dmytro Humeniuk, Foutse Khomh, Giuliano Antoniol Polytechnique Montréal arXiv

Abstract

Thorough testing of safety-critical autonomous systems, such as self-driving cars, autonomous robots, and drones, is essential for detecting potential failures before deployment. One crucial testing stage is model-in-the-loop testing, where the system model is evaluated by executing various scenarios in a simulator. However, the search space of possible parameters defining these test scenarios is vast, and simulating all combinations is computationally infeasible. To address this challenge, we introduce AmbieGen, a search-based test case generation framework for autonomous systems. AmbieGen uses evolutionary search to identify the most critical scenarios for a given system, and has a modular architecture that allows for the addition of new systems under test, algorithms, and search operators. Currently, AmbieGen supports test case generation for autonomous robots and autonomous car lane keeping assist systems. In this paper, we provide a high-level overview of the framework’s architecture and demonstrate its practical use cases.

Francesco Basciani, Vittorio Cortellessa, Sergio Di Martino, Dario Di Nucci, Daniele Di Pompeo, Carmine Gravino and Luigi Libero Lucio Starace Università degli Studi dell’Aquila, Università degli Studi di Napoli Federico II, Università degli Studi di Salerno IEEE

Abstract

Advanced Driver Assistance Systems (ADAS) are becoming mandatory for novel vehicles in many nations, as they are widely recognized as a key strategy to improve road safety. Due to their safety-critical nature, ADAS must guarantee the highest safety standards. Nevertheless, the verification and validation of these systems, which are often based on Artificial Intelligence techniques, is a pain point for the automotive industry, as field evaluations are not economically and temporally viable. Furthermore, the widely used Model-in-the-Loop (MiL) validation paradigm struggles when applied to novel ADAS due to the complexity of the scenarios to simulate. A strategy recently proposed in the literature to face this issue is co-simulation, namely the cooperation of a MiL framework with one or more tools, virtually simulating the environment around the vehicle. Although existing, these solutions are still highly tailored, requiring significant manual work to define testing scenarios, also due to the lack of a solid reference framework. This paper presents a preliminary model-based framework to support the design of co-simulation test scenarios for ADAS, featuring model-based testing assertions through first-order logic formulas. The proposed framework includes a visual editor which empowers domain experts to easily design test scenarios that can be automatically executed using the state-of-the-art virtual environment simulator BeamNG. The solution presented in this paper is our first step towards defining a more comprehensive framework for testing ADAS in co-simulation, providing an environment where testers are not burdened with the time-consuming and low-level task of manually defining each aspect of the virtual testbed.

2022
Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Mohammad-Ali Nikouei Mahani, Nassir Navab, Benjamin Busam, Federico Tombari Technical University of Munich, BMW Group, Johns Hopkins University, Google ACM

Abstract

As 3D object detection on point clouds relies on the geometrical relationships between the points, non-standard object shapes can hinder a method’s detection capability. However, in safety-critical settings, robustness on out-of-distribution and long-tail samples is fundamental to circumvent dangerous issues, such as the misdetection of damaged or rare cars. In this work, we substantially improve the generalization of 3D object detectors to out-of-domain data by taking into account deformed point clouds during training. We achieve this with 3D-VField: a novel method that plausibly deforms objects via vectors learned in an adversarial fashion. Our approach constrains 3D points to slide along their sensor view rays while neither adding nor removing any of them. The obtained vectors are transferrable, sample-independent and preserve shape smoothness and occlusions. By augmenting normal samples with the deformations produced by these vector fields during training, we significantly improve robustness against differently shaped objects, such as damaged/deformed cars, even while training only on KITTI. Towards this end, we propose and share open source CrashD: a synthetic dataset of realistic damaged and rare cars, with a variety of crash scenarios. Extensive experiments on KITTI, Waymo, our CrashD and SUN RGB-D show the high generalizability of our techniques to out-of-domain data, different models and sensors, namely LiDAR and ToF cameras, for both indoor and outdoor scenes.

Christian Birchler, Nicolas Ganz, Sajad Khatiri, Alessio Gambi, Sebastiano Panichella Zurich University of Applied Sciences, Software Institute - USI, Zurich University of Applied Sciences, University of Passau ACM

Abstract

Simulation platforms facilitate the continuous development of complex systems such as self-driving cars (SDCs). However, previous results on testing SDCs using simulations have shown that most of the automatically generated tests do not strongly contribute to establishing confidence in the quality and reliability of the SDC. Therefore, those tests can be characterized as “uninformative”, and running them generally means wasting precious computational resources. We address this issue with SDC-Scissor, a framework that leverages Machine Learning to identify simulation-based tests that are unlikely to detect faults in the SDC software under test and skip them before their execution. Consequently, by filtering out those tests, SDC-Scissor reduces the number of long-running simulations to execute and drastically increases the cost-effectiveness of simulation-based testing of SDCs software. Our evaluation concerning two large datasets and around 12’000 tests showed that SDC-Scissor achieved a higher classification F1-score (between 47% and 90%) than a randomized baseline in identifying tests that lead to a fault and reduced the time spent running uninformative tests (speedup between 107% and 170%).

Vuong Nguyen, Alessio Gambi, Jasim Ahmed, Gordon Fraser University of Passau ACM

Abstract

Cyber-Physical Systems are increasingly deployed to perform safety-critical tasks, such as autonomously driving a vehicle. Therefore, thoroughly testing them is paramount to avoid accidents and fatalities. Driving simulators allow developers to address this challenge by testing autonomous vehicles in many driving scenarios; nevertheless, systematically generating scenarios that effectively stress the software controlling the vehicles remains an open challenge. Recent work has shown that effective test cases can be derived from simulations of critical driving scenarios such as car crashes. Hence, generating those simulations is a stepping stone for thoroughly testing autonomous vehicles. Towards this end, we propose CRISCE (CRItical SketChEs), an approach that leverages image processing (e.g., contour analysis) to automatically generate simulations of critical driving scenarios from accident sketches. Preliminary results show that CRISCE is efficient and can generate accurate simulations; hence, it has the potential to support developers in effectively achieving high-quality autonomous vehicles.

Jarkko Peltomäki, Frankie Spencer and Iván Porres Åbo Akademi University ACM

Abstract

WOGAN is an online test generation algorithm based on Wasserstein generative adversarial networks. In this note, we present how WOGAN works and summarize its performance in the SBST 2022 CPS tool competition concerning the AI of a self-driving car.

Tahereh Zohdinasab, Vincenzo Riccio, Alessio Gambi, Paolo Tonella ACM Transactions on Software Engineering and Methodology (TOSEM) ACM

Abstract

Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in safety-critical domains. Researchers have proposed several input generation techniques for DL systems. While such techniques can expose failures, they do not explain which features of the test inputs influenced the system’s (mis-) behaviour. DeepHyperion was the first test generator to overcome this limitation by exploring the DL systems’ feature space at large. In this paper, we propose DeepHyperion-CS, a test generator for DL systems which enhances DeepHyperion by promoting the inputs that contributed more to feature space exploration during the previous search iterations. We performed an empirical study involving two different test subjects (i.e., a digit classifier and a lane-keeping system for self-driving cars). Our results proved that the contribution-based guidance implemented within DeepHyperion-CS outperforms state-of-the-art tools and significantly improves the efficiency and the effectiveness of DeepHyperion. DeepHyperion-CS exposed significantly more misbehaviours for 5 out of 6 feature combinations and was up to 65% more efficient than DeepHyperion in finding misbehaviour-inducing inputs and exploring the feature space. DeepHyperion-CS was useful for expanding the datasets used to train the DL systems, populating up to 200% more feature map cells than the original training set.

Alessio Gambi, Vuong Nguyen, Jasim Ahmed, Gordon Fraser IMC University of Applied Science, Krems, University of Passau ACM

Abstract

Artificial Intelligence (AI) technologies are increasingly deployed to perform safety-critical tasks in various systems, including driverless vehicles. Therefore, ensuring the high quality of these AI-based systems is paramount to avoiding accidents and fatalities. Software testing has been proven to be a cost-effective quality assurance method for traditional software systems but requires adaptations to address the peculiarities of AI applications. For instance, thoroughly testing the software controlling autonomous vehicles requires the definition of relevant driving scenarios and their implementation in physically accurate driving simulators, which remains an open challenge. Recent work showed that simulations of critical driving scenarios such as car crashes are fundamental to generating effective test cases. However, generating such complex simulations is challenging, and state-of-the-art approaches based on natural language descriptions struggle to complete the task. Therefore, we propose CRISCE, the first approach to create accurate car crash simulations from accident sketches. Our extensive evaluation shows that CRISCE is efficient, effective, and generates accurate simulations, drastically improving state-of-art approaches based on natural language processing.

Christian Birchler, Sajad Khatiri, Bill Bosshard, Alessio Gambi, Sebastiano Panichella Zurich University of Applied Science, Zurich University of Applied Science & Software Institute - USI, Meier Planungsdienste GmbH, IMC University of Applied Science Krems, University of Passau ACM

Abstract

Simulation platforms facilitate the development of emerging Cyber-Physical Systems (CPS) like self-driving cars (SDC) because they are more efficient and less dangerous than field operational test cases. Despite this, thoroughly testing SDCs in simulated environments remains challenging because SDCs must be tested in a sheer amount of long-running test cases. Past results on software testing optimization have shown that not all the test cases contribute equally to establishing confidence in test subjects’ quality and reliability, and the execution of “safe and uninformative” test cases can be skipped to reduce testing effort. However, this problem is only partially addressed in the context of SDC simulation platforms. In this paper, we investigate test selection strategies to increase the cost-effectiveness of simulation-based testing in the context of SDCs. We propose an approach called SDC-Scissor (SDC coSt-effeCtIve teSt SelectOR) that leverages Machine Learning (ML) strategies to identify and skip test cases that are unlikely to detect faults in SDCs before executing them. Our evaluation shows that SDC-Scissor outperforms the baselines. With the Logistic model, we achieve an accuracy of 70%, a precision of 65%, and a recall of 80% in selecting tests leading to a fault and improved testing cost-effectiveness. Specifically, SDC-Scissor avoided the execution of 50% of unnecessary tests as well as outperformed two baseline strategies. Complementary to existing work, we also integrated SDC-Scissor into the context of an industrial organization in the automotive domain to demonstrate how it can be used in industrial settings.

Alessio Gambi, Gunel Jahangirova, Vincenzo Riccio, Fiorella Zampetti University of Passau, Software Institute - USI, University of Sannio ACM

Abstract

We report on the organization, challenges, and results of the tenth edition of the Java Unit Testing Competition as well as the second edition of the Cyber-Physical Systems (CPS) Testing Competition. Java Unit Testing Competition. Seven tools, i.e., BBC, EvoSuite, Kex, Kex-Reflection, Randoop, UTBot, and UTBot-Mocks, were executed on a benchmark with 65 classes sampled from four open-source Java projects, for two time budgets: 30 and 120 seconds. CPS Testing Tool Competition. Six tools, i.e., AdaFrenetic, AmbieGen, FreneticV, GenRL, EvoMBT and WOGAN competed on testing two driving agents by generating simulation-based tests. We considered one configuration for each test subject and evaluated the tools’ effectiveness and efficiency as well as the failure diversity. This paper describes our methodology, the statistical analysis of the results together with the competing tools, and the challenges faced while running the competition experiments.

Christian Birchler, Sajad Khatiri, Pouria Derakhshanfar, Sebastiano Panichella, Annibale Panichella Zurich University of Applied Sciences, Delft University of Technology ACM

Abstract

Testing with simulation environments helps to identify critical failing scenarios for self-driving cars (SDCs). Simulation-based tests are safer than in-field operational tests and allow detecting software defects before deployment. However, these tests are very expensive and are too many to be run frequently within limited time constraints. In this paper, we investigate test case prioritization techniques to increase the ability to detect SDC regression faults with virtual tests earlier. Our approach, called SDC-Prioritizer, prioritizes virtual tests for SDCs according to static features of the roads we designed to be used within the driving scenarios. These features can be collected without running the tests, which means that they do not require past execution results. We introduce two evolutionary approaches to prioritize the test cases using diversity metrics (black-box heuristics) computed on these static features. These two approaches, called SO-SDC-Prioritizer and MO-SDC-Prioritizer, use single-objective and multi-objective genetic algorithms, respectively, to find trade-offs between executing the less expensive tests and the most diverse test cases earlier. Our empirical study conducted in the SDC domain shows that MO-SDC-Prioritizer significantly improves the ability to detect safety-critical failures at the same level of execution time compared to baselines: random and greedy-based test case orderings. Besides, our study indicates that multi-objective meta-heuristics outperform single-objective approaches when prioritizing simulation-based tests for SDCs. MO-SDC-Prioritizer prioritizes test cases with a large improvement in fault detection while its overhead (up to 0.45% of the test execution cost) is negligible.

2021
Thomas Fischer, Pascal Godejohann, Pascale Maul, Marc Mueller, Eva Pigova, Lefteris Stamatogiannakis BeamNG GmbH, HDI Deutschland AG

Abstract

After an accident, filing insurance claims is a time consuming and often complicated process, requiring collaborative effort by the insured, the claims appraiser and the insurance company. Multiple approaches to (partially) automate the claims submission process have been explored. While annotated real-world data is available for insurance companies through the documentation of previous claims process, the EU privacy policy prevents companies to use them for the training of neural networks (Kop, 2020). This work explores the use of synthetic images for automated damage evaluation of real-world data. Thus, the authors present their two-fold contribution: a) a data generation framework based on the BeamNG.tech simulator and b) a deep learning computer vision system for damage estimation.

Tahereh Zohdinasab, Alessio Gambi, Paolo Tonella, Vincenzo Riccio University of Passau, Università della Svizzera Italiana ACM

Abstract

Deep Learning (DL) has been successfully applied to a wide range of application domains, including safety-critical ones. Several DL testing approaches have been recently proposed in the literature but none of them aims to assess how different interpretable features of the generated inputs affect the system’s behaviour. In this paper, we resort to Illumination Search to find the highest-performing test cases (i.e., misbehaving and closest to misbehaving), spread across the cells of a map representing the feature space of the system. We introduce a methodology that guides the users of our approach in the tasks of identifying and quantifying the dimensions of the feature space for a given domain. We developed DeepHyperion, a search-based tool for DL systems that illuminates, i.e., explores at large, the feature space, by providing developers with an interpretable feature map where automatically generated inputs are placed along with information about the exposed behaviours.
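
The feature map idea can be pictured as an archive that keeps only the best test input per cell. The sketch below is a generic illumination-style archive written for illustration, not DeepHyperion's code; the function names, the uniform `cell_size`, and the convention that a lower fitness means "closer to misbehaving" are all assumptions.

```python
import math


def cell_of(features, cell_size=1.0):
    """Map a tuple of feature values to the integer coordinates of its cell."""
    return tuple(math.floor(value / cell_size) for value in features)


def try_insert(archive, candidate, features, fitness, cell_size=1.0):
    """Store the candidate if its cell is empty or it beats the occupant.

    Lower fitness is treated as better (assumption: closer to a misbehaviour).
    Returns True when the archive changed.
    """
    key = cell_of(features, cell_size)
    occupant = archive.get(key)
    if occupant is None or fitness < occupant[1]:
        archive[key] = (candidate, fitness)
        return True
    return False


# Hypothetical usage: two road inputs that fall into the same cell.
archive = {}
try_insert(archive, "road_a", (0.2, 1.7), fitness=0.9)
try_insert(archive, "road_b", (0.9, 1.9), fitness=0.1)  # replaces road_a
```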

Mahshid Helali Moghadam, Markus Borg, Seyed Jalaleddin Mousavirad Mälardalen University, RISE Research Institutes of Sweden, Hakim Sabzevari University ACM

Abstract

Deeper is a simulation-based test generator that uses an evolutionary process, i.e., an archive-based NSGA-II augmented with a quality population seed, for generating test cases to test a deep neural network-based lane-keeping system. This paper presents Deeper briefly and summarizes the results of Deeper’s participation in the Cyber-physical systems (CPS) testing competition at SBST 2021.

Ezequiel Castellano, Ahmet Cetinkaya, Cédric Ho Thanh, Stefan Klikovits, Xiaoyi Zhang, Paolo Arcaini National Institute of Informatics, Tokyo ACM

Abstract

Frenetic is a genetic approach that leverages a curvature-based road representation. Given an autonomous driving agent, the goal of Frenetic is to generate roads where the agent fails to stay within its lane. In other words, Frenetic tries to minimize the “out of bound distance”, which is the distance between the car and either edge of the lane if the car is within the lane, and proceeds to negative values once the car drives off. This work resembles classic aspects of genetic algorithms such as mutations and crossover, but introduces some nuances aiming at improving diversity of the generated roads.
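
The "out of bound distance" described above can be written down as a small signed-distance function. This is a simplified sketch under stated assumptions (a straight lane of constant width and a car reduced to a lateral offset from the lane centre); the function and parameter names are hypothetical and not taken from Frenetic.

```python
def oob_distance(lateral_offset, lane_width):
    """Signed distance from the car to the nearest lane edge.

    Positive while the car is inside the lane, zero on the edge,
    and negative once the car has driven off.
    """
    half_width = lane_width / 2.0
    return half_width - abs(lateral_offset)


def min_oob_distance(lateral_offsets, lane_width):
    """Smallest out-of-bound distance observed along a driven trajectory."""
    return min(oob_distance(offset, lane_width) for offset in lateral_offsets)


# Hypothetical trajectory on a 4 m wide lane: the car drifts out at the end.
trace = [0.0, 0.5, 1.5, 2.5]
worst = min_oob_distance(trace, lane_width=4.0)
```

A search that minimizes this value across generated roads is pushed towards roads where the agent leaves its lane, which is the goal stated in the abstract.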

Florian Klück, Lorenz Klampfl, Franz Wotawa Christian Doppler Laboratory for Quality Assurance Methodologies for Autonomous Cyber-Physical Systems, Institute for Software Technology, Graz University of Technology ACM

Abstract

GABezier is a search-based tool for the automatic generation of challenging road networks for virtual testing of an automated lane keep system (ALKS). This paper provides a brief overview of the tool and summarizes the results of GABezier’s participation in the first edition of the Cyber-Physical Systems Testing Tool Competition. We submitted our tool in two configurations, namely GABExplore and GABExploit. The latter configuration in particular efficiently generated valid test cases and triggered many faults.

Sebastiano Panichella, Alessio Gambi, Fiorella Zampetti, Vincenzo Riccio University of Passau, University of Sannio, Software Institute - USI ACM

Abstract

We report on the organization, challenges, and results of the ninth edition of the Java Unit Testing Competition as well as the first edition of the Cyber-Physical Systems Testing Tool Competition. Java Unit Testing Competition: This year, five tools, Randoop, UtBot, Kex, Evosuite, and EvosuiteDSE, were executed on a benchmark with (i) new classes under test, selected from three open-source software projects, and (ii) the set of classes from three projects considered in the eighth edition. We relied on an improved Docker infrastructure to execute the tools and the subsequent coverage and mutation analysis. Given the high number of participants, we considered only two time budgets for test case generation: thirty seconds and two minutes. Cyber-Physical Systems Testing Tool Competition: Five tools, Deeper, Frenetic, GABExplore, GABExploit, and Swat, competed on testing self-driving car software by generating simulation-based tests using our new testing infrastructure. We considered two experimental settings to study test generators’ transitory and asymptotic behaviors and evaluated the tools’ test generation effectiveness and the exposed failures’ diversity. This paper describes our methodology, the statistical analysis of the results together with the contestant tools, and the challenges faced while running the competition experiments.

Dmytro Humeniuk, Giuliano Antoniol, Foutse Khomh Polytechnique Montréal ACM

Abstract

SWAT is a test case generation tool for testing cyber-physical systems (CPS). In the context of the SBST 2021 CPS testing competition, it has been adapted to generate virtual roads to test a lane keeping assist system. It achieved the best ratio between valid and generated test cases, producing over 95% valid test cases in both testing configurations.

Abstract

Photogrammetry is the technique of obtaining information about objects from image captures. Structure from Motion is a computer vision technique by which it is possible to obtain a three-dimensional representation of objects visible in a set of image captures. Traffic Accident Reconstruction, on the other hand, refers to the set of techniques used for the analysis of traffic accidents. This set of techniques includes energy loss models, which provide a measure of the severity of a particular traffic accident. In this work, indirect photogrammetry (namely, Structure from Motion) is applied to a set of crash test images obtained with BeamNG.drive, in order to estimate the energy equivalent speed based on the McHenry energy loss model. The results of this work show that three-dimensional computer vision techniques, such as Structure from Motion, are capable of providing estimations that are comparable to the results provided in other works related to the analysis of traffic accidents.

2020

Abstract

Self-driving cars are an emerging part of the automotive industry and a vital aspect of its future. When it comes to the automation of vehicles, that is, transferring control of the automobile to software, safety is the biggest concern, as human life is at risk. To ensure safety in any driving conditions, the industry has to maintain safety standards for the certification of self-driving cars. It is important to ensure, before deployment, that the software is intelligent enough not only to handle critical situations but also to predict or address any upcoming harmful event, in order to prevent future mishaps. Discovering test cases that can disclose the malfunction of an autonomous car is a challenging task because the number of possible test cases is infinite. Hence, one technique to analytically examine the safety of autonomous cars is simulation, which executes quickly, without risk or harmful conditions. Automatically generating driving simulations from real-world driving videos could reduce the time testers spend manually creating different driving simulations. Generating simulations from freely available videos will help testers understand how the self-driving car would perform in similar situations. My thesis addresses this problem and defines a new method to automatically generate driving simulations from commonly available geo-tagged videos recorded during driving. The process uses the GPS data to identify the roads on which the recorded driving took place, recreates those roads in the driving simulation, and configures the ego car to drive as the original car was driving in the videos. Next, the videos are analysed using a machine learning classifier to identify leading cars by means of bounding boxes, as well as their relative position and speed w.r.t. the ego car. Finally, a driving simulation reproducing the movement of the ego car and (one) leading car in front of it is generated.
Evaluation results obtained by analysing randomly selected driving videos show that the proposed method is efficient and produces reasonably accurate simulations, suggesting that it is viable and can pave the way for future self-driving car testing by providing an efficient tool to support testers in developing safe self-driving cars.

Johannes Müller University of Passau

Abstract

Autonomous cars are no longer a vision of the future, with car manufacturers planning to release them in the near future. With software controlling vehicles in traffic, human life is at risk when this software malfunctions. There is an infinite number of traffic scenarios this software needs to be able to handle without failure, hence it is necessary to cover as many different scenarios as possible. Computer simulations provide a fast method to execute test cases, but generating test configurations for virtual environments remains a costly and time-consuming process. In this work I propose a method that tackles this problem by automatically generating test cases based on a received specification. For this task, a model for describing test cases in the form of finite state machines is introduced, which allows splitting the generation process into small steps. In each step, values for properties of the test case model are picked, constrained by the received specification and the values already present, until every property has a value assigned. The test case generated this way is then used as a starting point for local search algorithms to find test cases that reach goals in terms of trajectory, velocity and timing. The focus of the generation process lies on generating a test case whose values are as close as possible to the received specification. To evaluate the proposed method for automated test case generation, a prototype system was implemented that can generate three different accident types. It is shown that the system can generate test cases quickly and that the scenarios conform to the received specification in most cases. This method allows testers to focus on what they want to test by providing a test case specification, rather than on how to set up feasible trajectories and achieve the correct timing between vehicles.

Abstract

Simulation based testing is the most common technique for testing autonomous vehicles (AVs). For each test a tester needs to describe a scenario, specify test criteria, set up a simulation, connect the artificial intelligences (AIs) under test to it, execute the test, determine its results and collect all generated data, e.g. for further analysis or training AIs. This process is tedious and error prone. There is no well-established procedure for coping with or solving these problems. I present DriveBuild, a research toolkit for simulation based testing of AVs. DriveBuild comes with an abstract scheme to describe tests and provides a scalable client-server architecture based on microservices. DriveBuild is able to execute automatically generated tests and to connect AIs under test which control AVs in a simulation. It also offers many metrics to analyze AVs and test generators. This thesis shows that DriveBuild automates the process of setting up simulators, distributing test runs across a cluster, frequently checking test criteria during a simulation, gathering data and analyzing test results. It thus reduces the amount of time a tester needs to invest in preparing, running and evaluating simulation based tests. There are already students, courses as well as research groups that are interested in DriveBuild and use it for their own purposes.

Abstract

Researchers have proposed many groundbreaking ideas and techniques to make autonomous vehicles safer and more reliable on the streets. But society still has many concerns about the safety and reliability of autonomous cars. Many contributions have focused on the most important property: the safety of passengers. But other non-functional properties also have to be ensured. One of them is the fuel consumption of cars. Nobody wants to drive a car which consumes fuel and pollutes the environment more than necessary. My thesis aims to expose the problems of autonomous cars concerning their fuel inefficiency. By defining a set of scoring oracles, the decisions taken by a car can be assessed. These oracles check several sensor values and driving patterns to classify whether the car under test drives fuel-inefficiently or not; e.g. if the ego-car drives at a high RPM (revolutions per minute) in a low gear, this is an infraction against the oracle. Every infraction is logged and then used to calculate a scoring function. By using procedural content generation, a method commonly used in the gaming industry to algorithmically generate various content, I randomly create urban-like scenarios with intersections, traffic, traffic lights and signs as well as parked cars to stress fuel-inefficient driving behaviour of self-driving vehicles. The evaluation results show that my scoring function can expose faults concerning fuel inefficiency of different driving behaviours. I also found a strong positive correlation of 0.779 between the score and consumed fuel. My test cases, on the other hand, exposed many faults of a traffic light detection system.
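The oracle idea in this abstract (flag a high RPM in a low gear, log every infraction, derive a score) might look roughly like the sketch below. It is not the thesis's implementation: the thresholds `RPM_LIMIT` and `LOW_GEAR_MAX`, the `Sample` record, and the choice of "fraction of samples with an infraction" as the score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

RPM_LIMIT = 3000.0  # hypothetical threshold for "high RPM"
LOW_GEAR_MAX = 2    # hypothetical: gears 1 and 2 count as "low"


@dataclass
class Sample:
    """One sensor reading taken from the ego-car during a test run."""
    time_s: float
    rpm: float
    gear: int


def high_rpm_low_gear(sample: Sample) -> bool:
    """Oracle: True if the sample violates the high-RPM/low-gear rule."""
    return sample.rpm > RPM_LIMIT and sample.gear <= LOW_GEAR_MAX


def collect_infractions(samples: Sequence[Sample]) -> List[Sample]:
    """Log every sample that violates the oracle."""
    return [s for s in samples if high_rpm_low_gear(s)]


def score(samples: Sequence[Sample]) -> float:
    """Assumed scoring function: share of samples with an infraction."""
    if not samples:
        return 0.0
    return len(collect_infractions(samples)) / len(samples)


if __name__ == "__main__":
    run = [
        Sample(0.0, 3500.0, 1),  # infraction: high RPM in first gear
        Sample(1.0, 2000.0, 1),  # fine: low RPM
        Sample(2.0, 3500.0, 4),  # fine: high RPM but high gear
        Sample(3.0, 4000.0, 2),  # infraction: high RPM in second gear
    ]
    print(score(run))
```

Further oracles (for example on braking or acceleration patterns) could be added as additional predicate functions and combined into the same score.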

Jubril Gbolahan Adigun Tallinn University of Technology

Abstract

Automotive technology is seeing tremendous application in the emerging field of autonomous vehicles (AV). With the prevalence of autonomous vehicles in our world today, driven by advances in computer vision algorithms trained on large amounts of data, it is important to look for more efficient ways to develop working models with a high level of reliability. To develop better and more accurate algorithms, we need results that reflect ground truth information about the pose of objects around the vehicle. Although low-level, object detection and tracking are crucial for building many higher-level functionalities of AVs. This has proven to be challenging because it requires laborious, accurate sensor calibration, annotation of images and semantic segmentation of image pixels.

In this thesis, we adopt an open approach to explore the generation of ground truth data that would be useful for object detection using a driving simulator called BeamNG.research (BeamNG). This is important because sensors for generating training data for AVs and other autonomous systems can be expensive and many parts of the stack in autonomous vehicles are proprietary. We introduce a method of creating the ground truth data with a simulator. The simulator gives the true location of objects in the simulation and sensor data streams. The proposed method can help us address the problem of annotating ground truth data, which is otherwise difficult with real-life data. We made a prototype that can generate annotated images and point cloud data. The prototype allows generation of training/validation data in an unlimited amount with the existing BeamNG scenarios. We generated 200 frames of images and 200 sets of point cloud data. We tested the data in practice by using it to validate YOLO v3 object detection running on the ROS platform.

Alessio Gambi, Pascale Maul, Marc Mueller, Lefteris Stamatogiannakis, Thomas Fischer, Sebastiano Panichella University of Passau, BeamNG GmbH, Zurich University of Applied Sciences

Abstract

Industry and research organizations increasingly rely on simulation platforms to facilitate the development and validation of Cyber-physical Systems (CPSs). The main factors for this trend are simulation’s cost-efficiency and the possibility of evaluating the system’s performance early on and throughout the development cycle in a fully controlled environment. However, simulations need to meet stringent functional and non-functional requirements to benefit development and debugging activities. In particular, high simulation accuracy and the ability to systematically generate relevant test scenarios are paramount for effectively assessing CPSs’ behaviors in nominal and critical test scenarios. This paper (i) discusses the relevance of soft-body simulation and procedural content generation to achieving the systematic generation of physically accurate virtual tests; and (ii) presents BeamNG.tech, a novel simulation framework featuring both soft-body simulation and procedural content generation. Hence, we report on the main advantages and research results in testing self-driving car software enabled by BeamNG.tech. Finally, we reflect on the central role of simulation-based continuous integration and testing pipelines to improve current CPSs development practices.

2019
Alessio Gambi, Tri Huynh, Gordon Fraser University of Passau / Saarland University / CISPA ACM

Abstract

Autonomous driving carries the promise to drastically reduce car accidents, but recently reported fatal crashes involving self-driving cars suggest that the self-driving car software should be tested more thoroughly. For addressing this need, we introduce AC3R (Automatic Crash Constructor from Crash Report) which elaborates police reports to automatically recreate car crashes in a simulated environment that can be used for testing self-driving car software in critical situations. AC3R enables developers to quickly generate relevant test cases from the massive historical dataset of recorded car crashes. We demonstrate how AC3R can generate simulations of different car crashes and report the findings of a large user study which concluded that AC3R simulations are accurate.

Alessio Gambi, Marc Müller, Gordon Fraser University of Passau, BeamNG GmbH ACM

Abstract

Ensuring the safety of self-driving cars is important, but neither industry nor authorities have settled on a standard way to test them. Deploying self-driving cars for testing in regular traffic is a common, but costly and risky method, which has already caused fatalities. As a safer alternative, virtual tests, in which self-driving car software is tested in computer simulations, have been proposed. One cannot hope to sufficiently cover the huge number of possible driving situations self-driving cars must be tested for by manually creating such tests. Therefore, we developed ASFAULT, a tool for automatically generating virtual tests for systematically testing self-driving car software. We demonstrate ASFAULT by testing the lane keeping feature of an artificial intelligence-based self-driving car software, for which ASFAULT generates scenarios that cause it to drive off the road.

Alessio Gambi, Tri Huynh, Gordon Fraser University of Passau / Saarland University / CISPA ACM

Abstract

Autonomous driving carries the promise to drastically reduce the number of car accidents; however, recently reported fatal crashes involving self-driving cars show this important goal is not yet achieved, and call for better testing of the software controlling self-driving cars. To better test self-driving car software, we propose to specifically test critical scenarios. Since these are difficult to test in field operation, we create simulations of critical situations. These simulations are automatically derived from natural language police reports of actual car crashes, which are available in historical datasets. Our initial evaluation shows that we can generate accurate simulations in a matter of minutes.

Alessio Gambi, Marc Müller, Gordon Fraser University of Passau, BeamNG GmbH ACM

Abstract

Self-driving cars rely on software which needs to be thoroughly tested. Testing self-driving car software in real traffic is not only expensive but also dangerous, and has already caused fatalities. Virtual tests, in which self-driving car software is tested in computer simulations, offer a more efficient and safer alternative compared to naturalistic field operational tests. However, creating suitable test scenarios is laborious and difficult. In this paper we combine procedural content generation, a technique commonly employed in modern video games, and search-based testing, a testing technique proven to be effective in many domains, in order to automatically create challenging virtual scenarios for testing self-driving car software. Our AsFault prototype implements this approach to generate virtual roads for testing lane keeping, one of the defining features of autonomous driving. Evaluation on two different self-driving car software systems demonstrates that AsFault can generate effective virtual road networks that succeed in revealing software failures, which manifest as cars departing their lane. Compared to random testing AsFault was not only more efficient, but also caused up to twice as many lane departures.

Abstract

Evaluating the safety of autonomous driving systems is one of the biggest obstacles to the deployment of these systems. Because of the high number of test scenarios, which arises from the huge variety of possible interactions with the environment, relying only on expensive real-life testing is not practical, and simulation testing is therefore widely used. Given the time and computation power needed to generate and run a test case, it is desirable that few test scenarios cover a wide range of the possible testing parameters. This is why the diversity of the test scenarios is an important element to consider in order to decrease the number of test case executions. In this thesis I compare two approaches which aim to maximize diversity: novelty search and multi-objective search. The results of this thesis show that multi-objective search generates more effective test cases, although both approaches have a similar distribution regarding where the test subject fails.

Abstract

The objective of this bachelor thesis is the adaptation of a low-budget driving simulator to the recently developed driving simulator software BeamNG.research. The simulator is built to closely resemble the battery electric vehicle Renault ZOE. The thesis primarily focuses on the hardware aspects of the simulator. A market analysis compares common existing solutions and hardware as a basis for developing a concept. Based on a requirement and feature analysis, a simulator consisting primarily of computer gaming hardware is built and reconfigured from typical race car ergonomics to a more fitting compact car feel. This requires the construction of several steel adaptor frames during the realization. A four-display setup is also realized, with three 55” UHD TVs (180-degree FOV) and a fourth 10” HD display as the designated driver display. In parallel, a data interface for tapping into the ZOE’s CAN buses is selected, and a new concept, the CanSee of the CanZE community, is recreated and adapted. The first iteration of the simulator setup as well as a proof of concept for the CanSee are evaluated and tested. Several improvements follow the evaluation, concluding the realization. Both the simulator setup and the data interface showed promising results. The simulator was used effectively during more than 60 hours of use in a survey and provided all desired data. The CanSee data interface proved to be more than a tool for validating the simulator; it also serves as a basis for the future development of energy interfaces.

Christian Zellier, Jakob Claußen, Alexander Danetzky, Maximilian Kayser, Eric Foerster University of Lübeck

Abstract

This project deals with the creation of displays for energy-efficient driving with electric cars. Energy-efficient driving is particularly important for electric cars, as it has a very strong impact on their range. The aim of the displays is to show drivers an energy-efficient driving style. These displays were developed as UI mods and then integrated into the BeamNG.research software. The Institute for Multimedia and Interactive Systems (IMIS) has developed a driving simulator to test the effectiveness of displays faster. The developed displays were then integrated into the driving simulator. To develop the displays, mockups were first created based on research. The best mockups were selected and prototyped with the help of the software SimHub. These prototypes were evaluated in a mid-term evaluation and then the two with the greatest potential were further developed. Based on the feedback, adjustments were made and then implemented as web-based UI mods. Finally, the implemented displays were presented and evaluated at the EMI-Award. In the following, the procedure mentioned is explained in more detail in the individual chapters.

Thomas Franke, Daniel Görges, Matthias G. Arend University of Lübeck, TU Kaiserslautern, RWTH Aachen University ACM

Abstract

The design of effective energy interfaces for electric vehicles needs an integrated perspective on the technical and psychological factors that together establish real-world vehicle energy efficiency. The objective of the present research was to provide a transdisciplinary synthesis of key factors for the design of energy interfaces for battery electric vehicles (BEVs) that effectively support drivers in their eco-driving efforts. While previous research tends to concentrate on the (visual) representation of common energy efficiency measures, we focus on the design of action-integrated metrics and indicators for vehicle energy efficiency that account for the perceptual capacities and bounded rationality of drivers. Based on this rationale, we propose energy interface examples for the most basic driving maneuvers (acceleration, constant driving, deceleration) and discuss challenges and opportunities of these design solutions.

Alessio Gambi, Tri Huynh, Gordon Fraser University of Passau / Saarland University / CISPA ACM

Abstract

Autonomous driving carries the promise to drastically reduce the number of car accidents; however, recently reported fatal crashes involving self-driving cars show that such an important goal is not yet achieved. This calls for better testing of the software controlling self-driving cars, which is difficult because it requires producing challenging driving scenarios. To better test self-driving car software, we propose to specifically test car crash scenarios, which are critical par excellence. Since real car crashes are difficult to test in field operation, we recreate them as physically accurate simulations in an environment that can be used for testing self-driving car software. To cope with the scarcity of sensory data collected during real car crashes which does not enable a full reproduction, we extract the information to recreate real car crashes from the police reports which document them. Our extensive evaluation, consisting of a user study involving 34 participants and a quantitative analysis of the quality of the generated tests, shows that we can generate accurate simulations of car crashes in a matter of minutes. Compared to tests which implement non-critical driving scenarios, our tests effectively stressed the test subject in different ways and exposed several shortcomings in its implementation.

2018

Abstract

This thesis describes the development of the software BeamNG.research into a driving-simulation environment for research in the field of user-energy interaction. Initially, requirements were identified that were potentially not fulfilled by default. These requirements were found in the areas of data export, energy simulation, vehicle, experimental control, UI mods and tracks. All identified requirements were collected in a requirements document, which was further extended in the course of the work. A manual was created to facilitate the use of BeamNG.research across all the areas identified. Regarding data export, instructions for using a mod provided for this purpose were added to the manual. To this end, it was necessary to identify the corresponding names of the parameters. Furthermore, the configuration of the mod was simplified using comments. In the scope of energy simulation, a live data exchange between BeamNG.research and Matlab was realised via file-based interprocess communication in order to externalise the simulation for the purpose of high precision. In addition, a way had to be found to actually use the data from Matlab in BeamNG.research. Furthermore, an electric vehicle was integrated into BeamNG.research and the camera was optimized for a realistic driving experience. On the basis of a predetermined sequence of a potential first experiment, it was investigated how various requirements of experimental control could be met. For this purpose, several coordinated mods were written and some existing original files were modified. Furthermore, the manual on creating scenarios was extended with helpful hints. In the areas of UI mods and driving route, only a few notes on UI mods were added to the manual, after it was determined that UI mods already fulfilled nearly all requirements.
Finally, a real experiment was carried out using the results of this thesis to validate that all of the most important requirements were met. BeamNG.drive was used during this thesis, but all results are transferable to BeamNG.research.

While BeamNG.drive is a simulation game, BeamNG.research represents the research variant. Therefore this document always refers to BeamNG.research.

Marc Müller Saarland University

Abstract

Autonomous vehicles are becoming an increasingly relevant part of the automotive industry and will only become more important in the near future. Ensuring safety is naturally important when handing over full control of a vehicle to software, but neither industry nor authorities have settled on a standard way to certify autonomous vehicles. Deploying autonomous vehicles for testing in regular urban traffic is a common but costly and risky method. Simulated, or virtual, tests have been introduced as a way to expose problems before deployment, but traditional software testing techniques cannot cope with the massive number of situations an autonomous vehicle faces. For limited cases, more advanced techniques like search-based testing show more promising results. This work will continue in that direction, utilizing procedural content generation and genetic algorithms to evolve driving tasks meant to test the lane keeping capability of autonomous vehicles. Meaningful metrics to characterise input scenarios, output behaviour of the car, and how well tests cover the input space will be introduced to guide test suite evolution towards stressing the vehicle’s lane keeping behaviour effectively.

Nourelhoda S. K. Mohamed University of Bremen

Abstract

Driving a vehicle requires practice and exercise, particularly for hazardous situations. In general, safe driving is an activity that draws on a person’s mental and physical abilities. In hazardous situations, drivers must have the cognitive abilities to detect and anticipate hazards. In addition, they must have knowledge that empowers them to react in a proper way. In such situations, a wrong action may lead to significant damage and dramatic consequences. At the same time, real physical training for hazardous driving situations is limited, due to the consequences of crashes. In this thesis, we argue for using the crash experience to enhance drivers’ hazard perception. From a cognitive perspective, raising drivers’ awareness of the crash and its physical damage consequences would influence their driving behaviours. We utilized BeamNG.drive, which provides a dynamic soft-body physics vehicle simulation. We developed a practical study in which participants are required to drive certain scenarios, typical of reality, to learn a specific traffic situation (e.g. yield to priority road). We implemented various learning scenarios for hazardous situations. In this study, two learning modules are proposed: instructional video experience and dynamic physical crash experience. After learning, participants drive an evaluation scenario, where their driving performance is assessed by quantitative and qualitative measures. The study’s usability and usefulness, as well as participants’ enjoyment and tension, are evaluated by qualitative questionnaires. Statistical analysis shows significant influences of crash experience in raising participants’ awareness of crashes regardless of their age or their previous driving experience. The findings illustrate the feasibility of the developed study and consequently support the proposed hypotheses.

---