Enhancing autonomous driving simulations: A hybrid metamorphic testing framework with metamorphic relations generated by GPT

Yifan Zhang, Tsong Yueh Chen, Matthew Pike, Dave Towey, Zhihao Ying, Zhi Quan Zhou

Information and Software Technology, July 2025

autonomous driving systems, metamorphic testing, metamorphic relations, CARLA simulator, oracle problem, large language model, software testing

Abstract

Autonomous Driving Systems (ADSs) have developed rapidly over the past decade. Given the complexity and cost of testing ADSs, advanced simulation tools such as the CARLA simulator are essential for efficient algorithm development and validation. However, the intricacies of autonomous driving (AD) simulations pose challenges for software testing, particularly the oracle problem: the difficulty of determining whether outputs are correct within a reasonable time. While many studies validate ADS algorithms using simulations, few address the validity of the simulated data itself, which is a fundamental premise of ADS testing. This study addresses the oracle problem in AD simulations by using Metamorphic Testing (MT) and Metamorphic Relations (MRs) to detect software defects in the CARLA simulator. We also explore AI-driven approaches, specifically integrating ChatGPT's customizable features to enhance MR generation and refinement. We propose a human-AI hybrid MT framework that combines human input with AI-driven automation to generate and refine MRs. The framework uses the GPT-MR generator, a customized large language model (LLM) built on Metamorphic Relation Patterns (MRPs) and ChatGPT, to produce MRs according to user specifications. MT experts then refine these MRs and feed them into a test harness that automates test-case creation and execution while supporting diverse parameter inputs. The MRs produced by the GPT-MR generator led to the discovery of four significant defects in the CARLA simulator, demonstrating their effectiveness in identifying software flaws. The test harness enabled efficient, automated testing across multiple modules and vehicle-control approaches, enhancing the robustness and efficiency of our methods. Our study highlights the effectiveness of MT and MRPs in addressing the oracle problem for AD simulations, enhancing software reliability, and ensuring robust validation processes.
The combination of AI-driven tools and human knowledge offers a structured methodology for validating simulated data and ADS performance, contributing to more reliable ADS development and testing.
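To make the idea of a metamorphic relation concrete, here is a minimal Python sketch of metamorphic testing against a toy braking model. Everything in it is a hypothetical illustration: `simulate_braking` is a stand-in for a simulator (not CARLA's API), `buggy_braking` injects an invented unit-conversion fault, and the monotonicity relation is an example MR, not one of those reported in the study. The point is that the check needs no exact expected stopping distance; it only compares a source execution with a follow-up execution.

```python
from typing import Callable


def simulate_braking(initial_speed: float, deceleration: float, dt: float = 0.01) -> float:
    """Hypothetical simulator stand-in: a vehicle brakes at constant
    deceleration (m/s^2) from initial_speed (m/s); returns stopping distance (m)."""
    speed, distance = initial_speed, 0.0
    while speed > 0.0:
        next_speed = max(speed - deceleration * dt, 0.0)
        # Trapezoidal integration of speed over one time step.
        distance += 0.5 * (speed + next_speed) * dt
        speed = next_speed
    return distance


def buggy_braking(initial_speed: float, deceleration: float) -> float:
    """Hypothetical faulty variant: above 30 m/s it wrongly converts the
    speed as if it were km/h, so the vehicle stops far too early."""
    if initial_speed >= 30.0:
        initial_speed = initial_speed / 3.6
    return simulate_braking(initial_speed, deceleration)


def check_mr_higher_speed_longer_stop(
    sut: Callable[[float, float], float],
    source_speed: float,
    follow_up_speed: float,
    deceleration: float,
) -> bool:
    """Example MR (assumed for illustration): with identical braking,
    a higher initial speed must not give a shorter stopping distance."""
    if follow_up_speed <= source_speed:
        raise ValueError("follow-up speed must exceed source speed")
    source_output = sut(source_speed, deceleration)
    follow_up_output = sut(follow_up_speed, deceleration)
    return follow_up_output >= source_output


if __name__ == "__main__":
    print(check_mr_higher_speed_longer_stop(simulate_braking, 10.0, 20.0, 6.0))
    print(check_mr_higher_speed_longer_stop(buggy_braking, 25.0, 35.0, 6.0))
```

Run as a script, the first check holds for the fault-free model, while the second reports an MR violation for the faulty variant, even though neither check states the "correct" stopping distance. In a harness like the one described above, the same check would be repeated over many generated parameter combinations rather than a single pair.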