[5] Stephan Lukasczyk, Florian Kroiß, and Gordon Fraser. Automated unit test generation for Python. In Aldeida
Aleti and Annibale Panichella, editors, Search-Based Software Engineering, pages 9–24, Cham, 2020. Springer
International Publishing.
[6] Andrea Arcuri. It really does matter how you normalize the branch distance in search-based software testing.
Software Testing, Verification and Reliability, 23(2):119–147, 2013.
[7] Ke Mao, Mark Harman, and Yue Jia. Sapienz: Multi-objective automated testing for android applications. In
Proceedings of the 25th International Symposium on Software Testing and Analysis, pages 94–105, 2016.
[8] Robert M. Hierons. Comparing test sets and criteria in the presence of test hypotheses and fault domains. ACM
Transactions on Software Engineering and Methodology (TOSEM), 11(4):427–448, 2002.
[9] Simon Poulding and Robert Feldt. The automated generation of human-comprehensible XML test sets. In Proc.
1st North American Search Based Software Engineering Symposium (NasBASE), 2015.
[10] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian
Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with
deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[11] Phil McMinn, Mark Stevenson, and Mark Harman. Reducing qualitative human oracle costs associated with
automatically generated test data. In Proceedings of the First International Workshop on Software Test Output
Validation, STOV ’10, pages 1–4, New York, NY, USA, 2010. ACM.
[12] Abdullah Alsharif, Gregory M. Kapfhammer, and Phil McMinn. What factors make SQL test cases understandable
for testers? A human study of automatic test data generation techniques. In 2019 IEEE International Conference
on Software Maintenance and Evolution (ICSME), pages 437–448. IEEE, 2019.
[13] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde, Jared Kaplan, Harri Edwards, Yura
Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint
arXiv:2107.03374, 2021.
[14] Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. An empirical
cybersecurity evaluation of github copilot’s code contributions. arXiv preprint arXiv:2108.09293, 2021.
[15] Robert Feldt and Felix Dobslaw. Towards automated boundary value testing with program derivatives and search.
In Shiva Nejati and Gregory Gay, editors, Search-Based Software Engineering, pages 155–163, Cham, 2019.
Springer International Publishing.
[16] Felix Dobslaw, Francisco Gomes de Oliveira Neto, and Robert Feldt. Boundary value exploration for software
analysis. In 2020 IEEE International Conference on Software Testing, Verification and Validation Workshops
(ICSTW), pages 346–353, 2020.
[17] Hussein Almulla and Gregory Gay. Learning how to search: Generating effective test cases through adaptive
fitness function selection. CoRR, abs/2102.04822, 2021.
[18] Christopher Henard, Mike Papadakis, Mark Harman, Yue Jia, and Yves Le Traon. Comparing white-box and
black-box test prioritization. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE),
pages 523–534. IEEE, 2016.
[19] Robert Feldt, Simon Poulding, David Clark, and Shin Yoo. Test set diameter: Quantifying the diversity of sets
of test cases. In 2016 IEEE International Conference on Software Testing, Verification and Validation (ICST),
pages 223–233. IEEE, 2016.
[20] Breno Miranda, Emilio Cruciani, Roberto Verdecchia, and Antonia Bertolino. Fast approaches to scalable
similarity-based test case prioritization. In 2018 IEEE/ACM 40th International Conference on Software En-
gineering (ICSE), pages 222–232. IEEE, 2018.
[21] Francisco Gomes de Oliveira Neto, Robert Feldt, Linda Erlenhov, and José Benardi de Souza Nunes. Visualizing
test diversity to support test optimisation. In 2018 25th Asia-Pacific Software Engineering Conference (APSEC),
pages 149–158. IEEE, 2018.