AequeVox 21
34. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions.
In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan,
S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30,
pp. 4765–4774. Curran Associates, Inc. (2017),
http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
35. Ma, L., Juefei-Xu, F., Zhang, F., Sun, J., Xue, M., Li, B., Chen, C., Su, T., Li, L.,
Liu, Y., Zhao, J., Wang, Y.: DeepGauge: multi-granularity testing criteria for deep
learning systems. In: Proceedings of the 33rd ACM/IEEE International Conference
on Automated Software Engineering, ASE 2018, Montpellier, France, September
3-7, 2018. pp. 120–131 (2018)
36. Ma, P., Wang, S., Liu, J.: Metamorphic testing and certified mitigation of fairness
violations in NLP models. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth
International Joint Conference on Artificial Intelligence, IJCAI 2020. pp. 458–465 (2020)
37. Odena, A., Olsson, C., Andersen, D., Goodfellow, I.: TensorFuzz: Debugging neural
networks with coverage-guided fuzzing. In: International Conference on Machine
Learning. pp. 4901–4911. PMLR (2019)
38. Pei, K., Cao, Y., Yang, J., Jana, S.: DeepXplore: Automated whitebox testing
of deep learning systems. In: Proceedings of the 26th Symposium on Operating
Systems Principles, Shanghai, China, October 28-31, 2017. pp. 1–18 (2017)
39. Phillips, A.: Defending equality of outcome. Journal of Political Philosophy 12(1),
1–19 (2004)
40. Qin, Y., Carlini, N., Cottrell, G., Goodfellow, I., Raffel, C.: Imperceptible, robust,
and targeted adversarial examples for automatic speech recognition. In: International
Conference on Machine Learning. pp. 5231–5240. PMLR (2019)
41. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the
predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA,
August 13-17, 2016. pp. 1135–1144 (2016)
42. Ribeiro, M.T., Singh, S., Guestrin, C.: Anchors: High-precision model-agnostic
explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence.
vol. 32 (2018)
43. Ribeiro, M.T., Wu, T., Guestrin, C., Singh, S.: Beyond accuracy: Behavioral testing
of NLP models with CheckList. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault,
J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics, ACL 2020, Online, July 5-10, 2020. pp. 4902–4912. Association
for Computational Linguistics (2020)
44. Sharma, A., Demir, C., Ngomo, A.C.N., Wehrheim, H.: MLCheck: Property-driven
testing of machine learning models. arXiv preprint arXiv:2105.00741 (2021)
45. Sharma, A., Wehrheim, H.: Testing machine learning algorithms for balanced data
usage. In: 2019 12th IEEE Conference on Software Testing, Validation and Verification
(ICST). pp. 125–135. IEEE (2019)
46. Sharma, A., Wehrheim, H.: Automatic fairness testing of machine learning models.
In: IFIP International Conference on Testing Software and Systems. pp. 255–271.
Springer (2020)
47. Sharma, A., Wehrheim, H.: Higher income, larger loan? Monotonicity testing of
machine learning models. In: Proceedings of the 29th ACM SIGSOFT International
Symposium on Software Testing and Analysis. pp. 200–210 (2020)
48. Soremekun, E., Udeshi, S., Chattopadhyay, S.: Astraea: Grammar-based fairness
testing. arXiv preprint arXiv:2010.02542 (2020)