RealPerformance: A Dataset of Language Model Business Compliance Issues

David Berenstein

Jean-Marie John-Mathews

RealPerformance is a dataset of functional issues of language models, that mirrors failure patterns identified through rigorous testing in real LLM agents