Arena is a community-powered platform for evaluating AI model performance based on real-world usage. Tens of millions of builders, researchers, and creative professionals have used the platform to test frontier models and provide feedback on outputs, and millions are active each month. This human-in-the-loop approach informs Arena's public leaderboards, which rank models on performance and reliability grounded in practical use rather than controlled benchmarks alone.
The platform addresses a core challenge in AI development: understanding how state-of-the-art models perform outside laboratory conditions. Leading enterprises and AI research labs rely on Arena's evaluations to assess model reliability and alignment. The leaderboards, shaped by feedback from a global user base across creative and technical domains, have become influential in industry discussions about AI progress.
Arena was created by researchers from UC Berkeley. The company focuses on transparency and rigor in model evaluation, positioning human feedback and real-world usage as essential to measuring AI advancement. The platform serves as both a testing ground for frontier models and a data source for understanding their practical capabilities and limitations.