Anthropic Launches Funding Program for New AI Benchmark Development

Anthropic introduces a funding program to develop new AI benchmarks focused on safety and societal impact.

Anthropic has launched a program to fund the creation of new benchmarks for evaluating AI models, including generative models such as its own Claude. The initiative will pay third-party organizations that can effectively measure advanced capabilities in AI models. Anthropic says the program addresses a growing need for high-quality, safety-focused evaluations that keep pace with rapid progress in AI. The company is particularly interested in benchmarks that assess a model's ability to carry out tasks with significant societal and security implications, such as conducting cyberattacks, enhancing weapons, and spreading misinformation. It also aims to support research into benchmarks that examine AI's potential in scientific research, multilingual communication, and bias mitigation, among other areas. The program will offer a range of funding options and give grantees access to Anthropic's domain experts. While the initiative's goals are laudable, its success will likely depend on how much funding and staffing Anthropic commits. Critics may also question the company's definitions of "safe" and "risky" AI, as well as its commercial motives. Despite these concerns, Anthropic hopes the program will help make comprehensive AI evaluation an industry standard.