Understanding the Shift in AI Evaluation
Hanna Wallach’s research at Microsoft highlights a significant transformation in how AI models are assessed. Initially, evaluation focused on straightforward tasks such as image recognition or speech transcription, where accuracy against a ground truth is easy to define. With the rise of generative AI, however, evaluation has become far more intricate. Wallach’s work now centers on understanding risks tied to social concepts such as fairness and psychological safety, which have no single agreed-upon quantitative definition.
Key Insights on AI Risk Measurement
- Wallach’s team merges social science insights with technical AI understanding.
- They analyze risks identified through customer feedback and internal testing teams.
- The team addresses issues such as unfair stereotypes in AI outputs, aiming for assessments that capture these harms systematically rather than anecdotally.
- They employ a method called “systematization” to define and measure risks, using annotation techniques for evaluation.
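The details of this annotation pipeline aren’t spelled out here, but the core idea of annotation-based measurement can be sketched: several annotators give binary judgments on each model output, and those judgments are aggregated into a prevalence estimate plus a simple agreement check. The function names, sample labels, and the majority-vote rule below are illustrative assumptions, not Wallach’s actual method.

```python
from itertools import combinations

def risk_prevalence(labels):
    """Fraction of outputs that a majority of annotators flagged as risky."""
    flagged = sum(1 for votes in labels if sum(votes) > len(votes) / 2)
    return flagged / len(labels)

def pairwise_agreement(labels):
    """Mean fraction of annotator pairs that agree, averaged over outputs."""
    scores = []
    for votes in labels:
        pairs = list(combinations(votes, 2))
        scores.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores)

# Hypothetical annotations: each inner list holds three annotators'
# binary judgments (1 = output contains an unfair stereotype).
annotations = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

print(risk_prevalence(annotations))     # 0.5
print(pairwise_agreement(annotations))  # ~0.667
```

Reporting agreement alongside prevalence matters: a low agreement score signals that the risk definition itself is not yet systematized well enough for the measurement to be trusted.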
The Importance of Responsible AI
This approach to AI risk measurement is crucial for creating safer technology. By addressing social implications, the team helps ensure that AI systems do not perpetuate harmful biases. Their work not only informs engineering decisions but also guides policy-making within Microsoft. This holistic view is essential for the responsible deployment of AI, ultimately fostering trust and safety for users.