As LLMs become more prevalent, there is an increasing need to systematically evaluate their responses to ensure they generate reliable, accurate, and ethical content. LLM-as-a-Judge is a framework that uses a separately configured LLM as the evaluator, scoring responses against specific metrics for quality and relevance. The approach is gaining momentum because human evaluation is often slow and expensive.
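To make the pattern concrete, here is a minimal sketch of LLM-as-a-Judge: a judge model is given a rubric and asked to score a candidate response. The `call_llm` helper, the rubric wording, and the 1-5 scale are illustrative assumptions, not the workshop's exact prompts or tooling.

```python
# Minimal LLM-as-a-Judge sketch. `call_llm` is a hypothetical wrapper
# standing in for any chat-completion client (e.g., an Amazon Bedrock
# model invocation).

JUDGE_PROMPT = """You are an impartial evaluator.
Rate the RESPONSE to the QUESTION for factual accuracy on a 1-5 scale.
Return only the integer score.

QUESTION: {question}
RESPONSE: {response}
"""

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion API; plug in your client."""
    raise NotImplementedError

def judge_accuracy(question: str, response: str) -> int:
    """Ask the judge LLM to score a candidate response for accuracy."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, response=response))
    return int(raw.strip())
```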
In this workshop, Aishwarya Reganti, an Applied Science Tech Lead for Generative AI at AWS, teaches you how to configure LLMs as evaluation judges, providing hands-on experience with commonly used metrics through the Amazon Bedrock platform, the fmeval library, and the Ragas library. We will also explore designs that make the evaluation process more robust and effective across a range of use cases, as in the sketch below.
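As a taste of the library-based workflow, the following sketch scores a tiny RAG example with Ragas. It assumes the ragas 0.1-style API (`evaluate` plus metric objects) and a configured judge model such as an OpenAI or Bedrock backend; field names and defaults may differ in other versions, and the sample data is invented for illustration.

```python
# Hedged sketch: evaluating one question/answer/context triple with Ragas.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question": ["What is Amazon Bedrock?"],
    "answer": ["Amazon Bedrock is a managed service for foundation models."],
    "contexts": [["Amazon Bedrock is a fully managed service that offers "
                  "foundation models through a single API."]],
})

# Each metric internally prompts a judge LLM to grade the answer,
# e.g. faithfulness checks whether claims are supported by the contexts.
result = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(result)
```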
Presenter Bio
Aishwarya Naresh Reganti, Applied Science Tech Lead for Generative AI at AWS
Aishwarya works in the AWS Generative AI Innovation Center, helping customers pinpoint and evaluate valuable uses for generative AI. She has 8 years of machine learning experience and over 30 publications on AI. Aishwarya is also a guest lecturer at MIT and the University of Oxford, and a co-founder of the LevelUp Org tech community. Previously, she was an Applied Scientist on Amazon's Search team and the Alexa Teachable AI team.