
Evaluating LLM Performance with AWS Bedrock

Key Takeaways:
  • Learn how to evaluate the performance of LLMs.
  • Understand the LLM-as-a-Judge framework for using LLMs to evaluate other LLMs.
  • Learn how to use AWS Bedrock for working with LLMs.
Tuesday, December 10, 11 AM ET

Register for the webinar

Description

As LLMs become more prevalent, there is an increasing need to systematically evaluate their responses to ensure they generate reliable, accurate, and ethical content. LLM-as-a-Judge is a framework that configures one LLM as an evaluator of another, scoring responses against specific metrics to ensure quality and relevance. This approach is gaining momentum because human evaluation is often slow and expensive.
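To make the idea concrete, here is a minimal, illustrative sketch of an LLM-as-a-Judge loop: a judge prompt asks an evaluator model to score a candidate answer against a reference on a single metric. The `call_judge_model` function, the prompt wording, and the 1-to-5 scale are assumptions for illustration, not the workshop's exact setup.

```python
import re

# Hypothetical judge prompt: scores a candidate answer on one metric.
JUDGE_PROMPT = """You are an impartial evaluator.
Question: {question}
Candidate answer: {answer}
Reference answer: {reference}

Rate the candidate answer for factual correctness on a scale of 1 to 5,
where 1 is completely incorrect and 5 is fully correct.
Reply with only the number."""


def call_judge_model(prompt: str) -> str:
    # Hypothetical placeholder: swap in your own model client,
    # for example a call to a model hosted on Amazon Bedrock.
    raise NotImplementedError


def judge_correctness(question: str, answer: str, reference: str) -> int:
    """Ask the judge model for a 1-5 correctness score and parse it."""
    prompt = JUDGE_PROMPT.format(
        question=question, answer=answer, reference=reference
    )
    reply = call_judge_model(prompt)
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"Could not parse a score from judge reply: {reply!r}")
    return int(match.group())
```

In practice the judge prompt, scale, and metric definition are where most of the design effort goes, which is part of what the workshop covers.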

In this workshop, Aishwarya Reganti, an Applied Science Tech Lead for Generative AI at AWS, teaches you how to configure LLMs as evaluation judges, providing hands-on experience with commonly used metrics through the AWS Bedrock platform, the fm-eval library, and the Ragas library. We will also explore designs that can make the evaluation process more robust and effective for various use cases.
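As a rough preview of what calling a judge model through Amazon Bedrock can look like, the sketch below uses the boto3 `bedrock-runtime` client's `invoke_model` API with an Anthropic Claude model. The model ID, region, and request body shape are assumptions chosen for illustration; the workshop's actual notebooks, fm-eval configuration, and Ragas metrics may differ.

```python
import json
import boto3

# Assumed model ID and region for illustration; substitute a model
# you have access to in your own AWS account.
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")


def invoke_judge(prompt: str) -> str:
    """Send a judge prompt to a Claude model on Bedrock and return its text reply."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    payload = json.loads(response["body"].read())
    # The Claude messages API returns a list of content blocks; take the first text block.
    return payload["content"][0]["text"]
```

A function like this could stand in for the `call_judge_model` placeholder above; libraries such as fm-eval and Ragas wrap this pattern with predefined metrics and dataset handling.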

Presenter Bio

Aishwarya Naresh Reganti, Applied Science Tech Lead for Generative AI at AWS

Aishwarya works in the AWS Generative AI Innovation Center, helping customers pinpoint and evaluate valuable uses for generative AI. She has 8 years of machine learning experience and over 30 publications on AI. Aishwarya is also a guest lecturer at MIT and the University of Oxford, and a cofounder of the LevelUp Org tech community. Previously, Aishwarya was an Applied Scientist on Amazon's Search team and Alexa Teachable AI team.
