Google Releases LLM-Evalkit for Structured Prompt Engineering on Vertex AI

October 21, 2025
Google has introduced LLM-Evalkit, an open-source framework built on Vertex AI SDKs that centralizes prompt engineering workflows. The tool enables teams to measure performance systematically using objective metrics and a no-code interface.

According to a Google Cloud blog post, the framework is designed to organize and measure prompt engineering for large language models. It provides a unified environment where teams can create, test, version, and benchmark prompts using consistent evaluation metrics.

LLM-Evalkit consolidates previously scattered workflows by combining prompt creation, testing, and comparison into a single interface. It allows teams to define specific tasks, assemble representative datasets, and evaluate outputs against objective benchmarks. This standardized approach replaces guesswork with measurable performance data.
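To make the workflow concrete, here is a minimal Python sketch of the general idea: run several prompt versions over the same small dataset and score each with one objective metric. It does not use LLM-Evalkit or the Vertex AI SDK. The dataset, the prompt templates, the `fake_model` stand-in, and the exact-match metric are all hypothetical and chosen only for illustration; a real setup would call an actual model and would likely use richer metrics.

```python
from typing import Callable

# Hypothetical labeled dataset for a sentiment-classification task.
DATASET = [
    {"input": "The battery died after one day.", "expected": "negative"},
    {"input": "Great screen and fast shipping.", "expected": "positive"},
    {"input": "It broke and I want a refund.", "expected": "negative"},
    {"input": "Works exactly as described.", "expected": "positive"},
]

# Two hypothetical prompt versions to compare.
PROMPTS = {
    "v1": "Classify the sentiment of this review: {input}",
    "v2": "Answer with exactly one word, positive or negative. Review: {input}",
}


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call (an assumption, not a real API)."""
    text = prompt.lower()
    negative_cues = ("died", "broke", "refund")
    label = "negative" if any(cue in text for cue in negative_cues) else "positive"
    # Mimic a model that only answers tersely when told to use one word.
    return label if "one word" in text else f"The sentiment is {label}."


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected label."""
    return 1.0 if output.strip().lower() == expected else 0.0


def evaluate(template: str, dataset: list[dict], model: Callable[[str], str]) -> float:
    """Return the mean exact-match score of one prompt template over the dataset."""
    scores = [
        exact_match(model(template.format(input=row["input"])), row["expected"])
        for row in dataset
    ]
    return sum(scores) / len(scores)


# Score every prompt version on the same data and pick the best one.
results = {name: evaluate(tpl, DATASET, fake_model) for name, tpl in PROMPTS.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

In this toy setup the stricter prompt wins because the metric rewards output that matches the expected label exactly. That is the kind of difference a shared dataset and a fixed metric make visible, and that ad hoc spot checks can miss.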

The framework also includes a no-code interface, making it accessible to non-technical users such as product managers and UX writers. By enabling collaboration across disciplines, it helps teams iterate on prompt design more efficiently.

LLM-Evalkit is available as an open-source project on GitHub and integrates directly with Google Cloud tools. New users can explore it using the $300 trial credit offered through Google Cloud.

LLM-Evalkit aims to streamline prompt engineering by providing a structured, data-driven workflow within the Vertex AI ecosystem.
