Google's New Method Reduces LLM Training Data by 10,000x
Google has introduced a new active learning method that cuts the amount of training data needed to fine-tune large language models (LLMs) by up to 10,000 times, according to a recent Google blog post. The method targets tasks such as classifying unsafe ad content, which demand deep contextual and cultural understanding.
The scalable curation process identifies the most valuable examples for expert annotation. In experiments, Google cut the training set from 100,000 examples to fewer than 500 while improving the model's alignment with human experts by up to 65%. Larger models in production have seen even greater reductions, using up to four orders of magnitude less data while maintaining or improving quality.
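The two figures above describe different settings. A quick calculation shows that the experimental reduction is more than 200x, while "four orders of magnitude" is the 10,000x headline figure reported for the larger production models:

```latex
\frac{100{,}000}{500} = 200
\quad\Rightarrow\quad \text{experiments: more than } 200\times \text{ less data},
\qquad
10^{4} = 10{,}000
\quad\Rightarrow\quad \text{production: up to } 10{,}000\times \text{ less data}.
```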
The process begins with an initial zero- or few-shot LLM that labels ads as clickbait or benign. The examples are then clustered separately by predicted label; where clusters of the two labels overlap, the model is likely confused, so examples from those overlapping regions are sent to human experts for review. The expert-labeled set is used both to evaluate and to fine-tune the model, and the loop repeats until the model aligns with human experts or its improvement plateaus.
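The curation step can be illustrated with a small, simplified sketch. It assumes each ad already has an embedding vector and a label from the initial model, uses k-means as the clustering method, treats two clusters as overlapping when their bounding spheres intersect, and picks the closest clickbait/benign pair inside each overlap as a candidate for expert review. All of these choices, and the names `select_review_pairs`, `kmeans`, and `has_plateaued` (including its `tol`/`patience` stopping rule), are hypothetical illustrations, not Google's implementation.

```python
import math
from itertools import product

Point = tuple[float, ...]  # hypothetical: an ad's embedding vector


def centroid(cluster: list[Point]) -> Point:
    """Mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def radius(cluster: list[Point]) -> float:
    """Distance from the centroid to the farthest member."""
    c = centroid(cluster)
    return max(math.dist(p, c) for p in cluster)


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[list[Point]]:
    """Tiny deterministic k-means, seeded with the first k points."""
    k = min(k, len(points))
    centroids = list(points[:k])
    clusters: list[list[Point]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its members' mean (keep it if the cluster is empty).
        centroids = [centroid(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


def select_review_pairs(examples, k=2, budget=10):
    """examples: list of (embedding, predicted_label) with labels 'clickbait' or 'benign'.

    Returns up to `budget` (clickbait, benign) embedding pairs from overlapping
    clusters, closest pairs first.
    """
    clickbait = [emb for emb, label in examples if label == "clickbait"]
    benign = [emb for emb, label in examples if label == "benign"]
    pairs = []
    # Cluster each predicted label separately, then compare every cross-label cluster pair.
    for a, b in product(kmeans(clickbait, k), kmeans(benign, k)):
        # Assumed overlap test: the clusters' bounding spheres intersect.
        if math.dist(centroid(a), centroid(b)) <= radius(a) + radius(b):
            # The closest differently labeled pair is a likely point of model confusion.
            pairs.append(min(product(a, b), key=lambda ab: math.dist(*ab)))
    pairs.sort(key=lambda ab: math.dist(*ab))
    return pairs[:budget]


def has_plateaued(history: list[float], tol: float = 0.01, patience: int = 2) -> bool:
    """Stop once alignment has improved by less than `tol` for `patience` rounds in a row."""
    if len(history) <= patience:
        return False
    recent = history[-(patience + 1):]
    return all(later - earlier < tol for earlier, later in zip(recent, recent[1:]))
```

In a full loop, the selected pairs would go to experts, their labels would be used to evaluate and fine-tune the model, and a per-round alignment score would be appended to the history passed to `has_plateaued`. Selecting pairs across the label boundary, rather than sampling at random, is what concentrates the small annotation budget on the examples the model finds hardest.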
Google's method combines the LLM's ability to label large pools of examples with domain experts' judgment on the hardest cases, so the model can be retrained efficiently from a small set of carefully chosen examples. This is especially valuable in rapidly changing domains like ad safety, where a small set of high-fidelity labels can help overcome data bottlenecks.