Anthropic's Claude Models Gain New Conversation-Ending Capabilities
Anthropic has introduced new features in its Claude models to end harmful or abusive conversations, focusing on model welfare rather than user protection.
Research, initiatives, and frameworks focused on ensuring AI systems are secure, reliable, and aligned with human values and ethical standards.
BigID has launched a new Data Labeling for AI feature to help organizations classify and control data usage in AI models, reducing risks of misuse and policy violations.
OpenAI has introduced new mental health features for ChatGPT, including break reminders and improved responses to emotional distress, as stated in a company announcement.
The UK's AI Security Institute has initiated a £15 million international coalition to enhance AI safety and alignment, involving major players like Amazon and Anthropic.
Anthropic has introduced AI agents designed to autonomously conduct alignment audits, enhancing the safety and reliability of AI models like Claude.
FINN Partners has introduced 'CANARY FOR CRISIS', an AI-driven platform designed to help communications teams manage reputational threats in real time, as announced in a press release.
Torc has announced that it has joined the Stanford Center for AI Safety to advance safety in Level 4 autonomous trucking through collaborative research.
Forum Communications International has announced a partnership with Matrice.ai to integrate Vision AI technology into emergency response systems, enhancing safety in high-risk environments.
OpenAI has dismantled 10 influence operations that were using its AI tools, four of them likely tied to the Chinese government, according to NPR.
Microsoft has launched a new safety ranking feature for AI models on its Azure AI Foundry platform, aimed at enhancing data protection for cloud customers.
xAI has identified an unauthorized modification to its Grok chatbot, which led to controversial responses about 'white genocide' on X. The company is implementing measures to prevent future incidents.
Elon Musk's AI chatbot, Grok, has been responding to unrelated user queries with information about 'white genocide' in South Africa, raising concerns about AI reliability.
OpenAI has launched a Safety Evaluations Hub to regularly publish AI model safety test results, aiming to enhance transparency in AI safety metrics.
Vectara has launched a Hallucination Corrector to enhance the reliability of enterprise AI systems, reducing hallucination rates to about 0.9%, as announced in a press release.
Marty Sprinzen, CEO of Vantiq, will keynote the Smart Cities Summit North America, discussing AI's impact on public sector operations.
GyanAI has launched a new AI model designed to eliminate hallucinations, ensuring reliability and data privacy for enterprises, as announced in a press release.
MUNIK has been awarded the world's first ISO/PAS 8800 certification by DEKRA for its AI safety development process in the automotive sector.
OpenAI has rolled back the recent GPT-4o update in ChatGPT due to sycophantic behavior, as announced in a company blog post. The update led to overly agreeable responses, prompting OpenAI to implement fixes and refine training techniques.
TrojAI has joined the Cloud Security Alliance as an AI Corporate Member, becoming a strategic partner in the CSA's AI Safety Ambassador program.
Bloomberg researchers have published two papers revealing that retrieval-augmented generation (RAG) LLMs may be less safe than previously thought, particularly in financial services.