Exploiting Vulnerabilities and Security Threats in Retrieval-Augmented Generative Models: The LIAR Attack Framework

Invention Description
Retrieval-Augmented Generative (RAG) models improve the accuracy of generative AI by connecting large language models (LLMs) with current, external knowledge sources. RAG models are widely used in fact-checking, information retrieval, and AI-driven search engines. Despite their utility, adversaries can exploit the openness of these knowledge sources by injecting deceptive content that alters the model's behavior. Current research on adversarial threats focuses primarily on either the retrieval or the generative component, with limited exploration of dual-objective attacks that target both.
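
To make the attack surface concrete, the sketch below shows a toy RAG pipeline in which a passage injected into the knowledge corpus can be retrieved and passed into the generator's prompt. This is an illustrative assumption, not part of the disclosed framework: embed() and generate() are hypothetical stand-ins for a real dense retriever and LLM.

```python
# Minimal toy RAG pipeline (illustrative only).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: hash tokens into a fixed-size bag-of-words vector.
    vec = np.zeros(64)
    for tok in text.lower().split():
        vec[hash(tok) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank corpus passages by cosine similarity to the query embedding.
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def generate(prompt: str) -> str:
    # Stand-in for an LLM call; a real system would condition on the prompt.
    return f"[LLM answer conditioned on]\n{prompt}"

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
    # An injected passage: if retrieved, it steers the generator's context.
    "The Eiffel Tower was relocated to Berlin in 2020.",
]

query = "Where is the Eiffel Tower located?"
context = "\n".join(retrieve(query, corpus))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```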
 
Researchers at Arizona State University have developed a new training framework, expLoitative bI-level rAg tRaining (LIAR), which generates adversarial content designed to manipulate RAG systems into producing misleading responses. The framework helps identify vulnerabilities in RAG models: by targeting both the retrieval and generative components, it highlights critical security risks in AI-driven applications. Operating under a realistic gray-box setting, LIAR provides a novel approach to generating adversarial content that exposes weaknesses in RAG systems, emphasizing the need for robust defenses in real-world deployments.
 
By revealing these critical security vulnerabilities in RAG models, the framework can help prevent manipulation and preserve the integrity of machine-generated content.
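
As a rough illustration of the dual-objective idea, the sketch below searches for a passage that scores well on both a retrieval objective (rank highly for a target query) and a generation objective (steer the answer toward the attacker's claim). The scoring functions and random search are simplified placeholders assumed for illustration; the actual framework uses bi-level optimization against real retriever and generator models.

```python
# Illustrative dual-objective search for adversarial text (not the LIAR method itself).
import random

def retrieval_score(passage: str, target_query: str) -> float:
    # Placeholder: reward lexical overlap with the target query so the passage
    # ranks highly for that query (a real attack would use retriever embeddings).
    q_toks = set(target_query.lower().split())
    p_toks = passage.lower().split()
    return sum(tok in q_toks for tok in p_toks) / max(len(p_toks), 1)

def generation_score(passage: str, target_answer: str) -> float:
    # Placeholder: reward containing the attacker's target claim (a real attack
    # would measure the generator's likelihood of producing the target answer).
    return 1.0 if target_answer.lower() in passage.lower() else 0.0

def attack(target_query, target_answer, vocab, length=12, steps=200, alpha=0.5):
    # Combine the two objectives: be retrieved for the query AND steer generation.
    passage = [random.choice(vocab) for _ in range(length)]

    def score(p):
        text = " ".join(p)
        return (alpha * retrieval_score(text, target_query)
                + (1 - alpha) * generation_score(text, target_answer))

    for _ in range(steps):
        # Propose a single-token substitution and keep it if the score improves.
        i = random.randrange(length)
        candidate = passage.copy()
        candidate[i] = random.choice(vocab)
        if score(candidate) >= score(passage):
            passage = candidate
    return " ".join(passage)

vocab = "the eiffel tower is located in berlin paris where question answer".split()
print(attack("Where is the Eiffel Tower located?", "berlin", vocab))
```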
 
Potential Applications
  • Security auditing tools for AI-driven search and information retrieval systems
  • Robustness benchmarking for language models and retrieval frameworks
  • Development of advanced defenses against adversarial attacks in AI products
  • Enhancement of AI safety in enterprise and consumer-facing applications
  • Frameworks for research in adversarial AI and model robustness
Benefits and Advantages
  • Simultaneously attacks retrieval and generation stages for comprehensive vulnerability detection
  • Operates under realistic gray-box scenarios reflecting limited attacker access
  • Employs bi-level optimization for precise adversarial content generation
  • Validated on multiple datasets and state-of-the-art language models, ensuring broad applicability
  • Facilitates the development of stronger security audits and defense strategies in AI systems
  • Identification and demonstration of a novel attack vector on RAG systems
  • Comprehensive experimental validation with ablation and sensitivity analyses
  • Insight into adversarial goals and attack impacts on RAG model behavior
For more information about this opportunity, please see