Exploiting Class Probabilities for Black-Box Sentence-Level Attacks

Background

Text classification models are increasingly prevalent in cybersecurity applications but remain susceptible to adversarial examples: sentences carefully crafted with changes that a human reader would not notice, yet that cause the classifier to misclassify the input. Adversarial attacks expose classifiers' vulnerabilities and are key to reinforcing their robustness and reliability.

Depending on the information available to the adversary, attacks can be conducted under black-box settings, in which the adversary can access only the classifier's feedback to queries. This setting is the most realistic for real-world applications, since the adversary needs no prior knowledge of the classifier. A score-based black-box sentence-level attack, one that uses the class probabilities the classifier returns, is needed to gauge the extent of the threat to text classification models and to better harden them against attacks across black-box settings.
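To make the notion of score-based feedback concrete, the sketch below contrasts what an adversary sees when a classifier returns class probabilities with what it sees when only the predicted label is returned. The toy classifier, its weights, and the names predict_proba and predict_label are hypothetical stand-ins for illustration and are not part of the technology described here.

```python
import math

# Hypothetical toy sentiment classifier; weights are illustrative only.
TOY_WEIGHTS = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0}

def predict_proba(sentence):
    """Score-based feedback: return [P(negative), P(positive)]."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in sentence.lower().split())
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]

def predict_label(sentence):
    """Label-only feedback: return just the index of the top class."""
    probs = predict_proba(sentence)
    return probs.index(max(probs))

original = "the movie was great"
candidate = "the movie was good"

# Both sentences receive the same label, so label-only feedback
# cannot tell the adversary which one is closer to being misclassified.
same_label = predict_label(original) == predict_label(candidate)

# The class probabilities do: the candidate lowers P(positive).
drop = predict_proba(original)[1] - predict_proba(candidate)[1]
```

In this toy setting the probability drop gives the adversary a graded signal to follow even though the predicted label has not yet changed.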

Invention Description

Researchers at Arizona State University have developed a novel black-box, sentence-level attack leveraging classifier class probabilities to craft stronger adversarial text examples. This technology models adversarial sentence candidates as continuous distributions, enabling efficient search guided by rich class probability information. Extensive evaluations demonstrate superior attack success across multiple classifiers and benchmark datasets, highlighting the practical importance of utilizing class probabilities for robust adversarial attack generation in real-world text classification systems.
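The general idea of replacing a discrete candidate search with continuous parameter optimization can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' method: the toy classifier, the slot-based candidate space, and the score-function (REINFORCE-style) update are all hypothetical choices made for the example. Continuous logits define a distribution over candidate sentences, and the returned class probabilities are the only feedback used to update them.

```python
import math
import random

# Hypothetical toy classifier standing in for a black-box service.
TOY_WEIGHTS = {"loved": 1.5, "liked": 0.7, "sat": -0.4,
               "great": 1.5, "decent": 0.3, "so-called": -0.8}

def predict_proba(sentence):
    """Return [P(negative), P(positive)] for a sentence."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in sentence.lower().split())
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]

# Hypothetical candidate space: choosing one phrase per slot yields a sentence.
SLOTS = [["I loved", "I liked", "I sat through"],
         ["this great", "this decent", "this so-called"],
         ["film", "movie", "flick"]]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def search(true_class=1, iterations=20, samples=8, lr=1.0, seed=0):
    rng = random.Random(seed)
    # Continuous parameters: one logit per phrase per slot.
    theta = [[0.0] * len(slot) for slot in SLOTS]
    original = " ".join(slot[0] for slot in SLOTS)
    best_sentence = original
    best_prob = predict_proba(original)[true_class]
    for _ in range(iterations):
        probs = [softmax(row) for row in theta]
        batch = []
        for _ in range(samples):
            # Sample a whole sentence from the current distribution.
            picks = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
            sentence = " ".join(SLOTS[i][k] for i, k in enumerate(picks))
            p_true = predict_proba(sentence)[true_class]  # one query
            batch.append((picks, 1.0 - p_true))  # reward: lower true-class prob
            if p_true < best_prob:
                best_sentence, best_prob = sentence, p_true
        # Score-function gradient estimate with a mean-reward baseline.
        baseline = sum(reward for _, reward in batch) / samples
        for picks, reward in batch:
            for i, k in enumerate(picks):
                for j in range(len(theta[i])):
                    grad = (1.0 if j == k else 0.0) - probs[i][j]
                    theta[i][j] += lr * (reward - baseline) * grad / samples
    return best_sentence, best_prob, theta

best_sentence, best_prob, theta = search()
```

The search never inspects the classifier's weights; it only calls predict_proba, which mirrors the score-based black-box setting. A realistic attack would also need to constrain candidates to preserve the original meaning, which this toy example does not attempt.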

Potential Applications:

Security testing & robustness evaluation for online text classification services
Development of more resilient natural language processing models against adversarial attacks
Enhancement of AI safety & trustworthiness in text-based AI applications

Benefits and Advantages:

Effective – uses class probabilities for black-box sentence-level attacks
Widely applicable – can be used with a variety of classifiers & benchmark datasets
Improves robustness & reliability – provides specific insights to enhance text classifiers
Delivers stronger, more successful attacks – fully exploits classifier feedback
Improved search parameters – transforms discrete adversarial candidate search into continuous parameter optimization
