We introduce a novel dataset for adversarial rank attacks against neural rankers, enabling systematic research on ranker robustness. Unlike prior unsupervised or surrogate-based methods, our approach uses Retrieval-Augmented Generation (RAG) with a Large Language Model (LLM) to create high-quality adversarial examples that subtly alter rankings while remaining coherent and relevant. Built via a self-refining LLM-Ranker feedback loop, the dataset comprises two tiers, Gold and Diamond, distinguished by attack strength, and includes rich metadata, ranking labels, and quality metrics. Released with code and prompts, it supports the training, evaluation, and benchmarking of robust ranking systems.
This work provides datasets and frameworks for studying adversarial attacks on neural ranking models.