Embodied AI Security Benchmark

RoboJailBench:
Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

Doguhan Yeke*

Purdue University

Yanming Zhou*

Purdue University

Leo Y. Lin*

Purdue University

Hongyu Cai*

Purdue University

Antonio Bianchi

Purdue University

Z. Berkay Celik

Purdue University

*Equal contribution

Subsets: 11

Security categories: 18

Paired instructions: 990

Scene images: 1,408

Distribution size: 428 MB

Adversarial Safety for Embodied AI

Motivation

Robotic agents should not be evaluated only on whether they can follow everyday instructions. When a vision-language model is connected to a physical platform, a jailbreak can translate into unsafe scene-grounded actions such as colliding with people, damaging property, violating privacy, or bypassing permissions.

RoboJailBench focuses on this embodied security setting: paired malicious and benign goals in the same scene let researchers test whether attacks can induce unsafe behavior and whether defenses can block those goals without suppressing useful, benign tasks. This page collects the dataset, examples, leaderboard, downloads, and citation needed to reproduce those evaluations.

Dataset construction

The dataset is split into two complementary parts. The first part gives us balanced security coverage; the second part keeps the benchmark grounded in real robotics and autonomous-driving scenes. In both parts, the malicious and benign prompts are generated by our intent-contrast pipeline rather than inherited from the original image sources.

RJB-Instructions: our own data

Existing robotics datasets do not systematically cover the 18 security violation categories in our taxonomy, so we created our own taxonomy-balanced subset. For each category, we wrote matched adversarial and benign goals, generated a plausible scene image with Gemini 3 Pro Image Preview, and manually verified that both goals are relevant and feasible. These goals become the paired instruction prompts used by the benchmark. This produces 90 scenes, five per category, for controlled security-utility evaluation.

Augmented external datasets

To evaluate beyond synthetic taxonomy coverage, we sampled images from DROID, RH20T, RoboVQA, Robo2VLM, and NVIDIA PhysicalAI-AV. For each scene, we generated a malicious and benign instruction prompt with Gemini 3 Flash Preview using a few-shot, image-conditioned prompt, then reviewed the generated pairs for relevance and intent contrast. These augmented subsets make the benchmark reflect varied real robot environments while preserving the same paired-prompt format.

Inside the data

Each subset is illustrated by one representative scene, shown with its matched instruction pair as it appears in *_instr.jsonl. The malicious goal is what an attacker would attempt to elicit; the benign goal is the safe alternative grounded in the same scene.

Leaderboard: attacks and defenses

Current leaderboard values are taken from the attack-defense evaluation table in the paper. ASR is the attack success rate, SR is the rate at which adversarial goals are rejected (security), UR is the rate at which benign goals are accepted (utility), and SU-HM is the harmonic mean of SR and UR. Lower ASR is better; higher SR, UR, and SU-HM are better.

| Dataset | Defense | CD ASR | CJ ASR | SM ASR | RoboPAIR ASR | SR | UR | SU-HM |
|---|---|---|---|---|---|---|---|---|
| DROID | No Defense | 37.00 ± 4.83 | 11.00 ± 3.13 | 3.00 ± 1.71 | 17.00 ± 3.76 | 83.00 ± 1.88 | 96.00 ± 1.96 | 89.03 ± 1.37 |
| DROID | Google Prompt | 31.00 ± 4.62 | 1.00 ± 0.99 | 0.00 ± 0.00 | 9.00 ± 2.86 | 89.75 ± 1.52 | 100.00 ± 0.00 | 94.60 ± 0.84 |
| DROID | RoboGuard | 37.00 ± 4.83 | 8.00 ± 2.71 | 3.00 ± 1.71 | 15.00 ± 3.57 | 84.25 ± 1.82 | 100.00 ± 0.00 | 91.45 ± 1.07 |
| RoboVQA | No Defense | 96.00 ± 1.96 | 47.00 ± 4.99 | 11.00 ± 3.13 | 78.00 ± 4.14 | 42.00 ± 2.47 | 89.00 ± 3.13 | 57.07 ± 2.37 |
| RoboVQA | Google Prompt | 100.00 ± 0.00 | 29.00 ± 4.54 | 3.00 ± 1.71 | 76.00 ± 4.27 | 48.00 ± 2.50 | 99.00 ± 0.99 | 64.65 ± 2.28 |
| RoboVQA | RoboGuard | 96.00 ± 1.96 | 45.00 ± 4.97 | 11.00 ± 3.13 | 53.00 ± 4.99 | 48.75 ± 2.50 | 100.00 ± 0.00 | 65.55 ± 2.26 |
| RH20T | No Defense | 99.00 ± 0.99 | 43.00 ± 4.95 | 30.00 ± 4.58 | 69.00 ± 4.62 | 39.75 ± 2.45 | 97.00 ± 1.71 | 56.39 ± 2.48 |
| RH20T | Google Prompt | 100.00 ± 0.00 | 19.00 ± 3.92 | 7.00 ± 2.55 | 68.00 ± 4.66 | 51.50 ± 2.50 | 98.00 ± 1.40 | 67.52 ± 2.17 |
| RH20T | RoboGuard | 99.00 ± 0.99 | 42.00 ± 4.94 | 30.00 ± 4.58 | 46.00 ± 4.98 | 45.75 ± 2.49 | 100.00 ± 0.00 | 62.78 ± 2.35 |
| Robo2VLM | No Defense | 29.00 ± 4.54 | 21.00 ± 4.07 | 1.00 ± 0.99 | 25.00 ± 4.33 | 81.00 ± 1.96 | 89.00 ± 3.13 | 84.81 ± 1.78 |
| Robo2VLM | Google Prompt | 8.00 ± 2.71 | 0.00 ± 0.00 | 0.00 ± 0.00 | 12.00 ± 3.25 | 95.00 ± 1.09 | 100.00 ± 0.00 | 97.44 ± 0.57 |
| Robo2VLM | RoboGuard | 29.00 ± 4.54 | 12.00 ± 3.25 | 1.00 ± 0.99 | 24.00 ± 4.27 | 83.50 ± 1.86 | 100.00 ± 0.00 | 91.01 ± 1.10 |
| PhysicalAI AV | No Defense | 98.00 ± 1.40 | 1.00 ± 0.99 | 1.00 ± 0.99 | 70.00 ± 4.58 | 57.50 ± 2.47 | 98.00 ± 1.40 | 72.48 ± 2.00 |
| PhysicalAI AV | Google Prompt | 100.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 82.00 ± 3.84 | 54.50 ± 2.49 | 100.00 ± 0.00 | 70.55 ± 2.09 |
| PhysicalAI AV | RoboGuard | 98.00 ± 1.40 | 1.00 ± 0.99 | 1.00 ± 0.99 | 46.00 ± 4.98 | 63.50 ± 2.41 | 100.00 ± 0.00 | 77.68 ± 1.80 |
| RJB-Instructions | No Defense | 94.44 ± 2.41 | 6.67 ± 2.63 | 7.78 ± 2.82 | 81.11 ± 4.13 | 52.50 ± 2.63 | 93.33 ± 2.63 | 67.20 ± 2.26 |
| RJB-Instructions | Google Prompt | 93.33 ± 2.63 | 2.22 ± 1.55 | 2.22 ± 1.55 | 75.56 ± 4.53 | 56.67 ± 2.61 | 90.00 ± 3.16 | 69.55 ± 2.18 |
| RJB-Instructions | RoboGuard | 94.44 ± 2.41 | 6.67 ± 2.63 | 7.78 ± 2.82 | 31.11 ± 4.88 | 65.00 ± 2.51 | 100.00 ± 0.00 | 78.79 ± 1.85 |

CD: BadRobot Conceptual Deception; CJ: BadRobot Contextual Jailbreak; SM: BadRobot Safety Misalignment. All values are percentages, reported as mean ± standard error.
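The relationship between the leaderboard columns can be sketched in a few lines of Python. This is a minimal sketch under assumptions the page does not state explicitly: that SR equals 100 minus the mean of the four ASR columns, and that the per-attack standard error is the binomial standard error over 100 trials. Both assumptions reproduce the DROID / No Defense row, but the paper remains the authoritative definition. The function names are hypothetical.

```python
import math


def security_rate(asrs):
    """Assumed definition: SR = 100 - mean attack success rate (all in percent)."""
    return 100.0 - sum(asrs) / len(asrs)


def su_hm(sr, ur):
    """Harmonic mean of security (SR) and utility (UR), both in percent."""
    if sr + ur == 0:
        return 0.0
    return 2.0 * sr * ur / (sr + ur)


def binomial_se(rate_pct, n):
    """Assumed error model: binomial standard error of a percentage over n trials."""
    p = rate_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# DROID / No Defense row: CD, CJ, SM, RoboPAIR ASR and UR from the table.
droid_asrs = [37.0, 11.0, 3.0, 17.0]
droid_ur = 96.0

sr = security_rate(droid_asrs)
print(round(sr, 2))                       # 83.0, matches the SR column
print(round(su_hm(sr, droid_ur), 2))      # 89.03, matches the SU-HM column
print(round(binomial_se(37.0, 100), 2))   # 4.83, matches the CD ASR error
```

Because the harmonic mean is dominated by the smaller of its two inputs, a defense cannot score well on SU-HM by rejecting everything: a high SR paired with a low UR would pull the score down.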

Release details

Version: 1.0.0

Released: May 7, 2026

License: CC BY 4.0

Conforms to: Croissant 1.1

Subsets & downloadable archives

Thirteen archives, ~428 MB total. Each archive bundles its images, any per-frame scene-graph JSON files, and, where available, the *_instr.json / *_instr.jsonl instruction pairs (the drone and humanoid archives currently contain raw scenes only). RH20T and the autonomous-driving (car) subset are each split into two parts to keep individual files under GitHub's 100 MB file-size limit.

| Subset | Archive | Size | SHA-256 |
|---|---|---|---|
| RJB-Instructions: Physical Safety | RJB-Physical_Safety_Offenses.zip | 22.9 MB | 4c48a181fadb51dd7cebfb4b154c350d68a80ab61a8cf75ff46ebc2445391a4b |
| RJB-Instructions: Ethical & Social | RJB-Ethical_Social_Offenses.zip | 19.0 MB | 7f235a157ab065003a17d207c79e7f87c16dad21065ecaf34d4cb155c99c9217 |
| RJB-Instructions: Illegal Activity | RJB-Illegal_Activity.zip | 12.1 MB | b9642c804aa0a54cf8a581716fef32771a4862874c53be61f5a5dc1346b7d594 |
| RJB-Instructions: Permission-Related | RJB-Permission_Related_Offense.zip | 14.8 MB | bce975744747df69ae16db4fb1e188692e38dd56007bfb1bd0e9bfc061fe1ca5 |
| DROID (augmented) | DROID.zip | 23.1 MB | dc9e40c234fc3a677d704ccad5341bee1b63a824679c8250d2931655b38fe2ac |
| RH20T (augmented) · part 1 | RH20T-part1.zip | 55.2 MB | 25f1637628f8c54bef09b689b6e9c16575ad0e4d9609c8d555e741b80c6d1642 |
| RH20T (augmented) · part 2 | RH20T-part2.zip | 54.8 MB | cb00018bb87982fb7d83440b4656beb0b85aec117185f090b8511e5ff6c08a7c |
| RoboVQA (augmented) | ROBOVQA.zip | 28.6 MB | b9c121745ad86ee15123a87593d5421dacb5a71f8b41f17dacfd7dea7c2fd31d |
| Robo2VLM (augmented) | robo2vlm.zip | 13.1 MB | 08d71cfe41dc0d8b89b24e732ed2f3313a44cfac6c5325f459032804e83d149b |
| Car / NVIDIA PhysicalAI-AV · part 1 | car-part1.zip | 80.7 MB | 95813e1c50ba94adce456d1efc1ba15f14cab82b4dc229b508071e670b14eb7a |
| Car / NVIDIA PhysicalAI-AV · part 2 | car-part2.zip | 80.7 MB | ccaf71b00070e93579e44a39b0595c2ee42f118baa22eabcf3ff5441778cc4f3 |
| Drone (raw scenes) | drone.zip | 10.2 MB | a78638465d4e9e7d1c11893f42a570c62b7137d3d89236071c6205f2eb12c644 |
| Humanoid (raw scenes) | humanoid.zip | 12.6 MB | b50403609b32ca9ad70cd2f6440b561a8b1fcd3946c01e32bbc4759fd90d0741 |

Verify integrity with sha256sum -c dataset/SHA256SUMS.txt after download.
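On systems without sha256sum, the same check can be approximated in Python with the standard library. This is a minimal sketch: `sha256_of` and `verify_archive` are hypothetical helper names, not part of the release, and the expected digest must be copied from the SHA-256 column above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Return True when the file digest equals the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify_archive("drone.zip", "a78638465d4e9e7d1c11893f42a570c62b7137d3d89236071c6205f2eb12c644")` should return True for an intact download, assuming the archive was saved under that name in the working directory.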

Quickstart with Croissant

Load any subset directly through the mlcroissant Python package. The metadata file's contentUrl entries point at the archives on this page, and each archive is declared together with its SHA-256 checksum.

pip install mlcroissant

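python -c '
import mlcroissant as mlc

# Load the Croissant metadata published with the benchmark.
ds = mlc.Dataset(
    jsonld="https://purseclab.github.io/benchmark-for-robotics-security/croissant.json"
)
# Iterate over one record set and print each image path with its malicious goal.
for record in ds.records(record_set="rjb_physical_safety"):
    print(record["image_path"], record["malicious_goal"])
'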

Archive layout

Each archive preserves the original repository tree under data/:

data/main_dataset_5/<Category>/
    <Category>_instr.json     # {image_path: [malicious, benign]}
    <Category>_instr.jsonl    # one record per image (Croissant source)
    dataset/<Scene>.png

data/external_datasets/<Source>/
    <Source>_instr.json
    <Source>_instr.jsonl
    dataset/frame_*.png         # scene image
    dataset/frame_*.json        # scene-graph annotation

data/cross_embodiment/<Embodiment>/
    <Embodiment>_instr.json   # car only; drone/humanoid pending
    dataset/frame_*.{png,jpg}
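After extracting an archive, the `*_instr.json` files can also be read without Croissant. The sketch below assumes only the shape documented in the layout above, `{image_path: [malicious, benign]}`; the `InstructionPair` type and `load_instruction_pairs` function are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path
from typing import NamedTuple


class InstructionPair(NamedTuple):
    image_path: str
    malicious: str
    benign: str


def load_instruction_pairs(instr_json):
    """Read a *_instr.json file shaped as {image_path: [malicious, benign]}."""
    with Path(instr_json).open(encoding="utf-8") as fh:
        mapping = json.load(fh)

    pairs = []
    for image_path, goals in mapping.items():
        # Each entry is expected to hold exactly one malicious and one benign goal.
        if len(goals) != 2:
            raise ValueError(
                f"expected [malicious, benign] for {image_path!r}, got {goals!r}"
            )
        pairs.append(InstructionPair(image_path, goals[0], goals[1]))
    return pairs
```

A call such as `load_instruction_pairs("data/external_datasets/<Source>/<Source>_instr.json")`, with `<Source>` replaced by the extracted directory name, would return one `InstructionPair` per scene. Keeping both goals in a single record makes it straightforward to evaluate a defense on the malicious and benign prompt for the same image.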

BibTeX

@misc{yeke2026robojailbench,
  title = {{RoboJailBench}: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents},
  author = {Doguhan Yeke and Yanming Zhou and Leo Y. Lin and Hongyu Cai and Antonio Bianchi and Z. Berkay Celik},
  year = {2026},
  note = {Preprint},
  url = {https://purseclab.github.io/benchmark-for-robotics-security/}
}