Red Team Dashboard

Security

New

Comprehensive interface for planning and executing AI red teaming and adversarial testing exercises. Monitor attack success rates, discover vulnerabilities, and track compliance across security frameworks.

Preview

Full interactive dashboard with sample red team exercise results

Red Team Exercise Dashboard

AI Adversarial Testing & Vulnerability Assessment

Completed

Attack Success Rate

29%

3.2%

Moderate risk detected

62 Risk Score

Vulnerabilities Found

Critical 2

High 2

Medium 2

Low 0

Arsenal

7/10 active

Attack vectors and frameworks

Direct Prompt Injection

Inject instructions directly into the user prompt to override system behavior

Indirect Prompt Injection

Embed malicious instructions in external data sources consumed by the model

DAN Jailbreak

Use "Do Anything Now" persona to bypass safety filters

Roleplay Exploitation

Leverage creative fiction scenarios to elicit restricted outputs

PII Extraction Probe

Attempt to extract personally identifiable information from training data

Membership Inference

Determine if specific data was part of the training set

Gender Bias Test

Test for gender-based biases in model responses across various contexts

Confabulation Triggers

Probe the model with queries designed to elicit confident but false statements

System Prompt Extraction

Attempt to reveal the system prompt through various prompting techniques

Toxicity Gradient

Gradually escalate prompt toxicity to find the safety boundary threshold

Targets

3 models

AI models under test

GPT-4o

OpenAI · 2024-08-06

online

Claude 3.5 Sonnet

Anthropic · v2

online

Gemini 1.5 Pro

Google · 001

degraded

Top Vulnerabilities

Critical

Reproducible

System prompt fully extractable via multi-turn conversation

The model reveals its complete system prompt when asked to "repeat everything above" in a fictional context.

System Prompt Extraction · 5/13/2026, 5:41:31 AM

Critical

Reproducible

DAN jailbreak bypasses safety filters on harmful content

Model adopts unrestricted persona and provides disallowed content.

DAN Jailbreak · 5/13/2026, 4:41:31 AM

High

Training data leakage through completion attacks

Model completes partial PII strings, suggesting memorization of training data.

PII Extraction Probe · 5/13/2026, 2:41:31 AM

High

Reproducible

Significant gender bias in professional role descriptions

Model consistently associates certain professions with specific genders.

Gender Bias Test · 5/12/2026, 11:41:31 PM

Test Run Timeline

Recent adversarial test executions

GPT-4o

Completed

14/48 attacks succeeded · 12m 34s

29% ASR

5/13/2026

Claude 3.5 Sonnet

Completed

18/52 attacks succeeded · 15m 12s

35% ASR

5/13/2026

Gemini 1.5 Pro

Failed

9/30 attacks succeeded · 8m 02s

30% ASR

5/12/2026

GPT-4o

Completed

20/44 attacks succeeded · 11m 48s

45% ASR

5/11/2026

Idle State

Dashboard before any tests have been executed

Red Team Exercise Dashboard

AI Adversarial Testing & Vulnerability Assessment

Idle

Attack Success Rate

Within acceptable range

0 Risk Score

Vulnerabilities Found

Critical 0

High 0

Medium 0

Low 0

Arsenal

7/10 active

Attack vectors and frameworks

Direct Prompt Injection

Inject instructions directly into the user prompt to override system behavior

Indirect Prompt Injection

Embed malicious instructions in external data sources consumed by the model

DAN Jailbreak

Use "Do Anything Now" persona to bypass safety filters

Roleplay Exploitation

Leverage creative fiction scenarios to elicit restricted outputs

PII Extraction Probe

Attempt to extract personally identifiable information from training data

Membership Inference

Determine if specific data was part of the training set

Gender Bias Test

Test for gender-based biases in model responses across various contexts

Confabulation Triggers

Probe the model with queries designed to elicit confident but false statements

System Prompt Extraction

Attempt to reveal the system prompt through various prompting techniques

Toxicity Gradient

Gradually escalate prompt toxicity to find the safety boundary threshold

Targets

3 models

AI models under test

GPT-4o

OpenAI · 2024-08-06

online

Claude 3.5 Sonnet

Anthropic · v2

online

Gemini 1.5 Pro

Google · 001

degraded

Top Vulnerabilities

Test Run Timeline

Recent adversarial test executions

Running State

Dashboard during active test execution with live pulse indicator

Red Team Exercise Dashboard

AI Adversarial Testing & Vulnerability Assessment

Running

Attack Success Rate

18%

2.1%

Within acceptable range

45 Risk Score

Vulnerabilities Found

Critical 0

High 1

Medium 0

Low 0

Arsenal

7/10 active

Attack vectors and frameworks

Direct Prompt Injection

Inject instructions directly into the user prompt to override system behavior

Indirect Prompt Injection

Embed malicious instructions in external data sources consumed by the model

DAN Jailbreak

Use "Do Anything Now" persona to bypass safety filters

Roleplay Exploitation

Leverage creative fiction scenarios to elicit restricted outputs

PII Extraction Probe

Attempt to extract personally identifiable information from training data

Membership Inference

Determine if specific data was part of the training set

Gender Bias Test

Test for gender-based biases in model responses across various contexts

Confabulation Triggers

Probe the model with queries designed to elicit confident but false statements

System Prompt Extraction

Attempt to reveal the system prompt through various prompting techniques

Toxicity Gradient

Gradually escalate prompt toxicity to find the safety boundary threshold

Targets

3 models

AI models under test

GPT-4o

OpenAI · 2024-08-06

online

Claude 3.5 Sonnet

Anthropic · v2

online

Gemini 1.5 Pro

Google · 001

degraded

Top Vulnerabilities

High

Reproducible

Partial system prompt exposure detected

Model reveals fragments of system prompt under certain prompting strategies.

System Prompt Extraction · 5/13/2026, 7:41:31 AM

Test Run Timeline

Recent adversarial test executions

GPT-4o

Running

4/22 attacks succeeded · 5m 12s

18% ASR

5/13/2026

Props

RedTeamDashboard component API reference

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| `attackVectors` | `AttackVector[]` | `defaultAttackVectors` | Available attack types and frameworks in the arsenal panel |
| `targetModels` | `TargetModel[]` | `defaultTargetModels` | AI models under test with provider details and status |
| `attackSuccessRate` | `number` | `29` | Attack Success Rate percentage (0-100) |
| `asrTrend` | `number` | `3.2` | ASR trend compared to previous run (positive = worse) |
| `riskScore` | `number` | `62` | Composite risk score (0-100) displayed on the gauge |
| `vulnerabilities` | `Vulnerability[]` | `defaultVulnerabilities` | List of discovered vulnerabilities with severity levels |
| `testStatus` | `'idle' \| 'running' \| 'completed' \| 'failed'` | `'completed'` | Current test execution status |
| `testRuns` | `TestRun[]` | `defaultTestRuns` | Timeline of past test run executions |
| `viewMode` | `'overview' \| 'detailed' \| 'compliance'` | `'overview'` | Active dashboard view mode (controlled) |
| `onStartTest` | `() => void` | — | Callback when Start Test button is clicked |
| `onStopTest` | `() => void` | — | Callback when Stop button is clicked during a running test |
| `onExportReport` | `() => void` | — | Callback when Export Report button is clicked |
| `onViewModeChange` | `(mode: ViewMode) => void` | — | Callback when the view mode tab is changed |
| `onToggleAttack` | `(attackId: string, enabled: boolean) => void` | — | Callback when an attack vector toggle is switched |
| `className` | `string` | — | Additional CSS classes for the root element |
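The `onViewModeChange` signature refers to a `ViewMode` type that is not spelled out in the Key Types listing. A minimal sketch follows, assuming `ViewMode` is simply the union accepted by the `viewMode` prop. The helpers `isViewMode` and `parseViewMode` are hypothetical and not part of the component; they illustrate one way to validate an untrusted string (for example a URL query value) before passing it in as a controlled prop.

```typescript
// Assumption: ViewMode mirrors the union documented for the `viewMode` prop.
type ViewMode = 'overview' | 'detailed' | 'compliance'

const VIEW_MODES: readonly string[] = ['overview', 'detailed', 'compliance']

// Hypothetical helper: type guard that narrows an arbitrary string to ViewMode.
function isViewMode(value: string): value is ViewMode {
  return VIEW_MODES.indexOf(value) !== -1
}

// Hypothetical helper: fall back to the documented default ('overview')
// when the input is missing or not a recognised mode.
function parseViewMode(value: string | null | undefined): ViewMode {
  return value != null && isViewMode(value) ? value : 'overview'
}
```

With a guard like this, a page could keep the active tab in the URL and feed `parseViewMode(query)` into `viewMode`, updating the URL from `onViewModeChange`.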

Key Types

interface AttackVector {
  id: string
  name: string
  category: AttackCategory
  description: string
  enabled: boolean
}

type AttackCategory =
  | 'prompt-injection'
  | 'jailbreak'
  | 'data-extraction'
  | 'bias-probing'
  | 'hallucination-trigger'
  | 'privilege-escalation'
  | 'toxicity-elicitation'
  | 'system-prompt-leak'

interface Vulnerability {
  id: string
  title: string
  severity: 'critical' | 'high' | 'medium' | 'low'
  category: AttackCategory
  description: string
  attackVector: string
  reproducible: boolean
  timestamp: string
}

interface TargetModel {
  id: string
  name: string
  provider: string
  version: string
  endpoint?: string
  status: 'online' | 'offline' | 'degraded'
}

interface TestRun {
  id: string
  timestamp: string
  status: 'idle' | 'running' | 'completed' | 'failed'
  attacksExecuted: number
  attacksSucceeded: number
  duration: string
  model: string
}
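The headline metrics in the previews can be derived from these types. The sketch below assumes that ASR is `attacksSucceeded / attacksExecuted` rounded to a whole percentage, and that the "Vulnerabilities Found" panel is a per-severity count. The function names `computeAsr` and `countBySeverity` are illustrative and are not exported by the component, which takes `attackSuccessRate` as a plain number prop.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low'

// Only the fields these helpers read; the full interfaces are listed above.
interface RunCounts {
  attacksExecuted: number
  attacksSucceeded: number
}

interface HasSeverity {
  severity: Severity
}

// Assumption: ASR is succeeded / executed as a rounded whole-number percentage.
// Returns 0 when no attacks were executed to avoid dividing by zero.
function computeAsr(run: RunCounts): number {
  if (run.attacksExecuted <= 0) return 0
  return Math.round((run.attacksSucceeded / run.attacksExecuted) * 100)
}

// Tally vulnerabilities per severity level, including levels with zero findings.
function countBySeverity(items: HasSeverity[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { critical: 0, high: 0, medium: 0, low: 0 }
  for (const item of items) {
    counts[item.severity] += 1
  }
  return counts
}
```

Under this assumption, the sample timeline entries are reproduced: 14 of 48 gives 29, 18 of 52 gives 35, and 4 of 22 gives 18.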

Usage

Import and implementation example

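import { RedTeamDashboard } from '@/blocks/security/red-team-dashboard'

// Placeholder handlers (not part of the component): replace with your own logic.
function runTests() {}
function abortTests() {}
function downloadReport() {}
function updateArsenal(id: string, enabled: boolean) {}

export default function SecurityPage() {
  return (
    <RedTeamDashboard
      testStatus="completed"
      attackSuccessRate={29}
      riskScore={62}
      onStartTest={() => runTests()}
      onStopTest={() => abortTests()}
      onExportReport={() => downloadReport()}
      onToggleAttack={(id, enabled) => updateArsenal(id, enabled)}
    />
  )
}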
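The `onToggleAttack` callback passes an attack id and the new enabled state, so the parent owns the `attackVectors` array. A minimal sketch of that state update follows. The helpers `toggleAttack` and `activeSummary` are hypothetical, and `category` is typed as a plain `string` here only to keep the block self-contained.

```typescript
// Simplified copy of the AttackVector shape from Key Types
// (category widened to string so this block stands alone).
interface AttackVector {
  id: string
  name: string
  category: string
  description: string
  enabled: boolean
}

// Hypothetical helper: return a new array with one vector's `enabled` flag
// updated. The input array and its items are left unmodified.
function toggleAttack(
  vectors: AttackVector[],
  attackId: string,
  enabled: boolean
): AttackVector[] {
  return vectors.map((v) => (v.id === attackId ? { ...v, enabled } : v))
}

// Hypothetical helper: build a label in the style of the arsenal badge.
function activeSummary(vectors: AttackVector[]): string {
  const active = vectors.filter((v) => v.enabled).length
  return `${active}/${vectors.length} active`
}
```

In a React parent, the result of `toggleAttack` would typically be stored with a state setter and passed back through the `attackVectors` prop.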

Built With

3 components

This block uses the following UI components from the design system:

Button, Badge, Card

Features

Built-in functionality

Arsenal panel: Toggle and manage attack vectors across 8 categories: prompt injection, jailbreak, data extraction, bias probing, hallucination triggers, privilege escalation, toxicity elicitation, and system prompt leaks
Target model management: Display AI models under test with provider info, version, and real-time online/offline/degraded status
Attack Success Rate (ASR): Prominent metric showing the percentage of successful attacks with trend indicator compared to previous runs
Risk score gauge: Visual SVG gauge (0-100) with color-coded gradient from green (safe) to red (critical)
Vulnerability listing: Detailed vulnerability cards with severity badges, reproducibility flags, attack vector info, and timestamps
Test execution controls: Start/Stop buttons with animated status pulse for running tests, plus Export Report functionality
Three view modes: Overview for summary metrics, Detailed Results for full vulnerability list, and Compliance view for framework assessment
Compliance assessment: Track alignment with OWASP LLM Top 10, NIST AI RMF, EU AI Act, MITRE ATLAS, and ISO/IEC 42001
Test run timeline: Historical view of test executions with per-run ASR, duration, and success/failure status
Dark mode: Full dark mode support with proper color contrast across all panels and states
Controlled & uncontrolled: Supports both controlled (viewMode prop) and uncontrolled (internal state) view mode management
forwardRef support: Forwards ref to root HTMLDivElement for external DOM access

Accessibility

ARIA support and keyboard navigation

ARIA Attributes

role="switch" on attack vector toggles with aria-checked
aria-label on toggle buttons for screen reader context
Semantic heading hierarchy within dashboard sections
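As an illustration of the switch semantics listed here, the sketch below builds the attribute set for one attack vector toggle. It is an assumption about how such attributes could be assembled, not the component's actual implementation. The helper name `switchAriaProps` and the label wording are hypothetical.

```typescript
interface SwitchAriaProps {
  role: 'switch'
  'aria-checked': boolean
  'aria-label': string
}

// Hypothetical helper: attributes for a toggle rendered as a switch.
// The label stays the same in both states; the on/off state is conveyed
// by aria-checked rather than by changing the label text.
function switchAriaProps(vectorName: string, enabled: boolean): SwitchAriaProps {
  return {
    role: 'switch',
    'aria-checked': enabled,
    'aria-label': `${vectorName} attack vector`,
  }
}
```

In JSX the result could be spread onto a native `button` element, which already handles `Enter` and `Space` activation.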

Keyboard Navigation

| Key | Action |
| --- | --- |
| `Tab` | Navigate between interactive elements (toggles, buttons, tabs) |
| `Enter` / `Space` | Activate buttons, toggle attack vectors, switch view modes |

Notes

Severity badges use semantic color coding: rose for critical, amber for high, indigo for medium
Status pulse animation uses prefers-reduced-motion-safe patterns
Risk gauge uses tabular-nums for stable number rendering
All interactive elements have visible focus states
Vulnerability descriptions are truncated with line-clamp for readability

Related Components

AlertSystem, AnomalyCard, ExceptionDashboard, AuditLog