Project CreatorPersonal ProjectBy Neel Vora
--

Prompt Injection & Safety Lab

A CTF-style playground with 7 progressively harder prompt injection challenges. Each level adds new defenses, from naive prompts to input analysis, output filtering, and a two-model sandboxed architecture. Built to explore and teach LLM security patterns.

Next.js 16React 19TypeScriptOpenAI APIGPT-4o-miniPrompt Engineering
Prompt Injection & Safety Lab project screenshot

Overview

This is a capture-the-flag style playground for testing prompt injection attacks against progressively hardened LLM defenses. Each of the 7 levels has a secret phrase embedded in its system prompt, and the goal is to extract it.

The Levels

  • Level 1: The Naive Assistant, no defenses at all
  • Level 2: The Cautious Helper, basic instruction not to reveal the secret
  • Level 3: The Encoded Guardian, input analysis for injection patterns
  • Level 4: The Roleplay Blocker, defenses against persona-switching attacks
  • Level 5: The Output Filter, post-generation output scanning
  • Level 6: The Sandboxed Assistant, two-model architecture where one model evaluates the other's output
  • Level 7: The Fortress, combines all previous defenses

Technical Details

All 7 levels are handled by a single API route. Each level defines its own system prompt, defense strategies, and difficulty rating. The two-model architecture in Level 6 makes two separate OpenAI API calls: one to generate the response and one to evaluate whether it leaked the secret.

This project lives at /ai/security on this site.

Tech Stack

Next.js 16React 19TypeScriptOpenAI APIGPT-4o-miniPrompt EngineeringFramer Motion

Attribution

Role:Project Creator
Company:Personal Project

Interested in working together? I'm always open to discussing new projects and opportunities.

Related Projects

👋 Get to Know Me

Learn More About Me

From sound engineering to AI systems. Discover the journey that shaped how I build technology.