Crafting AI Prompts Framework

Adversarial Prompting

Adversarial prompting encompasses a range of techniques, including prompt injection, prompt leaking, and prompt jailbreaking, designed to exploit vulnerabilities in natural language processing (NLP) or language model systems. These attacks manipulate the input prompts to elicit harmful, unintended, or sensitive outputs from the models. By leveraging the models' text generation capabilities, adversarial prompting can lead to security breaches, privacy violations, and the generation of inappropriate or dangerous content, highlighting the need for robust safeguards and monitoring mechanisms in the deployment of NLP systems.

To mitigate adversarial prompting attacks, it's crucial to implement security measures and validation checks when accepting user-generated prompts or input. These measures may include content moderation, filtering, and anomaly detection to identify and block malicious or inappropriate prompts. Additionally, continuous monitoring and updates to language models can help reduce vulnerabilities to prompt injection attacks.

Prompt Injections

INJ

A prompt injection attack is a type of security exploit that involves subtly manipulating or injecting malicious prompts into a natural language processing (NLP) or language model system. This manipulation can occur in ways that the user might overlook, leading the model to generate harmful or unintended output. The attack leverages the language model's ability to generate text based on the provided prompts, which can be exploited to produce content that poses significant risks.

Prompt Leaking

LEAK

Prompt Leaking occurs when a Large Language Model (LLM) inadvertently discloses sensitive or private information embedded within its system prompts. This can include not only data used during training or fine-tuning but also the complete system prompt, which may contain intellectual property (IP). These leaks can happen when the model is queried in ways that unintentionally reveal this underlying information, leading to serious privacy and security concerns.

Prompt Jailbreaking

JAIL

Prompt jailbreaking involves crafting specific prompts to circumvent restrictions or safety measures implemented in NLP models. By using cleverly designed prompts, attackers can manipulate the model to generate prohibited, harmful, or inappropriate content, bypassing the built-in safeguards. This technique is used to exploit the model's flexibility in understanding and generating text, leading to outputs that the system's designers intended to prevent.


Adversarial prompting risk and impact scores

When evaluating various scenarios or "injections," it's insightful to categorize them with two distinct metrics:

  • Risk score: risk:HIGH, risk:MEDIUM, risk:LOW: This score reflects the likelihood of the event occurring. It's an estimation of how probable it is that the scenario will happen.
  • Impact score: impact:HIGH, impact:MEDIUM, impact:LOW: This score assesses the potential consequences if the event does occur. It measures the severity or significance of the impact.

Each injection is tagged with these scores at the end of its title, providing a quick, at-a-glance understanding of both its likelihood and potential impact.

These evaluations are subjective, based on my personal analysis and reasoning, which I elaborate upon in the description's conclusion. This dual-scoring approach offers a nuanced view of each scenario, helping readers to gauge both its probability and its potential ramifications more effectively.


Adversarial prompt overview

On this page, you will find various Prompt Injections. Below, each prompt injection is listed along with its impact. The risk assessment for each is detailed in their respective overviews.

If you click on the row of a prompt injection, you will be automatically navigated to the corresponding section for more detailed information.


Read more

The three phases

CRAFT

Craft (write) the prompt with the following elements: Context, Register, Acting Role, Format, and Task.

ING

Validate the prompt and ensure it maintains an interactive approach. Keep in mind the importance of non-disclosure and staying goal-driven throughout the process.

AI

Continuously assess and refine the output based on the prompts output to improve the overall quality.