No items found.
Back
Link Copied!
Copy link
May 12, 2026
! 0
my reading

When companies start building products with AI, or integrating language models into existing systems, new forms of security risks emerge. Traditional security tools are built for predictable systems where the same input always produces the same output and the logic can be inspected line by line. But AI systems don’t work that way. They behave non-deterministically, meaning the response can vary even with identical input, and they accept instructions in plain text rather than code. This opens up attack surfaces that classical security frameworks aren’t built for.

This guide provides a concrete overview of the most important risks, how they are tested, and what you can do to reduce them.

Why AI security is different

In a traditional web-based system, there is clear logic to test: code that does things, responses that follow rules, behavior that can be predicted. Security work is about finding gaps in that logic. AI systems work differently in a few ways that actually matter for security.

Behavior varies. The model can give different answers to the same question, which makes it difficult to define what is normal, and even harder to guarantee that security controls hold in all situations.

Training data is a possible attack vector. An attacker who can influence what the model is trained on can shape how it behaves at a fundamental level. This is called data poisoning and is one of the harder problems to detect after the fact.

Instructions arrive as plain text. Users write to the model in natural language, and the model cannot reliably distinguish between legitimate instructions and malicious ones. This creates an attack surface that lives in the content itself, not in the code.

There is also a responsibility gap that is easy to miss. Providers like OpenAI and Anthropic are responsible for their infrastructure and their model, but how you handle inputs, what tools the model has access to, and how you filter output, that is your responsibility. Most serious incidents occur in that gap.

The biggest risks: OWASP Top 10 for LLMs

OWASP, the organization behind the well-known security lists for web applications, has published a list specifically for AI systems and language models. The list was updated in 2025 and reflects how the threat landscape has changed as more companies have started building with AI. Here are the ten risks.

Prompt Injection

An attacker smuggles instructions into text that the model processes, thereby hijacking the model’s behavior. This can happen directly via the user’s own inputs, or indirectly via documents and web pages that the model retrieves and reads. Prompt injection is the most widespread attack class against AI systems and currently has no general technical solution.

Sensitive Information Disclosure

The model leaks sensitive information, for example internal instructions governing its behavior, confidential documents in its context, or data from other users. The risk increases in systems that retrieve information from external sources, or where multiple users share the same context.

Supply Chain

AI applications depend on a long chain of components: model API, databases, plugins, training data. If any part of the chain is compromised, an attacker can influence the model’s behavior without touching the application code directly.

Data and Model Poisoning

The data used to train or fine-tune the model is manipulated with the intent to affect how it behaves. This can involve planting hidden behaviors triggered by specific inputs, or systematically skewing the model’s responses in a particular direction. Difficult to detect, and difficult to remediate once it has occurred.

Improper Output Handling

The model’s responses are treated as trusted data and passed on without sufficient control. This opens the door to downstream attacks: if the model generates JavaScript that is rendered directly in a browser, or SQL that is run against a database without filtering, an attacker can control what actually executes.

Excessive Agency

The model is given more permissions and tools than the task requires. If an attack succeeds, it is the model’s actual permissions that determine how much damage can be done. An AI agent that can write files, send emails, and call external services is a significantly more serious problem than one that can only read.

System Prompt Leakage

The system prompt, the underlying instructions that govern how the model behaves, is exposed to users or attackers. These instructions often contain sensitive logic: security rules, internal guidelines, API keys, and permission structures. If they leak, attackers gain a detailed map of how the system works.

Vector and Embedding Weaknesses

Many AI systems use RAG, meaning the model retrieves relevant information from an external knowledge base before responding. That knowledge base is indexed as numerical vectors (embeddings). Vulnerabilities here include malicious vectors being inserted into the database to affect what information is retrieved, or crafted queries tricking the system into returning data that should not be accessible.

Misinformation

The model generates incorrect statements with high confidence. In business applications, this can lead to poor decisions, legal issues, or damaged trust. The risk is greatest in systems where the model’s responses are presented as facts without source references and without a human verification step.

Unbounded Consumption

The application allows unregulated resource usage. An attacker can craft queries that trigger extremely costly computations, drive up costs significantly, or cause operational disruptions. In services with variable per-API-call pricing, this can have a direct financial impact.

Want an expert to review your security? Book a meeting with us and we’ll help you identify gaps and vulnerabilities.

Vulnerabilities in RAG systems and agents

RAG systems and AI agents introduce additional risks that deserve their own section.

RAG — Retrieval-Augmented Generation — is a technique where the model retrieves information from an external knowledge base before responding. This makes the system more fact-based and current, but also opens the door to an indirect variant of prompt injection: an attacker who can influence the content of documents the model reads can steer what the model says, without the user doing anything wrong. Retrieval manipulation is a similar attack where a crafted query tricks the retrieval logic into returning the wrong information.

AI agents are systems where the model doesn’t just respond but also acts, for example searching for information, writing code, sending emails, or calling external services. This amplifies the effect of all other risks. The more things an agent is allowed to do, the more an attacker can accomplish if they succeed in hijacking its behavior. The basic rule is simple but easy to ignore during development: give the agent only the permissions that are actually needed for the task.

Data protection and AI: GDPR and the EU AI Act

Integrating AI into business systems raises legal questions that go beyond pure security testing, and it is worth having the basics clear.

The EU AI Act entered into force in phases from 2024, with most requirements taking effect during 2026 and 2027, depending on the type of system. The law classifies AI systems by risk, and requirements scale with the classification. High-risk systems, those used in credit decisions, recruitment, or medical diagnosis, have the heaviest requirements: risk management, technical documentation, and ongoing monitoring. All companies that develop or use AI systems within the EU are affected by the law to some extent, even if it is the high-risk category that carries the most concrete obligations.

The GDPR question is primarily about what happens when you send data to an external AI provider. Since 2023, the EU-US Data Privacy Framework has provided a structure that allows the transfer of personal data to certified US companies, including the major AI providers, without needing to establish separate agreements for each transfer.

What is worth checking is that your specific provider is actually certified, and that the agreements in place cover the data you actually send. The legal landscape around international data transfers has changed several times in recent years and may change again.

The practical advice is simple: don’t send more data than necessary. Personal data and confidential business data should be kept out of the model’s context as much as possible, and when that isn’t possible, you should know exactly which agreement governs the processing.

How do you test AI security?

Security testing an AI system requires a different methodology than classical penetration testing. Classical pen testing tests known logic in known code. AI testing tests how a system behaves when exposed to adversarial inputs, and that requires both technical breadth and an understanding of how language models work.

Prompt injection testing is the foundation. It involves systematically trying to hijack the model’s behavior via inputs: direct instructions, role-play scenarios, and instructions embedded in documents the model reads. It cannot be fully automated, and manual testing by someone who understands how models behave is required.

Jailbreak testing examines whether the model’s safety filters can be bypassed to get it to generate responses it would normally block. Methods change rapidly and require ongoing updates to test cases.

Data leakage testing examines whether an attacker can manipulate the model into disclosing information from its context or connected data sources to an unauthorized party.

Adversarial testing involves crafted inputs designed to manipulate the model’s classification or responses in a specific way, often subtly enough that it is not visible to a regular user.

Cyloq’s approach is grounded in offensive security expertise applied to AI-specific attack methods. We test your AI systems the same way an attacker would, using OWASP Top 10 for LLMs as one of our frameworks.

Practical measures for businesses

There are a number of measures that meaningfully reduce the attack surface and should be in place in any system with AI integration in production.

  • Validate inputs. Filter and control what users send before it reaches the model. This doesn’t eliminate prompt injection risk entirely, but it reduces the attack surface.
  • Control what the model passes on. Never treat the model’s responses as trusted code or data. Sanitize and validate output before it is rendered in a browser or passed to other systems.
  • Limit permissions. Give the model and any agents only the rights that are actually needed for the task. The less an attack can accomplish, the better.
  • Monitor prompt patterns. Log and analyze inputs on an ongoing basis to detect anomalous behavior and active attempts to manipulate the model. This gives you visibility and a record if something happens.
  • Think carefully about what data is sent. Personal data and confidential business data should be kept outside the model’s context as much as possible. Only send what is actually needed to complete the task.
  • Test regularly. AI systems change quickly: model versions are updated, instructions are adjusted, new features are added. Test at every major change and at least once a year for systems in production.

Frequently asked questions about AI security for businesses

Is it enough that we use OpenAI or Anthropic as our provider?

The provider’s security is part of the picture, but not sufficient. The risks live primarily in your integration: how you handle inputs, what tools the model has access to, how you control what the model passes on. That responsibility is yours.

What is prompt injection?

Prompt injection is when someone smuggles instructions into text that the model processes, in order to get it to behave in a way that was not intended. This can happen directly via what the user writes, or indirectly via documents or web pages that the model reads as part of its response.

Does the EU AI Act apply to us?

All companies that develop, use, or distribute AI systems within the EU are affected by some part of the law. Which requirements apply depends on how the system is classified: prohibited systems, high-risk, transparency requirements, or minimal risk. High-risk systems have extensive requirements for risk management and documentation.

Can we test AI security internally?

Simpler input controls and basic prompt testing can be done internally. For deeper testing with adversarial techniques, external expertise is recommended. OWASP Top 10 for LLMs is a good starting point for understanding what needs to be tested.

How often should an AI system be security tested?

AI systems change quickly: model versions are updated, instructions are adjusted, new features are added. At minimum at every major change, and at least once a year for systems in production. For high-risk systems, ongoing monitoring of input patterns is recommended.

Book a meeting

Building AI products? Book an AI security meeting

We review your AI integration, identify the most critical risks, and put together a concrete test plan. Book a meeting and let’s get started.

Book a meeting

FAQ

Frequently Asked Questions about AI Security for Businesses

Is it enough to use OpenAI or Anthropic as a provider?

Supplier security is part of the bigger picture, but it's not enough. The risks primarily arise from your integration: how you handle inputs, what tools the model has access to, and how you control what the model outputs. That responsibility is yours.

What is prompt injection?

Prompt injection is when someone injects instructions into text that the model processes, to make it behave in an unintended way. This can happen directly via user input, or indirectly through documents or web pages that the model reads as part of its response.

Does the EU AI Act apply to us?

All companies that develop, use, or distribute AI systems within the EU will be subject to some part of the law. The applicable requirements depend on how the system is classified: prohibited systems, high-risk, transparency requirements, or minimal risk. High-risk systems are subject to extensive requirements for risk management and documentation.

Can we test AI security internally?

Simpler input checks and basic prompt testing can be done internally. For more in-depth testing using adversarial techniques, external expertise is recommended. The OWASP Top 10 for LLM is a good starting point for understanding what needs to be tested.

How often should an AI system be security tested?

AI systems are changing rapidly: model versions are updated, instructions are adjusted, and new features are added. Systems should be reviewed at least with every major change and at least once a year for those in production. For high-risk systems, continuous monitoring of input patterns is recommended.