Cloud-hosted AI models are exposed to a new class of threats. Attackers can steal models without
breaching your systems. Instead, they interact with public APIs and extract the model's behaviour over time.
This is called a model extraction attack.
In this guide, we’ll explain how these attacks work, what signals to watch for, and how to defend cloud inference
endpoints using techniques like output obfuscation, rate limiting, and watermarking.
What Is a Model Extraction Attack?
A model extraction attack happens when an attacker sends many inputs to a model and observes the outputs.
Over time, they can train a new model that behaves like yours. This can result in:
- Intellectual property theft
- Regulatory risk (if your model makes sensitive decisions)
- Evasion of rate-limited pricing models
These attacks typically unfold gradually and look like normal API usage.
How Attackers Steal Cloud Models
Most attacks follow a similar process:
- Reconnaissance: The attacker learns the model’s purpose and its expected input and output formats.
- Querying: They send a large number of crafted inputs to the API.
- Collection: They record the model’s outputs.
- Training: They use these outputs to train a substitute model.
Common platforms like AWS SageMaker, Azure ML, and Vertex AI make it easy to deploy APIs, but they
often lack default protections against this attack vector.

Simulating a Basic Model Extraction Attack
Here’s a basic Python example that shows how an attacker might start an extraction:
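(The endpoint URL, request payload, and response fields below are illustrative placeholders rather than any particular provider's API; the point is the query-and-record loop.)

```python
import random
import requests

# Hypothetical inference endpoint: the URL, payload shape and response
# fields are placeholders for illustration, not a real provider's API.
API_URL = "https://api.example.com/v1/predict"
API_KEY = "attacker-held-key"

def random_input(n_features=8):
    """Crafted input; a real attacker samples the expected feature space."""
    return [round(random.uniform(0.0, 1.0), 3) for _ in range(n_features)]

def query_model(features):
    """Send one input to the target model and return its prediction."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"features": features},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"label": "approve", "confidence": 0.97}

# Collect input-output pairs; these become training data for a substitute model.
stolen_pairs = []
for _ in range(1000):
    x = random_input()
    stolen_pairs.append((x, query_model(x)))
```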

This process is repeated with thousands of crafted inputs. The attacker uses the input-output pairs to train
a clone of your model.
Detection: Signs of Model Extraction in Progress
To detect extraction attempts, monitor for signals such as:
- Unusually high query volume from a single API key, account, or IP range
- Inputs that systematically sweep the feature space or look synthetic rather than organic
- Large batches of near-duplicate queries probing the same decision boundaries
- Sessions requesting predictions across far more classes or input regions than typical users

Use logs and SIEM tools to track these signals. Correlate them to identify coordinated attacks.
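The sketch below shows the simplest of these correlations: counting requests per API key over a time window from application logs. The log format (a list of dicts with "api_key" and an ISO-8601 "timestamp") and the threshold are assumptions to adapt to your own logging pipeline.

```python
from collections import Counter
from datetime import datetime, timedelta

def high_volume_keys(log_entries, window_hours=24, threshold=5000):
    """Return API keys whose request count in the recent window exceeds the threshold.

    log_entries: assumed list of dicts with "api_key" and a naive UTC
    ISO-8601 "timestamp" string; adjust parsing to your actual log schema.
    """
    cutoff = datetime.utcnow() - timedelta(hours=window_hours)
    counts = Counter(
        entry["api_key"]
        for entry in log_entries
        if datetime.fromisoformat(entry["timestamp"]) >= cutoff
    )
    return [key for key, count in counts.items() if count > threshold]
```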

Defending Inference APIs
1. Rate Limiting
Limit the number of queries per user or IP. Use progressive back-off or captchas when thresholds are crossed.
Key tactics:
- Set low thresholds for new users.
- Use tiered access for trusted partners.
- Monitor for circumvention (e.g., proxy use).
AWS Example: Use Amazon API Gateway to apply throttling settings (see the boto3 sketch after this list):
- Burst limit: 100 requests
- Rate limit: 10 requests per second
- Connect logs to CloudWatch for anomaly detection
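One way to apply those numbers programmatically is through an API Gateway usage plan. The sketch below uses boto3 with a placeholder REST API ID and stage name; treat it as a starting point rather than a complete setup (it does not cover CloudWatch alarm configuration).

```python
import boto3

apigw = boto3.client("apigateway")

# Placeholder REST API ID and stage; substitute your own deployment.
usage_plan = apigw.create_usage_plan(
    name="inference-default",
    description="Default throttling for the public inference endpoint",
    apiStages=[{"apiId": "abc123def4", "stage": "prod"}],
    # Mirrors the limits above: 10 requests/second steady state, bursts up to 100.
    throttle={"rateLimit": 10.0, "burstLimit": 100},
)

# New users get an API key bound to this (deliberately conservative) plan.
api_key = apigw.create_api_key(name="new-user-key", enabled=True)
apigw.create_usage_plan_key(
    usagePlanId=usage_plan["id"],
    keyId=api_key["id"],
    keyType="API_KEY",
)
```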
2. Output Obfuscation
Reduce the detail in model responses. For example:
- Return class labels instead of full probability scores.
- Round or bin confidence scores.
- Add slight noise to outputs to reduce extractability.
These changes can reduce the attacker’s ability to mimic the decision boundary without harming end-user utility.
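A minimal post-processing sketch that combines all three ideas is shown below; the assumed response format (a dict of class probabilities), the bin width, and the noise scale are illustrative values to tune against your own accuracy requirements.

```python
import random

def obfuscate_prediction(probabilities, bin_width=0.1, noise_scale=0.02):
    """Turn a full probability vector into a coarser, slightly noisy response.

    probabilities: assumed dict mapping class label -> probability.
    Returns only the top label plus a binned, lightly perturbed confidence.
    """
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    confidence += random.uniform(-noise_scale, noise_scale)       # slight noise
    confidence = min(max(confidence, 0.0), 1.0)
    binned = round(round(confidence / bin_width) * bin_width, 2)  # e.g. 0.87 -> 0.9
    return {"label": label, "confidence": binned}

# Example: {"approve": 0.91, "review": 0.07, "reject": 0.02}
# -> {"label": "approve", "confidence": 0.9}
```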
3. Prediction Watermarking
Embed subtle patterns in model outputs that are hard to remove but easy to prove.
- Use deterministic modifications tied to specific inputs.
- If a stolen model contains the same watermark, you can prove theft.
Some open-source libraries (e.g., MIPGuard, MLGuard) offer early-stage support for this. These tools are still
experimental and may not be robust against all forms of extraction.
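As a simplified illustration of the "deterministic modification tied to specific inputs" idea (not the API of the libraries above), the sketch below uses a keyed hash to pick a secret subset of inputs as triggers and nudges their confidence scores in a repeatable, key-dependent way. If a suspected clone reproduces the same nudges on those trigger inputs, that supports a claim of theft.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"rotate-and-store-this-securely"  # placeholder watermark key

def keyed_hash(features):
    """Deterministic, secret-keyed digest of an input."""
    payload = json.dumps(features, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()

def watermark_prediction(features, prediction, trigger_rate=1 / 64):
    """Nudge outputs for a secret ~1-in-64 subset of inputs.

    prediction: assumed dict with a numeric "confidence" field.
    """
    digest = keyed_hash(features)
    if digest[0] < 256 * trigger_rate:        # secret, input-tied trigger condition
        nudge = (digest[1] % 5) * 0.001       # keyed, repeatable perturbation
        prediction["confidence"] = round(prediction["confidence"] + nudge, 4)
    return prediction
```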

Advanced Measures
1. Query Fingerprinting
Build a statistical profile of normal query behaviour. Alert when a query session deviates.
- Use cosine similarity across inputs.
- Flag batch querying with low variance in structure.
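A sketch of the second check: compute the mean pairwise cosine similarity across a session's input vectors and flag large sessions that are unusually uniform. The thresholds are assumptions to calibrate against your own baseline traffic.

```python
import numpy as np

def session_similarity(inputs):
    """Mean pairwise cosine similarity across a session's input vectors."""
    X = np.asarray(inputs, dtype=float)
    unit = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T
    n = len(X)
    return (sim.sum() - n) / (n * (n - 1))

def is_suspicious_session(inputs, similarity_threshold=0.98, min_queries=200):
    """Flag large batch sessions with unusually low variance in structure."""
    return len(inputs) >= min_queries and session_similarity(inputs) > similarity_threshold
```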
2. Differential Privacy Layers
Add Laplace or Gaussian noise to prediction outputs.
- Particularly effective for numeric regressions.
- Use libraries like TensorFlow Privacy.
- Note: These methods may reduce model accuracy and should be tested against real-world use cases.
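A minimal sketch of the idea using plain NumPy (rather than any specific privacy library): add Laplace noise, scaled by an assumed sensitivity and privacy budget, to a numeric prediction. Smaller epsilon means more noise and lower accuracy, so test the trade-off before deploying.

```python
import numpy as np

def laplace_noisy_output(prediction, sensitivity=1.0, epsilon=0.5):
    """Return a numeric prediction with Laplace noise of scale sensitivity/epsilon.

    The sensitivity and epsilon values here are illustrative assumptions.
    """
    scale = sensitivity / epsilon
    return float(prediction + np.random.laplace(loc=0.0, scale=scale))
```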
3. Model Randomisation
Introduce slight, randomised variation between model deployments.
- Makes uniform extraction harder.
- Useful in multi-tenant environments.
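One simple way to realise this, sketched below under the assumption of a classifier that exposes raw logits, is to derive a small, stable temperature from each deployment's ID so that different deployments return slightly different score distributions for the same input.

```python
import hashlib
import numpy as np

def deployment_temperature(deployment_id, low=0.95, high=1.05):
    """Derive a stable, deployment-specific softmax temperature from its ID."""
    digest = hashlib.sha256(deployment_id.encode()).digest()
    return low + (digest[0] / 255.0) * (high - low)

def randomised_scores(logits, deployment_id):
    """Apply the deployment-specific temperature before softmax."""
    z = np.asarray(logits, dtype=float) / deployment_temperature(deployment_id)
    z -= z.max()                      # numerical stability
    exp = np.exp(z)
    return exp / exp.sum()
```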
4. Tenant-Level Inference Isolation
In cloud SaaS platforms, isolate inference access by tenant.
- Deploy containerised model instances.
- Use unique API keys and logging per tenant.
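A sketch of what per-tenant routing at the API layer might look like; the key-to-endpoint mapping and container URLs are placeholders, and in practice each endpoint would point at that tenant's own containerised model instance.

```python
import logging

# Placeholder mapping: each tenant's API key resolves to its own isolated
# model container, and each tenant gets its own logger for auditing.
TENANT_ENDPOINTS = {
    "tenant-a-key": "http://tenant-a-model:8080/predict",
    "tenant-b-key": "http://tenant-b-model:8080/predict",
}

def route_request(api_key, features):
    """Resolve the tenant's isolated endpoint and log the query per tenant."""
    endpoint = TENANT_ENDPOINTS.get(api_key)
    if endpoint is None:
        raise PermissionError("Unknown API key")
    logging.getLogger(f"inference.{api_key}").info(
        "query received", extra={"n_features": len(features)}
    )
    return endpoint  # caller forwards the request to this isolated instance
```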

Conclusion
Inference-time attacks are quiet, hard to detect, and growing more common. If your cloud-hosted model
drives revenue, makes regulated decisions, or is the result of expensive training, it’s a target.
Protecting your model doesn’t require rewriting it. Instead, focus on detecting misuse patterns and deploying
simple defences at the API layer. Rate limiting, obfuscation, and watermarking are easy to implement and go
a long way towards making extraction impractical.
You don’t have to make your model unbreakable - just expensive and time-consuming to steal.