The API vulnerabilities nobody talks about: excessive data exposure

Joviane Jardim

TLDR: Excessive Data Exposure (leaking internal data via API responses) is a silent, pervasive threat that can be more insidious than a single dramatic flaw like SQL injection. It amplifies other API vulnerabilities (like BOLA, Broken Object Level Authorization) and happens everywhere because developers prioritize speed over explicit data filtering. Fixing it means systematically checking hundreds of endpoints for unneeded PII and sensitive internal data.

After writing about how API security differs from web app security, one idea stuck with me: APIs can have hundreds of small issues that add up over time, rather than one big dramatic vulnerability.

Let me give you a concrete example of what I mean.

SQL injection is serious. Everyone knows that. But what about APIs that just… hand over sensitive data by design?
I’m not saying it’s worse than SQL injection. But it might be more insidious, because it amplifies every other vulnerability you have.

Excessive data exposure: the silent problem

Common API patterns even encourage this, and you can see it everywhere. You have an endpoint like GET /api/users/123 and it returns something like:

{
    "user_id": 42,
    "name": "Joviane",
    "email": "myemail@gmail.com",
    "role": "student"
}
… but also returns
{
    "internal_user_id": 64,
    "full_address": "Secure Street, 403",
    "ssn_last_4": 1234,
    "phone_number": "73737-7373"
}

and a lot of stuff that you weren’t planning to expose. The frontend only displays name and email, but the API is returning EVERYTHING from the database.
You might think, “but only authenticated users can call this endpoint, so it’s fine!”. And yeah, that’s true. But what happens when an attacker compromises ANY user account? When a developer accidentally logs the full response? When a browser extension scrapes the data? When the response gets cached somewhere it shouldn’t be? All of that sensitive data is just sitting there, waiting.
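One way to avoid returning the whole record is to build the response from an explicit list of fields. The following is a minimal sketch, not a definitive implementation: USER_ROW, UserProfileResponse, and to_profile_response are hypothetical names, and the row simply mirrors the example fields above.

```python
from dataclasses import dataclass, asdict

# Hypothetical database row; the field names mirror the example above.
USER_ROW = {
    "user_id": 42,
    "name": "Joviane",
    "email": "myemail@gmail.com",
    "role": "student",
    "internal_user_id": 64,
    "full_address": "Secure Street, 403",
    "ssn_last_4": 1234,
    "phone_number": "73737-7373",
}


@dataclass(frozen=True)
class UserProfileResponse:
    """Only the fields the frontend actually displays."""
    name: str
    email: str


def to_profile_response(row: dict) -> dict:
    # Each field is copied by name, so a new database column never
    # reaches the client unless someone adds it here on purpose.
    return asdict(UserProfileResponse(name=row["name"], email=row["email"]))


print(to_profile_response(USER_ROW))
```

The design choice here is that exposure becomes opt-in: leaking a field requires a deliberate code change that a reviewer can see.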

The worst part? This compounds with other vulnerabilities. Say you have a BOLA vulnerability where users can access other users’ data by changing an ID. If your API only returned public fields, the impact would be limited. But if it’s leaking PII, internal IDs, or sensitive business data, now that BOLA just became a massive data breach waiting to happen.

Why this happens everywhere

Here’s the thing: this isn’t malicious. Usually, it’s just convenient. Returning the whole object is faster than filtering fields. ORMs don’t help either: they typically return every column by default unless you explicitly use projection or select specific fields. Sometimes teams are trying to be clever and “future-proof” their APIs with fields they might need later. And sometimes? It’s just copy-paste. One endpoint did it this way, so all the others followed.
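To make the "everything by default" versus "explicit projection" difference concrete, here is a small sketch. It uses raw SQL through Python's standard sqlite3 module as a stand-in for an ORM, and the users table, its columns, and both function names are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "password_hash TEXT, phone_number TEXT)"
)
conn.execute(
    "INSERT INTO users (name, email, password_hash, phone_number) "
    "VALUES (?, ?, ?, ?)",
    ("Joviane", "myemail@gmail.com", "not-a-real-hash", "73737-7373"),
)


def get_user_everything(user_id: int) -> dict:
    # The convenient default: every column comes back, including
    # ones the client should never see.
    row = conn.execute(
        "SELECT * FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(row)


def get_user_projected(user_id: int) -> dict:
    # Explicit projection: only the named columns leave the database layer.
    row = conn.execute(
        "SELECT name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(row)
```

With the projected query, adding a column to the table later does not change what the endpoint returns.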
It makes sense from a development velocity perspective. I’ve done this myself when shipping features under pressure. You write a quick endpoint, test that the frontend displays correctly, and ship it. The API is returning 20 fields but the UI only uses 3? Nobody notices because it works.

The real-world impact

Let me give you a concrete example I’ve seen play out in a code review. An e-learning platform had an endpoint GET /api/courses/{courseId}/students that returned student enrollment data. Makes sense for instructors to see their students, right? But it wasn’t just returning names and progress percentages. It was also returning full email addresses, enrollment dates, payment status, quiz attempt histories with timestamps, discussion forum activity metrics, and even device information from where students were accessing the course.

The frontend displayed student names and their course completion percentage. That’s it. And if you were a student? You could only see your own status in the UI. But any enrolled student could hit that endpoint directly, change the course ID, and pull data from other courses. Someone could iterate through course IDs and build a complete database of who’s taking what courses, payment patterns, learning behaviors, and personal contact information. They didn’t need to break anything or find some clever exploit. The API was just handing it all over.
Luckily, this got caught before production. But because the feature worked fine in both the UI and the API, it could easily have slipped through.
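A fix for an endpoint like this usually needs two things: an object-level check on the course, and a field filter on the rows. The sketch below illustrates both under stated assumptions. The in-memory data, the Forbidden exception, and list_course_students are all hypothetical, and it assumes only the course's instructor should see the roster.

```python
# Hypothetical in-memory data standing in for the platform's database.
COURSE_INSTRUCTORS = {"course-1": "instructor-a", "course-2": "instructor-b"}
ENROLLMENTS = {
    "course-1": [
        {"student_id": "s1", "name": "Ana", "progress_pct": 80,
         "email": "ana@example.com", "payment_status": "paid",
         "device_info": "hypothetical-device"},
    ],
    "course-2": [
        {"student_id": "s2", "name": "Bo", "progress_pct": 35,
         "email": "bo@example.com", "payment_status": "pending",
         "device_info": "hypothetical-device"},
    ],
}

# The only fields the instructor UI displays.
ROSTER_FIELDS = ("name", "progress_pct")


class Forbidden(Exception):
    """Raised when the caller may not read this course's roster."""


def list_course_students(caller_id: str, course_id: str) -> list[dict]:
    # Object-level check: the caller must be this course's instructor,
    # not merely an authenticated user.
    if COURSE_INSTRUCTORS.get(course_id) != caller_id:
        raise Forbidden(course_id)
    # Field-level filter: copy only the allowlisted fields.
    return [
        {field: row[field] for field in ROSTER_FIELDS}
        for row in ENROLLMENTS.get(course_id, [])
    ]
```

Either control alone leaves a gap: the ownership check without the filter still over-shares with instructors, and the filter without the check still lets anyone enumerate courses.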

And let’s talk about the PII implications here. That leaked student data? We’re talking full names, email addresses, phone numbers, physical addresses, potentially payment information. In a lot of jurisdictions, that’s a GDPR violation or equivalent waiting to happen. Even if the attacker never uses the data maliciously, you’ve just exposed yourself to regulatory fines, mandatory breach notifications, and a PR nightmare. All because the API returned 15 extra fields that nobody actually needed. The business intelligence leak is bad for competitive reasons, sure. But the PII exposure? That’s the kind of thing that gets you on the front page of technical channels for all the wrong reasons.

Another common pattern: pagination endpoints that leak way too much. You call GET /api/students?page=1&limit=100 expecting a list of students, and you get back not just the students, but also their hashed passwords, API keys, internal permissions, last login times, IP addresses… all stuff that should never leave the backend.

The scale problem

SQL injection is one vulnerability. You can find it, fix it, and you’re done. Excessive data exposure? That’s hundreds of endpoints, each leaking a little data, compounding over time.

Which one is easier for an attacker to exploit at scale? The one that exists in every single endpoint. They don’t need to find a clever injection payload. They just need to iterate through your API and collect everything you’re giving them for free. And because it’s “technically working as designed,” it might not even trigger your security monitoring. No failed requests, no suspicious payloads, just normal API calls returning way too much information.

Other “boring” vulnerabilities that actually matter

There’s Mass Assignment, where a user sends {"name": "Deckan", "isAdmin": true} and the API just… accepts both fields. No validation on what should be updatable. Suddenly, regular users are admins.
Or Improper Rate Limiting. No limits on password reset? Account takeover via brute force. No limits on OTP verification? Bye-bye 2FA. No limits on search? Congrats, someone just scraped your entire database.
And the classic: Predictable Resource IDs. /api/invoices/1001, /api/invoices/1002… you see where this is going. An attacker just iterates and collects everything. Classic BOLA.
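For the mass assignment case described above, a common defense is an allowlist of fields the caller may update. This is a minimal sketch: UPDATABLE_FIELDS and apply_profile_update are hypothetical names, and rejecting the request (rather than silently dropping extra fields) is one possible design choice, not the only one.

```python
# Hypothetical allowlist of fields a user may change on their own profile.
UPDATABLE_FIELDS = {"name", "email"}


def apply_profile_update(user: dict, payload: dict) -> dict:
    # Reject unknown or privileged fields instead of silently dropping
    # them, so the client learns the request was not applied.
    unexpected = set(payload) - UPDATABLE_FIELDS
    if unexpected:
        raise ValueError(f"fields not updatable: {sorted(unexpected)}")
    # Return a new dict; the stored user is not mutated in place.
    return {**user, **payload}
```

With this in place, the payload {"name": "Deckan", "isAdmin": true} is refused as a whole, so the isAdmin flag never reaches the stored record.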

What makes this hard

These aren’t the sexy zero-day exploits that make headlines. They’re architectural problems baked into dozens or hundreds of endpoints. Finding them means actually understanding what each endpoint does. You need to know what each endpoint returns, what it needs to return, and what’s just extra baggage. Then multiply that by every endpoint in your API. It’s tedious, but it matters.

This is why API security testing is tricky. You’re not hunting for one big vulnerability. You’re checking every single endpoint for these patterns: data leaking where it shouldn’t, auth checks that are missing, rate limits that don’t exist. These problems are everywhere, and they stack on top of each other. At Detectify, our API scanning handles the tedious part, systematically checking every endpoint for vulnerabilities. That way your team can spend time on the stuff that actually needs human judgment, like business logic vulnerabilities and understanding your specific app’s security context.

How does your team handle this?

And here’s the hard question that we’d love to hear about: when you’re building a new endpoint, how do you make sure developers only return the necessary fields? Code review? Automated checks? Response DTOs that force explicit field selection?
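As one possible shape for the "automated checks" option, a test can compare each response against an allowlist and fail when a new field appears. This is only a sketch: unexpected_fields and the test function are hypothetical, it inspects top-level keys only, and the response is hard-coded where a real suite would call the endpoint through a test client.

```python
def unexpected_fields(response: dict, allowed: set[str]) -> set[str]:
    """Return top-level response keys that are not on the allowlist."""
    return set(response) - allowed


def test_user_endpoint_returns_only_allowed_fields():
    # Hypothetical captured response; in a real suite this would come
    # from calling the endpoint through a test client.
    response = {"name": "Joviane", "email": "myemail@gmail.com"}
    assert unexpected_fields(response, {"name", "email"}) == set()
```

A check like this turns "someone added a field" from a silent change into a failing build that needs an explicit decision.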

Joviane Jardim

Senior Engineering Manager at Detectify
