

Web scanners are evolving to secure modern web applications and their APIs

Jocelyn Chan / September 2, 2021

Tom Hudson (TH), Senior Security Researcher at Detectify, joined the Application Security Weekly podcast to talk about the current state of web scanners and securing modern web applications. We’ve edited the transcript for brevity and pulled some highlights from the episode below.

How have web applications and web scanners evolved over the last couple of decades?

TH: I think modern web apps have gotten harder to scan. Ten to fifteen years or so ago, web pages were simple HTML, maybe with a bit of JavaScript to augment their functionality. They were, for the most part, stateless applications: you sent requests and got HTML back.

That meant that web scanners were also pretty simple: they sent requests, they got HTML back, they analyzed the HTML, they saw if anything had happened. 

But modern web applications, not so much. The modern web application is largely made up of single-page applications and APIs. If you run a single-page application through a “traditional” scanner, if there is such a thing as a traditional web scanner, you get back just enough HTML and JavaScript to bootstrap the single-page application, and then every single request you send returns that same thing.

Web scanners built for stateless applications start to fail on modern websites that are made up of single-page applications. APIs are equally problematic, because we spent a lot of years figuring out how to crawl web pages, and then the web pages went and changed, and APIs have turned out to be difficult to crawl as well. (Learn more on how Detectify is approaching API scanning.)

How about security testing APIs? How does this differ from web apps?

TH: A great deal of APIs are fairly well-documented, including self-documenting APIs, at least the ones that are intended for public consumption. The kinds of APIs that companies write to power their own websites, mobile apps and things like that are another story: if the API is only documented internally, pentesters, bounty hunters and web scanners have to resort to reverse engineering it, which can be pretty tricky.

Self-documenting APIs are pretty good at telling you about a particular API endpoint or a particular API method, but what they miss is telling you how those methods work together, for example what the authentication flow looks like. There’ll be a plain-English description of how that works, but not in a way that allows a machine to consume it and automatically do all of that stuff for you.

“…people who are building the APIs are going to want to know that they’re secure and will probably start to try and build them in a way that makes them more amenable to automation, because why wouldn’t you?”

Should we be concerned that API documentation is also accessible to malicious actors?

TH: Sure, I can see how this benefits the malicious side of things, although I don’t have any experience with actually being malicious, of course. From my perspective, that kind of documentation being available to attackers speeds things up for them, for sure. But it equally speeds up ethical hackers, pentesters, red teamers, bounty hunters, and so on, and helps security testing tremendously! It’s probably a net positive.

Having absolutely zero documentation for an API can leave you in a position where you can’t make it do anything at all. You don’t necessarily know what authentication mechanism it supports or even how to call a method in the first place. Without documentation, you can spend hours or days reverse engineering things instead of trying to break them. If it’s documented, you can skip all of that and get more value out of the security audit.

Some listeners may be wondering, “should we stop using OpenAPI…?”

TH: I think one thing that we’ve probably all learnt is that attackers of all sorts can be incredibly determined and often don’t work on a finite budget. Freelancers, hobbyists and malicious actors often have a lot more time than a pentester has in a week-long engagement. They can spend years keeping an eye on your stuff and reverse engineering it. If there is a vulnerability there, then by withholding documentation all you’re really doing is delaying the inevitable, rather than actually increasing security.

image: Tom Hudson, Senior Security Researcher at Detectify, was interviewed on Application Security Weekly

What are the different types of vulnerabilities affecting APIs that defenders should pay more attention to?

TH: Cross-site scripting is less of a problem on APIs generally, because they tend to return application/json or other content types that aren’t rendered as HTML in browsers. Mistakes still happen, but it’s certainly less common.
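
To make that concrete, here’s a minimal sketch in Go (the endpoint and field names are hypothetical) of why API responses tend to dodge XSS: the response is served as application/json with nosniff, so the browser never renders it as HTML.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// profileHandler is a hypothetical API endpoint. Because the response is
// application/json with X-Content-Type-Options: nosniff, browsers won't
// render it as HTML, so a script tag in the "bio" field is inert here.
func profileHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	w.Header().Set("X-Content-Type-Options", "nosniff")
	json.NewEncoder(w).Encode(map[string]string{
		"bio": r.URL.Query().Get("bio"), // user input, returned as data, not markup
	})
}

func main() {
	http.HandleFunc("/api/profile", profileHandler)
	http.ListenAndServe(":8080", nil)
}
```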

Business logic errors are incredibly common in APIs because, really, what is an API if not an abstraction around business logic? There’s no presentation or anything there. It’s about as close to business logic as you get as the consumer of an application.

The most common things tend to be a mix-up between authentication and authorization, so you get insecure direct object reference (IDOR) type vulnerabilities, where the code checks that someone is logged in but misses whether they should actually have access to that particular piece of information in the first place. That lets an attacker see other users’ data, and so on and so forth. It crops up a lot!
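
A minimal sketch of that mix-up in Go, with hypothetical helpers standing in for a real session and data layer: the first handler authenticates the caller but never checks ownership, which is the classic IDOR shape; the second adds the missing authorization check.

```go
package main

import "net/http"

// Hypothetical stand-ins for a real app's session and data layers.
func currentUserID(r *http.Request) (string, bool) { return "alice", true } // validate session
func ownerOf(docID string) string                  { return "bob" }         // look up in DB

// Vulnerable: authentication only. Any logged-in user can read any document
// just by changing the id parameter.
func getDocVulnerable(w http.ResponseWriter, r *http.Request) {
	if _, ok := currentUserID(r); !ok {
		http.Error(w, "unauthenticated", http.StatusUnauthorized)
		return
	}
	docID := r.URL.Query().Get("id")
	w.Write([]byte("contents of " + docID)) // IDOR: no ownership check
}

// Fixed: authentication plus an authorization check against the record owner.
func getDocFixed(w http.ResponseWriter, r *http.Request) {
	user, ok := currentUserID(r)
	if !ok {
		http.Error(w, "unauthenticated", http.StatusUnauthorized)
		return
	}
	docID := r.URL.Query().Get("id")
	if ownerOf(docID) != user {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	w.Write([]byte("contents of " + docID))
}

func main() {
	http.HandleFunc("/doc", getDocFixed)
	http.ListenAndServe(":8080", nil)
}
```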

With APIs being powered by other APIs, and the increasing popularity of microservices architectures, we’re seeing problems with the middleware, effectively the transport between those APIs, where headers can be injected and you get path traversal between services. It’s pretty common to pass user data into the path of a downstream API call, and this is something we’ve seen more of in the wild.
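
As an illustration of that last point, a sketch in Go (the internal hostname is made up): user input concatenated into a downstream path lets “../” change which internal route gets called, while escaping the segment closes that off.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// Vulnerable: the username goes straight into the downstream path, so
// ?user=../admin/keys turns the internal call into
// http://users-api.internal/users/../admin/keys, i.e. /admin/keys.
func userURLVulnerable(username string) string {
	return "http://users-api.internal/users/" + username
}

// Safer: path-escape the segment so "../" and "/" can't change the route.
func userURLSafer(username string) string {
	return "http://users-api.internal/users/" + url.PathEscape(username)
}

func handler(w http.ResponseWriter, r *http.Request) {
	u := r.URL.Query().Get("user")
	fmt.Fprintln(w, "would call:", userURLSafer(u))
}

func main() {
	http.HandleFunc("/profile", handler)
	http.ListenAndServe(":8080", nil)
}
```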

How about SSRF and CORS? Isn’t this much more relevant to API design?

TH: I think CORS, especially, is still pretty misunderstood in a lot of cases, and there are edge cases to consider. For example, trusting the null origin is a really common issue. Unlike the wildcard, it allows credentials to be sent with requests. Browsers send the null origin by default when you load HTML files straight off disk, so if your developers do development that way, they might be tempted to allow the null origin so that they can do their work. But once that gets out into the wild, an attacker can force a null origin with a sandboxed iframe, and suddenly it becomes a vulnerability.
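
A minimal sketch of that misconfiguration as Go middleware (the origins are hypothetical): “null” was allowlisted to make file:// development work, but any attacker page can produce Origin: null with a sandboxed iframe, and combined with Allow-Credentials that means reading authenticated responses cross-origin.

```go
package main

import "net/http"

// corsMiddleware reflects allowlisted origins back to the browser. Trusting
// "null" is the trap: <iframe sandbox="allow-scripts" src="..."> makes the
// browser send Origin: null, so with Allow-Credentials: true an attacker's
// page can read authenticated API responses.
func corsMiddleware(next http.Handler) http.Handler {
	allowed := map[string]bool{
		"https://app.example.com": true,
		"null":                    true, // added "for local development"; this is the bug
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if origin := r.Header.Get("Origin"); allowed[origin] {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Access-Control-Allow-Credentials", "true")
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"secret":"account data"}`))
	})
	http.ListenAndServe(":8080", corsMiddleware(api))
}
```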

The lack of a built-in partial wildcard in the CORS specification is also an issue. You can either allow any site, which implies that credentials won’t be sent with requests, or you can take the Origin header, do some allowlisting or processing on it, and reflect it back into the response. But it’s really easy to get that wrong. Domains have dots in them, so if you’re using regular expressions, the dots act as wildcards unless you escape them. I’ve seen that time and time again.
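
The unescaped-dot mistake is easy to demonstrate. A short Go example with a hypothetical allowlist pattern: the dot in the buggy regex matches any character, so a lookalike domain passes the origin check.

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Buggy: "." matches any character, so an attacker who registers a
	// lookalike domain slips through the origin check.
	buggy := regexp.MustCompile(`^https://app.example.com$`)
	fmt.Println(buggy.MatchString("https://appxexample.com")) // true (!)

	// Fixed: escape the dots (and keep the anchors) so only the real
	// origin matches.
	fixed := regexp.MustCompile(`^https://app\.example\.com$`)
	fmt.Println(fixed.MatchString("https://appxexample.com")) // false
	fmt.Println(fixed.MatchString("https://app.example.com")) // true
}
```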

On the SSRF side of things, I think more servers are talking to other servers than they used to. That’s inevitably going to create more of those kinds of issues, especially where you can get things like header injection in downstream responses, which can cause redirects that bounce things around to different places, and so on and so forth.
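
A sketch of the basic SSRF shape in Go (the internal hostname is invented): the handler fetches whatever URL the client supplies, so an attacker can point it at internal services that were never meant to be reachable from outside.

```go
package main

import (
	"io"
	"net/http"
)

// Classic SSRF: the server fetches a user-supplied URL with no validation.
// A request like /fetch?url=http://users-api.internal/admin lets an attacker
// reach internal APIs that often have no authentication of their own.
func fetchVulnerable(w http.ResponseWriter, r *http.Request) {
	resp, err := http.Get(r.URL.Query().Get("url")) // attacker controls the target
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	io.Copy(w, resp.Body) // internal response is reflected back out
}

func main() {
	http.HandleFunc("/fetch", fetchVulnerable)
	http.ListenAndServe(":8080", nil)
}
```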

How can the security team help developers be better protected against common API weaknesses?

TH: One of the best things we can do is make it as hard as possible to do the wrong thing. In the case of authentication versus authorization, if the authorization part is baked into your data access layer at a code level – so, for example, you have to provide a user object to a class constructor for your model, or it’s part of a method signature, whatever it is, it’s enforced at a code level – it makes it a lot harder to make those mistakes than if you’re doing those checks ad hoc at something like the controller layer in the MVC model. I assume people still use MVC, right? It’s not gone away just yet.

Having that enforced so that your developers can’t avoid it, or they have to jump through a lot of hoops to do something that’s potentially risky, is probably one of the best things we can do.
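
A minimal sketch of that idea in Go, with hypothetical types: the data access method can’t even be called without the requesting user, so the ownership check lives in one place and a controller can’t forget it.

```go
package main

import (
	"errors"
	"fmt"
)

type User struct{ ID string }

type Invoice struct {
	ID      string
	OwnerID string
}

var ErrForbidden = errors.New("forbidden")

// InvoiceStore is the data access layer. Authorization is baked into the
// method signature: every fetch requires the requesting user.
type InvoiceStore struct{ byID map[string]Invoice }

func (s *InvoiceStore) Get(requester User, id string) (Invoice, error) {
	inv, ok := s.byID[id]
	if !ok || inv.OwnerID != requester.ID {
		// Same error for "missing" and "not yours", so IDs can't be probed.
		return Invoice{}, ErrForbidden
	}
	return inv, nil
}

func main() {
	store := &InvoiceStore{byID: map[string]Invoice{
		"inv-1": {ID: "inv-1", OwnerID: "alice"},
	}}
	_, err := store.Get(User{ID: "bob"}, "inv-1")
	fmt.Println(err) // forbidden: bob is authenticated but not authorized
}
```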

What advice do you have on web scanners for teams building APIs and trying to test them through their CI/CD process?

TH: There are increasingly many static analysis tools that do a pretty good job of identifying potentially problematic patterns in code, but they tend to only be as powerful as the patterns that you give them. They can turn a bunch of code into an abstract syntax tree and look for particular patterns, say where you’ve done a non-constant-time comparison for passwords or something like that. Those tools exist, and they’re getting better, but they’re not magic: you’re not going to find everything with them alone. I believe dynamic analysis tools are also a place to look.
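
For readers wondering what a “non-constant-time comparison” looks like, here’s a small Go sketch: the naive == short-circuits at the first differing byte, which can leak timing information about a secret, while crypto/subtle compares every byte regardless.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// The pattern a static analyzer would flag: == on a secret short-circuits at
// the first differing byte, so response time can leak how much of the token
// an attacker has guessed correctly.
func equalLeaky(secret, supplied string) bool {
	return secret == supplied
}

// Constant-time version: hash both values to a fixed length, then compare
// with crypto/subtle, which always examines every byte.
func equalConstantTime(secret, supplied string) bool {
	a := sha256.Sum256([]byte(secret))
	b := sha256.Sum256([]byte(supplied))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	fmt.Println(equalLeaky("s3cret-token", "guess"))               // false, but leaky
	fmt.Println(equalConstantTime("s3cret-token", "s3cret-token")) // true
}
```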

TH: One of the things that I worked with in the past was a dynamic analysis tool that tracked “tainted variables”, as they called them. As you run your application, a variable that was created using user input gets flagged, and then a stack trace fires every time it’s used, so you can see every place it flows. That gives web scanners the context to understand what’s really going on, because there’s only so much you can do with static analysis. How much you can do depends on the language you’re using as well. Perhaps a silly example, but some time ago someone proved you can’t statically analyze Perl, which was a surprise to nobody. If you’re writing in Java, though, the wealth of tools available to you is going to be much better at doing its job.
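
A toy sketch of the taint-tracking idea in Go (not how the tool Tom describes actually worked): user input is wrapped in a type that records a stack trace every time it’s read, mapping where untrusted data flows.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Tainted wraps a value that originated from user input. Every read logs a
// stack trace, so you can see every sink the untrusted data reaches.
type Tainted struct{ value string }

func FromUserInput(v string) Tainted { return Tainted{value: v} }

func (t Tainted) Get() string {
	fmt.Printf("tainted value read at:\n%s\n", debug.Stack())
	return t.value
}

// buildQuery is a deliberately unsafe sink; the logged stack trace would
// point a scanner straight at this line as somewhere worth fuzzing.
func buildQuery(name Tainted) string {
	return "SELECT * FROM users WHERE name = '" + name.Get() + "'"
}

func main() {
	fmt.Println(buildQuery(FromUserInput("alice")))
}
```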

I think one of the main things for me is that there are no silver bullets. There are only bullets, and you probably need a machine gun, right? So you want the CI/CD-type tools that you mentioned before things go live, but also tools that keep watching once things have gone live, for those problems that can only rear their head there, whether they’re environmental, related to data, or any of those other things.

“…documentation being available to attackers speeds things up for them, for sure. But it equally speeds up ethical hackers, pentesters, red teamers, bounty hunters, and so on, and helps security testing tremendously! It’s probably a net positive.”

What’s an API surprise that defenders should keep an eye out for?

TH: One thing that doesn’t crop up very often, but that I find quite interesting, is where the data itself can have an impact on the API and how it behaves. Take the idea of a second-order SQL injection, for example: the results of one SQL statement, which were somehow user controlled, are used in a second SQL statement, and the injection happens there. So there’s this second-order effect.
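
A minimal sketch of that second-order shape in Go (hypothetical schema; a real database driver would be needed to run it): the malicious name is stored safely with a parameterized insert, but a later job concatenates the stored value into a second query, and the injection fires there.

```go
package secondorder

import "database/sql"

// Step 1: registration stores the attacker's chosen name, e.g. "admin'--",
// safely via a parameterized query. Nothing bad happens yet.
func register(db *sql.DB, username string) error {
	_, err := db.Exec("INSERT INTO users(name) VALUES (?)", username)
	return err
}

// Step 2 (the bug): a later job reads the "trusted" stored name back and
// concatenates it into a second statement. The stored "admin'--" closes the
// string literal and comments out the rest: second-order SQL injection.
func auditLogins(db *sql.DB, userID int) error {
	var name string
	if err := db.QueryRow("SELECT name FROM users WHERE id = ?", userID).Scan(&name); err != nil {
		return err
	}
	q := "SELECT * FROM logins WHERE username = '" + name + "'" // vulnerable
	rows, err := db.Query(q)
	if err != nil {
		return err
	}
	return rows.Close()
}
```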

I’ve seen that happen where, in the test environment with just simple test data, there was no problem, because all the data had been input by engineers who knew how things worked. Then later, when another API consumed that data in production, it turned out there was loads of junk in there, and suddenly it was causing problems and making things crash.

I think that’s a good example of what can go wrong. It’s simple things, like the configuration being different in production from test: in test, debugging was enabled or the app ran in single-threaded mode, and then in production it’s multi-threaded and suddenly there are race conditions.
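
That last scenario is easy to reproduce. A toy Go example: the counter behaves in a single-threaded test run but races under concurrency, and go run -race flags it immediately.

```go
package main

import (
	"fmt"
	"sync"
)

var requestCount int // harmless in single-threaded test runs

func handle(wg *sync.WaitGroup) {
	defer wg.Done()
	requestCount++ // data race: read-modify-write is not atomic
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go handle(&wg)
	}
	wg.Wait()
	// Under concurrency, updates are lost; this often prints less than 1000.
	fmt.Println(requestCount)
}
```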

How should defenders approach security for internal APIs?

TH: Internal APIs are greatly overlooked. The traditional model of network security is that you have this ring of trust, and if you’re on the inside of it, then you’re trusted and everything’s fine. That can mean internal APIs sometimes don’t have any kind of authentication at all, or if they do, it’s single-user and there’s no access control. The context of who originated the request is lost once you cross that internal API boundary. This is probably the root cause of a lot of issues to do with things like path traversal on APIs, and occasionally IDORs as well. It also increases the risk of things like SSRF.

I’ve looked at systems before where I’ve discovered an SSRF vulnerability and been able to hit internal APIs that had no authentication and served up all of the user data. I think companies are becoming more aware of that and moving towards the zero-trust model, where being on the network doesn’t actually gain you anything as an attacker.

From a tooling perspective, what do you see as a future for API scanning or web app scanners? Is there hope?

TH: I think there is definitely some hope. Increasingly, people who are building the APIs are going to want to know that they’re secure and will probably start to try and build them in a way that makes them more amenable to automation, because why wouldn’t you?

From the black-box scanner side of things, we’re increasingly moving away from the old style of fetching the HTML and doing some assertions on it, and towards driving real web browsers, because if you fetch the HTML for a single-page web application, it doesn’t tell you anything. You get one request’s worth of HTML with the JavaScript source in there, and there’s no amount of static analysis, or at least no easy amount, that’s going to really tell you anything about that application.

By driving a real web browser, we can load the real page and monitor the interactions it has with the backend: the calls that happen over XHR and fetch, basically the calls that the JavaScript makes, including to APIs. If these applications are getting all of their data from an API and it’s not documented, one of the things we can do is instrument a real web browser, have it navigate around the page and perform actions, and observe the API requests it makes so that we can resend them with modified parameters. From there we can fuzz those parameters, including trying to add new ones and things that we might have garnered from an OpenAPI specification.
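
As one concrete way to do this from Go, here’s a minimal sketch using the open-source chromedp library (an assumption for illustration, not necessarily what Detectify runs): it drives a real Chrome instance, loads a page, and logs every request the single-page app makes, which is the raw material for replaying those API calls with modified parameters.

```go
package main

import (
	"context"
	"log"

	"github.com/chromedp/cdproto/network"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// Every request the page makes is reported here, including the XHR/fetch
	// calls to APIs that never appear in the initial HTML.
	chromedp.ListenTarget(ctx, func(ev interface{}) {
		if req, ok := ev.(*network.EventRequestWillBeSent); ok {
			log.Printf("%s %s", req.Request.Method, req.Request.URL)
		}
	})

	if err := chromedp.Run(ctx,
		network.Enable(),
		chromedp.Navigate("https://app.example.com"), // hypothetical target
		chromedp.WaitReady("body"),                   // a real scanner would also click around
	); err != nil {
		log.Fatal(err)
	}
}
```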


Detectify customers get real hacker insights to harden production

image: Detectify web scanner users get access to vulnerability information sourced from leading ethical hackers to stay updated

Detectify works with the best ethical hackers in the world to crowdsource the latest critical vulnerabilities and put them into the hands of application engineers. The fully automated web scanner safely simulates real hacker payloads against your production environment. Go hack yourself before someone else does.

Check your web applications for critical web vulnerabilities using Detectify web scanner. Start a 2-week free trial today.