EU Regulating InfoSec: How Detectify helps achieving NIS 2 and DORA compliance
**Disclaimer: The content of this blog post is for general information purposes only and is not legal advice. We are very passionate about cybersecurity rules and …
Detectify
“You can’t secure what you don’t know exists.” It’s a common refrain in cybersecurity (and for good reason!). But the reality is a bit more complex: it’s not enough to just know that something exists. To effectively secure your assets, you need to understand what each of them is. Without proper classification, applying the right security processes or tools becomes a guessing game.
There’s a discrepancy between what you think you’re exposing and what you are actually exposing. Critically, an attacker only cares about what is actually accessible to them, not what you think you are exposing. Research from Detectify indicates that the average organization fails to test 9 out of 10 of its complex web apps that are potential attack targets.
Imagine you’ve identified a few thousand assets exposed to the internet. The crucial next step is to determine what you are actually exposing. Different tools can help depending on what’s on your attack surface, but instead of focusing on specific tools right away, let’s concentrate on the methods and data points used to understand what each asset is.
Numerous data points can be used for classification. Let’s examine them in the order of a typical connection flow, assuming an outside-in, black-box analysis perspective. An analysis based on internal network data or source code would require a different approach.
Asset classification methods covered in this guide
The data available for deeper classification heavily depends on the protocol encountered. For this blog post, we’ll focus primarily on HTTP, the backbone of web applications.
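Key HTTP data points include the status code, the response headers, and the response body.
If the response is HTML, we can delve even deeper into its URL structure, meta tags, forms, code patterns, links, and external resources.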
If we haven’t gone down the HTTP and HTML path (e.g., we’ve encountered an SSH or SMTP server), we would then look further into the binary response or protocol-specific handshake data to understand what software components are running. However, that’s a topic for another article.
Examined individually, each data point offers significant opportunities for fingerprinting and understanding exposed assets. Combined, they provide even richer insights.
DNS
Tools and Techniques: For small-scale analysis, manual inspection with the dig command and basic human pattern recognition is enough. For larger-scale testing, open-source tools like MassDNS can be highly effective.
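To make the pattern-recognition step a little more concrete, here is a minimal Python sketch that asks the system resolver for a hostname’s canonical name and matches it against a few hosting-provider suffixes. The suffix table is an illustrative assumption rather than an authoritative list, and at scale you would feed in bulk resolver output (for example from MassDNS) instead of resolving names one at a time.

```python
from __future__ import annotations

import socket

# Illustrative suffix table (an assumption, not an authoritative list):
# canonical-name suffixes that commonly point to a hosting or CDN provider.
CNAME_HINTS = {
    ".cloudfront.net": "Amazon CloudFront",
    ".azurewebsites.net": "Azure App Service",
    ".herokuapp.com": "Heroku",
    ".github.io": "GitHub Pages",
    ".fastly.net": "Fastly",
}


def classify_canonical_name(name: str) -> str | None:
    """Return a provider hint for a canonical hostname, or None if unknown."""
    name = name.lower().rstrip(".")  # dig prints names with a trailing dot
    for suffix, provider in CNAME_HINTS.items():
        if name.endswith(suffix):
            return provider
    return None


def resolve(hostname: str) -> tuple[str, list[str], list[str]]:
    """Ask the system resolver for (canonical name, aliases, IPv4 addresses)."""
    return socket.gethostbyname_ex(hostname)
```

For example, `canonical, _aliases, _ips = resolve("www.example.com")` followed by `classify_canonical_name(canonical)` returns a provider hint, or None when no suffix matches.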
IP addresses and ASNs
Tools and Techniques: Nmap is a widely used tool for IP and port scanning. Alternatives for large-scale scanning include ZMap and MASSCAN. Whois lookups (command-line or web-based) are essential for ASN information.
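Once you have IP addresses, a natural next step is to group them by the network that owns them. The sketch below does longest-prefix matching with Python’s ipaddress module. The prefix table is a placeholder: it uses RFC 5737 documentation ranges and documentation ASNs with made-up owner names, and in practice you would fill it from whois/ASN data or from the IP ranges that cloud providers publish.

```python
from __future__ import annotations

import ipaddress
from collections import defaultdict

# Placeholder table: documentation prefixes and ASNs with hypothetical owners.
PREFIX_OWNERS = {
    "192.0.2.0/24": "Example Corp (AS64500)",
    "198.51.100.0/24": "Example Cloud (AS64501)",
    "198.51.100.128/25": "Example Cloud CDN (AS64501)",
}
# Parse the prefixes once up front.
NETWORKS = [(ipaddress.ip_network(p), owner) for p, owner in PREFIX_OWNERS.items()]


def owner_of(ip: str) -> str | None:
    """Return the owner of the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, owner) for net, owner in NETWORKS if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def group_by_owner(ips: list[str]) -> dict[str, list[str]]:
    """Bucket IP addresses by owner; unmatched addresses go under 'unknown'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ip in ips:
        groups[owner_of(ip) or "unknown"].append(ip)
    return dict(groups)
```

Longest-prefix matching matters because providers often announce a large block and more specific sub-blocks for particular services, as the /24 and /25 entries above illustrate.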
Ports
Knowing which ports are open can hint at the firewall in place and the systems running behind it.
Tools and Techniques: For scanning at scale, masscan is fast, though it may produce more false positives. Speed and accuracy often trade off against each other, so you’ll need to decide which matters more. Nmap offers better accuracy and service detection features.
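For a handful of hosts, a plain TCP connect check (the same approach as Nmap’s -sT connect scan) shows which ports accept connections. This is a minimal, sequential sketch; the port-to-service table lists conventional defaults only, so treat it as a first guess, because services can listen on any port.

```python
from __future__ import annotations

import socket

# Conventional default ports: a hint only, services can listen anywhere.
PORT_HINTS = {22: "ssh", 25: "smtp", 80: "http", 443: "https", 3306: "mysql", 8080: "http-alt"}


def open_ports(host: str, ports: list[int], timeout: float = 1.0) -> list[int]:
    """Return the ports on `host` that complete a TCP handshake."""
    found = []
    for port in ports:
        try:
            # create_connection performs a full three-way handshake.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Refused, filtered (timed out), or unreachable: not reported as open.
            continue
    return found


def describe(ports: list[int]) -> dict[int, str]:
    """Attach a conventional service name to each port, if one is known."""
    return {p: PORT_HINTS.get(p, "unknown") for p in ports}
```

A full handshake per port is slow but simple and rarely misreports; fast SYN scanners like masscan skip the handshake, which is where the speed/accuracy trade-off above comes from.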
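Protocol
The protocol (or scheme) you encounter depends on the combination of hostname, IP address, and port in the request. The hostname matters because a single IP address and port can serve different applications depending on the Host header (the mechanism abused in domain fronting) or on TLS routing by SNI.
Tools and Techniques: Nmap is the best-known tool here. Other tools, such as JA4T (TCP fingerprinting from the JA4+ suite, which also includes the TLS client and server fingerprints JA4 and JA4S) and fingerprintx, can also help identify protocols and services.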
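Service fingerprinting tools typically look at what a server sends first, or at how it answers a probe. The sketch below shows the simplest form of that idea, banner grabbing: connect, optionally send a probe, read the first bytes, and match them against a few well-known prefixes (SSH version strings start with "SSH-", SMTP and FTP greetings with the code 220, POP3 greetings with "+OK", and HTTP responses with "HTTP/"). The signature list is deliberately tiny and illustrative.

```python
from __future__ import annotations

import socket

# (prefix, label) pairs checked in order; illustrative, far from complete.
BANNER_SIGNATURES = [
    (b"SSH-", "ssh"),
    (b"HTTP/", "http"),
    (b"220", "smtp-or-ftp"),  # both greet with 220; a follow-up probe tells them apart
    (b"+OK", "pop3"),
]


def classify_banner(banner: bytes) -> str:
    """Map the first bytes a service sent to a protocol label."""
    for prefix, label in BANNER_SIGNATURES:
        if banner.startswith(prefix):
            return label
    return "unknown"


def grab_banner(host: str, port: int, probe: bytes = b"", timeout: float = 3.0) -> bytes:
    """Connect, optionally send a probe, and return up to 1 KiB of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if probe:
            sock.sendall(probe)
        try:
            return sock.recv(1024)
        except socket.timeout:
            return b""  # silent server: it probably expects the client to speak first
```

HTTP servers wait for the client, so when grabbing from a web port pass a probe such as `b"HEAD / HTTP/1.0\r\n\r\n"`.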
TLS
Tools and Techniques: JARM fingerprints TLS servers by actively probing them. Certificate Transparency (CT) logs, searchable through services like crt.sh, are valuable public sources for discovering certificates issued for your domains.
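crt.sh can return its search results as JSON, which makes it easy to turn Certificate Transparency data into a list of hostnames. The sketch below assumes the commonly used `?q=%.<domain>&output=json` query form and a `name_value` field that may hold several newline-separated names; treat both as assumptions to verify against the live service. Parsing is kept separate from fetching so it can be tested offline.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def fetch_ct_entries(domain: str, timeout: float = 30.0) -> list[dict]:
    """Query crt.sh (assumed JSON interface) for certificates under `domain`."""
    # "%" acts as a wildcard in crt.sh searches; urlencode escapes it as %25.
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    with urllib.request.urlopen(f"https://crt.sh/?{query}", timeout=timeout) as resp:
        return json.load(resp)


def hostnames_from_ct(entries: list[dict], domain: str) -> set[str]:
    """Collect unique names under `domain`, dropping wildcard prefixes."""
    names = set()
    for entry in entries:
        for name in str(entry.get("name_value", "")).splitlines():
            name = name.strip().lower().removeprefix("*.")
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return names
```

Typical use would be `hostnames_from_ct(fetch_ct_entries("example.com"), "example.com")`, which yields candidate hostnames to feed back into the DNS step.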
Status codes
A simple 200 OK status code might offer limited information in isolation. However, observing how an application’s status codes change in response to crafted payloads can be far more revealing. Different payloads trigger different behaviors, and a WAF may interfere. Response codes can also vary with the User-Agent and Accept headers.
```
$ curl -v http://whitehouse.gov
* Trying 192.0.66.51:80...
* Connected to whitehouse.gov (192.0.66.51) port 80 (#0)
> GET / HTTP/1.1
> Host: whitehouse.gov
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Wed, 16 Apr 2025 12:15:21 GMT
< Content-Type: text/html
< Content-Length: 162
< Connection: keep-alive
< Location: https://whitehouse.gov/
<
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
```
Both the Server header and the body of the redirect response reveal that nginx is in use.
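The same observation can be automated. This minimal sketch parses a raw HTTP/1.x response (the transcript above, minus curl’s `<` and `>` prefixes) and collects technology hints from the Server header and from the footer of nginx’s default error page. Both checks are illustrative; real fingerprinting tools combine many more signatures.

```python
from __future__ import annotations

import re


def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a raw HTTP/1.x response into (status, lower-cased headers, body)."""
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    status_line, *header_lines = head.split("\n")
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def server_hints(headers: dict[str, str], body: str) -> set[str]:
    """Collect product names from the Server header and default error-page footers."""
    hints = set()
    if server := headers.get("server"):
        hints.add(server.split("/")[0].lower())  # "nginx/1.25.3" -> "nginx"
    # nginx's default error pages contain "<hr><center>nginx[/version]</center>".
    if match := re.search(r"<hr><center>([\w.-]+?)(?:/[\d.]+)?</center>", body):
        hints.add(match.group(1).lower())
    return hints
```

Checking the body as well as the header is useful because operators often rewrite or strip the Server header while leaving default error pages untouched.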
Tools and Techniques: Common web scanning tools like Burp Suite, combined with human ingenuity, can help us understand more.
Triggering a non-200 status code can sometimes expose more information about a system or its underlying technology. For example, if you’re looking to identify assets running IBM Notes/IBM Domino, it can help to request an .nsf file that does not exist.
Sending a GET request to `example.com/foo.nsf` can trigger a 404 response containing strings such as `<h1>Error 404</h1>HTTP Web Server: IBM Notes Exception - File does not exist</body>`. However, simply requesting a non-existent path such as `example.com/foo` will not trigger the same descriptive error.
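The Domino check is a differential probe: request a non-existent .nsf file, request a non-existent plain path as a baseline, and only flag the asset when the Domino-specific error appears for the .nsf request alone. Here is a minimal standard-library sketch; the paths are arbitrary placeholders and the marker string is taken from the error quoted above. Network errors other than HTTP error statuses (for example `urllib.error.URLError`) are left for the caller to handle.

```python
from __future__ import annotations

import urllib.error
import urllib.request

DOMINO_MARKER = "IBM Notes Exception"


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """GET a URL and return (status, body), including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # urllib raises on error statuses
        return err.code, err.read().decode("utf-8", "replace")


def looks_like_domino(nsf: tuple[int, str], baseline: tuple[int, str]) -> bool:
    """True if only the .nsf request produced the Domino-style 404."""
    nsf_hit = nsf[0] == 404 and DOMINO_MARKER in nsf[1]
    return nsf_hit and DOMINO_MARKER not in baseline[1]


def probe_domino(base_url: str) -> bool:
    base = base_url.rstrip("/")
    return looks_like_domino(fetch(base + "/foo.nsf"), fetch(base + "/foo"))
```

The baseline request guards against servers that echo the requested path or return the same page for every 404, which would otherwise produce false positives.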
This category is vast, so we’ll focus on a few key areas.
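For example, response headers can reveal third-party services. Grepping the headers of a PayPal home page for "salesforce" (with 16 characters of context on either side) surfaces references to `*.salesforce.com`:

```
$ curl -Iks https://www.paypal.com/se/home | grep -Eo '.{16}salesforce.{16}'
e.com https://*.salesforce.com https://*.f
l.com https://*.salesforce.com https://sec
```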
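Allow-lists like the one above typically come from a Content-Security-Policy header (an assumption here; the grep alone does not show which header matched). Rather than grepping for one vendor at a time, you can parse the policy and list every host it trusts. This is a simplified parser: it skips quoted keywords such as 'self', scheme-only sources such as data:, and the bare * wildcard, and its vendor grouping naively keeps the last two DNS labels, which is wrong for suffixes like .co.uk.

```python
from __future__ import annotations


def csp_hosts(policy: str) -> set[str]:
    """Return host sources (e.g. '*.salesforce.com') named in a CSP header value."""
    hosts = set()
    for directive in policy.split(";"):
        for source in directive.split()[1:]:  # first token is the directive name
            if source.startswith("'") or source.endswith(":") or source == "*":
                continue  # keywords, scheme-only sources, bare wildcard
            # Strip scheme, path, and port: "https://a.b.com:443/x" -> "a.b.com".
            host = source.split("://", 1)[-1].split("/", 1)[0].split(":", 1)[0]
            if host:
                hosts.add(host.lower())
    return hosts


def vendors(hosts: set[str]) -> set[str]:
    """Naive grouping by the last two DNS labels."""
    return {".".join(h.split(".")[-2:]) for h in hosts if "." in h}
```

Run against the value of the Content-Security-Policy header from `curl -I`, this gives a quick inventory of the third parties an application depends on.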
Magic bytes
Many file types can be identified by the first few bytes of the file, often called magic bytes or file signatures.
Tools and Techniques: The file command and other commands such as xxd or hexdump can be used to inspect these bytes.
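What file and xxd let you do by hand can be sketched in a few lines: read the first bytes of a response body and compare them against known signatures. The table below covers a handful of widely documented signatures; the magic database behind the file command is far larger.

```python
from __future__ import annotations

# A few well-known file signatures ("magic bytes").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also .docx, .xlsx, .jar, .apk: all ZIP containers
    (b"\x1f\x8b", "gzip"),
    (b"\x7fELF", "elf"),
]


def sniff(data: bytes) -> str:
    """Identify a payload by its leading bytes, ignoring any declared Content-Type."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"
```

Comparing `sniff()` with the Content-Type a server declares is a cheap way to spot mislabeled downloads or exposed backups served as text.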
Content-Type and Length
The combination of the Content-Type and Content-Length headers can indicate what kind of application produced a response.
Tools and Techniques: curl is useful here.
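Default pages are often byte-for-byte identical across installations of the same software, so a (Content-Type, Content-Length) pair can serve as a cheap fingerprint. The sketch below sends a HEAD request with http.client (which, unlike urllib, does not follow redirects) and looks the pair up in a table. The only table entry is taken from the nginx redirect shown earlier (text/html, 162 bytes); further entries would have to come from your own observations, and servers are not required to send Content-Length in HEAD responses.

```python
from __future__ import annotations

import http.client
from urllib.parse import urlsplit

# (media type, length) -> hint; the entry comes from the nginx 301 shown earlier.
LENGTH_FINGERPRINTS = {
    ("text/html", 162): "nginx default 301 page",
}


def head(url: str, timeout: float = 10.0) -> tuple[str, int | None]:
    """HEAD a URL without following redirects; return (Content-Type, Content-Length)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        length = resp.getheader("Content-Length")
        return resp.getheader("Content-Type", ""), int(length) if length else None
    finally:
        conn.close()


def lookup(content_type: str, length: int | None) -> str | None:
    """Match a normalized (media type, length) pair against known fingerprints."""
    media_type = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    return LENGTH_FINGERPRINTS.get((media_type, length))
```

Usage would be `lookup(*head("http://whitehouse.gov/"))`; note that a length match alone is weak evidence and is best combined with other signals.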
Favicons
Comparing an asset’s favicon against a library of known favicons (or their hashes) can be a very fast way to classify a large number of assets. Running favicon fingerprinting across broad domain sets can yield significant insights into the technologies in use.
Tools and Techniques: Tools such as httpx (with its -favicon flag) and platforms like Shodan, which supports favicon hash searches, can automate this.
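Shodan’s favicon hash is, as far as we know, a signed 32-bit MurmurHash3 (as produced by the mmh3 Python package) computed over the base64-encoded favicon, with a newline every 76 characters. The sketch below reimplements MurmurHash3 x86_32 in pure Python so it runs without dependencies; with mmh3 installed, `mmh3.hash(base64.encodebytes(data))` should give the same value. The table of known hashes is left for you to fill in.

```python
from __future__ import annotations

import base64

MASK = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def _scramble(k: int) -> int:
    k = (k * 0xCC9E2D51) & MASK
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as a signed int (like mmh3.hash)."""
    h = seed & MASK
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        h ^= _scramble(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK
    tail = data[4 * nblocks:]
    if tail:  # 1-3 leftover bytes
        h ^= _scramble(int.from_bytes(tail, "little"))
    # Finalization mix ("fmix32").
    h ^= len(data) & MASK
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def favicon_hash(data: bytes) -> int:
    """Shodan-style favicon hash over base64 text with 76-character lines."""
    return murmur3_32(base64.encodebytes(data))


def classify_favicon(data: bytes, known: dict[int, str]) -> str | None:
    """Look a favicon up in a caller-supplied {hash: technology} table."""
    return known.get(favicon_hash(data))
```

To use it, download `/favicon.ico` (for example with `urllib.request.urlopen(...).read()`) and pass the raw bytes to `favicon_hash`; the resulting integer is the kind of value Shodan’s favicon hash filter expects.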
URL structure
The structure of URLs can be very revealing. The most basic example is the existence of an admin page at a specific path (e.g., /wp-admin for WordPress). Other examples are how product categories or user profiles are represented (e.g., /product/{id}, /user/{username}), the encoding used in parameters, and the presence of directory listings.
Tools and Techniques: Wordlists of common paths can be sent with tools like Burp Intruder, ffuf, or dirsearch. Regex patterns can then be applied to the response data to identify interesting results.
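Recognizing patterns such as /product/{id} across thousands of crawled URLs is easier once paths are normalized. This sketch replaces numeric and UUID-like path segments with placeholders and counts the resulting templates. The two placeholder rules are simple assumptions; they will not catch slugs like /user/{username}, which need application-specific rules.

```python
from __future__ import annotations

import re
from collections import Counter
from urllib.parse import urlsplit

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)


def template(url: str) -> str:
    """Normalize a URL path: numeric segments -> {id}, UUIDs -> {uuid}."""
    segments = []
    for seg in urlsplit(url).path.split("/"):
        if seg.isascii() and seg.isdigit():
            segments.append("{id}")
        elif UUID_RE.fullmatch(seg):
            segments.append("{uuid}")
        else:
            segments.append(seg)
    return "/".join(segments) or "/"


def path_templates(urls: list[str]) -> Counter[str]:
    """Count how many URLs share each normalized path template."""
    return Counter(template(u) for u in urls)
```

The most frequent templates describe the application’s routing scheme, which is often distinctive enough to point at a framework or e-commerce platform.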
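Meta tags
Meta tags can name the CMS or e-commerce platform outright, sometimes including its version. For example:

```html
<meta name="generator" content="WordPress 6.2.2" />
<meta id="shopify-digital-wallet" ...>
<meta name="shopify-checkout-api-token" ...>
```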
Tools and Techniques: These checks are usually built into tools and are often opaque to the user. Most DAST tools fall into this category; Nuclei, for example, has open-source templates for these purposes. Wappalyzer and WhatWeb also belong here, as they use similar techniques.
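Extracting these tags is straightforward with Python’s built-in html.parser. The sketch collects every meta tag’s attributes and derives two kinds of hints that match the examples above: the generator value, and any name or id starting with "shopify". The Shopify prefix rule is an illustrative assumption based on those examples.

```python
from __future__ import annotations

from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Record the attributes of every <meta> tag (self-closing ones included)."""

    def __init__(self) -> None:
        super().__init__()
        self.metas: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            # Valueless attributes arrive as None; normalize them to "".
            self.metas.append({k: v or "" for k, v in attrs})


def meta_hints(html: str) -> set[str]:
    parser = MetaCollector()
    parser.feed(html)
    hints = set()
    for meta in parser.metas:
        if meta.get("name", "").lower() == "generator" and meta.get("content"):
            hints.add(meta["content"])  # e.g. "WordPress 6.2.2"
        if any(meta.get(a, "").lower().startswith("shopify") for a in ("name", "id")):
            hints.add("Shopify")
    return hints
```

HTMLParser’s default handling of `<meta ... />` calls `handle_starttag`, so self-closing and unclosed meta tags are treated the same way.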
Forms
The structure and patterns of form tags, especially login forms, along with their action URLs, can provide strong clues about the underlying CMS.
Tools and Techniques: Useful tools include Wappalyzer, WhatWeb, and curl combined with grep.
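As an example of a form-based rule: WordPress’s default login form posts to wp-login.php with fields named log and pwd. The sketch below collects each form’s action and input names with html.parser and applies that single rule; rules for other CMSs would be added to the same table, and they are left out here rather than guessed.

```python
from __future__ import annotations

from html.parser import HTMLParser

# (action substring, required input names, label); one known rule as an example.
FORM_RULES = [("wp-login.php", {"log", "pwd"}, "WordPress")]


class FormCollector(HTMLParser):
    """Collect each form's action URL and the names of its inputs."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: list[dict] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action") or "", "inputs": set()})
        elif tag == "input" and self.forms and attrs.get("name"):
            # Attribute inputs to the most recent form (ignores nesting edge cases).
            self.forms[-1]["inputs"].add(attrs["name"])


def classify_forms(html: str) -> set[str]:
    parser = FormCollector()
    parser.feed(html)
    return {
        label
        for form in parser.forms
        for action_part, fields, label in FORM_RULES
        if action_part in form["action"] and fields <= form["inputs"]
    }
```

Requiring both the action and the field names keeps false positives down compared with matching on either signal alone.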
Code patterns
Looking at code patterns involves identifying characteristic snippets, function names, variable naming conventions, CSS class structures, or HTML element arrangements typical of certain frameworks or libraries.
When analyzing code patterns, we increasingly use statistical and linguistic models to match identified applications against known examples. One approach is to strip out all plain-text content and examine only the syntactic structure of the code. This becomes harder when the code is obfuscated or minified.
Tools and Techniques: Bishop Fox has used Tree-sitter bindings to parse the abstract syntax tree (AST) of JavaScript. Detectify co-founder and security researcher Fredrik Almroth has explored ANTLR for similar purposes, specifically to parse GraphQL. Useful links are https://tree-sitter.github.io/tree-sitter/ and https://www.antlr.org/. Once you have the ASTs, comparing their tree structures is a relatively advanced area of mathematics.
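Parsing a full AST with Tree-sitter or ANTLR is the robust route. As a much simpler stand-in that still captures the idea of discarding plain-text content, the sketch below tokenizes JavaScript-like code with a regular expression, replaces strings, numbers, and identifiers with placeholders (keeping keywords and punctuation), and compares two snippets by the Jaccard similarity of their token n-grams. The tokenizer, keyword list, and n-gram size are assumptions, and minified or obfuscated code will defeat it more easily than an AST-based approach.

```python
from __future__ import annotations

import re

TOKEN_RE = re.compile(
    r"""(?P<str>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`)"""
    r"|(?P<num>\d+(?:\.\d+)?)"
    r"|(?P<ident>[A-Za-z_$][\w$]*)"
    r"|(?P<punct>[^\s\w])"
)
KEYWORDS = {"function", "return", "var", "let", "const", "if", "else",
            "for", "while", "new", "this", "class"}


def shape(code: str) -> list[str]:
    """Reduce code to a token 'shape' with literals and names abstracted away."""
    tokens = []
    for m in TOKEN_RE.finditer(code):
        kind = m.lastgroup
        if kind == "str":
            tokens.append("STR")
        elif kind == "num":
            tokens.append("NUM")
        elif kind == "ident":
            tokens.append(m.group() if m.group() in KEYWORDS else "ID")
        else:
            tokens.append(m.group())
    return tokens


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of token n-grams (1.0 means identical shape)."""
    ga, gb = ngrams(shape(a), n), ngrams(shape(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)
```

Because names and literals are abstracted away, two copies of the same library code with different variable names or strings still score as identical, which is the property the text describes.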
Links
It’s not uncommon for CMSs, themes, or plugins to include default links to their documentation or license agreement. For example:

```html
<a href="https://www.espocrm.com" title="Powered by EspoCRM"
Powered by <a href="http://ofbiz.apache.org"
href="https://about.gitea.com">Powered by Gitea
```
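"Powered by" strings and vendor links like these are easy to harvest with regular expressions. The sketch below pulls the product names that follow "Powered by" and the hostnames of absolute links, and it works on the fragments above. Both regular expressions are simple assumptions and will produce some noise on real pages.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

POWERED_BY_RE = re.compile(r"Powered by\s+([A-Za-z][\w.-]*)", re.IGNORECASE)
HREF_RE = re.compile(r"""href=["'](https?://[^"']+)["']""", re.IGNORECASE)


def link_hints(html: str) -> tuple[set[str], set[str]]:
    """Return (names after 'Powered by', hostnames of absolute links)."""
    names = set(POWERED_BY_RE.findall(html))
    hosts = {urlsplit(u).hostname for u in HREF_RE.findall(html)}
    return names, {h for h in hosts if h}
```

The link hostnames complement the names: "Powered by <a href=...>" gives no name directly, but the linked domain (here ofbiz.apache.org) still identifies the product.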
External resources
Applications frequently load third-party resources, and both the type and the location of these resources can reveal details about the application itself, including supply-chain dependencies and the technical components in use. For instance, the presence of an analytics platform such as Amplitude often suggests that the application matters to the business and is under active development.
Tools and Techniques: Wappalyzer provides this information and highlights unique properties that may exist in external JavaScript. External scripts, such as those hosted on CloudFront, are a great source of links, domains, API operations, and more, and they occasionally contain sensitive information. TruffleHog and KeyHacks are worth bookmarking here.
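A first pass can be done without any external tool: list the hosts a page loads scripts, stylesheets, images, and frames from, and keep those that differ from the page’s own host. The sketch below does that with html.parser; the tag/attribute table is an assumption covering the common cases, and judging which hosts matter (an analytics platform, a CDN, a payment provider) is still up to you or a tool like Wappalyzer.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tag -> attribute that references an external resource.
RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of resources referenced by the page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = RESOURCE_ATTRS.get(tag)
        value = dict(attrs).get(attr) if attr else None
        if value:
            # Resolve relative and protocol-relative references against the page URL.
            self.urls.append(urljoin(self.page_url, value))


def third_party_hosts(html: str, page_url: str) -> set[str]:
    parser = ResourceCollector(page_url)
    parser.feed(html)
    own = urlsplit(page_url).hostname
    hosts = {urlsplit(u).hostname for u in parser.urls}
    return {h for h in hosts if h and h != own}
```

The resulting hosts are also the natural list of JavaScript files to fetch and hand to a secret scanner such as TruffleHog.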
While individual data points are helpful, their true power is unlocked when combined. Only then can we answer more elaborate and critical questions about our attack surface. One might envision an AI agent piecing this together, but a more standard approach is to define the question first and then select the data points and tools needed to answer it.
Some questions require only a single data point, while others require combining many to reach an acceptable confidence level in the classification.
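One simple way to combine evidence, assuming each data point independently supports a classification with some confidence p_i, is a noisy-OR: the combined confidence is 1 - ∏(1 - p_i). The sketch below applies it per candidate technology and keeps those that reach a threshold. The independence assumption, the per-signal confidences, and the 0.75 default threshold are all illustrative choices, not calibrated values.

```python
from __future__ import annotations

from collections import defaultdict
from math import prod


def combine(evidence: list[tuple[str, str, float]], threshold: float = 0.75) -> dict[str, float]:
    """Noisy-OR combination of per-signal confidences.

    evidence: (technology, data point, confidence in [0, 1]) tuples.
    Returns {technology: combined confidence} for scores >= threshold.
    """
    by_tech: dict[str, list[float]] = defaultdict(list)
    for tech, _source, p in evidence:
        by_tech[tech].append(p)
    # Assumes independent signals: P(every signal is wrong) = product of (1 - p_i).
    scores = {tech: 1 - prod(1 - p for p in ps) for tech, ps in by_tech.items()}
    return {tech: s for tech, s in scores.items() if s >= threshold}
```

For example, a Server header (0.6) and an error-page footer (0.5) that both point at nginx combine to 0.8, while a single weak favicon match (0.3) stays below the threshold. Correlated signals (two checks that read the same header) should not be counted twice.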
Systematically collecting and analyzing these diverse data points can help security teams move beyond simple asset discovery to a much deeper understanding, classification, and, potentially, testing of their web applications. Some tools can automate asset classification and deliver intelligent recommendations on which assets are potential attack targets and warrant deep testing.
Are you interested in learning more about Detectify? Start a 2-week free trial or talk to our experts.
If you are a Detectify customer already, don’t miss the What’s New page for the latest product updates, improvements, and new vulnerability tests.