Building a Burp Suite Extension for Automated Undocumented API Endpoint Discovery
Automating the discovery of undocumented API endpoints during a penetration test can significantly enhance coverage and identify hidden attack surfaces. This post details the construction of a Burp Suite extension designed to passively identify potential API endpoints by analyzing HTTP traffic, focusing on common API patterns and content types.
The Undocumented API Problem
During web application assessments, documented APIs are often a starting point, but a wealth of functionality frequently resides in endpoints not explicitly covered by developer documentation or OpenAPI specifications. These "shadow APIs" or "internal APIs" can expose sensitive data, harbor business logic flaws, or even provide direct access to backend systems. Manually sifting through hundreds or thousands of HTTP requests to pinpoint them is time-consuming and error-prone. A passive Burp extension can continuously scan traffic for patterns indicative of API endpoints, freeing the tester to focus on deeper analysis.
Burp Extender Setup and Core Components
To begin, ensure Burp Suite is configured with Jython. Navigate to Extender -> Options -> Python Environment and set the 'Location of Jython standalone JAR file'. Once set, you can load extensions written in Python. Every Burp extension must implement the IBurpExtender interface. This is the entry point for Burp to interact with your code. Our extension will also implement IHttpListener, which Burp invokes for every HTTP request and response handled by any of its tools; the handler filters this down to responses seen by the Proxy.
from burp import IBurpExtender
from burp import IHttpListener
from burp import IHttpRequestResponse
from burp import IRequestInfo
from burp import IResponseInfo
import re
import urlparse


class BurpExtender(IBurpExtender, IHttpListener):
    EXTENSION_NAME = "API Endpoint Discoverer"
    API_PATHS = []

    def registerExtenderCallbacks(self, callbacks):
        self._callbacks = callbacks
        self._helpers = callbacks.getHelpers()
        callbacks.setExtensionName(self.EXTENSION_NAME)
        callbacks.registerHttpListener(self)
        callbacks.printOutput(self.EXTENSION_NAME + " loaded successfully.")
        callbacks.printOutput("Monitoring HTTP traffic for API endpoints...")

    def processHttpMessage(self, toolFlag, messageIsRequest, messageInfo):
        # We only care about proxy traffic and responses to analyze URLs effectively
        if toolFlag == self._callbacks.TOOL_PROXY and not messageIsRequest:
            self.analyzeResponseForApiEndpoints(messageInfo)

    def analyzeResponseForApiEndpoints(self, messageInfo):
        # Implementation details will go here
        pass
Analyzing HTTP Messages for API Indicators
The core logic resides within the processHttpMessage and analyzeResponseForApiEndpoints methods. We'll focus on responses (not messageIsRequest) from the Proxy tool (toolFlag == self._callbacks.TOOL_PROXY) as they often contain richer information about the application's structure and content types. We need to extract the URL, examine request and response headers, and potentially peek into the response body.
Key indicators for an API endpoint include:
- URL paths containing common API prefixes (e.g., /api/, /v1/, /service/, /rest/, /graphql, /rpc/).
- Request headers like Accept: application/json, Content-Type: application/json, X-Requested-With: XMLHttpRequest.
- Response headers like Content-Type: application/json, application/xml, or other structured data formats.
- Response bodies containing JSON or XML structures.
- HTTP methods other than GET (e.g., POST, PUT, DELETE) used on non-static file paths.
To extract this information, we'll use Burp's helper methods, specifically _helpers.analyzeRequest() and _helpers.analyzeResponse().
    def analyzeResponseForApiEndpoints(self, messageInfo):
        # Analyze the request once; it provides the URL, method, and request headers
        requestInfo = self._helpers.analyzeRequest(messageInfo)
        url = requestInfo.getUrl()
        path = url.getPath()

        # Filter out static files and common image/css/js extensions
        # This is a basic filter, refine as needed.
        if re.search(r'\.(js|css|png|jpg|gif|svg|ico|woff|map)$', path, re.IGNORECASE):
            return

        responseInfo = self._helpers.analyzeResponse(messageInfo.getResponse())

        # Check URL path patterns
        is_api_path = self.checkApiPathPattern(path)
        # Check content types
        is_api_content_type = self.checkApiContentType(responseInfo.getHeaders())
        # Check request headers for API indications
        is_api_request_header = self.checkApiRequestHeaders(requestInfo.getHeaders())

        if is_api_path or is_api_content_type or is_api_request_header:
            self.addApiEndpoint(url, requestInfo.getMethod())

    def checkApiPathPattern(self, path):
        # Regex for common API prefixes and patterns
        api_patterns = [
            r'^/(api|v\d+|rest|graphql|service|rpc)(/|$)',  # also matches a bare /graphql
            r'/api-docs',
            r'\.json$',  # direct JSON file access might indicate an API endpoint
            r'\.xml$',   # direct XML file access might indicate an API endpoint
            r'/data/'    # general data paths
        ]
        for pattern in api_patterns:
            if re.search(pattern, path, re.IGNORECASE):
                return True
        return False

    def checkApiContentType(self, headers):
        for header in headers:
            # Header names are case-insensitive, so compare in lowercase
            header = header.lower()
            if header.startswith("content-type:"):
                if "application/json" in header or "application/xml" in header or "text/xml" in header:
                    return True
        return False

    def checkApiRequestHeaders(self, headers):
        for header in headers:
            # Header names are case-insensitive, so compare in lowercase
            header = header.lower()
            if "accept: application/json" in header or \
               "content-type: application/json" in header or \
               "x-requested-with: xmlhttprequest" in header:
                return True
        return False

    def addApiEndpoint(self, url, method):
        # We want to store a canonical representation of the endpoint
        # Stripping query parameters for initial discovery to group similar endpoints
        canonical_url = urlparse.urlunparse(urlparse.ParseResult(
            scheme=url.getProtocol(),
            # Omit the port when it is a default HTTP/HTTPS port
            netloc=url.getHost() + (":" + str(url.getPort()) if url.getPort() not in (80, 443) else ""),
            path=url.getPath(),
            params="",
            query="",
            fragment=""
        ))
        endpoint = "{} {}".format(method, canonical_url)
        if endpoint not in self.API_PATHS:
            self.API_PATHS.append(endpoint)
            self._callbacks.printOutput("Discovered API Endpoint: {}".format(endpoint))
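The indicator list also names response bodies, which the handler above does not inspect. A minimal sketch of such a check follows, written as plain Python so the heuristic can be tried outside Burp. The function names looks_like_json and looks_like_xml and the MAX_PEEK cap are illustrative assumptions, not part of the Burp API.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: bodies above this size are skipped to keep a passive listener cheap.
MAX_PEEK = 256 * 1024


def looks_like_json(body):
    """Return True if the body parses as a JSON object or array."""
    text = body.strip()
    if not text or len(text) > MAX_PEEK or text[0] not in "{[":
        return False
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


def looks_like_xml(body):
    """Return True if the body parses as XML and is not an obvious HTML page."""
    text = body.strip()
    if not text.startswith("<") or len(text) > MAX_PEEK:
        return False
    # HTML also starts with "<"; rule out the common openings first.
    if text[:15].lower().startswith(("<!doctype html", "<html")):
        return False
    try:
        # Note: the standard-library parser is not hardened against
        # maliciously crafted XML; the size cap is only a partial mitigation.
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

Inside the extension, the body could be obtained by slicing messageInfo.getResponse() from responseInfo.getBodyOffset() and converting it with self._helpers.bytesToString() before passing it to these functions. Treating the result as one more OR-ed signal keeps it consistent with the other checks.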
Refining Endpoint Storage and Reporting
The addApiEndpoint method is crucial for storing discovered endpoints. We canonicalize the URL by stripping query parameters to prevent logging every unique parameter combination as a new endpoint. The method also logs the discovered endpoint to Burp's Extender output console. For more advanced use cases, you could integrate with Burp's custom UI components (e.g., ITab) to display these endpoints in a dedicated tab, allowing for easier review, filtering, and export. This would involve creating a Swing UI component and registering it with callbacks.addSuiteTab().
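The canonicalization step can be exercised on its own. The sketch below is a standalone variant that drops the port only when it is the default for the scheme; the helper name canonical_url and the DEFAULT_PORTS table are assumptions introduced for illustration, and the import fallback lets it run under both Python 3 and Jython 2.7.

```python
try:
    from urllib.parse import urlunparse  # Python 3
except ImportError:
    from urlparse import urlunparse  # Python 2 / Jython

# Assumption: only these two schemes appear in proxied traffic.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(scheme, host, port, path):
    """Rebuild a URL without query or fragment, hiding default ports."""
    netloc = host
    # java.net.URL reports -1 when no port is present in the URL.
    if port not in (-1, DEFAULT_PORTS.get(scheme.lower())):
        netloc = "{}:{}".format(host, port)
    return urlunparse((scheme, netloc, path, "", "", ""))


if __name__ == "__main__":
    print(canonical_url("https", "example.com", 443, "/api/v1/users"))
    print(canonical_url("http", "example.com", 8080, "/api/v1/users"))
```

Keying the default on the scheme means an unusual combination such as plain HTTP on port 443 keeps its explicit port, so two distinct services are never merged into one entry.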
Consider the granularity of endpoint detection. Endpoints like /api/v1/users/123 and /api/v1/users/456 should ideally be grouped as /api/v1/users/{id}. This requires more sophisticated path analysis, potentially using regex or tree-based structures to identify dynamic segments. For a first iteration, simply stripping query parameters is effective. You could further refine addApiEndpoint to identify dynamic parts by looking for numeric or UUID-like segments in the path and replacing them with placeholders.
Here's an updated addApiEndpoint that attempts a basic dynamic path segment generalization:
    def addApiEndpoint(self, url, method):
        # Stripping query parameters and generalizing path segments
        path_segments = url.getPath().split('/')
        generalized_segments = []
        for segment in path_segments:
            if segment:  # Avoid empty segments from leading/trailing slashes
                # Heuristic: if segment is purely numeric or looks like a UUID, generalize it
                if re.match(r'^\d+$', segment) or re.match(r'^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$', segment):
                    generalized_segments.append("{id}")
                else:
                    generalized_segments.append(segment)
        generalized_path = "/" + "/".join(generalized_segments)

        canonical_url = urlparse.urlunparse(urlparse.ParseResult(
            scheme=url.getProtocol(),
            # Omit the port when it is a default HTTP/HTTPS port
            netloc=url.getHost() + (":" + str(url.getPort()) if url.getPort() not in (80, 443) else ""),
            path=generalized_path,
            params="",
            query="",
            fragment=""
        ))
        endpoint = "{} {}".format(method, canonical_url)
        if endpoint not in self.API_PATHS:
            self.API_PATHS.append(endpoint)
            self._callbacks.printOutput("Discovered API Endpoint: {}".format(endpoint))
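Because the generalization heuristic is pure string handling, it can be checked outside Burp before being loaded into the extension. The following is a small standalone sketch of the same numeric/UUID rule; the function name generalize_path and the module-level pattern names are assumptions made for this example.

```python
import re

NUMERIC_SEGMENT = re.compile(r'^\d+$')
UUID_SEGMENT = re.compile(
    r'^[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
)


def generalize_path(path):
    """Replace purely numeric or UUID-shaped path segments with {id}."""
    segments = [s for s in path.split('/') if s]  # drop empty segments
    generalized = [
        "{id}" if NUMERIC_SEGMENT.match(s) or UUID_SEGMENT.match(s) else s
        for s in segments
    ]
    return "/" + "/".join(generalized)


if __name__ == "__main__":
    for sample in ("/api/v1/users/123",
                   "/api/v1/users/456",
                   "/api/v2/items/12345/update"):
        print(sample, "->", generalize_path(sample))
```

Two properties are worth noting: version segments such as v1 are left alone because they are not purely numeric, and a trailing slash is dropped as a side effect of discarding empty segments, so /api/users/ and /api/users collapse into one entry.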
Full Extension Code
Here is the complete Python code for the Burp Extender. Save this as a .py file (e.g., api_discoverer.py) and load it in Burp via Extender -> Extensions -> Add.
from burp import IBurpExtender
from burp import IHttpListener
from burp import IHttpRequestResponse
from burp import IRequestInfo
from burp import IResponseInfo
import re
import urlparse


class BurpExtender(IBurpExtender, IHttpListener):
    EXTENSION_NAME = "API Endpoint Discoverer"
    API_PATHS = []

    def registerExtenderCallbacks(self, callbacks):
        self._callbacks = callbacks
        self._helpers = callbacks.getHelpers()
        callbacks.setExtensionName(self.EXTENSION_NAME)
        callbacks.registerHttpListener(self)
        callbacks.printOutput(self.EXTENSION_NAME + " loaded successfully.")
        callbacks.printOutput("Monitoring HTTP traffic for API endpoints...")

    def processHttpMessage(self, toolFlag, messageIsRequest, messageInfo):
        # Only analyze responses that passed through the Proxy tool
        if toolFlag == self._callbacks.TOOL_PROXY and not messageIsRequest:
            self.analyzeResponseForApiEndpoints(messageInfo)

    def analyzeResponseForApiEndpoints(self, messageInfo):
        # Analyze request to get URL and method
        requestInfo = self._helpers.analyzeRequest(messageInfo)
        url = requestInfo.getUrl()
        method = requestInfo.getMethod()
        path = url.getPath()

        # Filter out static files and common image/css/js extensions
        if re.search(r'\.(js|css|png|jpg|gif|svg|ico|woff|map|webp|eot|ttf)$', path, re.IGNORECASE):
            return

        responseInfo = self._helpers.analyzeResponse(messageInfo.getResponse())

        # Check URL path patterns
        is_api_path = self.checkApiPathPattern(path)
        # Check content types
        is_api_content_type = self.checkApiContentType(responseInfo.getHeaders())
        # Check request headers for API indications
        is_api_request_header = self.checkApiRequestHeaders(requestInfo.getHeaders())

        if is_api_path or is_api_content_type or is_api_request_header:
            self.addApiEndpoint(url, method)

    def checkApiPathPattern(self, path):
        api_patterns = [
            r'^/(api|v\d+|rest|graphql|service|rpc)(/|$)',  # /api/, /v1/, /rest/, bare /graphql, etc.
            r'/api-docs',  # Swagger/OpenAPI docs
            r'\.json$',    # direct JSON file or endpoint returning JSON
            r'\.xml$',     # direct XML file or endpoint returning XML
            r'/data/',     # general data paths
            r'/search',    # common search endpoints
            r'/status',    # status checks
            r'/health',    # health checks
            r'/config'     # configuration endpoints
        ]
        for pattern in api_patterns:
            if re.search(pattern, path, re.IGNORECASE):
                return True
        return False

    def checkApiContentType(self, headers):
        for header in headers:
            # Header names are case-insensitive, so compare in lowercase
            header = header.lower()
            if header.startswith("content-type:"):
                if "application/json" in header or \
                   "application/xml" in header or \
                   "text/xml" in header or \
                   "text/json" in header:  # legacy JSON content type
                    return True
        return False

    def checkApiRequestHeaders(self, headers):
        for header in headers:
            # Check for common API related request headers (case-insensitive)
            header = header.lower()
            if "accept: application/json" in header or \
               "accept: application/xml" in header or \
               "content-type: application/json" in header or \
               "content-type: application/xml" in header or \
               "x-requested-with: xmlhttprequest" in header or \
               "soapaction:" in header:  # SOAP APIs
                return True
        return False

    def addApiEndpoint(self, url, method):
        path_segments = url.getPath().split('/')
        generalized_segments = []
        for segment in path_segments:
            if segment:
                # Basic heuristic for dynamic segments: numeric, UUID, or long alphanumeric hashes
                if re.match(r'^\d+$', segment) or \
                   re.match(r'^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$', segment) or \
                   (len(segment) > 10 and re.match(r'^[0-9a-fA-F]+$', segment)):  # long hex strings
                    generalized_segments.append("{id}")
                else:
                    generalized_segments.append(segment)
        generalized_path = "/" + "/".join(generalized_segments)

        # Reconstruct URL without query/fragment, using generalized path
        canonical_url = urlparse.urlunparse(urlparse.ParseResult(
            scheme=url.getProtocol(),
            # Omit the port when it is a default HTTP/HTTPS port
            netloc=url.getHost() + (":" + str(url.getPort()) if url.getPort() not in (80, 443) else ""),
            path=generalized_path,
            params="",
            query="",
            fragment=""
        ))
        endpoint = "{} {}".format(method, canonical_url)
        if endpoint not in self.API_PATHS:
            self.API_PATHS.append(endpoint)
            self._callbacks.printOutput("Discovered API Endpoint: {}".format(endpoint))
Testing and Observations
To test this extension, simply browse a target application with Burp's proxy enabled. As you navigate through the application, the extension will log potential API endpoints to the Extender's output tab. For example, if you visit a page that makes an XHR request to https://example.com/api/v1/users/profile with an Accept: application/json header, you should see output similar to:
API Endpoint Discoverer loaded successfully.
Monitoring HTTP traffic for API endpoints...
Discovered API Endpoint: GET https://example.com/api/v1/users/profile
If the application makes a request like POST https://example.com/api/v2/items/12345/update, the generalized path logic should output:
Discovered API Endpoint: POST https://example.com/api/v2/items/{id}/update
This automated discovery acts as a force multiplier, quickly building a catalog of endpoints for further investigation. It will pick up endpoints used by single-page applications (SPAs), mobile applications, and other clients interacting with the web application through HTTP. Refining the regular expressions for path generalization and content-type detection is an ongoing process specific to the target environment.
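One way to approach that refinement is to run candidate patterns against paths already seen in the target and inspect which ones fire. The sketch below assumes a hypothetical helper named audit_patterns; the sample paths and patterns are invented for illustration.

```python
import re


def audit_patterns(paths, patterns):
    """Map each sample path to the list of patterns that match it."""
    return {
        path: [p for p in patterns if re.search(p, path, re.IGNORECASE)]
        for path in paths
    }


if __name__ == "__main__":
    samples = ["/graphql", "/api/v1/users", "/configuration-guide.html"]
    candidates = [
        r'^/(api|graphql)/',       # prefix must be followed by a slash
        r'^/(api|graphql)(/|$)',   # prefix may also end the path
        r'/config',                # unanchored substring match
    ]
    for path, hits in sorted(audit_patterns(samples, candidates).items()):
        print(path, "->", hits)
```

With these samples, the slash-terminated prefix pattern misses a bare /graphql endpoint, while the unanchored /config pattern also fires on a documentation page. Both are the kind of gap or false positive that only shows up when patterns are checked against real paths.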
Further enhancements could include: logging the HTTP service (host/port) associated with each endpoint, providing a dedicated tab within Burp for discovered endpoints, and integrating with other Burp functionalities like Scanner or Intruder for automated parameter discovery on these newly found endpoints. The current setup, however, provides a solid foundation for passive, automated API endpoint enumeration.
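As a rough illustration of the first enhancement, endpoints could be grouped by the service they belong to and exported for reporting. The EndpointCatalog class below is a hypothetical sketch in plain Python, not part of the extension above; inside Burp, the scheme, host, and port would come from messageInfo.getHttpService().

```python
import json
from collections import defaultdict


class EndpointCatalog(object):
    """Stores discovered endpoints grouped by (scheme, host, port)."""

    def __init__(self):
        # (scheme, host, port) -> path -> set of HTTP methods
        self._services = defaultdict(lambda: defaultdict(set))

    def add(self, scheme, host, port, method, path):
        """Record an observation; return True only if it is new."""
        methods = self._services[(scheme, host, port)][path]
        if method in methods:
            return False
        methods.add(method)
        return True

    def to_json(self):
        """Serialize the catalog as a JSON list, one entry per service."""
        report = []
        for (scheme, host, port), paths in sorted(self._services.items()):
            report.append({
                "service": "{}://{}:{}".format(scheme, host, port),
                "endpoints": [
                    {"path": path, "methods": sorted(methods)}
                    for path, methods in sorted(paths.items())
                ],
            })
        return json.dumps(report, indent=2, sort_keys=True)


if __name__ == "__main__":
    catalog = EndpointCatalog()
    catalog.add("https", "example.com", 443, "GET", "/api/v1/users/{id}")
    catalog.add("https", "example.com", 443, "POST", "/api/v1/users/{id}")
    print(catalog.to_json())
```

Using sets keyed by service also replaces the linear "not in" scan over a list, which starts to matter once a long browsing session has accumulated many endpoints.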
Remember, the goal is not to find every single endpoint immediately, but to establish a continuous discovery process that complements active testing. This tool will surface endpoints you might otherwise miss, allowing you to prioritize and probe them for vulnerabilities.