API Schema Discovery and Endpoint Analysis

Peakhour's API Discovery system automatically detects, catalogs, and analyzes API endpoints from live traffic to provide comprehensive visibility into your API infrastructure and enable advanced security policies.

How API Discovery Works

Automatic Detection Process

Traffic Analysis:

  1. Request Classification: Distinguish API traffic from web traffic based on content types, headers, and URL patterns
  2. Endpoint Extraction: Identify unique API endpoints from request paths and methods
  3. Parameter Discovery: Analyze request parameters (query, path, body) and their data types
  4. Response Analysis: Catalog response codes, content types, and structure patterns
  5. Schema Generation: Build OpenAPI-compatible schemas from observed traffic patterns

Discovery Components

Endpoint Cataloging:

  • Path Templates: Convert /api/users/123 to /api/users/{id} patterns
  • HTTP Methods: Track GET, POST, PUT, DELETE, PATCH operations per endpoint
  • Parameter Types: Identify query parameters, path variables, request body fields
  • Response Patterns: Catalog success/error response structures and status codes

Schema Learning:

  • Data Type Inference: Determine parameter types (string, integer, boolean, array)
  • Validation Rules: Extract constraints like min/max values, string patterns, required fields
  • Enum Detection: Identify fixed value sets for parameters
  • Nested Structure: Map complex object hierarchies in request/response bodies

Endpoint Discovery Features

Intelligent Traffic Classification

API Traffic Identification:

Content-Type: application/json     → API Request
Accept: application/json           → API Request
User-Agent: mobile-app/1.2         → API Request
Content-Type: text/html            → Web Request
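
The header signals above can be sketched as a small classifier. This is an illustrative sketch only, not Peakhour's implementation: the function name `classify_request` and the rule that any non-browser User-Agent counts as an API client are assumptions made for the example.

```python
def classify_request(headers: dict[str, str]) -> str:
    """Return "api" or "web" from request headers (hypothetical heuristic)."""
    h = {name.lower(): value.lower() for name, value in headers.items()}

    # A JSON or XML request body is a strong API signal.
    if any(t in h.get("content-type", "") for t in ("application/json", "application/xml")):
        return "api"

    # Clients that ask for JSON are treated as API clients.
    if "application/json" in h.get("accept", ""):
        return "api"

    # Assumption: browsers identify as "Mozilla/..."; anything else is an API client.
    user_agent = h.get("user-agent", "")
    if user_agent and not user_agent.startswith("mozilla/"):
        return "api"

    return "web"
```

A real classifier would likely also weigh URL patterns such as an /api/ prefix, as described in the detection process.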

Endpoint Pattern Recognition:

/api/v1/users/123           → /api/v1/users/{id}
/api/v1/orders/456/items    → /api/v1/orders/{id}/items  
/api/v1/products?limit=10   → /api/v1/products + query params
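
One way to picture the templating step is the sketch below. It assumes that purely numeric or UUID-shaped path segments are identifiers; the function name `to_template` and that rule are illustrative, and the actual discovery engine may use different heuristics.

```python
import re
from urllib.parse import parse_qsl, urlsplit

_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)


def to_template(url: str) -> tuple[str, list[str]]:
    """Return (path template, sorted query parameter names) for a request URL."""
    parts = urlsplit(url)
    segments = []
    for segment in parts.path.split("/"):
        # Assumption: numeric and UUID-like segments are variable identifiers.
        if _NUMERIC.match(segment) or _UUID.match(segment):
            segments.append("{id}")
        else:
            segments.append(segment)
    query_names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return "/".join(segments), query_names
```

Note that a version segment such as v1 is left untouched because it is not purely numeric.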

Version Detection:

Path Versioning: /api/v1/, /api/v2/
Header Versioning: API-Version: 2.1
Parameter Versioning: ?version=3.0
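
The three versioning styles can be checked in order, as in the following sketch. The function name `detect_version` and the precedence (path, then header, then query parameter) are assumptions for illustration; the header and parameter names are taken from the examples above.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches a path segment such as /v1/ or /v2.1/ (or the same at the end of the path).
_PATH_VERSION = re.compile(r"/v(\d+(?:\.\d+)*)(?:/|$)")


def detect_version(url: str, headers: dict[str, str]):
    """Return (style, version) or None when no version marker is found."""
    parts = urlsplit(url)

    match = _PATH_VERSION.search(parts.path)
    if match:
        return ("path", match.group(1))

    for name, value in headers.items():
        if name.lower() == "api-version":
            return ("header", value)

    query = parse_qs(parts.query)
    if "version" in query:
        return ("parameter", query["version"][0])

    return None
```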

Parameter Analysis

Request Parameter Discovery:

  • Query Parameters: ?limit=10&offset=20&sort=name
  • Path Parameters: /users/{userId}/orders/{orderId}
  • Header Parameters: X-API-Version, Authorization, custom headers
  • Body Parameters: JSON/XML request body field analysis

Parameter Metadata:

{
  "name": "limit",
  "location": "query",
  "type": "integer", 
  "required": false,
  "minimum": 1,
  "maximum": 100,
  "default": 20,
  "description": "Number of results to return"
}
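
A rough sketch of how metadata like this could be derived from observed query values follows. The function `infer_parameter` and its inputs are hypothetical. The inferred minimum and maximum are only the extremes seen so far, so they bound the real constraint rather than prove it, and fields such as "default" and "description" cannot be recovered from traffic this way.

```python
def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def infer_parameter(name: str, observed: list[str], total_requests: int) -> dict:
    """Infer metadata for one query parameter.

    observed: raw string values seen for the parameter.
    total_requests: number of requests to the endpoint in the same period.
    """
    meta = {
        "name": name,
        "location": "query",
        # Assumption: a parameter is required only if every request carried it.
        "required": len(observed) == total_requests,
    }
    if observed and all(_is_int(v) for v in observed):
        numbers = [int(v) for v in observed]
        meta.update(type="integer", minimum=min(numbers), maximum=max(numbers))
    elif observed and all(v.lower() in ("true", "false") for v in observed):
        meta["type"] = "boolean"
    else:
        meta["type"] = "string"
    return meta
```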

Response Structure Analysis

Status Code Patterns:

GET /api/users/{id}:
  200: User found and returned
  404: User not found  
  401: Authentication required
  429: Rate limit exceeded

POST /api/users:
  201: User created successfully
  400: Invalid user data
  409: User already exists

Response Schema Detection:

{
  "endpoint": "/api/v1/users/{id}",
  "method": "GET",
  "responses": {
    "200": {
      "content-type": "application/json",
      "schema": {
        "type": "object",
        "properties": {
          "id": {"type": "integer"},
          "email": {"type": "string", "format": "email"},
          "created_at": {"type": "string", "format": "date-time"}
        }
      }
    }
  }
}
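
To illustrate how a schema like the one above might be inferred from a single decoded JSON response, here is a minimal sketch. The function `infer_schema` is hypothetical. It assumes arrays are homogeneous and uses simple regular expressions for the "email" and "date-time" formats; a production system would merge many samples rather than trust one.

```python
import re

_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
_DATETIME = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$")


def infer_schema(value):
    """Infer a JSON-Schema-style description from one decoded JSON value."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        schema = {"type": "string"}
        if _DATETIME.match(value):
            schema["format"] = "date-time"
        elif _EMAIL.match(value):
            schema["format"] = "email"
        return schema
    if value is None:
        # JSON Schema style; OpenAPI 3.0 would express this as nullable instead.
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            schema["items"] = infer_schema(value[0])  # assumes homogeneous arrays
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")
```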

Security Benefits

Attack Surface Analysis

Endpoint Risk Assessment:

  • Authentication Requirements: Which endpoints require authentication
  • Sensitive Data Exposure: Endpoints returning PII or confidential data
  • Input Validation: Endpoints accepting user input without proper validation
  • Rate Limiting Gaps: High-risk endpoints lacking proper rate limiting

Vulnerability Detection:

High Risk Endpoints:

- /api/admin/* (administrative functions)
- /api/users/{id}/password (password changes)
- /api/payments/* (financial transactions)
- /api/debug/* (debugging endpoints in production)

Common Issues Found:

- Missing authentication on sensitive endpoints
- Overly permissive CORS policies
- Endpoints returning internal system information
- Lack of input validation on user-provided data

Automated Security Policy Generation

Rule Generation Based on Discovery:

// Auto-generated rate limiting rule
if (http.request.uri.path matches "/api/v1/search.*") {
  rate_limit(zone: "api_search", rate: "60r/m", key: ["api_key"])
}

// Auto-generated authentication requirement  
if (starts_with(http.request.uri.path, "/api/v1/users/") and 
    http.request.method ne "GET") {
  require_authentication()
}

// Auto-generated input validation
if (http.request.uri.path eq "/api/v1/users" and 
    http.request.method eq "POST") {
  validate_json_schema(user_creation_schema)
}

OpenAPI Schema Generation

Automated Documentation

Schema Export Formats:

  • OpenAPI 3.0: Industry-standard API documentation format
  • Swagger UI: Interactive API documentation interface
  • Postman Collections: Ready-to-import API testing collections
  • Insomnia Workspaces: API development environment setup

Generated OpenAPI Example:

openapi: 3.0.3
info:
  title: Discovered API
  version: 1.0.0
  description: Automatically generated from traffic analysis
paths:
  /api/v1/users:
    get:
      summary: List users
      parameters:
        - name: limit
          in: query
          schema:
            type: integer
            minimum: 1
            maximum: 100
            default: 20
        - name: offset
          in: query
          schema:
            type: integer
            minimum: 0
            default: 0
      responses:
        '200':
          description: User list retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  users:
                    type: array
                    items:
                      $ref: '#/components/schemas/User'
components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: integer
          format: int64
        email:
          type: string
          format: email
        created_at:
          type: string
          format: date-time

Schema Validation Integration

Real-time Validation:

  • Request Validation: Ensure incoming requests match discovered schemas
  • Response Validation: Verify API responses maintain consistent structure
  • Breaking Change Detection: Alert when API responses deviate from established patterns
  • Version Drift Monitoring: Track API evolution and compatibility
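
Request validation against a discovered schema can be pictured with the small validator below. It is a sketch that covers only a subset of schema keywords (type, minimum, maximum, required, properties, items); the function name `validate` and the error message format are assumptions, and a real deployment would more likely rely on a full JSON Schema validator.

```python
_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "array": list, "object": dict,
}


def validate(value, schema: dict, path: str = "$") -> list[str]:
    """Return a list of human-readable violations (empty when the value conforms)."""
    expected = schema.get("type")
    if expected is not None:
        matches = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(value, bool):
            matches = False
        if not matches:
            return [f"{path}: expected {expected}"]

    errors = []
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: value {value} is below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{path}: value {value} is above maximum {schema['maximum']}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: missing required field")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub_schema, f"{path}.{name}"))
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors
```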

Traffic Pattern Analysis

Usage Analytics

Endpoint Performance Metrics:

/api/v1/users (GET):

  - Average Response Time: 156ms
  - 95th Percentile: 324ms
  - Request Volume: 1,247 req/hour
  - Error Rate: 2.1%
  - Cache Hit Rate: 78%

/api/v1/orders (POST):

  - Average Response Time: 423ms
  - 95th Percentile: 892ms
  - Request Volume: 89 req/hour
  - Error Rate: 5.3%
  - Success Rate: 94.7%
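
For reference, figures like these can be computed from raw samples as sketched below. The function `summarize` is hypothetical. It uses the nearest-rank definition of the 95th percentile and assumes any status code of 400 or above counts as an error; the platform may define both differently.

```python
def summarize(samples: list[tuple[float, int]]) -> dict:
    """Summarize (latency_ms, status_code) samples for one endpoint.

    Expects at least one sample.
    """
    latencies = sorted(latency for latency, _ in samples)
    count = len(latencies)

    # Nearest-rank percentile: rank = ceil(0.95 * count), done in integer arithmetic.
    rank = (95 * count + 99) // 100
    p95 = latencies[rank - 1]

    # Assumption: 4xx and 5xx responses both count as errors.
    errors = sum(1 for _, status in samples if status >= 400)

    return {
        "avg_ms": sum(latencies) / count,
        "p95_ms": p95,
        "error_rate": errors / count,
    }
```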

Consumer Behavior Analysis:

  • API Key Usage: Which consumers use which endpoints
  • Geographic Distribution: Where API requests originate
  • Time-based Patterns: Peak usage hours and seasonal trends
  • Device/Platform Analysis: Mobile vs web vs server-to-server usage

Security Event Correlation

Threat Detection Integration:

Endpoint: /api/v1/admin/users
Security Events:

- 15 brute force attempts (2024-01-15 14:30)
- 3 SQL injection attempts (2024-01-15 15:45) 
- 1 privilege escalation attempt (2024-01-15 16:12)

Risk Score: HIGH
Recommendation: Require additional authentication for admin endpoints

Anomaly Detection:

  • Usage Spikes: Unusual request volume increases
  • New Endpoints: Previously unseen API endpoints appearing
  • Parameter Anomalies: Requests with unexpected parameter values
  • Response Anomalies: Unusual error rates or response patterns
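
Two of the checks above, usage spikes and new endpoints, can be sketched with basic statistics. The function names, the z-score approach, and the threshold of 3 standard deviations are assumptions chosen for illustration, not the detection method Peakhour uses.

```python
from statistics import mean, pstdev


def is_usage_spike(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` when it sits more than `threshold` standard deviations above the mean.

    history: request counts from earlier, comparable time windows (non-empty).
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any increase is unusual.
        return current > mu
    return (current - mu) / sigma > threshold


def new_endpoints(catalog: set[str], observed: list[str]) -> list[str]:
    """Return observed endpoints that are not yet in the catalog, sorted."""
    return sorted(set(observed) - set(catalog))
```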

Discovery Configuration

Learning Parameters

Discovery Sensitivity:

Endpoint Detection Threshold:

- Minimum Requests: 5 (before cataloging endpoint)
- Time Window: 24 hours (learning period)  
- Parameter Confidence: 80% (before including in schema)
- Response Stability: 90% (consistent response structure)
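
The thresholds above could be applied roughly as follows. This sketch is hypothetical throughout: the class and function names are invented for illustration, request counts are assumed to be already restricted to the 24-hour learning window, and "parameter confidence" is read here as the share of observed values that agree with the inferred type, which is only one plausible interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryThresholds:
    min_requests: int = 5               # requests before an endpoint is cataloged
    parameter_confidence: float = 0.80  # share of values matching the inferred type
    response_stability: float = 0.90    # share of responses with the same structure


DEFAULTS = DiscoveryThresholds()


def should_catalog(request_count: int, t: DiscoveryThresholds = DEFAULTS) -> bool:
    """True once enough requests were seen inside the learning window."""
    return request_count >= t.min_requests


def include_parameter(matching: int, observed: int, t: DiscoveryThresholds = DEFAULTS) -> bool:
    """True when enough observed values agree with the inferred type."""
    return observed > 0 and matching / observed >= t.parameter_confidence
```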

Traffic Sampling:

  • Sample Rate: 10% of traffic (configurable)
  • Excluded Paths: Static assets, health checks, internal endpoints
  • Included Content Types: JSON, XML, form-data
  • User Agent Filtering: Exclude bots, include legitimate API clients

Privacy and Compliance

Data Handling:

  • Parameter Value Redaction: Never store actual parameter values
  • PII Detection: Automatically identify and mask sensitive data patterns
  • Retention Policies: Schema data retention and cleanup schedules
  • Access Controls: Who can view discovered API information
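
As an illustration of pattern-based PII masking, the sketch below replaces email addresses and card-like digit runs with placeholders. The patterns, the placeholder format, and the function name `mask_pii` are assumptions for the example; real detectors typically cover more data types and validate matches (for instance with a Luhn check for card numbers).

```python
import re

# Illustrative patterns only; order matters because they are applied in sequence.
_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def mask_pii(text: str) -> tuple[str, list[str]]:
    """Return (masked text, labels of the PII types that were found)."""
    found = []
    for label, pattern in _PATTERNS.items():
        text, replacements = pattern.subn(f"<{label}>", text)
        if replacements:
            found.append(label)
    return text, found
```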

Compliance Features:

  • GDPR Compliance: No personal data stored in discovery process
  • SOC2 Controls: Audit trails for schema access and modifications
  • Data Residency: Schema storage location controls
  • Encryption: All discovered schema data encrypted at rest

Integration Capabilities

Development Workflow Integration

CI/CD Pipeline Integration:

# Export current API schema for validation (-f makes curl fail on HTTP errors)
curl -fsS -H "Authorization: Bearer $API_KEY" \
  "https://api.peakhour.io/domains/example.com/api-discovery/schema" > current-schema.json || exit 1

# Compare with the expected schema in version control.
# Any difference fails the build, so review the diff to decide whether it is breaking.
if ! diff -u expected-schema.json current-schema.json; then
  echo "API schema differs from expected-schema.json; review for breaking changes." >&2
  exit 1
fi
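
A plain text diff flags every difference, including harmless additions. A structure-aware check can narrow this to removals, as in the sketch below. It assumes the exported schema is an OpenAPI-style document with a top-level "paths" object, which this page does not confirm for the export endpoint, and the function name `breaking_changes` is hypothetical.

```python
def breaking_changes(expected: dict, current: dict) -> list[str]:
    """List paths and operations present in `expected` but missing from `current`."""
    problems = []
    current_paths = current.get("paths", {})
    for path, operations in expected.get("paths", {}).items():
        if path not in current_paths:
            problems.append(f"removed path: {path}")
            continue
        for method in operations:
            if method not in current_paths[path]:
                problems.append(f"removed operation: {method.upper()} {path}")
    return problems
```

In a pipeline, both files would be loaded with json.load and the build failed when the returned list is non-empty. Added paths and operations are deliberately ignored because they do not break existing consumers.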

Documentation Generation:

  • Automatic Updates: Keep API documentation current with live traffic
  • Version Tracking: Maintain historical schemas for all API versions
  • Change Notifications: Alert teams when API schemas change
  • Documentation Publishing: Auto-publish to developer portals

Security Tool Integration

SIEM Integration:

{
  "event_type": "api_endpoint_discovered",
  "timestamp": "2024-01-15T14:30:15Z",
  "endpoint": "/api/v1/admin/reset-password",
  "method": "POST", 
  "risk_level": "HIGH",
  "authentication_required": false,
  "sensitive_data": ["password", "email"],
  "recommendation": "Add authentication requirement"
}

Vulnerability Scanning:

  • Endpoint Inventory: Provide complete API surface for security scanning
  • Risk Prioritization: Focus scans on high-risk discovered endpoints
  • Configuration Validation: Ensure security controls match discovered API surface
  • Compliance Checking: Verify API security posture against standards

API Discovery gives you visibility into the API surface observed in live traffic, which supports tighter security policies, more accurate documentation, and better operational insight.