How to Set Up Log-Based Alerting¶
This guide shows you how to configure Peakhour's comprehensive alerting system to monitor security events, performance issues, and operational anomalies using log-based triggers and multiple notification channels.
Before you begin: Understand advanced log queries and security investigation techniques to effectively configure alert thresholds.
Understanding Peakhour's Alerting System¶
Peakhour's "Instant Alerts" system monitors log events in real-time and triggers notifications when specific conditions are met. The system supports multiple alert types, notification channels, and includes intelligent cooldown mechanisms to prevent alert fatigue.
Alert Categories¶
Security Alerts¶
- WAF blocks (web application firewall events)
- IP reputation blocks (malicious traffic detection)
- Geographic blocking events
- Rate limiting violations
Performance Alerts¶
- Origin server timeouts (90+ second delays)
- Origin server errors (5xx responses)
- Connection refused errors
- High error rate thresholds
Operational Alerts¶
- Service availability issues
- Configuration problems
- Certificate expiration warnings
- Cache performance degradation
Notification Channels¶
- Email: HTML and text notifications via Postmark
- SMS: Mobile notifications via Twilio (Australian numbers supported)
- Webhooks: HTTP integrations for custom systems
- Log Forwarding: SIEM integrations (Azure Sentinel, GCP, etc.)
Configure Basic Instant Alerts¶
Access Alert Configuration¶
- Navigate to Monitoring > Instant Alerts
- Review current alert configuration status
- Check notification channel setup
Set Up Notification Channels¶
Configure Email Notifications¶
- Add Alert Emails: Enter email addresses for alert recipients
- Test Email Delivery: Send test alerts to verify delivery
- Email Templates: Review default alert email format
Configure SMS Notifications¶
- Add Mobile Numbers: Enter Australian mobile numbers (+61 format)
- SMS Rate Limiting: Understand built-in cooldown periods
- Test SMS Delivery: Verify mobile notification delivery
Example Configuration¶
```
Alert Emails:
  - security@company.com
  - ops-team@company.com
  - admin@company.com

Alert Mobile Numbers:
  - +61412345678   # Primary on-call
  - +61487654321   # Secondary contact
```
Configure Alert Rules¶
Enable and configure specific alert types:
WAF Alerts:

```
Alert Type: WAF Block
Description: Web Application Firewall detected attacks
Notification: Email + SMS
Cooldown: 30 minutes
```

IP Block Alerts:

```
Alert Type: IP Block
Description: Malicious IP reputation blocks
Notification: Email only
Cooldown: 1 hour
```

Origin Error Alerts:

```
Alert Type: Origin 5xx Errors
Description: Backend server errors
Notification: Email + SMS
Cooldown: 15 minutes
```
Configure Alert Thresholds and Cooldowns¶
Set Appropriate Cooldown Periods¶
Critical Alerts (immediate action required):

```
WAF Attacks: 30 minutes cooldown
Origin Down: 30 minutes cooldown
Origin Timeouts: 30 minutes cooldown
```

Monitoring Alerts (informational): use longer cooldowns, for example 1 hour for IP reputation blocks, to keep notification volume manageable.

Operational Alerts (planned maintenance): extend cooldowns or suppress notifications during scheduled maintenance windows.
Understanding Cooldown Logic¶
Cooldown periods prevent alert spam during extended incidents:
- First Alert: Immediate notification sent
- Subsequent Events: Suppressed during cooldown period
- Cooldown Reset: After period expires, next event triggers alert
- Per-Rule Cooldown: Each alert type has independent cooldown
Example Scenario:

```
14:00 - WAF attack detected                  → Alert sent
14:15 - More WAF attacks                     → Suppressed (30 min cooldown)
14:30 - More WAF attacks                     → Suppressed (cooldown continues)
14:31 - Different alert type (origin error)  → Alert sent (separate cooldown)
15:00 - WAF attack                           → Alert sent (cooldown expired)
```
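The suppression behaviour in the timeline can be modelled with a small amount of per-rule state. A minimal sketch, assuming the rule names and cooldown values shown earlier (illustrative, not Peakhour's implementation):

```python
from datetime import datetime, timedelta

class CooldownTracker:
    """Tracks the last notification time per alert rule and suppresses
    repeat notifications until that rule's cooldown has elapsed."""

    def __init__(self, cooldowns):
        self.cooldowns = cooldowns          # rule name -> timedelta
        self.last_sent = {}                 # rule name -> datetime

    def should_notify(self, rule, now=None):
        now = now or datetime.utcnow()
        last = self.last_sent.get(rule)
        if last is not None and now - last < self.cooldowns[rule]:
            return False                    # still in cooldown: suppress
        self.last_sent[rule] = now          # record and allow the alert
        return True

# Each rule keeps an independent cooldown, matching the timeline above.
tracker = CooldownTracker({
    "waf": timedelta(minutes=30),
    "origin_5xx": timedelta(minutes=15),
})
```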
Advanced Cooldown Configuration¶
Global Domain Settings:

```
{
  "cooldown_time": "PT1800S",   // 30 minutes global default
  "alert_emails": ["ops@company.com"],
  "alert_mobiles": ["+61412345678"]
}
```

Per-Rule Overrides:

```
{
  "rules": {
    "waf": {
      "notify": true,
      "cooldown_time": "PT900S"    // 15 minutes for WAF (override)
    },
    "origin_down": {
      "notify": true,
      "cooldown_time": "PT3600S"   // 1 hour for origin issues
    }
  }
}
```
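The cooldown_time values above are ISO 8601 durations. A minimal sketch of how a client could resolve the effective cooldown for a rule, assuming the global settings and per-rule overrides are merged into one object (the parse_duration helper is illustrative, not part of Peakhour's API):

```python
import re
from datetime import timedelta

def parse_duration(value):
    """Parse a simple ISO 8601 duration of the form PT<seconds>S."""
    match = re.fullmatch(r"PT(\d+)S", value)
    if not match:
        raise ValueError(f"unsupported duration: {value}")
    return timedelta(seconds=int(match.group(1)))

def effective_cooldown(settings, rule):
    """Per-rule override wins; otherwise fall back to the domain default."""
    rule_cfg = settings.get("rules", {}).get(rule, {})
    raw = rule_cfg.get("cooldown_time", settings.get("cooldown_time", "PT1800S"))
    return parse_duration(raw)

settings = {
    "cooldown_time": "PT1800S",
    "rules": {"waf": {"notify": True, "cooldown_time": "PT900S"}},
}
print(effective_cooldown(settings, "waf"))          # 0:15:00
print(effective_cooldown(settings, "origin_down"))  # 0:30:00 (falls back to default)
```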
Create Advanced Log-Based Alert Rules¶
Security Event Monitoring¶
While Peakhour's built-in alerts cover major security events, you can enhance monitoring using log forwarding to external systems:
High-Volume Attack Detection (a threshold sketch follows this block):

```
Log Forward to SIEM:
  Event Type: WAF Blocks
  Threshold: >50 events in 5 minutes
  Action: Trigger advanced incident response
```
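A minimal sketch of the kind of sliding-window check a downstream consumer of forwarded logs could apply (the 50-events-in-5-minutes threshold comes from the rule above; everything else is illustrative):

```python
from collections import deque
from datetime import datetime, timedelta

class SlidingWindowThreshold:
    """Flags an incident when more than `limit` events arrive within `window`."""

    def __init__(self, limit=50, window=timedelta(minutes=5)):
        self.limit = limit
        self.window = window
        self.events = deque()

    def record(self, timestamp):
        self.events.append(timestamp)
        cutoff = timestamp - self.window
        while self.events and self.events[0] < cutoff:
            self.events.popleft()              # drop events outside the window
        return len(self.events) > self.limit   # True -> trigger incident response

detector = SlidingWindowThreshold()
if detector.record(datetime.utcnow()):
    print("High-volume WAF attack detected")
```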
Geographic Anomaly Detection:

```
Log Analysis Rule:
  Pattern: New country in top traffic sources
  Threshold: >100 requests from new geographic region
  Alert: Email notification with geographic analysis
```
Performance Monitoring Alerts¶
Response Time Degradation:

```
Custom Monitoring:
  Metric: Average response time
  Threshold: >2 seconds for >5 minutes
  Action: Performance alert with trend analysis
```

Cache Hit Rate Drop:

```
Performance Alert:
  Metric: Cache hit rate
  Threshold: <70% for >10 minutes
  Action: Cache performance investigation alert
```
Operational Health Monitoring¶
Error Rate Spikes:

```
Error Rate Monitor:
  Metric: 4xx/5xx error percentage
  Threshold: >10% for >5 minutes
  Action: Service health alert
```

Traffic Volume Anomalies (a baseline-deviation sketch follows this block):

```
Traffic Monitoring:
  Metric: Request volume deviation
  Threshold: >300% of baseline or <10% of baseline
  Action: Traffic anomaly alert
```
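A minimal sketch of the baseline comparison behind the traffic anomaly rule (how the baseline is computed is left open; the 300%/10% thresholds come from the rule above):

```python
def traffic_anomaly(current_rpm, baseline_rpm, high=3.0, low=0.10):
    """Flag request volume that deviates sharply from the rolling baseline.

    high: trigger above 300% of baseline; low: trigger below 10% of baseline.
    """
    if baseline_rpm <= 0:
        return None                       # no baseline yet, nothing to compare
    ratio = current_rpm / baseline_rpm
    if ratio > high:
        return f"traffic spike: {ratio:.1f}x baseline"
    if ratio < low:
        return f"traffic drop: {ratio:.1%} of baseline"
    return None

print(traffic_anomaly(current_rpm=4200, baseline_rpm=1200))  # traffic spike: 3.5x baseline
```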
Integrate with External Systems¶
Webhook Integration for Custom Notifications¶
Slack Integration (via webhook):

```
# Configure webhook endpoint
POST https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK

# Webhook payload transformation
{
  "text": "Peakhour Alert: {{alert_type}} on {{domain}}",
  "channel": "#security-alerts",
  "username": "PeakHour Monitor",
  "icon_emoji": ":warning:"
}
```
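A minimal sketch of forwarding an alert payload to that Slack webhook from your own receiver (the webhook URL and the shape of the incoming alert are placeholders):

```python
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"  # placeholder

def forward_to_slack(alert):
    """Transform an alert into a Slack message and post it."""
    message = {
        "text": f"Peakhour Alert: {alert['alert_type']} on {alert['domain']}",
        "channel": "#security-alerts",
        "username": "PeakHour Monitor",
        "icon_emoji": ":warning:",
    }
    response = requests.post(SLACK_WEBHOOK, json=message, timeout=10)
    response.raise_for_status()

forward_to_slack({"alert_type": "waf", "domain": "example.com"})
```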
PagerDuty Integration:

```
# PagerDuty Events API v2
POST https://events.pagerduty.com/v2/enqueue

{
  "routing_key": "YOUR_INTEGRATION_KEY",
  "event_action": "trigger",
  "payload": {
    "summary": "{{alert_type}} alert on {{domain}}",
    "source": "peakhour.io",
    "severity": "error"
  }
}
```
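The same trigger can be sent programmatically from a webhook receiver; a minimal sketch using the Events API v2 (the routing key and alert fields are placeholders):

```python
import requests

def trigger_pagerduty(alert, routing_key="YOUR_INTEGRATION_KEY"):
    """Send a trigger event to the PagerDuty Events API v2."""
    event = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"{alert['alert_type']} alert on {alert['domain']}",
            "source": "peakhour.io",
            "severity": "error",
        },
    }
    response = requests.post("https://events.pagerduty.com/v2/enqueue",
                             json=event, timeout=10)
    response.raise_for_status()
    return response.json()  # contains the dedup_key for later resolve events

trigger_pagerduty({"alert_type": "origin_down", "domain": "example.com"})
```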
SIEM Integration via Log Forwarding¶
Azure Sentinel Integration:
- Configure Log Forwarding: Set up HTTP endpoint for Sentinel
- Create Analytics Rules: Define detection logic in Sentinel
- Set Up Playbooks: Automated response workflows
Example Sentinel Rule:

```
PeakHourLogs_CL
| where block_by_s == "waf"
| where waf_matched_rule_severity_s == "CRITICAL"
| summarize count() by client_s, bin(TimeGenerated, 5m)
| where count_ > 10
```
Splunk Integration:

```bash
# HTTP Event Collector
curl -X POST "https://splunk.company.com:8088/services/collector" \
  -H "Authorization: Splunk YOUR_TOKEN" \
  -d '{
        "event": {
          "alert_type": "{{alert_type}}",
          "domain": "{{domain}}",
          "timestamp": "{{timestamp}}"
        }
      }'
```
Custom Alert Processing¶
Webhook Receiver Example (Python):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def send_to_security_team(data):
    """Placeholder: escalate to the security team (e.g. PagerDuty, Slack)."""

def send_to_ops_team(data):
    """Placeholder: notify the infrastructure/operations team."""

@app.route('/peakhour-alert', methods=['POST'])
def handle_peakhour_alert():
    data = request.json
    alert_type = data.get('alert_type')

    if alert_type == 'waf' and 'critical' in data.get('severity', '').lower():
        # Escalate critical security alerts
        send_to_security_team(data)
    elif alert_type in ('origin_down', 'origin_timeout'):
        # Alert infrastructure team
        send_to_ops_team(data)

    return jsonify({'status': 'received'})
```
Monitor Alert Effectiveness¶
Alert Analytics Dashboard¶
Track alert performance and effectiveness:
Alert Volume Metrics:
- Alerts triggered per day/week/month
- Alert type distribution
- Cooldown effectiveness (suppressed vs sent)
- Response time to alerts
Alert Quality Metrics:
- False positive rate
- Time to resolution
- Alert escalation patterns
- Notification channel effectiveness
Alert Tuning Based on Analytics¶
Reduce Alert Fatigue:
High-Volume Alerts Analysis:
- IP Block alerts: 150/day (consider longer cooldown)
- WAF alerts: 25/day (appropriate level)
- Origin alerts: 2/day (critical - keep current settings)
Tuning Actions:
- Increase IP block cooldown: 30min → 2 hours
- Add geographic filtering to reduce noise
- Create severity-based escalation
Improve Alert Coverage:
Gap Analysis:
- Missing alerts for certificate expiration
- No alerting for DNS resolution issues
- Limited visibility into cache performance
Enhancement Plan:
- Add certificate monitoring alerts (see the sketch after this list)
- Implement DNS health checks
- Create cache performance thresholds
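One gap called out above, certificate expiration, can be covered with a simple external check. A minimal sketch using Python's standard library (the hostname and the 21-day warning threshold are illustrative):

```python
import socket
import ssl
import time

def days_until_cert_expiry(hostname, port=443):
    """Return the number of days until the host's TLS certificate expires."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return int((expires - time.time()) // 86400)

remaining = days_until_cert_expiry("example.com")
if remaining < 21:                      # warn three weeks before expiry
    print(f"Certificate for example.com expires in {remaining} days")
```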
Advanced Alerting Strategies¶
Multi-Tier Alert Escalation¶
Tier 1 - Information (Log only):

```
Events: Regular WAF blocks, standard IP blocks
Action: Log to SIEM, no immediate notification
Threshold: Standard activity levels
```

Tier 2 - Warning (Email notification):

```
Events: Elevated attack activity, minor performance issues
Action: Email to operations team
Threshold: 2-3x normal activity
```

Tier 3 - Critical (Email + SMS):

```
Events: Major attacks, service outages, significant performance degradation
Action: Email + SMS to on-call team
Threshold: 5x+ normal activity or service impact
```
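A minimal sketch of routing alerts by tier (the tier assignments and channel callables are illustrative, not a built-in Peakhour feature):

```python
TIERS = {
    "ip_block": 1,        # information: log only
    "waf": 2,             # warning: email
    "origin_down": 3,     # critical: email + SMS
}

def route_alert(alert, send_email, send_sms, log_to_siem):
    """Dispatch an alert to the channels appropriate for its tier."""
    tier = TIERS.get(alert["alert_type"], 2)
    log_to_siem(alert)                       # every tier is logged
    if tier >= 2:
        send_email(alert)
    if tier >= 3:
        send_sms(alert)

route_alert({"alert_type": "origin_down", "domain": "example.com"},
            send_email=print, send_sms=print, log_to_siem=print)
```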
Context-Aware Alerting¶
Business Hours vs. After Hours (a channel-selection sketch follows these lists):
Business Hours (9 AM - 6 PM):
- Higher alert thresholds (more activity expected)
- Email notifications preferred
- Faster response time expectations
After Hours:
- Lower alert thresholds (less legitimate traffic)
- SMS notifications for critical issues
- Extended response time acceptable for non-critical
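A minimal sketch of the time-of-day logic described above (the 9 AM - 6 PM window and channel choices follow the guidance in these lists; treat the severity labels as illustrative):

```python
from datetime import datetime

def notification_channels(severity, now=None):
    """Choose channels based on severity and whether it is business hours."""
    now = now or datetime.now()
    business_hours = now.weekday() < 5 and 9 <= now.hour < 18
    if severity == "critical":
        return ["email", "sms"]              # always page for critical issues
    if business_hours:
        return ["email"]                     # routine issues handled by the day team
    return ["email"] if severity == "warning" else []  # defer informational alerts

print(notification_channels("critical"))
```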
Geographic Context:
Expected Traffic Regions:
- US/CA/AU: Higher thresholds, standard alerting
- EU: Moderate thresholds during EU business hours
- High-Risk Regions: Lower thresholds, immediate alerting
Intelligent Alert Correlation¶
Attack Campaign Detection (a correlation sketch follows these lists):
Correlation Logic:
- Multiple IPs from same ASN attacking
- Similar attack patterns across time
- Geographic clustering of threats
Enhanced Alert:
- Single "coordinated attack" alert vs. many individual IP alerts
- Include campaign analysis and attribution
- Provide recommended response actions
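A minimal sketch of the correlation idea, collapsing many per-IP events into a single campaign-level alert keyed by ASN (the field names and the five-IP threshold are illustrative):

```python
from collections import defaultdict

def correlate_by_asn(events, min_ips=5):
    """Collapse many per-IP block events into one alert per attacking network."""
    by_asn = defaultdict(set)
    for event in events:
        by_asn[event["asn"]].add(event["client_ip"])
    return [
        {"alert": "coordinated attack", "asn": asn, "unique_ips": len(ips)}
        for asn, ips in by_asn.items()
        if len(ips) >= min_ips                 # many distinct IPs from one ASN
    ]

events = [{"asn": 64496, "client_ip": f"203.0.113.{i}"} for i in range(1, 8)]
print(correlate_by_asn(events))
```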
Alert Response Procedures¶
Create Standard Operating Procedures¶
Security Alert Response:
WAF Alert Procedure:
1. Review alert details and affected domains
2. Check security dashboard for attack patterns
3. Verify WAF rules are blocking effectively
4. Escalate to security team if bypass detected
5. Document incident and response actions
Performance Alert Response:
Origin Error Procedure:
1. Check origin server health and connectivity
2. Review recent configuration changes
3. Test direct origin connectivity
4. Contact hosting provider if needed
5. Update monitoring thresholds if appropriate
Alert Response Templates¶
Security Incident Response:

```
Subject: Security Alert Response - {{alert_type}} on {{domain}}

Initial Assessment:
- Alert Time: {{timestamp}}
- Attack Vector: {{attack_details}}
- Source Analysis: {{source_ips_countries}}
- Current Status: {{blocking_effectiveness}}

Immediate Actions Taken:
- [ ] Verified WAF blocking effectiveness
- [ ] Reviewed attack patterns and scale
- [ ] Checked for successful bypasses
- [ ] Escalated to security team (if needed)

Next Steps:
- Continue monitoring for escalation
- Update threat intelligence with IOCs
- Review and enhance detection rules
```
Integration with Incident Management¶
Incident Tracking Integration¶
ServiceNow Integration:
Alert → ServiceNow Incident:
- Automatic incident creation for critical alerts
- Alert details mapped to incident fields
- Priority assignment based on alert severity
- Assignment to appropriate support groups
JIRA Integration (an issue-creation sketch follows this list):
Alert → JIRA Issue:
- Create issues for operational alerts
- Track resolution and root cause analysis
- Link related alerts and incidents
- Generate metrics on alert resolution time
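A minimal sketch of turning an alert into a JIRA issue via the JIRA REST API (the base URL, credentials, project key, and issue type are placeholders):

```python
import requests

JIRA_URL = "https://jira.company.com"        # placeholder
AUTH = ("alert-bot", "API_TOKEN")            # placeholder credentials

def create_jira_issue(alert):
    """Open an operational issue for an alert so resolution can be tracked."""
    issue = {
        "fields": {
            "project": {"key": "OPS"},
            "summary": f"Peakhour alert: {alert['alert_type']} on {alert['domain']}",
            "description": f"Raised automatically at {alert.get('timestamp')}",
            "issuetype": {"name": "Incident"},
        }
    }
    response = requests.post(f"{JIRA_URL}/rest/api/2/issue",
                             json=issue, auth=AUTH, timeout=10)
    response.raise_for_status()
    return response.json()["key"]            # e.g. OPS-123

create_jira_issue({"alert_type": "origin_5xx", "domain": "example.com"})
```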
Automated Response Actions¶
Immediate Response Automation:
Critical Security Alert Automation:
1. Create firewall rule to block attack sources
2. Increase rate limiting temporarily
3. Notify security team via multiple channels
4. Create incident tracking ticket
5. Begin evidence collection for forensics
Performance Issue Automation:
Origin Error Response:
1. Automatically failover to backup origins (if configured)
2. Increase cache TTL temporarily
3. Enable maintenance page if needed
4. Alert operations and hosting teams
5. Begin diagnostic data collection
Troubleshooting Common Alerting Issues¶
Alert Delivery Problems¶
Problem: Alerts are not being received.
Solutions:
- Verify email addresses and mobile numbers are correct
- Check spam/junk folders for email alerts
- Test notification channels with manual test alerts
- Review cooldown settings - alerts may be suppressed
- Confirm alert rules are enabled
Alert Fatigue¶
Problem: Too many alerts are firing, and important ones get ignored.
Solutions:
- Increase cooldown periods for noisy alerts
- Implement severity-based escalation
- Use log forwarding for informational events
- Create alert summaries instead of individual notifications
- Filter out expected patterns (maintenance, known good traffic)
Missing Important Alerts¶
Problem: Critical events are not generating alerts.
Solutions:
- Review alert rule configuration and thresholds
- Check if events fall outside configured alert types
- Verify log forwarding for custom monitoring
- Test alert conditions with known scenarios
- Consider additional monitoring tools for gaps
Best Practices for Log-Based Alerting¶
Alert Configuration¶
- Start with conservative thresholds and adjust based on experience
- Use appropriate cooldown periods to prevent alert spam
- Test all notification channels regularly
- Document alert procedures and escalation paths
- Review and tune alerts regularly based on effectiveness metrics
Notification Management¶
- Use appropriate channels for different severity levels
- Implement follow-the-sun on-call coverage
- Provide context and recommended actions in alerts
- Create alert summaries for management reporting
- Maintain contact information and escalation procedures
Integration Strategy¶
- Leverage existing monitoring and incident management tools
- Use webhooks for custom integrations and workflows
- Implement automated response for common scenarios
- Maintain audit trails of alert actions and responses
- Test integration points and escalation procedures regularly
Your log-based alerting system is now configured to provide comprehensive monitoring coverage with intelligent notification management and integration capabilities for effective incident response.