PART 3: MEMORY MANAGEMENT AND PERFORMANCE OPTIMIZATION

Understanding Zeek’s Memory Usage

Zeek is a memory-intensive application because it maintains rich state about network connections and protocol sessions. Understanding how Zeek uses memory will help you size your systems appropriately and write scripts that don’t cause memory problems.

Major memory consumers:

┌──────────────────────────────────────────────────────────────┐
│                ZEEK MEMORY USAGE BREAKDOWN                   │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  CONNECTION STATE (40-60% of memory)                         │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                         │
│  • Connection records for all active connections             │
│  • Protocol-specific state (HTTP transactions, DNS queries)  │
│  • Packet reassembly buffers                                 │
│                                                              │
│  SCRIPT STATE (20-40% of memory)                             │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                         │
│  • Tables and sets maintained by scripts                     │
│  • Intel framework indicator database                        │
│  • Statistical tracking structures                           │
│  • Custom data structures in your scripts                    │
│                                                              │
│  PACKET BUFFERS (10-20% of memory)                           │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                         │
│  • Buffers for incoming packets                              │
│  • Reassembly buffers for fragmented traffic                 │
│                                                              │
│  ZEEK CORE (5-10% of memory)                                 │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                         │
│  • Zeek executable and libraries                             │
│  • Event engine data structures                              │
│  • Protocol analyzers                                        │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Memory usage estimation:

Here’s a rough formula for estimating Zeek’s memory requirements:

Base Memory = 2 GB (Zeek core + base scripts)

Connection Memory = (Active Connections) × (Memory per Connection)
                  = (Active Connections) × ~10 KB

Script Memory = Variable, depends on your detection scripts
              = 1-4 GB for typical deployments

Total = Base + Connection Memory + Script Memory

Example calculations:

Network Type                  Concurrent Connections   Estimated Memory
Small office (100 Mbps)       ~5,000                   2 GB + (5K × 10 KB) = ~2.05 GB
Medium enterprise (1 Gbps)    ~50,000                  2 GB + (50K × 10 KB) = ~2.5 GB
Large enterprise (10 Gbps)    ~500,000                 2 GB + (500K × 10 KB) = ~7 GB
Data center (100 Gbps)        ~5,000,000               2 GB + (5M × 10 KB) = ~52 GB

Add 50-100% headroom to these estimates for safety and to account for script memory usage.
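The formula above is simple enough to turn into a small calculator. The following Python sketch is only an illustration: the function name, the parameter names, and the choice of decimal units (1 GB = 1,000,000 KB) are assumptions made here, and the 10 KB per connection and 2 GB base figures are the rough values from the text, not measured constants.

```python
KB_PER_GB = 1_000_000  # decimal units, matching the rough arithmetic in the text


def estimate_memory_gb(active_connections, per_conn_kb=10.0, base_gb=2.0,
                       script_gb=0.0, headroom=0.0):
    """Estimate total memory in GB: (base + connections + scripts) plus headroom.

    headroom is a fraction, so 0.5 means "add 50%".
    """
    connection_gb = active_connections * per_conn_kb / KB_PER_GB
    total_gb = base_gb + connection_gb + script_gb
    return total_gb * (1.0 + headroom)


# Large enterprise row: 2 GB base + 500,000 connections at ~10 KB each
print(estimate_memory_gb(500_000))                # 7.0
# Same network with 50% headroom added
print(estimate_memory_gb(500_000, headroom=0.5))  # 10.5
```

Treat the result as a starting point for sizing, then compare it against what your sensors actually report once they are running.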

Performance Tuning: Getting the Most from Your Hardware

Zeek’s performance depends on several factors: hardware capabilities, traffic characteristics, configuration choices, and script complexity. Let’s explore the key performance considerations and how to optimize them.

CPU Considerations

Zeek is CPU-intensive, particularly for protocol parsing and script execution. CPU speed and architecture significantly impact performance.

CPU requirements scale with:

  • Traffic volume (more packets = more processing)
  • Traffic complexity (application protocols require more parsing than simple TCP)
  • Script complexity (sophisticated detection logic uses more CPU)
  • Enabled features (file extraction, statistical analysis add overhead)

CPU optimization strategies:

Strategy                    Impact        When to Use
Higher clock speed          Significant   Single-worker deployments
More cores                  Significant   Cluster deployments
Modern CPU architecture     Moderate      New deployments
Disable unused analyzers    Moderate      Specialized monitoring
Optimize scripts            Variable      Always

Memory Performance

Memory speed and capacity affect Zeek’s ability to maintain state for large numbers of connections.

Memory optimization:

  • Use fast RAM (DDR4-3200 or better)
  • Ensure sufficient capacity to avoid swapping (swapping kills performance)
  • Consider memory channels (more channels = better bandwidth)

Connection timeout tuning:

One of the most effective ways to control memory usage is adjusting connection timeouts. Zeek expires connection state for inactive connections based on configured timeouts:

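# Default values, as defined in /usr/local/zeek/share/zeek/base/init-bare.zeek.
# To change them, put redef lines like these in your own site script
# rather than editing init-bare.zeek.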
redef tcp_inactivity_timeout = 5 min;  # TCP connections
redef udp_inactivity_timeout = 1 min;  # UDP "connections"
redef icmp_inactivity_timeout = 1 min; # ICMP

Tuning considerations:

┌──────────────────────────────────────────────────────────────┐
│            CONNECTION TIMEOUT TRADE-OFFS                     │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  SHORTER TIMEOUTS (e.g., 1-2 minutes)                        │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━    │
│  ✓ Lower memory usage                                        │
│  ✓ Faster state cleanup                                      │
│  ✗ May prematurely expire legitimate long connections        │
│  ✗ Could miss attacks that span long periods                 │
│                                                              │
│  LONGER TIMEOUTS (e.g., 10-15 minutes)                       │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━    │
│  ✓ Complete tracking of long-lived connections               │
│  ✓ Better detection of persistent threats                    │
│  ✗ Higher memory usage                                       │
│  ✗ Slower to reclaim memory from idle connections            │
│                                                              │
└──────────────────────────────────────────────────────────────┘

For high-volume networks where memory is constrained, shorter timeouts help. For threat hunting where you need to track persistent connections, longer timeouts are better.
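To get a feel for the memory side of this trade-off, a back-of-the-envelope model can help. The Python sketch below assumes a steady rate of connections that go silent without closing cleanly, and assumes each one holds roughly 10 KB (the per-connection figure used earlier) until its inactivity timeout fires. Both the model and the function name are illustrative assumptions, not a description of how Zeek accounts for memory internally.

```python
def idle_state_mb(idle_conns_per_sec, timeout_sec, per_conn_kb=10.0):
    """Rough memory (MB) held by idle connections waiting to time out.

    Assumes a steady state: the number of lingering connections is
    (rate of connections going idle) x (inactivity timeout).
    """
    lingering_connections = idle_conns_per_sec * timeout_sec
    return lingering_connections * per_conn_kb / 1000.0


# 100 connections per second going idle, 5-minute timeout
print(idle_state_mb(100, 5 * 60))  # 300.0
# Same traffic with a 1-minute timeout
print(idle_state_mb(100, 1 * 60))  # 60.0
```

Under these assumptions the idle state scales linearly with the timeout, which is why shortening it is such a direct lever on memory.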

Disk I/O Performance

Logging generates significant disk I/O. On a busy network, Zeek might write several gigabytes of logs per hour.

Storage recommendations:

Storage Type       Performance   Use Case
NVMe SSD           Excellent     High-volume logging, logger nodes
SATA SSD           Good          Medium-volume logging
HDD RAID           Adequate      Archival storage, budget deployments
Network storage    Variable      Centralized log storage (watch latency)

I/O optimization strategies:

  • Use fast local storage for active logs
  • Rotate logs frequently to prevent huge files
  • Compress and move old logs to slower archival storage
  • Consider separate file systems for logs vs OS
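
The "compress and move" strategy can be scripted in a few lines. The following Python sketch is one possible approach, not a replacement for whatever rotation tooling your deployment already uses; the function name, the directory arguments, and the 7-day default are all hypothetical choices made for illustration.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_old_logs(active_dir, archive_dir, max_age_days=7.0, now=None):
    """Gzip *.log files older than max_age_days into archive_dir.

    Returns the names of the compressed files that were created.
    """
    active, archive = Path(active_dir), Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    archived = []
    for log in sorted(active.glob("*.log")):
        if log.stat().st_mtime >= cutoff:
            continue  # still recent: leave it on fast storage
        target = archive / (log.name + ".gz")
        with log.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log.unlink()  # remove the original only after the copy is written
        archived.append(target.name)
    return archived


# Example call (both paths are placeholders):
# archive_old_logs("/fast/zeek/logs", "/slow/zeek/archive", max_age_days=7)
```

Selecting files by modification time keeps the sketch independent of any particular log naming scheme.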

Network Performance

For cluster deployments, network connectivity between nodes affects performance.

Network requirements:

┌──────────────────────────────────────────────────────────────┐
│             CLUSTER NETWORK REQUIREMENTS                     │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  MANAGEMENT NETWORK (between cluster nodes)                  │
│  • Minimum: 1 Gbps                                           │
│  • Recommended: 10 Gbps for large clusters                   │
│  • Low latency critical for coordination                     │
│  • Separate from monitored network                           │
│                                                              │
│  MONITORING NETWORK (where packets are captured)             │
│  • Must match or exceed monitored network speed              │
│  • Often uses SPAN/TAP ports (receive-only)                  │
│  • No IP configuration needed (promiscuous mode)             │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Monitoring Zeek’s Performance

To ensure Zeek is performing well, you need to monitor its health and performance metrics.

Key metrics to track:

Metric               What It Indicates            Warning Threshold
Packet drop rate     Can’t keep up with traffic   >1%
Memory usage         Approaching capacity         >80%
CPU usage            Processing load              >80% sustained
Disk I/O wait        Storage bottleneck           >10%
Event queue depth    Script processing lag        >1000 events
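
These thresholds are easy to encode in a small health check. The Python sketch below shows one way to do it; the dictionary keys and the function name are hypothetical, and how you collect the numbers (from Zeek logs, the operating system, or a monitoring agent) is left open.

```python
# Warning thresholds taken from the table above. The key names are
# hypothetical and chosen only for this sketch.
THRESHOLDS = {
    "packet_drop_pct": 1.0,
    "memory_used_pct": 80.0,
    "cpu_used_pct": 80.0,       # the table says "sustained"; averaging is up to the caller
    "disk_io_wait_pct": 10.0,
    "event_queue_depth": 1000,
}


def check_metrics(metrics):
    """Return the sorted names of metrics that exceed their warning threshold."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if name in metrics and metrics[name] > limit
    )


# Made-up sample readings, for illustration only
sample = {"packet_drop_pct": 2.5, "memory_used_pct": 64.0, "event_queue_depth": 1500}
print(check_metrics(sample))  # ['event_queue_depth', 'packet_drop_pct']
```

Metrics missing from the input are skipped, so the check can run even when only some readings are available.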

Zeek provides performance statistics:

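# Check for estimated capture loss (capture_loss.log is written when
# misc/capture-loss is loaded)
zeek-cut ts peer gaps acks percent_lost < capture_loss.log

# Monitor Zeek's internal stats (stats.log is written when misc/stats is loaded)
zeek-cut ts peer mem pkts_dropped events_queued < stats.log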
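
If you would rather process these logs in a script than at the shell, the TSV format is straightforward to read: the `#fields` header line names the columns, and an unset value appears as `-`. The Python sketch below assumes that default TSV layout and the `pkts_dropped` column from stats.log. The helper names are hypothetical, and the sample text is a made-up, heavily shortened stand-in for a real log.

```python
def read_zeek_tsv(text, wanted):
    """Yield one dict per data row, keeping only the wanted columns."""
    fields = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names follow the tag
        elif line and not line.startswith("#"):
            row = dict(zip(fields, line.split("\t")))
            yield {name: row[name] for name in wanted}


def to_count(value):
    """Convert a count column to int, treating Zeek's unset marker '-' as 0."""
    return 0 if value == "-" else int(value)


# Synthetic sample, for illustration only (real logs have more columns)
SAMPLE = "\n".join([
    "#fields\tts\tpeer\tmem\tpkts_dropped",
    "1700000000.0\tworker-1\t512\t0",
    "1700000300.0\tworker-1\t530\t42",
])

rows = list(read_zeek_tsv(SAMPLE, ["ts", "mem", "pkts_dropped"]))
dropping = [r["ts"] for r in rows if to_count(r["pkts_dropped"]) > 0]
print(dropping)  # ['1700000300.0']
```

In practice you would read the text from the log file and feed the flagged timestamps into whatever alerting you already use.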

Common performance problems and solutions:

┌──────────────────────────────────────────────────────────────┐
│           TROUBLESHOOTING PERFORMANCE ISSUES                 │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  SYMPTOM: Dropping packets                                   │
│  CAUSES: CPU overload, memory exhaustion, network bottleneck │
│  SOLUTIONS:                                                  │
│    • Add more workers to cluster                             │
│    • Optimize or disable expensive scripts                   │
│    • Increase memory                                         │
│    • Improve packet acquisition (AF_PACKET, PF_RING)         │
│                                                              │
│  SYMPTOM: High memory usage                                  │
│  CAUSES: Too many concurrent connections, memory leaks       │
│  SOLUTIONS:                                                  │
│    • Reduce connection timeouts                              │
│    • Check scripts for unbounded table growth                │
│    • Add more memory                                         │
│    • Implement table expiration policies                     │
│                                                              │
│  SYMPTOM: Event queue growing                                │
│  CAUSES: Slow event handlers blocking processing             │
│  SOLUTIONS:                                                  │
│    • Profile scripts to find slow handlers                   │
│    • Optimize expensive operations                           │
│    • Remove or disable unused scripts                        │
│    • Reduce analysis granularity                             │
│                                                              │
│  SYMPTOM: High disk I/O wait                                 │
│  CAUSES: Slow storage, excessive logging                     │
│  SOLUTIONS:                                                  │
│    • Upgrade to faster storage (SSD/NVMe)                    │
│    • Reduce logging verbosity                                │
│    • Enable log compression                                  │
│    • Use separate file system for logs                       │
│                                                              │
└──────────────────────────────────────────────────────────────┘
