Part 2 - Understanding Zeek’s Philosophical Foundation

PART 2: UNDERSTANDING ZEEK’S PHILOSOPHICAL FOUNDATION

Behavioural Analysis vs. Signature Matching

To truly understand what makes Zeek powerful - and why you’re investing time in learning it - you need to grasp the fundamental philosophical difference between signature-based detection and behavioural analysis. This isn’t just an academic distinction; it will shape how you think about threat hunting and how you write detection scripts throughout this course.

Traditional intrusion detection systems, and signature-based tools like Snort and Suricata, operate on a conceptually straightforward principle. They examine network traffic looking for specific patterns - signatures - that indicate known malicious activity. If a packet or series of packets matches a signature, the system generates an alert. This approach has significant advantages: it’s fast, it’s deterministic, and when properly tuned it generates relatively few false positives.

But signature-based detection has a fundamental limitation: it can only detect what it knows about. If an attacker uses a new technique, modifies existing malware to change its network signatures, or exploits a zero-day vulnerability for which no signature exists, signature-based systems will remain silent. They’re excellent at detecting known threats, but they struggle with novel or sophisticated attacks.

Zeek takes a completely different approach, one that’s rooted in behavioural analysis. Rather than looking for specific malicious patterns, Zeek focuses on understanding what’s actually happening on the network. It doesn’t ask “Does this traffic match a known bad pattern?” Instead, it asks “What protocols are being used? What are the characteristics of these connections? Are there unusual patterns in timing, volume, or behavior?”

Let me give you a concrete example that illustrates this distinction. Imagine an attacker has deployed a Remote Access Trojan (RAT) on one of your network endpoints. The RAT needs to communicate with its server to receive instructions and send back stolen data.

A signature-based system would look for known indicators of this specific RAT. Perhaps previous analysis of the malware revealed that it uses a particular HTTP User-Agent string, or sends requests to a specific URI path like /api/v2/beacon, or includes a distinctive string in its POST data. The security vendor would create signatures for these patterns and distribute them. Your IDS would watch for those specific indicators and alert if they appear.

This works well - until the malware author makes a minor modification. They change the User-Agent string to mimic a legitimate browser. They modify the URI path to /images/logo.png. They encode the POST data differently. Suddenly, your signatures no longer match, and the RAT communicates freely without detection.

Now consider how Zeek would approach the same scenario. Rather than looking for specific strings or patterns, Zeek analyzes the behavioural characteristics of the communication. It notices that one internal host has established an HTTP connection to an external IP address, and this connection exhibits unusual characteristics. The requests happen with remarkable regularity - one every sixty seconds, with very low jitter. The size of the requests is unusually consistent, always around 147 bytes. The connection persists for hours or days. The bidirectional byte ratio suggests a command-and-response pattern rather than normal web browsing.

None of these individual characteristics might be definitive proof of malicious activity, but together they form a behavioural signature that strongly suggests command-and-control traffic. More importantly, this behavioural analysis works regardless of what specific strings appear in the traffic. The attacker can change their User-Agent, modify their URI paths, and encode their data however they want, but as long as they maintain this periodic beaconing behaviour - which is fundamental to how the RAT operates - Zeek can detect it.

This is the power of behavioural analysis, and it’s why Zeek has become an indispensable tool for threat hunting. When you’re searching for sophisticated adversaries or novel threats, you need to look beyond signatures. You need to understand the behavioral patterns that indicate malicious activity, even when those patterns don’t match any known signatures.

The Event-Driven Architecture

Zeek’s approach to behavioural analysis is enabled by its event-driven architecture, which is quite different from how traditional packet inspection systems work. Understanding this architecture is crucial because it will directly affect how you write detection scripts later in this course.

In a traditional packet-based intrusion detection system, the analysis flow is relatively straightforward. The system captures packets from the network, and for each packet, it runs through a series of signature checks. If any signature matches, an alert is generated. The system maintains minimal state between packets - it’s essentially asking “Does this packet match a known bad pattern?” for each packet independently.

Zeek works completely differently. Rather than operating at the packet level, Zeek operates at the protocol and connection level through an event-driven model. Here’s how it works: as packets arrive, Zeek’s protocol analyzers parse them and reconstruct the higher-level protocols being used. When significant protocol-level events occur - a connection is established, an HTTP request is made, a DNS query is issued, a file transfer begins - Zeek generates events.

These events are the fundamental building blocks of Zeek’s analysis capabilities. An event represents something meaningful happening on the network at the protocol level. For example, when someone browses to a website, Zeek doesn’t just see a series of TCP packets. It sees a connection establishment event, followed by HTTP request events, followed by HTTP response events, and eventually a connection termination event. Each of these events carries rich contextual information about what’s happening.

Your detection scripts operate by responding to these events. Rather than examining individual packets, you write functions that are called when specific events occur. For instance, you might write a function that’s called every time an HTTP request event fires. This function receives detailed information about the request - the URL being accessed, the User-Agent header, the size of the request, the time it occurred, and much more. Based on this information, your script can make intelligent decisions about whether the activity is suspicious.

The event-driven model provides several crucial advantages for threat hunting. First, it gives you access to high-level protocol information without needing to manually parse packets. Zeek has already done the hard work of extracting HTTP headers, DNS query names, SSL certificate details, and hundreds of other protocol elements. Second, it allows you to maintain state across multiple related events. You can track characteristics of a connection over its entire lifetime, building up a behavioural profile that wouldn’t be possible if you were only examining individual packets. Third, it enables sophisticated analysis based on timing and patterns. You can detect that a host is making DNS queries at regular intervals, or that a connection is exhibiting unusual periodicity in its communications.

Protocol Context and Connection State

One of Zeek’s most powerful capabilities is its ability to maintain rich protocol context and connection state. This is intimately tied to the event-driven architecture, but it deserves special attention because it’s what enables much of the sophisticated analysis you’ll be doing later.

When Zeek observes network traffic, it doesn’t just pass events to your scripts and then forget about them. It maintains detailed state about ongoing connections, tracking everything from basic TCP connection parameters to application-layer protocol details. This state information is made available to your scripts through data structures that represent connections and the protocols they’re using.

Consider an HTTP connection as an example. When Zeek sees HTTP traffic, it doesn’t just generate isolated events for each request and response. It maintains a complete understanding of the HTTP session. It tracks all the requests made over the connection, all the responses received, the headers exchanged, the content types, the methods used, the status codes returned, and much more. When your script responds to an HTTP event, it has access to this complete context. You can ask questions like “Is this the first request on this connection, or have there been previous requests?” or “What was the User-Agent header from the initial request in this session?” or “Has this connection been reused for multiple requests?”

This contextual awareness is what enables behavioural analysis that would be impossible with simpler packet-level inspection. You can detect subtle anomalies like connections that claim to be HTTP but exhibit unusual characteristics, or sessions where the client and server behavior doesn’t match expected patterns for the protocols they’re supposedly using.

The same principle applies to other protocols. For DNS, Zeek tracks queries and their responses, building a complete picture of DNS activity for each host. For SSL/TLS, it extracts and validates certificates, tracks cipher suites, and identifies potential security issues. For file transfers, it can extract the files themselves and compute hashes for malware analysis.

This level of protocol awareness and state tracking is what sets Zeek apart from simpler packet capture tools. It provides you with a high-level, semantically meaningful view of network activity that forms the perfect foundation for behavioural analysis and threat hunting.