Data Model - Object Categories

Recall that within the Synapse data model:

Nodes commonly represent “things”: observables that can be verified and are unlikely to change over time.
Tags commonly represent “assessments”: judgements or evaluations that may change given new data or revised analysis.

Within Synapse, forms are the building blocks of our analysis system. Forms are used to create nodes, which are the objects used to represent (model) knowledge and answer analytical questions about the captured information. This means that the proper design of forms is essential.

In Synapse’s hypergraph-based model (where almost everything is a node) forms take on additional significance. Specifically, forms must be used to represent more than just “nouns” and must be used to capture several general categories of objects. These categories can be broadly defined as entities, relationships, and events.

Entity
Relationship
Event
Instance Knowledge vs. Fused Knowledge

This section discusses the informal “categories” of objects that can be modeled in Synapse. See Data Model - Form Categories for a discussion of some of the common “categories” of forms used to represent these objects.

Entity

Forms can represent atomic entities, whether real or abstract. For cyber threat data, entities include domains, IP addresses (IPv4 or IPv6), hosts (computers / devices), usernames, passwords, accounts, files, social media posts, and so on. Other entities include people, organizations, and countries. Any entity can be defined by a form and represented by a node. Entities are often (though not always) represented as a Simple Form. The term “simple” means that these forms can be represented as a primary property with a single value that uniquely defines the entity.

Example

An email address (inet:email) is a basic example of an entity-type node / simple form:

storm> inet:[email protected]
inet:[email protected]
        :fqdn = yandex.ru
        :user = kilkys
        .created = 2023/07/12 15:05:25.988

Relationship

Forms can represent specific relationships among entities. In a directed graph a relationship is represented as a directed edge joining exactly two nodes; but in a hypergraph the entire relationship is represented by a single node (form), and the relationship may consist of any number of elements – not just two.

For cyber threat data, relationships include a domain resolving to an IP address or a malware dropper containing or extracting another file. Other types of relationships include a company being a subsidiary of another business, an employee working for a company, or a person being a member of a group.

Relationship-type forms are often represented as a Composite (Comp) Form. Comp forms have a primary property consisting of a comma-separated list of two or more values that uniquely define the relationship.

Example

A DNS A record (inet:dns:a) is a basic example of a relationship-type form / comp form:

storm> inet:dns:a=(google.com,172.217.9.142)
inet:dns:a=('google.com', '172.217.9.142')
        :fqdn = google.com
        :ipv4 = 172.217.9.142
        .created = 2023/07/12 15:05:26.064

Event

Forms can represent individual time-based occurrences. The term event implies that an entity existed or a relationship occurred at a specific point in time. Events represent the combination of a form and a timestamp for when the form was observed.

Examples of event forms include an individual login to an account, a specific DNS query, or a domain registration (whois) record captured on a specific date.

The structure of an event form may vary depending on the specific event being modeled. For “simple” events that can be uniquely represented by the combination of a timestamp and an entity, the form may be a Composite (Comp) Form that happens to include a timestamp as one element of the form’s value. For exaple, an inet:whois:rec form consists of a whois record and the time that record was observed or retrieved.

For more “multi-dimensional” events involving several components, the form may be a Guid Form with the timestamp as one of several secondary properties on the form (i.e., as in an inet:dns:request form).

Example

A specific, individual DNS query (inet:dns:request) is an example of an event-type form:

storm> inet:dns:request=00000a17dbe261d10ce6ed514872bd37
inet:dns:request=00000a17dbe261d10ce6ed514872bd37
        :query = ('tcp://199.68.196.162', 'download.applemusic.itemdb.com', '1')
        :query:name = download.applemusic.itemdb.com
        :query:name:fqdn = download.applemusic.itemdb.com
        :query:type = 1
        :reply:code = 0
        :server = tcp://178.62.239.55
        :time = 2018/09/30 16:01:27.506
        .created = 2023/07/12 15:05:26.127

Instance Knowledge vs. Fused Knowledge

For certain types of data, event forms and relationship forms can encode similar information but represent the difference between instance knowledge and fused knowledge.

Event forms represent the specific point-in-time existence of an entity or occurrence of a relationship - an instance of that knowledge.
Relationship forms can leverage the universal .seen property to set “first observed” and “last observed” times during which an entity existed or a relationship was true. This date range can be viewed as fused knowledge - knowledge that summarizes or “fuses” the data from many individual observations (instances) of the node over time.

Instance knowledge and fused knowledge represent differences in data granularity. Whether to create an event form or a relationship form (or both) depends on how much detail is required for your analysis. This consideration often applies to relationships that change over time, particularly those that may change frequently.

Example

DNS A records are a good example of these differences. The IP address that a domain resolves to may change infrequently (e.g., for a website hosted on a stable server) or may change quite often (e.g., where the IP is dynamically assigned or where load balancing is used).

One option to represent and track DNS A records is to create individual timestamped forms (events) every time you check the domain’s current resolution (e.g., inet:dns:request and inet:dns:answer forms). This represents a very high degree of granularity as the nodes will record the exact time a domain resolved to a given IP, potentially down to the millisecond. The nodes can also capture additional detail such as the querying client, the responding server, the response code, and so on. However, the number of such nodes could readily reach into the hundreds of millions, if not billions, if you create nodes for every resolution of every domain you want to track.

On the other hand, it may be sufficient to know that a domain resolved to an IP address during a given period of time – a “first observed” and “last observed” (.seen) range. A single inet:dns:a node can be created to show that domain woot.com resolved to IP address 1.2.3.4, where the earliest observed resolution was 2014/08/06 at 13:56 and the most recently observed resolution was 2018/05/29 at 7:32. These timestamps can be extended (earlier or later) if additional data changes our observation boundaries.

This second approach loses some granularity:

The domain is not guaranteed to have resolved to that IP continuously throughout the entire time period.
Given only this node, we don’t know exactly when the domain resolved to the IP address during that time period, except for the earliest and most recent observations.

However, this fused knowledge may be sufficient for our needs and may be preferable to creating thousands of nodes for individual DNS resolutions.

Of course, a hybrid approach is also possible, where most DNS A record data is recorded in fused inet:dns:a nodes but it is also possible to record high-resolution, point-in-time inet:dns:request and inet:dns:answer nodes when needed.