Data Model - Terminology

Note: This documentation presents the data model from a User or Analyst perspective. See the Synapse Data Model technical documentation or the Synapse source code for more detailed information.

Recall that Synapse is a distributed key-value hypergraph analysis framework. That is, Synapse is a particular implementation of a hypergraph model, where an instance of a hypergraph is called a Cortex. In our brief discussion of graphs and hypergraphs, we pointed out some fundamental concepts related to the Synapse hypergraph implementation:

  • (Almost) everything is a node. There are no pairwise (“two-dimensional”) edges in a hypergraph the way there are in a directed graph. Although Synapse’s data model includes some edge-like nodes (digraph nodes or “relationship” nodes), they are still nodes. (We later introduced “lightweight” (light) edges as additional edge-like constructs that improve performance for particular use cases. But mostly, everything is a node.)

  • Tags act as hyperedges. In a directed graph, an edge connects exactly two nodes. In Synapse, tags are labels that can be applied to an arbitrary number of nodes. These tags effectively act as an n-dimensional edge that can connect any number of nodes – a hyperedge.

  • (Almost) every navigation of the graph is a pivot. Since there are no pairwise edges in a hypergraph, you can’t query or explore the graph by traversing its edges. Instead, navigation primarily consists of pivoting from the properties of one set of nodes to the properties of another set of nodes. (Since tags are hyperedges, there are ways to lift by or “pivot through” tags to effectively perform “hyperedge traversal”, and it is possible to traverse Synapse’s light edges. But most navigation is via pivots.)
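To make these ideas concrete, here is a toy sketch in Python, the language Synapse itself is written in. It assumes a hypothetical in-memory dictionary of nodes keyed by (form, value) and a hypothetical `tags` mapping. Neither reflects how a Cortex actually stores data. The sketch shows one tag linking three nodes of different forms (a hyperedge), and a pivot that moves from one set of nodes to another by matching property values rather than by following edges.

```python
from collections import defaultdict

# Hypothetical toy data: ndef (form, primary value) -> secondary properties.
# Illustration only; this is not how a Cortex stores nodes.
nodes = {
    ("inet:fqdn", "woot.com"): {"zone": "woot.com"},
    ("inet:fqdn", "www.woot.com"): {"zone": "woot.com"},
    ("inet:ipv4", "1.2.3.4"): {},
    ("inet:dns:a", ("woot.com", "1.2.3.4")): {"fqdn": "woot.com", "ipv4": "1.2.3.4"},
}

# A tag is one label attached to any number of nodes, so it behaves like an
# n-ary edge (a hyperedge) rather than a pairwise edge.
tags = defaultdict(set)
for ndef in (("inet:fqdn", "woot.com"),
             ("inet:ipv4", "1.2.3.4"),
             ("inet:dns:a", ("woot.com", "1.2.3.4"))):
    tags["cno.infra.sink.holed"].add(ndef)


def lift_by_tag(tag):
    """Return every node carrying the tag ("traversing" the hyperedge)."""
    return set(tags.get(tag, set()))


def pivot(source_ndefs, dst_form, dst_prop):
    """Pivot from the primary values of the source nodes to dst_form nodes
    whose dst_prop secondary property holds one of those values."""
    values = {valu for (_form, valu) in source_ndefs}
    return [ndef for ndef, props in nodes.items()
            if ndef[0] == dst_form and props.get(dst_prop) in values]
```

For example, `pivot([("inet:fqdn", "woot.com")], "inet:dns:a", "fqdn")` finds the DNS A record node whose `fqdn` property equals `woot.com`. No edge is followed.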

To start building on those concepts, you need to understand the basic elements of the Synapse data model. The fundamental terms and concepts you should be familiar with are Type, Form, Node, Property, and Tag, each of which is described below.

Synapse uses a query language called Storm (see Storm Reference - Introduction) to interact with data in the hypergraph. Storm allows a user to lift, filter, and pivot across data based on node properties, values, and tags. Understanding these model structures will significantly improve your ability to use Storm and interact with Synapse data.

Type

A type is the definition of a data element within the Synapse data model. A type describes what the element is and enforces how it should look, including how it should be normalized, if necessary, for both storage (including indexing) and representation (display).

The Synapse data model includes standard types such as integers and strings, as well as common types defined within or specific to Synapse, including globally unique identifiers (guid), date/time values (time), time intervals (ival), and tags (syn:tag). Many objects (forms) within the Synapse data model are built upon (are extensions of) a subset of common types.

In addition, knowledge domain-specific objects may themselves be specialized types. For example, an IPv4 address (inet:ipv4) is its own specialized type. While an IPv4 address is ultimately stored as an integer, the type has additional constraints (i.e., to ensure that IPv4 objects in the Cortex can only be created using integer values that fall within the allowable IPv4 address space). These constraints may be defined by a constructor (ctor) that defines how a property of that type can be created (constructed).
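The sketch below mimics in plain Python what a type with such a constraint does. It normalizes input to a single storage form (an integer), rejects values outside the IPv4 address space, and renders the stored value for display. The function names `norm_ipv4` and `repr_ipv4` are hypothetical. This is not Synapse's actual inet:ipv4 implementation. It only uses the standard-library `ipaddress` module to illustrate the idea.

```python
import ipaddress


def norm_ipv4(valu):
    """Hypothetical normalizer: accept a dotted-quad string or an int,
    return the integer storage form, or raise if the value is invalid."""
    if isinstance(valu, str):
        # IPv4Address raises AddressValueError (a ValueError) for bad input.
        return int(ipaddress.IPv4Address(valu.strip()))
    if isinstance(valu, int) and not isinstance(valu, bool):
        if 0 <= valu <= 0xFFFFFFFF:
            return valu
        raise ValueError(f"{valu} is outside the IPv4 address space")
    raise TypeError(f"cannot normalize {valu!r} as an IPv4 address")


def repr_ipv4(valu):
    """Display form: turn the stored integer back into dotted-quad notation."""
    return str(ipaddress.IPv4Address(valu))
```

With this sketch, `norm_ipv4("1.2.3.4")` and `norm_ipv4(16909060)` both yield the same stored integer. Because every input path ends in one representation, two nodes for the same address cannot diverge.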

Users typically will not interact with types directly; they are primarily used “behind the scenes” to define and support the Synapse data model. From a user perspective, it is important to keep the following points in mind for types:

  • Every element in the Synapse data model must be defined as a type. Synapse uses forms to define the objects that can be represented (modeled) within a Synapse hypergraph. Forms have properties (primary and secondary) and every property must be explicitly defined as a particular type.

  • Type enforcement is essential to Synapse’s functionality. Type enforcement means every property is defined as a type, and Synapse enforces rules for how elements of that type can (or can’t) be created. This means that elements of the same type are always created, stored, and represented in the same way, which ensures consistency and helps prevent “bad data” from getting into a Cortex.

  • Type awareness facilitates interaction with a Synapse hypergraph. Synapse and the Storm query language are “model aware” and know which types are used for each property in the model. At a practical level this allows users to use a more concise syntax when using the Storm query language because in many cases the query parser “understands” which navigation options make sense, given the types of the properties used in the query. It also allows users to use wildcards to pivot (see Storm Reference - Pivoting) without knowing the “destination” forms or nodes - Synapse “knows” which forms can be reached from the current set of data based on types.

  • It is still possible to navigate (pivot) between elements of different types that have the same value. Type enforcement simplifies pivoting, but does not restrict you to only pivoting between properties of the same type. For example, a Windows registry value may be a string type (type str), but that string may represent a file path (type file:path). While the Storm query parser would not automatically “recognize” that as a valid pivot (because the property types differ), it is possible to explicitly tell Storm to pivot from a specific file:path node to any registry value nodes whose string property value (it:dev:regval:str) matches that path.
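The registry-value example from the last point can be sketched as follows. The data, the `prop_types` table, and both helper functions are hypothetical. The `it:dev:regval` primary values are placeholders rather than real node values. The sketch contrasts a pivot that a type-aware parser would infer on its own (same type on both sides) with an explicit pivot that matches on value alone.

```python
# Hypothetical toy data. The it:dev:regval primary values ("regval-1", ...)
# are placeholders; they are not real Synapse node values.
nodes = [
    ("file:path", "c:/windows/system32/cmd.exe", {}),
    ("it:dev:regval", "regval-1", {"str": "c:/windows/system32/cmd.exe"}),
    ("it:dev:regval", "regval-2", {"str": "hello"}),
]

# Property -> type: the kind of knowledge a "model aware" parser has
# (an illustrative subset, not the real model).
prop_types = {
    "file:path": "file:path",
    "it:dev:regval:str": "str",
}


def implicit_pivot_ok(src_prop, dst_prop):
    """A type-aware parser only infers a pivot when both sides share a type."""
    return prop_types[src_prop] == prop_types[dst_prop]


def explicit_pivot(src_valu, dst_form, dst_relprop):
    """An explicit pivot compares values only, ignoring the differing types."""
    return [valu for form, valu, props in nodes
            if form == dst_form and props.get(dst_relprop) == src_valu]
```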

Type-Specific Behavior

Synapse implements various type-specific optimizations to improve performance and functionality. Some of these are “back end” optimizations (i.e., for indexing and storage) while some are more “front end” in terms of how users interact with data of certain types via Storm. See Storm Reference - Type-Specific Storm Behavior for additional detail.

Viewing or Working with Types

Types (both base and model-specific) are defined within the Synapse source code. An auto-generated dictionary of Types (Base Types and Types), built from the current source code, is available in the online documentation.

Types can also be viewed within a Cortex. A full list of current types can be displayed with the following Storm command:

storm> syn:type

See Storm Reference - Model Introspection for additional detail on working with model elements within Storm.

Type Example

The data associated with a type’s definition is displayed slightly differently between the Synapse source code, the auto-generated online documents, and from the Storm command line. Users wishing to review type structure or other elements of the Synapse data model are encouraged to use the source(s) that are most useful to them.

The example below shows the type for a fully qualified domain name (inet:fqdn) as it is represented in the Synapse source code, the online documents, and from the Storm CLI.

Source Code

('inet:fqdn', 'synapse.models.inet.Fqdn', {}, {
  'doc': 'A Fully Qualified Domain Name (FQDN).',
  'ex': 'vertex.link'}),
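As a reading aid, the snippet below unpacks the source-code tuple above into named fields. The field names (`name`, `ctor`, `opts`, `info`) are this example's interpretation. They are inferred from the `:ctor` and `:doc` values in the Storm output that follows, not taken from Synapse's own documentation.

```python
# The type definition from the source code above.
typedef = ('inet:fqdn', 'synapse.models.inet.Fqdn', {}, {
    'doc': 'A Fully Qualified Domain Name (FQDN).',
    'ex': 'vertex.link'})

# Assumed field order: type name, constructor (ctor) class path,
# type options, and an info dict holding the doc string and an example.
name, ctor, opts, info = typedef

print(f"{name}\n    :ctor = {ctor}\n    :doc = {info['doc']}")
```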

Auto-Generated Online Documents

inet:fqdn A Fully Qualified Domain Name (FQDN). It is implemented by the following class: synapse.models.inet.Fqdn.

An example of inet:fqdn:

  • vertex.link

Storm

storm> syn:type=inet:fqdn
syn:type=inet:fqdn
        :ctor = synapse.models.inet.Fqdn
        :doc = A Fully Qualified Domain Name (FQDN).

Form

A form is the definition of an object in the Synapse data model. A form acts as a “template” that tells you how to create an object (Node). While the concepts of form and node are closely related, it is useful to maintain the distinction between the template for creating an object (form) and an instance of a particular object (node). inet:fqdn is a form; inet:fqdn = woot.com (<form> = <valu>) is a node.

A form consists of the following:

  • A primary property. The primary property of a form must be selected / defined such that the value of that property is unique across all possible instances of that form. A form’s primary property must be defined as a specific type. In many cases, a form will have its own type definition - for example, the form inet:fqdn is of type inet:fqdn. All forms are types (that is, must be defined as a type) although not all types are forms.

  • Optional secondary properties. If present, secondary properties must also have a defined type, as well as any additional constraints on the property, such as:

    • Whether a property is read-only once set.

    • Any normalization (outside of type-specific normalization) that should occur for the property (such as converting a string to all lowercase, stripping any whitespace, etc.).

Secondary properties are form-specific and are explicitly defined for each form. However, Synapse also supports a set of universal secondary properties (universal properties) that are valid for all forms.

The Property section below discusses these concepts in greater detail.

While types underlie the data model and are generally not used directly by analysts, forms make up the essential “structure” of the data that analysts work with. Understanding form structure and options (and having a good reference for them) is essential for working with Synapse data.

Form Namespace

The Synapse data model uses a structured, hierarchical namespace for forms. Each form name consists of at least two namespace elements separated by a colon ( : ). For example:

  • file:bytes

  • inet:email

  • inet:fqdn

  • ou:org

The first element in the namespace represents a rough “category” for the form (e.g., inet for Internet-related objects). The Synapse data model is meant to be extensible to support any analytical discipline, from threat intelligence to business analytics and beyond. The ability to group portions of the data model into related categories makes an extensive model easier to manage, and also allows Synapse users to leverage or focus on those sub-portions of the model most relevant to them.

The second and / or subsequent elements in the form name define the specific “subcategory” or “thing” within the form’s primary category (e.g., inet:fqdn to represent a fully qualified domain name (FQDN) within the “Internet” (inet) category, or inet:dns:query to represent a query using the DNS protocol within the “Internet” category, etc.)
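A minimal sketch of the naming rule above: split a form name on colons, treat the first element as the category, and group forms by category. The helper `split_form_name` is hypothetical and does nothing beyond string handling. It is not part of Synapse.

```python
def split_form_name(form):
    """Hypothetical helper: split a form name into (category, [subcategories]).

    A form name needs at least two non-empty elements separated by ':'.
    """
    parts = form.split(":")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"{form!r} needs at least two non-empty ':'-separated elements")
    return parts[0], parts[1:]


forms = ["file:bytes", "inet:email", "inet:fqdn", "inet:dns:query", "ou:org"]

# Group forms by their top-level category (e.g., everything under "inet").
by_category = {}
for form in forms:
    category, _rest = split_form_name(form)
    by_category.setdefault(category, []).append(form)
```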

Properties have a namespace that leverages and extends the form namespace (note that form names are also primary properties). See Property and Property Namespace below for additional detail.

Viewing or Working with Forms

Like types, forms are defined within the Synapse source code and include a base set of forms intended to be generic across any data model, as well as a number of model-specific (knowledge domain-specific) forms. An auto-generated dictionary of Forms, built from the current source code, is available in the online documentation.

Forms can also be viewed within a Cortex. A full list of current forms can be displayed with the following Storm command:

storm> syn:form

See Storm Reference - Model Introspection for additional detail on working with model elements within Storm.

Form Example

The data associated with a form’s definition is displayed slightly differently between the Synapse source code, the auto-generated online documents, and from the Storm command line. Users wishing to review form structure or other elements of the Synapse data model are encouraged to use the source(s) that are most useful.

The example below shows the form for a fully qualified domain name (inet:fqdn) as it is represented in the Synapse source code, the online documents, and from Storm. Note that the output displayed via Storm includes universal properties (.seen, .created), whereas the static source code (and the documents generated from it) does not. Universal properties are defined separately within the Synapse source and have their own section (Universal Properties) in the auto-generated online documents.

Source Code

('inet:fqdn', {}, (
   ('domain', ('inet:fqdn', {}), {
      'ro': True,
      'doc': 'The parent domain for the FQDN.',
   }),
   ('host', ('str', {'lower': True}), {
      'ro': True,
      'doc': 'The host part of the FQDN.',
   }),
   ('issuffix', ('bool', {}), {
      'doc': 'True if the FQDN is considered a suffix.',
   }),
   ('iszone', ('bool', {}), {
      'doc': 'True if the FQDN is considered a zone.',
   }),
   ('zone', ('inet:fqdn', {}), {
      'doc': 'The zone level parent for this FQDN.',
   }),
))
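The snippet below walks the same form definition (condensed onto fewer lines) and lists each secondary property's full name, type, type options, and read-only flag. This roughly mirrors what the auto-generated documents display. Treating a missing `'ro'` key as `False` is an assumption here, chosen because it matches the `:ro = False` values in the Storm output. The `summary` dict is a hypothetical structure for illustration.

```python
# The inet:fqdn form definition from the source code above, condensed.
formdef = ('inet:fqdn', {}, (
    ('domain', ('inet:fqdn', {}), {'ro': True, 'doc': 'The parent domain for the FQDN.'}),
    ('host', ('str', {'lower': True}), {'ro': True, 'doc': 'The host part of the FQDN.'}),
    ('issuffix', ('bool', {}), {'doc': 'True if the FQDN is considered a suffix.'}),
    ('iszone', ('bool', {}), {'doc': 'True if the FQDN is considered a zone.'}),
    ('zone', ('inet:fqdn', {}), {'doc': 'The zone level parent for this FQDN.'}),
))

form, _formopts, props = formdef

summary = {}
for relname, (ptype, typeopts), info in props:
    full = f"{form}:{relname}"  # e.g. inet:fqdn:host
    ro = info.get('ro', False)  # assumption: absent 'ro' means writable
    summary[full] = {"type": ptype, "typeopts": typeopts, "ro": ro}
    print(f":{relname} / {full}  type={ptype}  ro={ro}")
```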

Auto-Generated Online Documents

inet:fqdn A Fully Qualified Domain Name (FQDN).

The base type for the form can be found at inet:fqdn.

An example of inet:fqdn:

  • vertex.link

Properties:

:domain / inet:fqdn:domain

The parent domain for the FQDN. It has the following property options set:

  • Read Only: True

The property type is inet:fqdn.

:host / inet:fqdn:host

The host part of the FQDN. It has the following property options set:

  • Read Only: True

The property type is str. Its type has the following options set:

  • lower: True

:issuffix / inet:fqdn:issuffix

True if the FQDN is considered a suffix.

The property type is bool.

:iszone / inet:fqdn:iszone

True if the FQDN is considered a zone.

The property type is bool.

:zone / inet:fqdn:zone

The zone level parent for this FQDN.

The property type is inet:fqdn.

Storm

Form (inet:fqdn) alone:

storm> syn:form=inet:fqdn
syn:form=inet:fqdn
        :doc = A Fully Qualified Domain Name (FQDN).
        :runt = False
        :type = inet:fqdn

Form with secondary properties:

storm> syn:prop:form=inet:fqdn
syn:prop=inet:fqdn
        :doc = A Fully Qualified Domain Name (FQDN).
        :extmodel = False
        :form = inet:fqdn
        :type = inet:fqdn
        :univ = False
syn:prop=inet:fqdn.seen
        :base = .seen
        :doc = The time interval for first/last observation of the node.
        :extmodel = False
        :form = inet:fqdn
        :relname = .seen
        :ro = False
        :type = ival
        :univ = False
syn:prop=inet:fqdn.created
        :base = .created
        :doc = The time the node was created in the cortex.
        :extmodel = False
        :form = inet:fqdn
        :relname = .created
        :ro = True
        :type = time
        :univ = False
syn:prop=inet:fqdn:domain
        :base = domain
        :doc = The parent domain for the FQDN.
        :extmodel = False
        :form = inet:fqdn
        :relname = domain
        :ro = True
        :type = inet:fqdn
        :univ = False
syn:prop=inet:fqdn:host
        :base = host
        :doc = The host part of the FQDN.
        :extmodel = False
        :form = inet:fqdn
        :relname = host
        :ro = True
        :type = str
        :univ = False
syn:prop=inet:fqdn:issuffix
        :base = issuffix
        :doc = True if the FQDN is considered a suffix.
        :extmodel = False
        :form = inet:fqdn
        :relname = issuffix
        :ro = False
        :type = bool
        :univ = False
syn:prop=inet:fqdn:iszone
        :base = iszone
        :doc = True if the FQDN is considered a zone.
        :extmodel = False
        :form = inet:fqdn
        :relname = iszone
        :ro = False
        :type = bool
        :univ = False
syn:prop=inet:fqdn:zone
        :base = zone
        :doc = The zone level parent for this FQDN.
        :extmodel = False
        :form = inet:fqdn
        :relname = zone
        :ro = False
        :type = inet:fqdn
        :univ = False

Node

A node is a unique object within the Synapse hypergraph. In Synapse, nodes represent standard objects (“nouns”) such as IP addresses, files, people, conferences, airplanes, or software packages. They can also represent more abstract objects such as industries, risks, attacks, or goals. However, in Synapse, nodes also represent relationships (“verbs”), because things that would be edges in a directed graph are generally nodes in a Synapse hypergraph. It may be better to think of a node generically as a “thing”: any “thing” you want to model within Synapse (entity, relationship, event) is represented as a node.

Every node consists of the following components:

  • A primary property that consists of the Form of the node plus its specific value (<form> = <valu>). All primary properties must be unique for a given form. For example, the primary property of the node representing the domain woot.com would be inet:fqdn = woot.com. The uniqueness of the <form> = <valu> pair ensures there can be only one node in Synapse that represents the domain woot.com. Because this unique pair “defines” the node, the comma-separated form / value combination (<form>,<valu>) is also known as the node’s ndef (short for “node definition”).

  • One or more universal properties. As the name implies, universal properties are applicable to all nodes.

  • Optional secondary properties. Similar to primary properties, secondary properties consist of a property name defined as a specific type, and the property’s associated value for the node (<prop> = <pval>). Secondary properties are specific to a given node type (form) and provide additional detail about that particular node.

  • Optional tags. A Tag acts as a label with a particular meaning that can be applied to a node to provide context. Tags are discussed in greater detail below.
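The four components above can be pictured as a small record. `ToyNode` is a hypothetical dataclass written for this illustration. It is not Synapse's node class, and it says nothing about how values such as `.created` are actually stored. The sketch only shows that the ndef is derived from the form and primary value.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToyNode:
    """Hypothetical illustration of a node's parts (not Synapse's Node class)."""
    form: str                                   # e.g. "inet:fqdn"
    valu: object                                # primary property value
    props: dict = field(default_factory=dict)   # secondary: <prop> = <pval>
    univs: dict = field(default_factory=dict)   # universal, e.g. ".created"
    tags: dict = field(default_factory=dict)    # tag -> optional (min, max)

    @property
    def ndef(self):
        """Node definition: the (form, valu) pair that uniquely identifies it."""
        return (self.form, self.valu)


node = ToyNode("inet:fqdn", "woot.com",
               props={"host": "woot", "domain": "com"},
               univs={".created": datetime.now(timezone.utc)})
node.tags["cno.infra.sink.holed"] = None  # a tag with no time interval
```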

Viewing or Working with Nodes

To view or work with nodes, you must have a Cortex that contains nodes (data). Users interact with data in Synapse using the Storm query language (Storm Reference - Introduction).

Node Example

The Storm query below lifts and displays the node for the domain google.com:

storm> inet:fqdn=google.com
inet:fqdn=google.com
        :domain = com
        :host = google
        :issuffix = False
        :iszone = True
        :zone = google.com
        .created = 2022/04/28 12:34:02.933
        #rep.majestic.1m

In the output above:

  • inet:fqdn = google.com is the primary property (<form> = <valu>).

  • While not explicitly displayed, the node’s ndef would be inet:fqdn,google.com.

  • .created is a universal property showing when the node was added to the Cortex.

  • :domain, :host, etc. are form-specific secondary properties with their associated values (<prop> = <pval>). For readability, secondary properties are displayed as relative properties within the namespace of the form’s primary property (e.g., :iszone as opposed to inet:fqdn:iszone).

  • #rep.majestic.1m is a tag indicating that google.com has been reported by web analytics company Majestic in their top million most-linked domains.
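The :host and :domain values in the example above follow directly from the primary value. The sketch below derives them by splitting on the first dot. The function `derive_fqdn_props` is hypothetical and its normalization (lowercase, trim a trailing dot) is a simplified assumption. Properties such as :zone and :issuffix need knowledge of public suffixes and are deliberately left out.

```python
def derive_fqdn_props(valu):
    """Hypothetical sketch: normalize an FQDN and derive :host and :domain."""
    valu = valu.strip().lower().rstrip(".")
    host, sep, domain = valu.partition(".")
    props = {"host": host}
    if sep:  # a single-label name such as "com" has no parent domain
        props["domain"] = domain
    return valu, props
```

For example, `derive_fqdn_props("Google.com")` returns the normalized value `google.com` with `host` set to `google` and `domain` set to `com`, matching the node shown above.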

Property

Properties are the individual elements that define a Form or (along with their specific values) that comprise a Node.

Primary Property

Every Form consists of (at minimum) a primary property that is defined as a specific Type. Every Node consists of (at minimum) a primary property (its form) plus the node-specific value of the primary property (<form> = <valu>). In defining a form for a particular object (node), the primary property must be defined such that its value is unique across all possible instances of that form.

The concept of a unique primary property is straightforward for forms that represent simple objects; for example, the “thing” that makes an IP address unique is the IP address itself: inet:ipv4 = 1.2.3.4. Defining an appropriate primary property for more complex multidimensional nodes (such as those representing a Relationship or an Event) can be more challenging.

Because a primary property uniquely defines a node, it cannot be modified once the node is created. To “change” a node’s primary property you must delete and re-create the node.
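A toy store keyed by ndef illustrates both rules in this section. Adding an existing `<form> = <valu>` returns the node that is already there, and "changing" a primary value is a delete followed by a create. `ToyStore` and its methods are hypothetical and exist only for this illustration. Note that the re-created node starts empty, so any secondary properties or tags would have to be applied again.

```python
class ToyStore:
    """Hypothetical store keyed by ndef, illustrating primary-property rules."""

    def __init__(self):
        self._nodes = {}

    def __contains__(self, ndef):
        return ndef in self._nodes

    def add(self, form, valu):
        # Uniqueness: an existing (form, valu) returns the same node object.
        return self._nodes.setdefault((form, valu), {"props": {}, "tags": {}})

    def delete(self, form, valu):
        del self._nodes[(form, valu)]

    def replace_primary(self, form, old, new):
        """No in-place edit exists: delete the old node and create a new one."""
        self.delete(form, old)
        return self.add(form, new)
```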

Secondary Property

A Form can include optional secondary properties that provide additional detail about the form. As with primary properties, each secondary property must be defined as an explicit Type. Similarly, a Node includes optional secondary properties (as defined by the node’s form) along with their specific values (<prop> = <pval>).

Secondary properties are characteristics that do not uniquely define a form, but may further describe or distinguish a given form and its associated nodes. For example, the Autonomous System (AS) that an IP address belongs to does not “define” the IP (and in fact an IP’s associated AS can change), but it provides further detail about the IP address.

Many secondary properties are derived from a node’s primary property (derived properties) and are automatically set when the node is created. For example, creating the node file:path='c:\windows\system32\cmd.exe' will automatically set the properties :base = cmd.exe, :base:ext = exe, and :dir = c:/windows/system32. Because a node’s primary property cannot be changed once set, any secondary properties derived from the primary property cannot be changed (i.e., are read-only) as well. Non-derived secondary properties can be set, modified, or even deleted.
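The file:path example can be reproduced with the standard library. In this hypothetical sketch, `derive_path_props` lowercases the path and converts backslashes to forward slashes before splitting. These are simplified normalization assumptions, not Synapse's full rules. The sketch then derives the same :base, :base:ext, and :dir values listed above.

```python
import posixpath


def derive_path_props(path):
    """Hypothetical sketch: derive file:path secondary properties."""
    # Simplified normalization: lowercase and use forward slashes.
    norm = path.strip().lower().replace("\\", "/")
    directory, base = posixpath.split(norm)
    props = {"base": base, "dir": directory}
    _root, ext = posixpath.splitext(base)
    if ext:  # only set :base:ext when the file name has an extension
        props["base:ext"] = ext[1:]
    return norm, props
```

Because every derived value is computed from the primary value, nothing else could keep them consistent if they were edited independently. This is why they are read-only.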

Universal Property

Most secondary properties are form-specific, providing additional detail about individual objects within the data model. However, Synapse defines a subset of secondary properties as universal properties that are applicable to all forms within the Synapse data model. Universal properties include:

  • .created, which is set for all nodes and whose value is the date / time that the node was created within a Cortex.

  • .seen, which is optional for all nodes and whose value is a time interval (minimum or “first seen” and maximum or “last seen”) during which the node was observed, existed, or was valid.
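To illustrate how a first-seen / last-seen interval such as .seen can absorb new observations, the sketch below widens an existing (min, max) pair to cover each new observation. The helper `widen_seen` is hypothetical. The widen-rather-than-replace behavior is an assumption made for illustration. Check the Synapse documentation for the exact semantics of setting an interval property.

```python
from datetime import datetime


def widen_seen(current, observed):
    """Hypothetical helper: merge an observed (min, max) into the current
    (first_seen, last_seen) interval, widening it as needed."""
    if current is None:
        return observed
    return (min(current[0], observed[0]), max(current[1], observed[1]))


seen = None
for when in (datetime(2016, 5, 1), datetime(2014, 1, 11), datetime(2018, 3, 30)):
    seen = widen_seen(seen, (when, when))  # each sighting is a point interval
```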

Property Namespace

Properties exist within and extend the Form Namespace. Forms (form names) are primary properties, and consist of at least two elements separated by a colon ( : ). Secondary properties extend and exist within the namespace of their primary property (form). Secondary properties are preceded by a colon ( : ) and use the colon to separate additional namespace elements, if needed. (Universal properties are preceded by a period ( . ) to distinguish them from form-specific secondary properties.) For example, the secondary (both universal and form-specific) properties of inet:fqdn include:

  • inet:fqdn.created (universal property)

  • inet:fqdn:zone (secondary property)

Secondary properties also comprise a relative namespace / set of relative properties with respect to their primary property (form). In many cases the Storm query language allows (or requires) you to reference a secondary property using its relative property name where the context of the relative namespace is clear (e.g., :zone vs. inet:fqdn:zone).

Relative properties are also used for display purposes within Synapse for visual clarity (see the Node Example above).

In some cases secondary properties may have their own “namespace”. Viewed another way, while both primary and secondary properties use colons to separate elements of the property name, not all separators represent property “boundaries”; some act as “sub-namespace” separators within a name. For example, file:bytes is a primary property / form. A file:bytes form may include secondary properties such as :mime:pe:imphash and :mime:pe:compiled. In this case :mime and :mime:pe are not themselves secondary properties, but sub-namespaces for individual MIME data types and the “PE executable” data type specifically.
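Because colons appear inside form names, inside secondary property names, and inside sub-namespaces, you cannot tell where the form ends just by splitting on colons. The hypothetical helper below resolves a full property name against a known set of form names (longest match first), then classifies the remainder by its leading character: `.` for universal, `:` for form-specific. `KNOWN_FORMS` is a small illustrative subset, not the real model.

```python
# Illustrative subset of form names (not the full Synapse model).
KNOWN_FORMS = {"inet:fqdn", "file:bytes", "inet:dns:query"}


def resolve_prop(name, forms=KNOWN_FORMS):
    """Hypothetical helper: split a full property name into
    (form, relative name or None, kind)."""
    # Try longer form names first so the most specific form wins.
    for form in sorted(forms, key=len, reverse=True):
        if name == form:
            return form, None, "primary"
        if not name.startswith(form):
            continue
        rest = name[len(form):]
        if rest.startswith("."):
            return form, rest, "universal"
        if rest.startswith(":"):
            return form, rest, "secondary"
    raise ValueError(f"no known form for {name!r}")
```

For instance, `file:bytes:mime:pe:imphash` resolves to the form `file:bytes` with the relative property `:mime:pe:imphash`. The inner colons are treated as sub-namespace separators, not property boundaries.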

Viewing or Working with Properties

As Properties are used to define Forms, they are defined within the Synapse source code with their respective Forms. Universal properties are not defined “per-form” but have their own section (Universal Properties) in the online documentation.

Properties can also be viewed within a Cortex. A full list of current properties can be displayed with the following Storm command:

storm> syn:prop

See Storm Reference - Model Introspection for additional detail on working with model elements within Storm.

Property Example

The data associated with a property’s definition is displayed slightly differently between the Synapse source code, the auto-generated online documents, and from the Storm command line. Users wishing to review property structure or other elements of the Synapse data model are encouraged to use the source(s) that are most useful to them.

As primary properties are forms and secondary properties (with the exception of universal properties) are form-specific, properties can be viewed within the Synapse source code and online documentation by viewing the associated Forms.

Within Storm, it is possible to view individual primary or secondary properties as follows:

Storm

Primary property:

storm> syn:prop=inet:fqdn
syn:prop=inet:fqdn
        :doc = A Fully Qualified Domain Name (FQDN).
        :extmodel = False
        :form = inet:fqdn
        :type = inet:fqdn
        :univ = False

Secondary property:

storm> syn:prop=inet:fqdn:domain
syn:prop=inet:fqdn:domain
        :base = domain
        :doc = The parent domain for the FQDN.
        :extmodel = False
        :form = inet:fqdn
        :relname = domain
        :ro = True
        :type = inet:fqdn
        :univ = False

Tag

Tags are annotations applied to nodes. Simplistically, they can be thought of as labels that provide context to the data represented by the node.

Broadly speaking, within Synapse:

  • Nodes represent things: objects, relationships, or events. In other words, nodes typically represent observables that are verifiable and largely unchanging.

  • Tags typically represent assessments: observations or judgements that could change if the data or the analysis of the data changes.

For example, an Internet domain is an “observable thing” - a domain exists, was registered through a domain registrar, and can be created as a node such as inet:fqdn = woot.com. Whether a domain has been sinkholed (i.e., where a supposedly malicious domain is taken over or re-registered by a researcher to identify potential victims attempting to resolve the domain) is an assessment. A researcher may need to evaluate data related to that domain (such as domain registration records or current and past IP resolutions) to decide whether the domain appears to be sinkholed. This assessment can be represented by applying a tag such as #cno.infra.sink.holed to the inet:fqdn = woot.com node.

Tags are unique within the Synapse model because tags are both nodes and labels applied to nodes. Tags are nodes based on a form (syn:tag, of type syn:tag) defined within the Synapse data model. That is, the tag #cno.infra.sink.holed can be applied to another node; but the tag itself also exists as the node syn:tag = cno.infra.sink.holed. This difference is illustrated in the example below.

Tags are introduced here but are discussed in greater detail in Analytical Model - Tag Concepts.

Viewing or Working with Tags

As tags are nodes (data) within the Synapse data model, they can be viewed and operated upon just like other data in a Cortex. Users typically interact with Cortex data using the Storm query language (Storm Reference - Introduction).

See Storm Reference - Model Introspection for additional detail on working with model elements within Storm.

Tag Example

The Storm query below displays the node for the tag cno.infra.sink.holed:

storm> syn:tag=cno.infra.sink.holed
syn:tag=cno.infra.sink.holed
        :base = holed
        :depth = 3
        :doc = A domain (zone) that has been sinkholed.
        :title = Sinkholed domain
        :up = cno.infra.sink
        .created = 2022/04/28 12:34:03.075

The Storm query below displays the tag #cno.infra.sink.holed applied to the node inet:fqdn = hugesoft.org:

storm> inet:fqdn=hugesoft.org
inet:fqdn=hugesoft.org
        :domain = org
        :host = hugesoft
        :issuffix = False
        :iszone = True
        :zone = hugesoft.org
        .created = 2022/04/28 12:34:03.107
        #cno.infra.sink.holed = (2014/01/11 00:00:00.000, 2018/03/30 00:00:00.000)
        #rep.feye.apt1

Note that a tag applied to a node uses the “hashtag” symbol ( # ). This is a visual cue to distinguish tags on a node from the node’s secondary properties. The symbol is also used within the Storm syntax to reference a tag as opposed to a syn:tag node.