Data Model - Terminology

Note

This section describes the Synapse data model from a conceptual user perspective. See the Synapse Data Model technical documentation for information that may be more useful for developers.

Synapse is a distributed key-value hypergraph analysis framework. That is, Synapse is a particular implementation of a Hypergraph model; an instance of a Synapse hypergraph is called a Cortex. In our brief discussion of graphs and hypergraphs, we pointed out some fundamental concepts related to Synapse’s implementation:

  • (Almost) everything is a node. Most of Synapse’s data model consists of nodes; there are a limited number of pairwise (“two-dimensional”) edges in Synapse. We use “lightweight” (light) edges to support specific use cases, but mostly everything is a node.

  • Tags act as hyperedges. In a directed graph, an edge connects exactly two nodes. In Synapse, tags are labels that can be applied to an arbitrary number of nodes. These tags effectively act as an n-dimensional edge (a hyperedge) that connects or groups any number of nodes.

  • (Almost) every navigation of the graph is a pivot. Because Synapse’s data model primarily consists of nodes, you generally don’t explore Synapse’s data by traversing edges. Instead, you pivot from the properties of one set of nodes to the properties of another set of nodes (though you also traverse the occasional light edge).
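The idea of a tag acting as a hyperedge can be sketched in a few lines of Python. This is not Synapse code: the dictionary layout, the function names, and the tag `example.tag` are all assumptions made for illustration.

```python
# Illustrative only: a plain-Python picture of a tag acting as a hyperedge.
# Nodes are identified by (form, value) pairs; all names are hypothetical.
from collections import defaultdict

tags = defaultdict(set)  # tag name -> set of nodes carrying that tag


def add_tag(node, tag):
    """Apply a tag to a node; one tag may group any number of nodes."""
    tags[tag].add(node)


def nodes_with_tag(tag):
    """Return every node joined by the tag (the 'hyperedge')."""
    return sorted(tags[tag])


add_tag(("inet:fqdn", "woot.com"), "example.tag")
add_tag(("inet:fqdn", "hugesoft.org"), "example.tag")
add_tag(("inet:ipv4", "1.2.3.4"), "example.tag")

# A single tag now connects three nodes at once, which a pairwise
# edge in a directed graph cannot do.
print(nodes_with_tag("example.tag"))
```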

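To build on those concepts, you need to understand the basic elements of the Synapse data model. The fundamental terms and objects you should be familiar with, each described in its own section below, are:

  • Type

  • Form

  • Node

  • Property

  • Tag

  • Lightweight (Light) Edge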

Tip

Synapse uses a query language called Storm (see Storm Reference - Introduction) to interact with data and tags. Storm allows a user to lift, filter, pivot, traverse, and modify data based on node properties, values, tags, and light edges. Understanding these model elements will improve your ability to use Storm and interact with Synapse data.

Type

A type is the definition of a data element within the Synapse data model. A type describes what the element is and enforces how it should look, including how it should be normalized, if necessary, for both storage (including indexing) and representation (display).

The Synapse data model includes standard types such as integers and strings, as well as types defined within Synapse such as globally unique identifiers (guid), date/time values (time), time intervals (ival), and tags (syn:tag).

Knowledge domain-specific objects may also be specialized types. For example, an IPv4 address (inet:ipv4) is its own type. An IPv4 address is stored as an integer, but the type has additional constraints (i.e., to ensure that IPv4s created in Synapse only use integer values that fall within the allowable IPv4 address space). These constraints may be defined by a constructor (ctor) that specifies how a property of that type can be created (constructed).
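As a rough illustration of what "stored as an integer, with constraints" can mean, the Python sketch below normalizes an IPv4 value for storage and converts it back for display. It relies on the standard library `ipaddress` module and is not Synapse's actual inet:ipv4 constructor; the function names `norm_ipv4` and `repr_ipv4` are hypothetical.

```python
# Illustrative only: not Synapse's inet:ipv4 implementation.
import ipaddress


def norm_ipv4(valu):
    """Normalize a dotted-quad string (or an integer) to an integer.

    Raises ValueError if the value falls outside the IPv4 address space.
    """
    return int(ipaddress.IPv4Address(valu))


def repr_ipv4(norm):
    """Convert the stored integer back to its display form."""
    return str(ipaddress.IPv4Address(norm))


print(norm_ipv4("1.2.3.4"))   # 16909060
print(repr_ipv4(16909060))    # 1.2.3.4
```

A value such as `2**32` is a valid integer but not a valid IPv4 address, so `norm_ipv4(2**32)` raises an error. That is the kind of constraint a specialized type adds on top of its underlying storage type.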

For the most part, users do not interact with types directly. Types are primarily used “behind the scenes” to define and support the Synapse data model. From a user perspective:

  • Strong typing means every element in Synapse has a type. Forms define the objects that can be represented in Synapse. Forms have properties (primary and secondary) and every property is defined as a particular type.

  • Type enforcement helps prevent “bad data” from getting into Synapse. Synapse has “rules” for how properties of a given type can be created. This prevents simple errors like entering an email address value where you need an FQDN, but also ensures (to the extent possible) that values “make sense” for their type (e.g., that a URL looks reasonably like a URL).

  • Type awareness makes it easier to navigate data within Synapse. Synapse and Storm are “model aware” and know which types are used for each property in the model. This simplifies exploring or pivoting across data because Synapse and Storm automatically recognize relationships where different forms share properties with the same type and same value. This makes navigation easier in general, but also allows Synapse to show you relationships you may not know exist.
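Type awareness can be pictured with a toy example. The Python sketch below is an assumption-laden illustration, not how Synapse or Storm is implemented: `PROP_TYPES`, `NODES`, and `pivot_by_type` are hypothetical, the data is made up, and the inet:email:fqdn property name is assumed rather than taken from this section.

```python
# Illustrative only: a toy model and data set, not Synapse internals.
# Maps full property names to the type each property is defined as.
PROP_TYPES = {
    "inet:fqdn": "inet:fqdn",
    "inet:fqdn:zone": "inet:fqdn",
    "inet:email": "inet:email",
    "inet:email:fqdn": "inet:fqdn",  # assumed property name
}

NODES = [
    {"inet:fqdn": "woot.com", "inet:fqdn:zone": "woot.com"},
    {"inet:email": "user@woot.com", "inet:email:fqdn": "woot.com"},
    {"inet:fqdn": "hugesoft.org", "inet:fqdn:zone": "hugesoft.org"},
]


def pivot_by_type(type_name, valu):
    """Find every property of the given type that holds the given value."""
    hits = []
    for node in NODES:
        for prop, pval in node.items():
            if PROP_TYPES.get(prop) == type_name and pval == valu:
                hits.append((prop, node))
    return hits


# Because the model knows which properties share the inet:fqdn type,
# one value can surface related nodes of different forms.
for prop, node in pivot_by_type("inet:fqdn", "woot.com"):
    print(prop, "->", node)
```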

Type-Specific Behavior

Synapse includes various type-specific optimizations to improve performance and functionality. Some of these are “back end” optimizations (i.e., for indexing and storage) while some are more “front end” in terms of how users can interact with data of certain types via Storm. See Storm Reference - Type-Specific Storm Behavior for additional detail.

Viewing or Working with Types

Types are defined within the Synapse source code. An auto-generated dictionary of Types (Base Types and Types) can be found in the online documentation.

Types can also be viewed within Synapse. A full list of current types can be displayed with the following Storm command:

storm> syn:type

You can view detail about a specific type as follows:

storm> syn:type=inet:fqdn
syn:type=inet:fqdn
        :ctor = synapse.models.inet.Fqdn
        :doc = A Fully Qualified Domain Name (FQDN).

See Storm Reference - Model Introspection for additional detail on working with model elements within Storm.

Form

A form is the definition of an object in the Synapse data model. A form acts as a “template” that tells you how to create a particular object (Node). While the concepts of form and node are closely related, it is useful to maintain the distinction between the template for creating an object (a form) and an instance of a particular object (a node). inet:fqdn is a form; inet:fqdn = woot.com is a node.

A form consists of the following:

  • A primary property. The primary property must be unique across all possible instances of that form. In addition, the primary property must have a specific type. In many cases, a form will be its own type - for example, the inet:fqdn form has a type of inet:fqdn. While all forms are types (that is, must be defined as a type), not all types have associated forms.

  • Optional secondary properties. Secondary properties must also have a type. Properties may have additional constraints, such as:

    • Whether the property is read-only once set.

    • Any normalization (outside of type-specific normalization) that should occur for the property (such as converting a string to all lowercase, stripping any whitespace, etc.).

Secondary properties are form-specific and are explicitly defined for each form. Synapse also supports a set of universal secondary properties (universal properties) that are valid for all forms.

Property discusses these concepts in greater detail.
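The distinction between a form (the template) and a node (an instance) can be sketched with two small Python classes. These classes are hypothetical and are not Synapse's model API; the property definitions loosely mirror the inet:fqdn introspection output shown later in this section.

```python
# Illustrative only: hypothetical classes, not Synapse's model API.
from dataclasses import dataclass


@dataclass(frozen=True)
class PropDef:
    name: str         # relative property name, e.g. "zone"
    type: str         # every property is defined as a type
    ro: bool = False  # whether the property is read-only once set


@dataclass(frozen=True)
class FormDef:
    name: str         # form name, e.g. "inet:fqdn"
    type: str         # type of the primary property
    props: tuple = () # optional secondary property definitions

    def prop(self, name):
        """Look up a secondary property definition by relative name."""
        for pdef in self.props:
            if pdef.name == name:
                return pdef
        raise KeyError(name)


# The form is the template...
FQDN = FormDef(
    name="inet:fqdn",
    type="inet:fqdn",
    props=(
        PropDef("domain", "inet:fqdn", ro=True),
        PropDef("host", "str", ro=True),
        PropDef("zone", "inet:fqdn"),
    ),
)

# ...and a node is one instance of it, identified by form plus value.
node = (FQDN.name, "woot.com")
print(node, FQDN.prop("domain"))
```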

Forms comprise the essential “structure” of the data that analysts work with. Understanding the forms Synapse uses to represent various objects or concepts is key to working with Synapse data.

Form Namespace

The Synapse data model uses a structured namespace for forms. Each form name consists of at least two namespace elements separated by a colon ( : ). For example:

  • file:bytes

  • inet:email

  • inet:fqdn

  • ou:org

The first element in the namespace represents a rough “category” for the form (e.g., inet for Internet-related objects). The Synapse data model is meant to be extensible. The ability to group portions of the data model into related categories makes a large model easier to manage, and also allows Synapse users to focus on those portions of the model most relevant to them.

The second and / or subsequent elements in the form name define the specific “subcategory” or “thing” within the form’s primary category (e.g., inet:fqdn represents a fully qualified domain name (FQDN) within the “Internet” (inet) category, and inet:dns:query represents a query using the DNS protocol within the “Internet” category).
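The namespace convention above can be expressed as a small Python helper. The function name `split_form` is hypothetical, and the validation it performs reflects only the rule stated here (at least two colon-separated elements), not any additional checks Synapse may apply.

```python
# Illustrative only: splits a form name into its category and the
# remaining namespace elements, per the convention described above.
def split_form(name):
    parts = name.split(":")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a valid form name: {name!r}")
    return parts[0], parts[1:]


print(split_form("inet:fqdn"))       # ('inet', ['fqdn'])
print(split_form("inet:dns:query"))  # ('inet', ['dns', 'query'])
```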

Properties have a namespace that extends the form namespace (form names are also primary properties). See Property and Property Namespace below for additional detail.

Viewing or Working with Forms

Like types, forms are defined within the Synapse source code. An auto-generated dictionary of Forms can be found in the online documentation.

Forms can also be viewed within Synapse. A full list of current forms can be displayed with the following Storm command:

storm> syn:form

You can view detail about a specific form as follows (form only):

storm> syn:form=inet:fqdn
syn:form=inet:fqdn
        :doc = A Fully Qualified Domain Name (FQDN).
        :runt = false
        :type = inet:fqdn

Or a form with its secondary properties:

storm> syn:prop:form=inet:fqdn
syn:prop=inet:fqdn
        :doc = A Fully Qualified Domain Name (FQDN).
        :extmodel = false
        :form = inet:fqdn
        :type = inet:fqdn
        :univ = false
syn:prop=inet:fqdn.seen
        :base = .seen
        :doc = The time interval for first/last observation of the node.
        :extmodel = false
        :form = inet:fqdn
        :relname = .seen
        :ro = false
        :type = ival
        :univ = false
syn:prop=inet:fqdn.created
        :base = .created
        :doc = The time the node was created in the cortex.
        :extmodel = false
        :form = inet:fqdn
        :relname = .created
        :ro = true
        :type = time
        :univ = false
syn:prop=inet:fqdn:domain
        :base = domain
        :doc = The parent domain for the FQDN.
        :extmodel = false
        :form = inet:fqdn
        :relname = domain
        :ro = true
        :type = inet:fqdn
        :univ = false
syn:prop=inet:fqdn:host
        :base = host
        :doc = The host part of the FQDN.
        :extmodel = false
        :form = inet:fqdn
        :relname = host
        :ro = true
        :type = str
        :univ = false
syn:prop=inet:fqdn:issuffix
        :base = issuffix
        :doc = True if the FQDN is considered a suffix.
        :extmodel = false
        :form = inet:fqdn
        :relname = issuffix
        :ro = false
        :type = bool
        :univ = false
syn:prop=inet:fqdn:iszone
        :base = iszone
        :doc = True if the FQDN is considered a zone.
        :extmodel = false
        :form = inet:fqdn
        :relname = iszone
        :ro = false
        :type = bool
        :univ = false
syn:prop=inet:fqdn:zone
        :base = zone
        :doc = The zone level parent for this FQDN.
        :extmodel = false
        :form = inet:fqdn
        :relname = zone
        :ro = false
        :type = inet:fqdn
        :univ = false

See Storm Reference - Model Introspection for additional detail on working with model elements within Storm.

Node

A node is a unique object within Synapse. Nodes represent standard objects (“nouns”) such as IP addresses, files, people, conferences, or airplanes. They can also represent more abstract objects such as industries, risks, attacks, or goals. However, in Synapse nodes can also represent relationships (“verbs”) because many things that would be edges in a directed graph are nodes in a Synapse hypergraph. You can think of a node generically as a “thing” - most “things” you want to model within Synapse are nodes.

Every node consists of the following:

  • A primary property, represented by the Form of the node plus its value (<form> = <valu>). All primary properties must be unique for a given form. For example, the primary property of the node for the FQDN woot.com is inet:fqdn = woot.com. The uniqueness of the <form> = <valu> pair ensures there can be only one node in Synapse that represents the domain woot.com. Because this unique pair “defines” the node, the comma-separated form / value combination (<form>,<valu>) is also known as the node’s Ndef (short for “node definition”).

  • One or more universal properties. As the name implies, universal properties are applicable to all nodes.

  • Optional secondary properties. Similar to primary properties, secondary properties consist of a property name (of a specific type) and the property’s associated value for the node (<prop> = <pval>). Secondary properties are specific to a given kind of node and provide additional detail about that particular node.

  • Optional tags. A Tag acts as a label with a particular meaning that can be applied to a node to provide context. Tags are discussed in greater detail below.
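The uniqueness of the form / value pair can be illustrated with a toy store keyed by ndef. `ToyCortex` is a hypothetical class written for this sketch and says nothing about how a real Cortex stores data.

```python
# Illustrative only: a toy store keyed by ndef, not a real Cortex.
class ToyCortex:
    def __init__(self):
        self.nodes = {}  # ndef (form, valu) -> node contents

    def add_node(self, form, valu):
        """Create the node if needed; otherwise return the existing one."""
        ndef = (form, valu)
        return self.nodes.setdefault(ndef, {"props": {}, "tags": set()})


core = ToyCortex()
first = core.add_node("inet:fqdn", "woot.com")
again = core.add_node("inet:fqdn", "woot.com")

# Adding the same <form> = <valu> twice yields the same single node.
print(first is again, len(core.nodes))  # True 1
```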

Viewing or Working with Nodes

To view or work with nodes, your instance of Synapse must contain nodes (data). Users interact with data in Synapse using the Storm query language (Storm Reference - Introduction).

Node Example

The Storm query below lifts and displays the node for the domain www.google.com:

storm> inet:fqdn=www.google.com
inet:fqdn=www.google.com
        :domain = google.com
        :host = www
        :issuffix = false
        :iszone = false
        :zone = google.com
        .created = 2023/07/12 15:06:39.484
        #rep.moz.500

In the output above:

  • inet:fqdn = www.google.com is the primary property (<form> = <valu>).

  • .created is a universal property showing when the node was added to the Cortex.

  • :domain, :host, etc. are form-specific secondary properties with their associated values (<prop> = <pval>). For readability, secondary properties are displayed as relative properties within the namespace of the form’s primary property (e.g., :iszone as opposed to inet:fqdn:iszone).

  • #rep.moz.500 is a tag indicating that www.google.com has been reported by web analytics company Moz as one of their top 500 most popular websites.

Property

Properties are the individual elements that define a Form or (with their specific values) that comprise a Node.

Primary Property

Every Form consists of (at minimum) a primary property that is defined as a specific Type. Every Node consists of (at minimum) a primary property (its form) plus the node-specific value of the primary property (<form> = <valu>). When defining a form to represent a particular “thing”, the primary property must be defined so that its value is unique across all possible instances of that “thing”.

The concept of a unique primary property is straightforward for forms that represent simple objects. For example, the “thing” that makes an IP address unique is the IP address itself: inet:ipv4 = 1.2.3.4. Defining a primary property for more complex nodes (such as those representing a Relationship or an Event) can be more challenging; these forms are often GUID forms.

Because a primary property uniquely defines a node, it cannot be modified once the node is created. To “change” a node’s primary property you must delete and re-create the node.

Secondary Property

A Form can include optional secondary properties that provide additional detail about the form. Each secondary property must be defined as an explicit Type. A Node may include secondary properties with their associated values (<prop> = <pval>).

Secondary properties may further describe a given form and its associated nodes. For example, the Autonomous System (AS) that an IP address belongs to (inet:ipv4:asn) does not “define” the IP (and in fact an IP’s associated AS can change), but it provides further detail about the IP address.

Many secondary properties are derived from a node’s primary property (derived properties) and are automatically set when the node is created. For example, creating the node file:path='c:\windows\system32\cmd.exe' will automatically set the properties :base = cmd.exe, :base:ext = exe, and :dir = c:/windows/system32. Because a node’s primary property cannot be changed once set, any secondary properties derived from the primary property also cannot be changed (i.e., are read-only). Non-derived secondary properties can be set, modified, or deleted.
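The file:path example can be mimicked in Python to show how derived values follow mechanically from the primary property. This is a simplified sketch, not Synapse's file:path normalization; the function name `derive_path_props` is hypothetical, and it handles only the separators and extension shown in the example above.

```python
# Illustrative only: a simplified imitation of deriving secondary
# properties from a file path value.
def derive_path_props(path):
    norm = path.replace("\\", "/")        # use forward slashes throughout
    dirname, _, base = norm.rpartition("/")
    props = {":base": base, ":dir": dirname}
    if "." in base:
        props[":base:ext"] = base.rsplit(".", 1)[1]
    return props


print(derive_path_props(r"c:\windows\system32\cmd.exe"))
# {':base': 'cmd.exe', ':dir': 'c:/windows/system32', ':base:ext': 'exe'}
```

Because every derived value is a pure function of the path, changing any of them independently would make the node inconsistent with its own primary property, which is why such properties are read-only.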

Universal Property

Most secondary properties are form-specific and provide additional detail about particular objects within the data model. However, Synapse defines a subset of secondary properties as universal properties that are applicable to all forms. Universal properties include:

  • .created, which is set for all nodes and whose value is the date / time that the node was created within that instance of Synapse.

  • .seen, which is optional for all nodes and whose value is a time interval (minimum or “first seen” and maximum or “last seen”) during which the node was observed, existed, or was valid.
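One way to think about a .seen interval is as a pair of timestamps that widens as new observations arrive. The Python sketch below assumes that behavior for illustration; this section does not describe how Synapse stores or updates interval values, and `update_seen` is a hypothetical helper.

```python
# Illustrative only: widening a (first seen, last seen) pair with a
# new observation time. Not Synapse's ival implementation.
from datetime import datetime


def update_seen(seen, tick):
    """Return an interval that also covers the observation time `tick`."""
    if seen is None:
        return (tick, tick)
    first, last = seen
    return (min(first, tick), max(last, tick))


seen = None
for tick in (datetime(2023, 7, 12), datetime(2023, 1, 1), datetime(2023, 3, 1)):
    seen = update_seen(seen, tick)

print(seen[0].date(), seen[1].date())  # 2023-01-01 2023-07-12
```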

Property Namespace

Properties extend the Form Namespace. Forms (form names) are primary properties, and consist of at least two elements separated by a colon ( : ). Secondary properties exist within the namespace of their primary property (form). Secondary properties are preceded by a colon ( : ) and use the colon to separate additional namespace elements, if needed. (Universal properties are preceded by a period ( . ) to distinguish them from form-specific secondary properties.) For example, the secondary (both universal and form-specific) properties of inet:fqdn include:

  • inet:fqdn.created (universal property)

  • inet:fqdn:zone (secondary property)

Secondary properties also make up a relative namespace (set of relative properties) with respect to their primary property (form). The Storm query language allows (or in some cases, requires) you to reference a secondary property using its relative property name (i.e., :zone vs. inet:fqdn:zone).

Relative properties are also used for display purposes within Synapse for visual clarity (see the Node Example above).

Secondary properties may have their own “namespace”. Both primary and secondary properties use colons to separate elements of the property name. However, not all separators represent property “boundaries”; some act more as “sub-namespace” separators. For example, file:bytes is a primary property / form. A file:bytes form may include secondary properties such as :mime:pe:imphash and :mime:pe:compiled. In this case :mime and :mime:pe are not secondary properties, but sub-namespaces for individual MIME data types and the “PE executable” data type specifically.
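The relationship between full and relative property names can be sketched as a simple string operation. The helper `relname` below is hypothetical; it returns the relative name with its leading colon or period, as used in Storm and in node display.

```python
# Illustrative only: derive a relative property name from a full
# property name and the form it belongs to.
def relname(form, prop):
    rel = prop[len(form):] if prop.startswith(form) else ""
    if not rel or rel[0] not in ":.":
        raise ValueError(f"{prop!r} is not a secondary property of {form!r}")
    return rel


print(relname("inet:fqdn", "inet:fqdn:zone"))               # :zone
print(relname("inet:fqdn", "inet:fqdn.created"))            # .created
print(relname("file:bytes", "file:bytes:mime:pe:imphash"))  # :mime:pe:imphash
```

In the last case the relative name contains further colons, but it is still a single property; splitting it at every colon would wrongly suggest that :mime and :mime:pe are properties in their own right.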

Viewing or Working with Properties

Properties are used to describe forms and are defined within the Synapse source code with their respective Forms. Universal properties are not defined “per-form” but have their own section (Universal Properties) in the online technical documentation.

Properties can also be viewed within Synapse. A full list of current properties can be displayed with the following Storm command:

storm> syn:prop

You can view individual primary or secondary properties as follows:

Primary property:

storm> syn:prop=inet:fqdn
syn:prop=inet:fqdn
        :doc = A Fully Qualified Domain Name (FQDN).
        :extmodel = false
        :form = inet:fqdn
        :type = inet:fqdn
        :univ = false

Secondary property:

storm> syn:prop=inet:fqdn:domain
syn:prop=inet:fqdn:domain
        :base = domain
        :doc = The parent domain for the FQDN.
        :extmodel = false
        :form = inet:fqdn
        :relname = domain
        :ro = true
        :type = inet:fqdn
        :univ = false

See Storm Reference - Model Introspection for additional detail on working with model elements within Storm.

Tag

Tags are annotations applied to nodes. They can be thought of as labels that provide context to the data represented by the node.

Broadly speaking, within Synapse:

  • Nodes represent things: objects, relationships, or events. In other words, nodes typically represent observables that are verifiable and largely unchanging.

  • Tags typically represent assessments: observations that could change if the data or the analysis of the data changes.

For example:

  • An Internet domain is an “observable thing” - a domain exists, was registered through a domain registrar, and can be created as a node such as inet:fqdn = woot.com.

  • Whether a domain has been sinkholed is an assessment. A researcher may need to evaluate data related to that domain (such as domain registration records or current and past IP resolutions) to decide whether the domain appears to be sinkholed. This assessment can be represented by applying a tag such as #cno.infra.dns.sink.holed to the inet:fqdn = woot.com node.

Tags are unique within the Synapse model because tags are both nodes and labels applied to nodes. The tag #cno.infra.dns.sink.holed can be applied to another node; but the tag itself also exists as the node syn:tag = cno.infra.dns.sink.holed. This difference is illustrated in the example below.

Tip

Synapse does not have any pre-defined tags. Users are free to create tags that are meaningful for their analysis. See Analytical Model - Tag Concepts for more detail.

Viewing or Working with Tags

As tags are nodes (data) within Synapse, they can be viewed and operated upon just like other nodes. Users typically interact with data using the Storm query language (Storm Reference - Introduction).

Tag Example

The Storm query below displays the node for the tag cno.infra.dns.sink.holed:

storm> syn:tag=cno.infra.dns.sink.holed
syn:tag=cno.infra.dns.sink.holed
        :base = holed
        :depth = 4
        :doc = A domain (zone) that has been sinkholed.
        :title = Sinkholed domain
        :up = cno.infra.dns.sink
        .created = 2023/07/12 15:06:39.666

The Storm query below displays the tag #cno.infra.dns.sink.holed applied to the node inet:fqdn = hugesoft.org:

storm> inet:fqdn=hugesoft.org
inet:fqdn=hugesoft.org
        :domain = org
        :host = hugesoft
        :issuffix = false
        :iszone = true
        :zone = hugesoft.org
        .created = 2023/07/12 15:06:39.689
        #cno.infra.dns.sink.holed

Note that a tag applied to a node uses the “hashtag” symbol ( # ). This is a visual cue to distinguish tags on a node from the node’s secondary properties. The symbol is also used within the Storm syntax to reference a tag as opposed to a syn:tag node.
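The :base, :depth, and :up values in the syn:tag output above follow from the dotted tag name itself. The Python sketch below reproduces them for illustration. `tag_props` is a hypothetical helper, not Synapse code, and treating a root tag as having no :up value is an assumption.

```python
# Illustrative only: compute the name-derived parts of a syn:tag node.
def tag_props(tag):
    parts = tag.split(".")
    return {
        ":base": parts[-1],                    # final element of the tag
        ":depth": len(parts) - 1,              # zero-based position in the tree
        ":up": ".".join(parts[:-1]) or None,   # parent tag, if any
    }


print(tag_props("cno.infra.dns.sink.holed"))
# {':base': 'holed', ':depth': 4, ':up': 'cno.infra.dns.sink'}
```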

Lightweight (Light) Edge

Lightweight (light) edges are used in Synapse to provide greater flexibility and improved performance when representing certain types of relationships. A light edge is similar to an edge in a traditional directed graph; each light edge links exactly two nodes (n1 and n2), and consists of:

  • A direction. Light edge relationships only “make sense” in one direction, given the forms that they link. For example, an article can reference an indicator such as an MD5 hash, but an MD5 hash does not “reference” an article.

  • A “verb” that represents the relationship (e.g., refs for “references” in the example above).

Light edges do not have properties, and you cannot apply tags to light edges - hence the “light” in light edge.
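Structurally, a light edge carries nothing beyond its two endpoints and its verb, as this short Python sketch shows. The storage layout and function names are assumptions for illustration, as are the media:news and hash:md5 form names and the values used.

```python
# Illustrative only: a light edge as a bare (n1, verb, n2) triple.
# There is no place to hang properties or tags on the edge itself.
edges = set()


def add_edge(n1, verb, n2):
    """Record a directed light edge from n1 to n2."""
    edges.add((n1, verb, n2))


def edges_from(n1, verb):
    """Return the n2 nodes reachable from n1 via the given verb."""
    return sorted(n2 for (src, v, n2) in edges if src == n1 and v == verb)


article = ("media:news", "example-article")                   # hypothetical node
md5 = ("hash:md5", "d41d8cd98f00b204e9800998ecf8427e")

add_edge(article, "refs", md5)

print(edges_from(article, "refs"))  # the article references the hash
print(edges_from(md5, "refs"))      # [] since direction matters
```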

Light edges are used for performance and flexibility in certain use cases, such as:

  • The only information you need to record about a relationship is that it exists (that is, no properties are required to further “describe” the relationship). An example is meta:ruleset -(contains)> meta:rule.

  • The objects (nodes) involved in the relationship may vary. That is, either the n1 or n2 node (or both) may be any kind of node, depending on the context of the relationship. Examples include meta:source -(seen)> * (where a data source may “see”, observe, or provide data on any n2 object) and * -(refs)> * (where a variety of n1 nodes may “reference” or contain a reference to any n2 node).

Synapse’s data model does not include any pre-defined light edges. In addition, Synapse does not enforce or restrict the objects (nodes) that can be linked with light edges. Users are free to create / define their own light edges and use them as they see fit.

Tip

Light edges should not be used as a convenience to short-circuit proper data modeling using forms. Using forms and nodes (combined with Synapse’s strong typing, type enforcement, and type awareness) is key to the powerful analysis and performance capabilities of a Synapse hypergraph.

Viewing or Working with Light Edges

Light edges are not “objects” in Synapse in the same way as forms, types, or properties. (In fact, light edges do not exist until you create them.) The Storm model commands (specifically the model.edge.* commands) include options for working with light edges that exist in a given Cortex.

Internal to The Vertex Project, we have defined a number of light edges for our own use. Light edges may also be created by Vertex-provided components such as Power-Ups (see Power-Up). Any light edges used by Power-Ups are described in the associated Power-Up documentation.

Light edge conventions used by The Vertex Project are documented within the Synapse source code. Light edges that can be used with a given form are also documented with the Forms in the Synapse Data Model technical reference. These conventions are not currently enforced and are meant as recommendations.