Storm Reference - Automation

Background

Synapse is designed to facilitate large-scale analysis over disparate data sources with speed and efficiency. Many of the features that support this analysis are built into Synapse’s architecture, from performance-optimized indexing and storage to an extensible data model that allows you to reason over data in a structured manner.

Synapse also supports large-scale analysis through the use of automation. Synapse’s automation features are available through the Storm runtime and include:

Triggers and Cron
Macros
Dmons

All forms of automation in Synapse utilize the Storm query language. That is, regardless of the type of automation used, the result is the execution of a predefined Storm query. This means that anything that can be written in Storm can be automated, from the simple and straightforward to the highly complex. Actions performed via automation are limited only by imagination and Storm proficiency. Some automation is fairly simple (“if X occurs, do Y” or “once a week, update Z”). However, automation can take advantage of all available Storm features, including subqueries, variables, libraries, control flow logic, and so on.

Considerations

This section is not meant to be a detailed guide on implementing automation. However, a few areas are included here for consideration when utilizing automation in your environment.

Permissions.

The ability to create, modify, or delete the various types of automation depends on a user having the appropriate permissions to do so. If you assign permissions for these tasks in your environment, keep the following in mind:

Triggers, cron jobs, and dmons execute in the context of (with the permissions of) the user who creates them.
Macros execute in the context of the user who calls them for execution, regardless of who created them.

The Storm query executed by a given automation task cannot perform tasks that it does not have appropriate permissions to perform. This leads to some straightforward (and some less straightforward) situations. For example:

A user with higher privileges can write a macro that can be called and executed by a user with lower privileges. However, if the lower-privileged user does not have permissions to perform actions encoded in the macro, execution by the lower-privileged user will fail (i.e., with an AuthDeny error).
A user with higher privileges can write a trigger that calls a macro. Because the macro was called by the trigger, the macro executes with the permissions of the user who wrote the trigger. As triggers are event-driven (see below), this means it is possible for a lower-privileged user to cause an event that fires the higher-privileged trigger that calls the higher-privileged macro. Both will execute as written even though the execution was “caused” by a lower-privileged user.

Use Cases.

Organizations can implement automation in any way they see fit, and at any scale. Some automation may be enterprise-wide, where it is used to support an organization’s overall mission or analysis efforts. Other automation may be put in place by individual analysts to support their own research efforts, either on an ongoing or temporary basis. For example:

an analyst may write a cron job to execute a long-running data collection and analysis task during off-peak hours, or may create a macro as a “shortcut” to make it easier to store and execute a particular Storm query on demand.
an organization may implement a set of automation to track malicious indicators within Synapse and orchestrate the integration of those indicators with security devices or SIEM systems.

Architecture.

There are varying approaches for “how” to write and implement automation. For example:

The various types of automation can be kept entirely separate and independent from one another, each executing their own Storm code and performing different tasks. Alternatively, automation can be somewhat centralized (say in macros), where other types of automation execute minimal Storm queries whose purpose is to call the more extensive Storm stored in macros.
Automation can be written as many small, individual elements that each perform a relatively simple task, but which are designed to work together like building blocks to orchestrate larger-scale operations. In contrast, automation can be implemented using fewer elements that perform larger, more unified tasks using more complex Storm to carry out multiple operations.

Each approach has its pros and cons; there is no single “right” way, and what works best in your environment or for a particular task will depend on your needs (and possibly some trial and error).

Governance / Management.

Where multiple users have the ability to create automation tasks, it is possible for them to create duplicative or even conflicting automation. This is less likely to occur (or less likely to be potentially harmful) when users create automation to support their personal workflow, but can present problems when multiple users have permissions to both create automation and perform actions within Synapse that have “global” effects. For example:

Tags typically represent assessments about data. This can include things like whether an IP address is a sinkhole or whether an indicator is associated with a particular threat group. Organizations may wish to consider under what circumstances these assessments should be applied via automation. (For organizations not comfortable with fully automating certain analytical decisions, automation can be used to tag something “for review” to ensure a human in the loop.)
Automation is particularly useful for “enriching” indicators (such as domains or file hashes) by ingesting data into Synapse from various third-party sources. In some cases third-party access may be subject to query limits or may incur per-query (or per-result) usage costs. Organizations may wish to consider how to balance the convenience of automation with potential operational or financial impacts.

Organizations should plan whether or how to coordinate and deconflict automation where appropriate.

Review and Testing.

Automation in Synapse provides significant advantages. It relieves analysts from performing tedious work, freeing them to focus on more detailed analysis and complex tasks. It also allows you to scale your analytical operations by limiting the amount of work that must be performed manually. However, badly written automation or automation that contains logical errors also allows you to “make errors at machine speed”. We strongly encourage testing automation in a development environment before implementing it on a production system.

Nodes In and Nodes Out.

In some cases automation can be used to “chain” operations (e.g., both triggers and cron can call macros) or may be executed inline with other Storm operations (a macro can be called in the middle of a Storm query where the query expects to perform further operations on the nodes returned by the macro). In these cases, both the automation itself and any Storm executed after the automation may fail if the inbound nodes are not what is expected by the query.

Users should keep the Storm Operating Concepts in mind when writing automation.

Triggers and Cron

Triggers and cron are similar in terms of permissions and management. Both triggers and cron:

Support permissions. Synapse uses permissions to determine who can create, modify, and delete triggers and cron jobs. Triggers and cron jobs execute with the permissions of the user who creates them. (This is fairly intuitive for cron, but may be less so in the case of triggers.) Conversely, a trigger or cron job can only perform actions that their creator has permissions to perform.
Are introspectable. Triggers and cron jobs are created as run-time nodes (“runt nodes”) and can be lifted, filtered, and pivoted across just like other elements of the Synapse data model (see the Storm Reference - Model Introspection guide for details).
Are view-specific. Synapse allows the optional segregation of data in a Cortex into multiple layers (Layer) that can be “stacked” to provide a unified View of data to users. Triggers and cron jobs are specific to a view. This means that they are carried over (for example) if a user forks (Fork) a view. See the layer and view commands in the Storm Reference - Storm Commands guide for additional detail on working with views and layers.

Note

A simple Synapse implementation consists of a single Cortex with a single layer and a single view.

Triggers

Triggers are event-driven. As their name implies, they trigger (“fire”) their associated Storm query on demand when specific events occur in a Cortex. As noted above, a trigger executes with the permissions of the user who created it; the trigger can only perform actions that the user is allowed to perform.

Triggers can fire on the following events:

Adding a node (node:add)
Deleting a node (node:del)
Setting a property (prop:set)
Adding a tag to a node (tag:add)
Deleting a tag from a node (tag:del)

Each event requires an object (a form, property, or tag) to act upon - that is, if you write a trigger to fire on a node:add event, you must specify the type of node (form) associated with the event. Similarly, if a trigger should fire on a tag:del event, you must specify the tag whose removal fires the trigger.

tag:add and tag:del events can take an optional form, allowing you to specify that a trigger should fire when a given tag is added (or removed) from a specific form as opposed to any / all forms.

Triggers execute immediately when their associated event occurs in a Cortex. This allows automation to occur in “real time” as opposed to waiting for a scheduled cron job to execute (or for an analyst to manually perform some task). As such, triggers are most appropriate for automating tasks that should occur right away (e.g., based on efficiency or importance). For example, triggers can be used to perform simple but tedious tasks, freeing analysts from having to perform such tasks manually; or they can be used to collect additional data (e.g., from third-party services) about nodes of interest (“enrich” those nodes).

Warning

Triggers execute inline with the process (typically a Storm query) that causes them to fire. That is, if a process (or an analyst) creates a node as part of a longer Storm query and creation of that node causes a trigger to fire, the trigger’s Storm will execute immediately and in full before returning “back” to the main query to complete any additional Storm operations. This includes execution of any additional code that may be called by the trigger itself, such as a macro. Conceptually, it is as though all of the trigger’s Storm code and any additional Storm that it calls were inserted into the middle of the original Storm query that fired the trigger.

This inline execution can impact your query’s performance, depending on the Storm executed by the trigger and the number of nodes causing the trigger to fire. The`` –async`` option can be used when creating a trigger to specify that the trigger should run in the background vs. inline. This will cause the trigger event to be stored in a persistent queue, which will then be consumed automatically by the Cortex.

For example, let’s say you are using CSVTool (see Synapse Tools - csvtool) to load 3,000 indicators and tag them as “malicious”. Your Cortex includes a trigger that fires when a “malicious” tag is applied to any node, calling a Storm query that contacts five third-party services to enrich those indicators, resulting in the creation of dozens of additional nodes for each indicator enriched. This tag-and-enrich process is executed inline for each of the 3,000 nodes created by the CSVTool ingest. If the trigger was created as an async trigger, the CSVTool may complete before the enrichment is done, but the enrichment will eventually complete in the background.

Conversely, triggers only execute when the specified event occurs. The events used to fire triggers typically will only ever occur once - the node inet:fqdn=woot.com will only ever be added to a Cortex once; applying the tag #my.tag to a hash:md5 node will only occur one time (barring deleting and later re-adding the node or tag). The only exception is the prop:set event, which will fire when the specified property is first set (often on node creation) as well as any time the property is modified (i.e., “set” again).

The “one-time” nature of triggers may result in a small number of edge cases where the trigger does not perform the desired action. For example:

Triggers do not operate retroactively; they will not fire for nodes that already exist in a Cortex at the time the trigger is added. That is, if you write a new trigger to fire any time the tag #my.tag is applied to a hash:md5 node, the trigger will have no effect on existing hash:md5 nodes that already have the tag.
If a trigger depends on a resource (process, service, etc.) that is not available when it fires, the trigger will simply fail; it will not “try again”.

Finally, users should be aware that in some cases proper trigger execution may depend on the timing and order of events with respect to creating nodes, setting properties, creating nodes from secondary properties (“derivative nodes”), etc. The detailed technical aspects of Synapse write operations are beyond the scope of this discussion, but as always it is good practice to test triggers (or other automation) before putting them into practice to ensure they perform as intended.

Variables:

Triggers automatically have the Storm variable $auto populated when they run. The $auto variable is a dictionary which contains the following keys:

$auto.iden
The identifier of the Trigger.

$auto.type
The type of automation. For a Trigger this value will be trigger.

$auto.opts
Dictionary containing trigger specific runtime information. This includes the following keys:

$auto.opts.form
The form of the triggering node.

$auto.opts.propfull
The full name of the property that was set on the node. Only present on prop:set triggers.

$auto.opts.propname
The relative name of the property that was set on the node. Does not include a leading :. Only present on prop:set triggers.

$auto.opts.tag
The tag which caused the trigger to fire. Only present on tag:add and tag:del triggers.

$auto.opts.valu
The value of the triggering node.

Syntax:

Triggers are created, modified, viewed, enabled, disabled, and deleted using the Storm trigger.* commands. See the trigger command in the Storm Reference - Storm Commands document for details.

Note

Triggers can be modified after they are created, but modification is limited to updating the Storm query executed by the trigger. If other aspects of a trigger need to be changed (such as the event that fires the trigger, or the form a trigger operates on), the trigger must be deleted and re-created.

Examples:

In the examples below, the command to create (add) the specified trigger is shown.

For illustrative purposes, in the first example the newly created trigger is displayed by using the trigger.list command and then by lifting the associated syn:trigger runtime (“runt”) node.

Add a trigger that fires when an inet:whois:email node is created. If the email address is associated with a privacy-protected registration service (e.g., the email address is tagged #whois.private), then also tag the inet:whois:email node.

storm> trigger.add --name "tag privacy protected inet:whois:email" node:add --form inet:whois:email --query { +{ -> inet:email +#whois.private } [ +#whois.private ] }
Added trigger: ca529e65c15c219d0e6d02f9a0087f20

Newly created trigger via trigger.list

storm> trigger.list
user       iden                             en?    async? cond      object                    storm query
root       ca529e65c15c219d0e6d02f9a0087f20 True   False  node:add  inet:whois:email             +{ -> inet:email +#whois.private } [ +#whois.private ]

The output of trigger.list contains the following columns:

The username used to create the trigger.
The trigger’s identifier (iden).
Whether the trigger is currently enabled or disabled.
The condition that causes the trigger to fire.
The object that the condition operates on, if any.
The tag or tag expression used by the condition (for tag:add or tag:del conditions only).
The query to be executed when the trigger fires.

Newly created trigger as a syn:trigger node

storm> syn:trigger
syn:trigger=ca529e65c15c219d0e6d02f9a0087f20
        :cond = node:add
        :doc =
        :enabled = True
        :form = inet:whois:email
        :name = tag privacy protected inet:whois:email
        :storm =  +{ -> inet:email +#whois.private } [ +#whois.private ]
        :user = 6eb651e9c756016fe347a298a3ec1436
        :vers = 1

While runt nodes (Node, Runt) are typically read-only, syn:trigger nodes include :name and :doc secondary properties that can be set and modified via Storm. This allows management of triggers by providing them with meaningful names and descriptions of their purpose. Changes to these properties will persist even after a Cortex restart.

Add a trigger that fires when the :exe property of an inet:dns:request node is set. Check to see whether the queried FQDN is malicious; if so, tag the associated file:bytes node for analyst review.

storm> trigger.add --name "tag file:bytes for review" prop:set --prop inet:dns:request:exe --query { +{ :query:name -> inet:fqdn +#malicious } :exe -> file:bytes [ +#review ] }
Added trigger: 78356863684a8e948baed134b6b7e0d6

Add a trigger that fires when the tag #ttp.phish.attach (indicating that a file was an attachment to a phishing email) is applied to a file:bytes node. Use the trigger to also apply the tag #attck.t1566.001 (indicating MITRE ATT&CK technique “spear phishing attachment”).

storm> trigger.add --name "tag phish attachment with #attck.t1566.001" tag:add --form file:bytes --tag ttp.phish.attach --query { [ +#attck.t1566.001 ] }
Added trigger: 04fec9c958414bdf3fe850ce719129dd

Add a trigger that fires when the tag #osint (indicating that the node was listed as a malicious indicator in public reporting) is applied to an FQDN. The trigger should call extended Storm commands to submit the FQDN to third-party services (such as a domain whois service, a passive DNS service, or a malware execution / sandbox service) to enrich the FQDN. (Note: Synapse does not include these services in its public distribution; this trigger assumes such services have been implemented).

storm> trigger.add --name "enrich osint fqdn" tag:add --form inet:fqdn --tag osint --query { | whois | pdns | malware }
Added trigger: 55cad26bae452afca31ab9d641290bef

Add a trigger that fires when the tag #osint (indicating that the node was listed as a malicious indicator in public reporting) is applied to any node. The trigger should call (execute) a macro called enrich. The macro contains a Storm query that uses a switch case to call the appropriate extended Storm commands based on the tagged node’s form (e.g., perform different enrichment / call different third-party services based on whether the node is an FQDN, an IPv4, an email address, a URL, etc.).

storm> trigger.add --name "enrich osint" tag:add --tag osint --query { | macro.exec enrich }
Added trigger: 8591798b88fc9b3bc23b331bc9503cd8

Cron

Cron jobs in Synapse are similar to the well-known cron utility. Where triggers are event-driven, cron jobs are time / schedule based. Cron jobs can be written to execute once or on a recurring schedule. When creating a cron job, you must specify the schedule for the job and the Storm query to be executed. As noted above, a cron job executes with the permissions of the user who created it; the job can only perform actions that the user is allowed to perform.

As such, cron is most appropriate for automating things such as non-urgent tasks, tasks that only need to run intermittently, or resource-intensive tasks that should run during off-hours.

Note

When scheduling jobs, cron interprets all times as UTC.

Variables:

Cron jobs automatically have the Storm variable $auto populated when they run. The $auto variable is a dictionary which contains the following keys:

$auto.iden
The identifier of the Cron job.

$auto.type
The type of automation. For a Cron job this value will be cron.

Syntax:

Cron jobs are created, modified, viewed, enabled, disabled, and deleted using the Storm cron.* commands. See the cron command in the Storm Reference - Storm Commands document for details.

Note

Cron jobs can be modified after they are created, but modification is limited to updating the Storm query executed by the job. If other aspects of a job need to be changed (such as its schedule for execution), the job must be deleted and re-created.

Examples:

In the examples below, the command to create (add) the specified cron job is shown.

For illustrative purposes, in the first example the newly created cron job is then displayed by using the cron.list command and by lifting the associated syn:cron runtime (“runt”) node.

Add a one-time / non-recurring cron job to run at 7:00 PM to create the RFC1918 IPv4 addresses in the 172.16.0.0/16 range.

storm> cron.at --hour 19 { [ inet:ipv4=172.16.0.0/16 ] }
Created cron job: 420a54db5900edbf4e858c8c6e434e28

Newly created cron job via cron.list

storm> cron.list
user       iden       view       en? rpt? now? err? # start last start       last end         query
root       420a54db.. 47a6345a.. Y   N    N         0       Never            Never             [ inet:ipv4=172.16.0.0/16 ]

The output of cron.list contains the following columns:

The username used to create the job.
The first eight characters of the job’s identifier (iden).
Whether the job is currently enabled or disabled.
Whether the job is scheduled to repeat.
Whether the job is currently executing.
Whether the last job execution encountered an error.
The number of times the job has started.
The date and time of the job’s last start and last end.
The query executed by the cron job.

Newly created cron job as a syn:cron node

storm> syn:cron
syn:cron=420a54db5900edbf4e858c8c6e434e28
        :doc =
        :name =
        :storm =  [ inet:ipv4=172.16.0.0/16 ]

While runt nodes (Node, Runt) are typically read-only, syn:cron nodes include :name and :doc secondary properties that can be set and modified via Storm. This allows management of cron jobs by providing them with meaningful names and descriptions of their purpose. Changes to these properties will persist even after a Cortex restart.

Add a cron job to run every Tuesday, Thursday, and Saturday at 2:00 AM that lifts 1,000 IPv4 address nodes with missing geolocation data (i.e., no :loc property) and submits them to an extended Storm command that calls an IP geolocation service. (Note: Synapse does not include a geolocation service in its public distribution; this cron job assumes such a service has been implemented).

storm> cron.add --day Tue,Thu,Sat --hour 2 { inet:ipv4 -:loc | limit 1000 | ipgeoloc }
Created cron job: 2a9fd87493b748e7b64b0af66ef6dc1d

Add a cron job to run on the 15th of every month to lift all MD5, SHA1, and SHA256 hashes tagged “malicious” that do not have corresponding file (file:bytes) nodes and submit them to an extended Storm command that queries a third-party malware service for those files. (Note: Synapse does not include a malware service in its public distribution; this cron job assumes such a service has been implemented).

storm> cron.add --day 15 { hash:md5#malicious hash:sha1#malicious hash:sha256#malicious -{ -> file:bytes } | malwaresvc }
Created cron job: 7f6d86e2451cf99c1914a1862dcdaa3d

Macros

Similar to Triggers and Cron, a macro is a stored Storm query / set of Storm code. However, macros differ from triggers and cron in some important ways:

Cortex-specific. Where triggers and cron jobs are specific to a View, macros are specific to a given Cortex.
Execution. Where triggers and cron execute based on specific criteria (events for triggers and scheduled times for cron), macros do not “execute” by default. Instead, macros are executed with the Storm macro.exec command.
Permissions. Where triggers and cron execute with the permissions of the user who created them, macros execute with the permissions of the user who calls the macro. This means that if a macro is called by a trigger or cron job, it executes as the creator of that trigger or cron job. A macro can only perform actions that the calling user can perform.

The Storm query contained in a macro can be as simple or complex as you like. Users can write “personal” macros to store complex or frequently-used queries that can be executed easily by calling the macro name instead of manually typing a lengthy query. Organizations can use them to store longer or more complex Storm queries that may be more difficult to view and manage as triggers or cron jobs within the Storm cmdr CLI.

Because macros are executed via a Storm command, they can be executed any way Storm can be executed. In short, they can be called manually on demand from the cmdr CLI, or they can be called as part of the Storm query executed by a trigger, cron job, or other automation. This provides a great deal of flexibility, particularly for organization-wide automation. By using a macro, you can write a single Storm query to carry out a task that can then be called in a variety of ways.

Note

Macros can contain any valid Storm. You can write a macro to be fully “self-contained” so that it both lifts a specified set of nodes and operates on those nodes. Alternately, you can write a macro with the expectation that it will be called as part of a larger Storm query and will operate on a particular set of “inbound” nodes that will be piped to the macro.exec command.

Syntax:

Macros are created, modified, viewed, and deleted using the Storm macro.* commands. See the macro command in the Storm Reference - Storm Commands document for details.

Note

Macros consist entirely of Storm, so can be edited / modified freely with appropriate permissions.

Examples:

Add a macro named sinkhole.check that lifts all IPv4 addresses tagged as sinkholes (#infra.sinkhole) and submits those nodes to an extended Storm command that calls a third-party passive DNS service. (Note: Synapse does not include a passive DNS service in its public distribution; the macro assumes such a service has been implemented).

storm> macro.set sinkhole.check ${ inet:ipv4#infra.sinkhole | pdns }
Set macro: sinkhole.check

Add a macro named check.c2 that takes an inbound set of file:bytes nodes and returns any FQDNs that the files query and any IPv4 addresses the files connect to. Use a filter in the macro to ensure that the macro code only attempts to process inbound file:bytes nodes.

storm> macro.set check.c2 ${ +file:bytes | tee { -> inet:dns:request :query:name -> inet:fqdn | uniq } { -> inet:flow:src:exe :dst:ipv4 -> inet:ipv4 | uniq } }
Set macro: check.c2

Add a macro named enrich that takes any node as input and uses a switch statement to call extended Storm commands for third-party services able to enrich a given form (line breaks and indentations used for readability). (Note: Synapse does not include third-party services / connectors in its public distribution; the macro assumes such services have been implemented.)

storm> macro.set enrich ${ switch $node.form() {

    /* You can put comments in macros!!! */

    "inet:fqdn": { | whois | pdns | malware }
    "inet:ipv4": { | pdns }
    "inet:email": { | revwhois }
    *: { }
} }
Set macro: enrich

Note

The macro above uses variables, methods, and a switch case for control flow.

Dmons

A Dmon is a long-running or recurring query or process that runs continuously in the background, similar to a traditional Linux or Unix daemon.

Variables:

Dmons will have the storm variable Storm variable $auto populated when they run. The $auto variable is a dictionary which contains the following keys:

$auto.iden
The identifier of the Dmon.

$auto.type
The type of automation. For a Dmon this value will be dmon.

Note

If the variable $auto was captured during the creation of the Dmon, the variable will not be mapped in.

Syntax:

Users can interact with dmons using the Storm dmon.* commands (see the dmon command in the Storm Reference - Storm Commands document for details) and the $lib.dmon Storm libraries.