feed
The Synapse feed tool is a way to ingest data exported from one Cortex into another Cortex. To use and understand the feed tool effectively, users should be familiar with both the Synapse data model (Data Model - Terminology et al.) and Synapse concepts such as packed nodes and splices.
Syntax
The feed tool is executed from an operating system command shell. The command usage is as follows (line is wrapped for readability):
usage: synapse.tools.feed [-h] (--cortex CORTEX | --test) [--debug] [--format FORMAT] [--modules MODULES]
[--chunksize CHUNKSIZE] [--offset OFFSET] [files ...]
Where:
- -h
displays detailed help and these command-line options.
- CORTEX
specifies the telepath URL of the Cortex where the data should be ingested.
- --test
performs the ingest against a temporary, local Cortex instead of a live Cortex, for testing or validation. When using a temporary Cortex, you do not need to provide a Cortex URL.
- --debug
drops into an interactive prompt to inspect the state of the Cortex post-ingest.
- FORMAT
specifies the format of the input files. Currently, only the values “syn.nodes”, “syn.splice”, and “syn.nodeedits” are supported. Defaults to “syn.nodes” if not specified.
- MODULES
specifies a path to a Synapse CoreModule class that will be loaded into the temporary Cortex. This option has no effect if the --test option is not specified. For more on Core Modules, see Cortex Development Quickstart.
- CHUNKSIZE
specifies how many lines or chunks of data to read at a time from the given files. Defaults to 1000 if not specified.
- OFFSET
specifies how many chunks of data to skip over, starting from the beginning.
- files
is a series of file paths containing data to load into the Cortex (or temporary Cortex). Every file must contain JSON-, msgpack-, or YAML-serialized data, or be a JSON lines file. The files do not all have to be of the same type.
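The following sketch can help when reading the usage line above. It re-creates the shape of that line with Python's standard argparse module:

- --cortex and --test form a required, mutually exclusive pair.
- FORMAT and CHUNKSIZE fall back to their documented defaults.

This is a hypothetical reconstruction, not the feed tool's own parser. The OFFSET default of 0 and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented usage line; NOT the feed tool's source.
    pars = argparse.ArgumentParser(prog='synapse.tools.feed')

    # "(--cortex CORTEX | --test)": exactly one of the two must be supplied.
    target = pars.add_mutually_exclusive_group(required=True)
    target.add_argument('--cortex', help='telepath URL of the destination Cortex')
    target.add_argument('--test', action='store_true', help='use a temporary local Cortex')

    pars.add_argument('--debug', action='store_true')
    pars.add_argument('--format', default='syn.nodes')  # documented default
    pars.add_argument('--modules')
    pars.add_argument('--chunksize', type=int, default=1000)  # documented default
    pars.add_argument('--offset', type=int, default=0)  # assumed default
    pars.add_argument('files', nargs='*')
    return pars


opts = build_parser().parse_args(['--test', '--debug', '--format', 'syn.nodes', 'testnodes.jsonl'])
print(opts.format, opts.chunksize, opts.offset, opts.files)
# syn.nodes 1000 0 ['testnodes.jsonl']
```

Passing both --cortex and --test, or neither, makes argparse exit with a usage error. This mirrors the `(--cortex CORTEX | --test)` group in the usage string.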
Ingest Examples - Overview
The feed tool
Ingest Example 1
This example demonstrates loading a set of nodes via the feed tool with the “syn.nodes” format option. The nodes are of a variety of types and are encoded in JSON lines (jsonl) format.
JSONL File:
The jsonl file (testnodes.jsonl) contains a list of nodes in their packed form. Each line in the file corresponds to a single node, with all of the properties, tags, and nodedata on the node encoded in a JSON-friendly format.
[["it:reveng:function", "9710579930d831abd88acff1f2ecd04f"], {"iden": "508204ebc73709faa161ba8c111aec323f63a78a84495694f317feb067f41802", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625069466909, "description": "An example function"}, "tagprops": {}, "nodedata": {}, "path": {}}]
[["inet:ipv4", 386412289], {"iden": "d6270ca2dc592cd0e8edf8c73000f80b63df4bcd601c9a631d8c68666fdda5ae", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625069584577, "type": "unicast"}, "tagprops": {}, "nodedata": {}, "path": {}}]
[["inet:url", "https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide"], {"iden": "dba0a280fc1f8cf317dffa137df0e1761b6f94cacbf56523809d4f17d8263840", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625069758843, "proto": "https", "path": "/en/latest/synapse/userguide.html#userguide", "params": "", "fqdn": "synapse.docs.vertex.link", "port": 443, "base": "https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide"}, "tagprops": {}, "nodedata": {}, "path": {}}]
[["file:bytes", "sha256:ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c"], {"iden": "137fd16d2caab221e7580be63c149f83a11dd11f10f078d9f582fedef9b57ad5", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625070470041, "sha256": "ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c", "md5": "be1bb5ab2057d69fb6d0a9d0684168fe", "sha1": "57d13f1fa2322058dc80e5d6d768546b47238fcd", "size": 16}, "tagprops": {}, "nodedata": {}, "path": {}}]
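The following sketch makes the packed-node layout concrete. Using only the standard library, it parses JSONL lines and extracts each node's form, primary value, and tag names. Each line decodes to a two-element list: the ndef `[form, value]` and an info dictionary holding `iden`, `tags`, `props`, and so on. The helper names `iter_packed_nodes` and `summarize` are hypothetical and are not part of Synapse.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_packed_nodes(lines: Iterable[str]) -> Iterator[Tuple[list, Dict[str, Any]]]:
    # Hypothetical helper: each non-blank JSONL line holds one packed node [ndef, info].
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        ndef, info = json.loads(line)
        yield ndef, info


def summarize(lines: Iterable[str]) -> List[Tuple[str, Any, List[str]]]:
    # ndef is [form, primary value]; info["tags"] maps each tag name to a
    # [min, max] time interval ([null, null] in this file, i.e. unset).
    rows = []
    for (form, valu), info in iter_packed_nodes(lines):
        rows.append((form, valu, sorted(info.get('tags', {}))))
    return rows


# The inet:ipv4 line from testnodes.jsonl, split across string literals for width.
SAMPLE_LINE = (
    '[["inet:ipv4", 386412289], {"iden": "d6270ca2dc592cd0e8edf8c73000f80b'
    '63df4bcd601c9a631d8c68666fdda5ae", "tags": {"my": [null, null], '
    '"my.cool": [null, null], "my.cool.tag": [null, null]}, "props": '
    '{".created": 1625069584577, "type": "unicast"}, "tagprops": {}, '
    '"nodedata": {}, "path": {}}]'
)

print(summarize([SAMPLE_LINE]))
# [('inet:ipv4', 386412289, ['my', 'my.cool', 'my.cool.tag'])]
```

To inspect the real file, pass an open file handle, e.g. `summarize(open('testnodes.jsonl'))`. Iterating over a text file yields its lines.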
Verifying the Data:
Typically, users will want to double-check their data before loading it into a production Cortex. The feed tool allows us to ingest our nodes file into an empty, ephemeral Cortex, so that we can check which nodes get created before loading them into production. To load testnodes.jsonl into an ephemeral Cortex and drop into a prompt to explore the ingested nodes, run:
python -m synapse.tools.feed --test --debug --format syn.nodes testnodes.jsonl
Assuming the command completed with no errors, we should now have a cmdr prompt connected to our test Cortex:
cli>
From here, we can issue Storm commands to interact with and validate the nodes that were just ingested. For example:
cli> storm #my.cool.tag
it:reveng:function=9710579930d831abd88acff1f2ecd04f
.created = 2021/06/30 19:46:31.810
:description = An example function
#my.cool.tag
inet:ipv4=23.8.47.1
.created = 2021/06/30 19:46:31.810
:type = unicast
#my.cool.tag
inet:url=https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide
.created = 2021/06/30 19:46:31.810
:base = https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide
:fqdn = synapse.docs.vertex.link
:params =
:path = /en/latest/synapse/userguide.html#userguide
:port = 443
:proto = https
#my.cool.tag
file:bytes=sha256:ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c
.created = 2021/06/30 19:46:31.810
:md5 = be1bb5ab2057d69fb6d0a9d0684168fe
:sha1 = 57d13f1fa2322058dc80e5d6d768546b47238fcd
:sha256 = ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c
:size = 16
#my.cool.tag
complete. 4 nodes in 16 ms (250/sec).
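In testnodes.jsonl, the packed node stores the inet:ipv4 value as the integer 386412289, while Storm displays it as 23.8.47.1. Python's standard `ipaddress` module offers a quick way to confirm that both represent the same address. This is only a cross-check: it is not how Synapse itself normalizes the value, and the two helper names are illustrative.

```python
import ipaddress


def ipv4_to_text(valu: int) -> str:
    # Render the 32-bit integer stored in the packed node as dotted-quad text.
    return str(ipaddress.IPv4Address(valu))


def ipv4_to_int(text: str) -> int:
    # Convert dotted-quad text back to its 32-bit integer value.
    return int(ipaddress.IPv4Address(text))


print(ipv4_to_text(386412289))   # 23.8.47.1
print(ipv4_to_int('23.8.47.1'))  # 386412289
```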
Loading the Data:
Once we’ve inspected the data and verified that it is acceptable for loading, we can point the feed tool at the Cortex we want to load the nodes into, and the same nodes should be added:
python -m synapse.tools.feed --cortex tcp://cortex.vertex.link:4444/cortex00 --format 'syn.nodes' testnodes.jsonl
However, suppose that after inspecting the data, we decide that the it:reveng:function and inet:ipv4 nodes are not allowed in the production Cortex, but the inet:url and file:bytes nodes are. We can skip those first two nodes by combining the chunksize and offset parameters:
python -m synapse.tools.feed --cortex tcp://cortex.vertex.link:4444/cortex00 --format 'syn.nodes' testnodes.jsonl --chunksize 2 --offset 1
Here, the chunksize parameter tells the feed tool to read two lines at a time from the file and process them before reading the next two. The offset parameter tells it to skip the first chunk (lines 0 and 1) and only add nodes starting from line 2.
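The following sketch models which lines of testnodes.jsonl would reach the Cortex. It takes the option descriptions above at face value: CHUNKSIZE is the number of lines per chunk, and OFFSET is the number of whole chunks skipped from the start. It models the documented behavior only and is not the feed tool's implementation, so confirm the result with a --test run before relying on it. The function names are hypothetical, and the four form names stand in for the four lines of the file.

```python
from typing import Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    # Split the input into consecutive chunks of `size` lines (the last may be shorter).
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def lines_to_ingest(items: Sequence[str], chunksize: int, offset: int) -> List[str]:
    # Model of the documented semantics: skip the first `offset` chunks, keep the rest in order.
    kept: List[str] = []
    for idx, chunk in enumerate(chunks(items, chunksize)):
        if idx < offset:
            continue
        kept.extend(chunk)
    return kept


# One entry per line of testnodes.jsonl, in file order.
forms = ['it:reveng:function', 'inet:ipv4', 'inet:url', 'file:bytes']

print(lines_to_ingest(forms, chunksize=2, offset=1))  # ['inet:url', 'file:bytes']
print(lines_to_ingest(forms, chunksize=1, offset=2))  # ['inet:url', 'file:bytes']
print(lines_to_ingest(forms, chunksize=1, offset=1))  # ['inet:ipv4', 'inet:url', 'file:bytes']
```

Under this reading, skipping the first two lines requires chunksize × offset to equal 2. Either `--chunksize 2 --offset 1` or `--chunksize 1 --offset 2` satisfies this.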
Ingest Example 2
This example demonstrates loading a series of splices via the “syn.splice” format option. Splices are atomic edits made to the Cortex, so they are more granular, and thus more voluminous, than nodes. For instance, the Storm command [it:host=1cad54991eaff5bba5d2015c29c3e3a3 :desc="synapse server" :name="syn007"] results in the following set of splices (which have been saved to testsplices.yaml):
---
- - node:add
  - ndef:
    - it:host
    - 1cad54991eaff5bba5d2015c29c3e3a3
    time: 1625087167677
    user: 267d945a32e3ae246ecf71e0bc6a620e
- - prop:set
  - ndef:
    - it:host
    - 1cad54991eaff5bba5d2015c29c3e3a3
    oldv: null
    prop: .created
    time: 1625087167677
    user: 267d945a32e3ae246ecf71e0bc6a620e
    valu: 1625087167677
- - prop:set
  - ndef:
    - it:host
    - 1cad54991eaff5bba5d2015c29c3e3a3
    oldv: null
    prop: desc
    time: 1625087167679
    user: 267d945a32e3ae246ecf71e0bc6a620e
    valu: synapse server
- - prop:set
  - ndef:
    - it:host
    - 1cad54991eaff5bba5d2015c29c3e3a3
    oldv: null
    prop: name
    time: 1625087167680
    user: 267d945a32e3ae246ecf71e0bc6a620e
    valu: syn007
- - node:add
  - ndef:
    - it:hostname
    - syn007
    time: 1625087167680
    user: 267d945a32e3ae246ecf71e0bc6a620e
- - prop:set
  - ndef:
    - it:hostname
    - syn007
    oldv: null
    prop: .created
    time: 1625087167680
    user: 267d945a32e3ae246ecf71e0bc6a620e
    valu: 1625087167680
...
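The splice list above can be read as a replay log: each node:add creates a node, and each prop:set fills in one property on it. The toy model below replays the same six splices in order. It ends up with the two nodes shown in the verification step that follows: an it:host with .created, desc, and name, and an it:hostname with only .created. The model has these limits:

- The `replay` function is an illustrative assumption, not Synapse's splice-handling code.
- The splice dictionaries are trimmed down: time, user, and oldv are omitted.
- Only the two splice types that appear above are handled.

```python
from typing import Any, Dict, List, Tuple

# A splice is (name, info); info always carries the node's ndef (form, value).
Splice = Tuple[str, Dict[str, Any]]


def replay(splices: List[Splice]) -> Dict[tuple, Dict[str, Any]]:
    # Toy model: map each ndef to a dict of the properties set on it, in order.
    nodes: Dict[tuple, Dict[str, Any]] = {}
    for name, info in splices:
        ndef = tuple(info['ndef'])
        if name == 'node:add':
            nodes.setdefault(ndef, {})
        elif name == 'prop:set':
            nodes.setdefault(ndef, {})[info['prop']] = info['valu']
        else:
            raise ValueError(f'splice type not handled by this sketch: {name}')
    return nodes


HOST = ('it:host', '1cad54991eaff5bba5d2015c29c3e3a3')
HOSTNAME = ('it:hostname', 'syn007')

# The six splices from testsplices.yaml, trimmed to the fields the model uses.
splices: List[Splice] = [
    ('node:add', {'ndef': HOST}),
    ('prop:set', {'ndef': HOST, 'prop': '.created', 'valu': 1625087167677}),
    ('prop:set', {'ndef': HOST, 'prop': 'desc', 'valu': 'synapse server'}),
    ('prop:set', {'ndef': HOST, 'prop': 'name', 'valu': 'syn007'}),
    ('node:add', {'ndef': HOSTNAME}),
    ('prop:set', {'ndef': HOSTNAME, 'prop': '.created', 'valu': 1625087167680}),
]

for ndef, props in replay(splices).items():
    print(ndef, props)
```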
Verifying the Data:
To load testsplices.yaml into a test Cortex and watch the splices get applied, we can run the feed tool like so:
python -m synapse.tools.feed --test --debug --format "syn.splice" testsplices.yaml
This drops us into a cmdr prompt, where we can verify that the it:host and it:hostname nodes were created:
cli> storm it:host
it:host=1cad54991eaff5bba5d2015c29c3e3a3
.created = 2021/06/30 21:34:57.181
:desc = synapse server
:name = syn007
complete. 1 nodes in 5 ms (200/sec).
cli> storm it:hostname
it:hostname=syn007
.created = 2021/06/30 21:34:57.182
complete. 1 nodes in 5 ms (200/sec).
Loading the Data:
As before, once the data has been inspected and approved, we can point the feed tool at the Cortex we want to apply the splices to:
python -m synapse.tools.feed --cortex tcp://cortex.vertex.link:4444/cortex00 --format 'syn.splice' testsplices.yaml