Test Persistent Cache, a Node.js module

Status
Completed

What

A simple Node module to persistently store/cache arbitrary data.

Why

From https://www.notion.so/jymcheong/Measure-simple-enqueue-vs-AddEvent-26f6557156c74f74869bb5461d1e89eb, we can see that ingestion is much faster with simple enqueue. That meets the objective of getting events into the backend quickly first and processing them afterwards.
 
When it comes to processing, I started off with an "always-link-to-parent" mentality because I wanted to learn what was going on under the Windows hood. https://github.com/jymcheong/SysmonViz is the predecessor of OpenEDR.
 
At some juncture, I noticed that this was inefficient. The very first SMSS.exe always had ParentImage = "System". That means that for that SMSS.exe, the Sequence would always be "System > smss.exe", without having to link an edge from parent to child.
 
It also means that for the next child of the first SMSS.exe, we simply append the short image name to the parent's Sequence. This was in turn implemented with an ODB function. Then I started questioning myself: why bother linking to the parent if we can get and count Sequence frequency?
So this task is about shifting the Sequence derivation entirely to insertEvent.js.
This matters because the ODB functions can then be simplified further to link only when Sequence or CommandLine anomalies are found.
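The append-to-parent idea above can be sketched as follows. This is only an illustration, not the code in insertEvent.js: the function name deriveSequence, the two Map stores, and the assumption that events carry Sysmon-style fields (ProcessGuid, ParentProcessGuid, Image, ParentImage) are all hypothetical.

```javascript
const path = require('path');

// Hypothetical in-memory stores (assumptions for this sketch).
const sequenceByGuid = new Map(); // ProcessGuid -> Sequence string
const sequenceCounts = new Map(); // Sequence string -> number of times seen

function deriveSequence(event) {
  // Short image name, e.g. "C:\Windows\System32\smss.exe" -> "smss.exe".
  const shortName = path.win32.basename(event.Image);
  // Reuse the parent's stored Sequence when known; otherwise fall back to the
  // parent's short image name (for the very first smss.exe this is "System").
  const parentSequence =
    sequenceByGuid.get(event.ParentProcessGuid) ||
    path.win32.basename(event.ParentImage);
  const sequence = `${parentSequence} > ${shortName}`;
  sequenceByGuid.set(event.ProcessGuid, sequence);
  sequenceCounts.set(sequence, (sequenceCounts.get(sequence) || 0) + 1);
  return sequence;
}

// First smss.exe, whose ParentImage is "System".
console.log(deriveSequence({
  ProcessGuid: 'guid-1',
  ParentProcessGuid: 'guid-0',
  Image: 'C:\\Windows\\System32\\smss.exe',
  ParentImage: 'System',
})); // System > smss.exe

// A child of that smss.exe: its short image name is appended to the parent's Sequence.
console.log(deriveSequence({
  ProcessGuid: 'guid-2',
  ParentProcessGuid: 'guid-1',
  Image: 'C:\\Windows\\System32\\smss.exe',
  ParentImage: 'C:\\Windows\\System32\\smss.exe',
})); // System > smss.exe > smss.exe
```

No parent-to-child edge is needed here: the Sequence string and its frequency are available as soon as the event is seen, which is what would let the ODB functions link only when a Sequence looks anomalous.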

Objectives

  1. Ensure that the cached values really persist after we restart the script
  1. Define & measure performance (e.g. is it always reading off the disk, or only the first time, or...?)

YJ's Documentation

Installation of persistent-cache

Run the command:
npm install persistent-cache

Overview of persistent-cache

Persistent-cache insertion of data

There are two ways of inserting data: cache.put(key, data, cb) or cache.putSync(key, data).
Let's take a look into the putSync function which I will be using for Objective 1 to insert data into the cache.
Source code of putSync function:
notion image
By default, both variables persist and ram are set to true (can be changed to suit your needs).
  • ram: Whether the cache will use memory caching or not (mirrors all cache data in the ram, saving disk I/O and increasing performance).
  • persist: Whether the cache should be persistent, aka if it should write its data to the disk for later use or not.
Since both variables are true, it will create an entry on both the disk and in memoryCache.
FYI: The cached data are stored as JSON files in /openEDR/backend/cache/cache
notion image
notion image

Persistent-cache retrieval of data

Similarly, there are two ways of retrieving data from the cache: cache.get(key, cb) or cache.getSync(key).
Let's take a look into the getSync function which I will be using for Objective 1 to retrieve data from the cache.
Source code of getSync function:
notion image
  • First red box: Returns data if cache is using memory cache i.e. ram set to true and the key exists in memoryCache.
  • Second red box: Returns data if the JSON file exists on the disk. If the file does not exist, returns undefined.
  • Third red box: Returns undefined if the cache entry is no longer valid. By default, cache entries are not invalidated (stay until deleted).
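The write and lookup order described above can be illustrated with a small stand-in built only on Node's standard library. This is a simplified sketch, not the source of persistent-cache: createCache, the option names memory and persist, and the expires field are assumptions made for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Simplified stand-in for the behaviour described above (NOT the library's source).
function createCache(cacheDir, { memory = true, persist = true, duration = null } = {}) {
  const memoryCache = new Map();
  if (persist) fs.mkdirSync(cacheDir, { recursive: true });
  const fileFor = (key) => path.join(cacheDir, String(key) + '.json');

  function putSync(key, data) {
    // expires === null means the entry never becomes invalid.
    const entry = { expires: duration === null ? null : Date.now() + duration, data };
    if (persist) fs.writeFileSync(fileFor(key), JSON.stringify(entry)); // disk copy
    if (memory) memoryCache.set(String(key), entry);                    // memory mirror
  }

  function getSync(key) {
    // 1. Serve from memory when the key is mirrored there.
    if (memory && memoryCache.has(String(key))) return memoryCache.get(String(key)).data;
    // 2. Otherwise read the JSON file; a missing file means "not cached".
    let entry;
    try {
      entry = JSON.parse(fs.readFileSync(fileFor(key), 'utf8'));
    } catch (err) {
      return undefined;
    }
    // 3. Treat an expired entry as absent.
    if (entry.expires !== null && entry.expires < Date.now()) return undefined;
    // This sketch does not copy disk hits into memory.
    return entry.data;
  }

  return { putSync, getSync };
}

// Demo in a temporary directory (hypothetical location).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-sketch-'));
const demoCache = createCache(demoDir);
demoCache.putSync('babies', ['Ron', 'Emily']);
console.log(demoCache.getSync('babies'));               // served from memory
console.log(createCache(demoDir).getSync('babies'));    // fresh instance, served from disk
```

A fresh instance pointed at the same directory starts with an empty memory mirror, so its first lookup has to go to the disk; that is the case a restarted script would hit.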

Objective 1 - Ensure persistent-cache works

To test if the cached values really persist after restarting the script, these were the steps I took:
  1. Modify insertEvent.js to create a cache and insert sample data into it.
    1. // Create cache
       var cache = require('persistent-cache');
       var test = cache();
       // Insert sample data into test cache
       test.putSync('babies', ['Ron', 'Emily']);
       // Debugging purposes
       console.log('Inserted data into cache');
  1. Reload insertEvent.js for the changes to take effect.
    1. notion image
  1. Replace the insertion of data in insertEvent.js with retrieval of the data.
    1. // Create cache
       var cache = require('persistent-cache');
       var test = cache();
       // Insert sample data into test cache
       // test.putSync('babies', ['Ron', 'Emily']);
       // Retrieve the data of entry babies
       console.log(test.getSync('babies'));
  1. Reload insertEvent.js and check if data was returned.
    1. notion image
      As shown in the screenshot above, I was able to retrieve the data stored in the cache. Thus, this proves that persistent-cache works as the cached values persist even after restarting the script.

Objective 2 - Define & measure performance

For this objective, I will compare the performance of persistent-cache lookup vs performance of OrientDB index-lookup.

Overview of OrientDB index-lookup

My first attempt to create an index with 3 keys (Company, Hostname & ProcessGuid) failed. As shown in the screenshot below, the reason is that the field 'Company' is absent from the class definition of ProcessCreate. One solution would be to use 'Organisation' instead, which is a field of ProcessCreate.
notion image
I created an index with 2 keys instead, i.e. Hostname & ProcessGuid.
notion image
Next, I looked up the number of entries in the created index. To keep the comparison fair, I will need to populate the persistent-cache with the same number of entries, 29178.
notion image

Overview of persistent-cache lookup

We first need to populate the persistent-cache with 29178 entries to match the number of entries in ODB testIndex.
I created a simple for loop to create 29178 entries. Each entry has 2 keys to match the 2 keys in ODB testIndex.
for (var i = 0; i < 29178; i++) {
  // Use the loop counter in the key so that every entry is distinct
  test.putSync(['key' + i, 'keys' + i], ['data' + i]);
}
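Check that the cache contains 29178 entries using the command ls -l | wc -l (note that ls -l also prints a leading "total" line, so the reported line count is one higher than the number of files):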
notion image
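The same check can also be done from Node, which avoids the extra "total" line that ls -l prints. A minimal sketch, assuming each cache entry is stored as one .json file in the cache directory; countEntries is a hypothetical helper name.

```javascript
const fs = require('fs');

// Count cache entries by counting the .json files in a cache directory.
// Assumes one JSON file per entry (hypothetical helper for this sketch).
function countEntries(cacheDir) {
  return fs
    .readdirSync(cacheDir)
    .filter((name) => name.endsWith('.json'))
    .length;
}
```

For the setup described here, this would be called as countEntries('/openEDR/backend/cache/cache') and compared against 29178.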
Now that both the cache and ODB index have the same number of entries, we can perform a lookup on both to see which has the better performance.

Lookup timings

ODB index lookup timing:
Attempt 1: Query executed in 0.003 seconds.
notion image
Attempt 2: Query executed in 0.001 seconds.
notion image
Attempt 3: Query executed in 0.002 seconds.
notion image
On average, it takes 0.002 seconds to execute an ODB index lookup on 29178 entries.
 
Persistent-cache lookup timing:
To measure the time taken for the lookup, I made use of console.time:
console.time('Retrieve');
// Cache lookup
console.log(test.getSync(['key2255', 'keys2255']));
console.timeEnd('Retrieve');
*Note about console.time:
The string being passed to the time() and timeEnd() methods must match (for the timer to finish as expected).
Attempt 1: Lookup executed in 6.919ms = 0.006919 seconds
notion image
Attempt 2: Lookup executed in 3.727ms = 0.003727 seconds
notion image
Attempt 3: Lookup executed in 4.663ms = 0.004663 seconds
notion image
On average, it takes around 0.005 seconds to execute a persistent-cache lookup on 29178 entries.
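Each timing above covers a single lookup per run and also includes the console.log call inside the timed region. To separate the first lookup from later ones (the "always reading off the disk or only the first time" question in the Objectives), one option is to time several calls individually with process.hrtime.bigint(). This is a sketch under assumptions: timeLookups is a hypothetical helper, and a plain Map stands in for the cache so that the block runs on its own.

```javascript
// Time `runs` calls of `lookup` one by one and return the durations in milliseconds.
function timeLookups(lookup, runs) {
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    lookup();
    const end = process.hrtime.bigint();
    durations.push(Number(end - start) / 1e6); // nanoseconds -> milliseconds
  }
  return durations;
}

// Stand-in lookup so the sketch is self-contained. To time the real cache,
// pass () => test.getSync(['key2255', 'keys2255']) instead.
const standIn = new Map([['key2255,keys2255', ['data2255']]]);
const durations = timeLookups(() => standIn.get('key2255,keys2255'), 5);
console.log('first call (ms):', durations[0]);
console.log('later calls (ms):', durations.slice(1));
```

Keeping the logging outside the timed region means the numbers reflect only the lookup itself.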

Conclusion

ODB index lookup: 0.002 seconds
Persistent-cache lookup: 0.005 seconds
Difference: 0.003 seconds
📢
A persistent-cache lookup is slightly slower than an ODB index lookup (about 0.003 seconds on average across these three attempts).
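The averages can be reproduced from the six timings recorded under Lookup timings. The snippet below is only a worked check of that arithmetic; the variable names are chosen for illustration.

```javascript
const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

const odbSeconds = [0.003, 0.001, 0.002]; // ODB index lookup attempts, in seconds
const cacheMs = [6.919, 3.727, 4.663];    // persistent-cache attempts, in milliseconds

const odbMean = mean(odbSeconds);
const cacheMean = mean(cacheMs) / 1000;   // milliseconds -> seconds

console.log(odbMean.toFixed(3));               // 0.002
console.log(cacheMean.toFixed(3));             // 0.005
console.log((cacheMean - odbMean).toFixed(3)); // 0.003
```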

Jym's Comments

YJ's work changed my initial assumption that an external cache should outperform ODB indices. His results show the ODB index doing better. There are also other concerns with external caching, such as library security and memory and disk usage, which made me reconsider my next course of action.
I will be deriving Sequence from 4688 audit events, which are more reliable than Sysmon ProcessCreate events. This is related to what we learnt from his earlier track, where there were missing SpoofedParentProcess edges.
With 4688 events, we have an alternative way to link a spoofed child process to its real parent process in the event that Sysmon somehow does not record it. We also solve the inability to derive Sequence if early Sysmon ProcessCreate events somehow go missing.
notion image
The parsing of 4688 events also paves the way for detecting host sensor tampering:
notion image
Assuming an unfavourable event of privilege escalation in which Sysmon is disabled, we will still have 4688 events streaming back. The attacker will need to access Windows Events either via Audit sub-system calls or by reading ETL files directly. My next step is to monitor access to ETL files.