Creating custom indexes to data with Swift Sets


Introduction & Background

I have been working on a flow visualization tool for a while now. It is part of a project where a datacenter has evolved over a long time and has grown into a complex web of connections, applications and “solutions” to make things work. The goal of the project is to visualize certain environments and how they communicate with other services. Of course we looked at Tetration as a Service and other visualization tools. The business case wasn’t strong enough for a commercial solution, so we decided to go with ElastiFlow as a basis.

Personal note: a blog post has been way overdue. Even though Tom (@networkingnerd) mentioned that many bloggers have found it hard to find the time and drive to write during COVID-19, WFH and everything that comes with it, I have been frustrated with myself for not finding the drive to write for far too long.

ElastiFlow works well for forensics, but for the original purpose (visualization, research & analysis) it was lacking a few bits. It was thus decided that a custom-made application that takes the samples from ElastiFlow and works from there was the most viable road. And so I started writing a Mac application, using Core Data and GraphViz, that fetches samples, processes them (dedup) and visualizes the flows.

Fast forward to just before summer break: the flows in the database had grown to millions of records, which led to huge delays (300s) when visualizing the flows for a larger dataset of server IPs. The reason for that is twofold: database queries are always more expensive than in-memory searches, and the way I organized the data in memory was inefficient. I solved the latter by optimizing the code structure and algorithm, which reduced the visualization time from 300s to 25s per visualization iteration. But the first cause of the delays (round trips to the database) isn’t solved by that optimization, and it bit me when ingesting new samples (as that also checks the database for duplicate samples). The solution to that is to use an in-memory database. This is quite common: Cisco ISE has an in-memory database (I think Cisco Prime Infrastructure does too), as do big data platforms like SAP HANA. After some quick research into in-memory Swift-based databases, I decided to build my own Swift-based memory manager for just the flows.

Sets in Swift, Hashable, and my idea

Swift has a number of ways to create collections of data elements (classes or structures). The most common approach is an array, which is an ordered collection. It is very easy to use an array, as the example below shows.

Swift arrays are not fixed in size, and it’s easy to add elements to or remove them from an array, which makes them ideal for data manipulation. But the search is linear: on average it takes n/2 iterations to find the record you’re looking for, and in the worst case n iterations (where n is the size of the array).
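A minimal sketch of working with an array and searching it linearly. The Flow struct, the addresses and the ports are made up for this illustration and are not the types used in the application:

```swift
// Hypothetical flow record, used only for this illustration.
struct Flow {
    let srcIP: String
    let dstIP: String
    let dstPort: Int
}

// Arrays grow on demand; append adds an element at the end.
var flows: [Flow] = []
flows.append(Flow(srcIP: "10.0.0.1", dstIP: "10.0.0.10", dstPort: 443))
flows.append(Flow(srcIP: "10.0.0.2", dstIP: "10.0.0.10", dstPort: 443))
flows.append(Flow(srcIP: "10.0.0.3", dstIP: "10.0.0.20", dstPort: 53))

// first(where:) walks the array from the start until the closure returns true.
// The counter shows how many elements had to be inspected.
var comparisons = 0
let match = flows.first(where: { flow in
    comparisons += 1
    return flow.srcIP == "10.0.0.3"
})

print(match?.dstPort ?? -1)  // prints 53
print(comparisons)           // prints 3, the match was the last element
```

With three records that is harmless; with millions of flows every lookup of this kind walks a large part of the array.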

So to optimize the search speed, we need to reduce the number of iterations needed to find a specific flow. This is where the Set type comes into play. A Swift Set stores unique data elements. The uniqueness is determined by the hash of a unique identifier (the key). Hashing is a mathematical method that creates a consistent, fixed-size value for a piece of data. It’s a common principle not only in cryptography (integrity validation), but also in database search optimizations.

Instead of comparing all attributes in a record (let’s say a flow), the Set type uses the hash of the unique identifier to check whether the element is inside the collection:
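A small sketch of that behavior. ServerKey is a hypothetical type invented for this example; because all its stored properties are Hashable, Swift generates the Hashable conformance for the struct:

```swift
// Hypothetical key type, used only for this illustration.
struct ServerKey: Hashable {
    let dstIP: String
    let dstPort: Int
}

var servers = Set<ServerKey>()
servers.insert(ServerKey(dstIP: "10.0.0.10", dstPort: 443))
servers.insert(ServerKey(dstIP: "10.0.0.20", dstPort: 53))

// Inserting an element that is already present leaves the Set unchanged.
let result = servers.insert(ServerKey(dstIP: "10.0.0.10", dstPort: 443))
print(result.inserted)  // prints false
print(servers.count)    // prints 2

// contains looks the element up by its hash instead of walking all elements.
let known = servers.contains(ServerKey(dstIP: "10.0.0.20", dstPort: 53))
print(known)            // prints true
```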

Using the Swift Set type is what allowed me to bring the visualization down from 300 seconds to 25 seconds. So why not use Sets for the in-memory database of flows as well? In my application I use four categories of queries / searches on those flows:

  1. Full flows: Search for all attributes of a flow (protocol, source IP and Port, destination IP and Port) 
  2. Similar flows: Search for all flows for the same protocol, same source IP and to a specific destination IP and Port
  3. IP conversations: Search for flows only based on source and destination IP
  4. Server applications: Search for all flows matching a specific destination IP and Port

Effectively these four types of searches are similar to four indices in a database, and I should be able to implement them inside Swift. This is where the power of class inheritance and protocols comes into play in the Swift language. The Set type requires that the elements inside the collection conform to the Hashable protocol (Hashable inherits from Equatable). It means that every element you want to store in a Set needs to have at least the following functions:

func hash(into hasher: inout Hasher)
static func == (lhs: Self, rhs: Self) -> Bool

All common types in Swift (String, Int, Date) have a default implementation of these methods, so you can compare values ( == is actually a function ) and use them in a Set. The default hash and == implementations that Swift generates for your own structures take all attributes and use them for comparison.

But it is possible to override that behavior by implementing the Hashable protocol yourself (and thus specifying which properties are used for uniqueness).
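As an illustration, a sketch of a struct with its own Hashable implementation. The Conversation type and its values are hypothetical; only srcIP and dstIP take part in hashing and equality, the source port is carried along but ignored:

```swift
// Hypothetical type, used only for this illustration.
struct Conversation: Hashable {
    let srcIP: String
    let dstIP: String
    let srcPort: Int   // deliberately not part of the identity

    // Two conversations are equal when both IP addresses match.
    static func == (lhs: Conversation, rhs: Conversation) -> Bool {
        return lhs.srcIP == rhs.srcIP && lhs.dstIP == rhs.dstIP
    }

    // Hash exactly the properties that == compares.
    func hash(into hasher: inout Hasher) {
        hasher.combine(srcIP)
        hasher.combine(dstIP)
    }
}

let a = Conversation(srcIP: "10.0.0.1", dstIP: "10.0.0.10", srcPort: 40001)
let b = Conversation(srcIP: "10.0.0.1", dstIP: "10.0.0.10", srcPort: 40002)

print(a == b)             // prints true
print(Set([a, b]).count)  // prints 1
```

The important design rule is that hash(into:) and == use the same properties: elements that compare equal must also produce the same hash.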

The problem

So I wrote the following code to create the indices:

// Base class for storing network flows in memory
class FMMFlowEntry : Hashable, CustomStringConvertible {
    var proto : String = ""
    var srcIP : String = ""
    var srcPort : Int = 0
    var dstIP : String = ""
    var dstPort : Int = 0
    var firstSeen : Date?
    var lastSeen : Date?
    
    static func == (lhs: FMMFlowEntry, rhs: FMMFlowEntry) -> Bool {
        let response = lhs.dstIP == rhs.dstIP && lhs.proto == rhs.proto && lhs.dstPort == rhs.dstPort && lhs.srcIP == rhs.srcIP && lhs.srcPort == rhs.srcPort
        return response
    }
        
    public func hash(into hasher: inout Hasher) {
        hasher.combine(proto)
        hasher.combine(srcIP)
        hasher.combine(srcPort )
        hasher.combine(dstIP)
        hasher.combine(dstPort)
    }
    
    var description : String {
        let response : String = "\(proto) \(srcIP):\(srcPort) -> \(dstIP):\(dstPort)"
        return response
    }
    
    init(proto: String, srcIP: String, srcPort: Int, dstIP : String, dstPort: Int) {
        self.proto = proto
        self.srcIP = srcIP
        self.srcPort = srcPort
        self.dstIP = dstIP
        self.dstPort = dstPort
    }
    
    
    init() {
        
    }
}

// Class that represents a flow record based on source IP address and destination IP address
class FMMIPFlow  : FMMFlowEntry  {
    var refCount : Int = 1
    init(from: FMMFlowEntry) {
        super.init(proto : from.proto, srcIP : from.srcIP, srcPort : from.srcPort, dstIP : from.dstIP, dstPort : from.dstPort)

    }

    override public func hash(into hasher: inout Hasher) {
        hasher.combine(srcIP)
        hasher.combine(dstIP)
    }
    
    static func == (lhs: FMMIPFlow, rhs: FMMIPFlow) -> Bool {
        let response = lhs.dstIP == rhs.dstIP && lhs.srcIP == rhs.srcIP
        return response
    }
    
    override var description : String {
        let response : String = "\(srcIP) -> \(dstIP) [\(refCount)]"
        return response
    }
}

In the above code there is the base class (FMMFlowEntry) that has all necessary attributes. The hash function specifies which properties of the flow are used to determine whether it is unique. The == function uses those same attributes to check for equality.

The class FMMIPFlow inherits every method and property from FMMFlowEntry. It then overrides the hash function and defines its own == so that only the srcIP and dstIP attributes are used for hashing and comparison.

This should mean that if I compile and run the code below, the variable uniqueIPFlows will only have one entry (as the srcIP and dstIP are the same for the two added flows). When I run it in Playground, this is the result.
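A minimal sketch of such a test, with trimmed-down copies of the two classes (dates and description left out). The addresses and ports are made up; flow1 and flow4 differ only in their source port:

```swift
// Trimmed-down copy of the base class.
class FMMFlowEntry: Hashable {
    let proto: String
    let srcIP: String
    let srcPort: Int
    let dstIP: String
    let dstPort: Int

    init(proto: String, srcIP: String, srcPort: Int, dstIP: String, dstPort: Int) {
        self.proto = proto
        self.srcIP = srcIP
        self.srcPort = srcPort
        self.dstIP = dstIP
        self.dstPort = dstPort
    }

    static func == (lhs: FMMFlowEntry, rhs: FMMFlowEntry) -> Bool {
        return lhs.proto == rhs.proto && lhs.srcIP == rhs.srcIP && lhs.srcPort == rhs.srcPort
            && lhs.dstIP == rhs.dstIP && lhs.dstPort == rhs.dstPort
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(proto)
        hasher.combine(srcIP)
        hasher.combine(srcPort)
        hasher.combine(dstIP)
        hasher.combine(dstPort)
    }
}

// Trimmed-down copy of the subclass that should index on the IP pair only.
class FMMIPFlow: FMMFlowEntry {
    override func hash(into hasher: inout Hasher) {
        hasher.combine(srcIP)
        hasher.combine(dstIP)
    }

    static func == (lhs: FMMIPFlow, rhs: FMMIPFlow) -> Bool {
        return lhs.srcIP == rhs.srcIP && lhs.dstIP == rhs.dstIP
    }
}

// Same source and destination IP, different source port (illustrative values).
let flow1 = FMMIPFlow(proto: "tcp", srcIP: "10.0.0.1", srcPort: 40001, dstIP: "10.0.0.10", dstPort: 443)
let flow4 = FMMIPFlow(proto: "tcp", srcIP: "10.0.0.1", srcPort: 40004, dstIP: "10.0.0.10", dstPort: 443)

var uniqueIPFlows = Set<FMMIPFlow>()
uniqueIPFlows.insert(flow1)
uniqueIPFlows.insert(flow4)

print(flow1.hashValue == flow4.hashValue)  // prints true
print(uniqueIPFlows.count)                 // prints 2
```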

That means that both flow1 and flow4 are added to the Set. I was not expecting that: I created a custom hash function, so it should not add the second flow, right? For debugging I ran it again, printing the hash value too:

So even though they have the same hash value (which changes every time I run the code), both flows are still added to the index. That is not what I intended, but why does it happen?

Cause of the problem

Finding the answer took me quite a while, so I would like to share it here. The problem lies in the comparison function. It is defined as a static function (to conform to the protocol). The static keyword also means final, so the function cannot be overridden. So even if I add a new static == function to the subclass, the Set will never call it: it does not override the == of the base class, which is the one registered for the Equatable conformance. We can validate that by adding a print to the base class implementation:

static func == (lhs: FMMFlowEntry, rhs: FMMFlowEntry) -> Bool {
    print("FMMFlowEntry == func called")
    let response = lhs.dstIP == rhs.dstIP && lhs.proto == rhs.proto &&
         lhs.dstPort == rhs.dstPort && lhs.srcIP == rhs.srcIP &&
         lhs.srcPort == rhs.srcPort
    return response
}

And rerun the code

So the hash is used for searching, but to determine if an element needs to be added to the Set, there is an equality check. And because the two flows are different (on source port), both are added. 

The solution

Now that we know the cause, we can create a solution that works around this limitation. Instead of performing the comparison inside the static function, I can create an instance function that is called on the lhs parameter and validates against the rhs, so that I am able to override that function in a subclass. The code in FMMFlowEntry is changed to:

static func == (lhs: FMMFlowEntry, rhs: FMMFlowEntry) -> Bool {
    return lhs.equals(to: rhs)
}
    
func equals(to rhs: FMMFlowEntry) -> Bool {
    let response = dstIP == rhs.dstIP && proto == rhs.proto && 
        dstPort == rhs.dstPort && srcIP == rhs.srcIP && 
        srcPort == rhs.srcPort
    return response
}

and the code inside the FMMIPFlow Class is changed to:

override func equals(to rhs: FMMFlowEntry) -> Bool {
    guard let rhs = rhs as? FMMIPFlow else { return false }
    let response = self.dstIP == rhs.dstIP && self.srcIP == rhs.srcIP
    return response
}

Now when we run the following code, the static == in the base class should call the overridden equals function of the instance (FMMIPFlow in this case), which only checks the srcIP and dstIP properties. That should lead to only one entry in the Set:
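A minimal sketch of that check, again with trimmed-down copies of the classes and made-up addresses and ports:

```swift
// Trimmed-down copy of the base class with the forwarding ==.
class FMMFlowEntry: Hashable {
    let proto: String
    let srcIP: String
    let srcPort: Int
    let dstIP: String
    let dstPort: Int

    init(proto: String, srcIP: String, srcPort: Int, dstIP: String, dstPort: Int) {
        self.proto = proto
        self.srcIP = srcIP
        self.srcPort = srcPort
        self.dstIP = dstIP
        self.dstPort = dstPort
    }

    // The static == only forwards to an instance method, which can be overridden.
    static func == (lhs: FMMFlowEntry, rhs: FMMFlowEntry) -> Bool {
        return lhs.equals(to: rhs)
    }

    func equals(to rhs: FMMFlowEntry) -> Bool {
        return proto == rhs.proto && srcIP == rhs.srcIP && srcPort == rhs.srcPort
            && dstIP == rhs.dstIP && dstPort == rhs.dstPort
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(proto)
        hasher.combine(srcIP)
        hasher.combine(srcPort)
        hasher.combine(dstIP)
        hasher.combine(dstPort)
    }
}

// Index on the IP pair: hash and equality both use srcIP and dstIP only.
class FMMIPFlow: FMMFlowEntry {
    override func equals(to rhs: FMMFlowEntry) -> Bool {
        guard let rhs = rhs as? FMMIPFlow else { return false }
        return srcIP == rhs.srcIP && dstIP == rhs.dstIP
    }

    override func hash(into hasher: inout Hasher) {
        hasher.combine(srcIP)
        hasher.combine(dstIP)
    }
}

// Same source and destination IP, different source port (illustrative values).
let flow1 = FMMIPFlow(proto: "tcp", srcIP: "10.0.0.1", srcPort: 40001, dstIP: "10.0.0.10", dstPort: 443)
let flow4 = FMMIPFlow(proto: "tcp", srcIP: "10.0.0.1", srcPort: 40004, dstIP: "10.0.0.10", dstPort: 443)

var uniqueIPFlows = Set<FMMIPFlow>()
uniqueIPFlows.insert(flow1)
let second = uniqueIPFlows.insert(flow4)

print(second.inserted)      // prints false
print(uniqueIPFlows.count)  // prints 1
```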

Success! Now I can create indexes / search types very quickly by just subclassing the base record and overriding the hash and equals functions for that specific index. That allows me to code the indexes I need with only a few lines of code, while still being able to work with the Set type and keep the search queries fast. One step closer to my in-memory database that is capable of storing millions of flows.

 

The key lesson here is that the keyword static in Swift means that a method is final and cannot be overridden in a subclass. The way around it is to have the static method call an instance method in the base class and to override that instance method in the subclass.

The Swift Playground used in this post is available on GitHub.

Swift, JSON Encoding/Decoding and subclasses


Over the past weeks I have been preparing for two CiscoLive Barcelona breakout sessions. In one of them I will give a brief demo, and in the other I will be covering parts of the Cisco Press book that I wrote. The preparation is not only about the slides, but also about developing the code that is used in the demos. These demos are built on iOS devices and run on some containers, so I have been writing that software in Swift, which is a beautiful and powerful programming language. One of my previous posts covers some principles of Swift. One really powerful feature is how easily data can be encoded to or decoded from the JSON format.

If you want a class to be convertible to and from JSON, just let it conform to the Codable protocol and you’re ready; see the code example below:

/*
 * Enumeration of supported message types. Extend this for new messages
 */
enum MessageType : Int, Codable {
    case unknown = 0            // default, unknown
    case acknowledgement = 255  // acknowledgement to message, if required
    case hello = 1              // hello, for keep alive, always followed by ack
    case sendMessage = 2        // send a unicast message to another client
    case broadcast = 3          // send a message to all connected clients
}

/*
 * Generic parent class
 * Every message has the following attributes
 * Version: To define which version we are talking about
 * Command of the message
 * client-id that sends the message
 * unique request id, used for acknowledging, etc..
 */
class Message : Codable, prettyPrint  {
    var version : String = "1.0"
    var msgType : MessageType = .unknown
    var data: String = "" // message payload, can contain any String
    var requestId: String = UUID.init().uuidString  // unique request id for this message, used in the ack
    
    // Default constructor
    // Not used cause calling super.init can override msgType value
    init() {
        // empty on purpose
    }
}

This code example defines a class Message with variables for version, msgType (of type MessageType), requestId, which is a unique UUID string value, and a data variable which can contain any String. So let’s say I create a new message of type hello with the data “Hello There!” using the following code sample:

let msg = Message()
msg.msgType = .hello
msg.data = "Hello There!"

To convert this to JSON, this would only require a few lines of code:

let encoder = JSONEncoder()
let jsonData = try encoder.encode(msg)

The variable jsonData (of type Data) now contains a JSON version of the earlier created message. Just to check the output, I can use the following command to convert that data to a String and output it in Xcode’s Playground.

let jsonDataAsString = String(data: jsonData, encoding: .utf8)

Suppose you would like to extend our message class with a special broadcast message, where the message can be sent to all endpoints. You could add an optional broadcastContent variable to the message class and create a state machine to determine when to use that value. An alternative is to leverage the power of object-oriented programming and create a new subtype, like the following code example:

/*
 * BroadcastMessage is used to broadcast a message to all connected clients
 */
class BroadcastMessage : Message {
    // response message
    var msgContent : String = ""   // Message to broadcast   
}

So when you’d create a broadcast message, like below, you’d expect that its JSON output would contain all attributes, right? Let’s check it out in Playground:
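A reduced sketch of that check. It assumes a simplified Message class (the prettyPrint protocol and the MessageType enum are left out, msgType is a plain Int here) and a hypothetical jsonString helper:

```swift
import Foundation

// Simplified stand-in for the Message class; Codable is synthesized by the compiler.
class Message: Codable {
    var version: String = "1.0"
    var msgType: Int = 0
    var data: String = ""
    var requestId: String = UUID().uuidString

    init() {}
}

// Subclass that adds one property and relies on the inherited Codable conformance.
class BroadcastMessage: Message {
    var msgContent: String = ""
}

// Hypothetical helper: encode a value to JSON and return it as a String.
func jsonString<T: Encodable>(_ value: T) -> String? {
    guard let jsonData = try? JSONEncoder().encode(value) else { return nil }
    return String(data: jsonData, encoding: .utf8)
}

let msg = BroadcastMessage()
msg.msgType = 3
msg.msgContent = "Hello everyone!"

let json = jsonString(msg) ?? ""
print(json)
print(json.contains("msgContent"))  // prints false
print(json.contains("requestId"))   // prints true
```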

As you can see, the output does not contain all attributes of the broadcast message! It only contains the values of the base message class. The msgContent variable is not included. It took me some time debugging and researching to figure out what happens. Swift bugs SR-5431 and SR-4722 provide more details. Without going into those bugs, it comes down to the fact that as soon as you subclass a class that conforms to Codable, you need to override the default encode/decode methods and write your own. After some fiddling around, I have used the following code pattern to achieve that result.

/* Generic parent class
 * Every message has the following attributes
 * Version: To define which version we are talking about
 * Command of the message
 * client-id that sends the message
 * unique request id, used for acknowledging, etc..
 */
class Message : Codable, prettyPrint  {
    var version : String = "1.0"
    var msgType : MessageType = .unknown
    var data: String = "" // message payload, can contain any String
    var requestId: String = UUID.init().uuidString  // unique request id for this message, used in the ack
    
    private enum CodingKeys: CodingKey {
        case version, msgType, data, requestId
    }
    
    
    // Default constructor
    // Not used cause calling super.init can override msgType value
    init() {
        // empty on purpose
    }
    
    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        version = try container.decode(String.self, forKey: .version)
        msgType = try container.decode(MessageType.self, forKey: .msgType)
        data = try container.decode(String.self, forKey: .data)
        requestId = try container.decode(String.self, forKey: .requestId)
    }
    
    public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(version, forKey: .version)
        try container.encode(msgType, forKey: .msgType)
        try container.encode(data, forKey: .data)
        try container.encode(requestId, forKey: .requestId)
    }
}

/*
 * BroadcastMessage is used to broadcast a message to all connected clients
 */
class BroadcastMessage : Message {
    // response message
    var msgContent : String = ""   // Message to broadcast
    
    // coding keys enumeration used for JSON encoding/decoding
    private  enum CodingKeys: CodingKey {
        case msgContent
    }
    
    // set class variables
    private func initClassVars() {
        self.msgType = .broadcast
        msgContent = ""
    }
    
    // default constructor. Call the parent and set variables
    override init() {
        super.init()
        initClassVars()
    }
    
    // Constructor used to instantiate a class from JSON Data
    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        msgContent = try container.decode(String.self, forKey: .msgContent)
        try super.init(from: decoder)
    }
    
    // Method used to encode class to JSON
    override public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(msgContent, forKey: .msgContent)
        try super.encode(to: encoder)
    }
}

As you can see, when BroadcastMessage is converted to JSON, it is now correctly encoded.

I am now using the coding pattern below to achieve this functionality:

  • Create a private enum called CodingKeys that conforms to CodingKey
  • Enter all class variables as part of the enumeration
  • Create custom encoders and decoders for the base class
  • In the subclass, define a new private enum called CodingKeys. I have marked both private so the compiler knows which enumeration to use in which function
  • Create the custom encoders and decoders
  • Encode / decode the variables of the child class and then
  • Call the encoder / decoder of the parent class
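The pattern can be condensed into the following round-trip sketch. It uses reduced versions of the two classes (msgType is a plain Int, the prettyPrint protocol is left out), so it illustrates the steps rather than reproducing the full listing:

```swift
import Foundation

class Message: Codable {
    var msgType: Int = 0
    var data: String = ""

    // Coding keys for the base class properties.
    private enum CodingKeys: CodingKey {
        case msgType, data
    }

    init() {}

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        msgType = try container.decode(Int.self, forKey: .msgType)
        data = try container.decode(String.self, forKey: .data)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(msgType, forKey: .msgType)
        try container.encode(data, forKey: .data)
    }
}

class BroadcastMessage: Message {
    var msgContent: String = ""

    // Separate private coding keys for the subclass properties only.
    private enum CodingKeys: CodingKey {
        case msgContent
    }

    override init() {
        super.init()
    }

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        msgContent = try container.decode(String.self, forKey: .msgContent)
        try super.init(from: decoder)   // then let the parent decode its part
    }

    override func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(msgContent, forKey: .msgContent)
        try super.encode(to: encoder)   // then let the parent encode its part
    }
}

let original = BroadcastMessage()
original.data = "payload"
original.msgContent = "Hello everyone!"

// Encode to JSON and decode it again into a new instance.
let encoded = try? JSONEncoder().encode(original)
let decoded = encoded.flatMap { try? JSONDecoder().decode(BroadcastMessage.self, from: $0) }

print(decoded?.msgContent ?? "decoding failed")  // prints Hello everyone!
print(decoded?.data ?? "decoding failed")        // prints payload
```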


Swift & Network programmability, a good combo? An introduction.

Swift is commonly known by iOS and Mac OS X software developers, as Apple introduced the language in 2014 for Mac OS X and iOS application development; Linux support followed when the language was open-sourced in 2015.
In my role as software engineer I’ve used different programming languages to build small tools, solutions or prototypes. For network programmability I’ve used Java as my primary language. I have my reasons, which I might share later in another post.

Network Programmability on the web pretty much revolves around Python. Is Swift mature enough and powerful enough to be used for programming the network? Time to write up my experiences in a blog series. The first post is an introduction to Swift.