Using VRNI to Analyze Applications

As I've been doing more NSX Distributed Firewall work, one of my customers came to me with an interesting challenge.  They had a fairly complicated application that they wanted to move into a microsegmented security model, but all they had was a diagram from when the application was initially deployed.  They were confident that it didn't show everything the application did, or even all of the VMs that were part of the application.  It was an interesting problem, and since they had vRealize Network Insight (VRNI) deployed, the answer was fairly straightforward!

VRNI captures network traffic information from just about any device that can forward NetFlow data.  It uses that data to figure out which devices are communicating with each other, and to highlight all sorts of network issues like dropped packets or asymmetric routes.  In this case, I just used it as a giant repository of glorious 5-tuple data!

I started by registering the application in VRNI.  I went to Security and then Applications and added the application, then put all of the known servers into it in tiers as per the outdated diagram that I was given.

Next, I ran a query over the past week's worth of data, flows where application = 'my application', to get VRNI to show me all of the network communications that involved any of the VMs in this application.  This pulled up some 4,000 network communications (note that these aren't packets, just flows like VMA -> VMB:443 and VMA -> VMB:80, so there isn't a bunch of duplication).  I then exported this data to CSV and went to work.
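If you'd rather script the rest of this than do it in Excel, here is a minimal sketch of loading the exported CSV in Python.  The column headers ("Source VM", "Destination VM", "Port") are my assumption for illustration; check them against the headers in your own export.

```python
import csv
import io


def load_flows(csv_file):
    """Read a flow export into a list of dicts keyed by column header."""
    return list(csv.DictReader(csv_file))


# Stand-in for the exported file; the column names are assumed, not VRNI's guaranteed headers.
sample_export = io.StringIO(
    "Source VM,Destination VM,Port\n"
    "web01,app01,8443\n"
    ",web01,443\n"
)
flows = load_flows(sample_export)
print(len(flows), "flows loaded")
```

With a real export you would pass an open file instead, for example `with open("flows.csv", newline="") as f: flows = load_flows(f)`, where flows.csv is a hypothetical file name.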

My first goal was to identify any additional systems that weren't in my original set.  So I opened up a new tab in Excel and put my list of known servers in a column, then added a new column to my VRNI export.  I used that new column to exclude all network flows that were between two systems that I already knew were part of the application by using a formula like this: =countif(<known server range>,source vm)+countif(<known server range>,destination vm).  That formula gave me a 2 if both the Source VM and the Destination VM were on my "known servers" list, so I filtered out all of the 2s.
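The same countif trick can be sketched in Python.  This is just an illustration of the idea, with made-up server names and assumed column names, not anything VRNI provides.

```python
def known_endpoint_count(flow, known_servers):
    """Return 0, 1, or 2: how many of the flow's two endpoints are known servers."""
    return sum(flow.get(col, "") in known_servers
               for col in ("Source VM", "Destination VM"))


# Hypothetical data for illustration only.
known_servers = {"web01", "app01", "db01"}
flows = [
    {"Source VM": "web01", "Destination VM": "app01", "Port": "8443"},
    {"Source VM": "app01", "Destination VM": "batch07", "Port": "1521"},
    {"Source VM": "", "Destination VM": "web01", "Port": "443"},
]

# Keep everything that is not purely between two known servers (the "2s").
flows_of_interest = [f for f in flows if known_endpoint_count(f, known_servers) < 2]
```

A flow scoring 2 is entirely inside the set I already know about, so it can't reveal a new server.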

Next, I filtered out all network flows where the Source VM or Destination VM was blank (those were all flows to or from external, non-VM devices, like desktops or the internet), which left me with a few hundred flows of interest (that is to say, network flows that involved one of my application VMs and some other VM).  I was able to further reduce my list of interesting flows by filtering out common infrastructure traffic, like NTP, DNS, LDAP, etc.
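As a rough sketch of those two filters in Python: the port set below holds only the standard ports for DNS (53), NTP (123), and LDAP (389), and is my assumption about what counts as infrastructure traffic.  Extend it for whatever your environment uses.  Column names are assumed as well.

```python
# Standard ports for DNS, NTP, and LDAP; an assumed starting list, not exhaustive.
INFRA_PORTS = {53, 123, 389}


def is_vm_to_vm(flow):
    """True when both the source and the destination are named VMs."""
    return bool(flow["Source VM"].strip()) and bool(flow["Destination VM"].strip())


def is_infrastructure(flow):
    """True when the destination port is one of the common infrastructure ports."""
    return int(flow["Port"]) in INFRA_PORTS


# Hypothetical data for illustration only.
flows = [
    {"Source VM": "app01", "Destination VM": "batch07", "Port": "1521"},
    {"Source VM": "", "Destination VM": "web01", "Port": "443"},
    {"Source VM": "app01", "Destination VM": "dns01", "Port": "53"},
]
interesting = [f for f in flows if is_vm_to_vm(f) and not is_infrastructure(f)]
```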

Since I couldn't think of any other convenient ways to reduce the list further, I took all of the Source VMs and Destination VMs and made them into a single list in a column (and removed duplicates).  I then used that same countif technique to determine which of those VMs were already on my "known servers" list (so a 1 meant that it was on the list already), and I filtered those out.  That left me with a list of servers of interest.  I brought those servers to the application's SME, asking him to help identify which of those servers were part of the application.  He recognized about 3/4 of the systems as his, which I added to the list.  I then took the remaining servers to the infrastructure team to verify that they were either infrastructure systems or parts of other applications that had an interdependency with the target application (noting those interdependencies for specific rules in the future).
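Building that "servers of interest" list is easy to script too.  Here is a minimal sketch with assumed column names and invented server names; the SME review still has to happen by hand.

```python
def unknown_vms(flows, known_servers):
    """Return the sorted, de-duplicated VM names seen in flows but not yet known."""
    seen = set()
    for flow in flows:
        for col in ("Source VM", "Destination VM"):
            name = flow[col].strip()
            if name:
                seen.add(name)
    return sorted(seen - set(known_servers))


# Hypothetical data for illustration only.
known_servers = {"web01", "app01", "db01"}
flows = [
    {"Source VM": "app01", "Destination VM": "batch07", "Port": "1521"},
    {"Source VM": "report02", "Destination VM": "db01", "Port": "1521"},
    {"Source VM": "batch07", "Destination VM": "db01", "Port": "1521"},
]
servers_of_interest = unknown_vms(flows, known_servers)
print(servers_of_interest)
```

Each time the SME confirms more servers, add them to known_servers and run it again, which mirrors the repeat-until-done loop.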

I added the new servers to the VRNI application, then repeated the whole process.  I went through this several times, until I had discovered all of the servers that were part of the application.  At that point, my next step was to start classifying the servers by tier so that we could apply appropriate policies to them.

First, I wanted to identify the presentation tier servers.  The presentation tier is the collection of servers that interact with the customers.  From a VRNI data perspective, it's going to be the systems that have flows with no Source VM or no Destination VM (excepting those flows that reach out to the internet).  So, I filtered my list to show me only lines that had a blank Source VM or Destination VM field, then did my best to remove any administrative or non-applicable traffic.  I removed any RDP network flows and tried to evaluate SSH on a case-by-case basis.  Eventually, I had a list of servers that would be tagged as presentation tier and the list of ports that each one was using.
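Here is a sketch of that presentation tier filter, under a few assumptions: the column names are guesses at the export headers, RDP is on its default port 3389, and SSH is on 22.  It sets SSH flows aside for manual review rather than deciding for you, and it does not try to exclude outbound internet traffic, since that needs the IP columns and some judgment.

```python
RDP_PORT = 3389  # default RDP port; administrative traffic, dropped outright
SSH_PORT = 22    # default SSH port; set aside for case-by-case review


def presentation_candidates(flows):
    """Map each VM that talks to a non-VM endpoint to the ports seen on those flows."""
    candidates = {}
    needs_review = []
    for flow in flows:
        src = flow["Source VM"].strip()
        dst = flow["Destination VM"].strip()
        if bool(src) == bool(dst):
            continue  # VM-to-VM flow, or neither side is a VM
        port = int(flow["Port"])
        if port == RDP_PORT:
            continue
        if port == SSH_PORT:
            needs_review.append(flow)
            continue
        candidates.setdefault(src or dst, set()).add(port)
    return candidates, needs_review


# Hypothetical data for illustration only.
flows = [
    {"Source VM": "", "Destination VM": "web01", "Port": "443"},
    {"Source VM": "", "Destination VM": "web01", "Port": "3389"},
    {"Source VM": "", "Destination VM": "app01", "Port": "22"},
    {"Source VM": "web01", "Destination VM": "app01", "Port": "8443"},
]
candidates, needs_review = presentation_candidates(flows)
```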

Next, I wanted to identify the database tier.  We knew that this system used Oracle, which operates on a well-understood set of ports.  So, I filtered my list of network flows based on those ports and grabbed all of the VMs in the Destination VM field.  That gave me my database tier.  What was left?  The application tier!  How'd I populate it?  With whatever was left!
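The database and application tier split can be sketched the same way.  I'm assuming only the default Oracle listener port (1521) here; if your databases listen elsewhere, pass in your own set.  Column and server names are invented for the example.

```python
def classify_tiers(flows, all_servers, presentation, oracle_ports=frozenset({1521})):
    """Return (database, application) tiers as sets of server names."""
    # Database tier: application servers that receive traffic on an Oracle port.
    database = {
        f["Destination VM"]
        for f in flows
        if f["Destination VM"] in all_servers and int(f["Port"]) in oracle_ports
    }
    # Application tier: whatever is left after the other two tiers are removed.
    application = set(all_servers) - set(presentation) - database
    return database, application


# Hypothetical data for illustration only.
all_servers = {"web01", "app01", "app02", "db01"}
presentation = {"web01"}
flows = [
    {"Source VM": "app01", "Destination VM": "db01", "Port": "1521"},
    {"Source VM": "web01", "Destination VM": "app01", "Port": "8443"},
    {"Source VM": "app02", "Destination VM": "db01", "Port": "1521"},
]
database, application = classify_tiers(flows, all_servers, presentation)
```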

So, that gave me my list of servers and an approximation of which tier each one was in.  I next decided that I'd like to try to visualize that data, since a giant table isn't exactly what I'd call user-friendly.  The best free tool that I could find for this task was draw.io.  I chose this tool because it can accept text input and will automatically generate an (admittedly crude) relationship diagram from that text.

To prepare my text, I added another column to my VRNI data.  This one had a formula like this: =concatenate(Source VM,"->",Destination VM) which generates text in the syntax that draw.io is expecting for its relationship diagram.  I then removed the duplicates from this column (since draw.io is happy to draw the same line over and over again) and imported that data.
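The concatenate-and-dedupe step looks like this in Python.  It's a small sketch with assumed column names that produces the same Source->Destination lines described above.

```python
def drawio_edges(flows):
    """Build unique 'source->destination' lines, preserving first-seen order."""
    edges = []
    seen = set()
    for flow in flows:
        src = flow["Source VM"].strip()
        dst = flow["Destination VM"].strip()
        if not src or not dst:
            continue  # skip flows with a non-VM endpoint
        line = f"{src}->{dst}"
        if line not in seen:
            seen.add(line)
            edges.append(line)
    return edges


# Hypothetical data for illustration only.
flows = [
    {"Source VM": "web01", "Destination VM": "app01", "Port": "8443"},
    {"Source VM": "web01", "Destination VM": "app01", "Port": "8080"},
    {"Source VM": "app01", "Destination VM": "db01", "Port": "1521"},
    {"Source VM": "", "Destination VM": "web01", "Port": "443"},
]
edges = drawio_edges(flows)
print("\n".join(edges))
```

Two flows between the same pair on different ports collapse into one line, which is what you want for a relationship diagram.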

To do that, I pressed the + icon on the far right of the ribbon, then selected "From Text...".  I changed it to a diagram via the dropdown on the bottom, and pasted in all of those X->Y lines that I generated in Excel.  And that gave me a radial mess, which didn't work too well for me because I conceptualize application flow diagrams as more of a vertical flow mess.  So, with everything selected, I went to Arrange -> Layout -> Vertical Flow, which gave me a display that I found far more helpful.

Between the spreadsheet with its categorizations of the servers and the unattractive network flow diagram that I generated, we were able to figure out this application and assign appropriate security policies to it!
