Faster Log Insight Responses for NSX Firewall Source/Destination IP Queries
We've been doing a lot of work with the NSX Firewall recently. Log Insight has become our go-to troubleshooting tool for getting real-time information about what the firewall is doing. By far, the most common query that I run in Log Insight is for all entries that have a vmw_nsx_firewall_src or vmw_nsx_firewall_dst matching the IP Address that I'm interested in, and I'll often throw a vmw_nsx_firewall_dst_port or a vmw_nsx_firewall_action into the mix to further refine my results.
Unfortunately, these queries can be pretty slow. They're great if you're looking at the last 5 minutes worth of data, and they're pretty good going back to the past hour... but when we went beyond a 1 hour window, we found ourselves needing to wait. If we wanted to go all the way out to a 24 hour window, we'd need to go get lunch while the query ran. That seemed unreasonable to us, so we opened a support ticket and the VMware engineer made some tweaks that absolutely helped, but we found something else that's really easy and can be done by anyone experiencing similar issues.
As I mentioned, we do a lot of queries based on Source and/or Destination IP Address. We noticed that queries based on Firewall Action were significantly faster than our queries that were based on an IP Address, so we dug into the extracted field definitions, and found something interesting. This is the default field extraction regex for the Source IP Address:
\S+\s+[A-Fa-f\d\.:]+(/|->)
Yikes! It took us a bit to parse that, but there's some important stuff to understand in there. The first thing to note is that, in the Log Insight field definition, part of it is grey and part of it is black. The grey parts are the matches that must come before and after the value (here, \S+\s+ at the front and (/|->) at the end), whereas the black part ([A-Fa-f\d\.:]+) is the actual extracted field itself. So, in this case, we're looking for stuff that follows a match of \S+\s+, which is 1 or more non-white-space characters immediately followed by 1 or more white-space characters. After that, it's looking for a run of hexadecimal characters, colons, or periods, all in sequence... so it'll find an IPv4 or an IPv6 address. Finally, the value has to be followed by either a "/" or a "->".
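To make those three pieces concrete, here's a rough Python sketch that glues them into a single regex with a capture group. This is only an approximation of what Log Insight does internally, and the sample log line is a made-up example (its exact layout is an assumption, not copied from a real firewall):

```python
import re

# Hypothetical sample line; the exact NSX firewall log layout is an assumption.
SAMPLE = ("dfwpktlogs: 50168 INET match PASS domain-c7/1001 IN 60 TCP "
          "10.1.1.5/51234->10.2.2.8/443 S")

# The three parts of the default Source IP field, as described above.
PRE = r"\S+\s+"            # must come before the value (grey)
VALUE = r"[A-Fa-f\d\.:]+"  # the extracted value itself (black)
POST = r"(/|->)"           # must come after the value (grey)

# Approximation: one regex with the value in a named capture group.
DEFAULT_SRC = re.compile(PRE + "(?P<src>" + VALUE + ")" + POST)


def extract_src(line):
    """Return the first source-address-like value in the line, or None."""
    match = DEFAULT_SRC.search(line)
    return match.group("src") if match else None


print(extract_src(SAMPLE))  # 10.1.1.5
```

Notice how many tokens the engine has to try and throw away (50168, the d in domain, 60) before it lands on the real address. The character class is broad enough that lots of ordinary text looks like a possible start.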
We theorized that this regex might be slowing down our queries because of how broad it is. Since we're only using IPv4 at this customer site, we decided to try making our own version of the regex that would only key on IPv4 addresses:
\S+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(/|->)
Yeah, I know that this looks even uglier, but it's actually pretty simple. It's looking for four chunks of 1-3 numeric digits with dots in between them... i.e., an IPv4 Address. OK, technically, it'll happily grab some non-IP Address string like 876.462.123.32, but such a string is super unlikely to show up in these logs and our whole goal here is to make a faster regex, so we accepted that risk.
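If you want to play with the narrower pattern outside of Log Insight, here's a minimal Python sketch of it. The capture group and the function names are mine, and the ipaddress check is an optional extra for anyone who does care about strings like 876.462.123.32; it's not something we configured in Log Insight:

```python
import ipaddress
import re

# IPv4-only version: same leading and trailing context, narrower value.
IPV4_SRC = re.compile(
    r"\S+\s+(?P<src>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?:/|->)"
)


def extract_ipv4_src(line):
    """Return the first dotted-quad source value in the line, or None."""
    match = IPV4_SRC.search(line)
    return match.group("src") if match else None


def is_real_ipv4(text):
    """Optional stricter check: reject octets above 255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


# Hypothetical line showing the accepted risk: the regex matches it,
# but it is not a valid address.
bogus = extract_ipv4_src("TCP 876.462.123.32/80->10.0.0.1/443")
print(bogus, is_real_ipv4(bogus))
```

The regex stays cheap because it does no range checking; the validation only runs on values that were already extracted.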
And it worked! We ran a dreaded 24 hour query and, within a couple of minutes, it was pulling back thousands of results (compared to the original query effectively hanging the interface). We watched it merrily count up the number of results until we got bored (at 67k hits), which was plenty for us to realize that this query was running much more quickly.
So, we went ahead and put together these IPv4 extracted fields for both Source and Destination, to make our lives easier! Here are a couple of screenshots of the extracted field definitions themselves, for reference. You can set these up really easily by clicking the eyeball ("View this Field") icon next to the default source/destination fields, pressing the Duplicate button, and then changing the Extracted Value to IP address (v4) via the dropdown menu.
One more quick note: the dfwpktlogs: INET additional context on these fields also helped to speed up queries, as it limits the number of log entries that have to be compared against the regular expressions when looking for source or destination IP Addresses.
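The idea behind the additional context is just "do the cheap check first". Here's a small Python sketch of that idea. How Log Insight actually evaluates additional context isn't something I can speak to, so treating it as a set of keywords that must all appear in the line is an assumption, and the sample lines are made up:

```python
import re

# Assumption: the additional context behaves like keywords that must all be present.
CONTEXT_KEYWORDS = ("dfwpktlogs:", "INET")

IPV4_SRC = re.compile(
    r"\S+\s+(?P<src>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?:/|->)"
)


def ipv4_sources(lines):
    """Yield source addresses, skipping lines that fail the cheap keyword test."""
    for line in lines:
        # Plain substring checks are much cheaper than running the regex.
        if not all(keyword in line for keyword in CONTEXT_KEYWORDS):
            continue
        match = IPV4_SRC.search(line)
        if match:
            yield match.group("src")


# Hypothetical log lines: only the first one is a firewall packet log.
logs = [
    "dfwpktlogs: 50168 INET match PASS domain-c7/1001 IN 60 TCP 10.1.1.5/51234->10.2.2.8/443 S",
    "sshd: connection from 10.9.9.9/22->10.1.1.1/50000",
]
print(list(ipv4_sources(logs)))  # ['10.1.1.5']
```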