Yesterday I encountered an interesting issue that demonstrated the real necessity of good network testing. To give some background on this story, I had previously contacted a testing vendor and had them give a very good presentation on the ins and outs of good QA testing. The organization that saw this presentation thought it was a bit much for their environment.
This organization has a pair of IPS sensors that leverage spanning tree for failover. The figure below illustrates the design.
Now the vendor that supplied the IPS code is having a problem with its last few builds. Even the vendor's most stable release still has a bug in the SMB portion of its engine: when a particular type of SMB traffic passes through the engine, the engine fails. This organization has several tools for viewing the health of its systems, but the IPS sensors are not among the things it can easily view. What happened was that the sensors failed, both of them. The primary sensor failed first, and the secondary sensor failed when it saw the same traffic conditions. The two failures were nine days apart.
The beautiful thing about this design is that it leverages spanning tree to handle failover, which matters because the switches and routers are participating in a load-sharing mode for increased performance.
The vendor is now suggesting that an engineering release, specially developed for this team to address the issue, is the way to go. So what does this organization do? Do they install code that is not completely vetted? Can they test the code thoroughly first? Do they rely on blind faith and risk complete IPS failure?
They now see the need for better network and system testing before go-live and production. They could put the new IPS code on the secondary sensor and see if it crashes the 'passive' node; however, no traffic would have run through that release, since it is a passive node.
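To make the passive-node problem concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the Sensor class, the build labels, and the "smb-trigger" flow name are all hypothetical and are not taken from the vendor's product. It assumes that traffic always goes through the first healthy sensor in the pair, and that a sensor fails only when it actually inspects the trigger traffic on a buggy build.

```python
from dataclasses import dataclass

# Hypothetical label for the SMB traffic pattern that crashes the engine.
TRIGGER = "smb-trigger"


@dataclass
class Sensor:
    name: str
    build: str
    failed: bool = False


def forward(pair, flow, buggy_builds):
    """Send one flow through the first healthy sensor in the pair.

    Returns the name of the sensor that inspected the flow, or None if
    every sensor has already failed. A sensor on a buggy build fails as
    soon as it inspects the trigger flow.
    """
    for sensor in pair:
        if sensor.failed:
            continue
        if flow == TRIGGER and sensor.build in buggy_builds:
            sensor.failed = True
        return sensor.name
    return None


def inspected_by(pair, flows, buggy_builds):
    """Count how many flows each sensor actually inspected."""
    seen = {sensor.name: 0 for sensor in pair}
    for flow in flows:
        name = forward(pair, flow, buggy_builds)
        if name is not None:
            seen[name] += 1
    return seen
```

With both sensors on the same buggy build, the first trigger flow takes out the primary and the next one takes out the secondary, so the redundancy does not help against this kind of failure. With an untested build on the secondary and a healthy primary, the secondary's count stays at zero: it has inspected nothing, so its not crashing says nothing about the new code.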
This was a good opportunity to win friends...and influence testing.
