So Is Our Anti-Virus Totally Broken?
On September 19th, our anti-virus software went insane. Specifically, Sophos mistakenly issued an update which caused our anti-virus software to detect itself as a virus. This, as you can imagine, caused all sorts of problems. Below, you can see a snapshot of what ITS' anti-virus alerts mailbox looked like. Note the number of messages.
We know a lot of you were affected by this issue, which caused all sorts of annoying pop-ups, and we apologize for the trouble.
For those who are interested in learning more about the issue, please read on for more details about what went wrong, what we did to try to contain and then fix it, and where we stand now.
On most of our workstations on campus, we turn off alerting so you don't get bugged when your anti-virus software detects a virus. This actually happens more often than you think. Even if you exclude the 8,300 alerts generated by this mistake on Sophos' part, our alerts mailbox contains almost 50,000 individual alerts generated over the course of just 1 year.
Since part of my responsibilities is to manage and maintain the Sophos Anti-Virus software on our campus, though, I have alerts turned on for my desktop so I can see exactly what it's detecting when it detects it. So when this problem first reared its ugly head, I knew right away because I started getting popups about a virus called Shh/Updater-B being found all over my system.
My first inclination was that Shh/Updater-B was what is known as a false positive. That is, a programming error causes the anti-virus program to detect benign, safe programs as harmful and quarantine them.
First step in these cases is to check online to see if anyone else is seeing the same problem. The answer, in this case, was yes. Very yes. At the time, that discussion page wasn't 118 pages long like it is now. It was maybe 5 pages, although growing by the second. This means we're not the only ones who are having this problem. The good news, for me anyway, is that I didn't screw up. The bad news is that means I don't know how to fix it.
So the second step is to call support and figure out what's going on, but I couldn't get through, despite repeated attempts. Major problem at Oxy + same problem elsewhere + can't get through to tech support usually means something very bad has happened. I eventually got confirmation that it was, in fact, a false positive caused by a badly programmed anti-virus detection update. Depressingly, there was no solution yet.
The fix came about an hour later. You may recall, however, that I initially said that this problem caused Sophos to detect itself as a virus. Specifically, Sophos detected its own autoupdate components as a virus and prevented them from running. This produced an interesting problem: Sophos was preventing itself from updating, which in turn kept it from downloading the very update that would allow it to update again.
Sophos tech support was still unresponsive, likely getting battered from calls all over the world. At this point, figuring I had little to lose, I wrote a small program which found the faulty detection file and deleted it. Then, I had it find Sophos' autoupdate components in the quarantine folder and put them back in their original location. This initial attempt worked well enough and fixed somewhere around 50% of machines within the first night.
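For the curious, the logic of that program can be sketched roughly as follows. This is an illustrative Python sketch, not the program that was actually run: the function name, the paths passed in, and the idea of a table mapping each quarantined file name to its original location are all assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def repair(detection_file: Path,
           quarantine_dir: Path,
           original_locations: Dict[str, Path]) -> List[str]:
    """Delete the faulty detection file, then move quarantined files back.

    All arguments are hypothetical: the real file names and folders used by
    the anti-virus product are not given here. Returns the names of the
    components that were not found in the quarantine folder.
    """
    # Step 1: remove the faulty detection file if it is still present.
    if detection_file.exists():
        detection_file.unlink()

    missing = []
    # Step 2: put each known component back where it came from.
    for name, destination in original_locations.items():
        quarantined = quarantine_dir / name
        if not quarantined.exists():
            # Nothing to restore for this component on this machine.
            missing.append(name)
            continue
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(quarantined), str(destination))
    return missing
```

In a sketch like this, a non-empty return value would flag a machine where the restore was incomplete, and supporting another component would only mean adding one more entry to the table.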
The remaining computers were a little trickier. The next morning, I found that my program didn't quite catch all of the components that had been quarantined, so I added them in. My guess is that by the time all the computers had run the updated version of this program, we were at 80-90% fixed.
Computers that didn't get the fix within the first 24 hours had quarantined enough Sophos update components that my program couldn't reliably put them back. This meant sending ITS staff and student techs to individual machines to reinstall the Sophos anti-virus program entirely.
Eventually, Sophos got their act together, issued an apology, and provided a detailed explanation of why this problem happened and what they're going to do to keep it from happening again. TLDR version: human error, but they're putting in processes to prevent it from happening again. Also, they're adding more phones and tech support staff to deal with heavy call load.
As of 10/9, our systems show only 9 computers that are still affected by this problem and we're working on tracking them down. If you think you're still having problems, though, please let the Tech Helpdesk know.
One natural question is whether we should continue to trust the Sophos product. Our previous anti-virus vendor was industry-leader McAfee and they had a far worse problem back in 2010. Symantec, another industry leader, did something similar back in 2007. So one could cynically say that Sophos is, at least, in good company here. It is largely a moot point for now. Our contract with Sophos does not expire until 2015 but we will certainly take this issue into account before signing on with them again.
One piece of good news is that, as far as we can tell, the home use versions of Sophos were not affected by this issue. ITS recently updated our home use versions to the latest available, version 10 for Windows users and version 8 for Macintosh users.
- Technology Helpdesk: (323) 259-2880 firstname.lastname@example.org
- IR Operations Offices: (323) 259-2832