
Occidental College


Information Resources


What's the point of all this maintenance?

September 15, 2011

As you can guess from our schedule of activities for this Saturday, the point of maintenance boils down to keeping stuff up to date and fixing stuff that's broken. So why do we need to keep things up to date and why can't we fix the broken stuff some other time?
[Image: Automatic Updates. Caption: Where's the "Restart Never" button?]

Excellent questions. Potentially unconvincing answers below the break.

Why bother with updates?

For a while, not updating systems was the norm, for all the reasons that are probably obvious to you: downtime is bad, change is bad, and nothing bad ever happens anyway. That was all fine until bad stuff started to happen to systems that were never updated. Really bad stuff. The Oxy campus got hit pretty hard by the Blaster worm (as did just about everyone else on the Internet). Back then, we didn't update our employee desktops, we didn't require students to register their computers, and we didn't regularly patch our servers. Instead of doing those things, we focused solely on what is known as network perimeter security. It was expensive and difficult to manage, but it ultimately worked. Until it didn't. So we'd shore up our network perimeter again. Lather. Rinse. Repeat.

In the meantime, we had servers getting infected, which meant we had to take them down, rebuild them, and restore them from backup, leaving departments without their data for days. We had students bringing their infected machines down to add to our repair queue, which on bad weeks numbered in the hundreds, and those students patiently waited up to a week to get their computers back. And we had employee desktops getting infected, requiring ITS to come out, wipe them clean, and restore them to a default configuration.

It's worth pointing out that nearly all of the massive worm outbreaks of that era, Blaster included, were entirely preventable: the necessary security updates were available, they just weren't being installed. So we learned our lesson. We install updates now. We know it inconveniences everyone, but hopefully everyone can see that the alternative was pretty lousy.

You said something about fixing broken stuff?

Oh yes, that's the other bit. On any given day, lots of things break, and most of them are easily fixable. If a fix requires a reboot or downtime but affects a relatively simple system, or if the fix is limited in scope and impact, we'll usually do it late at night or early in the morning. But not all fixes are simple. When the thing that's broken is an enterprise database application, a critical piece of networking equipment, or a complex bundle of disks, fixing the problem can be a most decidedly non-trivial task. The main concern here is the potential impact if something goes wrong during the fix. The technology involved is pretty complex, with lots of moving parts and lots of dependencies. Oftentimes, this means lots of different folks need to be involved, and if things go wrong, several vendors might need to be contacted and additional folks from ITS may need to be brought in as well.

So why Saturday morning? As trivial as this might sound, one reason is that the aforementioned people tend to be awake. There are practical benefits to this: not having to wait while the second-tier support engineer wakes up, showers, gets dressed, and commutes into the office is definitely a plus when your entire network is down. But the bigger concern is that when you're doing things late at night or early in the morning, people tend to be tired, and being tired is bad. Depending on the circumstances, it can be really, really bad. So by picking a time when everyone is awake, we reduce the chance of mistakes being made when working with our most complex systems. Networking equipment can be tricky enough to work with, even when you're awake and alert. A group of network engineers found this out last week, as you may recall. It's not perfect, and we know the work of the College is not confined to business hours.
With few exceptions, just about any time we pick for maintenance inconveniences someone (the Jackson-Buffet principle) and the fact that everyone has been so understanding of this is a big part of why I like working here.

But what about...

Any other questions?  Comments?  Criticisms?  Are my arguments invalid?  Please feel free to leave a comment below or send me an email.
