Back in 2000, when I was working at Disney World as a Network Design Specialist, all disney.com e-mail accounts ran on two e-mail servers affectionately known as Pain and Panic (named after the shape-shifting imps from Disney’s Hercules movie). Little did I know that those names were about to become all too real.
My biggest design project at Disney involved replacing the network that interconnected the different theme parks (Magic Kingdom, Epcot, Animal Kingdom, and the Studios), along with some of the Disney resorts. At the beginning of the project, those locations were interconnected via FastEthernet (i.e. 100 Mbps) links, and when you looked at the physical topology superimposed on a map of the Disney World property, it looked like a smile, which is how it came to be called SmileNet.
What I had to do was design a network with new Cisco Catalyst switches using Gigabit Ethernet (i.e. 1 Gbps) links. Redundancy was key in the design, meaning that the new design should not only have redundant links, but a Spanning Tree Protocol (STP) root bridge (and a backup root bridge) also needed to be strategically chosen to provide optimal pathing.
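For anyone who hasn’t had to pick a root bridge on purpose, here is a rough sketch of how the election works: STP picks the switch with the lowest bridge ID, which is the configured priority followed by the MAC address as a tie-breaker. The switch names, priorities, and MAC addresses below are made up for illustration and are not the values we used at Disney.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str      # hypothetical switch name
    priority: int  # lower wins; 32768 is the standard default
    mac: str       # tie-breaker when priorities match; lower wins

    def bridge_id(self):
        # Compare priority first, then the MAC address as a number.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the bridge with the lowest bridge ID."""
    return min(bridges, key=Bridge.bridge_id)


# Illustrative values only: two core switches given low priorities on
# purpose, and an access switch left at the default.
switches = [
    Bridge("access-1", 32768, "00:10:00:00:00:01"),
    Bridge("core-a", 4096, "00:10:00:00:00:0a"),
    Bridge("core-b", 8192, "00:10:00:00:00:0b"),
]

root = elect_root(switches)
# The backup root is whoever would win if the root disappeared.
backup_root = elect_root([b for b in switches if b != root])

print(f"root: {root.name}, backup root: {backup_root.name}")
```

Notice that if every switch were left at the default priority, the access switch with the lowest MAC address would win the election, which is exactly why you want to set the priorities deliberately.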
After I completed a preliminary proposal, I had to go before a peer-review board and present/defend my design. Finally, after weeks of planning and mocking things up in a lab, we were ready to roll out the new design, which included multiple Cisco Catalyst 6509 and 3500 Series switches.
Before ripping out the old switches and replacing them with the new ones, we wanted to make sure that we minimized the impact on both users (i.e. Disney’s cast members and guests) and other IT staff. You see, Disney World (just outside of Orlando, FL) was interconnected to ABC (in New York, NY), Paul Harvey Radio (in Chicago, IL), and Disneyland (in Anaheim, CA). A network disruption in one location could be felt in all locations, meaning that we all needed to be on the same page regarding any major network changes. To make that happen, we had a change management system, a software application that allowed the IT staff in one department (or at one location) to notify other IT staffers about planned network maintenance operations.
That’s the big lesson I wanted to share with you: the importance of having a change management system. If you just decide to swap out a router at midnight tonight, that might kill an over-the-network backup of a critical server. However, if you announced your intention via a change management system, the system administrator of the server could let you know that it wasn’t great timing for them, and you could make other arrangements.
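To make the router-versus-backup scenario concrete, here is a minimal sketch of the kind of check a change management system can do for you: flag any already-scheduled activity whose time window overlaps the change you’re proposing. The class, function, team names, and dates are all hypothetical and don’t come from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangeRequest:
    owner: str
    description: str
    start: datetime
    end: datetime


def find_conflicts(new_change, scheduled):
    """Return the scheduled changes whose windows overlap new_change."""
    return [
        c for c in scheduled
        if new_change.start < c.end and c.start < new_change.end
    ]


# Hypothetical entries: a nightly backup and a proposed midnight swap.
nightly_backup = ChangeRequest(
    "server-team", "over-the-network backup of a critical server",
    datetime(2000, 1, 1, 23, 0), datetime(2000, 1, 2, 1, 0),
)
router_swap = ChangeRequest(
    "network-team", "swap out a router",
    datetime(2000, 1, 2, 0, 0), datetime(2000, 1, 2, 0, 30),
)

conflicts = find_conflicts(router_swap, [nightly_backup])
for c in conflicts:
    print(f"Conflict with {c.owner}: {c.description}")
```

A real system adds notifications and approvals on top of this, but the overlap check is the piece that would have the server administrator telling you that midnight isn’t great timing.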
If you don’t currently use any change management software, there’s no shortage of options from which you can select. Just do an Internet search for “change management software,” and you’ll learn about lots of options.
My team had done our due diligence and posted a notification in Disney’s change management system about our impending equipment/connection replacements. No one seemed to have an issue with our schedule, and we were ready to rock.
To minimize the impact of our equipment/connection swap out, we had a maintenance window beginning at 2:00 AM. Four of us showed up at the DISC (Disney Information Services / IT Data Center) Building, hopped in a van and drove over to Magic Kingdom. Our goal that night was to swap out a couple of older switches with two Cisco Catalyst 6509s (equipment at other locations would be swapped out on different nights). We walked into the tunnel system under Magic Kingdom, where the switches were.
FUN FACT: The tunnel system under Magic Kingdom was built on level ground. Then, dirt was excavated from a nearby area to cover the tunnel system. Magic Kingdom was then built on that newly created ground. What happened to the big gaping hole left after the removal of all that dirt? It was filled with water and became the famous Seven Seas Lagoon.
We swapped everything out as planned. The link integrity lights on the Cisco Catalyst 6509s came to life, and all seemed well. However, after returning to our DISC Building offices, something was definitely amiss.
Specifically, our e-mail didn’t work. After digging a bit deeper, we came to the stark realization that nobody’s e-mail worked. From CEO Michael Eisner to Mickey Mouse, all disney.com e-mail accounts were down.
In the wee hours of the morning we pondered whether or not our backbone replacement had anything to do with the e-mail issue. It seemed too coincidental to not be our fault, and it seemed as if this might be a resume-producing event.
We waited for the e-mail team to come in to further diagnose the issue. Sure enough, Pain and Panic (the e-mail servers) were not happy. The sun was rising over Cinderella Castle when the underlying issue was finally discovered. It seems that someone in Disneyland (in Anaheim, CA) had removed an ATM (Asynchronous Transfer Mode) module from a switch or router, resulting in the Pain and Panic servers being isolated from the rest of the network. And before you ask, the answer is no. The Anaheim team did not enter that activity into the change management system. Their failure to do so, and the quite literal pain and panic I went through that night, have made me a huge proponent of change management.
The primary lesson I hope you take away from this blog posting is the importance of having a change management system in place. However, a change management system is of limited value if it isn’t used by everyone. That’s where employee training comes in.
This blog posting is the second in my Lessons I Learned from Disney series. If you missed the first one, you can check it out here:
Kevin Wallace, CCIEx2 (R/S and Collaboration) #7945, CCSI 20061