Having to migrate a data centre or computer room is not the once-in-a-lifetime event that it used to be. In today’s global economy with mergers, acquisitions, divestments, outsourcing, insourcing, industry regulation and real estate opportunities being key, organisations may be faced with the prospect of moving their IT infrastructure once every 5-7 years.
This can be a major, high-risk project, particularly for organisations with 24 x 7 customer-facing applications, such as those provided via the internet. But with proper planning, use of available technology and good management, a data centre migration can be executed with minimal risk and maximum effect.
Just how can this be done? Here are the seven steps to success:
1) Preparation
Get executive sponsorship for the project and an understanding at senior management level that this activity is going to happen and that there is likely to be an impact on service delivery. Then get someone in who has done this before so that the project manager is not learning from his or her mistakes at the expense of your project or business. Once the IT executive is on board and you have the right project manager in place, you are ready to start planning.
2) Planning
As with any major project, the devil is in the detail and detailed project planning is imperative. At 4am on a Sunday one does not want to be deciding how to deal with technical problems or what to do next. A properly prepared project plan means the project manager will only be ticking off tasks on a list, not making them up.
Key to the Planning Phase is to follow a proper project management methodology. PRINCE2 and PMBOK are example methodologies that provide ways to manage scope and project co-ordination, as well as time, quality, procurement, cost, communications, risks and issues. As a challenge to your project managers, ask them to tell you the difference between a risk and an issue. An incorrect answer should tell you that you may not have the right PM. In case you are struggling yourself, a risk is something that has not yet happened that will have an impact on the project if it does; an issue is an event that has happened and is now impacting project deliverables. These core elements of project planning must be managed with discipline and process.
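The risk/issue distinction can be made concrete with a minimal sketch of a project register entry. The class and field names below are hypothetical and not taken from PRINCE2 or PMBOK; the only point is that the same entry is a risk until it occurs and an issue afterwards.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    """One line in a hypothetical project register."""
    description: str
    has_occurred: bool = False  # False: still a risk; True: now an issue

    @property
    def kind(self) -> str:
        # A risk has not yet happened; an issue has happened and is impacting the project.
        return "issue" if self.has_occurred else "risk"


entry = RegisterEntry("Core switch fails on power-up at the new site")
print(entry.kind)  # risk

entry.has_occurred = True
print(entry.kind)  # issue
```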
3) Engage the business
Prior to executing a data centre migration, it is vital to determine the impact of a likely service outage on the business units affected by the migration. Service Level Agreements must be closely scrutinised, business unit stakeholders engaged and the impacts on the business determined. Business representatives will be required to help prepare for, participate in and test after a migration. The business will not let you do anything that is likely to have a major impact on its ability to deliver a committed service or affect its reputation in the marketplace. If you understand this from day one, you are more likely to obtain a positive response from the business as you try to reach a compromise position in order to execute the migration(s).
4) Network transition approach
A project of this type must be executed with minimal risk to maximise the likelihood of a successful outcome. A simple way to minimise risk in a data centre migration is to install duplicate components of the key current data centre infrastructure into the new facility.
This must include core and distribution layer network switching infrastructure and may include disk and tape storage, depending on its complexity and the amount of downtime available during the migration. You will already know how much downtime is available, having solicited this information from the business as discussed above.
Network infrastructure is very complicated, both in its design and its physical appearance. Networks can, amongst other things, be virtualised, segmented, spanned and routed, and the relocation strategy must take the current state of the network into consideration as well as how available technology can be exploited to minimise risk. For example, it is possible to migrate a server without having to change its IP address. For many organisations this will be highly desirable, as there will be applications with static (hard-coded) IP addresses that may not work if the servers on which they reside change IP address. Methods to ensure IP addresses do not have to change include: moving an entire subnet or VLAN at once; stretching or spanning a subnet or VLAN across the old and new sites; or routing an IP address to a new location. Your site will probably be able to cater for one of these options, but they all come with downsides, so the best option for your site must be carefully considered and, if possible, tested before letting your network administrators loose.
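If the "move an entire subnet at once" option is chosen, every host on that subnet has to travel in the same migration window. The following is a minimal sketch of a pre-move check, assuming a simple inventory of host names and IP addresses; the function name, host names and addresses are all hypothetical. It uses only Python's standard `ipaddress` module.

```python
import ipaddress


def stragglers(subnet_cidr, inventory, wave):
    """Return hosts that sit in the subnet but are NOT scheduled in this wave."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return sorted(
        name
        for name, ip in inventory.items()
        if ipaddress.ip_address(ip) in subnet and name not in wave
    )


# Hypothetical inventory: host name -> static IP address
inventory = {
    "web01": "10.20.30.11",
    "db01": "10.20.30.12",
    "app01": "10.20.30.13",
    "mail01": "10.20.31.5",   # different subnet, not affected
}

# Hosts planned for this outage window
wave = {"web01", "db01"}

print(stragglers("10.20.30.0/24", inventory, wave))  # ['app01']
```

Any host reported here would either need to join the wave or be handled by one of the other approaches, such as stretching the VLAN across both sites.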
Network latency may also come into play, particularly if synchronous data replication or active/active application topologies are deployed. New network and storage equipment, which is likely to perform much faster than what it is replacing, may address some latency issues, but the pure tyranny of distance (i.e. the time it takes for light to travel across long lengths of fibre) may impact the project.
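As a rough feel for the tyranny of distance, the sketch below estimates propagation delay only. It assumes light travels through fibre at about 200,000 km per second (roughly two-thirds of its speed in a vacuum) and ignores switching, queuing and protocol overhead, so real figures will be higher. The function name is illustrative.

```python
# Approximate speed of light in optical fibre (assumption: about two-thirds of c).
SPEED_IN_FIBRE_KM_PER_S = 200_000.0


def round_trip_ms(fibre_km: float) -> float:
    """Propagation delay, there and back, over fibre_km of fibre, in milliseconds."""
    one_way_seconds = fibre_km / SPEED_IN_FIBRE_KM_PER_S
    return 2 * one_way_seconds * 1000.0


for km in (10, 100, 1000):
    print(f"{km:>5} km of fibre: about {round_trip_ms(km):.2f} ms round trip")
```

With synchronous replication each write typically waits for at least one such round trip before it is acknowledged, which is why the fibre route length between the old and new sites matters.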
Of course there is also the physical side of your network. One look at the communications cabinets in your data centre will tell you not only that it will be too risky to migrate them, but also that there will not be enough time to re-cable them during your agreed service outage. Many communications cabinets are a spaghetti bowl of Ethernet and fibre cables that run between switches, to patch panels and hubs and to servers and other infrastructure like SAN switches or firewalls. This is also a good time to ensure that an effective and risk-averse cable management system and cabling practices are employed. Look at the old data centre’s racks. Are they a mess? Do you want to fix this? Have you had outages caused by errors or accidents when dealing with a faulty server, cable or port? Are cables running outside racks into other racks? The answer to each is probably yes. Many of these problems are caused by not having a cable management standard (inter- and intra-rack) or a server racking standard. Furthermore, now is the time to enforce a policy, if not already in place, limiting who can perform rack moves, adds and changes (i.e. only the Facilities group) and when such changes can be made (e.g. restricting any changes to a rack supporting Production equipment to after hours).
By installing a new, properly managed network at the new data centre, not only will you be able to test your IP relocation strategy prior to the migration, but you will also be able to properly cable switches, hubs, routers and server racks in advance. It might also be a good time to implement or upgrade to a higher capacity or faster core switch or similar. There is also the additional benefit of an in-place fallback position as, in the event of a back-out being required, the network infrastructure is still sitting at the current site.
5) SAN approach
Again, with risk minimisation in mind, if you told your CIO that all business-critical data would be physically located at the new data centre prior to any servers moving there from the current site, would he or she like that approach? Of course they would.
This is not quite as simple as it sounds, however, as there may be many things to do to prepare. This approach assumes that your business-critical data lives on networked storage (a SAN or NAS), not on locally attached storage. If your data is on a SAN, you have reached first base. This means it may be considered as a single entity. You also need to be already using, or be in a position to use, a data replication tool and process, which most disk vendors offer. Data replication tools allow the user to have multiple copies of the same data at multiple locations. Using a data replication tool, data can be maintained live across the current and new sites when relocating servers. That way, the applications they support only need to be unavailable for about the length of time it takes to have the servers migrated and operational at the new site. And, of course, there is a fallback position of the data still being current and available at the now old site.
This may also be a great opportunity to test your disaster recovery strategy or, in fact, to implement one. This process is effectively the same as interrupting the delivery of service from the current data centre and re-starting it from another one. Having continuous data replication will greatly reduce the amount of time it will take to restore normal operations in the event of a disaster. Implementing or improving this to support your data centre migration may have some significant additional benefits to the business after the migration.
6) Consolidation or Virtualisation
Many IT shops are implementing consolidation or virtualisation strategies. These are different means of reducing the physical number of servers required to run a large number of applications. A common server deployment strategy has, until recently, been to run one application per server. This is now changing to many (compatible) applications per server. Moving to a new data centre is a good opportunity to implement a strategy like this since only the workload, and not the server, is moving. Physical-to-virtual processes and server consolidation strategies are available from your vendors and other sources. These can be tested prior to and executed on the day of a migration. As a fallback, the original servers are still sitting in place at the now old data centre. It may also be a good time to consider technology refreshes and/or Unified Computing. Whilst it significantly increases the budget of the project, installing new kit at the new site may be a way for you to replace old hardware with newer technology whilst dramatically reducing the risk of the move.
7) Performing the migration using brains not brawn
Moving data centres is not just about putting stuff into a truck and moving it to a new location – in case you haven’t already got the picture! There are a few tricks that will help ensure that this can happen with minimal risk and in minimal time. These include:
Power off and power on all infrastructure in scope to migrate. Hardware failures after a migration are generally not caused by damage in transit but by the fact that some equipment may not have been power-cycled for months or years, and the simple matter of stopping and re-starting it can cause a hardware failure such as a blown power supply, a dead card or a dead disk. If equipment in scope to migrate is power-cycled in the weeks leading up to the migration, anything that is going to fail will likely fail then rather than after it has been migrated. This means that repairs can be effected at a time other than during an already small migration window. It is also a good idea to uplift your hardware maintenance contracts (e.g. from 9 x 5 with a 4 hour response to 24 x 7 with a 2 hour response) for the duration of the migrations. Also make sure your hardware vendors know you are moving and that you may call on them over that weekend, not to mention the new service location after the migration.
Ensure a cable audit is performed before the migration and that as much cabling as possible is pre-installed at the new data centre prior to the migration. If you think about it, without a cable there is nothing, so incorrect cabling or a poorly planned cable strategy will almost certainly mean major issues at the new site. If cable management is planned properly, once a server arrives at the new site, its host, SAN and console cables will be waiting for it ready to be connected.
Ensure floor and rack space planning is performed. Things to look for include: ensuring the right type, size and number of rack power outlets are ready at the new site; that the floor is strengthened where required (e.g. for heavy equipment such as disk storage devices); and that room cooling is considered (e.g. hot and cold aisles, locations of blade server racks, room for expansion). This is also a good time to ensure your configuration management database is updated.
Get the right transportation company. It may sound silly, but it is very easy to tip over a million-dollar piece of infrastructure when loading it into and out of a truck. A properly briefed and managed transportation company with credentials in moving sensitive equipment must be selected.
Manage the logistics. These are the common-sense things to do to ensure there are no gotchas on the day. Make sure there are electricians and hardware engineers on site or on standby. There will almost always be a power or cabling issue. Is someone on hand to change a power plug or fit off a cable? If there is a lift in either building, have you booked it, and is it available after hours (i.e. not locked by cleaners)? Can you move equipment through the building entrance foyer, or do you have to use the loading bay? Have you booked the loading bay? Is a lift technician on standby? Does building security know what is going on?
Does security have the names of ALL the workers requiring data centre access? Make sure the weekend duty guard is briefed. As weekend guards are often temporary workers, they may not know about the move and could be a major obstacle. Is there a status line? Have a phone number that management and stakeholders can call to get progress reports. Update it every hour. Is there a catering plan? Providing food and drinks for all staff on site not only keeps them fed and watered but stops them wandering off for an hour or so in search of a pizza. Is there a contact list for everyone involved, including mobile and home numbers and an alternate contact?
Ensure that the new data centre is not cabled as badly as the current one. If your data centre is a showpiece of tidy and structured cabling, you are one of the very small minority. Most data centres have their fair share of cabling challenges and under the floor tiles of some of your locations will be an archaeological dig of multiple generations of cables. Your new data centre should use patch panels in server racks, KVMs, structured cabling, cable-management within and between racks, cable trays and neat and tidy, labelled cabling at the server and network ends. Once this is done, make sure your change management process does not allow this to get out of hand again. Not only will the cabling look much better but activities like being able to perform server maintenance within a rack without affecting other servers in the rack will be possible.
Finally, don’t panic. Remain cool if things go wrong during the migration, make sure there are management representatives contactable to make decisions if required and do your best to make sure no one works too long without sleep. People make mistakes when they are tired.
So you can see that, although a data centre migration is a high-risk and very complicated project, a combination of diligent management and planning, engagement with the business, leveraging of available technology and sound logistics management, not to mention common sense, can help ensure a very successful outcome with several additional benefits.