Background

We have been successfully using our TFS Lab Management system for a while. However, we have noticed an issue that when deploying environments the performance of the system slowed. This was down to I/O congestion between our servers and the SAN that provided their main VM storage because the library store and the hyper-v host servers all shared the same SAN.

clip_image002

Also we had the need to start to create environments using Windows 8 and Server 2012. This is not possible using System Center Virtual Machine Manager (SCVMM) 2008 or Hyper-V hosted on earlier than Windows 2012.

So it was time to do an upgrade of both the operating system, and our underlying hardware. The decision was made to dump the SAN and provide each Hyper-V host with its own SAS based RAID 5 disk. The main Lab SCVMM library server would also use local storage. All servers would be moved to Server 2012 and SCVMM would advance to 2012 with Service Pack 1 (required to manage Server 2012).

clip_image004

You might ask why we did not do this sooner, especially the move to support Windows 8 and Server 2012 VMs? The answer is that until TFS 2012 quarterly update 1 there was no support for System Center 2012, which itself had to wait until the SP1 release in January. So the start of the year was the first time we could do the update. We had planned to do very early in the year, but waiting for hardware and scheduling of our time let it drag on.

Hardware

The hardware upgrade was less straightforward than original anticipated as we needed to change the host controller card to support the required RAID5 array of disks. Once that was done on each of the Hyper-V hosts, installing the OS and configuring Hyper-V was quick and easy.

Meanwhile the library server was moved to a different box to allow for the sliding block puzzle of storage move and OS upgrades.

The Plan

After much thought we decided our upgrade path would be

  1. Stop all environments
  2. Rebuild each Hyper-V server with Server 2012 on their mirrored boot disks, adding the new SAS hardware and removing the SAN disks.
  3. Build the new SCVMM 2012SP1 server on Server 2012. The DB server would also move from an in-place SQL 2008 R2 instance to be hosted on our resilient SQL 2012 with Always On (also now supported in SCVMM 2012 SP1). This would mean an upgrade with existing database, as far as SCVMM is concerned.
  4. Add the rebuilt Hyper-V hosts to SCVMM
  5. Get the deployed VMs off SAN and onto the new SAS disks, along with a transfer of the library share.
  6. Reconfigure TFS to point to the new SCVMM server

What really happened

SCVMM Upgrade

The major problem we found was that you can’t upgrade the database from SCVMM 2008 R2 SP1 to SCVMM 2012 SP1. It need to be upgraded to SCVMM 2012 first.

In fact the SCVMM install process then suffered a cascade of issues:

  • SCVMM 2008 R2 needed the WAIK for Vista. SCVMM 2012 required the Win7 WAIK and SCVMM 2012 SP1 wanted the Windows 8 Automated Deployment kit.
  • Each time SCVMM wanted to upgrade its database, the database had to be in Simple Recovery mode. This is incompatible with Always On, which requires Full Recovery mode. This meant that the DB couldn’t be moved into the Availability group until the final installation was complete.

In the end we built a temporary VM for SCVMM 2012. The original host was left on Server 2008 R2 with the Vista WAIK; the intermediate host had Server 2008 R2 with the Windows 7 WAIK and the final host had Server 2012 with the Windows 8 ADK.

The problem we then found was that the final SCVMM 2012 SP1 system failed to start the Virtual Machine Manager service. The error we saw was:

Problem signature:
P1: vmmservice
P2: 3.1.6011.0
P3: Utils
P4: 3.1.6011.0
P5: M.V.D.SqlRetryCommand.InternalExecuteReader
P6: M.V.DB.CarmineSqlException
P7: b82c

We rolled back and repeated the process a number of time, and even tried a final in-place upgrade of the original host. Each resulted in the same fault. The internet turned up no help – two or three people reported the same fault and each one required a clean SCVMM installation to fix the fault.

Interestingly, our original DB size was around 700Mb. Each time, the upgrade process left a final DB of around 70Mb, suggesting something had gone wrong during DB updates. Whether this had anything to do with the presence of Lab we don’t know.

In the end we had no choice but to install SCVMM 2012 SP1 clean, with a new database. Once we did that everything worked just fine from the SCVMM point of view.

TFS

First we repointed TFS at the new SCVMM server

**Tfsconfig.exe lab /settings / scvmmservername:**my_new_scvmmservername /force

This worked OK, we then tried to run to upgrade the schema

tfsconfig lab /upgradeSCVMM /collectionName:*.

But this errored. Also when we tried to change the library and host groups both via the UI and command line we also got errors. The problem was that TFS thought there were libraries and host groups on the now retired old SCVMM server.

The solution was to open Microsoft Test Manager (MTM) and delete the live environments and stored environments, VMs and templates. This had no effect on the new SCVVM server as these entries referred to the now no-existent SCVMM host. It just removed the entries in the TFS databases.

Once this was done we could run the command

tfsconfig lab /upgradeSCVMM /collectionName:*.

And we could also reset the library and host groups to point to the new server.

So what do we have?

The upgrade (reinstall?) is now done and what do we have? On the Hyper-V hosts we have running VMs, and due to the effort of our IT team we have wired back the virtual networks previously created for Lab Management for network isolation. It is a mess and need doing via MTM, but it works for now.

The SCVMM library contains loads of stored environments. However as they were stored using the GUID based names used in Lab management their purpose is not that obvious.

As we had to delete the SCVMM database and TFS entries we have lost that index of GUIDs to sensible names.

The next step?

We need to manually take VMs and get them imported correctly into the new SCVMM library so that they can be deployed.

For running environment that don’t require network isolation we can compose new environments to wrapper the VMs. However, if we need network isolation we can’t see any method other than to push each VM up into the library and re-deploy them as a new network isolated environment, more posts to follow as I am sure we will learn stuff doing this.

Points of note:

  • Lab Manager shoves a load of XML into the notes field of a VM on Hyper-V. If that xml is present, lab assumes the VM is part of an environment and won’t show that as being a VM for use in new environment. Deleting the XML makes the VM magically reappear.
  • Allowing developers to make lots of snapshots, particularly with save states is a bad idea. When migrating between server versions, the Hyper-V save states are incompatible so you will lose lots and lots and lots of work if you’re not careful.
  • Having lots of snapshots with configuration variations might also cause confusion. Collapsing snapshots overall is a not a bad idea.
  • If you have a VM in the library (we now have multiple libraries to make sure people using Lab can’t accidentally delete key VM images) with the same VM ID as a running machine SCVMM won’t show you the one that is running. If you think it through this makes sense – SCVMM doesn’t export and import VM folders, it simply hefts them around the network. This is just like copying a VM form one Win8 box to another – the default option is to import the machine with the same ‘unique’ identifier.
    The solution to this one is to import a new copy of the source VM onto the hyper-v host, choosing the ‘make a copy and generate a new ID’ option. This new VM can then be manipulated with SCVMM and you still have a ‘gold master’ in your library.
  • Enabling server 2012 data deduplication on your SCVMM library shares is a very good plan. Ours is saving 75% of disk space. The only thing to be wary of is that if you pull a VM from the library onto Hyper-V and then store it back to the library the file will take up the ‘correct’ amount of disk space until the dedupe job runs again. If you’re not careful you can ‘overfill’ your disk this way!
  • SCVMM 2012 and SP1 like clouds and offer all kinds of technology solutions that can build a virtual infrastructure for you with VLANs and all kinds of cleverness. This will confuse the life out of Lab Manager so don’t do it for your Lab Environment. We now have a couple of ‘clouds’ in SCVMM and the Lab one consists of little more than the Hyper-V hosts and associated libraries. There is one virtual switch and one logical switch, both of which exist only to hook up our main network. Anything more complex will confuse Lab.
    Meanwhile, Lab still creates new Hyper-V Virtual Switches. SCVMM will know nothing of these so you need to be aware of that when inspecting VMs in SCVMM.

If we were doing it again…

So what have we learnt? The upgrade of the SCVMM database is critical. If this fails you have no other option other than to rebuild and manual recreate from the basic VMs

Even with SC-VMM 2012 SP1 and Lab Management there are still many moving parts you have to consider to get the system working. The change from 2008 to 2012 is like going from Imperial to Metric measurements. It is the same basic idea, it just feels like everything has changed.

Thanks to Rik and Rob for helping with this post