It was bound to happen in the end, the wrong environment got deleted on our TFS Lab Management instance. The usual selection of rushing, minor mistakes, misunderstandings and not reading the final dialog properly and BANG you get that sinking feeling as you see the wrong set of VMs being deleted. Well this happened yesterday, so was there anything that can be done? Luckily the answer is yes, if you are quick.
Firstly we knew SCVMM operations are slow, so I RDP’d onto the Hyper-V host and quickly copied the folders that contained the VMs scheduled to be deleted. We now had a copy of the VHDs.
On the SCVMM host I cancelled the delete jobs. Turns out this did not really help as the jobs just get rescheduled. In fact it may make matters worse as the failing of jobs and their restarting seems to confuse SCVMM, took it hours before it was happy again, kept giving ‘can’t run job as XXX in use’ and losing sight of the Hyper-V hosts (needed to restart the VMM service in the end).
So I now had a copy of three network isolated VM, so I
- Created new VMs on a Hyper-V host using Hyper-V manager with the saved VHDs as their disks. I then made sure they ran and were not corrupted
- In SCVMM cleared down the saved state so they were stopped (I forgot to do this the first time I went through this process and it meant I could not deploy the stored VMs into an isolated environment, that wasted hours!)
- In SCVMM put them into the library on a path our Lab Management server knows about (gotcha here is SCVMM deletes the VM after putting it into the library, this is unlike MTM Lab Center which leaves the original in place, always scares me when I forget)
- In MTM Lab Center import the new VMs from the library
- Create a new network isolated environment with the VMs
- Wait……………………….
When it eventually started I had a network isolated environment back to the state it was when we in effect pulled the power out. All took about 24 hours, but most of this was waiting for copies to and from the library to complete.
So the top tip is try to avoid the problem, this is down to process frankly
- Use the ‘mark a in use’ feature to say who is using a VM
- Put a process in place to manage the lab resources. It does not matter how much Hyper-V resource you have you will run out in the end and be unable to add that extra VM. You need a way to delete/archive out what is not currently need
- Read the confirmation dialogs, they are there for a reason