I just got back from a client site where we built their end-to-end Win10 migration solution. It was one of the cleanest environments I’ve had the pleasure of working in, with a solid AD and CM infrastructure, which made things go smoothly.

Up until my last day on the project.

After spending a couple of weeks tweaking their USMT and Win10 user experience, we were at a remote branch testing out the self-serve Win10 migration capabilities with 1E Shopping (great tool, btw…). At every site, the client has app testing workstations…and this location happened to have 3. Two were running the original Win10 pilot build, and one was still on Win7. While waiting for a couple of users to complete their migration from Win7 to Win10, we decided to upgrade the Win7 test machine using the Shopping portal.

On the CM side, when someone shops for a Windows 10 upgrade, 1E Shopping places that machine into a collection that has the OS Task Sequence deployed to it, then kicks off a policy request on the client.

All was going according to plan, and then we received a call that several users across the enterprise were getting upgraded to Windows 10! We immediately killed the task sequence deployment (noting the DeploymentID) and tried to kill the TS execution on the reported clients, but we did not reach them all in time, and a few were upgraded unintentionally.

With the DeploymentID, we were able to run a Status Message Query for the OS Deployment, and get a list of the machines that received the TS.

Of course, because we had nixed the deployment, when the machines rebooted after installing the new OS, they couldn’t find a task sequence to parse, so they failed out.

No biggie…since we now knew the affected users, we simply copied all the remaining steps to a custom task sequence and deployed it to the affected machines. This finished the application installs and performed the critical User State restore.

We racked our brains to try and figure out what was causing the rogue deployment…and found our root cause once we started investigating the affected clients that got stopped before upgrading to Windows 10: They all had the exact same self-signed certificate from when the workstation was joined to ConfigMgr. Apparently, about a half-dozen machines had been imaged a couple of years ago using a Ghost clone of a domain-joined workstation.

This, my friends, is what happens when your reference image isn’t SysPrep’d. What I did not know is that when the CM Management Point sends out a Deployment Notification, it sends it to the machine’s GUID and not the CM Resource, so every clone sharing that GUID picked up the upgrade. It was just horrible luck that our pilot location included one of these machines, but ironically lucky that we found it early on.

Remediating the issue started with a good Google search of site:social.technet.microsoft.com, and finished with a SQL guy on Microsoft’s support team. Here’s what we did to resolve the issue:

If you suspect duplicate GUIDs in your environment, or need to detect them, this TechNet article gives the following query to run in SQL Server Management Studio:

select * from v_GS_System inner join v_HS_System on v_HS_System.ResourceID = v_GS_System.ResourceID where v_GS_System.Name0 <> v_HS_System.Name0
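
The query above compares current and historical hardware inventory names for the same resource, so it catches the problem indirectly. As a complementary check, here is a minimal sketch that looks for the same SMS GUID on more than one resource record. It assumes that the v_R_System view in your site database exposes ResourceID, Name0, and SMS_Unique_Identifier0; verify those names against your CM version before relying on it.

```sql
-- Sketch only: list SMS GUIDs that appear on more than one resource record.
-- Assumption: v_R_System exposes ResourceID, Name0 and SMS_Unique_Identifier0.
select r.SMS_Unique_Identifier0, r.ResourceID, r.Name0
from v_R_System r
inner join (
    -- GUIDs shared by two or more resource records
    select SMS_Unique_Identifier0
    from v_R_System
    where SMS_Unique_Identifier0 is not null
    group by SMS_Unique_Identifier0
    having count(*) > 1
) d on d.SMS_Unique_Identifier0 = r.SMS_Unique_Identifier0
order by r.SMS_Unique_Identifier0, r.Name0;
```

Any GUID returned here is worth a closer look on the clients themselves before you touch the database.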

 

The first step in resolving the issue is to remove the GUID registration from the CM database. For each GUID that has duplicate entries, run the following:

delete from CollectionMembers where SMSID = 'GUID:A66AA9AA-789A-4626-903A-AA332A660AA1'
delete from ClientKeyData where SMSID = 'GUID:A66AA9AA-789A-4626-903A-AA332A660AA1'
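
Before running the deletes, it may be worth previewing how many rows they will touch. The sketch below uses only the table and column names from the statements above and the same placeholder GUID; the @smsid variable is just a convenience I introduced for illustration.

```sql
-- Sketch only: count the rows the two deletes above would remove.
-- Table and column names come from the delete statements in this post;
-- @smsid is a hypothetical helper variable holding the placeholder GUID.
declare @smsid nvarchar(64) = 'GUID:A66AA9AA-789A-4626-903A-AA332A660AA1';

select 'CollectionMembers' as SourceTable, count(*) as RowsMatched
from CollectionMembers
where SMSID = @smsid
union all
select 'ClientKeyData', count(*)
from ClientKeyData
where SMSID = @smsid;
```

If the counts look wrong for the GUID in question, stop and recheck the GUID before deleting anything.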

 

Then, on your CM server, launch Client Center and connect to the affected PC. To the right of where the PC’s GUID is listed, there’s an option to change the GUID on the next service restart. Click that, then go into Running Processes and stop the SMS Agent Host service.

Next, launch MMC.exe and add the Certificates snap-in. Connect to the affected PC and remove both SMS certificates.

Then, back in Client Center, restart the SMS Agent Host service.

Give it a minute, then on the CM server, check the MP_RegistrationManager.log file and watch your client register with a new GUID.

Now I know that it’s well documented not to put the CM client in your reference image, and it’s certainly well known that the image must be SysPrep’d. What concerns me most, however, is that ConfigMgr allowed these clients to communicate, both ways, using a common GUID. Since there were different resource records, all active, it knew that these were different devices, yet it still allowed them to fully function with an identity conflict.

I, for one, will be checking for duplicate GUIDs before attempting to use ConfigMgr to manage a Windows 10 rollout…
