One issue I’ve come across involves Configuration Manager’s (2012 R2+) ability to deploy multiple Software Update Points (SUPs) within a site. The point of this setup is to avoid traditional network load balancing (NLB) and offload the failover work to the clients. One would think that if one SUP is not available, the client simply switches to the next one in the list. That doesn’t always happen as you might expect. Why?

The Why

Whenever a client executes a Software Updates Scan Cycle (scheduled or manual), it attempts to kick off the Windows Update scan. You can watch this scan pass or fail by following WUAHandler.log in the C:\Windows\CCM\logs directory. If you’ve run into this problem, you’ve probably already found that file; either way, you’ll want to note the error code it reports. The code is written in hexadecimal. Example:

Scan failed with error = 0x80072ee2
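If you would rather not scroll through the log by hand, a small PowerShell sketch along these lines can pull out the failed scan codes and count them. It assumes the default log path mentioned above and that failure lines contain the "Scan failed with error = 0x..." text shown in the example; the function name Get-ScanErrorCode is purely illustrative.

```powershell
# Illustrative helper (hypothetical name): return the hex error code found in a
# log line such as "Scan failed with error = 0x80072ee2", or $null if none.
function Get-ScanErrorCode {
    param([string]$Line)
    if ($Line -match 'Scan failed with error = (0x[0-9a-fA-F]+)') {
        return $Matches[1].ToLower()
    }
    return $null
}

# Assumed default client log location; adjust if your client lives elsewhere.
$logPath = 'C:\Windows\CCM\Logs\WUAHandler.log'
if (Test-Path $logPath) {
    Get-Content $logPath |
        ForEach-Object { Get-ScanErrorCode -Line $_ } |
        Where-Object { $_ } |
        Group-Object |
        Select-Object Count, Name
}
```

The grouped output makes it easy to see whether the client keeps hitting the same code on every scan.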

System Center Configuration Manager (SCCM) sees that the scan fails. Once SCCM has caught a failure, it is supposed to retry four more times at 30-minute intervals. After that, it is supposed to switch to the next SUP in the site and wait 2 minutes for the local Group Policy to change and be applied. If this is not happening, it is most likely because Configuration Manager only monitors certain types of errors! This is kind of frustrating. To me, if your load balancer throws an error, you automatically switch over. Generally, we don’t care what error is thrown. An error is an error. Change the SUP!
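To make that behaviour concrete, here is a tiny model of the decision as described in this post. It is only a sketch of the described logic, not the client’s actual implementation; the function name, parameter names and return strings are all made up for illustration.

```powershell
# Hypothetical model of the failover decision described above.
# $FailureCount is the number of consecutive failed scans so far (1 = first failure).
function Get-ScanFailureAction {
    param(
        [uint32]$ErrorCode,
        [int]$FailureCount,
        [uint32[]]$RetryCodes,
        [int]$MaxRetries = 4      # "retry four more times" per the description
    )
    # Errors outside the monitored list never lead to a SUP switch.
    if ($RetryCodes -notcontains $ErrorCode) { return 'StayOnCurrentSup' }
    # Monitored error: keep retrying until the retries are used up.
    if ($FailureCount -le $MaxRetries) { return 'RetryIn30Minutes' }
    return 'SwitchToNextSup'
}

# Example with two of the default retry codes in the list.
$codes = 2147954429, 2147954430
Get-ScanFailureAction -ErrorCode 2147954429 -FailureCount 5 -RetryCodes $codes
Get-ScanFailureAction -ErrorCode 2147954402 -FailureCount 5 -RetryCodes $codes
```

The second call uses an error that is not in the list, which is the case where a client stays on its current SUP no matter how many scans fail.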

Monitored “Retry” Error Codes

In decimal format, the retry error numbers that are monitored by default are:

2149842970, 2147954429, 2149859352, 2149859362, 2149859338, 2149859344, 2147954430, 2147747475, 2149842974, 2149859342, 2149859372, 2149859341, 2149904388, 2149859371, 2149859367, 2149859366, 2149859364, 2149859363, 2149859361, 2149859360, 2149859359, 2149859358, 2149859357, 2149859356, 2149859354, 2149859353, 2149859350, 2149859349, 2149859340, 2149859339, 2149859332, 2149859333, 2149859334, 2149859337, 2149859336, 2149859335

Unfortunately, the errors logged in the log file are in hexadecimal. The easiest way to convert one is to copy the hexadecimal value from the log file, open the good old calculator (calc.exe) in scientific mode, set it to hex, paste the value and then switch to decimal. If the code is in the list above, failover should already be covered and you’ve got some other problems to look into. If it’s not, then we MIGHT need to add it, depending on what your actual problem is.
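If you prefer PowerShell to the calculator, something like the following does the conversion and the lookup in one go. It is a minimal sketch: the list is simply the default codes copied from above, and the function name Test-RetryErrorCode is an illustrative choice, not a built-in cmdlet.

```powershell
# Default retry codes (decimal), copied from the list above.
$defaultRetryCodes = [uint32[]]@(
    2149842970, 2147954429, 2149859352, 2149859362, 2149859338, 2149859344,
    2147954430, 2147747475, 2149842974, 2149859342, 2149859372, 2149859341,
    2149904388, 2149859371, 2149859367, 2149859366, 2149859364, 2149859363,
    2149859361, 2149859360, 2149859359, 2149859358, 2149859357, 2149859356,
    2149859354, 2149859353, 2149859350, 2149859349, 2149859340, 2149859339,
    2149859332, 2149859333, 2149859334, 2149859337, 2149859336, 2149859335
)

# Hypothetical helper: convert a hex code from the log and check the list.
function Test-RetryErrorCode {
    param(
        [string]$HexCode,
        [uint32[]]$RetryCodes
    )
    # Convert.ToUInt32 with base 16 accepts an optional "0x" prefix.
    $decimal = [Convert]::ToUInt32($HexCode, 16)
    [pscustomobject]@{
        Hex         = $HexCode
        Decimal     = $decimal
        IsRetryCode = ($RetryCodes -contains $decimal)
    }
}

Test-RetryErrorCode -HexCode '0x80072ee2' -RetryCodes $defaultRetryCodes
Test-RetryErrorCode -HexCode '0x80072efd' -RetryCodes $defaultRetryCodes
```

With the default list, the first call reports IsRetryCode as False and the second as True, since 0x80072efd is 2147954429.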

To see a list of the retry error codes that the Configuration Manager SITE is monitoring, run this SQL query against the site database, then take a look at the Value2 column.

SELECT * FROM SC_Component_Property PROP join SC_SiteDefinition SCDEF ON SCDEF.SiteNumber = Prop.SiteNumber WHERE Prop.Name = 'WSUS Scan Retry Error Codes' AND SCDEF.SiteCode = 'xyz'

Scenario – SUP is not reachable, Firewall or Offline

The log entry above that failed with error 0x80072ee2 is a very common one. It is a timeout error (ERROR_INTERNET_TIMEOUT), which means the endpoint could not be reached. If it’s not in the retry error code list, the client will NEVER try to switch to a new SUP. With a firewall between the client and the current SUP, every scan returns a 0x80072ee2 error, and it will keep doing so forever. In my opinion and experience, it is a very good idea to add this code to the Retry Error Codes.

To see and prove what the SCCM client is monitoring as WSUS Scan Retry error codes, run the following PowerShell code (You can also use wbemtest.exe).

Get-WmiObject -Namespace "root\ccm\policy\machine\actualconfig" -Class ccm_updatesource -Property ScanFailureRetryErrorCodes | Select-Object -ExpandProperty ScanFailureRetryErrorCodes

Updating the Retry Error Codes list

If you’ve now determined that you want to add a specific error code such as 0x80072ee2 to the retry error code list, we need to do the following:

  1. Convert the error code from hex to decimal
  2. Log onto the Site Server. The site server has the correct WMI classes that we’ll use to update the Site Database
  3. On the Site Server, run PowerShell code to update the Retry Error Code list (See below)
  4. SQL Server Double Check
  5. Test the clients

Convert the error code

The first thing that needs to be done is to convert the error code from hexadecimal to decimal.

Example 0x80072ee2 = 2147954402
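A quick round trip in PowerShell is a cheap way to catch a mistyped digit before it ends up in the site configuration. This is just a sketch; the variable names are arbitrary. Note that the conversion goes through [Convert]::ToUInt32 on a string, because as far as I can tell a bare 0x80072ee2 literal is read by PowerShell as a negative 32-bit integer.

```powershell
# Hex code as it appears in WUAHandler.log.
$hex = '0x80072ee2'

# Hex string -> unsigned decimal (base 16; the "0x" prefix is accepted).
$decimal = [Convert]::ToUInt32($hex, 16)

# Decimal -> hex again, to confirm we get back what we started with.
$roundTrip = '0x{0:x8}' -f $decimal

"$hex = $decimal (round trip: $roundTrip)"
```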

Log onto the Site Server

Log onto the site server. The only things you need to edit are the two variables at the top: set RetryCodeToAdd to the decimal value of your error code and SiteCode to your SCCM site code.

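# Decimal value of the error code to add (0x80072ee2 = 2147954402) and your site code
$RetryCodeToAdd = "2147954402"
$SiteCode = "xyz"

# Load the WSUS Configuration Manager component from the site's WMI provider
$instance = Get-WmiObject -Namespace "root\sms\site_$($SiteCode)" -Class sms_sci_component -Filter "componentname='SMS_WSUS_CONFIGURATION_MANAGER'"
$instance.Get()
$arrayIndex = $instance.Props.propertyName.IndexOf("WSUS Scan Retry Error Codes")
Write-Host "Value2 retry codes before"
$instance.Props[$arrayIndex]

# Append the code just before the closing brace, unless it is already in the list
$props = $instance.props
if ($props[$arrayIndex].Value2 -notmatch "\b$RetryCodeToAdd\b") {
    $props[$arrayIndex].Value2 = $props[$arrayIndex].Value2 -replace "}", ", $RetryCodeToAdd}"
}
$instance.props = $props

Write-Host "Value2 retry codes after"
$instance.Props[$arrayIndex]
# Save the change back to the site
$instance.Put()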

SQL Server Double Check

The update we just made with the PowerShell code takes effect immediately. Run the SQL query (the same one as above) to validate that the retry error code is now in the list in the Value2 column. The new list will be sent down in the control file the next time the client (or you) initiates a Machine Policy Retrieval & Evaluation Cycle.

SELECT * FROM SC_Component_Property PROP join SC_SiteDefinition SCDEF ON SCDEF.SiteNumber = Prop.SiteNumber WHERE Prop.Name = 'WSUS Scan Retry Error Codes' AND SCDEF.SiteCode = 'xyz'

Test the Clients

  1. Force a Machine Policy Retrieval & Evaluation Cycle from a client
  2. Run the Get-WmiObject PowerShell code from the section Scenario – SUP is not reachable, Firewall or Offline above. You should now see the retry code! It can take some time to propagate through the infrastructure, so your best bet is to take a coffee break or a long lunch, enjoy your evening and check tomorrow. I’ve found that forcing a Software Update Scan Cycle sometimes helps speed this up.
  3. Monitor the WUAHandler.log on the client, C:\Windows\WindowsUpdate.log might be helpful too.
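The client-side steps above can also be scripted. The following is a rough sketch for Windows PowerShell, run elevated on a client; it assumes the commonly used TriggerSchedule IDs for the Machine Policy and Software Updates Scan cycles and the same WMI class queried earlier in this post. The $codeToFind variable is just an example value.

```powershell
# Decimal code we expect to see on the client (0x80072ee2).
$codeToFind = 2147954402

# Schedule IDs as commonly documented for these client actions (assumed here).
$machinePolicyRetrieval  = '{00000000-0000-0000-0000-000000000021}'
$machinePolicyEvaluation = '{00000000-0000-0000-0000-000000000022}'
$softwareUpdateScan      = '{00000000-0000-0000-0000-000000000113}'

# Kick off policy retrieval/evaluation, then a software update scan.
foreach ($id in $machinePolicyRetrieval, $machinePolicyEvaluation, $softwareUpdateScan) {
    Invoke-WmiMethod -Namespace 'root\ccm' -Class 'SMS_Client' -Name 'TriggerSchedule' -ArgumentList $id | Out-Null
}

# Read the retry codes the client currently holds and look for ours.
$codes = Get-WmiObject -Namespace 'root\ccm\policy\machine\actualconfig' -Class ccm_updatesource |
    Select-Object -ExpandProperty ScanFailureRetryErrorCodes

if ($codes -contains $codeToFind) {
    Write-Host "Retry code $codeToFind is present on this client."
} else {
    Write-Host "Retry code $codeToFind not found yet; policy may not have arrived."
}
```

The policy will usually not have arrived by the time the check runs right after the triggers, so expect to re-run the last part later.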

Most of the information can be found in the following blog, https://blogs.technet.microsoft.com/umairkhan/2014/10/02/configmgr-2012-r2-multiple-sup-scenario-clients-not-failing-over-to-the-other-sup/ . Hopefully the PowerShell above helps automate and eases your pain!

Happy patching!