Veeam Cloud Connect – Cloud Gateway failover

During a recent Veeam Cloud Connect POC, I was testing the failover of the Cloud Gateway devices to observe how a tenant backup job would handle a gateway failure. My POC setup had only two Cloud Gateway devices, GW1 and GW2.


I kicked off a test backup, verified which Gateway server was being used (GW1 in this instance, judging by the interface traffic), and then simulated a Gateway failure by powering down the VM.


As expected, the transfer rate on my backup job dropped to zero, but I was then pleasantly surprised to see it climb back to pre-failure levels. Logging onto GW2, I could see the interface traffic increasing, indicating that GW2 had picked up the workload. I then brought GW1 back online.

Naturally, I then wanted to simulate a failure of GW2, so I powered down that VM as well.


As expected, the backup transfer rate dropped to zero, but this time it stayed at zero and eventually the job failed. I had expected GW1 to pick up the load, so I reached out to the excellent Veeam forums to understand why.

The feedback was that when a job starts, Cloud Connect provides a primary, secondary and tertiary Cloud Gateway to connect to, but the client does not cycle back through the list once it is exhausted. The list of available gateways is collected at job start or retry and is not updated within an existing session. In my test, that meant GW1 had already been tried and passed over earlier in the session, so when GW2 went down there was nothing left on the list. The resident Veeam Cloud Connect expert, Luca Dell’Oca, then advised that this information would be documented in the updated Veeam Cloud Connect reference document:

Extract from document

Finally, a note on the failover process of Cloud Gateways from a end user perspective: the list of available gateways is retrieved by the end user component of Cloud Connect upon any job start or retry. The available gateways are listed in a specified order where the first usable gateway is assigned #1, the second #2, and so on. The number assignment and so the priority is not fixed, but depending as said on actual load of all repositories.

As long as the gateway marked as #1 is available, the end user keeps using this one. As soon as this gateway is not available anymore, a new connection is automatically tried against #2; if this is available, the connection is automatically established and any running job is continued; if not, a connection is tried against the next gateway on the list. When all the gateways have been tried unsuccessfully, down to the last one, the running job fails and a new list is retrieved for the following retry.
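
To make that behaviour concrete, here is a minimal Python sketch of how I understand the logic. It is only a model of the description above, not Veeam's actual code or API: the `GatewaySession` class, its `current_gateway` method and the `up` dictionary are all hypothetical names I made up for illustration, and the fixed GW1, GW2 ordering is an assumption (the real order depends on load).

```python
class GatewaySession:
    """Hypothetical model of one job session (not a real Veeam API).

    The gateway list is a snapshot taken at job start or retry and is
    never refreshed while the session is alive.
    """

    def __init__(self, ordered_gateways):
        self._gateways = list(ordered_gateways)
        self._index = 0  # position in the list; it only ever moves forward

    def current_gateway(self, is_available):
        """Return a usable gateway, or None once the list is exhausted."""
        while self._index < len(self._gateways):
            gateway = self._gateways[self._index]
            if is_available(gateway):
                return gateway
            # This gateway is abandoned for the rest of the session,
            # even if it comes back online later.
            self._index += 1
        return None


# Replay of my POC test with two gateways.
up = {"GW1": True, "GW2": True}
session = GatewaySession(["GW1", "GW2"])

first = session.current_gateway(up.get)   # job starts on GW1

up["GW1"] = False                         # power down GW1
second = session.current_gateway(up.get)  # job fails over to GW2

up["GW1"] = True                          # GW1 is back online...
up["GW2"] = False                         # ...then GW2 is powered down
third = session.current_gateway(up.get)   # list exhausted, job fails

# A retry starts a new session with a freshly retrieved list.
retry = GatewaySession(["GW1", "GW2"])
fourth = retry.current_gateway(up.get)    # retry connects to GW1
```

The key design point in this model is that the position in the list never moves backwards, which matches what I saw: GW1 coming back online made no difference to the running session, and only a job retry with a fresh list could use it again.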


The updated reference architecture document can be downloaded here: http://www.veeam.com/wp-cloud-connect-reference-architecture-veeam-backup-replication-v8.html