I have worked quite a bit (too much?) with SQL Server replication. It has mostly been Transactional Replication and less Merge Replication.
A very common scenario I come across in both real life and in forum discussions is SQL Server Replication and Availability Groups.
This is the first in a short series of posts about replication and Availability Groups. As time permits, I will also finish posts on the Distributor and Subscriber roles in Availability Groups and finalize with a post on war stories from trying to move a distributor and subscribers from standalone instances to Availability Groups.
Picture of Replication Protein A3 from wikipedia, licensed under CC-BY-SA-3.0
I’m not going to give very detailed instructions. That’s already well described in official documentation. Use this page as a starting point: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-replication-for-always-on-availability-groups-sql-server?view=sql-server-ver15
Instead I’m gonna try to summarize which scenarios are supported and which are not supported and a brief overview of the steps included. I will also share some of my experience too, gotchas and success factors.
Setup Publisher in Availability Group
A publisher can be part of an availability group. Both transactional replication and merge replication are supported scenarios.
Assuming you have already setup the distributor, these are the steps. Order does matter. I’m also assuming you have already setup the Availability Group before configuring replication publisher. If that’s not the case, aka you have already setup a publisher and then add them to availability group, this is also supported and you will just have to skip the steps that have to do with setting up the primary replica as publisher in this list.
- Install replication features on all replicas.
- Configure the primary replica and all secondary replicas as publishers in the distributor (with sp_adddistpublisher).
- Configure the distributor at all replicas (with sp_adddistributor).
- Enable the database(s) for replication (with sp_replicationdboption) in the primary replica.
- (Documentation say linkedserver-connections are to be setup from publisher to push subscribers. I don’t do this, and more specifically, I don’t always know which subscribers I will have when I setup publications).
- On distributor, redirect publisher to Availability Group Listener name (using sp_redirect_publisher)
- Validate redirection works (with sp_validate_replica_hosts_as_publishers). This step will fail for secondary replicas not setup for read access.
It might come out as strange that the validation step fails. Usually you don’t want red errors in your result. But to get rid of red errors in the validation, you will need to pay additional enterprise licenses for the secondary replicas. If you won’t use the secondary replicas for read-only workloads, it’s not gonna be worth spending the extra money on additional enterprise licenses.
Test, test, test
I strongly encourage you to test this scenario and also test failing over to all secondary replica hosts.
On a busy system, you will have very, very, very nasty side effects when your publisher isn’t able to send stuff to the distributor. You will see transaction logs fill up and you will see replication as log_reuse_wait_desc in sys.databases. What that means is your transaction logs will grow, grow and then grow some more, until there’s no more disk to grow on and your databases will stop. If you miss this, and get a call in the middle of the night (these things always happen around 2AM) you’re gonna find yourself in a fun situation. A fast way to get rid of traces of replication is to set DB to single recovery mode and then run sp_removedbreplication. But it’s an Availability Group your dealing with here, and simple recovery mode doesn’t play well with Availability Groups. So your other option will be to reconfigure replication on the now primary replica, publish something and then remove replication. Either way, you’re into a mess you don’t want to deal with in production.
Script, script, script
As part of your setting up replication publishers on Availability Groups in your sandbox environment (that is where you should start, if you don’t have such environment, create one, it’s worth the effort and money), create scripts and make sure those scripts work.
- Script to setup replication.
- Script to add secondary replicas including setting up replication on them.
- Script to fail back to your original primary replica (where you have a greater chance of logreader agents actually working and not filling up transaction logs).
- Script to remove all traces of replication.
The script to remove all traces of replication is an important one. It might have to start with enabling replication, with the distribution database on the publisher and then removing it, as some commands to remove replication won’t work if you can’t reach the distributor.
It doesn’t matter if you create SQL-scripts, Powershell scripts or anything else. But please do what you can to automate creating and destruction of replication. Test these scripts. Not just the normal flow, but the odd quirky ones too. Test from the DR-site, to make sure you have firewalls open to everything you need. Test on all replicas. Test what happens when distributor is offline.
Monitor, monitor, monitor
There are a million ways to monitor your SQL Server environment. Choose your favourite flavour. But make sure to monitor your replication setup. Monitor transaction log usage on the publisher side. If your published databases’ transaction log usage remains high after transaction log backups, you want to set off alarms. You want to monitor messages from the Log Reader Agents. Your Log Reader Agents might seem to work just fine, in the way that they are in a running state. That doesn’t mean they are sending anything at all to the distributor. Try to monitor end-to-end. Does an inserted or changed row from a published table in the publisher end up in the distributor? If it doesn’t, you have a problem. And you want to solve that problem before your index maintenance jobs start filling up the transaction log the next night.
Do I hate replication?
Reading my warnings above, it might look like I hate SQL Server replication. That’s not the case. Transactional replication is a much cheaper solution than readonly AG secondaries for reporting scenarios. And you can choose to replicate only the stuff you need for reporting instead of the whole database. And you can add your reporting-specialised indexes on your reporting database. And, and, and…
But be careful. You don’t want to called in at 2AM because your production database died. Especially if the reason it failed is because you weren’t careful.