I currently manage 5 AlwaysOn Availability Groups. Two are on SQL Server 2014 and overdue for an upgrade, while three of them are on SQL Server 2017. From time to time I have run into a couple of different situations that I needed to troubleshoot and I want to tell you where to look and what to check on in these scenarios. I can’t possibly tell you about everything that could go wrong, but I can tell you about my experience with AlwaysOn Availability Groups and let you decide if that experience helps you or not.
A Single Database Is Not Synchronizing
If a single Availability Group database is in the “Not Synchronizing” state, what would you do?
There are several things to check. First you could look in SQL Server Management Studio to see if data movement has been suspended. If it has then you will see an icon similar to two pipes || over the database icon. You will also see messages in the Error Log about data movement being suspended. If this is the case, try to resume data movement. This can be done in SSMS by right clicking the database in the Availability Databases folder and choose “Resume Data Movement”. Alternatively, you could execute the below T-SQL.
ALTER DATABASE [YourDatabase] SET HADR RESUME
Beyond that you can also use PowerShell to resume data movement for the database.
<# gcm is the alias for Get-Command. Below I'm using wildcards to find any commands with the wor "resume" in it and specifying that I want PowerShell to look in the DBATools module. #> gcm *resume* -Module DBATools <# Once I have found a command I think might help me, then I run the alias Help for Get-Help to find out how to use the command #> help Resume-DbaAgDbDataMovement -Detailed #Resumes data movement on db1 and db2 to ag1 on sql2017a. Prompts for confirmation. Resume-DbaAgDbDataMovement -SqlInstance sql2017a -AvailabilityGroup ag1 -Database db1, db2
Second, check the SQL Server Error Log for errors related to the AG and check the AlwaysOn_health xEvent that is running by default for AGs. To check the default health session for AlwaysOn Availability Groups, open SSMS and navigate to Management > Extended Events > Sessions > AlwaysOn_health. Then, click the plus sign next to it in order to drill down to see the file. Right click the file and choose “View Target Data”. After doing that, you can filter and search the file for relevant entries.
If none of these things gives you enough information to resolve the problem, you could also restart the SQL Server service on the secondary that is showing the database that isn’t synchronizing. Again, there are many ways to do this, and I’m going to encourage you to look at the DBATools module in PowerShell to find an automated way to handle this. As a last resort, you could remove the database from the AG and then re-add it to the AG.
None of the Availability Group Databases Are Synchronizing
What if all of the databases are in the “not synchronizing” state, what would you do?
This is a little different because in this case every database on at least one secondary is in the “Not Synchronizing” state. The prior scenario was only about a single database with a synchronization issue.
The primary thing I check in this scenario is whether the cluster service is running on the secondary. If it isn’t, then none of the databases will synchronize. I have seen this happen several times, particularly after secondary servers have been rebooted. You could also have a network disconnect between this replica and the primary. Again though, the SQL Server Error log or the AlwaysOn_health extended events session will show you this. Additionally, you can look in the Cluster Events section of the Failover Cluster Manager for one of the nodes in the AG. In there you may find that you have lost quorum for some reason.
If for some reason you need to get down into the fine details of what is happening with the failover cluster, then check out the following path in the Windows Event Viewer.
Event Viewer > Applications and Services Log > Microsoft > Windows > FailoverClustering. In that location are three logs called Diagnostic, DiagnosticVerbose and Operational.
Next Steps To Take
- Go to YouTube and find videos on Availability Groups.
- Consider taking a class from the fine folks at SQLSkills about Failover Clustering and Availability Groups.
- Reach out to somebody if you have questions. If you would like help with anything in this post, or with something else related to SQL Server, reach out to me here, or on Twitter, and I’ll be glad to offer assistance.