My Availability Group Database Isn’t Synchronizing

I currently manage 5 AlwaysOn Availability Groups. Two are on SQL Server 2014 and overdue for an upgrade, while three of them are on SQL Server 2017.  From time to time I have run into a couple of different situations that I needed to troubleshoot and I want to tell you where to look and what to check on in these scenarios. I can’t possibly tell you about everything that could go wrong, but I can tell you about my experience with AlwaysOn Availability Groups and let you decide if that experience helps you or not.

A Single Database Is Not Synchronizing

If a single Availability Group database is in the “Not Synchronizing” state, what would you do?

There are several things to check. First you could look in SQL Server Management Studio to see if data movement has been suspended. If it has then you will see an icon similar to two pipes || over the database icon.  You will also see messages in the Error Log about data movement being suspended. If this is the case, try to resume data movement. This can be done in SSMS by right clicking the database in the Availability Databases folder and choose “Resume Data Movement”. Alternatively, you could execute the below T-SQL.

ALTER DATABASE [YourDatabase] SET HADR RESUME

Beyond that you can also use PowerShell to resume data movement for the database.

<#
gcm is the alias for Get-Command. Below I'm using wildcards to find any commands with the wor "resume" in it
and specifying that I want PowerShell to look in the DBATools module.
#>
gcm *resume* -Module DBATools

<#
Once I have found a command I think might help me, then I run the alias Help for Get-Help
to find out how to use the command

#>

help Resume-DbaAgDbDataMovement -Detailed

#Resumes data movement on db1 and db2 to ag1 on sql2017a. Prompts for confirmation.

Resume-DbaAgDbDataMovement -SqlInstance sql2017a -AvailabilityGroup ag1 -Database db1, db2
    

 

Second, check the SQL Server Error Log for errors related to the AG and check the  AlwaysOn_health xEvent that is running by default for AGs. To check the default health session for AlwaysOn Availability Groups, open SSMS and navigate to Management > Extended Events > Sessions > AlwaysOn_health. Then, click the plus sign next to it in order to drill down to see the file. Right click the file and choose “View Target Data”. After doing that, you can filter and search the file for relevant entries.

Extended Events AlwaysOn Health

If none of these things gives you enough information to resolve the problem, you could also restart the SQL Server service on the secondary that is showing the database that isn’t synchronizing. Again, there are many ways to do this, and I’m going to encourage you to look at the DBATools module in PowerShell to find an automated way to handle this. As a last resort, you could remove the database from the AG and then re-add it to the AG.

None of the Availability Group Databases Are Synchronizing

What if all of the databases are in the “not synchronizing” state, what would you do?

This is a little different because in this case every database on at least one secondary is in the “Not Synchronizing” state. The prior scenario was only about a single database with a synchronization issue.

The primary thing I check in this scenario is whether the cluster service is running on the secondary. If it isn’t, then none of the databases will synchronize. I have seen this happen several times, particularly after secondary servers have been rebooted. You could also have a network disconnect between this replica and the primary. Again though, the SQL Server Error log or the AlwaysOn_health extended events session will show you this. Additionally, you can look in the Cluster Events section of the Failover Cluster Manager for one of the nodes in the AG. In there you may find that you have lost quorum for some reason.

If for some reason you need to get down into the fine details of what is happening with the failover cluster, then check out the following path in the Windows Event Viewer.

Event Viewer > Applications and Services Log > Microsoft > Windows > FailoverClustering. In that location are three logs called Diagnostic, DiagnosticVerbose and Operational.

Next Steps To Take

  1. Go to YouTube and find videos on Availability Groups.
  2. Consider taking a class from the fine folks at SQLSkills about Failover Clustering and Availability Groups.
  3. Reach out to somebody if you have questions. If you would like help with anything in this post, or with something else related to SQL Server, reach out to me here, or on Twitter, and I’ll be glad to offer assistance.

 

Lessons Learned Updating to SQL Server 2017 AlwaysOn Availability Groups

In late 2019, a project that had been in progress for well over a year finally came to a conclusion.  I had collaborated with a number of people to migrate a stand alone SQL Server 2008 and two stand alone SQL Server 2008 R2 instances.  Each stand alone SQL Server was migrated to a three node Availability Group. Here are a few things learned along the way.

Automatic Seeding

First, consider using automatic-seeding if you are migrating to AlwaysOn Availability Groups.  I hadn’t previously worked with anything newer than SQL Server 2014 so I was unaware of this feature.  It is new as of SQL Server 2016. In this case, the old SQL Servers were migrating up to SQL Server 2017, so this option was available to us.  For a couple of months leading up to the migration event there had been various tests involving the set up of the AGs.  One of those tests involved restoring a full backup followed by one or more log backups over on the secondary nodes.  As the migration date approached, we started experimenting with auto seeding.

The auto-seeding feature streams the database from the primary to one or more secondaries behind the scenes.  This is accomplished using the endpoints created during the Availability Group creation process. This feature can eliminate the need to create scripts to handle the restores of databases over to secondary nodes in the AG.  

There are four ways to monitor the process of the streaming of the databases to the secondary nodes.

  • First, the SQL Server Error Log makes entries about the process.
  • Second, various DMVs contain information about the status and progress of automatic seeding. 
  • Third, Backup history tables in MSDB can be queried.
  • Fourth, Extended Events can be used to monitor the progress.  We chose to query DMVs during the migration.

Handling Connection Names

Second, ponder how to handle the scenario where connection names for applications change. As it turns out, a lot of meaning and importance is attached to names.  The computer and SQL Server instance names we were dealing with were deeply ingrained in the business. The names had been around for years and were used everywhere in connection strings.  If we migrate to new SQL Server instances, how do we handle that? 

One option is to create an alias in DNS that points to another location. Several months before the actual migration, we changed the computer names for each of the stand alone SQL Servers. The Infrastructure team used the previous computer names as aliases that then pointed to the newly named computers. This does require a change inside SQL Server, by the way.  SQL Server is aware of the computer it is installed on, so if you do this, know that you have to make a change in SQL Server as well. 

Let me explain the concept by way of example. Let’s suppose you have a computer called MyServerABC and it has a default instance of SQL Server on it.  This means that database connections also use the name MyServerABC to talk to the databases.  That connection name might be used in a lot of applications.  So that tens, or maybe even hundreds, of connection strings don’t need to change when the SQL Server is migrated, here is what you can do. During a maintenance window, change the name of the computer and make a change to SQL Server so it knows the new computer name it is on. Here is a link to describe how to do that in SQL Server and it has a discussion of other considerations when doing this sort of thing.

Let’s suppose the new computer name is MyServerXYZ. After the previous actions are taken, in DNS a new record is made called “MyServerABC”.  That alias is then “pointed” or directed to the computer name “MyServerXYZ”.  Now, applications that use MyServerABC as a connection name can continue using that connection name.  When the application makes a call to MyServerABC, that call is simply re-routed to MyServerXYZ and the application is none the wiser.

This exact change process worked for us early on.  Later, once we migrated the first stand alone SQL Server to an Availability Group, a person on the Infrastructure team changed the alias so that it pointed to the AG Listener instead of the old server we just migrated from.  Connections were then instantly routed to the new SQL Server 2017 AG. 

Database Migration Assistant

Third, the Database Migration Assistant is software from Microsoft that evaluates a SQL instance and the databases on that SQL Server for readiness to be migrated to a new platform. This tool can help migrate databases to Azure, but we were executing a migration involving on-prem to on-prem.  The tool provides an assessment of and information regarding scenarios in use in your environment that might block a successful migration as well as offering information on new features to leverage in the new SQL Server. 

We were able to find several things in our environment that had to be changed prior to the migration using this software.  Without the assessment, we would have migrated only to experience failures of various kinds in the new environment.  

Planning Software

planning

Fourth, use planning software of some kind so the migration steps are known and progress can be tracked.  One of the things I chose to do was plan out each step and create an electronic project board for all the things that needed to happen the day of the migration.  I had been using Office 365’s Planner at that time, so I created my steps there.  One of the Developers converted that to Trello because that is what the Dev teams were using at the time, and the Devs were at the office with us to help with testing applications post-migration.  The Trello board was made visible on a couple of large, wall mounted TV’s for everyone to see.  This allowed everyone to know what had been done, what was being worked on and what was next.

Communication

Fifth, think carefully about how the migration team will communicate during the migration.  (All in the same physical room? Company conference bridge while working remote? Conference call with a support vendor? Remote work platform like Teams or Slack? Some combination of these things?)

If you have a big room that will hold everyone, that might be a great option. Remember, this happened back in early December 2019, pre-covid. So, having a dozen or so of your favorite co-workers all in a room made sense.

Meeting together allows for easy, fast communication by simply turning to the person in the room to discuss something or ask a question. Collaboration is smooth and convenient in that scenario.  As I mentioned earlier, we leveraged a Trello board that was displayed on a couple of large TVs so everyone can see what has been worked on, what is currently being worked on and what is next.  

Be aware and mindful of the human tendency to talk when idle though.  If you get a dozen people in a room and only a few of them are busy at the moment, there could be a significant amount of chatter to block out for the people who are busy at that moment.

These days, now that the U.S. is experiencing an explosion of work from home due to the pandemic, you would probably set up something like Teams or Slack and use that as your collaboration platform so people can collaborate from home on migration day.  Platforms that allow for phone calls, screen sharing and quick collaboration among large groups of people would be essential in a large scale, high impact IT project. In those scenarios, the people collaborating still need to think carefully about communication so that people aren’t talking over each other while addressing the migration work. 

Well, that’s it for now.  I hope these points help you consider some things that maybe need to be included or discussed as part of your next IT project.  

Next Steps To Take

  1. Download and try out the Database Migration Assistant. 
  2. Check out Office 365’s Planner/Tasks app or something like Trello or Jira to help you not only with migration planning, but stayong on top of your every day workflow..
  3. If you would like help with anything in this post, or with something else related to SQL Server, reach out to me here, or on Twitter, and I’ll be glad to offer assistance.