Thursday
Jul222010

Building a SQL 2008 Failover Cluster

Had a chance to build my first failover cluster for SQL 2008 and I must say a few things have changed since I built my last one on SQL 2005.

This article contains the following high-level checklist items for a successful SQL 2008 clustering experience:

  • Hardware considerations
  • Server Build gotchas
  • Network configuration considerations
  • SQL installation tips

The Hardware

You've got to have dual-homed servers so you can establish a heartbeat connection between all the nodes (we'll assume two nodes in an active/passive configuration for the context of this article).  That's how the servers determine which one is available to host the clustered services.

Shared Drives - Obviously you'll need a drive(s) for your data and logs that will have to be available to both nodes of the cluster.  In addtion to those you'll need clustered resources of the type "drive" for the following:

  • Quorum or "witness" drive (traditionally Q:\ )
  • The Distributed Transaction Coordinator ("DTC") clustered resource

The Server Build(s)

As you build up your OS take care to do a few basic tweaks:

Windows Firewall - configure it so it's not filtering traffic needed by SQL Server (ports 1433, SQL Browser on 1444, and any non-standard ports you've selected for additional instance traffic).

UAC - assuming you're building on Windows 2008 find a happy medium where you're not being bombarded with warnings

Anti-Virus - if you don't have an AV solution that will handle SQL server be sure you exclude from scanning the usual suspects:

The quorum drive (again - usually Q:\)

Any SQL-related files:  .mdf .ldf .ndf .bak .trn

Configure domain service accounts sufficient for running SQL.  These should have auditing rights, log on as service, log on as batch job, et cetera. 

Personally I would create brand new accounts for these as lockouts and permissions changes will affect your clustered instance of SQL if it's running under these credentials.  Incidentally this account inexplicably gets locked out between installing the first and second nodes of your cluster. 

Add the .NET framework and failover cluster support from the "Features" node of Server Manager.  Install your .NET framework components without doing a full-fledged web server configuration.

When you add the failover clustering from features leave it with the default NETWORK service account and not a domain account.

Now go back and complete each of those configuration tweaks on the other server.  Be sure to match all nodes up the best you can.

From cluadmin.msc run your verification tests.  If you get any disk failures be sure to remember your friend chkdsk /f .

Configure the DTC service as a clustered resource (right-click on your cluster in cluadmin.msc then "Configure a Service or Application").  As noted above you'll need to allocate a separate drive to this - albeit one with minimal disk space.

The Network

Allocate all the necessary IP's.  A failover cluster with two nodes will need at least seven IP addresses: 

  • One for each server on the "private" network for the two servers' heartbeat connections (no default gateway needed for these)
  • One for each server to communicate on the "public" network - think of your "traditional," non-clustered configuration.
  • One clustered resource IP address for the cluster itself.  This isn't tied to a NIC specifically, but it's the IP all your clients will point to and will route them to whatever node owns the SQL clustered resource.
  • One clustered resource IP address for the Distributed Transaction Coordinator ("DTC" going forward)
  • One for your SQL clustered instance which you're prompted for during the SQL installation.  This will be the traditional "Server Name" value that your clients will want to point their connection strings to.

These last three "clustered resource" IP's relate directly to your cluster and not to a specific NIC.  You'll want to be sure the addresses you choose for these aren't being distributed by DHCP servers, so talk to your network people to get those excluded.

Now change your binding order.  You don't want the servers to try to route traffic bound for the network over the "private" heartbeat NIC's.  So navigate to Start > Run > ncpa.cpl and visit the Advanced menu.  Uncheck File/Printer services on the heartbeat NIC's and change the binding order to put the domain/public NIC at the top of the list.

Finally get to know your WAN people.  Any firewalls separating your cluster from your clients are going to question traffic that's aimed at your cluster's IP, yet responded to by the public IP address of one or the other of your clustered SQL Servers.  That looks to be like a spoofing attack to Mr. Firewall, so make sure your WAN guys know what the hell you're intending to do.

I can't stress this last point enough - especially if you're doing something funky like offering a named SQL instance over a non-traditional port.  You won't be able to secure your server if you allow traffic to the entire range of ports that SQL Server could potentially offer it over, so be sure to specify specific dynamic port(s) with your client configuration tools.

The SQL Installation

If your SQL installation media is old then download and install SQL 2008 Service Pack 1 first before you install anything else.  Nothing worse than an "invalid SKU" error greeting you at the end of a *long* installation. 

Don't be surprised if your credentials for the SQL service account don't work as they're most likely locked out.  Had this happen twice in row when installing two disparate failover clusters.  Happily I only had to unlock the account once and it never happened again.

Use the clustering options of the SQL 2008 installation:

  • Install the first node with the New SQL Server Failover Cluster option under the "install" heading in the setup menu.
  • Be sure to Add node... (also under "install") when installing the second node. 

Go back and apply SP1 again now if your media was prior to SP1.  You'll have a ton of additional components that'll need to be updated.

Finally I had issues with keeping my license key in the .ini file, so I cleared it out of the .ini file and entered the same key manually.

Final Considerations

If you're installing an instance other than the default you could leave port 1434 open for the SQL Browser service or you could open your firewall to the entire range of dynamic ports that SQL may choose to serve the instance over.

The best option though is to use SQL Server Configuration Manager to hard-code the Dynamic TCP port.  That way your instance is available over the same port every time. 

Navigate to SQL Server Network Configuration > Protocols for YourInstance.  Now right-click on TCP/IP and visit the IP Addresses tab.  Under the IP All section at the bottom you can hard-code a value for TCP Dynamic port.

Be sure to clear the TCP field directly below it.  This will ensure your instance comes up on this port every time.

Summary

  • Don't forget to make applicable configuration changes to both servers.
  • Check to be sure SQL is coming up reliably on the same port each time.
  • Don't configure both a TCP and a TCP dynamic port to be the same.
  • Use the portqry tool from Microsoft to ensure your ports are open
  • SP1 installation first - depending on how old your SQL 2008 installation media is

I've attached a couple of great links for step-by-step cluster configuration and also for setting up aliases.  Pay special attention to this last if you are setting up a secure server where not all ports will be open between the clients and server.