Thursday, November 13, 2008

Database Mirroring in Microsoft SQL Server 2005: The finer points

I have often wondered why people still use features like log-shipping, transactional replication, etc for their disastery recovery solutions rather than use simpler options. A software DR solution should be easy to implement, be effective and safe as well while ensuring near zero or zero data loss by itself.

Some drawbacks of log-shipping are following:
a.] You need frequent log backups of the production database.
b.] The recoverability of your DR database depends largely on the frequency of transaction log backups and the time taken to copy such backups from production to DR and the time taken to restore these backups.
c.] The log-shipped database remains dormant except having production transaction log backups getting restored on to it.
d.] From an application end-user perspective, the DR database is useful only after a disaster, not anytime else.
e.]The entire transaction log backup is restored at DR and then Uncommitted transactions are rolled back (which is fine). It is not that only committed transactions go from production to DR. So there is unnecessary avoidable network traffic.
f.] The physical transaction log backup file has to be copied to the DR causing avoidable network traffic and causing pressure on the DPC [NIC] CPU's.
h.] With log-shipping, there is always a scope for data loss in case of disasters. And companies do compromise on this front which is not needed.
i.] Too many jobs are in place to run log-shipping (to backup logs at source, copy them to DR and restore them at DR).
j.] Too many time-consuming processes involved when failure occurs especially if the database involved in huge (VLDB).
k.] Failover is always manual.
l.] Even if the data loss is of 10 minutes, the actual downtime would be atleast an hour at a minimum. This directly means loss of revenue for that period.
m.] Manual user redirection in case of failure of the main server/database is needed.
n.] Restored transaction log backup files on DR have to be deleted. Continuous copying and deleting of files make the DR look like a file server and of couse, causes the discs over there to get fragmented much faster.

Drawbacks of Replication:
a.] Complex process to set up involving snapshot replication, etc to begin with.
b.] In certain types of replication, an extra column (GUID) is created at the end of each replicated table causing an increase in disc space usage.
c.] Complex conflict resolution rules have to be put in place during setup.
d.] A failure in replication could actually mean reinitialization of subscriptions most of the time. That's a time-consuming process and means that DR would not be technically existing during that period if replication is being used for DR purposes.
e.] Two things to manage instead of one. You need to manage Production database as well as the replication for failures, conflicts, maintenance, etc.
f.] Needs some skill to manage replication and its own set of problems.
g.] Of course there are certain merits in replication like updatable subscriptions, merge replication, bi-directional transactional replication, availability of the subscriber database for reads/writes, etc. But the drawbacks above make replication a more tediuos and time-consuming operation/process than necessary.
h.] Manual failover and user redirection in case of failure of the main server/database.

What is the easiest alternative? Database Mirroring is. Why? Because of the following reasons:

a.] Easy to set up, easier still if log-shipping is already in place and needs to be replaced with database mirroring.
b.] Easy to monitor. Inbuilt Microsoft Database Mirroring Monitor in SQL Server Management Studio.
c.] Manual or automatic failover.
d.] Separate counters in Windows Performance Monitor for database mirroring.
e.] Dynamic Management Views for up to the point-in-time information on the principal and mirror databases, lag time between mirror and principal and so on.
f.] Minimal or no data loss even in asynchronous mode.
g.] Guaranteed zero-data-loss in synchronous mode.
h.] Possibility of automatic user redirection in case of failures.
i.] Hardly requires maintenance. Set it up and forget it.
j.] Easy to failover. It's a one line code to failover and failback.
k.] Makes other maintenance jobs like Windows patching and SQL Server patching easy with no downtime.
l.] Possibility to use mirror for reporting application using the snapshot feature.
m.] Requires least amount of maintenance of all the DR options.
n.] Easy to use.
o.] Easy to change between different modes of mirroring (synchronous and asynchronous).
p.] Only committed transactions flow to mirror. So less network banmndwidth consumed.
q.] No file transfers involved.
r.] No conflict resolutions needed.
s.] Miniam disc fragmentation as no copying and deleting fo files is involved.
t.] Maintenance actions on principal database automatically move over to mirror. I am talking of operations like shrinking log/data files, etc.

I am not going into the set up of mirroring as it is simple, easy to use, and easy to code. In roughly 10-15 lines of TSQL code, mirroring can be established in the desired mode.

Some points which I would consider important with respect to mirroring are as follows:

Do not install mirror database in a different node of the same cluster where principal resides. This would be a disastrous approach in an active-active cluster when one node has to failover to the other. Always keep you mirror database on a separate cluster or machine if you don't have a cluster.

Do not keep the transaction log file size unnecessarily large. If say your peak transaction log file consumption is 5GB for example, it would be good to have the actual file size within 10GB to 20GB. Don't keep it at say 50GB. That slows down writes to the transaction log file on principal and hence also on mirror. The point is don't keep too much empty space in the transaction log file on principal. Expand it as needed.

Don't create full or differential database backups on principal when you are in the process of restoring logs on to the mirror database to bring it upto speed with the principal database. If by mistake that does happen, stop any scheduled transaction log backups on principal and let the mirror sync up wth principal on its own although it would take a few minutes for the mirror to catch up and get synchronized with the principal this way. While the catching-up process is going on, it would be better to stop transaction log backups on the principal as well. That is okay. Patience helps in this avoidable scenario. After the synchronization is established, turn on the jobs for transaction log backups, full backups and differentail backups back on the principal. Please do consider this point during the setup process for database mirroring just in this avoidable case where somebody performs a full or differential database backup of the principal database by mistake while you are setting up database mirroring.

Start with asynchronous mode to begin with. Observe for a few days how the transactions are doing, whether there is any piling-up of log on mirror and how often the entities are not in sync per day. If you find synchonization 90% of the time, you can consider moving on to synchronous mode if need be.

I would suggest using TSQL code instead of the GUI for the setup. There is a lot more flexibility in TSQL coding than in the GUI. And it is a lot easier to debug/rollback/comeback/startover with TSQL during the setup process if something goes really bad due to circumstances beyond your control. Mirroring is so straightforward that it would not go wrong if you are following the 10-11 simple steps to setting up mirroring. The GUI is good for monitoring though and for setting up alerts.

Do not use synchronous mode if the CPU utilization is more than 50% on your principal server averaged for a given hour. This can potentially cause unwanted automatic failovers. In such a case, either set right you application / database code/architecture/design/indexes, etc first or use asynchronous mode of mirroring only.