Design Goals & methodology for going from a single file database to a multiple file one

Oopps! Upgrade your browser pretty please. Oopps! Upgrade your browser pretty please.

Firstly what is a Database in SQL Server ?

 

There are probably many definitions but one aspect is that a database has a transaction log and therefore the capability to coordinate many concurrent transactions, if recovery is needed completed transactions can be rolled forward and incomplete transactions can be rolled back thereby recreating a consistent database.

 

Applications that use multiple databases will either have to use the MSDTC – distributed transaction coordinator or risk an inconsistent recovery should it be required.

 

Why use multiple Files ?

 

Better performance on backup, restore and the ability to place files on different drives giving rise to better IO throughput and better performance.

Additionally a SQL Server database on the enterprise edition can be online once the Primary Filegroup (containing the system objects) is online.

Each filegroup can be backed up and restored separately.

 

Design

 

Database — transaction log

|

|

 – < Filegroup(s) (FG)

|

|

-< File(s)

 

There is always a Primary Filegroup (contains the system objects)

There is always a Default Filegroup (contains objects which are not specifically placed, doesn’t have to be the Primary FG)

 

For all versions of SQL Server before SQL Server 2016

Use TraceFlag 1117 to ensure all files in a FG grow at the same time by an equal amount

Use Traceflag 1118 to disable mixed allocation on extents – reduces reliance on SGAM pages

 

SQL Server uses a proportional fill algorithm to determine where to write data

this depends on the amount of free space (not size !) in each file within the target filegroup.

 

Implications

 

Up to sizes of about 500Gb running with a single File causes no major problems. But unfortunately migrating a database over 50Gb from a single file to multiple files is a bit tricky

This implies it is best to address this challenge early.

 

One way to do this is to have a standard design and alter the model database so that all new databases not created with an explicit file layout will inherit the files from the model db.

 

We should have a Primary Filegroup and just hold the system objects in it – a single file is probably OK in this FG.

 

We should have one or more other Filegroup(s) – DATA is as good a name as any

I would recommend having 4 files in this filegroup ( up to 1Tb – rough rule of thumb) all of equal size and equal autogrow settings that way we can place all 4 files on 1 LUN, 2 files on 2 LUNS, or 1 file on 4 LUNs as requirements change.

 

when we might want more than 1 Filegroup for example

when we have staging tables that we frequently load and truncate

when we have tables with large datatypes (XML, Spatial, Images) etc,

when we have historical data which we wish to set read-only (& back up separately)

when using partitioned tables

when we can obtain a performance improvement by isolating large or frequently used tables to their own FG

when using In Memory tables

when using filetables

when using fulltext search – put the fulltext indexes on their own FG

when the vendor recommends it

etcetera.

 

 

Migration

 

Add new LUNs to the SQL Server, we will want GPT drives (not MBR) with 64 kb allocation unit size.

 

Create the new Filegroup(s) and add the new empty files.

 

Move the tables to the new Filegroups by using

 

Create Index XXX ON DATA with DROP_EXISTING

 

realistically we will need to be offline while this process in ongoing

but we don’t need to move all the tables at the same time.

When it comes to HEAPs (tables without any clustered index) – there are two choices

Create a new clustered index on the target filegroup in question and then drop the clustered index

 

or

 

Create a new empty table with matching schema and a new name

Script indexes and constraints

Use Insert Select to move the data across

Drop indexes and constraints on the original table

Drop the original table

Use sp_rename to rename the new table to the original tablename

Re-add indexes and constraints

 

We can run these scripts as a job overnight using the SQL Server Agent

 

NB The transaction log will need to be large enough to accommodate the largest table that needs to be moved

alternately one can create a new table, move the rows in batches navigating

the Clustered index and then rename the new table (this is a lot of work and needs thorough testing) – contact your local friendly SQL Server experts 😊