Saturday, June 13, 2009

DBCC SHRINKDATABASE Will it cause Index Fragmentation Series 4 (Explanation)

Well as I learnt in previous blog Shrink Database caused fragmentation, but why? Before we discuss fragmentation, let me try to explain DBCC ShrinkDatabase command.

DBCC ShrinkDatabase command takes database name, Target Percent and Truncate | NoTruncate as parameters.

Truncate or NoTruncate indicate to database engine if space reclaimed by database engine be given back to Operating System or not.

Truncate: SQL Database engine releases space to OS, NOTruncate: Does not release space to OS though file content is  (leaving empty space at the end).

Target Percent is slightly confused here. Target Percent is not Target Size that database should be post shrinking operation but Target Percent is Target free space database should have post shrinking.

When database is shrunk either by Management Studio or DBCC command it moves data from end of file to beginning or starting of file. It tries to fill pages as it gets to move from end to start of file. So pages can not be moved in chunks (as there may not be huge chunks of pages at start in a normal production environment) but pages maybe moved individually inducing fragmentation.

In this case I have learnt that a completely defragmented index is fragmented to 97% post shrink operation and that is terrifying with regards to performance. Are you all seeing the same :)

Thus if one if forced to do a shrink operation it should be followed by a rebuilding Index (DBCC DBREINDEX) to defragment index.

Also this is one of reasons for Auto Shrink Database Property to be turned off.

Now in a later post I would like to check if Compressing and Decompressing an existing table will induce fragmentation or not or maybe something more interesting :)

If you have any questions / comments / corrections pertaining to any of fragmentation details please feel free to send mail to me at gurucb@hotmail.com

DBCC SHRINKDATABASE, Will it cause Fragmentation in Index Practical Example Series 3

Create database TestShrink
on Primary
(
    Name = '    ',
    filename = 'D:\TestShrink_Data.MDF'
)
log on
(
    Name = 'TestShrink_Log',
    FileName = 'D:\TestShrink_Log.ldf'
)
go

Use TestShrink
go

Create table HugeTable (A Char(8000))
go

insert into HugeTable values('a')
go


Insert into HugeTable
Select * from HugeTable
go 18 --This is kewl because it runs this batch for 18 times filling up table as fast as it could :). I like this one a lot.

Create table SmallTable ( C char(900))
go

insert into SmallTable values('A')
go

Before Populating:

declare @dbid int, @objid int
select @dbid = DB_ID('TestShrink'),@objid = OBJECT_ID('SmallTable')
dbcc ind(@dbid,@objid,2)
go

image

 

Populating Data:

insert into SmallTable
select * from SmallTable
go 12

declare @dbid int, @objid int
select @dbid = DB_ID('TestShrink'),@objid = OBJECT_ID('SmallTable')
dbcc ind(@dbid,@objid,2)
go

image

Get Fragmentation Information:

select * from sys.dm_db_index_physical_stats(9,2121058592,2,1,DEFAULT)

image

Reindexing table to remove fragmentation:

dbcc dbreindex('SmallTable')

declare @dbid int, @objid int
select @dbid = DB_ID('TestShrink'),@objid = OBJECT_ID('SmallTable')
dbcc ind(@dbid,@objid,2)
go

image

select * from sys.dm_db_index_physical_stats(9,2121058592,2,1,DEFAULT)

image

Truncate table HugeTable (To create empty space at the end of the file)

sp_spaceused

image

DBCC SHRINKDATABASE('TestShrink',50,NOTRUNCATE)

Rechecking Fragmentation:

declare @dbid int, @objid int
select @dbid = DB_ID('TestShrink'),@objid = OBJECT_ID('SmallTable')
dbcc ind(@dbid,@objid,2)

image

select * from sys.dm_db_index_physical_stats(9,2121058592,2,1,DEFAULT)

image

Too tired taking screen shots :) will summarize why Shrink Database will impact fragmentation.. For any question please fee free to send mail to gurucb@hotmail.com.

DBCC ShrinkDatabase, Will it cause Fragmentation in Index: Fragmentation Series 3

Recently we had a discussion if ShrinkDatabase will cause fragmentation and if we need to reindexing and Shrinking of database as a maintenance plan what should be order of operations: ShrinkDB followed by Reindexing or Reindexing post ShrinkDB.

Before starting as any good DBA would suggest, Shrinking a database or Database file should not be considered at all atleast of production, for it has adverse impact on performance. The reason file expanded was it needed space in first place so shrinking database / file to reclaim disk space is not right approach as database would grow again.

But there may be some scenarios where it makes sense to Shrink Database.. One that comes on top of my head is an example where a huge table is deleted from database as it is no longer required which accounts for 50% of database size. In such cases it may be absolutely valid for database to be shrunk.

Now since disclaimer is over :) would like to get into Nitty-gritty of Fragmentation with ShrinkDatabase.

For repro here is the plan: (Please do not try in Production Environment)

Step 1: Create Database (on drive that has couple of DBs free space)

Step 2: Create huge Table with lots of pages (couple of thousands amounting to GIGS)

Step 3: Create smaller table (Ensure pages for smaller table come in end after Huge Table)

Step 4: Create index on Smaller table

Step 5: Populate Data in Smaller Table

Step 6: Take DBCC IND output for Huge Table and Smaller Table.

Step 7: Run DBCC REINDEX on smaller table to ensure no fragmentation.

Step 7: Get sys.dm_db_index_physical_stats output for smaller tables.

Step 8: Get Sp_SpaceUsed for database.

Step 9: Truncate Huge Table

Step 10: Get SP_SpaceUsed again for Database

Step 11: Shrink Database using DBCC ShrinkDatabase

Step 12: Get DBCC IND output for smaller Table

Step 13: Get sys.dm_db_index_physical_stats for smaller table and compare fragmentation.

Since this notes is growing to be long will put it in a new series.. these above steps would prove that Shrinking would introduce fragmentation of indexes.

Summarizing:

* Shrinking Databases and Files should be avoided.

* If Shrinking fragmentation is introduced and rebuilding is necessary to remove fragmentation.

* Always follow Shrinking with Reindexing for optimal performance.

Understanding Logical Fragmentation – Series 2

Post understanding of what is logical fragmentation, now I would use DBCC IND to calculate Logical Fragmentation and compare it with sys.dm_db_index_physical_stats dmv.

The formula to calculate Logical fragmentation is

Fragmentation % = (Fragment Counts – 1) / Page Counts * 100

There are 2 types of extents Mixed Extent and Uniform extent. Mixed Extents means each page of extent may belong to one or more number of objects. To rephrase it an extent may contain pages from objects amounting to a maximum of 8 objects or minimum of 2 objects.

In our case let us consider a table of 8 pages.

Case 1: Each page belongs to a different extents and each page is not physically next to pervious page.

In our case below each page in an extent belongs to a different object (by color code). So in this extent each page belongs to Objects like "A”, “B”, “C” etc.

image

Assume our object A contains 8 Pages and each of these 8 Pages are in different extents and are not consecutive.

NOTE: Last of Current Extent and First page of NEXT extents are considered as different fragments but as constituents of same fragment.

Number of Pages: 8, Number of Fragments: 8.

Fragmentation % = ((8 – 1) / 8 )* 100 = 87.5

Case 2: 4 Pages belong to Extent A and 4 Pages belong to different Extent and in each extent they are contiguous pages.

Number of Page: 8, Number of Fragments: 2 (Extent A and Extent B and assuming each pages in Extent are contiguous)

Fragmentation % = ((2-1) / 8 ) * 100 = 12.5%

In similar way we can calculate fragmentation based on Number of Fragments and Number of Pages.

The reason why we may have to reduce 1 from Fragment counts is if there is any object with only 1 Page them it would be only in 1 Fragment. So according to our formula

Fragmentation % = ( (1 – 1 ) / 1 ) * 100 = 0

Else even for 1 Page we would get fragmentation as 100% :). To avoid this one needs to deduct 1 from Fragment Counts.

Coming to an example in SQL Server, I have used below script to create and populate table:

use tempdb
go
create table TestFragmentation(Id int, ch char(900))
go
create index Ind_TestFragmentation on TestFragmentation(ch)
go
declare @i int
set @i = 1
while (@i < 11)
begin
    insert into TestFragmentation values(1,Replicate(@i,400))
    set @i = @i + 1
end
go
dbcc traceon(3604,-1)
go
declare @objid int
declare @dbid int
select @objid = object_id('TestFragmentation'), @dbid = db_id()
dbcc ind(@dbid,@objid,2)

image

From above example, exclude Page 114 as it is an IAM page and this index is of 2 levels

Root Page (Page Type = 2 and Index Level = 1) and Leaf level Page (Page Type = 2 and Index level  = 0)

Now according to formula:

Fragmentation % at Non Leaf Level (Root Page , Page ID = 174) =  ((1 – 1) / 1) * 100 = 0%

Fragmentation % at Leaf Level ( 3 Pages Page ID = 110, 78, 41) = ((3 – 1) / 3 ) * 100 = 66.67 %

To check our results run below query:

select * from sys.dm_db_index_physical_stats(2,2121058592,2,1,Default)

image

From above index_level = 1 is non leaf level and in this case root level and there is only 1 page and see avg_fragmentation_in_percent it is 0 where as for the first row where index_level = 0 it is 66.67 which equates to our formula of fragmentation.

To test it further we have added some more rows using below script and this has been executed 4 times

declare @i int
set @i = 1
while (@i < 11)
begin
    insert into TestFragmentation values(1,Replicate(@i,400))
    set @i = @i + 1
end

go 4

 

Now run this command:

declare @objid int
declare @dbid int
select @objid = object_id('TestFragmentation'), @dbid = db_id()
dbcc ind(@dbid,@objid,2)

 

image

Out of this select only pages with Index Level 0 and page Type = 2. If we look at columns PageID, Next page and Prev Page we see that all of them are in different fragments. 10 Pages in 10 different fragments.

Fragmentation % = ((10 – 1) / 10) * 100 = 90.

And running query:

select * from sys.dm_db_index_physical_stats(2,2121058592,2,1,Default)

image

See this it is same as our calculation..

Let us see affect of reindexing on this table with below command:

dbcc dbreindex('TestFragmentation',ind_TestFragmentation)

go

Post reindexing,

image

There are 7 pages in leaf level with 3 fragments thus

Fragmentation % = ((3 – 1) / 7 ) * 100 = 28.57.

Hope with these 2 posts would help in understanding Fragmentation to some extent if you have questions please feel free to send mail to gurucb@hotmail.com

Understanding Fragmentation (Index) SQL Server – Simplistic View

“Googling” (www.google.com) or “Living” :)  (www.live.com) about fragmentation in SQL Server would result in innumerable link and most of them almost state same things in similar manner. Going thru couple of them left me wondering if Fragmentation is a simple and well understood (as represented by number of links) why the heck did we need so many articles, blogs etc etc.

In this blog, see Logical Fragmentation in a different and yet simplistic way and how to calculate Fragment Counts, Page Counts and Fragmentation %.

Put in simple way Fragmentation is out of order pages. Imagine taking notes in School for different subjects. If all of subject notes are taken in same book with out any order being followed then we would end up fragmenting the book. To elucidate this further,

Math notes in Pages 1, 3, 9, 5, 11, 21. English notes in Pages 2, 22, 25, 29.

Now to read complete Math notes I will have go to 1, 3, 9, 5, 11, 21 (Order of reading is important).

Every database has Data File and each data file is logically split into Pages (similar to notebook pages). For example Page 1, 2, 3 so on…… But if a table occupies multiple pages and not consecutively or contiguously then table is logically fragmented.

Fragmentation performance impact effects are only felt when all pages need to be read not when only 1 pages needs to be read. Understanding with our notes example even if Math notes is spread across multiple books (and pages) if it is only 1 page that to be read and if I know what page to read, even if pages are in random order it does not affect at all as I do not need to flip through pages but if entire notes needs to be read then fragmentation affects performances as pages are not in order.

Coming back to SQL Server, “Fragmentation does not effect Index Seek  but only effects Index Scan operations.”

 

In SQL Server 2005, (also in SQL Server 2000) there are couple of commands that we would be using.

DBCC IND, (I learnt usage of DBCC IND from www.SQLSkills.com)

DBCC PAGE (later but not in this blog)

Sys.dm_db_Index_Physical_Stats DMV (SQL Server 2005). More reference about this DMV from (http://msdn.microsoft.com/en-us/library/ms188917.aspx)

With this above three, I will try to explain how to calculate Fragment Counts, Page Counts and Logical Fragmentation %.

Before that formula that has been practically arrived for Logical Fragmentation % is

Logical Fragmentation % = ((Distinct Fragment Counts  - 1) / Page Counts)*100

From DBCC IND command we could identify Fragments, Page Count and then calculate Fragmentation % and compare with columns from the output of Sys.dm_db_index_Phsyical_Stats.

use Tempdb
go

/*

    Create Table in Temporary data.
    The Datatype chose is Char(900), due to
        Large data size fragemenation can be achieved easily and
        Index size would be 900 bytes
*/
Create table TestLogicalFragmentation(Col1 Char(900))
go

-- Insert Rows into table ( 8 Rows * 900 Bytes each = 7.2 KB < 8 KB (Page Size)
insert into TestLogicalFragmentation values('A')
go 8 --Inserts 8 Rows

--Created Index
Create index ind on TestLogicalFragmentation(Col1)
go

--Output of DBCC IND for Index ind and Index ID = 2 as it is a Non Clustered Index
declare @DBID int
declare @ObjectID int
select @DBID = db_id(), @ObjectID = Object_ID('TestLogicalFragmentation')
DBCC IND(@DBID,@ObjectID,2)

Output of DBCC IND

image

Index ID = 2 , Non Clustered Index (IND)

Page Type= 2 stands for Index Page and Page Type= 10 stands for IAM Page

Index Level: Number of Index Levels from Root to leaf level and in this there is only 1 Page.

Now let us insert couple of more rows in the table and dump IND again.

insert into TestLogicalFragmentation values('A')
go 8 --Inserts 8 Rows

declare @DBID int
declare @ObjectID int
select @DBID = db_id(), @ObjectID = Object_ID('TestLogicalFragmentation')
DBCC IND(@DBID,@ObjectID,2)
go

image

Now there are multiple pages, Pages 174, 110 and 115 are of Page Type 0, Page 90 of Page Type 1 and Page 41 of Page Type 10. We would consider only Page type 2 and Index Level 0 as they are index Pages (Type 2) and Leaf Level pages (Index Level 0) as fragmentation is prevalent more that Leaf Level.

Page 174: NextPagePID = 115 and PrevPagePID = 0 indicates first page in double linked list.

Page 115: NextPagePID = 110 and PrevPavePID=174 indicates second page.

Page 110 is Last Page and could be arranged as below:

image

But if the Pages were contiguous with no fragmentation it could have been in any ways as below.

image

Going back to current way pages are stored (first diagram) pages are stored in 3 different fragments, if they were contiguous like above it would have in 1 Fragment irrespective they span across Extents or in same extents. (i.e 8 page of Fragment x and 1 Page of Fragment x + 1 are not considered different fragments but are of same fragment).

Now coming back to % Fragmentation = ((Fragment Count – 1) / Total pages) * 100

In this scenario Logical fragmentation = ((3 different Fragments – 1)  / 3 Total Pages ) * 100 = 66.67% fragmentation.

Let us verify with DMV.

declare @DBID int
declare @ObjectID int
select @DBID = db_id(), @ObjectID = Object_ID('TestLogicalFragmentation')
Select * from sys.dm_db_Index_Physical_Stats(@DBID,@ObjectID,2,null,null)
go

image

See these columns these concur with our findings above.

In the next blog I will try to give different angle for fragmentation for more number of pages.

SQL Reporting Services: Report Data in Report Header and Report Footer

Any Report in Reporting Services can have a Report Header and / or Report Footer that are placed at top and bottom of Report Page. By default both of them are turned off but can be turned on simply by clicking Report Menu Item (from Layout Tab) and enabling Report Header and Report Footer as is depicted below.

image

Below are restrictions on Report Header / Report Footer:

* Header and Footers can contain only static data (text, Images,lines, rectangles, Borders etc)

* It can not contain Report Items like Tables, Matrix, List etc.

* TextBoxes in Report Footer or Report header can not reference Data from Dataset. But using Expressions ReportItems or Parameters Collections can be used.

* Expressions in TextBoxes can not refer more than 2 ReportItems from Reports

* Expressions can not contain functions like RowNumber() with scope as Dataset.

In Some cases, it would be helpful to have Data from Dataset in Report Footer or Report header and typically that could be consistent across all pages in Report. For example Report Metadata (that could not be generated from Built in reporting services functions) or Template Reports that could be reused across organizations may have these enabled.

All such reports can be created using ReportItems or Parameters Collections.

ReportItems Collection:

Creating New Dataset:

 

image

 

DataSource:

Connecting to AdventureWorksDW database (downloaded from www.codeplex.com)

image

Query for Dataset:

SELECT       
    DimProductSubcategory.EnglishProductSubcategoryName, DimProductCategory.EnglishProductCategoryName,
    FactResellerSales.OrderQuantity, DimProduct.EnglishProductName
FROM           
DimProduct INNER JOIN DimProductSubcategory
    ON DimProduct.ProductSubcategoryKey = DimProductSubcategory.ProductSubcategoryKey
INNER JOIN DimProductCategory
    ON DimProductSubcategory.ProductCategoryKey = DimProductCategory.ProductCategoryKey
INNER JOIN FactResellerSales
    ON DimProduct.ProductKey = FactResellerSales.ProductKey

 

Enabling Page Footer:

image

 

Report Body

Generate a report in normal way one would develop report with Table Report Item.

Steps:

1. Drop a table Report Item from toolbox:

image

Configure table report Item with dataset created in Data tab and set toggling. Post configuration Table Report Item should be as below:

2. Create table Groups – Right Click on left most columns (image ) and click on Insert Group.

image

Enter Group Name and Select Grouping Columns as below:

 

image

Defining Toggling by going to Visibility tab.

For drill down enable toggling for each grouping in table:

image

 

Post all configurations table Report Item should look as below.

image

In Report Footer, we would show Product Category and corresponding Order Quantity, so drag and drop 4 text boxes as shown below:

image 

Top 2 text boxes are for headings and bottom one is where table cells from Report Body are referred. To refer cells in Table write expressions for values as below:

=ReportItems!EnglishProductCategoryName.Value

Where ReportItems is a collection are report level.

EnglishProductCategory is textbox in table (Remember in table each cell of table (intersection of row and column is a textbox).

Then set visibility to false to Row and columns that show ProductCategory (DO NOT DELETE IT) and it is shown only in footer.

Using this indirect method Textboxes in Report Footer and Header can refer Data from Dataset.

In next we would indulge in how to use Parameters in Report Header and Report Footers.

Checkpoints in SQL Server Integration Services across multiple Packages

SQL Server 2005 Integration Services (SSIS) can restart failed packages from the point of failure, instead of rerunning the whole package and it depends upon Check Point File.

Check Point does not contain statements (SQL statements) that generated the data but it maintains history of what tasks need to restart. Sample Checkpoint output file (it should be .xml)

<DTS:Checkpoint xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:PackageID="{B4EB440E-2CD8-412C-A79B-7461E0F9065F}"><DTS:Variables DTS:ContID="{B4EB440E-2CD8-412C-A79B-7461E0F9065F}"/><DTS:Container DTS:ContID="{8CE70AD0-684D-41D0-AD8E-511850925630}" DTS:Result="0" DTS:PrecedenceMap=""/></DTS:Checkpoint>

These Guid’s are just tasks that need be restarted and nothing else. To configure Check Point

* CheckpointFileName=Specify File Name

* CheckpointUsage=IfExists

* SaveCheckpoints=True

When dealing with Parent and Child Package Checkpoints should be configured else execution results will vary.

In this example I have created a Master package or Parent Package that Creates SQL table using Execute SQL Task and then invokes 2 different Child Packages which insert data into table created.

 

Package (Parent):

image

In Execute SQL Task created table as below:

IF EXISTS (SELECT        *
                                  FROM            sys.objects

                                  WHERE        name LIKE 'Multi_Level_CheckPoint_Test')
BEGIN
drop table Multi_Level_CheckPoint_Test
CREATE TABLE Multi_Level_CheckPoint_Test(i int PRIMARY KEY)
END

 

Child 1 has again a Execute SQL Task that inserts rows into SQL Table created in parent Package:

 

image

Insert into Multi_Level_CheckPoint_Test values(1)

Child 2 Package is similar to Child 1 in operation but it has 2 Execute SQL tasks with 2 insert statements.

 

image

* Insert into Multi_Level_CheckPoint_Test values(2)

* Insert into Multi_Level_CheckPoint_Test values(1) --- This would fail as table column is primary key and we are inserting duplicate row (1 is already inserted in Child1. Execute SQL Task)

After creating packages:

* Parent  - CheckPoint Enabled

* Child  - Checkpoint not enabled

First execution it would fail at second Insert of Child 2 which is understandable as we are trying to insert duplicate rows. But subsequent executions, It would fail again at Child2 but at first insert statement..

This is because checkpoint file at Parent indicates that Child2 needs to be executed again but since Child2 does not have Check Point enabled it starts from First Execute SQL Task which would fail again.

So, in multilevel package executions checkpoints should enabled at each package level and not just at parent level. It would have been nice if Child inherits Parents Checkpoint but there again the question would be do all task have to checkpoint enabled or only some and if only few which of few. Thus onus lies on developer to properly maintain checkpoints.