Error TF53001: The database operation was canceled by an administrator

A customer upgraded TFS 2010 to TFS 2013 on a new machine running Windows Server 2012 R2 and SQL Server 2014. Everything went fine until, after a few days, they started getting an error whenever they tried to do a Get Latest, Check-in, or Check-out operation.

Error TF53001: The database operation was canceled by an administrator

This error is not really informative, so I asked them to check the Event Viewer on the server (something you should always do whenever TFS misbehaves). For each client operation that failed, this error event was logged:

Log Name:      Application
Source:        MSSQL$SQL2014TFS
Date:          19/02/2015 17:15:54
Event ID:      17310
Task Category: Server
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      xxxx.xxx.local
Description:
A user request from the session with SPID 70 generated a fatal exception. SQL Server is terminating this session. Contact Product Support Services with the dump produced in the log directory.

This is an internal error of SQL Server, and we verified that the SQL Server 2014 instance was still at RTM, with no cumulative update installed. After installing the latest Cumulative Update for SQL Server 2014, everything started working again. Since Cumulative Updates usually address bugs in the SQL Server product, it is always good practice to keep your SQL Server up to date; if you are experiencing strange SQL errors, a Cumulative Update could be the solution to your problem.

Gian Maria.

Check progress of DBCC CHECKDB

If you issue a DBCC CHECKDB on a big database to check for consistency errors, it can take a long time to complete, but the Management Studio window usually does not give you any hint about how long it will take, or a progress percentage. Luckily, SQL Server has a Dynamic Management View that can solve this problem.

This is the SQL code to visualize the progress of the operation:

SELECT  session_id ,
        request_id ,
        percent_complete ,
        estimated_completion_time ,
        DATEADD(ms, estimated_completion_time, GETDATE()) AS EstimatedEndTime,
        start_time ,
        status ,
        command
FROM    sys.dm_exec_requests
WHERE   database_id = 16

In my example I filtered the results to a single database, using the database Id, which you can obtain with the DB_ID function.
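As a sketch, the hard-coded Id can be replaced with a call to DB_ID (the database name here is just a placeholder, not one from the original post):

```sql
-- Filter sys.dm_exec_requests by database name instead of a hard-coded Id
SELECT  session_id ,
        percent_complete ,
        DATEADD(ms, estimated_completion_time, GETDATE()) AS EstimatedEndTime,
        command
FROM    sys.dm_exec_requests
WHERE   database_id = DB_ID('YourDatabaseName');  -- placeholder database name
```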

An example of the output of this query is shown in the following picture.

[image: query result showing session_id, percent_complete, EstimatedEndTime and command]

As you can see, you can easily visualize the percentage of completion, the estimated end time and the command that is running.

Gian Maria.

When it is time to tweak SQL Server queries

I have a stored procedure with a query that runs on a quite big database; it was slow (more than one minute to run) and was optimized using a temp table. The result was that execution time dropped to ~2 seconds, and since this was acceptable the optimization stopped there.

After a couple of months, the query became really slow again, taking ~30 seconds to execute, and I started to investigate why.

A quick tour with SSMS and STATISTICS IO did not reveal anything new, but the execution plan showed a strange sort operation that took 90% of the overall query time. This was really strange, and I did not find any reason why it should slow the query down so much.

To get a better picture of what was happening, I fired up the Activity Monitor to check whether the query was stopped by a lock on some table. Instead, I found that the task executing the query went into heavy parallelism (you see a lot of rows in the Activity Monitor with the same id), but each subtask was waiting a lot with a CXPACKET wait type, and everything seemed stuck. CXPACKET waits happen when the execution of a query is parallelized, so I decided to disable parallelism for the query with OPTION (MAXDOP 1) to verify whether the situation changed. The result was that the query now executed in ~2 seconds, like it did two months before, so I left parallelism disabled and it kept running just fine.
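As an illustration of where the hint goes (the table and column names here are made up, not from the original stored procedure), OPTION (MAXDOP 1) is appended at the end of the statement:

```sql
-- Hypothetical aggregate query: the query hint forces a serial plan,
-- avoiding the CXPACKET waits caused by a bad parallel plan
SELECT   o.OrderId ,
         SUM(l.Amount) AS Total
FROM     dbo.Orders o
JOIN     dbo.OrderLines l ON l.OrderId = o.OrderId
GROUP BY o.OrderId
OPTION   (MAXDOP 1);  -- limit this query to a single scheduler
```

Note that the hint affects only this statement, so the rest of the server keeps its normal degree of parallelism.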

The conclusion is: when the database is big and queries are complex, it is not so simple to understand why a query is slow. Sometimes you need to tweak how SQL Server executes the query with a query hint, and you should use all the tools you have to understand what is really happening :P.

Gian Maria.

Hardcore fix of errors in a SQL Server database

In a production SQL Server database we had some hardware issues; the result was that one very big database started giving us errors on DBCC CHECKDB. The error was the following one.

Msg 8929, Level 16, State 1, Line 1
Object ID xxxxxx, index ID 1, partition ID xxxxxx, alloc unit ID xxxxx (type In-row data): Errors found in off-row data with ID xxxxxxx owned by data record identified by RID = (3:34252:4)

This happened in one of the most central and important tables of the database, which contains millions of records, and only 20 rows gave us errors, so I wanted to identify which rows had the error. In that table there are lots of records in logical status “deleted”, and the corresponding physical rows can be removed from the database without much pain and reinserted with the very same data to avoid data loss.

So my question was… how can I identify a row given its RID (Row Identifier)? First of all, from the Object ID you can easily find the name of the table with SELECT OBJECT_NAME(<Object_ID>), but what about the RID? After a little search I found the sys.fn_PhysLocFormatter function and created this query:

SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS RID,
       *
INTO   debug.RIDTABLE
FROM   TableName

I selected into a table in the Debug schema the RID of each row as well as all the other columns of the table with errors. Then I took the output of DBCC CHECKDB, selected all the RIDs of the rows with errors, and could execute this query:

SELECT *
FROM   debug.RIDTABLE
WHERE  RID IN ('(3:34252:4)', '(x:xxxxx:x)'.....

to extract from RIDTABLE all the rows with errors. Then I could simply delete them from the original table and insert them again from RIDTABLE. Checking the data inside RIDTABLE, it seemed that the data was correct, except for an NVARCHAR(MAX) field that contains long strings of text.
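A sketch of the delete-and-reinsert step, assuming a hypothetical primary key column Id and a couple of made-up data columns (the real table and column names are different):

```sql
-- Delete the corrupted rows from the original table,
-- matching them through the keys saved in debug.RIDTABLE
DELETE t
FROM   dbo.TableName t
JOIN   debug.RIDTABLE r ON r.Id = t.Id
WHERE  r.RID IN ('(3:34252:4)');

-- Reinsert the very same data from the saved copy
INSERT INTO dbo.TableName (Id, Status, Payload)
SELECT Id, Status, Payload
FROM   debug.RIDTABLE
WHERE  RID IN ('(3:34252:4)');
```

The reinserted rows get new physical locations, so the pages with the corrupted off-row data are no longer referenced.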

After the process, DBCC CHECKDB did not give any other error, so I was quite happy to solve this problem without the need to put the database in single-user mode. The most important aspect, however, is that I was able to identify the rows with errors and understand the impact of the errors on the software. It turned out that only a couple of rows contained really important data; the others were in Deleted status, so we did not lose anything important.

Gian Maria.

Why Management Studio forces you to know SQL

Suppose you have a big table with about 3 GB of data and you need to add a new nullable column to it. You can open SSMS, open the table in the designer, create a new column of type Int called sclo_durationInMinutes, press save, and the table gets updated almost immediately.

Then you realize that the user wants the duration in minutes as a floating point number, so you open the designer, change the type of sclo_durationInMinutes from Int to Float and press save…. after 30 seconds SSMS tells you that you got a timeout. If you look at the change script you can verify, with HORROR, that changing the type of a column is done by creating a temporary table, copying all the data (3 GB) into the temp table, and finally deleting the old table and renaming the temp table with the name of the original table… REALLY?
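The script SSMS generates is roughly of this shape (a simplified sketch: the real script has more columns, constraints, and SET options, and the key column here is invented):

```sql
-- Simplified shape of the designer-generated change script
CREATE TABLE dbo.Tmp_scanlog (
    sclo_id INT NOT NULL,                -- hypothetical key column
    sclo_durationInMinutes FLOAT NULL    -- the column with the new type
);

-- This copy moves all 3 GB of data through the transaction log
INSERT INTO dbo.Tmp_scanlog (sclo_id, sclo_durationInMinutes)
SELECT sclo_id, sclo_durationInMinutes
FROM   dbo.scanlog;

DROP TABLE dbo.scanlog;
EXECUTE sp_rename 'dbo.Tmp_scanlog', 'scanlog';
```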

So I simply opened a new query window and issued the ALTER TABLE manually:

ALTER TABLE dbo.scanlog
ALTER COLUMN sclo_durationInMinutes FLOAT NULL

I know that SSMS has to handle a wide range of changes, but requiring a full copy of a table just to change the type of a column is quite overkill.

In the end, if you use SSMS it is better to have a deep knowledge of SQL, because the designer all too often opts for a copy-drop-rename strategy to alter the structure of a table.

Gian Maria.