.NET Rocks! (continued)
Yeah, you'll go all over the drive. So what we instead try to do is, we try to make our I/Os, as efficient as possible. We want to do one single scan across the database. We actually read everything in allocation order, and we generate what we call facts about things that we have seen. So, for instance, if you imagine a very, very simple B-tree with a single root page, and a couple of child pages at the leaf level and there's links between the various pages so, we might read a parent-the root page which is called the parent. We generate a fact saying, "Hey, we saw this page, this parent page, this page is the parent of say page A at the leaf level and page B at the leaf level." And then at some later point we may read those leaf level pages and generate facts about those, saying, "Hey, we saw this page A , we saw this page B, page A points to page B and vice versa." So we have these facts about the different pages that we have seen and throw all those through a processor, we sort them by page id and so on, we get them back out in a sorted list and we do what's called aggregation on them. So for instance, checking the linkage in a B-tree, every page that we see, has to have three facts about it: it has to have a parent page pointing to it, it has to have a what we call an actual fact, which is we read that page, and we have to have the linkage fact, at the leaf level.
Richard Campbell: Right.
Paul Randal: So by generating these facts, we can read the pages in any order, and we can still do all the reconciliation and aggregation to make sure they are consistent.
Richard Campbell: So, more relatively, you are reading them in the order that they are written so that reads fast, knowing that eventually it's all going to match up as if you read it deep
Paul Randal: Yup.
and if it doesn't, then you have found a problem.
Carl Franklin: Absolutely it's about indexing and getting it right. Do you find that it's better if you defrag first, before you run CHECKDB?
Paul Randal: It makes no difference at all because we have written allocation order. What we actually do is we take a bunch of IAM chains and I will explain what those are in a minute if you want-we take a bunch of IAM chains and we merge them all together. So we read a bunch of pages for multiple index and tables, all at the same time. We generate all these different facts, and because they are keyed by page id, object id, index id, none of them get mixed up, so going back to the TempDB disk speed issue, we are generating all these facts and we have got to store them somewhere, so we store them in a word table in memory, but if you've got a very, very large database, it's possible that that word table is going to get bigger than the size of memory you have got, so it has to spill and it's going to spill to TempDB, so we have got this fact table that's spilling out to TempDB, which we are reading and writing from, so if your TempDB disks are really crappy, then that's going to slow down CHECKDB performance, okay?
Carl Franklin: Sure.
Richard Campbell: I'm finding TempDB on SQL 2005 boxes is just way busier.
Paul Randal: It is. There is a bunch of extra stuff that uses TempDB. We put out a white paper on working with TempDB in SQL Server 2005 that Sunil Agarwal on my team wrote with a couple of other people; that he tells a whole bunch of DMVs that you can use to see what the load is on TempDB and where the allocations are and so on. So you can do a whole bunch of tuning. In fact, so now there's a very good presentation on tuning TempDB and troubleshooting TempDB. That's actually going to be at SQL Connections in Orlando for Microsoft Day on the 26th of March, where he's going to do that presentation so, that's why I want to check that out.
Carl Franklin: Awesome.
Richard Campbell: And we will be there too.
Carl Franklin: We will be there, that's why we're not doing anything that day as a matter of fact.
Richard Campbell: And my little acronym Police call says DMV is Dynamic Management View.
Paul Randal: Absolutely, yup.
Richard Campbell: Good.
Carl Franklin: Thanks Richard.
Richard Campbell: Because he slips them in there fast.
Carl Franklin: Yeah.
Paul Randal: Okay, three more things affect the speed of CHECKDB. Complexity of the database schemer, so I described how we generate all these different facts about different things in the database; the more complex the schemer is, then the more things we are going to have to generate facts about, and the more checks we are going to have to run. So for instance I said earlier on, that the non-clustered index checking algorithm can take up to 30% of the CPU if you don't have any non-clustered indexes, then we are not going to run that algorithm, okay? Same goes for a bunch of other things.
Richard Campbell: But if you don't have any non-clustered indexes, you probably got problems.
Paul Randal: You probably got problems anyway so
Carl Franklin: Is there any way to quantify, and I know everything is relative, but is there any way to quantify how complex a given schemer is? Is it possible to miscalculate that?
Paul Randal: Well, it's not really possible to say what's too complex or not. It depends on what your application needs. It's possible to go overboard, say, for instance, there's a bunch of new features in Server 2005, for instance, Partitioning, and Row-overflow, which is the ability to have rows that are greater than 8k. I have seen customers that change every column in their schemer to be VARCHAR 8000, or VARCHAR (Max) for instance, so that they can spill over, and they go nuts and have hundreds of partitions and what kind of range scan performance that we touched on earlier on when Kimberly was so nice to join us. Range Scan performance doesn't just depend on fragmentation, okay? If you have rows that are over 8k, then one of the columns has been pushed off into text storage, so if you are doing a range scan and scanning these rows in the Read Ahead for the index or tables that the rows live in, [that] may be wonderful but, if you are having to pull in rows from overflow storage then each one of those column values you are pulling in from row overflow storage is a random IO.
Richard Campbell: Right, because it doesn't know how long it is.
Carl Franklin: Sure.
Paul Randal: And it doesn't know where it is in the disk, so it is very, very difficult to do Read Aheads on these randomly placed things, okay? So you may think you've got perfect fragmentation but your range scan performance still sucks. It's because you have allowed larger than 8k rows and, of course, there's no way to predict which rows are going to have columns that are off-row. The column that gets pushed off-row from one row maybe different than the column that gets pushed off-row for another row. So you got to be careful that even though there's all these new features you don't just use them for the hell of them. You have got to
Richard Campbell: This is traditional Microsoft behavior though. Give us enough rope to hang ourselves with...
Paul Randal: Yeah, I know, one of the things that we don't do very well, and we are working on it, is giving really good guidelines on what we just use and WAN and Best Practices and so on.
Richard Campbell: It's always remarkable how much is your fault that they misused your tool,
Paul Randal: Yeah, I mean, we give people nice spangly features to use and we don't give them a whole bunch of information on when they should really use them and when not to use them and what the tradeoffs and so on are.
Richard Campbell: This will punish you if you do dumb things with it.
Carl Franklin: Sounds like I go to a pharmacy and I get warnings on the bottle, "Warning! Don't be dumb."
Paul Randal: Yeah. Do not take an overdose; this is going to kill you, kind of thing.
Richard Campbell: Yeah.
Paul Randal: So there is complexity to the database schemer right, what else? Oh, which options you specify? So I said there's an option, you can use with physical only that will cut out all the logical checks and make things run faster. So that's a good example of that. There's also an option that you can turn off, just the non-clustered index checks, which is no index, use that and that cuts down the speed too.
Richard Campbell: But you are also cutting down the validation of CHECKDB?
Paul Randal: It is cutting down the validation but if you use physical only and you have got page checks on this turned on, then you are going to find any hardware core corruptions as long as page checksums have been written. And that's another-keep that on the stack and I'll will talk about page checksums in a minute.
Carl Franklin: All right.
Paul Randal: And then, I guess the last thing is the number and type of corruptions that we find. So there are some algorithms that don't get run unless we find corruptions. So, for instance, the non-clustered index checking algorithm. When we find a row, going and doing the look up in the non-clustered index to see whether the row really exists, we use a kind of complicated hashing algorithm-it's the same kind of fact generation algorithm that I described before. So if we find that there's any mismatch packs, there's no way to match that back to a particular index or a table row.
Richard Campbell: All you now know there's something wrong.
Paul Randal: Something is wrong somewhere. So what we have to do is, we have to rescan all the rows again, looking for rows that match to that mismatched pack, and then we actually go and do what we call the Deep Dive, where we actually go in to the exact look up in the table or index that we think is missing. So that algorithm is extremely expensive to run because then you are taking a whole bunch of random I/Os.
Richard Campbell: But that's also the algorithm you wanted run-you have now found a problem.
Paul Randal: We have now found a problem, so the thing is if we find a problem, it could trigger one of these long running algorithms.
This interview continues at .NET Rocks!.