Heard on .NET Rocks!: Jay Roxe

This installment of .NET Rocks! is an interview with Visual Basic Product Manager, Jay Roxe.

I am the host of .NET Rocks!, an Internet audio talk show for .NET developers online at http://www.franklins.net/dotnetrocks and http://msdn.microsoft.com/dotnetrocks. My co-host, Rory Blyth (http://www.neopoleon.com), and I interview the movers and shakers in the .NET community. We now have over 88 shows archived online, and we broadcast a new show every Friday night from 7:30 PM to 8:45 PM, Eastern Time, and make them available for download (with podcasting support) on the following Monday.

Show #86 was an interview with Visual Basic Product Manager, Jay Roxe. Before landing that role, he was Dev Lead on the .NET Framework and Base Class Libraries, where he worked on some of the fundamental framework and BCL classes. Here is an excerpt from the first half of the show, where he talked about that experience:

Carl: Now, a lot of people don't really know who you are, and so for many of our listeners this is an introduction to Jay. So, just lay a couple of classes in the framework that you've written on us to establish your studliness.

Jay: OK, well let me give you the history. I joined what is now the .NET Framework team, or the Common Language Runtime team, back in November of 1997. [This was] back when it was called Project Lightning, then it became COM+, then it became Project 42, then we had this nice little re-org that made it Project 21: we lost half the team.

And so, I wrote things like String and StringBuilder, and I wrote the initial implementation, although I did not own it forever, of all of the base types like Int [16, 32, and 64], and double, and all of those. I did some of the work on Object and was Dev Lead for the System.IO classes, the globalization, and a bunch of the collections work as well.

Rory: Hey Jay, you mind if I ask you a couple questions? I'm already curious about some things. First of all, and this was brought up at one of the MSDN events I did this week, why is String sealed? [Note: for VB.NET programmers, Sealed = NotInheritable.]

Jay: Because we do a lot of magic tricks in String to try and make sure we can optimize for things like comparisons, to make them as fast as we possibly can. So, we're stealing bits off of pointers and other things in there to mark things up. Just to give you an example, and I didn't know this when I started, but if a String has a hyphen or an apostrophe in it [then] it sorts differently than if it just has text in it, and the algorithm for sorting it if you have a hyphen or an apostrophe if you're doing globally-aware sorting is pretty complicated, so we actually mark whether or not the string has that type of behavior in it.

Rory: So, what you're saying is that in the string world, if you didn't seal String there would be a whole lot of room for wreaking a lot of havoc if people were trying to subclass it.

Jay: Exactly. It would change the entire layout of the object, so we wouldn't be able to play the tricks that we play that pick up speed.
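
Note: the sealed status Rory asks about can be checked from code through reflection. The following is a minimal sketch, not part of the show; the module name SealedCheck is made up for illustration.

```vbnet
Imports System

Module SealedCheck
    Sub Main()
        ' Type.IsSealed is True for classes declared sealed
        ' (NotInheritable in VB.NET terms).
        Console.WriteLine(GetType(String).IsSealed)
    End Sub
End Module
```

Writing a class that says Inherits String is rejected by the compiler for the same reason.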

Carl: What I'm really curious about, Jay, is the StringBuilder. It's magic! I mean, I show that to people and they can't believe the difference in speed.

Jay: Which difference in speed?

Carl: Well, the classic demo that I do for people is that I use a for/next loop taking an Int32 i from 1 to 10,000, and I append i.ToString to a string variable using an &= operator. And it takes about a second on a 2 GHz machine.

           Dim Value As String
           Dim i As Int32
           For i = 1 To 10000
               Value &= i.ToString
           Next

Then I put a zero on the Int32 and make it go to 100,000.

           Dim Value As String
           Dim i As Int32
           For i = 1 To 100000
               Value &= i.ToString
           Next

And then it takes a long time. After a minute goes by, we're barely past 40,000. Then I do it with a StringBuilder and it's immeasurable. Milliseconds.

           Dim Value As String
           Dim sb As _
           New System.Text.StringBuilder
           Dim i As Int32
           For i = 1 To 10000
               sb.Append(i.ToString)
           Next
           Value = sb.ToString
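
Note: for readers who want to reproduce Carl's demo, here is one way to time both loops side by side. It is a sketch under assumptions, not the code used on the show: the module and function names are invented, and System.Diagnostics.Stopwatch requires .NET 2.0 or later. The timings depend on the machine and runtime, so none are quoted here.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text

Module ConcatTiming
    ' Builds "1234...n" by repeated String concatenation.
    Function BuildWithString(n As Integer) As String
        Dim value As String = ""
        For i As Integer = 1 To n
            value &= i.ToString()
        Next
        Return value
    End Function

    ' Builds the same text with a StringBuilder.
    Function BuildWithStringBuilder(n As Integer) As String
        Dim sb As New StringBuilder()
        For i As Integer = 1 To n
            sb.Append(i.ToString())
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim n As Integer = 10000   ' raise to 100000 to see the gap widen

        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim a As String = BuildWithString(n)
        watch.Stop()
        Console.WriteLine("String &=:     " & watch.ElapsedMilliseconds.ToString() & " ms")

        watch = Stopwatch.StartNew()
        Dim b As String = BuildWithStringBuilder(n)
        watch.Stop()
        Console.WriteLine("StringBuilder: " & watch.ElapsedMilliseconds.ToString() & " ms")

        ' Both approaches must produce identical text.
        Console.WriteLine("Same result:   " & (a = b).ToString())
    End Sub
End Module
```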

Jay: Because you're only actually using the String when you get out of the loop, right?

Carl: Right. So what's going on in there?

Jay: When you're appending i.ToString, what we actually do is every time you run that we'll create the string "1" then we'll create the string "1 2" then we'll create the string "1 2 3," and although that probably would be useful for counting the number of dates you're going to get after being in the New York Times (.NET Rocks! was in the New York Times, Thursday, October 28, 2004. See http://www.shrinkster.com/1wj ) it's probably not what the people are really looking for. [Everyone laughs.]

Carl: So, the new string is the complete string?

Jay: It's the complete string. String doesn't change. It's immutable. What you've actually done when you do the count from 1 to 10,000 is you've created 10,000 instances of String.
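
Note: the immutability Jay describes can be observed directly. In this small illustrative sketch (the variable names are made up), a second reference keeps pointing at the original String after &= runs, which shows that the append produced a new object rather than changing the old one.

```vbnet
Imports System

Module ImmutabilityDemo
    Sub Main()
        Dim value As String = "1"
        Dim before As String = value   ' second reference to the same String object

        value &= "2"                   ' allocates a new String; "1" itself is untouched

        Console.WriteLine(before)      ' prints 1
        Console.WriteLine(value)       ' prints 12
        Console.WriteLine(Object.ReferenceEquals(before, value))   ' prints False
    End Sub
End Module
```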

Carl: And you're actually using System.String...

Jay: Yes. You've only got a reference to one of the Strings, which means you've got 9,999 of them on the floor.

Carl: Ready for garbage collection.

Jay: Ready for garbage collection. But you've taken the time to create all of those strings, and you're up... what is it, you're probably at 37,000 characters give or take in the last string?

Carl: Yeah.

Jay: So that's a pretty big string by the time you get all done.

Carl: Now, why is it so slow when using a String?

Jay: Well, every time you create one of those you're doing a 37,000 character copy when you get toward the end. Now, every [character] is two bytes, so you're moving large amounts of memory and I'm sure one of your listeners who has a calculator in front of them can figure out that if you're moving 37,000 characters times 2 bytes times 10,000 strings...
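
Note: for listeners without a calculator, the arithmetic can be sketched in code. Jay's product (37,000 characters times 2 bytes times 10,000 strings, or 740,000,000 bytes) treats every copy as full length, so it is an upper bound; the early strings are much shorter. The sketch below adds up the actual lengths, assuming each &= writes the whole string so far into a new String and each character takes two bytes. The module and function names are invented for illustration.

```vbnet
Imports System

Module CopyCost
    ' Length of "1234...n": the sum of the digit counts of 1..n.
    Function FinalLength(n As Integer) As Long
        Dim total As Long = 0
        For i As Integer = 1 To n
            total += i.ToString().Length
        Next
        Return total
    End Function

    ' Total characters written if every append copies the whole
    ' string built so far into a freshly allocated String.
    Function CharsCopied(n As Integer) As Long
        Dim length As Long = 0
        Dim copied As Long = 0
        For i As Integer = 1 To n
            length += i.ToString().Length   ' size of the string after this append
            copied += length                ' that many characters are written
        Next
        Return copied
    End Function

    Sub Main()
        Console.WriteLine("Final length (chars): " & FinalLength(10000).ToString())
        Console.WriteLine("Bytes moved:          " & (CharsCopied(10000) * 2).ToString())
    End Sub
End Module
```

Under these assumptions the total comes to roughly half of the upper bound, which is still hundreds of millions of bytes for a 10,000-iteration loop.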

Carl: Well, wait a minute. If you're using the StringBuilder, aren't you also copying?

Jay: No. That's actually the trick. The StringBuilder object itself points to a string, and we have some internal APIs on string that allow us to change that one string. So let's say you create the StringBuilder with just "1" in it, and then you add "2." Normally you would have created two string instances. We actually have a pointer to what at that point is a hidden string, where we can just create the string "1 2" without having to copy it.

Now, we have to do an occasional copy because when you get up to having 100 characters in the string or whatever, you have to grow the string, we have to grow the buffer, just like you would with any growable array. What that means is we only have to copy it when we're out of memory [in the buffer].

The trick is that whenever you ask us for a pointer to that string (with .ToString) we just go in and give you back that string object and mark it as dirty. So that if you, say, wanted to add again, at that point we create another copy of the string.
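
Note: the growable-buffer idea can be illustrated with a toy class. This is not the real System.Text.StringBuilder and does not use the internal String APIs Jay mentions; TinyBuilder, its 16-character starting capacity, and the doubling policy are all assumptions chosen for the sketch. It also copies on every ToString, so it does not model the "hand back the string and mark it dirty" trick.

```vbnet
Imports System

' Hypothetical toy class, not the real StringBuilder.
Public Class TinyBuilder
    Private _buffer(15) As Char        ' 16 characters of initial capacity (assumed)
    Private _length As Integer = 0
    Public GrowCount As Integer = 0    ' how many times the buffer was reallocated

    Public Sub Append(text As String)
        Dim needed As Integer = _length + text.Length
        If needed > _buffer.Length Then
            ' Out of room: allocate a bigger array and copy the old contents once.
            Dim newSize As Integer = Math.Max(needed, _buffer.Length * 2)
            Dim bigger(newSize - 1) As Char
            Array.Copy(_buffer, bigger, _length)
            _buffer = bigger
            GrowCount += 1
        End If
        ' Write the new characters in place, after the existing ones.
        text.CopyTo(0, _buffer, _length, text.Length)
        _length = needed
    End Sub

    Public Overrides Function ToString() As String
        Return New String(_buffer, 0, _length)
    End Function
End Class

Module TinyBuilderDemo
    Sub Main()
        Dim tb As New TinyBuilder()
        For i As Integer = 1 To 10000
            tb.Append(i.ToString())
        Next
        Console.WriteLine("Characters:    " & tb.ToString().Length.ToString())
        Console.WriteLine("Reallocations: " & tb.GrowCount.ToString())
    End Sub
End Module
```

With doubling, the whole buffer is copied only a handful of times across the 10,000 appends, instead of once per append as in the plain String loop.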

Carl: That's pretty tricky. But it's those internal APIs that make it so much faster than a straight copy then, right?

Jay: Yep. And any time you see a String, you can count on it being immutable. But we can take advantage of the fact that we know some things about strings to get you that higher performance.

Carl: Am I creating more object references in a string loop, versus only one reference with StringBuilder?

Jay: Well, actually creating the object reference itself is pretty cheap. We did a lot of benchmarking, because we know this is something people really care about. And, it turns out that most of the time is actually spent doing the copy. So even if you've got a pretty optimized memcopy, that's still a lot of memory moving around.
