String Interning

In the .NET CLR (Common Language Runtime), there is a table known as the intern pool. It holds a single reference to each unique string literal in your program, which reduces the amount of memory used. This should come as no surprise: you have probably wondered at some point whether the CLR is ‘smart’ enough not to make multiple copies of the same string. It basically is, but it’s not quite that simple.

Keeping in mind that strings in .NET are immutable, consider the following code:

string s1 = "hello world";
string s2 = string.Concat("hello", " world");

Console.WriteLine(s1 == s2);
Console.WriteLine(Object.ReferenceEquals(s1, s2));

The output for the above code is:

True
False

The strings are equal, but they are different objects. Here s1 is a literal, so its reference comes from the intern pool, while s2 is built at run time by string.Concat and is not interned. Let’s look at the next piece of code:

string s1 = "hello world";
string s3 = "hello world";
string s4 = s1;

Console.WriteLine(Object.ReferenceEquals(s1, s3));
Console.WriteLine(Object.ReferenceEquals(s1, s4));
Console.WriteLine(Object.ReferenceEquals(s3, s4));

This time, all three comparisons are True. All the variables refer to the same object, because the CLR retrieved the same reference to the literal string from the intern pool and assigned it to each variable.

The good news is that you can intern strings yourself, using:

string String.Intern(string str);

 

The Intern() method returns an interned string reference. If a string with the same content already exists in the intern pool, it returns the reference to it; if not, a reference to str is added to the intern pool and that reference is returned.

string s1 = "hello world";
string s2 = string.Concat("hello", " world");
string s3 = string.Intern(s2);

Console.WriteLine(Object.ReferenceEquals(s1, s2));
Console.WriteLine(Object.ReferenceEquals(s1, s3));

The output of the above code is:

False
True
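If you only want to ask whether a string's content is already in the pool, without adding it, the framework also offers String.IsInterned, which returns the pooled reference or null. The following is a minimal sketch; the class name InternPoolProbe and the helper IsInPool are made up for illustration, and the GUID is only there to guarantee content that cannot appear as a literal anywhere in the program.

```csharp
using System;

public static class InternPoolProbe
{
    // Hypothetical helper: true when the intern pool already holds this content.
    public static bool IsInPool(string value)
    {
        return String.IsInterned(value) != null;
    }

    public static void Main()
    {
        // Built at run time from a fresh GUID, so it is not a literal and not yet interned.
        string fresh = string.Concat("key-", Guid.NewGuid().ToString());

        // Check before interning.
        Console.WriteLine(IsInPool(fresh));

        // Add the content to the pool, then check again.
        string pooled = String.Intern(fresh);
        Console.WriteLine(IsInPool(fresh));

        // Compare the original reference with the one Intern() handed back.
        Console.WriteLine(Object.ReferenceEquals(fresh, pooled));
    }
}
```

The first check should report that the string is not in the pool and the second that it is. Note that IsInterned looks up by content, so it says nothing about whether the particular object you passed in is the pooled one.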

So, when would it be practical or reasonable to intern strings yourself? Typically when your program performs a lot of string operations that end up creating many strings with the same content. This is different from, for example, declaring const or static string literals and referencing them, since literals are already interned for you.
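To make that scenario concrete, here is a small sketch that simulates a program producing many run-time strings with identical content, then interns them. The class InternDemo, its two helpers, and the sample text "status=active" are assumptions chosen for illustration, not anything prescribed by the framework.

```csharp
using System;
using System.Collections.Generic;

public static class InternDemo
{
    // Builds 'count' separate string objects that all contain the same text.
    public static List<string> BuildDuplicates(string text, int count)
    {
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            // new string(char[]) allocates a fresh object for non-empty input.
            result.Add(new string(text.ToCharArray()));
        }
        return result;
    }

    // Counts how many entries hold the very same reference as the first entry.
    public static int CountSharingFirstReference(List<string> items)
    {
        int shared = 0;
        foreach (string item in items)
        {
            if (Object.ReferenceEquals(items[0], item)) shared++;
        }
        return shared;
    }

    public static void Main()
    {
        List<string> raw = BuildDuplicates("status=active", 1000);
        Console.WriteLine(CountSharingFirstReference(raw));    // prints 1

        // Replace every entry with its interned reference.
        List<string> pooled = raw.ConvertAll(s => String.Intern(s));
        Console.WriteLine(CountSharingFirstReference(pooled)); // prints 1000
    }
}
```

Before interning, only the first entry shares its own reference; afterwards all 1000 entries point at one pooled object, and the original copies become eligible for garbage collection once nothing else refers to them.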

You’ll have to consider that string interning has the following side effects (from MSDN):

  • the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR’s reference to the interned String object can persist after your application, or even your application domain, terminates.
  • to intern a string, you must first create the string. The memory used by the String object must still be allocated, even though the memory will eventually be garbage collected.
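If the first side effect is a concern, one possible workaround is to keep your own pool whose lifetime you control. This is a different technique from String.Intern, not something the CLR provides: the LocalStringPool class below is a hypothetical helper built on an ordinary Dictionary, sketched under the assumption that a single thread uses it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of .NET: an application-level pool that can be discarded.
public sealed class LocalStringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Number of distinct string contents currently held.
    public int Count { get { return _pool.Count; } }

    // Returns the pooled instance with the same content, adding 'value' if it is new.
    public string GetOrAdd(string value)
    {
        string existing;
        if (_pool.TryGetValue(value, out existing)) return existing;
        _pool[value] = value;
        return value;
    }

    // Drops every pooled reference so the strings can be collected again.
    public void Clear() { _pool.Clear(); }
}

public static class LocalStringPoolDemo
{
    public static void Main()
    {
        var pool = new LocalStringPool();

        // Two separate objects with the same content, both built at run time.
        string a = pool.GetOrAdd(string.Concat("hello", " world"));
        string b = pool.GetOrAdd(string.Concat("hello", " world"));

        Console.WriteLine(Object.ReferenceEquals(a, b)); // prints True
        Console.WriteLine(pool.Count);                   // prints 1

        pool.Clear();
        Console.WriteLine(pool.Count);                   // prints 0
    }
}
```

The second side effect still applies here: each candidate string has to be allocated before it can be looked up, so the saving comes only from letting the duplicates be collected afterwards.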