Tuesday, November 08, 2005

Specialization - where does it happen?

For this first installment of the blog I plan to write about a topic I am very fond of, a .Net technology that was defined almost completely in 2001, a technology of which we got the first implementation as early as in 2003. Yet still, this technology has been just released with Visual Studio 2005 wave of technologies. There are aspects of this technology that I consider almost beautiful because of the way how they fit into the architecture of the basic runtime elements of .Net like IL and JIT.
Perhaps you have already guessed that I am talking about .Net generics. Generics were thought out in MS Research as early as 2001, experimentally implemented in Rotor in 2003 (project called Gyro). I think that it is beautiful that, although some new IL statements were introduced, it was not necessary to change a lot in the majority of IL in order to introduce generics since IL was for a large part generic from the beginning. Since lots of IL instructions were generic from the beginning, there was a relatively clean way to implement generics on such a foundation. For instance, IL instruction add simply adds two first operands on the runtime stack regardless of their type (of course, only if they are of compatible types) and the JIT compiler determines what the actual type of the operands is and generates appropriate code. The IL command itself doesn’t hardcode the type of its operands (for instance it doesn’t look like add.int32 or add.int64). At the IL level the type of the two operands is implicit.
The aim of the generics is to introduce generic functionality in .Net (at first, mostly in order to enable strongly typed generic collections; now some other important uses like for instance nullable types appear) in an efficient, robust and elegant manner avoiding problems that other languages/environments experienced. One of such problems is the code bloat experienced by the C++ templates. Efficiency of the code is achieved through elimination of boxing, unboxing and type conversions (casting) in general, specialized code and smaller memory footprint.
Today there are lots of good sources of information on generics. Some of them are more formal (like the two here and here) and some less formal. In September and October 2003, MSDN Journal published two articles on generics by Jason Clark (here and here). There is an article on msdn by Juwal Löwy. Design guidelines by Krzystof Cwalina for generics can be found here.
What I would like to write about is the border between a generic and specialized code – where the generic code stops and specialized code starts.
It is interesting to note that at the IL level everything is generic. Irrespective of the number of different specializations of some generic class or function that are used in code, there is only one instance of IL code.  If we had our custom list of integers CustomList<int> and a list of strings CustomList<string> there would be only one implementation of the class CustomList in the IL code. The specializations for the int and string happen in the runtime and are done by the JIT compiler.
Actually, the JIT compiler handles specializations for the reference types and the specializations for the value types differently. So if we for instance have CustomList<object> and CustomList<string> there will be only one runtime implementation of the CustomList generic class – the one that is valid for all reference types. Since all reference types are handled using references, such sharing of the runtime implementations is possible. On the other side, each specialization for a value type will get its own runtime implementation. Thus we have smallest possible footprint as long as the generic class is specialized for reference types. For value types we get the best performance. Performance is achieved through elimination of boxing and maximal specialization. Since CustomList<int> internally really works with integers and CustomList<double> with doubles, boxing and unboxing, the conversions necessary in the classes like System.Collections.ArrayList that hold all data as the type object, are not needed anymore.
We will discuss the characteristics of the generic functionality using simple console application example (see example 1). We will investigate what the IL code looks like for a given simple C# program and then we’ll pay a visit to the JIT compiler and see what really happens in the runtime. The discussion may seem a bit involved but actually no real knowledge either of IL or of assembly code is needed to get the point. So, follow me.
using System;
class MainApp
public static T ReturnArg<T>(int i, T[] args)
return args[i];
public static void Main()
object[]      objArray      = new object[3];
string[]      stringArray   = new string[]{"a","b"};
byte[]        byteArray     = new byte[] {1,2,3,4,5};
char[]        charArray     = new char[] {'a','b','c','d','e'};
int[]         intArray      = new int[]   {1,2,3,4,5};
long[]        longArray     = new long[]  {1L,2L,3L};
decimal[]     decimalArray = new decimal[]{ 1M, 2M, 3M };
//reference type parameter –
//all will be handled by the same specialized implementation
ReturnArg(0, objArray);
ReturnArg(1, stringArray);
//value type parameter - each type gets its own specialization
ReturnArg(0, byteArray);
ReturnArg(1, charArray);
ReturnArg(0, intArray);
ReturnArg<long>(1, longArray);
ReturnArg(0, decimalArray);
Example 1. original C# program
The generic function ReturnArg has two arguments: the first is the index of the element to return and the second is an array from which the element with the index given in the first argument is returned. The type of the elements in the array is generic. In the Main function we initialize different arrays, two of them are of reference types, object and string, and there are various value type arrays with value types of increasing size (from the byte having size 1 byte, char with the size of two bytes to the decimal with the size of 16 bytes). It will be interesting to see how the types having different sizes will be handled.
If we compile the sample and open the resulting .exe in ildasm we will get following IL representation for the function ReturnArg<T>. The original C# code is also given in the comments.

.method public hidebysig static !!T  ReturnArg<T>(int32 i,!!T[] args) cil managed
//000007:               return args[i];
  IL_0000:  ldarg.1
  IL_0001:  ldarg.0
  IL_0002:  ldelem     !!T
  IL_0007:  ret
} // end of method MainApp::ReturnArg
Example 2. ReturnArg<T> function compiled in IL
Obviously, the code is generic. There is nothing specialized in the given IL. The two first ldarg IL commands load the arguments on the stack and the ldelem command gets the element from the array. Only the ldelem command is parameterized by the type of the argument, but it is still generic. The specialization will happen in the runtime when this IL code gets transformed in to the machine code.  We will see it in a minute.
Let’s see what the IL code for the Main function looks like. Each C# row with a call the ReturnArg function is transformed in 4 IL instructions where the first two prepare the arguments and the fourth one just removes the result of the call from the stack, since we don’t do anything with it. The third instruction is the call of the generic function itself. We can see that at the mnemonic level IL code doesn’t look all that different from the C# code. At the byte code level (see Example 4) this call instruction is a 0x28 byte code followed by the index of the function description in the metadata tables. In our case the index (2B)000001 references the first function description in the metadata table number 0x2b, table with signatures of instantiated generic functions.
So at the IL code level still no specializations happen, all type arguments are just stated as data.
.method public hidebysig static void  Main() cil managed
...code that initializes the arrays omitted for brevity
//000021:   //reference type parameter - all will be handled by the same specialized implementation
//000022:             ReturnArg(1, objArray);
  IL_00b3:  ldc.i4.1
  IL_00b4:  ldloc.0
  IL_00b5:  call       !!0 MainApp::ReturnArg<object>(int32,!!0[])
  IL_00ba:  pop
//000023:             ReturnArg(0, stringArray);     
  IL_00bb:  ldc.i4.0
  IL_00bc:  ldloc.1
  IL_00bd:  call       !!0 MainApp::ReturnArg<string>(int32,!!0[])
  IL_00c2:  pop
//000024:             //value type parameter - each type gets its own specialization
//000025:             ReturnArg(4, byteArray);
  IL_00c3:  ldc.i4.4
  IL_00c4:  ldloc.2
  IL_00c5:  call       !!0 MainApp::ReturnArg<uint8>(int32,!!0[])
  IL_00ca:  pop
//000026:             ReturnArg(4, charArray);
  IL_00cb:  ldc.i4.4
  IL_00cc:  ldloc.3
  IL_00cd:  call       !!0 MainApp::ReturnArg<char>(int32,!!0[])
  IL_00d2:  pop
//000027:             ReturnArg(3, intArray);
  IL_00d3:  ldc.i4.3
  IL_00d4:  ldloc.s    intArray
  IL_00d6:  call       !!0 MainApp::ReturnArg<int32>(int32,!!0[])
  IL_00db:  pop
//000028:             ReturnArg<long>(0, longArray);
  IL_00dc:  ldc.i4.0
  IL_00dd:  ldloc.s    longArray
  IL_00df:  call       !!0 MainApp::ReturnArg<int64>(int32,!!0[])
  IL_00e4:  pop
//000029:             ReturnArg(2, decimalArray);
  IL_00e5:  ldc.i4.2
  IL_00e6:  ldloc.s    decimalArray
  IL_00e8:  call   !!0 MainApp::ReturnArg<valuetype[mscorlib]System.Decimal>(int32,!!0[])
  IL_00ed:  pop
//000030:         }
  IL_00ee:  ret
} // end of method MainApp::Main
Example 3. Main function compiled in IL
//000026:             ReturnArg(1, objArray);
  IL_00b3:  /* 17   |              */ ldc.i4.1
  IL_00b4:  /* 06   |              */ ldloc.0
  IL_00b5:  /* 28   | (2B)000001   */ call    !!0 MainApp::ReturnArg<object>(int32,!!0[])
  IL_00ba:  /* 26   |              */ pop
//000027:             ReturnArg(0, stringArray);      
  IL_00bb:  /* 16   |              */ ldc.i4.0
  IL_00bc:  /* 07   |              */ ldloc.1
  IL_00bd:  /* 28   | (2B)000002   */ call    !!0 MainApp::ReturnArg<string>(int32,!!0[])
  IL_00c2:  /* 26   |              */ pop
Example 4. A call of the ReturnArg<object> and ReturnArg<string> with IL byte codes visible
If we go a step further to the assembly language level, the generic code finally gets specialized. We will start our little application and in the debugger watch the “Disassembly” window (see Example 5.).
Remember the statement above that all specializations of a generic function for reference types share the same runtime implementation? In other words, the JIT compiler should generate the same code for functions ReturnArg<object> and ReturnArg<string>. Well, actually there is only one piece of code. That is obvious if we observe the first two calls of the ReturnArg function. Machine code generated by the JIT compiler calls the function at the same address for both calls.
So it is true, ReturnArg function for objects as well as for the strings has the same runtime implementation – the specializations for reference types share the same implementation!

25:             //reference type parameter - all will be handled by the same specialized implementation
    26:             ReturnArg(1, objArray);
00000232 68 00 88 91 00   push        918800h
00000237 8B 54 24 44      mov         edx,dword ptr [esp+44h]
0000023b B9 01 00 00 00   mov         ecx,1
00000240 FF 15 CC 87 91 00 call        dword ptr ds:[009187CCh]
00000246 90               nop             
00000247 90               nop             
    27:             ReturnArg(0, stringArray); 
00000248 68 48 88 91 00   push        918848h
0000024d 8B 54 24 48      mov         edx,dword ptr [esp+48h]
00000251 33 C9            xor         ecx,ecx
00000253 FF 15 CC 87 91 00 call        dword ptr ds:[009187CCh]
00000259 90               nop             
0000025a 90               nop             
    28:             //value type parameter - each type gets its own specialization
    29:             ReturnArg(4, byteArray);
0000025b 8B 54 24 48      mov         edx,dword ptr [esp+48h]
0000025f B9 04 00 00 00   mov         ecx,4
00000264 FF 15 9C 88 91 00 call        dword ptr ds:[0091889Ch]
0000026a 90               nop             
    30:             ReturnArg(4, charArray);
0000026b 8B 54 24 4C      mov         edx,dword ptr [esp+4Ch]
0000026f B9 04 00 00 00   mov         ecx,4
00000274 FF 15 DC 88 91 00 call        dword ptr ds:[009188DCh]
0000027a 90               nop             
    31:             ReturnArg(3, intArray);
0000027b 8B 54 24 50      mov         edx,dword ptr [esp+50h]
0000027f B9 03 00 00 00   mov         ecx,3
00000284 FF 15 1C 89 91 00 call        dword ptr ds:[0091891Ch]
0000028a 90               nop             
    32:             ReturnArg<long>(0, longArray);
0000028b 8B 54 24 54      mov         edx,dword ptr [esp+54h]
0000028f 33 C9            xor         ecx,ecx
00000291 FF 15 5C 89 91 00 call        dword ptr ds:[0091895Ch]
00000297 90               nop             
    33:             ReturnArg(2, decimalArray);
00000298 FF 74 24 58      push        dword ptr [esp+58h]
0000029c 8D 4C 24 34      lea         ecx,[esp+34h]
000002a0 BA 02 00 00 00   mov         edx,2
000002a5 FF 15 9C 89 91 00 call        dword ptr ds:[0091899Ch]
000002ab 90               nop             
000002ac 90               nop             
Example 5. JIT generated code for the Main function (fragment)
For various value type specializations the different addresses are called and obviously generic function ReturnArg specialized for different value types has different runtime implementations that are specialized for each particular value type..
Let’s see what these implementations look like.

00000000 57               push        edi 
00000001 56               push        esi 
00000002 8B F9            mov         edi,ecx
00000004 8B F2            mov         esi,edx
00000006 83 3D 6C 78 91 00 00 cmp         dword ptr ds:[0091786Ch],0
0000000d 74 05            je          00000014
0000000f E8 02 23 44 79   call        79442316
00000014 3B 7E 04         cmp         edi,dword ptr [esi+4]
00000017 72 05            jb          0000001E
00000019 E8 AE 45 44 79   call        794445CC
0000001e 8B 44 BE 0C      mov         eax,dword ptr [esi+edi*4+0Ch]
00000022 5E               pop         esi 
00000023 5F               pop         edi 
00000024 C2 04 00         ret         4  
Example 6. MainApp.ReturnArg<T>(int, T[]) JIT generated code for reference type specializations
The line at the address 0000001e is the one where the real copying of the data from the array takes place.
What does it look like for an array of bytes? It is interesting to note that if we step into the function specialized for the type byte the debugger knows in which version we are and states it in the address combo inside of the disassembly window. For this function, it recognizes the function as:  ”MainApp.ReturnArg<byte>”. The code that copies the data is bolded.
00000000 57                         push        edi 
00000001 56                         push        esi 
00000002 8B F9                      mov         edi,ecx
00000004 8B F2                      mov         esi,edx
00000006 83 3D 6C 78 91 00 00      cmp         dword ptr ds:[0091786Ch],0
0000000d 74 05                      je          00000014
0000000f E8 CA 22 44 79             call        794422DE
00000014 3B 7E 04                   cmp         edi,dword ptr [esi+4]
00000017 72 05                      jb          0000001E
00000019 E8 76 45 44 79             call        79444594
0000001e 0F B6 44 3E 08             movzx       eax,byte ptr [esi+edi+8]
00000023 5E                         pop         esi 
00000024 5F                         pop         edi 
00000025 C3                         ret
Example 7. MainApp.ReturnArg<byte> JIT generated code
For the char type (which is two bytes long) the function looks like this:

00000000 57                         push        edi 
00000001 56                         push        esi 
00000002 8B F9                      mov         edi,ecx
00000004 8B F2                      mov         esi,edx
00000006 83 3D 6C 78 91 00 00      cmp         dword ptr ds:[0091786Ch],0
0000000d 74 05                      je          00000014
0000000f E8 92 22 44 79             call        794422A6
00000014 3B 7E 04                   cmp         edi,dword ptr [esi+4]
00000017 72 05                      jb          0000001E
00000019 E8 3E 45 44 79             call        7944455C
0000001e 0F B7 44 7E 08             movzx       eax,word ptr [esi+edi*2+8]
00000023 5E                         pop         esi 
00000024 5F                         pop         edi 
00000025 C3                         ret  
Example 8. MainApp.ReturnArg<char> JIT generated code
The code generated for the specializations for other value types differs only in the way the data from the array element is copied.
Following fragments show how it is done for all remaining value type specializations:
0000001e 8B 44 BE 08      mov         eax,dword ptr [esi+edi*4+8] 
Example 9. MainApp.ReturnArg<int> JIT generated code (fragment)
0000001e 8B 44 FE 08      mov         eax,dword ptr [esi+edi*8+8]
00000022 8B 54 FE 0C      mov         edx,dword ptr [esi+edi*8+0Ch]
Example 10. MainApp.ReturnArg<long> JIT generated code (fragment)
0000002f F3 0F 7E 06      movq        xmm0,mmword ptr [esi]
00000033 66 0F D6 07      movq        mmword ptr [edi],xmm0
00000037 F3 0F 7E 46 08   movq        xmm0,mmword ptr [esi+8]
0000003c 66 0F D6 47 08   movq        mmword ptr [edi+8],xmm0
Example 11. MainApp.ReturnArg<decimal> JIT generated code (fragment)
So what we have seen is the way how .NET 2.0 implementation of generics eliminates code bloat through reuse of the code (for the reference type specializations) and provides most efficient code through specializations for value types.
I guess that would be enough for this time. The stories about how generics provides for robustness (through the design time type safety and constraints) or for simplicity of the code (through type inference) will wait for some other time.
Note: this sample was part of the session on generics held at the first Ekobit’s internal conference in September 2003 and later that year at the one of the early meetings of Croatian MS Community. Power Point file in Croatian can be found here.


Post a Comment

Links to this post:

Create a Link

<< Home