A Re-Introduction to C# References

A review of what we need to know about the type system, and references in particular, both before and after the C# 7 features, correcting common misconceptions along the way.

Warm-up Exercise

What would the following code output? Hint: An array is a reference type.
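The listing for the warm-up is missing from this copy of the article. A minimal reconstruction is sketched below. Only the signature of One appears in the text, so Two, Three, Run and the array contents are assumptions, chosen to be consistent with the answer given at the end of the article.

```csharp
using System;

public static class WarmUp
{
    // Changes an element of the array the caller passed in.
    public static void One(int[] intArray) => intArray[0] = 30;

    // Points the parameter (passed by value) at a brand-new array.
    public static void Two(int[] intArray) => intArray = new[] { 40, 50 };

    // Points the parameter (passed by reference) at a brand-new array.
    public static void Three(ref int[] intArray) => intArray = new[] { 60, 70 };

    // Runs each method against a fresh { 10, 20 } array and
    // returns what the caller sees afterwards, one line per method.
    public static string[] Run()
    {
        var first = new[] { 10, 20 };
        One(first);

        var second = new[] { 10, 20 };
        Two(second);

        var third = new[] { 10, 20 };
        Three(ref third);

        return new[]
        {
            string.Join(", ", first),
            string.Join(", ", second),
            string.Join(", ", third)
        };
    }

    public static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```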

C# Types: Reference, Value and Primitives

Put simply, a type is something that describes what a variable can hold.

Misconception: The new keyword means we are creating a reference type. Wrong! Perhaps this comes from the syntax provided by the primitive type aliases (and also many developers aren’t using structs regularly, so won’t be exposed to using new with them).

Primitive types are those that the compiler supports directly, along with a range of operators and casts between them. They map to framework types, e.g.

int  maps to System.Int32

float  maps to System.Single

The C# compiler will emit identical IL for the following lines.

The latter uses the alias syntax that primitives provide, which masks the use of the new keyword for those value types.
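The two lines are not reproduced in this copy. A hedged guess at the kind of pair the text describes is sketched here; the variable names and the wrapper class are assumptions, and the claim about identical IL is the article's, not something this snippet verifies.

```csharp
using System;

public static class PrimitiveAliasDemo
{
    public static bool Run()
    {
        // Framework type name with an explicit call to new.
        System.Int32 viaNew = new System.Int32();

        // The primitive alias with a literal; no new keyword in sight.
        int viaAlias = 0;

        // Both variables have the same type and hold the same value.
        return viaNew.GetType() == viaAlias.GetType() && viaNew == viaAlias;
    }

    public static void Main() => Console.WriteLine(Run());
}
```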

Below is a quick sketch of the types in C#. The Framework Class Library (FCL) obviously has a lot more that I won’t try to squeeze in.

Value Types

If you edit the copy made of a file, the changes do not affect the original file.

This is how value types are passed around in C# – as copies of the data. Given:

int originalInt = 0;

The value of originalInt is 0, which is the data we are intending to store in this variable.

When we pass originalInt as a value to a method (more on this later), or assign it to a new variable, we are making a copy. Any changes to the value in the copy do not change the original, e.g.

500 was only added to the copy. originalInt  is still 0.
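The copy behaviour described above can be sketched as follows. The name copiedInt and the wrapper class are assumptions; originalInt and the figure of 500 come from the text.

```csharp
using System;

public static class ValueCopyDemo
{
    public static (int original, int copy) Run()
    {
        int originalInt = 0;

        // Assignment copies the data itself.
        int copiedInt = originalInt;

        // Only the copy is changed.
        copiedInt += 500;

        return (originalInt, copiedInt);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.original); // 0
        Console.WriteLine(result.copy);     // 500
    }
}
```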

A note on inheritance

Just to confuse, all types, including System.ValueType, inherit from System.Object. Don’t get ideas – this is so we can have them behave like reference types through boxing. We developers cannot actually inherit from value types in our code.

In summary:

  • The ‘value’ stored in a value type is the actual data.
  • The default behaviour when passing it around is that we make copies of this value.
  • They support interfaces but there is no inheritance.

Reference Types

We’ll start this one with the analogy of a link to a file:

The ‘link’ from this analogy is a reference in C#.

A reference type still has a value – it’s just that the ‘value’ is a memory location where the actual data is.

By default, we are not directly working with that value. Whenever we access the variable, we are fetching the data stored in the memory location referenced by that value. (For those from the pointer world, Mads Torgersen suggests thinking of it as automatic dereferencing.)

So, when you pass one of these around in code, it is making copies of this reference, not the data. Consider the following code:

We have created a new SimpleObject in memory and stored its memory address / location in the value of original .

When we make a copy of it, we are still copying the value as we do with value types:

But the value being copied is this memory location.

So now copied and original  both reference the same memory location. When we modify the data referenced by copied (the property inside it, Number ), we are also changing original .

Now it gets interesting, and gets us a step closer to understanding the behaviour of the code in the warm-up exercise.

Remember – the ‘value’ stored in a reference type is the reference to the object’s memory address. Now we create another SimpleObject, after making a copy, and the new operator returns its memory address, which we store in original.

copied  still points to the object that original used to point to.  Confusing? Let’s return to our analogy:

Tom has changed the copy of the link he has, which doesn’t affect Kate’s copy. So now their links point to different files.
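The listings for this section are missing from this copy, so here is a minimal sketch of both situations. SimpleObject, Number, original and copied are named in the text; the numeric values and the method names are assumptions.

```csharp
using System;

public static class ReferenceCopyDemo
{
    public class SimpleObject
    {
        public int Number { get; set; }
    }

    // Both variables reference one object, so a change made
    // through either of them is visible through the other.
    public static (int original, int copied) SharedObject()
    {
        var original = new SimpleObject { Number = 100 };
        var copied = original; // copies the reference, not the object

        copied.Number = 200;

        return (original.Number, copied.Number);
    }

    // Reassigning one variable leaves the other pointing
    // at the object it already referenced.
    public static (int original, int copied) Reassigned()
    {
        var original = new SimpleObject { Number = 100 };
        var copied = original;

        original = new SimpleObject { Number = 300 };

        return (original.Number, copied.Number);
    }

    public static void Main()
    {
        Console.WriteLine(SharedObject()); // (200, 200)
        Console.WriteLine(Reassigned());   // (300, 100)
    }
}
```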

In summary / need to know:

  • The ‘value’ in a reference type is a memory location where the actual data is
  • Whenever we access a reference type variable, we are fetching the data stored in the memory location it has as its value
  • The default behaviour when we pass it around is that we are copying just this reference value. Changes to the data inside the object are visible by both the original and any copies. Changes to the actual memory location value are not shared between copies of that value.

Passing to a Method by Value

The default behaviour is passing by value, which, without any extra keywords, looks like so:

private static void One(int[] intArray)

You’re probably doing this 99% of the time without a thought.

Nothing new to learn, so no need for any code samples. This will exhibit all the behaviour already covered above:

  • a value type will pass a copy of its value and changes to that copy won’t be seen by the caller
  • a reference type will pass a reference to the object as its value and changes to the object will be seen by the caller; but if it is assigned to a brand-new object inside the method, the caller will not see this change

Passing to a Method by Reference

We have the ref, out and, as of C# 7.2, in keywords for passing by reference.

Let’s just look at ref while we get our heads round what passing by reference means.

Note that the compiler will insist that the keyword appears in both the call and the method declaration, so our intention is clear at both ends:

Behaviour with Value types

If you pass a value type by reference, there is no copying and the called method is able to make changes to the data, which will be visible to the caller.

Misconception: passing a value type by reference causes boxing to occur. Wrong! Boxing occurs when you convert a value type to a reference type. Passing a value type by reference simply creates an alias, meaning we’d have two variables representing the same memory location.
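A minimal sketch of passing a value type by reference, with the ref keyword at both the call site and the declaration. The method name AddFiveHundred and the wrapper class are assumptions made for illustration.

```csharp
using System;

public static class RefValueTypeDemo
{
    // number is an alias for the caller's variable, not a copy of it.
    public static void AddFiveHundred(ref int number) => number += 500;

    public static int Run()
    {
        int originalInt = 0;

        // ref must also appear at the call site.
        AddFiveHundred(ref originalInt);

        return originalInt;
    }

    public static void Main() => Console.WriteLine(Run()); // 500
}
```

No boxing is involved here: originalInt stays an int throughout, and the method simply writes through the alias.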

Behaviour with Reference types

I was cheeky in the warm-up test – I passed a reference type, by reference, which is not a common thing to do.

Misconception: passing a reference type is the same as passing by reference. Wrong! This is easier to explain by trying to do both at the same time, and to observe how it differs from passing a reference type as a value.

Back to the file link analogy to look at what happens when we pass a reference type by reference to a method:

Instead of passing Tom and Kate copies of my link, I gave them access to the link itself. So as before, they both see changes to the file; but now also, if one of them changes the link to a new file, they both see that too.

So, using the ref keyword is kind of telling the compiler not to dereference / fetch the data from the memory location, but instead, pass the address itself, analogous to creating an alias.

We can see in the IL emitted for the code above, that the opcode stind is used to store the result back in the address of the int32  passed by address (note the &).

In summary / need to know:

  • The ref modifier allows a value to be passed in and modified – the caller sees the changes.
  • The ’value’ when used with reference types is the actual memory location hence it can change where the caller’s variable points in memory.

When Reference Types Meet Ref Locals

In C# 7 we got ref locals. They were introduced alongside ref returns to support the push for more efficient, safe code.

I want to use them with reference types to give us a second chance to appreciate what happens when we pass a reference type around by reference.

A complete code example:

Notice how the original is replaced by the copy now. In the IL we can see that ref locals utilise ldloca (for value and reference types) – we are copying the actual address where the value is (remember that the value in a reference type is a memory address where the object is).

By using ref , we are essentially making an alias to this value containing the address – any changes to either, including pointing the reference to a new object, will affect both.
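The complete example is not reproduced in this copy. A minimal sketch of a ref local over a reference type, assuming the same SimpleObject shape used earlier and made-up values, might look like this:

```csharp
using System;

public static class RefLocalDemo
{
    public class SimpleObject
    {
        public int Number { get; set; }
    }

    public static (int original, int alias) Run()
    {
        var original = new SimpleObject { Number = 100 };

        // alias is another name for the variable original,
        // not a second variable holding a copy of the reference.
        ref SimpleObject alias = ref original;

        // Pointing alias at a new object also repoints original.
        alias = new SimpleObject { Number = 300 };

        return (original.Number, alias.Number);
    }

    public static void Main() => Console.WriteLine(Run()); // (300, 300)
}
```

Compare this with the plain assignment case earlier, where reassigning one variable left the other pointing at the old object.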

Ref returns

Just imagine I have an array of large structs and not the int I have used below.

I can now return a reference directly to an element in an int  array without any copying.

The gotcha with return ref  is scope. Glance ahead and you’ll see I briefly cover the stack and stack frames, if you struggle with this bit. Ultimately, when a method returns you’ll lose anything on the stack and lose references to anything on the heap (and GC will claim it). With this in mind you can only ref return something visible in the scope of the caller. You can see above I am returning a reference to an index in the array held at the call site.
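The listing is missing here, so the following is a sketch of a ref return over an int array held by the caller. Find, the array contents and the not-found behaviour are assumptions.

```csharp
using System;

public static class RefReturnDemo
{
    // Returns a reference to the matching element itself, not a copy of it.
    // The array belongs to the caller, so the reference stays valid
    // after this method returns.
    public static ref int Find(int[] numbers, int target)
    {
        for (int i = 0; i < numbers.Length; i++)
        {
            if (numbers[i] == target)
            {
                return ref numbers[i];
            }
        }

        throw new InvalidOperationException("Target not found.");
    }

    public static int[] Run()
    {
        var numbers = new[] { 1, 2, 3 };

        // A ref local receives the ref return.
        ref int slot = ref Find(numbers, 2);

        // Writing through the ref local changes the array element.
        slot = 20;

        return numbers;
    }

    public static void Main() => Console.WriteLine(string.Join(", ", Run())); // 1, 20, 3
}
```

Trying to return a reference to a local variable declared inside Find would be rejected by the compiler, which is the scope rule described above.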

Ref locals & returns – useful for reference types?

We could start using ref returns and ref locals, but expect limited use cases if you work higher up the stack. Many libraries we use already utilise these and the new Span<T> work, or soon will, so it is useful to understand how they play.

The real value is to avoid copying around large value types – they complement the existing feature to pass by reference, adding the (missing) reference-like behaviour we already get with reference types.

For reference types, as with passing to a method by ref, you’re giving a caller access to the actual memory location and letting them change it. If anyone has come across some real-world scenarios please share so I can add them here.

Where do the Stack, Heap and Registers fit in all this?

Misconception: value types are always allocated on the stack. Wrong! If we’re going to get into discussions about where allocations occur, then it would be more correct to state that the intention is more like:

  • short-lived objects to be allocated in registers or on the stack (which is going to be any time they are declared inside a method)
  • and long-lived objects to be allocated on the heap.

EDIT: Eric Lippert suggests we should be thinking in terms of a ‘short term allocation pool and long term allocation pool … regardless of whether that variable contains an int or a reference to an object’.


Mostly, we shouldn’t be concerning ourselves with how any particular JIT allocates and we should make sure we know the differences in how the two types are passed around. That said, the .NET roadmap has been focused on ‘inefficiencies … directly tied to your monthly billing’, and delivered Span<T>  and ref struct , which are stack-only.

For interest, here are a few scenarios where we can expect a value type to be heap allocated:

  • Declared in a field of a class
  • In an array
  • Boxed
  • Static
  • Local in a yield return block
  • Inside lambda / anon methods

What does it even mean to allocate on the stack (or the heap)?

This stack thing… it is actually that same call stack, made up of frames, which is responsible for the execution of your code. I’m not going to teach you about what a stack data structure is now.

A stack frame represents a method call, which includes any local variables. Those variables store the values for value or reference types we have already thoroughly discussed above.

A frame only exists during the lifetime of a method; so any variables in the frame also only exist until the method returns.

A big difference between stack and heap is that an object on the heap can live on after we exit the function, if there is a reference to it from elsewhere. So, given that passing references to objects around can potentially keep them alive forever, we can safely say that all reference types can be considered long-term and the objects/data will be allocated on the heap.

Misconception: The integers in an array of integers int[]  will be allocated to the stack. Wrong. Value types are embedded in their container so they would be stored with the array on the heap.

Enforcing Immutability, Now That We’re Passing More References

out and ref produce almost identical IL; the only difference is that the compiler enforces who is responsible for initialising the value being referred to:

  • out – caller does not have to initialise the value. If they do, the called method cannot rely on it, because the compiler treats the parameter as unassigned until the method writes to it. The called method must write to it.
  • ref – caller must initialise the value
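The two rules can be sketched side by side. TryHalve and DoubleIt are hypothetical methods invented for this illustration.

```csharp
using System;

public static class OutVersusRefDemo
{
    // out: the method must assign half before it returns,
    // and may not read it before that assignment.
    public static bool TryHalve(int input, out int half)
    {
        half = input / 2;
        return input % 2 == 0;
    }

    // ref: the method may read and write the caller's variable.
    public static void DoubleIt(ref int value) => value *= 2;

    public static (bool wasEven, int half, int doubled) Run()
    {
        // No initialisation needed before passing as out.
        int half;
        bool wasEven = TryHalve(10, out half);

        // Must be initialised before passing as ref.
        int doubled = 21;
        DoubleIt(ref doubled);

        return (wasEven, half, doubled);
    }

    public static void Main() => Console.WriteLine(Run()); // (True, 5, 42)
}
```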

Great for avoiding copying value types, but how do we prevent the method being called from making unwanted modifications? C# 7.2 introduced the in modifier. It got the name by being the opposite of out: it makes the reference (alias) read-only, and the caller does have to initialise the value.

The equivalent for the other direction i.e. return ref , is the new modifier: ref readonly .

Here’s the immutable array example from the readonly ref proposal:

Now we can still get a reference to an array element without copying, but without the dangers of full access to the location:
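The listings from the proposal are not reproduced in this copy. The sketch below is not that example; it is a smaller illustration of the two modifiers, with LargeStruct, Sum, ItemAt and the array contents all invented for the purpose. It needs C# 7.2 or later.

```csharp
using System;

public static class ReadOnlyRefDemo
{
    public struct LargeStruct
    {
        public long A, B, C, D;
    }

    private static readonly int[] Values = { 10, 20, 30 };

    // in: the argument is passed by reference but is read-only here.
    public static long Sum(in LargeStruct value)
    {
        // value.A = 1; // would not compile: value is a read-only alias
        return value.A + value.B + value.C + value.D;
    }

    // ref readonly: hands out a reference to the element without
    // allowing the caller to write through it.
    public static ref readonly int ItemAt(int index) => ref Values[index];

    public static long RunIn()
    {
        var large = new LargeStruct { A = 1, B = 2, C = 3, D = 4 };
        return Sum(in large);
    }

    public static int RunRefReadonly()
    {
        ref readonly int item = ref ItemAt(1);
        // item = 99; // would not compile: item is read-only
        return item;
    }

    public static void Main()
    {
        Console.WriteLine(RunIn());          // 10
        Console.WriteLine(RunRefReadonly()); // 20
    }
}
```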

Briefly on Boxing

You can convert from value to reference type and back again. It can be implicit or explicit and is commonly seen when passing a value type to a method that takes object types:

And unboxing:

An interesting case of implicit boxing is when working with structs that implement interfaces. Remember, an interface is a reference type.

This will cause boxing to occur.

Misconception: when a value type is boxed, changes to the boxed reference affect the value type itself. Wrong! You’d be thinking of when we create an alias with ref local or passing by reference. Changes to the boxed copy on the heap have no effect on the value type instance and vice versa.
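The boxing listings are missing from this copy. A minimal sketch of boxing, unboxing and boxing through an interface follows; ICounter and Counter are hypothetical types made up for the illustration.

```csharp
using System;

public static class BoxingDemo
{
    public interface ICounter
    {
        int Count { get; }
        void Increment();
    }

    public struct Counter : ICounter
    {
        public int Count { get; private set; }
        public void Increment() => Count++;
    }

    public static (int original, int unboxed) BoxAndUnbox()
    {
        int number = 42;
        object boxed = number;    // implicit boxing: a copy goes to the heap
        number = 100;             // changes the local only, not the box
        int unboxed = (int)boxed; // explicit unboxing: copies the value back out
        return (number, unboxed);
    }

    public static (int viaStruct, int viaInterface) BoxThroughInterface()
    {
        var counter = new Counter();
        ICounter asInterface = counter; // boxes a copy of the struct
        asInterface.Increment();        // increments the boxed copy only
        return (counter.Count, asInterface.Count);
    }

    public static void Main()
    {
        Console.WriteLine(BoxAndUnbox());         // (100, 42)
        Console.WriteLine(BoxThroughInterface()); // (0, 1)
    }
}
```

In both methods the box and the original value go their separate ways once the boxing has happened, which is the point of the misconception above.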

When the C# compiler spots any implicit or explicit boxing it will emit specific IL:

IL_007c: box

When the JIT compiler sees this instruction, it will allocate heap storage, copy the value type’s contents into a ‘box’ object there, and produce a reference to that object.

If you are careful, boxing is not going to hurt performance. Problems arise when it occurs within iterations over large data sets. There is additional CPU time for the boxing itself, followed by additional pressure on the garbage collector.

Misconception: in the warm-up exercise, the array goes on the heap and so do the int objects in it. Therefore, the int objects have to be boxed. Wrong!

Remember we rebuffed the misconception that ALL value types go on the stack. With that in mind, int objects ending up on the heap are not necessarily boxed. Take the code:

If this were inside a method, a new array object would be allocated to the heap with a reference to it stored on the stack. The int objects 10 and 20 would be allocated to the heap also with the array.

Warm-up answer

30, 20
10, 20
60, 70

Summary

  • The ‘value’ in a value type is the actual data.
  • The default behaviour when we pass a value type around is that we are copying the actual value.
  • The ‘value’ held in a reference type, is the reference to a location in memory where the data is.
  • Whenever we access a reference type variable, we are fetching the data stored in the memory location it has as its value
  • The default behaviour when we pass a reference type around is that we are copying just this reference value. Changes to the data inside the object are visible by both the original and any copies. Changes to the actual memory location value are not shared between copies of that value.
  • The ref modifier allows a value to be passed in and modified – the caller sees the changes. The ‘value’ when used with reference types is the actual memory location, hence it can change where the caller’s variable points in memory.
  • Amongst other things beyond this article, C# 7 introduced a way to return by ref. It also gave us ref readonly and the in modifier to help enforce immutability.

Some homework because I ran out of space:

  • Doing reference and value type equality right
  • When to use structs vs classes
  • How string differs
  • Extension method refs
  • Readonly structs
  • Nullable value types and look forward to nullable reference types

Sources

Who knows? I play with the internals a lot and read a great deal, so can’t be sure where it all comes from. It’s just in my head now. Probably:

  • Any of the Mads or Skeet talks I’ve watched
  • The writings of Eric Lippert
  • Writing High Performance .NET Code by Ben Watson
  • CLR Via C# by Jeffrey Richter
  • Pro .NET Performance by Sasha Goldshtein
  • Probably loads from MS blogs and MS repositories at github.com

C# Debug vs. Release builds and debugging in Visual Studio – from novice to expert in one blog article

Super happy to have won First Prize @ Codeproject for this article.

Repository for my PowerShell script to inspect the DebuggableAttribute of assemblies.

Introduction

‘Out of the box’ the C# build configurations are Debug and Release.

I planned to write an introductory article, but as I delved deeper into internals I started comparing actual behaviour with Roslyn against previous commentary and what the documentation states. So, while I do start with the basics, I hope there is something for more experienced C# developers too.

Disclaimer: Details will vary slightly for .NET languages other than C#.

A reminder of C# compilation

C# source code passes through 2 compilation steps to become CPU instructions that can be executed.

Diagram showing the 2 steps of compilation in the C# .NET ecosystem

As part of your continuous integration, step 1 would take place on the build server and then step 2 would happen later, whenever the application is being run. When working locally in Visual Studio, both steps, for your convenience, fire off the back of starting the application from the Debug menu.

Compilation step 1: The application is built by the C# compiler. Your code is turned into Common Intermediate Language (CIL), which can be executed in any environment that supports CIL (which from now on I will refer to as IL). Note that the assembly produced is not readable IL text but actually metadata and byte code as binary data (tools are available to view the IL in a text format).

Some code optimisation will be carried out (more on this further on).

Compilation step 2: The Just-in-time (JIT) compiler will convert the IL into instructions that the CPU on your machine can execute. This won’t all happen upfront though – in the normal mode of operation, methods are compiled at the time of calling, then cached for later use.

The JIT compiler is just one of a whole bunch of services that make up the Common Language Runtime (CLR), enabling it to execute .NET code.

The bulk of code optimisation will be carried out here (more on this further on).

What is compiler optimisation (in one sentence)?

It is the process of improving factors such as execution speed, size of the code, power usage and, in the case of .NET, the time it takes to JIT compile the code – all without altering the functionality, aka the original intent of the programmer.

Why are we concerned with optimisation in this article?

I’ve stated that compilers at both steps will optimise your code. One of the key differences between the Debug and Release build configurations is whether the optimisations are disabled or not, so you do need to understand the implications of optimisation.

C# compiler optimisation

The C# compiler does not do a lot of optimisation. It relies ‘…upon the jitter to do the heavy lifting of optimizations when it generates the real machine code.’ (Eric Lippert). The optimisation it does perform will nonetheless degrade the debugging experience. You don’t need in-depth knowledge of C# optimisations to follow this article, but I’ll look at one to illustrate the effect on debugging:

The IL nop instruction (no operation)

The nop instruction has a number of uses in low-level programming, such as including small, predictable delays or overwriting instructions you wish to remove. In IL, it is used to help breakpoints set in your source code behave predictably when debugging.

If we look at the IL generated for a build with optimisations disabled:

nop instruction

This nop instruction directly maps to a curly bracket and allows us to add a breakpoint on it:

curly bracket associated with nop instruction

This would be optimised out of IL generated by the C# compiler if optimisations were enabled, with clear implications for your debugging experience.

For a more detailed discussion on C# compiler optimisations see Eric Lippert’s article: What does the optimize switch do?. There is also a good commentary of IL before and after being optimised here.

The JIT compiler optimisations

Despite having to perform its job swiftly at runtime, the JIT compiler performs a lot of optimisations. There’s not much info on its internals and it is a non-deterministic beast (like Forrest Gump’s box of chocolates), varying in the native code it produces depending on many factors. Even while your application is running it is profiling and possibly re-compiling code to improve performance. For a good set of examples of optimisations made by the JIT compiler, check out Sasha Goldshtein’s article.

I will just look at one example to illustrate the effect of optimisation on your debugging experience:

Method inlining

For the real-life optimisation made by the JIT compiler, I’d be showing you assembly instructions. This is just a mock-up in C# to give you the general idea:

Suppose I have:

The JIT compiler would likely perform an inline expansion on this, replacing the call to Add() with the body of Add():

Clearly, trying to step through lines of code that have been moved is going to be difficult and you’ll also have a diminished stack trace.
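The before and after listings are missing from this copy. A mock-up in the same spirit is sketched below. Only Add() is named in the text; the calling methods and the operands are assumptions, and as the article says, the real transformation happens in native code rather than in C#.

```csharp
using System;

public static class InliningMockUp
{
    public static int Add(int a, int b) => a + b;

    // What the programmer wrote: a call to Add().
    public static int BeforeInlining()
    {
        int x = 2;
        int y = 3;
        return Add(x, y);
    }

    // Roughly what the JIT compiler might produce instead:
    // the body of Add() pasted in place of the call.
    public static int AfterInlining()
    {
        int x = 2;
        int y = 3;
        return x + y;
    }

    public static void Main()
    {
        Console.WriteLine(BeforeInlining()); // 5
        Console.WriteLine(AfterInlining());  // 5
    }
}
```

The result is the same either way, but in the second version there is no longer a frame for Add() to step into or to appear in a stack trace.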

The default build configurations

So now that you’ve refreshed your understanding of .NET compilation and the two ‘layers’ of optimisation, let’s take a look at the 2 build configurations available ‘out of the box’:

Visual Studio release and debug configurations

Pretty straightforward – Release is fully optimised and Debug is not optimised at all, which, as you are now aware, is fundamental to how easy it is to debug your code. But this is just a superficial view of the possibilities with the debug and optimize arguments.

The optimize and debug arguments in depth

I’ve attempted to diagram these from the Roslyn and mscorlib code, including: CSharpCommandLineParser.cs, CodeGenerator.cs, ILEmitStyle.cs, debuggerattributes.cs, Optimizer.cs and OptimizationLevel.cs. Blue parallelograms represent command line arguments and the greens are the resulting values in the codebase.

Diagram of optimize and debug command line arguments and their related settings in code

The OptimizationLevel enumeration

OptimizationLevel.Debug disables all optimizations by the C# compiler and disables JIT optimisations via DebuggableAttribute.DebuggingModes, which, with the help of ildasm, we can see is:

Manifest debuggable attribute

Read in little-endian byte order, this is 0x107, which is 263, equating to: Default, DisableOptimizations, IgnoreSymbolStoreSequencePoints and EnableEditAndContinue (see debuggerattributes.cs).

OptimizationLevel.Release enables all optimizations by the C# compiler and enables JIT optimizations via DebuggableAttribute.DebuggingModes = ( 01 00 02 00 00 00 00 00 ) , which is just DebuggingModes.IgnoreSymbolStoreSequencePoints .

With this level of optimization, ‘sequence points may be optimized away. As a result it might not be possible to place or hit a breakpoint.’ Also, ‘user-defined locals might be optimized away. They might not be available while debugging.’ (OptimizationLevel.cs).
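The arithmetic behind those two values can be checked with a short sketch. It assumes the Debug blob has the same eight-byte layout as the Release one quoted above (a two-byte prolog, a four-byte little-endian integer, then a two-byte named-argument count); FromBlob and the class name are inventions for the illustration.

```csharp
using System;
using System.Diagnostics;

public static class DebuggingModesDecoder
{
    // Bytes 2 to 5 of the blob hold the constructor argument
    // as a little-endian 32-bit integer.
    public static DebuggableAttribute.DebuggingModes FromBlob(byte[] blob)
    {
        int raw = blob[2] | (blob[3] << 8) | (blob[4] << 16) | (blob[5] << 24);
        return (DebuggableAttribute.DebuggingModes)raw;
    }

    public static void Main()
    {
        // Assumed Debug blob: 07 01 read little-endian is 0x0107 = 263.
        var debug = FromBlob(new byte[] { 0x01, 0x00, 0x07, 0x01, 0x00, 0x00, 0x00, 0x00 });
        // Release blob as quoted in the article.
        var release = FromBlob(new byte[] { 0x01, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00 });

        Console.WriteLine((int)debug);
        Console.WriteLine(debug.HasFlag(DebuggableAttribute.DebuggingModes.DisableOptimizations));
        Console.WriteLine(release == DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints);
    }
}
```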

IL type explained

The type of IL is defined by the following enumeration from ILEmitStyle.cs.

As in the diagram above, the type of IL produced by the C# compiler is determined by the OptimizationLevel ; the debug argument won’t change this, with the exception of debug+ when the OptimizationLevel is Release i.e. in all but the case of debug+, optimize is the only argument that has any impact on optimisation – a departure from pre-Roslyn*.

* In Jeffrey Richter’s CLR Via C# (2014), he states that optimize- with debug- results in the C# compiler not optimising IL and the JIT compiler optimising to native.

ILEmitStyle.Debug – no optimization of IL in addition to adding nop instructions in order to map sequence points to IL

ILEmitStyle.Release – do all optimizations

ILEmitStyle.DebugFriendlyRelease – only perform optimizations on the IL that do not degrade debugging. This is the interesting one. It comes off the back of a debug+ and only has an effect on optimized builds i.e. those with OptimizationLevel.Release. For optimize- builds debug+ behaves as debug.

The logic in (CodeGenerator.cs) describes it more clearly than I can:

The comment in the source file Optimizer.cs states that they do not omit any user-defined locals and do not carry values on the stack between statements. I’m glad I read this, as I was a bit disappointed with my own experiments in ildasm with debug+: all I had been seeing was the retention of local variables and a lot more pushing and popping to and from the stack!

There is no intentional ‘deoptimizing’ such as adding nop instructions.

There’s no obvious direct way to choose this debug flag from within Visual Studio for C# projects. Is anyone making use of this in their production builds?

No difference between debug, debug:full and debug:pdbonly?

Correct – despite the current documentation and the help stating otherwise:

csc command line help

They all achieve the same result – a .pdb file is created. A peek at CSharpCommandLineParser.cs  can confirm this. And for good measure I did check I could attach and debug with WinDbg for both the pdbonly and full values.

They have no impact on code optimisation.

On the plus side, the documentation on Github is more accurate, although I’d say, still not very clear on the special behaviour of debug+.

I’m new.. what’s a .pdb? Put simply, a .pdb file stores debugging information about your DLL or EXE, which will help a debugger map the IL instructions to the original C# code.

What about debug+?

debug+ is its own thing and cannot be suffixed by either full or pdbonly. Some commentators suggest it is the same thing as debug:full, which is not exactly true as stated above – used with optimize- it is indeed the same, but when used with optimize+ it has its own unique behaviour, discussed above under DebugFriendlyRelease .

And debug- or no debug argument at all?

The defaults in CSharpCommandLineParser.cs are:

The values for debug- are:

So we can confidently say debug- and no debug argument result in the same single effect – no .pdb file is created.

They have no impact on code optimisation.

Suppress JIT optimizations on module load

A checkbox under Options->Debugging->General; this is an option on the debugger in Visual Studio and is not going to affect the assemblies you build.

You should now appreciate that the JIT compiler does most of the significant optimisations and is the bigger hurdle to mapping back to the original source code for debugging. With this enabled, the debugger will ask the JIT compiler not to optimise the modules it loads, whatever the DebuggableAttribute on the assembly says.

Until circa 2015 the default was enabled. I earlier cited CLR via C#, in that pre-Roslyn we could supply optimize- and debug- arguments to csc.exe and get unoptimised C# that was then optimised by the JIT compiler – so there would have been some use for suppressing the JIT optimisations in the Visual Studio debugger. However, now that anything being JIT optimised is already degrading the debugging experience via C# optimisations, Microsoft decided to default to disabled on the assumption that if you are running the Release build inside Visual Studio, you probably wish to see the behaviour of an optimised build at the expense of debugging.

Typically you only need to switch it on if you need to debug into DLLs from external sources such as NuGet packages.

If you’re trying to attach from Visual Studio to a Release build running in production (with a .pdb or other source for symbols), then an alternative way to instruct the JIT compiler not to optimise is to add a .ini file with the same name as your executable alongside it with the following:
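The file contents are missing from this copy. The settings that Microsoft documents for the .NET Framework under ‘Making an Image Easier to Debug’ are shown here; the article does not confirm these are the exact values it used, so treat this as a sketch and check it against the runtime you are targeting.

```ini
[.NET Framework Debugging Control]
GenerateTrackingInfo=1
AllowOptimize=0
```

For an executable called MyApp.exe (a hypothetical name), the file would be MyApp.ini in the same folder.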

Just My Code.. What?

By default, Options->Debugging->Enable Just My Code is enabled and the debugger considers optimised code to be non-user code. With this enabled, the debugger will not even attempt to step into non-user code.

You could uncheck this option, and then theoretically you can hit breakpoints. But now you are debugging code optimised by both the C# and JIT compilers that barely matches your original source code, with a super-degraded experience – stepping through code will be unpredictable and you will probably not be able to obtain the values in local variables.

You should only really be changing this option if working with DLLs from others where you have the .pdb file.

A closer look at DebuggableAttribute

Above, I mentioned using ildasm to inspect the manifest of an assembly for its DebuggableAttribute. I’ve also written a little PowerShell script to produce a friendlier result (available via download link at the start of the article).

Debug build:

Release build:

You can ignore IsJITTrackingEnabled, as it has been ignored by the JIT compiler since .NET 2.0. The JIT compiler will always generate tracking information during debugging to match up IL with its machine code and track where local variables and function arguments are stored (Source).

IsJITOptimizerDisabled simply checks DebuggingFlags for DebuggingModes.DisableOptimizations. This is the flag that controls optimisation by the JIT compiler: when it is set, JIT optimisations are turned off.

DebuggingModes.IgnoreSymbolStoreSequencePoints tells the debugger to work out the sequence points from the IL instead of loading the .pdb file, which would have performance implications. Sequence points are used to map locations in the IL code to locations in your C# source code. The JIT compiler will not compile any 2 sequence points into a single native instruction. With this flag, the JIT will not load the .pdb file. I’m not sure why this flag is being added to optimised builds by the C# compiler – any thoughts?
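The same properties can also be read from C# with reflection. This is a hedged sketch, not the PowerShell script mentioned above; DebuggableInspector and Describe are hypothetical names, and the output depends on how the inspected assembly was built.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class DebuggableInspector
{
    // Summarises the DebuggableAttribute of an assembly, if it has one.
    public static string Describe(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        if (attribute == null)
        {
            return "DebuggableAttribute not present";
        }

        return "IsJITOptimizerDisabled=" + attribute.IsJITOptimizerDisabled
             + ", IsJITTrackingEnabled=" + attribute.IsJITTrackingEnabled
             + ", DebuggingFlags=" + attribute.DebuggingFlags;
    }

    public static void Main()
    {
        // Inspects the assembly this code was compiled into.
        Console.WriteLine(Describe(typeof(DebuggableInspector).Assembly));
    }
}
```

To inspect a file on disk instead, Assembly.LoadFrom with the path could supply the assembly argument.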

Key points

  • debug- (or no debug argument at all) now means: do not create a .pdb file.
  • debug, debug:full and debug:pdbonly all now cause a .pdb file to be output. debug+ will also do the same thing if used alongside optimize-.
  • debug+ is special when used alongside optimize+, creating IL that is easier to debug.
  • each ‘layer’ of optimisation (C# compiler, then JIT) further degrades your debugging experience. You will now get both ‘layers’ for optimize+ and neither of them for optimize-.
  • since .NET 2.0 the JIT compiler will always generate tracking information regardless of the attribute IsJITTrackingEnabled
  • whether building via VS or csc.exe, the DebuggableAttribute is now always present
  • the JIT can be told to ignore IsJITOptimizerDisabled during Visual Studio debugging via the general debugging option, Suppress JIT optimizations on module load. It can also be instructed to do so via a .ini file
  • optimize+ will create binaries that the debugger considers non-user code. You can disable the option Just My Code, but expect a severely degraded debugging experience.

You have a choice of:

  • Debug: debug|debug:full|debug:pdbonly optimize-
  • Release: debug-|no debug argument optimize+
  • DebugFriendlyRelease: debug+ optimize+

However, DebugFriendlyRelease is only possible by calling Roslyn csc.exe directly. I would be interested to hear from anyone that has been using this.