A Re-Introduction to C# References


Reviewing what we need to know about the type system – and references in particular – both pre- and post-C# 7, while correcting common misconceptions along the way.

Warm-up Exercise

What would the following code output? Hint: An array is a reference type.
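A sketch consistent with the answer given at the end of the post (the method Two and the variable names are assumptions; One matches the signature shown later):

static void Main()
{
    int[] original = { 10, 20 };

    One(original);                                        // passing by value
    Console.WriteLine($"{original[0]}, {original[1]}");

    Two(ref original);                                    // passing by reference
    Console.WriteLine($"{original[0]}, {original[1]}");
}

private static void One(int[] intArray)
{
    intArray = new int[] { 30, 20 };                      // reassign the copied reference
    Console.WriteLine($"{intArray[0]}, {intArray[1]}");
}

private static void Two(ref int[] intArray)
{
    intArray = new int[] { 60, 70 };                      // reassign the caller's reference
}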

C# Types: Reference, Value and Primitives

Put simply, a type is something that describes what a variable can hold.

Misconception: The new keyword means we are creating a reference type. Wrong! Perhaps this comes from the syntax provided by the primitive type aliases (and also many developers aren’t using structs regularly, so won’t be exposed to using new with them).

Primitive types are those the compiler supports directly, along with a range of operators and casting between them. They map to framework types, e.g.

int  maps to System.Int32

float  maps to System.Single

The C# compiler will emit identical IL for the following lines.
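For example, a sketch (variable names are assumptions):

System.Int32 first = new System.Int32();   // explicit framework type and new
int second = 0;                            // the primitive alias syntax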

The latter is the alias syntax that primitives provide, masking the use of the new keyword for those value types.

Below is a quick sketch of the types in C#. The Framework Class Library (FCL) obviously has a lot more that I won’t try to squeeze in.

Value Types

If you edit the copy made of a file, the changes do not affect the original file.

This is how value types are passed around in C# – as copies of the data. Given:

int originalInt = 0;

The value of originalInt is 0, which is the data we are intending to store in this variable.

When we pass originalInt as a value to a method (more on this later), or assign it to a new variable, we are making a copy. Any changes to the value in the copy do not change the original, e.g.
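A sketch (copiedInt is an assumed name):

int copiedInt = originalInt;       // copy the value
copiedInt += 500;                  // modify the copy only

Console.WriteLine(originalInt);    // 0
Console.WriteLine(copiedInt);      // 500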

500 was only added to the copy. originalInt  is still 0.

A note on inheritance

Just to confuse, all types, including System.ValueType, inherit from System.Object. Don’t get ideas – this is so they can behave like reference types through boxing. We developers cannot actually inherit from value types in our own code.

In summary:

  • The ‘value’ stored in a value type is the actual data.
  • The default behaviour when passing it around is that we are making copies of this value.
  • They support interfaces but there is no inheritance.

Reference Types

We’ll start this one with the analogy of a link to a file: imagine I save a file to a shared drive and send Tom and Kate each their own copy of a link to it. Both links point to the same file, so if either of them edits the file, they both see the change.

The ‘link’ from this analogy is a reference in C#.

A reference type still has a value – it’s just that the ‘value’ is a memory location where the actual data is.

By default, we are not directly working with that value. Whenever we access the variable, we are fetching the data stored in the memory location referenced by that value (for those from the pointer world, Mads Torgersen suggests thinking of it as automatic dereferencing).

So, when you pass one of these around in code, it is making copies of this reference, not the data. Consider the following code:
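A sketch (SimpleObject is the type named in the text; this minimal definition is an assumption):

class SimpleObject
{
    public int Number { get; set; }
}

SimpleObject original = new SimpleObject { Number = 10 };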

We have created a new SimpleObject in memory and stored its memory address / location in the value of original .

When we make a copy of it, we are still copying the value as we do with value types:
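Something like:

SimpleObject copied = original;    // copies the reference, not the object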

But the value being copied is this memory location.

So now copied and original  both reference the same memory location. When we modify the data referenced by copied (the property inside it, Number ), we are also changing original .
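A sketch:

copied.Number = 20;

Console.WriteLine(original.Number);   // 20 - both variables reference the same object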

Now it gets interesting, and gets us a step closer to understanding the behaviour of the code in the warm-up exercise.

Remember – the ‘value’ stored in a reference type is the reference to the object’s memory address. Now we create another SimpleObject after making a copy, and the new operator returns its memory address, which we store in original.
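A sketch:

original = new SimpleObject { Number = 30 };

Console.WriteLine(original.Number);   // 30
Console.WriteLine(copied.Number);     // 20 - still the old object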

copied  still points to the object that original used to point to.  Confusing? Let’s return to our analogy:

Tom has changed the copy of the link he has, which doesn’t affect Kate’s copy. So now their links point to different files.

In summary / need to know:

  • The ‘value’ in a reference type is a memory location where the actual data is
  • Whenever we access a reference type variable, we are fetching the data stored in the memory location it has as its value
  • The default behaviour when we pass it around is that we are copying just this reference value. Changes to the data inside the object are visible to both the original and any copies. Changes to the actual memory location value are not shared between copies of that value.

Passing to a Method by Value

The default behaviour is passing by value; without extra keywords, it looks like so:

private static void One(int[] intArray)

You’re probably doing this 99% of the time without a thought.

Nothing new to learn, so no need for any code samples. This will exhibit all the behaviour already covered above:

  • a value type will pass a copy of its value and changes to that copy won’t be seen by the caller
  • a reference type will pass a reference to the object as its value and changes to the object will be seen by the caller; but if it is assigned to a brand-new object inside the method, the caller will not see this change

Passing to a Method by Reference

We have the ref, out and, since C# 7.2, in keywords for passing by reference.

Let’s just look at ref while we get our heads round what passing by reference means.

Note that the compiler will insist that the keyword appears in the call and the method, so our intention is clear both ends:
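A sketch (the method name is an assumption):

private static void AddFive(ref int number)
{
    number += 5;                 // writes through to the caller's variable
}

int total = 10;
AddFive(ref total);              // 'ref' is required at the call site too
Console.WriteLine(total);        // 15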

Behaviour with Value types

If you pass a value type by reference, there is no copying and the called method is able to make changes to the data, which will be visible to the caller.

Misconception: passing a value type by reference causes boxing to occur. Wrong! Boxing occurs when you convert a value type to a reference type. Passing a value type by reference simply creates an alias, meaning we’d have two variables representing the same memory location.

Behaviour with Reference types

I was cheeky in the warm-up test – I passed a reference type, by reference, which is not a common thing to do.

Misconception: passing a reference type is the same as passing by reference. Wrong! This is easier to explain by trying to do both at the same time, and to observe how it differs from passing a reference type as a value.

Back to the file link analogy to look at what happens when we pass a reference type by reference to a method:

Instead of passing Tom and Kate copies of my link, I gave them access to the link itself. So as before, they both see changes to the file; but now also, if one of them changes the link to a new file, they both see that too.

So, using the ref keyword is kind of telling the compiler not to dereference / fetch the data from the memory location, but instead, pass the address itself, analogous to creating an alias.

We can see in the IL emitted for the code above that the stind opcode is used to store the result back through the address of the int32 passed by reference (note the & in the parameter type).

In summary / need to know:

  • The ref modifier allows a value to be passed in and modified – the caller sees the changes.
  • The ‘value’ when used with reference types is the actual memory location, hence it can change where the caller’s variable points in memory.

When Reference Types Meet Ref Locals

In C# 7 we got ref locals. They were introduced alongside ref returns to support the push for more efficient, safe code.

I want to use them with reference types to give us a second chance to appreciate what happens when we pass a reference type around by reference.

A complete code example:
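A sketch of a ref local used with a reference type (names are assumptions):

SimpleObject original = new SimpleObject { Number = 1 };
ref SimpleObject copied = ref original;     // alias to the variable holding the reference

copied = new SimpleObject { Number = 2 };   // re-points the alias...
Console.WriteLine(original.Number);         // 2 - ...and original with it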

Notice how the original is replaced by the copy now. In the IL we can see that ref locals utilise ldloca (for value and reference types) – we are copying the actual address where the value is (remember that the value in a reference type is a memory address where the object is).

By using ref , we are essentially making an alias to this value containing the address – any changes to either, including pointing the reference to a new object, will affect both.

Ref returns

Just imagine I have an array of large structs and not the int I have used below.

I can now return a reference directly to an element in an int  array without any copying.
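A sketch (the method name is an assumption):

private static ref int FirstElement(int[] numbers)
{
    return ref numbers[0];        // a reference to the element - no copy made
}

int[] data = { 10, 20, 30 };
ref int first = ref FirstElement(data);
first = 99;
Console.WriteLine(data[0]);       // 99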

The gotcha with return ref is scope. Glance ahead and you’ll see I briefly cover the stack and stack frames, if you struggle with this bit. Ultimately, when a method returns, you’ll lose anything on the stack and lose references to anything on the heap (and the GC will claim it). With this in mind, you can only ref return something visible in the scope of the caller. You can see above that I am returning a reference to an element of the array held at the call site.

Ref locals & returns – useful for reference types?

The real value is in avoiding copies of large value types – ref locals and ref returns complement the existing pass-by-reference feature, giving value types the (previously missing) reference-like behaviour we already get with reference types.

We could start using ref returns and ref locals, but expect limited use cases if you work higher up the stack. Many of the libraries we use have already been, or will be, utilising these and the new Span<T> work, so it is useful to understand how they play.

For reference types, as with passing to method by ref, you’re giving a caller access to the actual memory location and letting them change it. If anyone has come across some real-world scenarios please share so I can add it here.

Where do the Stack, Heap and Registers fit in all this?

Misconception: value types are always allocated on the stack. Wrong! If we’re going to get into discussions about where allocations occur, then it would be more correct to state that the intention is more like:

  • short-lived objects to be allocated in registers or on the stack (which is going to be any time they are declared inside a method)
  • and long-lived objects to be allocated on the heap.

EDIT: Eric Lippert suggests we should be thinking in terms of a ‘short term allocation pool and long term allocation pool … regardless of whether that variable contains an int or a reference to an object’.


Mostly, we shouldn’t be concerning ourselves with how any particular JIT allocates and we should make sure we know the differences in how the two types are passed around. That said, the .NET roadmap has been focused on ‘inefficiencies … directly tied to your monthly billing’, and delivered Span<T>  and ref struct , which are stack-only.

For interest, here are a few scenarios where we can expect a value type to be heap allocated:

  • Declared in a field of a class
  • In an array
  • Boxed
  • Static
  • Local in a yield return block
  • Captured by a lambda / anonymous method

What does it even mean to allocate on the stack (or the heap)?

This stack thing… it is actually that same call stack, made up of frames, which is responsible for the execution of your code. I’m not going to teach you about what a stack data structure is now.

A stack frame represents a method call, which includes any local variables. Those variables store the values for value or reference types we have already thoroughly discussed above.

A frame only exists during the lifetime of a method; so any variables in the frame also only exist until the method returns.

A big difference between stack and heap is that an object on the heap can live on after we exit the function, if there is a reference to it from elsewhere. So, given that passing references to objects around can potentially keep them alive forever, we can safely say that all reference types can be considered long-term and the objects/data will be allocated on the heap.

Misconception: The integers in an array of integers int[]  will be allocated to the stack. Wrong. Value types are embedded in their container so they would be stored with the array on the heap.

Enforcing Immutability, Now That We’re Passing More References

Out and ref produce almost identical IL, with the only difference being that the compiler enforces who is responsible for initialising the object being referred to:

  • Out – the caller does not have to initialise the value. If they do, it is discarded on calling the method. The called method must write to it.
  • Ref  – caller must initialise the value

Great for avoiding copying value types, but how do we prevent the called method from making unwanted modifications? C# 7.2 introduced the in modifier. It got the name by being the opposite of out (it makes the reference (alias) read-only, and the caller does have to initialise the value).
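A sketch (the struct and method names are assumptions):

struct BigStruct
{
    public double X, Y;
    // imagine many more fields, making copies expensive
}

private static double Magnitude(in BigStruct value)
{
    // value.X = 1;   // compile error - an 'in' parameter is read-only
    return Math.Sqrt(value.X * value.X + value.Y * value.Y);
}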

The equivalent for the other direction, i.e. return ref, is the new modifier: ref readonly.

Here’s the immutable array example from the readonly ref proposal:
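A sketch following the shape of the example in the proposal:

public struct ImmutableArray<T>
{
    private readonly T[] array;

    public ref readonly T ItemRef(int i)
    {
        // return a readonly reference to the element - no copy made
        return ref array[i];
    }
}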

Now we can still get a reference to an array element without copying, but without the dangers of full access to the location:
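Something like (GetImmutableArray is a hypothetical factory):

ImmutableArray<int> immutable = GetImmutableArray();

ref readonly int item = ref immutable.ItemRef(1);   // no copy made
// item = 5;   // compile error - the reference is read-only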

Briefly on Boxing

You can convert from value to reference type and back again. It can be implicit or explicit and is commonly seen when passing a value type to a method that takes object types:
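For example:

int number = 42;
object boxed = number;    // implicit boxing - the value is copied into a heap object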

And unboxing:
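Something like:

int unboxed = (int)boxed;   // explicit unboxing - copies the value back out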

An interesting case of implicit boxing is when working with structs that implement interfaces. Remember, an interface is a reference type.
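A sketch (the interface and struct are assumptions):

interface IDescribable
{
    string Describe();
}

struct Measurement : IDescribable
{
    public int Value;
    public string Describe() => $"Value: {Value}";
}

Measurement m = new Measurement { Value = 5 };
IDescribable d = m;   // assigning the struct to an interface variable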

This will cause boxing to occur.

Misconception: when a value type is boxed, changes to the boxed reference affect the value type itself. Wrong! You’d be thinking of when we create an alias with ref local or passing by reference. Changes to the boxed copy on the heap have no effect on the value type instance and vice versa.

When the C# compiler spots any implicit or explicit boxing it will emit specific IL:

IL_007c: box

When the JIT compiler sees this instruction, it will allocate heap storage and wrap the value type contents up in a ‘box’, which points to the object on the heap.

If you are careful, boxing is not going to hurt performance. Problems arise when it occurs within iterations over large data sets. There is the additional CPU time for the boxing itself, followed by the additional pressure on the garbage collector.

Misconception: in the warm-up exercise, the array goes on the heap and so do the int objects in it. Therefore, the int objects have to be boxed. Wrong!

Remember, we rebuffed the misconception that ALL value types go on the stack. With that in mind, int objects ending up on the heap doesn’t mean they are boxed. Take the code:
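Something like:

int[] numbers = { 10, 20 };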

If this were inside a method, a new array object would be allocated on the heap, with a reference to it stored on the stack. The ints 10 and 20 would also be allocated on the heap, embedded within the array – no boxing involved.

Warm-up answer

30, 20
10, 20
60, 70

Summary

  • The ‘value’ in a value type is the actual data.
  • The default behaviour when we pass a value type around is that we are copying the actual value.
  • The ‘value’ held in a reference type, is the reference to a location in memory where the data is.
  • Whenever we access a reference type variable, we are fetching the data stored in the memory location it has as its value
  • The default behaviour when we pass a reference type around is that we are copying just this reference value. Changes to the data inside the object are visible to both the original and any copies. Changes to the actual memory location value are not shared between copies of that value.
  • The ref modifier allows a value to be passed in and modified – the caller sees the changes. The ‘value’ when used with reference types is the actual memory location, hence it can change where the caller’s variable points in memory.
  • Amongst other things beyond this article, C# 7 introduced a way to return by ref. C# 7.2 then gave us the ref readonly modifier and the in modifier to help enforce immutability.

Some homework because I ran out of space:

  • Doing reference and value type equality right
  • When to use structs vs classes
  • How string differs
  • Extension method refs
  • Readonly structs
  • Nullable value types and look forward to nullable reference types

Sources

Who knows? I play with the internals a lot and read a great deal, so can’t be sure where it all comes from. It’s just in my head now. Probably:

  • Any of the Mads or Skeet talks I’ve watched
  • The writings of Eric Lippert
  • Writing High Performance .NET Code by Ben Watson
  • CLR Via C# by Jeffrey Richter
  • Pro .NET Performance by Sasha Goldshtein
  • Probably loads from MS blogs and MS repositories at github.com

.NET Performance Tip – Know Your Garbage Collection Options


Introduction

An under-utilised setting that can offer substantial performance gains.

Workstation GC is what you’ll be getting by default with a .NET application, and you might be unaware of another option. It uses smaller segments, which means more frequent collections, which in turn are short, thus minimising application thread suspensions. When used with concurrent GC, it is best suited for desktop / GUI applications. With concurrent disabled (all threads will suspend for GC), it uses a little less memory and is best suited for lightweight services on single-core machines, processing intermittently (appropriate use cases are few and far between).

There Is Another Option!

Server GC – the one you should try. If you have multiple processors dedicated to just your application, this can really speed up GC, and often allocations too. GCs happen in parallel on dedicated threads (one per processor/core), facilitated by a heap per processor. Segments are larger, favouring throughput and resulting in less frequent, but longer, GCs. This does mean higher memory consumption.

I mentioned a concurrent GC setting above (since .NET4, this is called background GC). From .NET4.5, it is enabled by default in both Server and Workstation GC. I don’t expect you’ll ever change it but good to know what it brings to the table – with it enabled, the GC marks (finds unreachable objects) concurrently using a background thread. It does mean an additional thread for every logical processor in Server GC mode but they are lower priority and won’t compete with foreground threads.

Add this to your app.config for Server GC:
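This is the documented gcServer element, inside the runtime section:

<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>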

If you have good reason, you can disable background GC:
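Again in app.config, via the documented gcConcurrent element:

<configuration>
  <runtime>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>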

Disclaimer: As with all performance work, measure impact before and after to confirm it’s the right choice for your application!

.NET String Interning to Improve String Comparison Performance (C# examples)


Introduction

String comparisons must be one of the most common things we do in C#, and they can be really expensive! So it’s worthwhile knowing the what, when and why of improving string comparison performance.

In this article I will explore one way – string interning.

What is string interning?

String interning refers to having a single copy of each unique string in a string intern pool, which is implemented as a hash table in the .NET common language runtime (CLR), where the key is a hash of the string and the value is a reference to the actual String object.

So if I have the same string occurring 100 times, interning will ensure only one occurrence of that string is actually allocated any memory. Also, when I wish to compare strings, if they are interned, then I just need to do a reference comparison.

String interning mechanics

In this example, I explicitly intern string literals just for demonstration purposes.
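A sketch consistent with the line-by-line walkthrough below (variable names are assumptions):

string s1 = string.Intern("stringy");   // line 1
string s2 = string.Intern("stringy");   // line 2
string s3 = string.Intern("stringy");   // line 3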

Line 1:

  • This new “stringy” is hashed and the hash is looked up in our pool (of course it’s not there yet)
  • A copy of the “stringy” object would be made
  • A new entry would be made to the interning hash table, with the key being the hash of “stringy” and the value being a reference to the copy of “stringy”
  • Assuming the application no longer references the original “stringy”, the GC can reclaim that memory

Line 2: This new “stringy” is hashed and the hash is looked up in our pool (where we just put it). The reference to the “stringy” copy is returned.
Line 3: Same as line 2

Interning depends on string immutability

Take a classic example of string immutability:
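Something like:

string s1 = "stringy";            // line 1
s1 = s1 + " new string";          // line 2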

We know that when line 2 is executed, the “stringy” object is de-referenced and left for garbage collection; and s1 then points to a new String object “stringy new string”.

String interning works because the CLR knows, categorically, that the String object referenced cannot change. Here I’ve added a fourth line to the earlier example:
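A sketch:

string s1 = string.Intern("stringy");   // line 1
string s2 = string.Intern("stringy");   // line 2
string s3 = string.Intern("stringy");   // line 3
s1 = s1 + " new string";                // line 4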

Line 4: the String object doesn’t change because it is immutable; s1 now points to a new String object “stringy new string”.
s2 and s3 will still safely reference the copy that was made at line 1

Using Interning in the .NET CLR

You’ve already seen a bit in the examples above. .NET offers two static string methods:

Intern(String str)

It hashes string str and checks the intern pool hash table and either:

  • returns a reference to the (already) interned string, if interned; or
  • a reference to str is added to the intern pool and this reference is returned

IsInterned(String str)

It hashes string str and checks the intern pool hash table. Rather than a standard bool, it returns either:

  • null, if not interned
  • a reference to the (already) interned string, if interned

String literals (not strings generated in code) are automatically interned, although CLR versions have varied in this behaviour, so if you expect strings to be interned, it is best to always do it explicitly in your code.

My simple test: Setup

I’ve run some timed tests on my aging i5 Windows 10 PC at home to provide some numbers to help explore the potential gains from string interning. I used the following test setup:

  • Strings randomly generated
  • All string comparisons are ordinal
  • Strings are all the same length of 512 characters, as I want the CLR to compare every character, forcing the worst case (the CLR checks string length first for ordinal comparisons)
  • The string added last (so to the end of the List<T>) is also stored as a ‘known’ search term. This is because I am only exploring the worst-case approach
  • For the List<T> interned, every string added to the list, and also the search term, were wrapped in the string.Intern(String str) method

I timed how long populating each collection took and then how long searching for the known search term took also, to the nearest millisecond.

The collections/approaches used for my tests:

  • List<T> with no interning used, searched via a foreach loop and string.Equals on each element
  • List<T> with no interning used, searched via the Contains method
  • List<T> with interning used, searched via a foreach loop and object.ReferenceEquals
  • HashSet<T>, searched via the Contains method

The main objective is to observe string search performance gains from using string interning with List<T>; HashSet<T> is just included as a baseline for known fast searches.
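A sketch of the interned List<T> approach (GenerateRandomStrings is a hypothetical helper):

var list = new List<string>();
foreach (string s in GenerateRandomStrings(count: 100000, length: 512))
{
    list.Add(string.Intern(s));
}

string searchTerm = string.Intern(list[list.Count - 1]);   // the last string added

bool found = false;
foreach (string s in list)
{
    if (object.ReferenceEquals(s, searchTerm))   // reference comparison only
    {
        found = true;
        break;
    }
}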

My simple test: Results

In Figure 1 below, I have plotted the size of the collections, in number of strings added, against the time taken to add that number of randomly generated strings. Clearly there is no significant difference in this operation when using a HashSet<T> compared to a List<T> without interning. Perhaps if I had run to larger sets, the gap would have widened further, based on the trend?

There is slightly more overhead when populating the List<T> with string interning, which is initially of no consequence but is growing faster than the other options.

Figure 1: Populating List<T> and HashSet<T> collections with random strings

Figure 2 shows the times for searching for the known string. All the times are pretty small for these collection sizes, but the growth trends are clear.

Figure 2: Times taken searching for a known string, which was added last

As expected, HashSet is O(1) and the others are O(N). The searches not utilising interning are clearly growing in time taken at a greater rate.

Conclusion

HashSet<T> is present in this article only as a baseline for fast searching and should obviously be your choice for speed where there are no constraints requiring a List<T>.

In scenarios where you must use a List<T>, such as where you still wish to enumerate quickly through the collection, there are some performance gains to be had from using string interning, with benefits increasing as the size of the collection grows. The drawback is the slightly increased populating overhead (although I think it is fair to suggest that most real-world use cases would not involve populating the entire collection in one go).

The utility and behaviour of string interning reminds me of database indexes – it takes a bit longer to add a new item, but that item will be faster to find. So perhaps the same considerations that apply to database indexes are true for string interning?

There is also the added bonus that string interning will prevent any duplicate strings being stored, which in some scenarios could mean substantial memory savings.

Potential benefits:

  • Faster searching via object references
  • Reduced memory usage because duplicate interned strings will only be stored once

Potential performance hit:

  • Memory referenced by the intern hash table is unlikely to be released until the CLR terminates
  • You still need to create the string to be interned, which will be allocated memory (granted, this will then be garbage collected)

Sources

  • https://msdn.microsoft.com/en-us/library/system.string.intern.aspx

Erratic Behaviour from .NET MemoryCache Expiration Demystified


On a recent project I experienced first-hand how the .NET MemoryCache class, when used with either absolute or sliding expiration, can produce some unpredictable and undocumented results.

Sometimes cache items expire exactly when expected… yay. But mostly, they expire an arbitrary period of time late.

For example, a cache item with an absolute expiry of 5 seconds might expire after 5 seconds but could just as likely take up to a further 20 seconds to expire.

This might only be significant where precision, down to a few seconds, is required (such as where I have used it to buffer / throttle FileSystemWatcher events) but I thought it would be worthwhile decompiling System.Runtime.Caching.dll and then clearly documenting the behaviour we can expect.

When does a cache item actually expire?

There are 2 ways your expired item can leave the cache:

  • Every 20 seconds, on a Timer, it will pass through all items and flush out anything past its expiry
  • Whenever an item is accessed, its expiry is checked and that item will be removed if expired

This goes for both absolute and sliding expiration. The timer is enabled as soon as anything is added to the cache, whether or not it has an expiration set.

Note that this is all about observable behaviour, as witnessed by the bemused debugger, because once an item has passed its expiry, you can no longer access it anyway – see point 2 above, where accessing an item forces a flush.
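A minimal sketch of the observable behaviour (assumes a reference to System.Runtime.Caching):

var cache = new MemoryCache("demo");
cache.Add("key", "value", DateTimeOffset.UtcNow.AddSeconds(5));

Thread.Sleep(TimeSpan.FromSeconds(6));

// The item is now past its 5 second expiry. This access triggers the expiry
// check, so we observe null; without the access, the item could have lingered
// until the ~20 second flush timer fired.
Console.WriteLine(cache.Get("key") == null);   // True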

Just as weird with Sliding Expiration…

Sliding expiration is where an expiration time is set, the same as absolute, but if it is accessed the timer is reset back to the configured expiration length again.

  • If the new expiry is not at least 1 second longer than the current (remaining) expiry, it will not be updated/reset on access

Essentially, this means that while you can add to the cache with a sliding expiration of <= 1 second, no access will ever cause the expiration to reset.

Note that if you ever feel the urge to avoid triggering a reset on sliding expiration, you could do this by boxing up values and getting/setting via the reference to the object instead.

Conclusion / What’s so bewildering?

In short, it is undocumented behaviour and a little unexpected.

Consider the 20 second timer and the 5 second absolute expiry example. When the item is actually removed from the cache will depend on where we are in the 20 second Timer cycle; it could be any period, up to an additional 20 seconds, before it fires, giving a potential total of ~25 seconds between being added and actually being flushed.

Add to this the additional confusion you’ll come across while debugging, caused by items past their expiry time being flushed whenever they are accessed. It has even troubled the great Troy Hunt: https://twitter.com/troyhunt/status/766940358307516416. Granted, he was using ASP.NET caching, but the core is pretty much the same, as System.Runtime.Caching was just modified for general .NET usage.

Decompiling System.Runtime.Caching.dll

Some snippets from the .NET FCL for those wanting a peek at the inner workings themselves.

CacheExpires.cs

FlushExpiredItems is called from the TimerCallback (on the 20 seconds) and can also be triggered manually via the MemoryCache method, Trim. There must be an interval of >= 1 second between flushes.

Love the goto – so retro. EDIT: Eli points out that it might just be my decompiler!

MemoryCacheEntry.cs

UpdateSlidingExp updates/resets sliding expiration. Note the limit MIN_UPDATE_DELTA of 1 sec.

MemoryCacheStore.cs

See how code accessing a cached item will trigger a check on its expiration and if expired, remove it from the cache.

FileSystemWatcher vs locked files


Git repository with example code discussed in this article.


Another Problem with FileSystemWatcher

You’ve just written your nice shiny new application to monitor a folder for new files arriving and added the code to send that file off somewhere else and delete it. Perhaps you even spent some time packaging it in a nice Windows Service. It probably behaved well during debugging. You move it into a test environment and let the manual testers loose. They copy a file in, your eager file watcher spots the new file as soon as it starts being written and does the funky stuff and BANG!… an IOException:

The process cannot access the file X because it is being used by another process.

The copying had not finished before the event fired. It doesn’t even have to be a large file as your super awesome watcher is just so efficient.

A Solution

  1. When a file system event occurs, store its details in MemoryCache for X amount of time
  2. Set up a callback, which will execute when the event expires from MemoryCache
  3. In the callback, check the file is available for write operations
  4. If it is, then get on with the intended file operation
  5. Else, put it back in MemoryCache for X time and repeat the above steps

It would make sense to track and limit the number of retry attempts to get a lock on the file.

Code Snippets

I’ve built this on top of the code discussed in a previous post on dealing with multiple FileSystemWatcher events.

Complete code for this example solution here

When a file system event is handled, store the file details and the retry count, using a simple POCO, in MemoryCache with a timer of something like 60 seconds:
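A sketch (field and type names are assumptions; the complete version is in the linked repository):

private void OnCreated(object source, FileSystemEventArgs e)
{
    var fileData = new CachedFileData { FilePath = e.FullPath, RetryCount = 0 };

    _cacheItemPolicy.AbsoluteExpiration = DateTimeOffset.UtcNow.AddSeconds(60);
    _memCache.AddOrGetExisting(e.FullPath, fileData, _cacheItemPolicy);
}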

A simple POCO:
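Something like:

public class CachedFileData
{
    public string FilePath { get; set; }
    public int RetryCount { get; set; }
}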

In the constructor, I initialised my cache item policy with a callback to execute when these cached POCOs expire:
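A sketch:

_cacheItemPolicy = new CacheItemPolicy
{
    RemovedCallback = OnRemovedFromCache   // fires when an entry expires
};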

The callback itself…

  1. Increment the number retries
  2. Try and get a lock on the file
  3. If still locked, put it back into the cache for another 60 seconds (repeat this MaxRetries times)
  4. Else, get on with the intended file operation
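A sketch of that callback (IsFileLocked, MaxRetries and ProcessFile are assumptions; the complete version is in the repository):

private void OnRemovedFromCache(CacheEntryRemovedArguments args)
{
    if (args.RemovedReason != CacheEntryRemovedReason.Expired) return;

    var fileData = (CachedFileData)args.CacheItem.Value;
    fileData.RetryCount++;                                // 1. increment the retries

    if (IsFileLocked(fileData.FilePath))                  // 2. try to get a lock
    {
        if (fileData.RetryCount < MaxRetries)
        {
            // 3. still locked - back into the cache for another 60 seconds
            _cacheItemPolicy.AbsoluteExpiration = DateTimeOffset.UtcNow.AddSeconds(60);
            _memCache.Add(args.CacheItem.Key, fileData, _cacheItemPolicy);
        }
        return;
    }

    ProcessFile(fileData.FilePath);                       // 4. the intended file operation
}

private static bool IsFileLocked(string path)
{
    try
    {
        using (File.Open(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            return false;
        }
    }
    catch (IOException)
    {
        return true;
    }
}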

Other ideas / To do….

  • Could also store the actual event object
  • Could explore options for non-volatile persistence
  • Might find sliding expiration more appropriate in some scenarios

A robust solution for FileSystemWatcher firing events multiple times


Git repository with example code discussed in this article.


The Problem

FileSystemWatcher is a great little class to take the hassle out of monitoring activity in folders and files but, through no real fault of its own, it can behave unpredictably, firing multiple events for a single action.

Note that in some scenarios, like the example used below, the first event will be the start of the file writing and the second event will be the end, which, while not documented behaviour, is at least predictable. Try it with a very large file to see for yourself.

However, FileSystemWatcher cannot make any promises to behave predictably for all OS and application behaviours. See also, MSDN documentation:

Common file system operations might raise more than one event. For example, when a file is moved from one directory to another, several OnChanged and some OnCreated and OnDeleted events might be raised. Moving a file is a complex operation that consists of multiple simple operations, therefore raising multiple events. Likewise, some applications (for example, antivirus software) might cause additional file system events that are detected by FileSystemWatcher.

Example: Recreating editing a file in Notepad firing 2 events

As stated above, we know that 2 events from this action would mark the start and end of a write, meaning we could just focus on the second, if we had full confidence this would be the consistent behaviour. For the purposes of this article, it makes for a convenient example to recreate.

If you edited a text file in c:\temp, you would get 2 events firing.
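A sketch of such a watcher (the complete console applications are linked below):

var watcher = new FileSystemWatcher(@"c:\temp")
{
    NotifyFilter = NotifyFilters.LastWrite,
    EnableRaisingEvents = true
};

watcher.Changed += (source, e) =>
    Console.WriteLine($"Changed: {e.FullPath} at {DateTime.Now:HH:mm:ss.fff}");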

Complete console applications for both are available on GitHub.

A robust solution

Good use of NotifyFilter (see my post on how to select NotifyFilters) can help but there are still plenty of scenarios, like those above, where additional events will still get through for a file system event.

I worked on a nice little idea for utilising MemoryCache as a buffer to ‘throttle’ additional events.

  1. A file event ( Changed in the example below) is triggered
  2. The event is handled by OnChanged. But instead of completing the desired action, it stores the event in MemoryCache with a 1 second expiration and a CacheItemPolicy callback set up to execute on expiration.
  3. When it expires, the callback OnRemovedFromCache completes the behaviour intended for that file event.

Note that I use AddOrGetExisting as a simple way to block any additional events firing within the cache period from being added to the cache.
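A sketch of the handler and callback (assumed names; the complete code is in the repository):

private readonly MemoryCache _memCache = MemoryCache.Default;
private CacheItemPolicy _cacheItemPolicy;

private void OnChanged(object source, FileSystemEventArgs e)
{
    _cacheItemPolicy.AbsoluteExpiration = DateTimeOffset.UtcNow.AddMilliseconds(1000);

    // If the key is already cached, AddOrGetExisting returns the existing entry
    // and discards the new one - duplicate events within the second are swallowed
    _memCache.AddOrGetExisting(e.Name, e, _cacheItemPolicy);
}

private void OnRemovedFromCache(CacheEntryRemovedArguments args)
{
    if (args.RemovedReason != CacheEntryRemovedReason.Expired) return;

    var e = (FileSystemEventArgs)args.CacheItem.Value;
    // ...complete the behaviour intended for this file event
}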

If you’re looking to handle multiple, different events from a single/unique file, such as Created with Changed, then you could key the cache on the file name and event name, concatenated.

NotifyFilters enumeration explained (FileSystemWatcher)


The Problem

When I first worked with the FileSystemWatcher class I ended up experimenting with combinations of NotifyFilters and event handlers to get the desired result; it is not immediately clear which changes to files and folders trigger which events.

The job can only get harder when you’re also up against a known issue (just search on stackoverflow.com) with events firing twice.

EDIT: See my separate blog post on a more reliable solution to the ‘FileSystemWatcher events firing twice’ problem.

Here, I hope to provide simple guidance on using the NotifyFilter enumeration for those just starting out with FileSystemWatcher.

What is a NotifyFilter?

These filters determine what you are watching and thus, which events can be triggered.

They can also be helpful in limiting the number of events triggered in some scenarios where complex file operations, or applications like antivirus software, cause additional events to be triggered (see above), although you can’t have 100% confidence without some additional defensive coding.

Note that the default values for the NotifyFilter property are LastWrite, FileName and DirectoryName.

So, what filters can result in a Changed event being triggered?

  • Attributes
  • CreationTime
  • LastAccess
  • LastWrite
  • Security
  • Size

Which filters can result in a Renamed event being triggered?

  • DirectoryName
  • FileName

Which filters can result in a Created event being triggered?

  • DirectoryName
  • FileName

And which filters can result in a Deleted event being triggered?

  • DirectoryName
  • FileName

In case you missed it in the MSDN documentation, you can combine more than one NotifyFilters member by using the bitwise OR operator like so:
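For example:

watcher.NotifyFilter = NotifyFilters.FileName
                     | NotifyFilters.DirectoryName
                     | NotifyFilters.LastWrite;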

I get two events when I create a new file… what’s with that?

Most guides to FileSystemWatcher tend to lead you towards the Changed event. However, using this often leads to multiple events, which is not desirable. Try out the code in this gist to see the two-event behaviour (just copy a file into c:\temp when it’s running). Then try out the code in this other gist, demonstrating how you can use Created with NotifyFilters.FileName to get a single event from a new file in a folder.

A bit more….Where are the events for copying and moving?

Copied files will trigger Created events in the destination folder so use NotifyFilters.FileName.

The same applies for moved files but you can also watch the source folder for Deleted events (still using the same NotifyFilter).

The above works for copied and moved folders (using, instead, NotifyFilters.DirectoryName), although more code is required to trigger events for any files inside the folder. See: https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.110).aspx.

Appendix A – Table detailing NotifyFilters enumeration from MSDN:

Attributes – The attributes of the file or folder.
CreationTime – The time the file or folder was created.
DirectoryName – The name of the directory.
FileName – The name of the file.
LastAccess – The date the file or folder was last opened.
LastWrite – The date the file or folder last had anything written to it.
Security – The security settings of the file or folder.
Size – The size of the file or folder.

Source: https://msdn.microsoft.com/en-us/library/system.io.notifyfilters(v=vs.110).aspx

Appendix B – Table of events you’ll be interested in if you’ve landed here 

Changed – Occurs when a file or directory in the specified Path is changed.
Created – Occurs when a file or directory in the specified Path is created.
Deleted – Occurs when a file or directory in the specified Path is deleted.
Renamed – Occurs when a file or directory in the specified Path is renamed.

Source: https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.110).aspx

C# image extension methods for physical print size in millimetres


We had a business need for the actual print size of an image in millimetres.

The PhysicalDimension property on System.Drawing.Image was unfortunately not what I had hoped for.

Here’s a couple of extension methods that use the width and height in pixels, along with the DPI (as HorizontalResolution and VerticalResolution), to return an Image object’s width and height in mm.
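A sketch of how they could look (method names are assumptions; 1 inch = 25.4 mm):

public static class ImageExtensions
{
    private const double MillimetresPerInch = 25.4;

    public static double WidthInMillimetres(this Image image)
    {
        // pixels divided by pixels-per-inch gives inches; convert to mm
        return image.Width / (double)image.HorizontalResolution * MillimetresPerInch;
    }

    public static double HeightInMillimetres(this Image image)
    {
        return image.Height / (double)image.VerticalResolution * MillimetresPerInch;
    }
}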

Usage example for those new to using extension methods:
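Something like:

using (Image image = Image.FromFile(@"c:\temp\photo.jpg"))
{
    Console.WriteLine($"{image.WidthInMillimetres():F1} mm x {image.HeightInMillimetres():F1} mm");
}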