Addressing a simple yet common C# Async/Await misconception

Standard

Git repository with example code discussed in this article.

Async/await has been part of C# since C# 5.0 yet many developers have yet to explore how it works under the covers, which I would recommend for any syntactic sugar in C#. I won’t be going into that level of detail now, nor will I explore the subtleties of IO and CPU bound operations.

The common misconception

That awaited code executes in parallel to the code that follows.

i.e. in the following code, LongOperation() is called and awaited, and while this is executing, and before it has completed, the code ‘doing other things’ will start being executed.

This is not how it behaves.

In the above code, what actually happens is that the await operator causes WithAwaitAtCallAsync() to suspend at that line and returns control back to DemoAsyncAwait() until the awaited task, LongOperation(), is complete.

When LongOperation() completes, then ‘do other things’ will be executed.

And if we don’t await when we call?

Then you do get that behaviour some developers innocently expect from awaiting the call, where LongOperation() is left to complete in the background while continuing on with  WithoutAwaitAtCallAsync() in parallel, ‘doing other things’:

However, if LongOperation() is not complete when we reach the awaited Task it returned, then it yields control back to DemoAsyncAwait(), as above. It does not continue to complete ‘more things to do’ – not until the awaited task is complete.

Complete Console Application Example

Some notes about this code:

  • Always use await over Task.Wait() to retrieve the result of a background task (outside of this demo) to avoid blocking. I’ve used Task.Wait() it in my demonstrations to force blocking and prevent the two separate demo results overlapping in time.
  • I have intentioinally not used Task.Run() as I don’t want to confuse things with new threads. Let’s just assume LongOperation() is IO-bound.
  • I used Task.Delay() to simulate the long operation. Thread.Sleep() would block the thread

This is what happens when the code is executed (with colouring):

 

Conclusion

If you use the await keyword when calling an async method from inside an async method, execution of the calling method is suspended to avoid blocking the thread and control is passed (or yielded) back up the method chain. If, on its journey up the chain, it reaches a call that was not awaited, then code in that method is able to continue in parallel to the remaining processing in the chain of awaited methods until it runs out of work to do, and then needs to await the result, which is inside the Task object returned by LongOperation().​​​​​​​

.NET String Interning to Improve String Comparison Performance (C# examples)

Standard

Introduction

String comparisons must be one of the most common things we do in C#; and string comparisons can be really expensive! So its worthwhile knowing the what, when and why to improving string comparison performance.

In this article I will explore one way – string interning.

What is string interning?

String interning refers to having a single copy of each unique string in an string intern pool, which is via a hash table in the.NET common language runtime (CLR). Where the key is a hash of the string and the value is a reference to the actual String object.

So if I have the same string occurring 100 times, interning will ensure only one occurrence of that string is actually allocated any memory. Also, when I wish to compare strings, if they are interned, then I just need to do a reference comparison.

String interning mechanics

In this example, I explicitly intern string literals just for demonstration purposes.

Line 1:

  • This new “stringy” is hashed and the hash is looked up in our pool (of course its not there yet)
  • A copy of the “stringy” object would be made
  • A new entry would be made to the interning hash table, with the key being the hash of “stringy” and the value being a reference to the copy of “stringy”
  • Assuming application no longer references original “stringy”, the GC can reclaim that memory

Line 2: This new “stringy” is hashed and the hash is looked up in our pool (which we just put there). The reference to the “stringy” copy is returned
Line 3: Same as line 2

Interning depends on string immutability

Take a classic example of string immutability:

We know that when line 2 is executed, the “stringy” object is de-referenced and left for garbage collection; and s1 then points to a new String object “stringy new string”.

String interning works because the CLR knows, categorically, that the String object referenced cannot change. Here I’ve added a fourth line to the earlier example:

Line 4: s1 doesn’t change because it is immutable; it now points to a new String object ” stringy new string”.
s2 and s3 will still safely reference the copy that was made at line 1

Using Interning in the .NET CLR

You’ve already seen a bit in the examples above. .NET offers two static string methods:

Intern(String str)

It hashes string str and checks the intern pool hash table and either:

  • returns a reference to the (already) interned string, if interned; or
  • a references to str is added to the intern pool and this reference is returned

IsInterned(String str)

It hashes string str and checks the intern pool hash table. Rather than a standard bool, it returns either:

  • null, if not interned
  • a reference to the (already) interned string, if interned

String literals (not generated in code) are automatically interned, although CLR versions have varied in this behaviour, so if you expect strings interned, it is best to always do it explicitly in your code.

My simple test: Setup

I’ve run some timed tests on my aging i5 Windows 10 PC at home to provide some numbers to help explore the potential gains from string interning. I used the following test setup:

  • Strings randomly generated
  • All string comparisons are ordinal
  • Strings are all the same length of 512 characters as I want the CLR will compare every character to force the worst-case (the CLR checks string length first for ordinal comparisons)
  • The string added last (so to the end of the List<T>) is also stored as a ‘known’ search term. This is because I am only exploring the worst-case approach
  • For the List<T> interned, every string added to the list, and also the search term, were wrapped in the string.Intern(String str) method

I timed how long populating each collection took and then how long searching for the known search term took also, to the nearest millisecond.

The collections/approaches used for my tests:

  • List<T> with no interning used, searched via a foreach loop and string.Equals on each element
  • List<T> with no interning used, searched via the Contains method
  • List<T> with interning used, searched via a foreach loop and object.ReferenceEquals
  • HashSet<T>, searched via the Contains method

The main objective is to observe string search performance gains from using string interning with List<T> HashSet is just included as a baseline for known fast searches.

My simple test: Results

In Figure 1 below, I have plotted the size of collections in number of strings added, against the time taken to add that number of randomly generated strings. Clearly there is no significant difference in this operation, when using a HashSet<t> compared to a List<T> without interning. Perhaps if had run to larger sets the gap would widen further based on trend?

There is slightly more overhead when populating the List<T> with string interning, which is initially of no consequence but is growing faster than the other options.

Figure 1: Populating List<T> and HashSet<T> collections with random strings

Figure 2, shows the times for searching for the known string. All the times are pretty small for these collection sizes but the growths are clear.

Figure 2: Times taken searching for a string known, which was added last

As expected, HashSet is O(1) and the others are O(N). The searches not utilising interning are clearly growing in time taken at a greater rate.

Conclusion

HashSet<T> is present in this article only as a baseline for fast searching and should obviously be your choice for speed where there are no constraints requiring a List<T>.

In scenarios where you must use a List<T> such as where you still wish to enumerate quickly through a collection, there are some performance gains to be had from using string interning, with benefits increasing as the size of the collection grows. The drawback is the slightly increased populating overhead (although I think it is fair to suggest that most real-world use cases would not involve populating the entire collection in one go).

The utility and behaviour of string interning, reminds me of database indexes – it takes a bit longer to add a new item but that item will be faster to find. So perhaps the same considerations for database indexes are true for string interning?

There is also the added bonus that string interning will prevent any duplicate strings being stored, which in some scenarios could mean substantial memory savings.

Potential benefits:

  • Faster searching via object references
  • Reduced memory usage because duplicate interned strings will only be stored once

Potential performance hit:

  • Memory referenced by the intern hash table is unlikely to be released until the CLR terminates
  • You still need to create the string to be interned, which will be allocated memory (granted, this will then be garbage collected)

Sources

  • https://msdn.microsoft.com/en-us/library/system.string.intern.aspx

Erratic Behaviour from .NET MemoryCache Expiration Demystified

Standard

On a recent project I experienced first-hand, how the .NET MemoryCache class, when used with either absolute or sliding expiration, can produce some unpredictable and undocumented results.

Sometimes cache items expire exactly when expected… yay. But mostly, they expire an arbitrary period of time late.

For example, a cache item with an absolute expiry of 5 seconds might expire after 5 seconds but could just as likely take up to a further 20 seconds to expire.

This might only be significant where precision, down to a few seconds, is required (such as where I have used it to buffer / throttle FileSystemWartcher events) but I thought it would be worthwhile decompiling System.Runtime.Caching.dll and then clearly documenting the behaviour we can expect.

When does a cache item actually expire?

There are 2 ways your expired item can leave the cache:

  • Every 20 seconds, on a Timer, it will pass through all items and flush out anything past its expiry
  • Whenever an item is accessed, its expiry is checked and that item will be removed if expired

This goes for both absolute and sliding expiration. The timer is enabled as soon as anything is added to the cache, whether or not it has an expiration set.

Note that this is all about observable behaviour, as witnessed by the bemused debugger, because once an item has past its expiry, you can no longer access it anyway – see point 2 above, where accessing an item forces a flush.

Just as weird with Sliding Expiration…

Sliding expiration is where an expiration time is set, the same as absolute, but if it is accessed the timer is reset back to the configured expiration length again.

  • If the new expiry is not at least 1 second longer than the current (remaining) expiry, it will not be updated/reset on access

Essentially, this means that while you can add to cache with a sliding expiration of <= 1 second, there is no chance of any accessing causing the expiration to reset.

Note that if you ever feel the urge to avoid triggering a reset on sliding expiration, you could do this by boxing up values and getting/setting via the reference to the object instead.

Conclusion / What’s so bewildering?

In short, it is undocumented behaviour and a little unexpected.

Consider the 20 second timer and the 5 second absolute expiry example. When it is actually removed from the cache, will depend on where we are in the 20 seconds Timer cycle; it could be any time period, up to an additional 20 seconds, before it fires, giving a potential total of ~ 25 seconds between actually expiring from being added.

Add to this, the additional confusion you’ll come across while debugging, caused by items past their expiry time being flushed whenever they are accessed, it has even troubled the great Troy Hunt: https://twitter.com/troyhunt/status/766940358307516416. Granted he was using ASP.NET caching but the core is pretty much the same, as System.Runtime.Caching was just modified for general .NET usage.

Decompiling System.Runtime.Caching.dll

Some snippets from the .NET FCL for those wanting a peek at the inner workings themselves.

CacheExpires.cs

FlushExpiredItems is called from the TimerCallback (on the 20 seconds) and can also be triggered manually via the MemoryCache method, Trim. There must be interval of >= 1 second between flushes.

Love the goto – so retro.

MemoryCacheEntry.cs

UpdateSlidingExp updates/resets sliding expiration. Note the limit MIN_UPDATE_DELTA of 1 sec.

MemoryCacheStore.cs

See how code accessing a cached item will trigger a check on its expiration and if expired, remove it from the cache.

FileSystemWatcher vs locked files

Standard

Git repository with example code discussed in this article.


Another Problem with FileSystemWatcher

You’ve just written your nice shiny new application to monitor a folder for new files arriving and added the code send that file off somewhere else and delete it. Perhaps you even spent some time packaging it in a nice Windows Service. It probably behaved well during debugging. You move it into a test environment and let the manual testers loose. They copy a file, your eager file watcher spots the new file as soon as it starts writing and does the funky stuff and BANG!… an IOException:

The process cannot access the file X because it is being used by another process.

The copying had not finished before the event fired. It doesn’t even have to be a large file as your super awesome watcher is just so efficient.

A Solution

  1. When a file system event occurs, store its details in Memory Cache for X amount of time
  2. Setup a callback, which will execute when the event expires from Memory Cache
  3. In the callback, check the file is available for write operations
  4. If it is, then get on with the intended file operation
  5. Else, put it back in Memory Cache for X time and repeat above steps

It would make sense to track and limit the number of retry attempts to get a lock on the file.

Code Snippets

I’ve built this on top of the code discussed in a previous post on dealing with multiple FileSystemWatcher events.

Complete code for this example solution here

When a file system event is handled, store the file details and the retry count, using a simple POCO, in MemoryCache with a timer of, something like 60 seconds:

A simple POCO:

In the constructor, I initialised my cache item policy with a callback to execute when these cached POCOs expire:

The callback itself…

  1. Increment the number retries
  2. Try and get a lock on the file
  3. If still lock put it back into the cache for another 60 seconds (repeat this MaxRetries times)
  4. Else, get on with the intended file operation

Other ideas / To do….

  • Could also store the actual event object
  • Could explore options for non-volatile persistence
  • Might find sliding expiration more appropriate in some scenario

A robust solution for FileSystemWatcher firing events multiple times

Standard

Git repository with example code discussed in this article.


The Problem

FileSystemWatcher is a great little class to take the hassle out of monitoring activity in folders and files but, through no real fault of its own, it can behave unpredictably, firing multiple events for a single action.

Note that in some scenarios, like the example used below, the first event will be the start of the file writing and the second event will be the end, which, while not documented behaviour, is at least predictable. Try it with a very large file to see for yourself.

However, FileSystemWatcher cannot make any promises to behave predictably for all OS and application behaviours. See also, MSDN documentation:

Common file system operations might raise more than one event. For example, when a file is moved from one directory to another, several OnChanged and some OnCreated and OnDeleted events might be raised. Moving a file is a complex operation that consists of multiple simple operations, therefore raising multiple events. Likewise, some applications (for example, antivirus software) might cause additional file system events that are detected by FileSystemWatcher.

Example: Recreating edit a file in Notepad firing 2 events

As stated above, we know that 2 events from this action would mark the start and end of a write, meaning we could just focus on the second, if we had full confidence this would be the consistent behaviour. For the purposes of this article, it makes for a convenient examples to recreate.

If you edited a text file in c:\temp, you would get 2 events firing.

Complete Console applications for both available on Github.

A robust solution

Good use of NotifyFilters (see my post on how to select NotifyFilters) can help but there are still plenty of scenarios, like those above, where additional events will still get through for a file system event.

I worked on a nice little idea with a colleague, Ross Sandford, utilising MemoryCache as a buffer to ‘throttle’ additional events.

  1. A file event (Changed in the example below) is triggered
  2. The event is handled by OnChanged.But instead of completing the desired action, it stores the event in MemoryCache with a 1 second expiration and a CacheItemPolicy callback setup to execute on expiration.
  3. When it expires, the callback OnRemovedFromCache completes the behaviour intended for that file event.

Note that I use AddOrGetExisting as an simple way to block any additional events firing within the cache period being added to the cache.

If you’re looking to handle multiple, different, events from a single/unique file, such as Created with Changed, then you could key the cache on the file name and event named, concatenated.

NotifyFilters enumeration explained (FileSystemWatcher)

Standard

The Problem

When I first worked with the FileSystemWatcher class I ended up experimenting with combinations of NotifyFilters and event handlers to get the desired result; it is not immediately clear, which changes to files and folders, trigger which events.

The job can only get harder when also up against a known issues (just search on stackoverflow.com) with events firing twice.

EDIT: See my separate blog post, on a more reliable solution to the ‘two FileSystemWatcher events firing twice problem’.

Here, I hope to provide simple guidance on using the NotifyFilter enumeration for those just starting out with FileSystemWatcher.

What is a NotifyFilter?

These filters determine what you are watching and thus, which events can be triggered.

They can also be helpful in limiting the number of events trigger in some scenarios where complex file operations, or applications like antivirus software cause additional events to be triggered (see above) although you can’t have 100% confidence without some additional defensive coding.

Note that the default values for the NorifyFilter property  are LastWrite, FileName and DirectoryName.

So, what filters can result in a Changed event being triggered?

  • Attributes
  • CreationTime
  • LastAccess
  • LastWrite
  • Security
  • Size

Which filters can result in a Renamed event being triggered?

  • DirectoryName
  • FileName

Which filters can result in a Created event being triggered?

  • DirectoryName
  • FileName

And which filters can result in a Deleted event being triggered?

  • DirectoryName
  • FileName

In case you missed it in the MSDN documentation, you can combine more than one NotifyFilters member by using the bitwise OR operator like so:

I get two events when I create a new file… what’s with that?

Most guides to FileWatcher tend to lead you towards the Changed event. However, using this often leads to multiple events, which is not desirable. Try out the code in this gist to see the two-event behaviour (just copy a fileinto c:\temp when it’s running). Then try out the code in this other gist, demonstrating how you can use Created with NotifyFilters.FileName to get a single event from a new file in a folder.

A bit more….Where are the events for copying and moving?

Copied files will trigger Created events in the destination folder so use NotifyFilters.FileName.

The same applies for moved files but you can also watch the source folder for Deleted events (still using the same NotifyFilter).

The above works for copied  and moved folders (using instead, NotifyFilters.DirectoryName), although more code is required to trigger events for any files inside the folder. See: https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.110).aspx.

Appendix A – Table detailing NotifyFilters enumeration from MSDN:

Attributes The attributes of the file or folder.
CreationTime The time the file or folder was created.
DirectoryName The name of the directory.
FileName The name of the file.
LastAccess The date the file or folder was last opened.
LastWrite The date the file or folder last had anything written to it.
Security The security settings of the file or folder.
Size The size of the file or folder.

Source: https://msdn.microsoft.com/en-us/library/system.io.notifyfilters(v=vs.110).aspx

Appendix B – Table of events you’ll be interested in if you’ve landed here 

Changed Occurs when a file or directory in the specified Path is changed.
Created Occurs when a file or directory in the specified Path is created.
Deleted Occurs when a file or directory in the specified Path is deleted.
Renamed Occurs when a file or directory in the specified Path is renamed.

Source: https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.110).aspx

C# image extension methods for physical print size in millimetres

Standard

We had a business need for the actual print size of an image in millimetres.

The PhysicalDimension property on System.Drawing.Image was unfortunately not what I had hoped for.

Here’s a couple of extension methods that use the width and height in pixels, along with the DPI (as HorizontalResolution and VerticalResolution) to return an Image object’s width and height in mm.

Usage example for those new to using extension methods:

Using FakeItEasy with Entity Framework 6

Standard

What this article is about

I have quickly become a fan of FakeItEasy. If you are not familiar with it, visit https://fakeiteasy.github.io/ and check out the super fluent interface yourself.

I couldn’t resist looking for a way to quickly fake out EF6 and I am going to share how that went.

What I will not be discussing

  • The differences between mocks, stubs and so on (see Gerard Meszaro‘s list for that)
  • The usefulness of mocking ORMs
  • The limitations of in-memory test doubles for a database


Background – testing with EF5

In EF5 (before my time!) we would create a mock DbSet class that implemented IDbSet, with a whole load of properties and methods. Then create an interface for a DbContext and implement that with a mocked context. I did this following the instructions here and it generated quite a lot of code to maintain for my relatively small project.

Changes in EF6

In EF6, DbSet gained an additional protected internal constructor. So instead of creating all those mocks, we mark the DbSet properties virtual, allowing FakeItEasy to override them:


Working with the DbSet methods

This is fairly straightforward and requires the behaviour of any DbSet method called to be defined.

Create instances of a fake context and fake DbSet, arrange the behaviour for calls to the Products getter and calls to the Add method on the DbSet:

Why not setup calls direct to the methods given as we need to define their behaviour individually?

We can but I prefer the first way as it has clearer logic and keeps things consistent with the way we can work with LINQ extension methods below. Opinions?

Note that there is no difference in arrangement between DbSet’s asynchronous and non-asynchronous methods.


Working with the LINQ extension methods (non-asynchronously)

So what if the method under test is using LINQ extension methods on the DbSet?

On the face of it these are tidier to work with because after setting up your fake DbSet, LINQ-to-Objects kicks in and there is no need to identify the behaviour of individual methods. We will take advantage of DbSet  implementing IQueryable:

For our in-memory storage we will use a List that implements IQueryable, which allows us to return its properties and method to use in the fake DbSet.

Next, we setup the fake DbSet. Note that FakeItEasy needs to be told explicitly to implement IQueryable in order for the Castle proxy to intercept.

To setup the behaviour for the DbSet under LINQ methods we can ‘redirect’ all calls to the method and properties of the IQueryable interface to the fakeIQueryable.

Finally, we setup the fake context and its behaviour when the Products DbSet getter is called; then go ahead and instantiate the object under test, passing it the fake context and a dummy to the Get method.

 

Working with the LINQ extension methods (asynchronously)

Problem 1: What if we used a similar test arrangement for a test method that calls an asynchronous LINQ extension method? e.g.

We get an error: The provider for the source IQueryable does not implement IDbAsyncQueryProvider…

So here we must change our behaviour for the Provider property. I have made use of a class provided by MSDN that wraps up our fakeIQueryable to provide an implementation of IDbAsyncQueryProvider:

Problem 2: What if we want to enumerate through the DbSet asynchronously?

We’ll get a different error message, and the solution here is to use another class provided in the link above to wrap up our fakeIQueryable to implement IDbAsyncEnumerator, replacing the original call to GetEnumerator form earlier.

FakeItEasy now also needs to be told to implement IDbAsyncEnumerable.

 

Validation

I did not have time to look at FakeItEasy and EF’s validation in depth.

I briefly considered one scenario where we might be relying on EF for our data validation, catching any DbEntityValidationException. We could quite easily check how our code handles validation errors by throwing a DbEntityValidationException e.g.

Then construct an IEnumerable of DbEntityValidationResult . However, this is just theory and I have not tried this myself yet!


Summary

If you are already using FakeItEasy in your tests, it is quite nice for consistency to be able to use it with Entity Framework without having to maintain mock code for DbContext and DbSet.

There are obvious limitations to how much behaviour can be faked this way. I did say I would not be debating the merits of mocking an ORM but here are just two issues with mocking EF:

  • The differences in behaviour between LINQ-to-Objects and LINQ-to-Entities are many – there is a good Stackoverlow.com answer here detailing some of these.
  • Mocking the behaviour of EF’s validation would likely be unmanageable

I would be interested in hearing others’ experiences of scenarios where it has been useful to mock EF.


Other options

There are a few other efforts (get the reference?) out there to provide support for testing EF but they are often poorly maintained. Highway.Data looks interesting but I have not tried it yet (link).

.NET Core now has an InMemory provider for testing (link).