When to use Threads

I've written a lot now about how to use threads, but not about when to use them - when it's appropriate, and when it's better to keep everything in the same thread. This is partly because after a while it becomes fairly natural to work out what belongs where, and partly because it's quite tricky to actually describe.

There are a few times when there's absolutely no point in using multiple threads. For instance, if your application is bound by a single resource (e.g. the disk, or the CPU) and all the tasks you would use multiple threads for would be trying to use that same resource, you'll just be adding contention. Suppose you had an application which collected the names of all the files on your disk. Splitting that job into multiple threads isn't likely to help - if anything, it'll make it worse, because it would be asking the file system for lots of different directories all at the same time, which could make the head seek all over the place instead of progressing steadily. Similarly, if you have a workflow where each stage relies entirely on the results of the previous stage, you can't use threads effectively. If you have a program which loads an image, rotates it, then scales it, then turns it into black and white, then saves it again, each stage really needs the previous one to be finished before it can do anything.

Suppose, however, you wanted to read a bunch of files and then process their contents (e.g. calculating a cryptographic hash using an algorithm which takes a lot of processor power). That might very well benefit from threading - either by having several threads each doing both jobs, or by having one thread dedicated to disk IO and another dedicated to hash calculation. The latter would probably be better, but would probably involve more work, as the threads would need to pass data to each other rather than just doing their own thing. Even though there is still a dependency here between the data being read from disk and the crypto processing, you don't have to read all the data from the disk before you can start processing it.
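The "one thread for disk IO, one for hashing" arrangement might look something like the sketch below. It's only an illustration under some assumptions: the FileHashPipeline class and HashFiles method are names I've made up for the example, SHA-256 stands in for whatever expensive algorithm you actually need, and BlockingCollection (from later versions of .NET) is just one convenient way of passing data between the threads. Error handling is left out to keep it short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading;

// Hypothetical helper for illustration: one thread reads files,
// another hashes their contents as they become available.
public static class FileHashPipeline
{
    public static Dictionary<string, string> HashFiles(IEnumerable<string> paths)
    {
        // Bounded queue, so the reader can't get arbitrarily far ahead
        // of the hasher and fill memory with unprocessed file contents.
        var queue = new BlockingCollection<KeyValuePair<string, byte[]>>(4);
        var results = new Dictionary<string, string>();

        var reader = new Thread(() =>
        {
            try
            {
                foreach (string path in paths)
                {
                    byte[] data = File.ReadAllBytes(path);
                    queue.Add(new KeyValuePair<string, byte[]>(path, data));
                }
            }
            finally
            {
                // Always signal completion so the hashing thread can finish.
                queue.CompleteAdding();
            }
        });

        var hasher = new Thread(() =>
        {
            using (SHA256 sha = SHA256.Create())
            {
                // Blocks until an item is available; ends after CompleteAdding.
                foreach (var item in queue.GetConsumingEnumerable())
                {
                    byte[] hash = sha.ComputeHash(item.Value);
                    results[item.Key] = BitConverter.ToString(hash).Replace("-", "");
                }
            }
        });

        reader.Start();
        hasher.Start();
        reader.Join();
        hasher.Join();

        // Only the hasher thread wrote to results, and it has finished,
        // so it's safe to hand the dictionary back to the caller.
        return results;
    }
}
```

The hashing thread can start work as soon as the first file has been read, which is the whole point: the dependency is per file, not across the whole job.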

Similarly, in applications other than Windows Forms applications, it probably doesn't matter if you need to do something which will take a little while - a second or so, for instance. If you're writing a batch program, it doesn't matter in the slightest whether your code is always doing something different or whether some operations take longer than others. That's not to say that batch processing should always be single-threaded, but you don't need to worry about having a "main" thread which must be ready to react to events.

If you're writing an ASP.NET page, it's usually not worth changing to a different thread unless you think the operation will take so long that you're willing to start another thread and send back a page which tells the user to wait and then refreshes itself periodically (using an HTML meta refresh tag, for instance) to check whether the "real" page has finished. Using another thread won't make the page come up any faster, as you'll just have to tell the "main" thread to wait for the other one to finish, and it'll be considerably more complicated to implement. Of course, the part about not being any faster isn't true if you genuinely can do two independent things at once - querying two different databases, for instance. In that case, threading can occasionally be very useful even in ASP.NET scenarios, although you should consider using the thread pool for such tasks to avoid creating lots of threads which each only run for a short time. Don't forget that there may well be other requests which want to use the same resources, and you won't be doing less work in total by spreading it out over many threads. In short, while creating extra threads (or explicitly using the thread pool) is sometimes useful in ASP.NET, it's usually a last resort rather than a matter of course.
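As a rough sketch of the "two independent things at once" case, here's one way of running two operations on thread pool threads and waiting for both. The ParallelQueries class and RunBoth method are hypothetical names for the example, and the two delegates stand in for whatever database calls you really have. It uses Task.Run, which queues work to the thread pool in later versions of .NET; in current ASP.NET code you would normally prefer genuinely asynchronous calls with await over blocking like this.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class ParallelQueries
{
    // Runs two independent operations on thread pool threads
    // and blocks the calling thread until both have finished.
    public static Tuple<TFirst, TSecond> RunBoth<TFirst, TSecond>(
        Func<TFirst> first, Func<TSecond> second)
    {
        Task<TFirst> firstTask = Task.Run(first);
        Task<TSecond> secondTask = Task.Run(second);

        // Throws an AggregateException if either operation failed.
        Task.WaitAll(firstTask, secondTask);

        return Tuple.Create(firstTask.Result, secondTask.Result);
    }
}
```

The total elapsed time is then roughly that of the slower operation rather than the sum of the two, but only if the operations really are independent and aren't fighting over the same resource.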

In Windows Forms applications, you really should put anything which takes any significant amount of time (even reading a short file) into a different thread, unless the code is really only for your own use and you don't mind an unresponsive UI. This is important, as while the UI thread is doing something else (reading a file, doing a heavy calculation, etc) it can't be reacting to events like the user trying to close it, or a previously hidden area now becoming exposed. The UI can easily become very unresponsive, which gives a horrible user experience. Here, threading isn't used to get the job done quickly - it's used to get the job done while keeping the user satisfied with responsiveness. You might be surprised just how quickly a user can notice an app becoming unresponsive. Even if the user can't actually do anything but close the application or move the window around while they wait, it gives a much more professional feel to a program if you don't end up with a big white box when you pass another window over it.

The choice between using a "new" thread and using one from the thread pool is a contentious one - I tend to like using new threads for most purposes, and others recommend almost always using thread pool threads. My concern about using the thread pool is that there are only a certain number of threads in it, and it's used in various ways by the framework itself - and it's not always obvious that it's doing so. If you accidentally end up with all the threads in the thread pool waiting for other work items which are scheduled to run in the thread pool, you'll have a very difficult deadlock to debug.

Using new threads, however, is relatively costly if they're only going to run for a short while - creating new threads isn't terribly cheap at the operating system level, whereas the thread pool will of course re-use threads to avoid repeating this cost. One happy medium is to use your own thread pool which is separate from the system one but which will still re-use threads. I have a fairly simple implementation in my Miscellaneous Utility Library which you can tweak if it doesn't quite meet your requirements. It probably won't perform quite as well as the system thread pool which has been finely tuned - but you have a lot more control over what goes in it, how many threads it creates, etc.
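To give a flavour of what a separate pool involves, here's a minimal sketch. It is not the implementation from my utility library, just an illustration under simple assumptions: a fixed number of worker threads, an unbounded queue of work items, and the hypothetical names SimpleThreadPool and Enqueue. It relies on BlockingCollection from later versions of .NET.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative fixed-size pool; the names here are made up for this sketch.
public sealed class SimpleThreadPool : IDisposable
{
    private readonly BlockingCollection<Action> workItems =
        new BlockingCollection<Action>();
    private readonly Thread[] workers;

    public SimpleThreadPool(int threadCount)
    {
        if (threadCount < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(threadCount));
        }
        workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Thread(RunWorker)
            {
                IsBackground = true,
                Name = "SimpleThreadPool worker " + i
            };
            workers[i].Start();
        }
    }

    // Queues a work item; it runs on whichever worker is free next.
    public void Enqueue(Action workItem)
    {
        if (workItem == null)
        {
            throw new ArgumentNullException(nameof(workItem));
        }
        workItems.Add(workItem);
    }

    private void RunWorker()
    {
        // Each worker is re-used for many items, avoiding the cost
        // of creating a new thread per item.
        foreach (Action workItem in workItems.GetConsumingEnumerable())
        {
            try
            {
                workItem();
            }
            catch (Exception)
            {
                // Swallowed so one failing item doesn't kill the worker;
                // a real pool would log or report this somewhere.
            }
        }
    }

    // Stops accepting work, then waits for queued items to finish.
    // Must not be called from one of the pool's own threads.
    public void Dispose()
    {
        workItems.CompleteAdding();
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
        workItems.Dispose();
    }
}
```

Because nothing else queues work here, you know exactly what is competing for these threads, which is the control you give up when you share the system thread pool.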

Using multiple threads is almost always going to introduce complexity to your application, so the potential benefits should be carefully considered and weighed up before you start writing code. Work out the "boundaries" between threads in detail - what thread needs access to what data when, etc. This can make an enormous difference when it comes to the implementation. After making the decision to use threads and designing the threading scenarios carefully, keep taking care while writing the code - it's very easy to slip up, unfortunately, even with all the tools the .NET framework provides. The results of this work should be an elegant application which performs well and remains responsive whatever it's doing - something to feel justly proud of. Good luck!

