Wednesday, August 26, 2009

Raycasting explained

Raycasting can be an extremely useful tool in computer games. There is a lot of confusion on what raycasting really is and how it can be used. I’ll try to explain some it in simple terms and give some simple examples.

Raycasting - as the name suggests involves a ray – but what exactly is a ray?

A ray is part of a line which is finite in one direction, but infinite in the other. It can be defined by two points, the initial point, A, and one other, B.

Source: Wikipedia

In Farseer Physics Engine we define a ray as two vectors; start and end. We can for obvious reasons not have an infinite length ray, we have to have some kind of boundary where the ray stops. This means that rays essentially become lines and then we simply use those lines to test for collisions with other stuff like other lines, AABB or geometries.

Collision testing

There are 4 test cases that can be performed:

  1. Point
  2. Line
  3. AABB
  4. Geometry

The great thing about all those (except number 1) is that they can be reduced to simple line vs. line tests. An AABB is made up of 4 lines and a geometry is made of N (where N > 2) lines. With this in mind we just have to know how to test two lines against each other.

This is done simply by using a linear equation from basic algebra:


Using this equation we can create a line. m is the amount that the y-value grows or shrinks with each time x increases by 1. b is point on which the line crosses the y-axis. With this knowledge we can draw a line like this:


It has the equation: y = 2*x+1

Now that we know how to create a line, we need a second one to test against:


This line has the equation y = –1*x+1. From what we gathered earlier, b was the point where the line crosses the y-axis. Since both linear equations has a b value of 1, they must be intersecting each other there at (0, 1).

To check for intersections, we take the two equations and simply does an equality check between them and input the result (the value of x) into one of our equations:


So our intersection point is at (0,1). Exactly how this is done in code is a little more complicated. In Farseer Physics Engine we implemented line intersection like described here: Intersection Point Of Two Lines


Now we know that raycasting operates on lines, but how can this be applied to games? Here is a list of a few examples:

  • Collision detection
  • Distance calculation
  • Jumping
  • Lasers
  • Field of view
  • Lightning
  • Suspension
  • Many more…

You can use rays for collision detection and check the distance between the two objects. You can also use it for smooth jumping on to platforms and even car suspension. Another common usage is finding the field of view. The only limit is your imagination.

Ray casts are cheap, but they should not be used for everything. Some physics engines incorporate rays into their broad phase collider to speed up permanent rays. This will soon be implemented into Farseer Physics Engine too.

Saturday, August 22, 2009

New game using Farseer Physics – Zombies 2.0

A new game just got posted on the Farseer Physics Engine forums, this time it is a zombie survival game. You are the lone protector of a science team, but you do have the choice of 16 different base upgrades and 15 different weapons to help you out. It features real-time lightning and pure action. Check out out!

Simulators And Determinism

When working with simulators, it is important to understand determinism and why we have trouble with determinism across different platform and configurations. Simulators that are iterative converge their solution over multiple iterations and can accumulate system-inherited errors and thus can become unstable over time.

Why is determinism important

Does determinism in simulation systems even matter? The short answer is yes, very important. The long answer is that simulations systems usually base their current world on the previous world. It takes one step at the time (time step) and calculate everything based on the previous step. If you shoot a projectile towards a target it, the trajectory of the projectile will travel in a certain curve. If you run the same simulation on a different platform, the projectile might end up somewhere totally different than the first simulation.

Take the following example with a projectile trajectory. They both have the same initial conditions.

Simulation on platform 1:


Simulation on platform 2:


You would expect the results to be the same on both platforms, but they are not.

What can affect determinism

There are several things that can affect the determinism of a simulation, the order of elements, the rounding of floats, the low level instructions and much more. Here is a list of some examples on what can impact determinism:

  • Sorting
    • Stable vs. unstable sorting
  • Float operations
    • Rounding methods
    • Optimizations
  • Hardware instructions
    • Some hardware has higher accuracy than others.

The list is by no means complete, there are hundreds of factors that can affect determinism; the list only contains those we focus on in this post.


In Farseer Physics Engine we sort the contacts (where two shapes touch) by the amount two shapes overlap. This way we can resolve the deepest contacts first and get a more realistic simulation. We used to use Array.Sort() that is based on quick sort, but quick sort is not a stable sorting algorithm. Stable in this case just means that the order of elements are preserved if the keys that are equal are in the same order before and after sorting. If two contacts had the same amount of penetration, they might get shuffled around and the order of IDs will be out of order. We switched to an insertion sort algorithm instead and suddenly the whole simulation acted differently. Not a big problem since it would do that across all platforms, but simulations recorded from previous version of the physics simulator did not match newer versions anymore.

Floating point operations

Floats can be tricky to work with. I took the following example from this article on floating points in .net:


You would guess that the program always return true - but that is not the case. If compiled under release mode (to enable optimizations) and x86, it will return false. Even though you can represent two values exact, the result of an operation on the values might not be exact. There is a standard for floating point operations, but it allows different platforms to return different values. This means that even if two systems follow the same standard, they might return different values. You can always be sure that the same value is always returned from a float operation on the same platform.

Let us look at what happens under the hood.


The float operations are stored in a 80 bit register while the memory is only 64 bit. The value gets calculated with high precision inside the register, but gets truncated in the register. Some compilers optimize your code in such a way that the value in the register gets compared to the value in memory. Since the memory value is truncated, it is not equal to the non-truncated value. This is what happens in the C# example above. To combat this we can do some tricks to disable float optimizations (by making the float not eligible to the optimization), but the simplest thing is to check if the float is in the range of epsilon. More on this further down.

Hardware Instructions

Let us take the fused multiply-accumulate instruction as an example. This instruction can be found in IBM PowerPC or Intel Itanium processors that can compute ax±b with a single rounding instead of multiple. This increase accuracy (and performance because of the fewer rounding instructions) and thus a simulation will run with higher accuracy on the PowerPC or Intel Itanium platform compared to the PC platform.

This is just one out of many instructions. Some platforms have lower accuracy some have higher. If you develop a simulation that runs on PC, make sure to test it on other platforms if you officially support them.

What can we do about it

Now that we know some of the factors that influence the determinism of a simulation we can implement some techniques to increase the determinism.

The first thing is obvious, use stable sorting methods, it is as simple as that. Anything float related is a little more complicated. There are 3 methods of getting more accuracy:

  • Use doubles
  • Use decimal
  • Epsilon
  • Don’t use floats
Double and decimal

Using doubles simply just shift the problem. It might now be as visible as before, but it is still there. Floats have a precision of 7 digits while double have 15-16 digits. Decimal on the other hand is very accurate and modeled in such a way that no errors are involved. (They have a precision of 28-29 digits by the way). Take a look at the following example:


The code above will write false for reasons described earlier in this post. The same code using the decimal data type:


The above example will return true, just like we would expect. Performance of decimals are not as high as with floats or integers because of the added precision. But places where accuracy is paramount, it might be the only solution.


Epsilon is the smallest positive float value greater than zero (in .net) and can be used to check the equality between two floats, even after they have been rounded.


Using this will always make the equality check between two floats return the correct value.

Fixed point

The last thing is to simply not use floats and use integers instead. This is also called fixed point arithmetic. In fixed point math you have fixed number of digits after the radix point. It is basically an integer that you scale using some scale factor. Take the number 1.595 as an example. That is the same as 1595/1000 – so 1000 is the scale factor here. Not only can this improve accuracy, but it can also improve performance on devices without a FPU.


Use stable sorting methods if the order of elements are important and make sure to take great care with floats in algorithms that are sensitive to small errors. Use doubles if you simply need more precision or use fixed point math if you need exact values. Floating point values are not evil, they are a trade off between performance an accuracy, but you might need to do equality checks in places where you do float operations.

Thursday, August 20, 2009

Performance tips and tricks – part 3

In part 2 I gave a list of the different C# keywords that could potentially increase the performance of your code. This time we take a look at the different manual inlining techniques – it will not be like the two previous posts as this one can’t be presented in simple code snippets. The compiler is normally in charge of doing those kinds of optimizations, but sometimes it might not because of some requirements. First off, let’s get an overview of the different optimizations done by the C# compiler:

As you see, it is quite effective and contains the mostly used optimization techniques required by modern compilers.

Why manually optimize?

Well, the problem is that while the compiler almost always takes the right path, there are times where manual inlining is necessary. One example is when you are using an old .net version like 2.0; It will not inline methods that contain a value type parameter.

Another example is automatic properties. They will get inlined in release mode and all that, but they are considered regular methods by the compiler and thus still fall under the requirements of the compiler (see them further down).

Method inlining

The compiler has a few requirements to a method before it can be inlined:

  • Methods that are greater than 32 bytes of IL will not be inlined.
  • Virtual functions are not inlined.
  • Methods that have complex flow control will not be in-lined. Complex flow control is any flow control other than if/then/else; in this case, switch or while.
  • Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining.
  • If any of the method's formal arguments are structs, the method will not be inlined.

While all of them are great requirements and suffice to say that the compiler is really clever in this area (it determines inlining depending on cache size etc.) a few of those items can pose as limitations in your application. More precisely: Virtual functions, struct arguments and exception handling. I’m going to take each of them separately.

Virtual calls
The only way to get around this one is to simply not use virtual calls. If you have a way of getting rid of virtual methods, do it. Virtual calls are up to 40% slower than static or instance method calls. If you get rid of them, the compiler might also inline more methods.

Struct arguments
.Net 3.5 SP1 brought some improvements to this area and it now consider methods with struct arguments as candidates for inlining. However, if you use < .net 3.5 SP1, you will need to identify the methods yourself and inline then manually.

Exception handling
I’m not going to tell you not to do exception handling. While there are other ways of reporting errors, you might be forced to use try/catch. To get around this you need to make sure that the operation expensive methods does not contain exception handling code. Always test input once and then send it off to be calculated on.

Another thing is properties. You might have a ListCapacity property that checks if the list capacity is < 0 and throw an error if that is the case. Using that property directly in a loop can give you a performance penalty. Instead you should have a backing field (a simple variable) that contains the value and get/set from/to that instead.

Loop unrolling

As I mentioned in part 1, you should move your work outside of loops and you should not even use loops if you don’t need to. Loop unrolling is (almost) the same thing. A classic example of loop unrolling:






The C# compiler only inlines small loops with small bodies. Inlining a huge loop is very inefficient since the code size would increase and thus you will get slower code execution. But intelligently unrolling a loop can greatly increase performance, so beware of what you inline and remember to profile it.

Wednesday, August 19, 2009

Performance tips and tricks – part 2

In part 1 I presented you with some tricks on loop optimizations and constants. This time I’ll show you some simple keywords in C# that can speed up the execution of your code for the various reasons.

Sealed keyword

Marking a class as sealed makes it it unable to be used as a base class. This is primarily used to prevent other classes from deriving from it. This can unlock some run-time optimizations.





Static keyword

You can use the static keyword to instantiate a copy of a class at run-time and only keep that one copy in memory. You can read more about the static keyword in the link – be careful with it, you should only use it where appropriate.





Struct keyword

There are two types of allocations in C#: Stack and heap
The stack is where all the value types go: int, char, enum, structs and the like.
The heap is where all the reference types go: class, interface, string and the like.

The stack does not get garbage collected and allocations in that memory space is a lot quicker. When the data structure allow it, we should allocate our data on the stack.





Ref keyword

Using ref can sometimes greatly improve performance. It copies the value type by reference instead of by value. If this sounds like gibberish to you, take a look at the following article: Parameter passing in C#

Beware that the before example does actually not change the unit data. It only changes a copy of the data. The calling class will not upgrade the unit.

Also, have in mind that this can also decrease performance. The overhead of dereferencing the value type can exceed the overhead of copying by value. Always test if you actually get more performance.





Out keyword

The out keyword is almost exactly like the ref keyword. The ref keyword needs the object to be instantiated before you pass it as a parameter, the out keyword does not.





Note: As you see, the responsibility of instantiating the UnitData has been shifted to the UpgradeUnit() method. Before it had to be instantiated before it was passed to the method.

Unsafe keyword

The unsafe keyword is not used very often because of its nature and a name that can scare away most developers. Not to worry, the unsafe keyword enables the use of C concepts like pointers. Pointers are not just grab and go knowledge and it can cause havoc if used incorrectly.

Remember to turn on unsafe code in the project build properties. Also remember to use the fixed keyword to pin the memory location of variables or use the stackalloc keyword to allocate memory outside of the garbage collected memory.





Note how the arrow (->) is used instead of the dot (.) when accessing members.

Tuesday, August 18, 2009

Performance tips and tricks – part 1

In this post series I will take a look at the different tips and tricks that can increase the performance of your application. You probably already know some of them and some of them are even just a bit of logical thinking. In any case, it should help you remember the different techniques. I took all these tips and tricks out of my article High Performance Game Development (The revised version – coming out soon) since they apply to almost all languages.

Move work out of loops

This one is kind of obvious, but it can be hard to identify in some cases. You simply move work from inner loops to outer loops or even out of a loop altogether. I’ve made some simple examples of this:





Don’t use loops

This is much like the previous one. One important difference is that you don’t use loops at all. You will save a lot of instructions and thus you will have better performance.





Use constants whenever possible

Using constants can greatly improve performance. The compiler can make better decisions about your code and make use of low level optimizations that speed up the execution of the code.





Note how we can use the const keyword on the addition variable. This is only possible when the depending variables are also marked with the const keyword.

Simplify variables

Whenever you can, make sure to simplify variables to removing redundant operations. This might not always be possible and sometimes it can severely decrease the code readability. In this specific case you can maintain the readability by simply using the const keyword. The compiler will then simplify the variables for you.





The famous inverse square root

I’ve already mentioned before that you can approximate divisions using a division-free Newton-Raphson approximation. You can also use the Newton-Raphson method for approximating the square root and the inverse square root function.

But first, what do we actually use inverse square root and square root for? A lot really – everything that uses a normalized vector; collision responses, lightning and reflection to mention a few. It is an important part of any game and the operation has to be as fast as possible. Back in the day when computers had weaker CPUs and no specialized hardware to handle floating point operations (Today we even have specialized physics cards and specialized lightning circuits), some steps had to be taken to compute the inverse square root a million times each second as quickly as possible.

And here it is:


I could go into details on how this works, but I think it has been described sufficiently on Wikipedia's article Fast inverse square root. Suffice to say that the line with 0x5f3759df is the interesting one, and even to this day we don’t know the origin of it. Several different papers has been written to try and explain where this magic hex code came from.

I included this fast inverse square root function in the library (C#).

Monday, August 17, 2009

Mathematical operations and micro-benchmarking

You might have heard that the division operation on a modern FPU is slower than a multiplication operation. If you have not heard this before; it’s true – divisions are indeed slower. But why?

Without going into the technical details, the divide operation is an iterative algorithm and the multiplication operation is computed directly. One of the more common ways of doing divide operations on a FPU is to use a divide-free Newton-Raphson approximation to get the reciprocal of the denominator and then multiply with the numerator. More on this will come in a later post.

Let’s put it to the test.
Here we run 10.000.000 iterations where we divide 500 by 11.5


Here we multiply 500 by the multiplicative inverse (reciprocal) of 11.5


From what I described earlier our multiplication example (code example 2) should perform better than the division example. Here are the results:

Division: 00:00:00.0010787
Multiplication: 00:00:00.0010871

Something is clearly wrong. First off all, they have virtually no difference in the time it took to compute the results. Second of all, the time it took to do the operations is really small. Don’t worry, there is a perfectly good reason for this:

We used constants to represent the numbers we needed to divide and multiply. The compiler knows this and use the approximation instructions on the FPU – we also did the same operation 10.000.000 times. This leads us to the next section of this post:


Micro-benchmarking is where you test a single or multiple operations to see what takes the most time. In this case, all we tested was that the division and multiplication examples took the same amount of time and we simply timed the cache and not the operations themselves.

There are a bunch of examples on the internet that shows us exactly how micro-benchmarking can deceive us. In this case, the compiler simply optimizes the code and CPU cache helped us to get the results fast. To truly test the performance of the divide and multiply operators, we need to make the compiler second guess; simply remove the const keyword and rerun the code. You will see that the multiply operator is up to 10 times faster (depending on platform and compiler mode).


Divisions are really slower than multiplication, but the CPU uses precise approximations by either using lookup tables or iterative algorithms. It is a clever piece of hardware that always tries to speed up your the execution of your code where ever possible. When you do micro-benchmarking, make sure the compiler generates the code you actually want to test. Often the compiler even strips away your for-loop and simply run the code once. If there is no way to get around this, simply run the code once yourself and use StopWatch.ElapsedTicks as measuring tool and it will tell you of any speed differences.

Friday, August 14, 2009

New game using Farseer Physics Engine – Superspace

Another game got posted on our forums. This time it is a space shooter game with cooperative, death match and racing modes. It has 36 different levels for you to play with. The game has great particle effects, nice dynamics and classic sounds.

You can view a trailer of the game on youtube: Superspace trailer

Tuesday, August 11, 2009

Another game using Farseer Physics Engine – Square Off

A new game called Square off just got posted on the Farseer Physics Engine forums. It is created by Gnomic Studios and has entered the Dream Build Play competition.

It features great graphics, multiplayer and a huge fun-factor. Take a look at this video to see what I mean.

New game using Farseer Physics Engine - Ninjic

We have about 1300 daily pageviews and  200 downloads each day over at the Farseer Physics Engine project. We also have a quite a few users posting their games for others to review and to show off how they used the engine.

A lot of the games are good, but Ninjics surpassed my expectations. The graphics is from UP Studio and the designer and a guy teamed up to create a ninja-style game.

See the video of Ninjic on youtube.

Monday, August 10, 2009

Google analytics and Blogspot

I looked around for some statistics to get an idea on the amount of visitors that read my blog. I was surprised that nothing like that existed by default. Google is the owner of Blogspot and they are known for their detailed analytics of the internet and reliable site tracking, so I thought that it was a joke that Blogspot did not have a statistics module by default.

Google took the right path.

So why not include a statistics module by default? I’ll tell you why; the best sites in the world are defined by their functionality and ease of use. By not including a default statistics module and let the users choose what they like is very clever and gives your users an extremely flexible service.

I personally went with Google Analytics. There are several other services that tracks sites statistics. The choice is up to you.

Sunday, August 9, 2009

The accuracy and speed of function approximations – part 3

In part 2 we had a look at the accuracy of the approximated functions. In this part we will take a look at the speed of the same approximated functions.

There is no reason to use approximation unless you gain something from it. Usually the gain will be in the form of a speed improvement over the real function. Embedded devices with low memory and low frequency CPU can benefit greatly from approximation – even new computers can benefit from it. Lets test it out:


The timing depends a lot on the platform and settings. This is compiled under .Net 3.5 SP1 x64 in release mode. Timing is based on 1000000 iterations.

The results are clear – the quadratic curve equation is a lot faster than both the Taylor series and real sine function. The quadratic curve equation is also more accurate than the Taylor series.


If you are programming for embedded devices such as mobile phones, PDAs or barcode readers, you can gain a lot of function approximations. In this article series we tested the accuracy (true across all platforms – small but consistent variations can occur) and the speed on the PC platform. Using a quadratic curve approximation with added accuracy can give you 10 times better performance for a small sacrifice in accuracy.

You can get the code to the approximation library (written in C#) here:

Update: I managed to improve the performance of the LowSin() function. It now runs at 3 ms on my computer.

Saturday, August 8, 2009

The accuracy and speed of function approximations – part 2

In part 1 we took a look at two different function approximations; Polynomials and Taylor series. We found out that Taylor series are a lot more accurate than the quadratic equation – or is it?

The quadratic equation is fine in some cases. If you want to create a swinging hammer, a wave or the movement for a platform; this approximation is fine. But we can add some extra precision to the function and get it to be even more accurate. The new function looks like this:


The first function f defines the sine approximation and the second function g adds some extra precision. The result will look like this:


We are nearly unable to distinguish the two lines from each other. And remember from part 1, it is exact in 0, π/2 and π. So using this quadratic equation is really more accurate than Taylor series – but accuracy is not everything. We will take a look at the performance in part 3.

Update: Part 3 is here

The accuracy and speed of function approximations

The hard truth of function approximations is that they are not very accurate – that is why they are called approximations (duh). But just how accurate are the different approximations? Here I will take a look at the accuracy of the different sine function approximations.

First off, what exactly are we going to approximate? Let us take a look at the sine function in a graph:


The X axis goes from –π to π.

Let the approximation begin. We start out with a polynomial approximation. We simply use  a quadratic equation to approximate the curve of the sine function as best as we can. I came across a post where a guy used –π, -π/2, 0, π/2,π as points of reference and derived a function from that. He came up with:


Without getting too much into the details, this quadratic equation only approximates the curve of the interval [0 ; π]. Putting this equation into a graph with the real sine function gives us this:

Blue is our sine function and red is our quadratic equation. As you can see, the quadratic function is exact in 0, π/2 and π. Now let us have a look at the good old Taylor Series approximation of sine:

Blue is our sine function and red is our Taylor series. Pretty neat huh? It is impossible to tell the difference between the two functions from the interval [0;π/2], but then it starts to deviate. Suffice to say that  that we can see from our results that the Taylor series approximation is more accurate – or is it? Stay tuned for part 2.

Update: part 2 is here

That new code smell - With C# example

The High Performance Game Development article is coming along fine. I wrote up some C# code that follows with the article, most of the code is examples on transcendental function approximations.

Here is a pretty example on a sine function using Taylor Series:

public static float FastSin(float x)
    return x - ((x * x * x) / 6) + ((x * x * x * x * x) / 120)
        - ((x * x * x * x * x * x * x) / 5040);

Much more to come!

Who owns the article?

I've been writing an article called High Performance Game Development for quite some while now, and since it (in my opinion) contains a culmination of high quality knowledge about the complicated area of performance, I would like to make it available to the masses. I went looking for a place to post the article, but was surprised to see that most sites simply takes away your rights to the content of the article. Their disclaimer text often goes like this:

By submitting your article to xxxxxxxx, you are granting to us a permanent, non-exclusive license to use the article in any format we deem appropriate, with the assurance that you will always be given full credit for your work

I was also surprised to see that all sites want you to write the article, then upload it to them. You have no way of editing the article by yourself. You will have to contact the site owners to get your article updated. I can understand why they do this - it must be because of vandalism and inappropriate words and so on.
By why should I write an article, give it to them and essentially give them the rights to do whatever they want with the article?

No way...

That is why I decided to publish the article myself. I don't know where to host it yet, but hopefully it will get noticed and be passed around forums.

Friday, August 7, 2009

The state of TTS systems

I have this semi-blind friend that I used to work together with. He uses an application called Zoomtext for reading online news, install applications and so on. The only problem is that the program is poor in quality - always lagging around the screen, crashing and is really heavy to dance with. Being geek by nature I tried to find alternatives and stumbled upon some online TTS (Text-To-Speech) system with some really good quality voices. One of them was iSpeech. It is a webservice based application where you can upload your content for free. But free does not necessarily mean easy, here is what I think about it:

1. It is online and thus you can only past it some text and then get the speech results. There are no parsing of documents on the fly so you can read while you listen. This can be solved (sort of), read on...

2. Building an application that works together with the parsing engine of Zoomtext seems to be pretty easy. Problem is that each upload of text portion will contain an advertisement for the service - You can of course get around that paying a large amount of money, but paying $99 each month (for 600 minutes of audio) for only the voice of a TTS system is not really an option.

So how come handicapped people have to live without the high quality voice proviced by iSpeech (or odiogo) in a text parser engine like the one from Zoomtext? We are way behind on the field of assistance applications in my opinion and someone needs to put those two together.

Thursday, August 6, 2009

New blog

I’ve been thinking about getting a blog, and now it finally happened. Stay tuned…