# Posts Tagged ‘perspective’

## Perspective projection: part 3

Posted by Andy on May 30, 2009

In part one and part two, we derived formulas for projecting a 3d point onto the view plane, with values mapped into clip space, and depth information correctly preserved.  Now we’re ready to take everything we know about vector/matrix multiplication and homogeneous coordinates, and compose the perspective projection matrix.

Let’s begin by recapping those formulas.  zNear is the z value of the near plane (also called d in previous posts).  zFar is the z value of the far plane.  nearWidth and nearHeight are the width and height of the near plane:

$clip\_x = x \times \dfrac{2 \times zNear}{z \times nearWidth} \\ \\ \\ clip\_y = y \times \dfrac{2 \times zNear}{z \times nearHeight} \\ \\ \\ clip\_z = (z \times \dfrac{zFar}{zFar-zNear} + \dfrac{zNear \times zFar}{zNear-zFar}) \div z$

Notice that each one of these involves a division by z.  Actually, the calculation of clip_z doesn’t really need a division by z.  But as explained in the previous post, we have to use homogeneous coordinates to accomplish the division by z for x and y, and it’s an all-or-nothing deal: whatever ends up in w divides every component.  It’s amazing to me that all of this ends up working out.  Thank goodness for mathematicians!

Since that division by z will not (and cannot) be carried out by the matrix multiplication itself, we can take it right out of all three equations.  We just need to make sure the matrix puts the value of z into the resulting w coordinate.  This leaves us with calculations that can be carried out by a single 4×4 matrix.  All that’s left is to figure out where in the matrix to place the different parts.

Remember that each component of the result vector is determined by a single row of the matrix:

Row 1 determines the resulting x coordinate.

Row 2 determines the resulting y coordinate.

Row 3 determines the resulting z coordinate.

Row 4 determines the resulting w coordinate.

Let’s proceed one component at a time.  We want the source x to be multiplied by $\dfrac{2 \times zNear}{nearWidth}$.  The column in row 1 of the matrix that gets multiplied by x is the first one, entry (1,1).  That’s all there is to it.  We put $\dfrac{2 \times zNear}{nearWidth}$ into (1,1) and zeros in all the other columns.  So the first row of the matrix is:

$\begin{array}{cccc} \frac{2 \times zNear}{nearWidth} & 0 & 0 & 0 \end{array}$

Calculating y is almost the same, except it’s the second column that gets multiplied by the source y.  So the second row is:

$\begin{array}{cccc} 0 & \frac{2 \times zNear}{nearHeight} & 0 & 0 \end{array}$

For z, remember that we need to add two terms together.  One is multiplied by the source z, and the other is just a constant.  So the part to be multiplied by z goes in the third column, and we conveniently hijack the w column to host the constant:

$\begin{array}{cccc} 0 & 0 & \frac{zFar}{zFar-zNear} & \frac{zNear \times zFar}{zNear-zFar} \end{array}$

That leaves w, to which we simply assign the value of the source z by placing a one in the third column:

$\begin{array}{cccc} 0 & 0 & 1 & 0 \end{array}$

And we’re done!  The final perspective projection matrix is:

$\left[\begin{array}{cccc} \dfrac{2 \times zNear}{nearWidth} & 0 & 0 & 0 \\ 0 & \dfrac{2 \times zNear}{nearHeight} & 0 & 0 \\ 0 & 0 & \dfrac{zFar}{zFar-zNear} & \dfrac{zNear \times zFar}{zNear-zFar} \\ 0 & 0 & 1 & 0 \end{array}\right]$
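To see the whole thing work end to end, here is a minimal Python sketch that builds the matrix above and then does the divide by w by hand.  The function names and the frustum numbers are hypothetical, chosen so the arithmetic is exact.  It assumes the same convention as the vector/matrix post: the point is a column vector on the right, and each matrix row produces one result component.  An API that multiplies row vectors on the left would store the transpose of this matrix.

```python
def perspective_matrix(z_near, z_far, near_width, near_height):
    """Build the 4x4 projection matrix above as a list of rows."""
    return [
        [2.0 * z_near / near_width, 0.0, 0.0, 0.0],
        [0.0, 2.0 * z_near / near_height, 0.0, 0.0],
        [0.0, 0.0, z_far / (z_far - z_near), z_near * z_far / (z_near - z_far)],
        [0.0, 0.0, 1.0, 0.0],  # copies the source z into the result's w
    ]


def project_point(m, x, y, z):
    """Multiply <x, y, z, 1> by the matrix, then divide by the resulting w."""
    v = (x, y, z, 1.0)
    # Each result component is the dot product of one matrix row with v.
    cx, cy, cz, cw = (sum(mij * vj for mij, vj in zip(row, v)) for row in m)
    return cx / cw, cy / cw, cz / cw


# Hypothetical frustum: near plane at z = 2 (4 wide, 2 tall), far plane at z = 10.
m = perspective_matrix(2.0, 10.0, 4.0, 2.0)
print(project_point(m, 2.0, 1.0, 2.0))    # near-plane corner: (1.0, 1.0, 0.0)
print(project_point(m, 10.0, 5.0, 10.0))  # far-plane corner:  (1.0, 1.0, 1.0)
```

Both corners lie on the same edge of the frustum, so they land on the same x and y in clip space, while z tells them apart.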

Yeah, I just figured out that WordPress supports $\LaTeX$.  Awesome.

This concludes the perspective projection series.  You should now be able to look at that matrix and understand exactly what it does to a 3D point.

## Perspective projection: part 2 – Z is special

Posted by Andy on May 30, 2009

In part one, we found the formula that will project the x and y components of a 3D point onto the view plane, and map the resulting values into the range [-1, 1].  Z is different, and a bit tricky.

The reason it is different is intrinsic in what projection is, by definition.  Projection means dropping a dimension.  In 3D graphics, we’re dropping the z dimension, so that we can render on a display that only has x and y dimensions.  In perspective projection, x and y are projected as a function of z, so they absorb some of z’s information and preserve a sense of depth, which shows up as far-away things looking smaller.

The reason it is tricky has to do with the way we’re going to use homogeneous coordinates to accomplish the division by z required by the projection formulas we already have.  To explain, I need to skip ahead a little to the matrix multiplication.  Recall from the post on vector/matrix multiplication that while x, y, z, and w from the source vector can each be multiplied by a constant in the matrix, they can only combine with each other additively.  But we need to divide x and y by z – that’s multiplicative, and that’s a problem.

The solution is apparently due to Möbius, who you might remember from such hits as the Möbius strip.  Recall that homogeneous coordinates give us a way to embed a scaling factor into a vector.  If the w coordinate is not one, the real 3D coordinates are computed by dividing the whole vector by w.  Aha!  If we can somehow get the value of z into the w component (and we can), then x and y will effectively be divided by z!  There’s just one problem – with that value in w, z itself will get divided by z too!  We need to correct for this, because we are in fact going to need accurate depth information in the projected coordinates, for things like perspective-correct texture mapping and shadow mapping.
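Here is a tiny Python sketch of both the trick and the problem.  It assumes, hypothetically, that some matrix has copied the source z into w and left x, y, and z untouched; `normalize` is just the divide-by-w rule from the homogeneous coordinates post.

```python
def normalize(v):
    """Divide a homogeneous vector <x, y, z, w> by its w component."""
    x, y, z, w = v
    return (x / w, y / w, z / w, 1.0)


# Hypothetical: a matrix has placed the source z (here 4) into w.
x, y, z = 6.0, 2.0, 4.0
print(normalize((x, y, z, z)))  # (1.5, 0.5, 1.0, 1.0)
```

x and y come out divided by z, exactly as the projection formulas want.  But z comes out as 1 no matter what depth we started with, so all depth information is lost unless we correct for it.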

We now have some constraints for how this projection of the z coordinate needs to work.  We know that it must map the range between the near and far planes into the range [0, 1].  And we know that the result must somehow have z itself already factored in, so that subsequent division by z ends up giving the correct answer.  I know that last bit is weird.  Just remember that we’re forced into it by the way homogeneous coordinates facilitate the divide-by-z needed by the x/y projection.

Time to break out the pencil and paper again, and see if we have enough information here to come up with a formula.  Remember that the computation of z will be determined exclusively by the third row of the projection matrix.  And it will consist of adding together bits of x, y, z, and w from the source vector.  Well, right away we can rule out any contribution from x and y, because they have nothing to do with mapping z into a range.  Really, the only input we need to accomplish that is z itself.  So let’s start by seeing if there’s any constant that we can put in the (3,3) position in the matrix that will accomplish the desired mapping when multiplied by z.

We want the result to have z factored into it (to correct for the z-division problem) and we want to reach the result by multiplying z by some constant.  Let’s call that hoped-for constant A:

$clip\_z \times z = zA$

Great.  What else do we know?  We know that when we plug in the z coordinate of the near plane, ultimately clip_z must equal zero.  And when we plug in the z coordinate of the far plane, clip_z must equal 1.  Simple substitution into the equation above gives us:

$0 \times zNear = zNear \times A \\ \\ 1 \times zFar = zFar \times A$

And simplifying gives:

$A = 0 \\ \\ A = 1$

Well shucks.  No solution.  This means there’s no single constant A that we can multiply z by to get the result we’re after.  Fortunately, there’s one more option: we might be able to use the last term in that matrix row.  That’s the one that will get multiplied by w and added to the result.  Since all of our input vectors are going to contain normalized homogeneous coordinates, w is always going to be one, which is innocuous.  This will let us pass through one more constant (let’s call it B), to be added to the result.  Which means we can also consider a formula with the form:

$clip\_z \times z = zA + B$

Let’s try plugging in our known values just like before:

$0 \times zNear = zNear \times A + B \\ \\ 1 \times zFar = zFar \times A + B$

This time it turns out we have a solvable system.  Solving for A and B gives:

$A = \dfrac{zFar}{zFar - zNear}$

$B = \dfrac{zNear \times zFar}{zNear - zFar}$
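The elimination step is quick enough to show.  Subtracting the near-plane equation from the far-plane equation cancels B:

$zFar = (zFar - zNear) \times A$

which gives A directly.  Substituting A back into the near-plane equation, $0 = zNear \times A + B$, then gives B:

$B = -zNear \times A = -\dfrac{zNear \times zFar}{zFar - zNear} = \dfrac{zNear \times zFar}{zNear - zFar}$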

Plugging these back into the formula and dividing both sides by z gives us:

$clip\_z = (z \times \dfrac{zFar}{zFar - zNear} + \dfrac{zNear \times zFar}{zNear - zFar}) \div z$
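A quick numeric sanity check of that formula, as a minimal Python sketch.  The near and far distances (2 and 10) are hypothetical values picked so the arithmetic is exact.

```python
def clip_z(z, z_near, z_far):
    """clip_z = (z * A + B) / z, with A and B as derived above."""
    a = z_far / (z_far - z_near)
    b = (z_near * z_far) / (z_near - z_far)
    return (z * a + b) / z


z_near, z_far = 2.0, 10.0  # hypothetical near/far plane distances
print(clip_z(z_near, z_near, z_far))  # 0.0
print(clip_z(z_far, z_near, z_far))   # 1.0
print(clip_z(6.0, z_near, z_far))     # roughly 0.833
```

The endpoints land exactly on 0 and 1.  Notice that a point halfway between the planes does not land at 0.5, because of that division by z.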

Whew!  We’re done with z.  In the next post, we’ll put it all together into the projection matrix.

## Perspective projection: part 1

Posted by Andy on May 29, 2009

Armed with a clearer view of vector/matrix multiplication and homogeneous coordinates, it’s time to dissect the perspective projection.

Stated simply, the point of perspective projection is to make things that are further away look smaller.  Since the magnitude of z gets bigger as a point gets further from the viewer, it follows that the magnitude of other things should basically vary inversely with z.  However, as I quickly discovered when I wrote my first 3D program, merely dividing everything else by z is not quite enough.

In order to understand the mathematics required to do perspective projection properly, let’s back up and talk about the view frustum.  I’ll use 2D illustrations, but the same math extends pretty easily to 3D.  The view frustum:

The idea is to pretend that the computer display is a window looking “out” onto the 3D world.  Mathematically, it’s exactly like looking out a real window.  If you look out your window at a fixed point in the outside world (say, a mailbox across the street), you could imagine drawing a straight line from your eye to that point.  The point where that line intersects the window is exactly where that mailbox will appear to you to be “on the window”.  If you kept your head perfectly still, and painted exactly what you saw onto the surface of the window, it would look real.  And that’s pretty much what we do when we render a 3D scene on a computer.

The diagram is in eye space – meaning the eye is at the origin. The near plane corresponds to the display – the window on the virtual world, which is a certain distance d from the eye.  The far plane is the boundary that is as far as the eye can see.  It’s necessary to have this limit because we need a unique number to represent every possible distance from the eye to any point in the view space, and computers don’t have infinite numeric precision.  We can still put the far plane as far away as is practical, though.  So, the view frustum neatly and precisely bounds everything the eye can see through the window.

In the 2D diagram, the point that we need to project is located at y.  The goal is to figure out y’, the location on the near plane where the viewer will seem to see that point.  One way to think about the math is in terms of similar triangles.  Since the beige triangles are similar (they have the same angles), we know that the proportion of y’ to d is the same as the proportion of y to z:

$\dfrac{y^{\prime}}{d} = \dfrac{y}{z}$

Multiplying both sides by d, we get:

$y^{\prime} = y \times \dfrac{d}{z}$

There!  Problem solved.  Since y’ (and x’, in 3d) is what we were after, we have our projection formula. From here on, we’ll refer to x’ and y’ as proj_x and proj_y.
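As a quick numeric check of the similar-triangles formula, here is a minimal Python sketch.  The name `project` and the sample numbers are illustrative only.  It assumes eye space as in the diagram, with z positive in front of the eye.

```python
def project(y: float, z: float, d: float) -> float:
    """Project coordinate y of a point at depth z onto a plane at distance d.

    Uses y' = y * d / z from the similar triangles; assumes z > 0.
    """
    if z <= 0:
        raise ValueError("point must be in front of the eye (z > 0)")
    return y * d / z


# A point 10 units up and 20 units away, projected onto a plane 5 units away.
print(project(10.0, 20.0, 5.0))  # 2.5
# Doubling the distance halves the projected height.
print(project(10.0, 40.0, 5.0))  # 1.25
```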

Oh, but wait.  There is one other thing.  In real 3D systems, we generally need to map these coordinates into something called clip space, which is essentially a cube spanning -1 to +1 on every axis (OpenGL), or the half of that cube with non-negative z (Direct3D).  For this, there is yet another way to look at the perspective projection, which will make the problem simpler:

See that?  If we can just warp the whole view frustum into a rectangle, then all the perspective lines become parallel, and the y coordinate of the point out in space is now the same as the projected one!  Also, if we take this just a little further and squash the rectangle into a cube, we’ll have our coordinates in clip space.  So how is this warping of space accomplished?  Hmm.  Look at the result in this diagram: the warping turns y into y’, just like before, so we already have the formula that does this for x and y as a function of z:

$proj\_x = x \times \dfrac{d}{z} \\ \\ \\ proj\_y = y \times \dfrac{d}{z}$

That effectively makes the frustum rectangular, and all the perspective lines parallel.  So much for the warp – on to the squash.  Let’s start by thinking about extremities.  If we can find formulas that work for the extremities, they’ll work for everything in between, too.  To complete the transformation into a half-unit cube (I’m using Direct3D’s convention here), we need to map the following:

minimum and maximum x values at both near and far planes → -1 and +1

minimum and maximum y values at both near and far planes → -1 and +1

minimum and maximum z values → 0 and +1 (for OpenGL, we’d simply map to -1 and +1 here)

The x and y cases are really similar.  The z case is a bit special, firstly because the frustum’s extent along z (from zNear to zFar) is not centered on zero the way the x and y extents are, so a simple division won’t map it into range, and secondly because we eventually need to divide everything by z, which means we’re even going to divide the transformed z by z.  I don’t expect that last bit to make sense just yet – it has to do with the matrix multiplication and homogeneous coordinates, which we’ll get to eventually.

I won’t beat around the bush anymore on these formulas.  Let’s just get out a pencil and paper and work it out.  For x and y, all we need to do is divide by half the width (or height) of the near plane.  For example, suppose the near plane ranges from -100 to +100.  We want a point at 100 to map to one.  It’s easy to work out that dividing by half the width does the trick.  A little algebra gives us:

$clip\_x = \dfrac{2 \times proj\_x}{nearWidth}$
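The -100 to +100 example worked in a minimal Python sketch.  The name `to_clip` is just an illustrative label for the squash step.

```python
def to_clip(proj_x: float, near_width: float) -> float:
    """Map a projected coordinate on the near plane into [-1, 1].

    Dividing by half the near-plane width is the same as 2 * proj_x / near_width.
    """
    return 2.0 * proj_x / near_width


near_width = 200.0  # near plane spans -100 .. +100
for px in (-100.0, 0.0, 50.0, 100.0):
    print(px, "->", to_clip(px, near_width))
# -100.0 -> -1.0
# 0.0 -> 0.0
# 50.0 -> 0.5
# 100.0 -> 1.0
```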

Since we’re eventually going to compose a matrix to do all of these calculations in one go, it’s important to concatenate them all together first.  Let’s combine the projection (warp) formula with the mapping (squash) formula:

$proj\_x = x \times \dfrac{d}{z}$

and

$clip\_x = \dfrac{2 \times proj\_x}{nearWidth}$

From here on, we’re going to call d zNear, indicating that it’s the distance to the near plane along the z axis.  A little more algebra, substituting $x \times \frac{zNear}{z}$ into the second equation, gives:

$clip\_x = x \times \dfrac{2 \times zNear}{z \times nearWidth}$

And for y:

$clip\_y = y \times \dfrac{2 \times zNear}{z \times nearHeight}$
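To convince yourself that the combined formulas really are the warp followed by the squash, here is a minimal Python sketch comparing the two.  The function names and frustum dimensions are hypothetical.

```python
import math


def clip_xy(x, y, z, z_near, near_width, near_height):
    """Combined warp-and-squash for x and y, as in the two formulas above."""
    clip_x = x * (2.0 * z_near) / (z * near_width)
    clip_y = y * (2.0 * z_near) / (z * near_height)
    return clip_x, clip_y


def clip_xy_two_step(x, y, z, z_near, near_width, near_height):
    """The same mapping in two steps: project onto the near plane, then squash."""
    proj_x = x * z_near / z
    proj_y = y * z_near / z
    return 2.0 * proj_x / near_width, 2.0 * proj_y / near_height


# Hypothetical frustum: near plane at z = 1, 2 units wide and 1.5 units tall.
frustum = (1.0, 2.0, 1.5)

# The top-right corner of the near plane maps to (1, 1) ...
print(clip_xy(1.0, 0.75, 1.0, *frustum))   # (1.0, 1.0)
# ... and so does the matching corner ten times further away.
print(clip_xy(10.0, 7.5, 10.0, *frustum))  # (1.0, 1.0)

# One step or two, the answers agree (up to floating-point rounding).
a = clip_xy(3.0, -1.2, 4.0, *frustum)
b = clip_xy_two_step(3.0, -1.2, 4.0, *frustum)
print(all(math.isclose(p, q) for p, q in zip(a, b)))  # True
```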

I mentioned that the case for z is special.  So special that I just finished working out the derivation for the first time just now.  I’ll show it in the next post.  Then we’ll have all the pieces we need to put the matrix together.

Posted in Fundamentals, Projections, Transformations

## Homogeneous coordinates

Posted by Andy on May 29, 2009

This continues the series on perspective projection, but again, the subject of homogeneous coordinates is extremely useful on its own.  Homogeneous coordinates are sort of notorious for being non-intuitive, or maybe it’s just hard to see what the point of them is.  I’ve always sort-of kind-of understood basically what the rules were, but to be honest, I don’t think I ever really had that aha! moment until just a few hours ago.

I’m going to assume that you understand how plain old coordinates work.  You’ve got x, y, and z values that describe a position in a Cartesian coordinate system.  Usually this is represented by a 3-dimensional vector.  Simple enough.

To get so-called homogeneous coordinates, you add a fourth component, conventionally called w.  So now you have x, y, z, and w, but they still represent a point in 3 dimensions, not 4.  Then what’s the w for?  Isn’t it superfluous?  Before we answer that, let’s look at a few examples.

Consider this 3d point:

<10, 10, 10>

The simplest way to make these into homogeneous coordinates is to just add w=1:

<10, 10, 10, 1>

There, now they’re homogeneous.  Big deal!  Well, what happens if we change w?

<10, 10, 10, 5>

Uh oh.  They’re still homogeneous coordinates, but now they’re misleading.  Here’s the thing:

<10, 10, 10, 5> != <10, 10, 10>.  They are not the same point in space!  In fact:

<10, 10, 10, 5> = <2, 2, 2>!

The rule is, <x, y, z, w> is the same point as <x, y, z> only when w = 1.  If w != 1, you can normalize the whole thing by dividing every component by w (obviously, doing this will give you w = 1).
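The normalization rule as a minimal Python sketch.  The name `normalize` is illustrative.  The w = 0 case isn’t discussed here, so this sketch simply rejects it.

```python
def normalize(p):
    """Divide every component of a homogeneous point <x, y, z, w> by w."""
    x, y, z, w = p
    if w == 0:
        # Not covered in this post; refuse rather than divide by zero.
        raise ValueError("cannot normalize a homogeneous point with w = 0")
    return (x / w, y / w, z / w, w / w)


print(normalize((10, 10, 10, 1)))  # (10.0, 10.0, 10.0, 1.0)
print(normalize((10, 10, 10, 5)))  # (2.0, 2.0, 2.0, 1.0)
```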

Now we can sort of start to see one part of what w is good for.  It gives us a way to package a scaling factor into a vector.  Notice that because of the rules, this scaling factor is upside down – the rest of the vector scales inversely with w.  When w gets bigger, the rest of the vector gets smaller, and when w gets smaller, the rest of the vector gets bigger.

Ok, so we can pack an inverse scaling factor in with our coordinates.  Why is that useful?  To understand that, you have to look at how these 4-component vectors interact with the 4×4 transformation matrices used everywhere in computer graphics.  I’ll get into that in the next post, where I’ll examine the perspective projection matrix.

## Vector – matrix multiplication: What matrices do to vectors

Posted by Andy on May 29, 2009

This is the first post in a series on perspective projection, but it’s useful on its own, too.

In order to understand how a projection matrix works, first you have to understand what a matrix actually does to a vector when you multiply them.  It can seem a little mysterious if you don’t look too closely, but once you look, it’s almost intuitive.

First, a refresher on dot products.  Remember that you get the dot product of two vectors by adding the component-wise products, like so:

$a \cdot b = a_x b_x + a_y b_y + a_z b_z + a_w b_w$

You can look at vector/matrix multiplication as taking the dot products of the vector with each row of the matrix.  Each dot product becomes a component of the result vector.  Here’s how I like to think about this: all of the components of the original vector can potentially contribute to each component of the result vector.  Each row of the matrix determines the contributions for one component of the result vector.  So, in the result vector, x is determined by row 1 in the matrix, y is determined by row 2, and so on.  Let’s see how this works for the simplest case: the identity matrix.  After this, it should make sense why the identity matrix has ones along its main diagonal and zeros everywhere else.

Each component of the original vector can contribute additively to any component in the result vector, and the weightings of the contributions are determined by the corresponding row in the matrix.  So, P’x is determined by row 1.  You can see that in the identity matrix, row 1 has a one in the x position, and zeros in all other positions.  So Px from the original vector is preserved intact by the one in that row, while Py, Pz, and Pw are suppressed by the zeros.  In the second row, there is a one in the y position, and the rest are zeros.  Since row 2 determines the contribution weightings for P’y, the contribution from Py is preserved, while Px, Pz, and Pw get zeroed.
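Here is the “one row per result component” idea as a minimal Python sketch.  It assumes a matrix stored as a list of rows; `dot` and `mat_vec` are illustrative names, not any particular library’s API.

```python
def dot(a, b):
    """Dot product: the sum of the component-wise products."""
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector.

    Row i of the matrix determines component i of the result.
    """
    return [dot(row, v) for row in m]


identity = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

p = [3, -2, 7, 1]
print(mat_vec(identity, p))  # [3, -2, 7, 1]
```

Each row’s single one picks out one source component and its zeros suppress the rest, so the vector passes through unchanged.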

Hopefully that makes sense after going over it once or twice.  The trick is to see each row in the matrix as selecting the way that the original vector’s components combine to produce each component of the result.  The tools used to do this are multiplication (to scale or cancel a source component) and addition (to combine those components into a single result component).

Of course, most matrices are more interesting than the identity matrix.  Try applying this to some of the transformation matrices.  Notice how the rows of a scaling matrix only affect one vector component at a time, just like the identity matrix, since x scaled is always a function of just x.  In contrast, notice how the rows of the rotation matrices affect multiple components at once, since x rotated can be a function of both x and y, or x and z, depending on the axis of rotation.  Once you absorb this, you’ll be able to see intuitively what a given matrix is doing to a vector, and make up your own matrices to do interesting things to vectors.
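For a concrete comparison, here is a minimal Python sketch applying a scaling matrix and a rotation about the z axis to the same point.  The helper names are hypothetical, and the rotation uses the usual counterclockwise convention.

```python
import math


def mat_vec(m, v):
    """Each result component is the dot product of one matrix row with v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def scale(sx, sy, sz):
    # Each row touches only one component: x' depends only on x, and so on.
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def rotate_z(theta):
    # Rows 1 and 2 mix x and y: x' = cos*x - sin*y, y' = sin*x + cos*y.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


p = [1.0, 0.0, 5.0, 1.0]
print(mat_vec(scale(2, 3, 4), p))  # [2.0, 0.0, 20.0, 1.0]

# Rounding hides the tiny floating-point error in cos(pi / 2).
rotated = mat_vec(rotate_z(math.pi / 2), p)
print([round(c, 6) for c in rotated])  # [0.0, 1.0, 5.0, 1.0]
```

After the quarter turn, the new y is made entirely from the old x, which is exactly the cross-component mixing you can see in the rotation matrix’s first two rows.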