The Mathematics Involved in 3D Game Programming - Matrices and Cartesian Vs. Homogeneous Coordinates
This blog article continues my series on the mathematics involved in 3D programming. The previous post in the series was The Mathematics Involved in 3D Programming - Linear Algebra.
I recently began studying in more depth how matrices are used in 3D programming. One particularly interesting aspect of this is the use of matrices for perspective (and other situations) in 3D graphics.
Cartesian (Euclidean) Coordinates Vs. Homogeneous Coordinates
First of all, right up front, I should get three definitions out of the way.
1. Cartesian Coordinates - Introduced some four centuries ago by René Descartes, this is the usual coordinate system we use for plotting points, vectors and other lines or shapes on x, y, and z axes. See the image below (from the Math Is Fun website):
2. Euclidean (Space) - I'll let Wikipedia describe this one: "In geometry, Euclidean space encompasses the two-dimensional Euclidean plane, the three-dimensional space of Euclidean geometry, and similar spaces of higher dimension."
And here is an image from the same Wikipedia page:
Basically, though, Cartesian is synonymous with Euclidean, for the purposes of this blog article.
3. Homogeneous Coordinates - Again, I will let Wikipedia give the explanation: "In mathematics, homogeneous coordinates or projective coordinates, introduced by August Ferdinand Möbius in his 1827 work Der barycentrische Calcül, are a system of coordinates used in projective geometry, as Cartesian coordinates are used in Euclidean geometry. They have the advantage that the coordinates of points, including points at infinity, can be represented using finite coordinates. Formulas involving homogeneous coordinates are often simpler and more symmetric than their Cartesian counterparts. Homogeneous coordinates have a range of applications, including computer graphics and 3D computer vision, where they allow affine transformations and, in general, projective transformations to be easily represented by a matrix."
Easier Explanation of All of the Above
With regard to perspective in 3D programming, here is what is going on with all of this.
Take a train track for example. When you stand within a train track (safely, of course - no trains coming anytime soon), and look way down yonder into the distance, it appears as though both iron tracks have become one, a vanishing point, from the point of view of your perspective vision.
But of course, this is not the case. Two parallel tracks, or lines for that matter, will never actually meet in real life. Well, they're not supposed to, anyway, especially two parallel lines on the x, y coordinate system, such as this example from Paul Dawkins' book, Linear Algebra:
In y = mx + b form, the two lines above are y = (1/4)x + 3/4 and y = (1/4)x - 5/2. They have the same slope, 1/4, and different y-intercepts, which is why they never cross.
But the question at hand is this: how can distant parallel lines be represented in a 3D model so that they converge to a vanishing point?
For 3D programming models, points in space are normally represented by three values, the x, y and z values. But to represent a point that is far away, near the vanishing point of perspective, a fourth value is used.
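The same idea can be tried out in two dimensions with the pair of parallel lines from earlier, where the extra value is a third number instead of a fourth. The sketch below assumes the standard projective-geometry convention that a line ax + by + c = 0 is stored as the triple (a, b, c), and that two lines meet at the cross product of their triples. The names cross, line_1, and line_2 are my own, chosen just for illustration.

```python
def cross(a, b):
    """Cross product of two 3-tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

# y = (1/4)x + 3/4  rewritten as  x - 4y + 3 = 0
line_1 = (1, -4, 3)
# y = (1/4)x - 5/2  rewritten as  x - 4y - 10 = 0
line_2 = (1, -4, -10)

# The meeting point comes out as homogeneous (x, y, w).
meeting_point = cross(line_1, line_2)
print(meeting_point)  # (52, 13, 0)
```

The result has w = 0, so the two lines only "meet" at a point at infinity. Its x and y values, 52 and 13, give the direction in which that happens: 13/52 = 1/4, the slope the two lines share.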
From this blog article, Programmer's guide to homogeneous coordinates there is this image:
The above image is showing the relationship between Cartesian coordinates and homogeneous coordinates in 3D programming.
To convert from Cartesian coordinates to homogeneous coordinates, an extra number is appended to the coordinates, typically represented by w, and this number is usually 1 for nearby objects. For example, the Cartesian point (4, 2, 5) becomes the homogeneous point (4, 2, 5, 1).
The conversion from homogeneous coordinates back to Cartesian coordinates is where an "ah-ha" moment takes place: each of x, y, and z is divided by w. For a nearby object, dividing by 1 gives the same value again: 4/1 = 4, 2/1 = 2 and 5/1 = 5.
However, the smaller the value of w in the homogeneous set of numbers, the larger the Cartesian set of numbers becomes: 4/0.1 = 40, 2/0.1 = 20, and 5/0.1 = 50.
As can be seen, the value of w for the homogeneous set of numbers gets smaller and smaller as the corresponding Cartesian numbers get larger, i.e. farther away from the origin on the Cartesian coordinate system. It is important to point out that the point itself may actually appear closer, nearer to the viewing screen, depending on whether the x, y, and z numbers are positive or negative. This situation is better explained at this blog post, Explaining Homogeneous Coordinates & Projective Geometry
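The two conversions described above can be sketched in a few lines of Python. This is only a minimal illustration; the function names to_homogeneous and to_cartesian are hypothetical helpers of my own, not part of any particular graphics library.

```python
def to_homogeneous(point, w=1.0):
    """Append w to a Cartesian (x, y, z) point."""
    x, y, z = point
    return (x, y, z, w)

def to_cartesian(h):
    """Divide x, y, and z by w to get back a Cartesian point."""
    x, y, z, w = h
    if w == 0:
        # With w = 0 there is nothing to divide by: the tuple describes
        # a direction (a point at infinity), not a position.
        raise ValueError("w = 0 has no Cartesian equivalent")
    return (x / w, y / w, z / w)

print(to_homogeneous((4.0, 2.0, 5.0)))  # (4.0, 2.0, 5.0, 1.0)

# Keep x, y, z fixed and shrink w: the Cartesian point moves farther out.
for w in (1.0, 0.1, 0.01, 0.001):
    print(w, to_cartesian((4.0, 2.0, 5.0, w)))
```

Running the loop shows the Cartesian values growing by roughly a factor of ten each time w shrinks by a factor of ten.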
Including the Perspective Use, 3 Main Ways Homogeneous Coordinates are Used
The three sections below are drawn from the blog article above, Explaining Homogeneous Coordinates & Projective Geometry. To better understand each of the sections below, I recommend reading the similarly titled sections in that blog article.
Translation Matrices for 3D Coordinates
Rotation and scaling transformations in 3D programming can be done with a 3x3 matrix, with one column each for the x, y, and z coordinates. But translation cannot be expressed as a 3x3 matrix multiplication; the matrix needs a fourth column to hold the offsets. To fix this issue, a 4x4 matrix is used and a w value is added to each point. In this case, the w value is usually set to 1, so that the offsets in the fourth column are added to x, y, and z exactly once, and so that dividing by w afterward leaves the translated values unchanged.
This ties back to what was discussed above, with the Cartesian point (4, 2, 5) and its corresponding homogeneous point (4, 2, 5, 1), and with w varying from 1 down to 0.001.
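Here is a small sketch of the translation case, using plain Python tuples rather than a graphics library. The helper names mat_vec and translation, and the offsets (10, 20, 30), are my own choices for illustration.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (tuple of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def translation(tx, ty, tz):
    """4x4 translation matrix: the offsets sit in the fourth column."""
    return (
        (1, 0, 0, tx),
        (0, 1, 0, ty),
        (0, 0, 1, tz),
        (0, 0, 0, 1),
    )

move = translation(10, 20, 30)

# With w = 1 the offsets are added to x, y, and z.
moved_point = mat_vec(move, (4, 2, 5, 1))
print(moved_point)  # (14, 22, 35, 1)

# With w = 0 the fourth column is multiplied by zero, so nothing moves.
moved_direction = mat_vec(move, (4, 2, 5, 0))
print(moved_direction)  # (4, 2, 5, 0)
```

The second result is worth noticing: a coordinate with w = 0 is untouched by translation, which is the behavior you would want for a direction rather than a position.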
Here is an image from the article showing a perspective projection matrix applied to a homogeneous coordinate, (2, 3, 4, 1):
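The final 4x1 matrix has w = 4 because, in this example, the perspective matrix copies the z value (4) into w. When the result is later divided by w, points with a larger z are pulled in toward the center of the view more strongly, which is what produces the perspective effect. The blog article explains this in more detail.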
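Since the image is not reproduced here, the sketch below assumes one simple matrix that produces w = 4 for this input: an identity matrix whose bottom row is changed to copy z into w. That matrix is my assumption for illustration; a real projection matrix in a game engine also accounts for field of view, aspect ratio, and near and far planes.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (tuple of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

# Assumed matrix: x, y, z pass through, and the bottom row copies z into w.
perspective = (
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 1, 0),
)

projected_h = mat_vec(perspective, (2, 3, 4, 1))
print(projected_h)  # (2, 3, 4, 4)

# Dividing by w converts back to Cartesian coordinates.
x, y, z, w = projected_h
projected = (x / w, y / w, z / w)
print(projected)  # (0.5, 0.75, 1.0)
```

A point with the same x and y but twice the z would end up with w = 8, so its x and y would be divided by 8 instead of 4 and land closer to the center.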
Positioning Directional Lights
Directional lighting means the light that shines upon a 3D object comes from a source "infinitely" far away, like a star, or even the sun. The value of w in this case is 0. Division by zero is undefined, but as w shrinks toward 0 the Cartesian coordinates grow without bound, so w = 0 is treated as a point at infinity, in other words a pure direction, as this image (from the same blog article mentioned above) will show:
So, with 3D lighting, the guiding rule is this: if w = 1, the coordinates give the position of a nearby (point) light source. If w = 0, they give the direction of a directional light.
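The w = 1 versus w = 0 rule can be sketched as a single function that returns the direction from a surface point toward the light. The names normalize and surface_to_light are hypothetical, and I am assuming the convention that, for a directional light, x, y, and z point toward the light; some engines store the opposite direction.

```python
import math

def normalize(v):
    """Scale a 3-component vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def surface_to_light(light, surface_point):
    """Unit vector from surface_point toward a light given as (x, y, z, w)."""
    x, y, z, w = light
    if w == 0:
        # Directional light: (x, y, z) is already a direction,
        # and it is the same for every point on every surface.
        return normalize((x, y, z))
    # Point light: divide by w to get its position, then subtract.
    lx, ly, lz = x / w, y / w, z / w
    sx, sy, sz = surface_point
    return normalize((lx - sx, ly - sy, lz - sz))

sun = (0.0, 1.0, 0.0, 0.0)   # w = 0: directional
lamp = (0.0, 5.0, 0.0, 1.0)  # w = 1: positioned at (0, 5, 0)

for spot in ((0.0, 0.0, 0.0), (3.0, 1.0, 0.0)):
    print(spot, surface_to_light(sun, spot), surface_to_light(lamp, spot))
```

The sun's direction stays the same for both surface points, while the lamp's direction changes as the surface point moves.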
Another good article to read about Cartesian and homogeneous coordinates - Homogeneous Coordinates.
Thank you for reading!