3D camera, perspective, and head rotation

As awesome as this is (ha ha), our app is kind of boring and not very Cardboard-like. Specifically, it's stereoscopic (dual views) and has lens distortion, but it's not yet a 3D perspective view and it doesn't move with your head. We're going to fix this now.

Welcome to the matrix

We can't talk about developing for virtual reality without talking about matrix mathematics for 3D computer graphics.

What is a matrix? The answer is out there, Neo, and it's looking for you, and it will find you if you want it to. That's right, it's time to learn about the matrix. Everything will be different now. Your perspective is about to change.

We're building a three-dimensional scene. Each location in space is described by the X, Y, and Z coordinates. Objects in the scene may be constructed from X, Y, and Z vertices. An object can be transformed by moving, scaling, and/or rotating its vertices. This transformation can be represented mathematically with a matrix of 16 floating point values (four rows of four floats each). How it works mathematically is cool, but we won't get into it here.

Matrices can be combined by multiplying them together. For example, if you have a matrix that represents how much to resize an object (scale) and another matrix to reposition it (translate), then you could make a third matrix, representing both the resizing and repositioning, by multiplying the two together. You can't just use the primitive * operator, though. Also, note that unlike simple scalar multiplication, matrix multiplication is not commutative. In other words, for scalars we know that a * b = b * a. However, for matrices A and B, AB ≠ BA in general! The Android Matrix class (android.opengl.Matrix) provides functions for doing matrix math. Here's an example:

// allocate the matrix arrays
float scale[] = new float[16];
float translate[] = new float[16];
float scaleAndTranslate[] = new float[16];

// initialize to Identity
Matrix.setIdentityM(scale, 0);
Matrix.setIdentityM(translate, 0);

// scale by 2, move by 5 in Z
Matrix.scaleM(scale, 0, 2.0f, 2.0f, 2.0f);
Matrix.translateM(translate, 0, 0.0f, 0.0f, 5.0f);

// combine them with a matrix multiply
Matrix.multiplyMM(scaleAndTranslate, 0, translate, 0, scale, 0);

Note that due to the way in which matrix multiplication works, multiplying a vector by the result matrix will have the same effect as first multiplying it by the scale matrix (right-hand side), and then multiplying it by the translate matrix (left-hand side). This is the opposite of what you might expect.
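If you'd like to see the ordering effect without an Android device, here is a minimal plain-Java sketch. It assumes the same column-major layout of 16 floats that android.opengl.Matrix uses, but the class MatrixOrderDemo and its helpers (multiplyMM, multiplyMV, scaling, translation) are our own stand-ins written for illustration, not the Android API itself.

```java
import java.util.Arrays;

// Plain-Java stand-in for the column-major math in android.opengl.Matrix.
// Class and helper names here are illustrative, not part of the Android API.
class MatrixOrderDemo {

    // Returns lhs x rhs for two column-major 4x4 matrices.
    static float[] multiplyMM(float[] lhs, float[] rhs) {
        float[] result = new float[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += lhs[k * 4 + row] * rhs[col * 4 + k];
                }
                result[col * 4 + row] = sum;
            }
        }
        return result;
    }

    // Returns m x v for a column-major 4x4 matrix and a 4-component vector.
    static float[] multiplyMV(float[] m, float[] v) {
        float[] result = new float[4];
        for (int row = 0; row < 4; row++) {
            for (int k = 0; k < 4; k++) {
                result[row] += m[k * 4 + row] * v[k];
            }
        }
        return result;
    }

    // Uniform scale matrix.
    static float[] scaling(float s) {
        float[] m = new float[16];
        m[0] = s; m[5] = s; m[10] = s; m[15] = 1f;
        return m;
    }

    // Translation matrix: identity with the offset in the last column.
    static float[] translation(float x, float y, float z) {
        float[] m = new float[16];
        m[0] = 1f; m[5] = 1f; m[10] = 1f; m[15] = 1f;
        m[12] = x; m[13] = y; m[14] = z;
        return m;
    }

    public static void main(String[] args) {
        float[] point = {1f, 1f, 1f, 1f};
        float[] scale = scaling(2f);
        float[] translate = translation(0f, 0f, 5f);

        // translate x scale: the point is scaled first, then moved
        float[] scaleThenMove = multiplyMV(multiplyMM(translate, scale), point);
        // scale x translate: the point is moved first, then scaled
        float[] moveThenScale = multiplyMV(multiplyMM(scale, translate), point);

        System.out.println(Arrays.toString(scaleThenMove)); // [2.0, 2.0, 7.0, 1.0]
        System.out.println(Arrays.toString(moveThenScale)); // [2.0, 2.0, 12.0, 1.0]
    }
}
```

Swapping the two operands moves the point to a different Z, which is the non-commutativity in action: the right-hand matrix always takes effect first.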

Note

The documentation of the Matrix API can be found at http://developer.android.com/reference/android/opengl/Matrix.html.

This matrix stuff will be used a lot. Something that is worth mentioning here is precision loss. You might get a "drift" from the actual values if you repeatedly scale and translate that combined matrix because floating point calculations lose information due to rounding. It's not just a problem for computer graphics but also for banks and Bitcoin mining! (Remember the movie Office Space?)
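To get a feel for the drift, here is a small plain-Java sketch. It doesn't use matrices at all; it simply assumes an object that is nudged by 0.1 along Z once per frame, and compares that with recomputing the position from the frame count. The class and method names (FloatDriftDemo, accumulate, recompute) are made up for this illustration.

```java
// Illustrative only: shows rounding drift when a small float step is
// applied over and over, as an animated translation might be each frame.
class FloatDriftDemo {

    // Adds the step once per frame, so every addition rounds the running total.
    static float accumulate(float step, int frames) {
        float z = 0f;
        for (int i = 0; i < frames; i++) {
            z += step;
        }
        return z;
    }

    // Recomputes the position from the frame count, with a single rounding.
    static float recompute(float step, int frames) {
        return step * frames;
    }

    public static void main(String[] args) {
        int frames = 1_000_000;
        System.out.println("accumulated: " + accumulate(0.1f, frames));
        System.out.println("recomputed:  " + recompute(0.1f, frames));
    }
}
```

After a million frames, the accumulated value ends up far from the expected 100000, while the recomputed one stays close to it. One common way to limit this kind of drift, where it matters, is to rebuild a transform from its source values (position, angle, scale) each frame rather than repeatedly modifying the previous frame's matrix.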

One fundamental use of this matrix math, which we need immediately, is to transform a scene into a screen image (projection) as viewed from the user's perspective.

In a Cardboard VR app, to render the scene from a particular viewpoint, we think of a camera that is looking in a specific direction. The camera has X, Y, and Z positions like any other object and is rotated to its view direction. In VR, when you turn your head, the Cardboard SDK reads the motion sensors in your phone, determines the current head pose (the view direction and angles), and gives your app the corresponding transformation matrix.

In fact, in VR for each frame, we render two slightly different perspective views: one for each eye, offset by the actual distance between one's eyes (the interpupillary distance).

Also, in VR, we want to render the scene using a perspective projection (versus isometric) so that objects closer to you appear larger than the ones further away. This can be represented with a 4 x 4 matrix as well.
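As a rough illustration of how such a matrix makes nearby objects look bigger, the following plain-Java sketch builds a symmetric perspective matrix from a vertical field of view, much like the general-purpose Matrix.perspectiveM function does. Keep in mind that this is an assumption for demonstration purposes: in our app, the Cardboard SDK supplies the per-eye projection for us, and the names PerspectiveDemo, perspective, and project are our own.

```java
// Illustrative sketch of a symmetric perspective projection (column-major).
// The Cardboard SDK provides the real per-eye matrix; this is a stand-in.
class PerspectiveDemo {

    // Builds a perspective matrix from a vertical field of view in degrees.
    static float[] perspective(float fovyDegrees, float aspect, float zNear, float zFar) {
        float f = 1.0f / (float) Math.tan(Math.toRadians(fovyDegrees) / 2.0);
        float rangeReciprocal = 1.0f / (zNear - zFar);
        float[] m = new float[16];
        m[0] = f / aspect;
        m[5] = f;
        m[10] = (zFar + zNear) * rangeReciprocal;
        m[11] = -1.0f; // copies -z into w, which drives the perspective divide
        m[14] = 2.0f * zFar * zNear * rangeReciprocal;
        return m;
    }

    // Transforms a point and divides by w, giving normalized device coordinates.
    static float[] project(float[] m, float x, float y, float z) {
        float[] v = {x, y, z, 1f};
        float[] clip = new float[4];
        for (int row = 0; row < 4; row++) {
            for (int k = 0; k < 4; k++) {
                clip[row] += m[k * 4 + row] * v[k];
            }
        }
        return new float[] {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]};
    }

    public static void main(String[] args) {
        float[] m = perspective(90f, 1f, 0.1f, 100f);
        // Same X offset from the view axis, at two different distances
        float[] near = project(m, 1f, 0f, -2f);
        float[] far = project(m, 1f, 0f, -10f);
        System.out.println("near x on screen: " + near[0]);
        System.out.println("far x on screen:  " + far[0]);
    }
}
```

Both points sit one unit to the right of the view axis, yet the one closer to the camera lands further from the center of the screen. That division by distance is what an isometric projection leaves out.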

We can combine each of these transformations by multiplying them together to get a modelViewProjection matrix:

modelViewProjection = modelTransform × camera × eyeView × perspectiveProjection

A complete modelViewProjection (MVP) transformation matrix is a combination of any model transforms (for example, scaling or positioning the model in the scene) with the camera eye view and perspective projection.

When OpenGL goes to draw an object, the vertex shader can use this modelViewProjection matrix to render the geometry. The whole scene gets drawn from the user's viewpoint, in the direction their head is pointing, with a perspective projection for each eye so that it appears stereoscopic through the Cardboard viewer. VR MVP FTW!

The MVP vertex shader

The super simple vertex shader that we wrote earlier doesn't transform each vertex; it just passes it through to the next step in the pipeline. Now, we want it to be 3D-aware and use our modelViewProjection (MVP) transformation matrix. Create a shader to handle it.

In the hierarchy view, right-click on the app/res/raw folder, go to New | File, enter the name, mvp_vertex.shader, and click on OK. Write the following code:

uniform mat4 u_MVP;
attribute vec4 a_Position;
void main() {
   gl_Position = u_MVP * a_Position;
}

This shader is almost the same as simple_vertex but transforms each vertex by the u_MVP matrix. (Note that while multiplying matrices and vectors with * does not work in Java, it does work in the shader code!)

Replace the shader resource in the compileShaders function to use R.raw.mvp_vertex instead:

simpleVertexShader = loadShader(GLES20.GL_VERTEX_SHADER, R.raw.mvp_vertex);

Setting up the perspective viewing matrices

To add the camera and view to our scene, we define a few variables. In the MainActivity.java file, add the following code to the beginning of the MainActivity class:

// Viewing variables
private static final float Z_NEAR = 0.1f;
private static final float Z_FAR = 100.0f;
private static final float CAMERA_Z = 0.01f;

private float[] camera;
private float[] view;
private float[] modelViewProjection;

// Rendering variables
private int triMVPMatrixParam;

The Z_NEAR and Z_FAR constants define the depth planes used later to calculate the perspective projection for the camera eye. CAMERA_Z will be the position of the camera (for example, at X=0.0, Y=0.0, and Z=0.01).

The triMVPMatrixParam variable will hold the handle to the u_MVP uniform, which we use to pass the modelViewProjection matrix to our improved shader.

The camera, view, and modelViewProjection matrices will be 4 x 4 matrices (arrays of 16 floats) used for perspective calculations.

In onCreate, we initialize the camera, view, and modelViewProjection matrices:

    protected void onCreate(Bundle savedInstanceState) {
        //...

        camera = new float[16];
        view = new float[16];
        modelViewProjection = new float[16];
    }

In prepareRenderingTriangle, we initialize the triMVPMatrixParam variable:

// get handle to shape's transformation matrix
triMVPMatrixParam = GLES20.glGetUniformLocation(triProgram, "u_MVP");

Tip

The default camera in OpenGL is at the origin (0,0,0) and looks down the negative Z axis. In other words, objects in front of the camera face toward the positive Z axis, back at the camera. To place them in front of the camera, give them a position with some negative Z value.

There is a longstanding (and pointless) debate in the 3D graphics world about which axis is up. We can somehow all agree that the X axis goes left and right, but does the Y axis go up and down, or is it Z? Plenty of software picks Z as the up-and-down direction, and defines Y as pointing in and out of the screen. On the other hand, the Cardboard SDK, Unity, Maya, and many others choose the reverse. If you think of the coordinate plane as drawn on graph paper, it all depends on where you put the paper. If you draw the graph on a whiteboard, or hold the paper upright in front of you, then Y is the vertical axis. If the graph paper is lying flat on the table in front of you, then the missing Z axis is vertical, pointing up and down. In any case, the Cardboard SDK, and therefore the projects in this book, treat Z as the forward and backward axis.

Render in perspective

With things set up, we can now handle redrawing the screen for each frame.

First, set the camera position. It can be defined once, for example in onCreate. But, often in a VR application, the camera position in the scene can change, so we'll reset it for each frame.

The first thing to do is reset the camera matrix at the start of a new frame to a generic front-facing direction. Define the onNewFrame method, as follows:

    @Override
    public void onNewFrame(HeadTransform headTransform) {
        // Build the camera matrix and apply it to the ModelView.
        Matrix.setLookAtM(camera, 0, 0.0f, 0.0f, CAMERA_Z, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f);
    }

Tip

Note, as you write Matrix, Android Studio will want to auto-import the package. Ensure that the import you choose is android.opengl.Matrix, and not some other matrix library, such as android.graphics.Matrix.
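If you're curious what the setLookAtM call actually produces for our camera, here is a rough plain-Java sketch of a look-at calculation. It follows the usual construction (a forward vector, a side vector, and a corrected up vector, followed by a translation to the eye position), and we're assuming the same column-major layout as android.opengl.Matrix. The names LookAtDemo, lookAt, and transformPoint are invented for this illustration.

```java
// Illustrative look-at sketch (column-major), not the Android implementation.
class LookAtDemo {

    static float[] lookAt(float eyeX, float eyeY, float eyeZ,
                          float centerX, float centerY, float centerZ,
                          float upX, float upY, float upZ) {
        // Forward vector f: from the eye toward the point being looked at
        float fx = centerX - eyeX, fy = centerY - eyeY, fz = centerZ - eyeZ;
        float fLen = (float) Math.sqrt(fx * fx + fy * fy + fz * fz);
        fx /= fLen; fy /= fLen; fz /= fLen;

        // Side vector s = f x up, normalized
        float sx = fy * upZ - fz * upY;
        float sy = fz * upX - fx * upZ;
        float sz = fx * upY - fy * upX;
        float sLen = (float) Math.sqrt(sx * sx + sy * sy + sz * sz);
        sx /= sLen; sy /= sLen; sz /= sLen;

        // Corrected up vector u = s x f
        float ux = sy * fz - sz * fy;
        float uy = sz * fx - sx * fz;
        float uz = sx * fy - sy * fx;

        float[] m = new float[16];
        // Rotation rows are s, u, and -f
        m[0] = sx;  m[4] = sy;  m[8] = sz;
        m[1] = ux;  m[5] = uy;  m[9] = uz;
        m[2] = -fx; m[6] = -fy; m[10] = -fz;
        // Translation that moves the eye position to the origin
        m[12] = -(sx * eyeX + sy * eyeY + sz * eyeZ);
        m[13] = -(ux * eyeX + uy * eyeY + uz * eyeZ);
        m[14] = fx * eyeX + fy * eyeY + fz * eyeZ;
        m[15] = 1f;
        return m;
    }

    // Applies the matrix to a point (w = 1) and returns x, y, z.
    static float[] transformPoint(float[] m, float x, float y, float z) {
        return new float[] {
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14]
        };
    }

    public static void main(String[] args) {
        // Same arguments as our onNewFrame call, with CAMERA_Z = 0.01f
        float[] camera = lookAt(0f, 0f, 0.01f, 0f, 0f, 0f, 0f, 1f, 0f);
        float[] p = transformPoint(camera, 0f, 0f, -5f);
        System.out.println("view-space z: " + p[2]);
    }
}
```

With the eye at Z=0.01 looking at the origin and Y as up, there is no rotation to speak of; the matrix just shifts everything by 0.01 along negative Z, so a point five units in front of the origin ends up about 5.01 units in front of the camera.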

Now, when it's time to draw the scene from the viewpoint of each eye, we calculate the perspective view matrix. Modify onDrawEye as follows:

    public void onDrawEye(Eye eye) {
        GLES20.glEnable(GLES20.GL_DEPTH_TEST);
        GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);

        // Apply the eye transformation to the camera
        Matrix.multiplyMM(view, 0, eye.getEyeView(), 0, camera, 0);

        // Get the perspective transformation
        float[] perspective = eye.getPerspective(Z_NEAR, Z_FAR);

        // Apply perspective transformation to the view, and draw
        Matrix.multiplyMM(modelViewProjection, 0, perspective, 0, view, 0);

        drawTriangle();
    }

The first two lines that we added enable depth testing and clear the OpenGL color and depth buffers. When 3D scenes are rendered, in addition to the color of each pixel, OpenGL keeps track of the distance the object occupying that pixel is from the eye. If the same pixel is rendered for another object, the depth buffer will know whether it should be visible (closer) or ignored (further away). (Or, perhaps the colors get combined in some way, for example, transparency). We clear the buffer before rendering any geometry for each eye. The color buffer, which is the one you actually see on screen, is also cleared. Otherwise, pixels drawn in earlier frames would remain on screen, and as the view moves, the triangle would smear until the screen fills with a solid color.
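The depth test itself is simple enough to mimic in a few lines of plain Java. The following is only a toy model: it stores raw distances and one color per pixel, whereas real OpenGL stores normalized depth values and does this work on the GPU. The class DepthBufferDemo and its methods are hypothetical names for this sketch.

```java
import java.util.Arrays;

// Toy model of a depth-tested framebuffer; not how OpenGL is implemented.
class DepthBufferDemo {
    private final float[] depth;
    private final int[] color;

    DepthBufferDemo(int pixelCount, int clearColor) {
        depth = new float[pixelCount];
        color = new int[pixelCount];
        clear(clearColor);
    }

    // Like glClear with both buffer bits: reset colors and mark every
    // pixel as "nothing drawn yet" (infinitely far away).
    void clear(int clearColor) {
        Arrays.fill(depth, Float.POSITIVE_INFINITY);
        Arrays.fill(color, clearColor);
    }

    // Writes the fragment only if it is closer than what is already there.
    boolean drawFragment(int pixel, float distance, int rgb) {
        if (distance < depth[pixel]) {
            depth[pixel] = distance;
            color[pixel] = rgb;
            return true;
        }
        return false;
    }

    int colorAt(int pixel) {
        return color[pixel];
    }

    public static void main(String[] args) {
        DepthBufferDemo buffer = new DepthBufferDemo(4, 0x000000);
        buffer.drawFragment(0, 10f, 0xFF0000); // far red object
        buffer.drawFragment(0, 2f, 0x0000FF);  // nearer blue object wins
        buffer.drawFragment(0, 5f, 0x00FF00);  // green is behind blue, ignored
        System.out.println(Integer.toHexString(buffer.colorAt(0))); // ff
    }
}
```

Notice that the draw order doesn't matter for opaque objects; the nearest fragment wins either way, as long as the buffer was cleared before the frame began.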

Now, let's move on to the viewing transformations. onDrawEye receives the current Eye object, which describes the stereoscopic rendering details of the eye. In particular, the eye.getEyeView() method returns a transformation matrix that includes head tracking rotation, position shift, and interpupillary distance shift. In other words, it describes where the eye is located in the scene and what direction it's looking. Though Cardboard does not offer positional tracking, the positions of the eyes do change in order to simulate a virtual head. Your eyes don't rotate on a central axis, but rather your head pivots around your neck, which is a certain distance from the eyes. As a result, when the Cardboard SDK detects a change in orientation, the two virtual cameras move around the scene as though they were actual eyes in an actual head.

We need a transformation that represents the perspective view of the camera at this eye's position. As mentioned earlier, this is calculated as follows:

modelViewProjection = modelTransform × camera × eyeView × perspectiveProjection

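We multiply the camera by the eye view transform (getEyeView), then multiply the result by the perspective projection transform (getPerspective). Note that in each multiplyMM call, the transform that is applied first is passed as the right-hand operand, so the operands appear in the reverse of the order in which they take effect. Presently, we do not transform the triangle model itself and leave the modelTransform matrix out.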
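To convince yourself that one combined matrix really does the work of the whole chain, here is a plain-Java sketch that transforms a vertex both ways: once with the combined matrix built in the same operand order as our onDrawEye code, and once by applying the camera, eye view, and perspective transforms one after another. Everything in it is an assumption for illustration: the class MvpChainDemo, the helper names, the 0.03 sideways eye shift, and the fixed 90-degree projection are stand-ins, not values from the Cardboard SDK.

```java
// Illustrative check that perspective x (eyeView x camera) applied to a
// vertex matches applying camera, then eyeView, then perspective in turn.
class MvpChainDemo {

    // Returns lhs x rhs for two column-major 4x4 matrices.
    static float[] multiplyMM(float[] lhs, float[] rhs) {
        float[] result = new float[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                for (int k = 0; k < 4; k++) {
                    result[col * 4 + row] += lhs[k * 4 + row] * rhs[col * 4 + k];
                }
            }
        }
        return result;
    }

    // Returns m x v for a column-major 4x4 matrix and a 4-component vector.
    static float[] multiplyMV(float[] m, float[] v) {
        float[] result = new float[4];
        for (int row = 0; row < 4; row++) {
            for (int k = 0; k < 4; k++) {
                result[row] += m[k * 4 + row] * v[k];
            }
        }
        return result;
    }

    static float[] translation(float x, float y, float z) {
        float[] m = new float[16];
        m[0] = 1f; m[5] = 1f; m[10] = 1f; m[15] = 1f;
        m[12] = x; m[13] = y; m[14] = z;
        return m;
    }

    // Simplified projection: 90-degree field of view, square aspect ratio.
    static float[] simplePerspective(float zNear, float zFar) {
        float[] m = new float[16];
        float rangeReciprocal = 1f / (zNear - zFar);
        m[0] = 1f;
        m[5] = 1f;
        m[10] = (zFar + zNear) * rangeReciprocal;
        m[11] = -1f;
        m[14] = 2f * zFar * zNear * rangeReciprocal;
        return m;
    }

    // Same operand order as onDrawEye: view = eyeView x camera, then
    // modelViewProjection = perspective x view.
    static float[] combined(float[] camera, float[] eyeView, float[] perspective, float[] vertex) {
        float[] view = multiplyMM(eyeView, camera);
        float[] mvp = multiplyMM(perspective, view);
        return multiplyMV(mvp, vertex);
    }

    // Applies each transform to the vertex one at a time.
    static float[] stepByStep(float[] camera, float[] eyeView, float[] perspective, float[] vertex) {
        float[] afterCamera = multiplyMV(camera, vertex);
        float[] afterEye = multiplyMV(eyeView, afterCamera);
        return multiplyMV(perspective, afterEye);
    }

    public static void main(String[] args) {
        float[] camera = translation(0f, 0f, -0.01f);  // camera at Z = 0.01
        float[] eyeView = translation(0.03f, 0f, 0f);  // made-up eye offset
        float[] perspective = simplePerspective(0.1f, 100f);
        float[] vertex = {0f, 0.5f, -1f, 1f};

        float[] a = combined(camera, eyeView, perspective, vertex);
        float[] b = stepByStep(camera, eyeView, perspective, vertex);
        System.out.println(java.util.Arrays.toString(a));
        System.out.println(java.util.Arrays.toString(b));
    }
}
```

The two results agree (up to floating point rounding), which is why we can do the matrix multiplications once per eye on the CPU and let the shader perform a single multiply per vertex.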

The result (modelViewProjection) is passed to OpenGL to be used by the shaders in the rendering pipeline (via glUniformMatrix4fv). Then, we draw our stuff (via glDrawArrays as written earlier).

Now, we need to pass the modelViewProjection matrix to the shader program. In the drawTriangle method, add it as follows:

    private void drawTriangle() {
        // Add program to OpenGL ES environment
        GLES20.glUseProgram(triProgram);

        // Pass the MVP transformation to the shader
        GLES20.glUniformMatrix4fv(triMVPMatrixParam, 1, false, modelViewProjection, 0);

        // . . .

Building and running

Let's build and run it. Go to Run | Run 'app', or simply use the green triangle Run icon on the toolbar. Now, moving the phone will change the display in sync with your view direction. Insert the phone in a Google Cardboard viewer and it's like VR (kinda sorta).

Note that if your phone is lying flat on the table when the app starts, the camera in our scene will be facing straight down rather than forward at our triangle. What's worse, when you pick up the phone, the neutral direction may not be facing straight in front of you. So, each time you run apps in this book, pick up the phone first, so you look forward in VR, or keep the phone propped up in position (personally, I use a Gekkopod, which is available at http://gekkopod.com/).

Also, in general, make sure that your phone is not set to Lock Portrait in the Settings dialog box.