Hardware instancing is an interesting topic with quite a few uses. The idea is to have the hardware generate the transformed vertices for many copies of the same geometry, which reduces draw call overhead. Now I know that sounded like a bunch of senseless technical jargon, and it was. In plain English, sometimes it is faster to let the GPU do all the work for us so the CPU can go off and do other work, and this is what hardware instancing does. This article covers how to set up a simple program that uses instancing to render a few rotating squares.
Hardware instancing works by sending two vertex streams through the GPU. The first stream holds the base set of vertices, in our case the four vertices of a square. The second stream carries the per-instance data, in our case an array of Vector4s holding position offsets. To the shader, the two streams look like one. The GPU reads the first stream once per instance and pairs it with the matching element of the second stream. Our shader is a very simple transform plus texture. The part that does the instancing is where we simply add the instance data to the position.
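As a rough mental model, the sketch below does on the CPU what the GPU does when it merges the two streams: it walks the instance stream once and re-reads the vertex stream for each entry. The `InstancingSketch` class and its `Expand` helper are hypothetical names made up for illustration, not part of XNA, and the sketch leaves out the matrix transform that the real shader applies.

```csharp
using System;

// CPU-side illustration only; on real hardware the GPU does this expansion.
public static class InstancingSketch
{
    // Hypothetical helper: pairs every vertex of stream 0 with each
    // per-instance offset of stream 1. Points are { x, y, z } triples.
    public static float[][] Expand(float[][] vertices, float[][] offsets)
    {
        var result = new float[vertices.Length * offsets.Length][];
        int next = 0;
        foreach (float[] offset in offsets)      // one pass per instance
        {
            foreach (float[] v in vertices)      // stream 0 is re-read each pass
            {
                result[next++] = new[] { v[0] + offset[0], v[1] + offset[1], v[2] + offset[2] };
            }
        }
        return result;
    }

    public static void Main()
    {
        float[][] square =
        {
            new[] { -0.25f,  0.25f, 0f },
            new[] {  0.25f,  0.25f, 0f },
            new[] {  0.25f, -0.25f, 0f },
            new[] { -0.25f, -0.25f, 0f },
        };
        float[][] offsets =
        {
            new[] { -0.75f, 0f, 0f },
            new[] { -0.5f,  0f, 0f },
        };

        // Two instances of a four-vertex square yield eight positions.
        foreach (float[] p in Expand(square, offsets))
        {
            Console.WriteLine(p[0] + ", " + p[1] + ", " + p[2]);
        }
    }
}
```

The point of the real technique is that this nested loop never runs on the CPU: the application uploads four vertices and a handful of offsets, and the hardware produces the combinations.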
```hlsl
// Copyright (c) Microsoft Corporation. All rights reserved.

float4x4 ViewProjection; // View * Projection matrix

texture Tex0;
sampler Tex0Sampler = sampler_state
{
    Texture = <Tex0>;
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Linear;
};

//application to vertex structure
struct a2v
{
    float4 position : POSITION0;
    float2 tex0 : TEXCOORD0;
    float4 instanceData : COLOR0;
};

//vertex to pixel shader structure
struct v2p
{
    float4 position : POSITION0;
    float2 tex0 : TEXCOORD0;
};

//pixel shader to screen
struct p2f
{
    float4 color : COLOR0;
};

void ps( in v2p IN, out p2f OUT)
{
    OUT.color = tex2D(Tex0Sampler, IN.tex0);
};

void vs( in a2v IN, out v2p OUT )
{
    float4 data = IN.instanceData;
    float4 instancePos = mul(IN.position, ViewProjection);

    instancePos.x += data.x;
    instancePos.y += data.y;
    instancePos.z += data.z;

    OUT.position = instancePos;
    OUT.tex0 = IN.tex0;
};

technique HardwareInstancing
{
    pass P0
    {
        VertexShader = compile vs_2_0 vs();
        PixelShader = compile ps_2_0 ps();
    }
}
```
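Because the shader adds the instance offset after the matrix multiply, each square should spin around its own center rather than around the origin. The sketch below mirrors that order of operations on the CPU for a plain Z rotation, which is the only matrix this sample passes in. `ShaderMathSketch` and `TransformCorner` are hypothetical names used for illustration, and the rotation formula assumes the row-vector convention that XNA matrices use.

```csharp
using System;

public static class ShaderMathSketch
{
    // Mirrors the order used in vs(): rotate the corner first, then add the
    // instance offset. Returns the resulting { x, y } pair.
    public static float[] TransformCorner(float x, float y, float angle,
                                          float offsetX, float offsetY)
    {
        float cos = (float)Math.Cos(angle);
        float sin = (float)Math.Sin(angle);

        // Z rotation of a row vector (assumed to match Matrix.CreateRotationZ).
        float rotatedX = x * cos - y * sin;
        float rotatedY = x * sin + y * cos;

        return new[] { rotatedX + offsetX, rotatedY + offsetY };
    }

    public static void Main()
    {
        // The center of the square (0, 0) lands on the offset at any angle,
        // so the offset acts as the pivot of the rotation.
        float[] center = TransformCorner(0f, 0f, 1.3f, 0.5f, 0f);
        Console.WriteLine(center[0] + ", " + center[1]);

        // A corner orbits that pivot as the angle changes.
        float[] corner = TransformCorner(0.25f, 0.25f, (float)(Math.PI / 2), 0.5f, 0f);
        Console.WriteLine(corner[0] + ", " + corner[1]);
    }
}
```

Swapping the two steps (offset first, then rotate) would instead swing the whole row of squares around the origin.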
Now that we have a shader for the application, we can write the code that actually instances some data. We will need to set up a few variables. There is nothing too unusual yet except for the addition of a second vertex buffer to hold the instance data.
```csharp
// Gives the squares some color
private Texture2D texture;

// Provides the instancing effect
private Effect instancingShader;

// Allows us to rotate the squares
private float rotation = 0.0f;
private Matrix rotationMatrix;

// Holds the vertex of one square
private VertexBuffer vertexBuffer;

// Holds the instance data
private VertexBuffer instanceBuffer;

// Holds indices for rendering a square
private IndexBuffer indexBuffer;

// A valid vertex declaration so the device knows what to do
private VertexDeclaration vertexDeclaration;

// The instance data itself
private Vector4[] instanceData;

// The vertices for a square
private VertexPositionTexture[] vertexData;

// The indices
private short[] indexData;
```
We also need to declare our own array of VertexElements so that the GPU knows exactly what to do with the data. Note that the first parameter of each element is the stream index, and that the last element is a Vector4 on the second stream (stream 1). This is where the instancing data will go.
```csharp
// Tells the GPU how to format the streams
private VertexElement[] Elements = new VertexElement[]
{
    new VertexElement(0, 0, VertexElementFormat.Vector3,
        VertexElementMethod.Default,
        VertexElementUsage.Position, 0),

    new VertexElement(0, sizeof(float)*3, VertexElementFormat.Vector2,
        VertexElementMethod.Default,
        VertexElementUsage.TextureCoordinate, 0),

    new VertexElement(1, 0, VertexElementFormat.Vector4,
        VertexElementMethod.Default,
        VertexElementUsage.Color, 0),
};
```
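It can help to check by hand how many bytes each stream advances per element, since those strides are what the device needs later when the streams are bound. The sketch below recomputes them from a plain-integer copy of the layout above. `StrideSketch` and `StrideOf` are hypothetical helpers written for this illustration and do not touch the XNA types.

```csharp
using System;

public static class StrideSketch
{
    // Each row is { stream, byteOffset, floatCount }, mirroring the
    // Elements array: Vector3 + Vector2 on stream 0, Vector4 on stream 1.
    public static readonly int[][] Layout =
    {
        new[] { 0, 0, 3 },
        new[] { 0, sizeof(float) * 3, 2 },
        new[] { 1, 0, 4 },
    };

    // The stride of a stream is the end of its furthest element.
    public static int StrideOf(int stream, int[][] layout)
    {
        int stride = 0;
        foreach (int[] element in layout)
        {
            if (element[0] == stream)
            {
                stride = Math.Max(stride, element[1] + element[2] * sizeof(float));
            }
        }
        return stride;
    }

    public static void Main()
    {
        Console.WriteLine(StrideOf(0, Layout)); // prints 20 (12 bytes position + 8 bytes texcoord)
        Console.WriteLine(StrideOf(1, Layout)); // prints 16 (one Vector4)
    }
}
```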
The next thing to do is to start creating the data. Below we load the shader, the texture and the vertices of the square.
```csharp
instancingShader = content.Load<Effect> ( @"Effects\Instancing" );
texture = content.Load<Texture2D> ( @"Textures\Texture" );

vertexData = new VertexPositionTexture[4];

vertexData[0] = new VertexPositionTexture ();
vertexData[0].Position = new Vector3 ( -0.25f, 0.25f, 0 );
vertexData[0].TextureCoordinate = new Vector2 ( 0, 0 );

vertexData[1] = new VertexPositionTexture ();
vertexData[1].Position = new Vector3 ( 0.25f, 0.25f, 0 );
vertexData[1].TextureCoordinate = new Vector2 ( 1, 0 );

vertexData[2] = new VertexPositionTexture ();
vertexData[2].Position = new Vector3 ( 0.25f, -0.25f, 0 );
vertexData[2].TextureCoordinate = new Vector2 ( 1, 1 );

vertexData[3] = new VertexPositionTexture ();
vertexData[3].Position = new Vector3 ( -0.25f, -0.25f, 0 );
vertexData[3].TextureCoordinate = new Vector2 ( 0, 1 );
```
After that is done, the Vertex and Index buffers can be created.
```csharp
// Create the index array
indexData = new short[6] { 0, 1, 2, 2, 3, 0 };

// Create the vertex buffer
vertexBuffer = new VertexBuffer ( graphics.GraphicsDevice,
    vertexData.Length * VertexPositionTexture.SizeInBytes,
    ResourceUsage.None );
vertexBuffer.SetData<VertexPositionTexture> ( vertexData );

// Create the index buffer
indexBuffer = new IndexBuffer ( graphics.GraphicsDevice,
    typeof ( short ), indexData.Length,
    ResourceUsage.None, ResourceManagementMode.Automatic );
indexBuffer.SetData<short> ( indexData );
```
And then the instance data itself. Note that these are simple translation values that will be applied to each of the four original vertices.
```csharp
// Create the instance data with some new positions
instanceData = new Vector4[]
{
    new Vector4 ( -0.75f, 0, 0, 0 ) ,
    new Vector4 ( -0.5f, 0, 0, 0 ) ,
    new Vector4 ( -0.25f, 0, 0, 0 ) ,
    new Vector4 ( 0, 0, 0, 0 ),
    new Vector4 ( 0.25f, 0, 0, 0 ) ,
    new Vector4 ( 0.5f, 0, 0, 0 ) ,
    new Vector4 ( 0.75f, 0, 0, 0 ) ,
    new Vector4 ( 1f, 0, 0, 0 ) ,
    new Vector4 ( 1.25f, 0, 0, 0 ) ,
    new Vector4 ( 1.5f, 0, 0, 0 ) ,
    new Vector4 ( 1.75f, 0, 0, 0 )
};

// Create the instance buffer itself
instanceBuffer = new VertexBuffer ( graphics.GraphicsDevice,
    typeof ( Vector4 ), instanceData.Length,
    ResourceUsage.None, ResourceManagementMode.Automatic );
instanceBuffer.SetData<Vector4> ( instanceData );

// Create the rotation matrix
rotationMatrix = Matrix.CreateRotationZ ( rotation );
```
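The eleven offsets above are evenly spaced, so for a larger number of instances they could be generated in a loop instead of typed out. Here is a minimal sketch of that idea using plain floats. `OffsetSketch` and `MakeOffsets` are hypothetical names, and in the real program each value would still need to be wrapped in a `Vector4` as its X component before being copied into the instance buffer.

```csharp
using System;

public static class OffsetSketch
{
    // Hypothetical helper: returns `count` evenly spaced X offsets.
    public static float[] MakeOffsets(int count, float start, float step)
    {
        var offsets = new float[count];
        for (int i = 0; i < count; i++)
        {
            offsets[i] = start + i * step;
        }
        return offsets;
    }

    public static void Main()
    {
        // Same spacing as the hand-written table: -0.75 up to 1.75 in steps of 0.25.
        foreach (float x in MakeOffsets(11, -0.75f, 0.25f))
        {
            Console.WriteLine(x);
        }
    }
}
```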
Last of the initialization code but certainly not the least is the vertex declaration. This is a simple instantiation that uses our own set of vertex elements.
```csharp
vertexDeclaration = new VertexDeclaration ( graphics.GraphicsDevice, Elements );
```
Finally we are ready to do some drawing! My friends over at MDX Info have been kind enough to provide an experimental hack for XNA that allows people with ATI/SM2.0 cards to do hardware instancing. You can obtain this DLL here or through the tutorial’s sample code. Normally hardware instancing is a Shader Model 3 only technique, but this “hack” allows us to run it on older hardware. Note that in order to use this hack, you will need to turn off the LoaderLock exception. You can do this in C# Express by selecting “Debug -> Exceptions -> Managed Debugging Assistants” and then unchecking LoaderLock.
Several things happen in the draw function below. First the device is cleared and the hack is performed. Then the squares are rotated and the shader parameters are set. The shader is started and the graphics device’s properties are set. We set the frequency of the index data to the number of instances and the frequency of the instance data to 1, which tells the GPU how many copies to draw and to read one element of the instance stream per copy. We then perform a single draw call and end the shader.
```csharp
protected override void Draw (GameTime gameTime)
{
    // Clear the backbuffer
    graphics.GraphicsDevice.Clear ( Color.Black );

    // Perform the hack.
    XNAInfo.Experimental.MDXUtil.EnableATIInstancingHack ( graphics.GraphicsDevice );

    // Reset the rotation matrix
    rotation += 0.01f;
    rotationMatrix = Matrix.CreateRotationZ ( rotation );

    // Set the values of the shader
    instancingShader.Parameters["Tex0"].SetValue ( texture );
    instancingShader.Parameters["ViewProjection"].SetValue ( rotationMatrix );

    // Begin the shader
    instancingShader.Begin ( SaveStateMode.None );

    // Begin the first pass
    instancingShader.CurrentTechnique.Passes[0].Begin ();

    // Set the vertex declaration and index buffer
    graphics.GraphicsDevice.VertexDeclaration = vertexDeclaration;
    graphics.GraphicsDevice.Indices = indexBuffer;

    // Tell the GPU how many times to run through the vertex data
    // and set the stream.
    graphics.GraphicsDevice.Vertices[0].SetFrequencyOfIndexData ( instanceData.Length );
    graphics.GraphicsDevice.Vertices[0].SetSource ( vertexBuffer, 0,
        vertexDeclaration.GetVertexStrideSize ( 0 ) );

    // Tell the GPU how many times to run through the instance data
    // and set the stream.
    graphics.GraphicsDevice.Vertices[1].SetFrequencyOfInstanceData ( 1 );
    graphics.GraphicsDevice.Vertices[1].SetSource ( instanceBuffer, 0,
        System.Runtime.InteropServices.Marshal.SizeOf ( Vector4.Zero ) );

    // Draw the primitives with only one call!
    graphics.GraphicsDevice.DrawIndexedPrimitives ( PrimitiveType.TriangleList,
        0, 0, 4, 0, 2 );

    // End the pass
    instancingShader.CurrentTechnique.Passes[0].End ();

    // End the shader
    instancingShader.End ();

    // Draw the rest
    base.Draw ( gameTime );
}
```
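To see what that single call asks of the GPU, the sketch below lists, for every index the draw walks through, which vertex and which instance element it would be paired with when the index frequency is the instance count and the instance frequency is 1. This is a simplified CPU model under those assumptions, not a description of how any particular card schedules the work (real hardware may cache and reuse shaded vertices). `DrawSketch` and `Simulate` are hypothetical names.

```csharp
using System;

public static class DrawSketch
{
    // Returns one { instanceIndex, vertexIndex } pair per index processed.
    public static int[][] Simulate(short[] indices, int instanceCount)
    {
        var pairs = new int[indices.Length * instanceCount][];
        int next = 0;
        for (int instance = 0; instance < instanceCount; instance++)
        {
            // The whole index list is replayed for every instance, while the
            // instance stream advances by one element per replay.
            foreach (short index in indices)
            {
                pairs[next++] = new[] { instance, (int)index };
            }
        }
        return pairs;
    }

    public static void Main()
    {
        short[] indexData = { 0, 1, 2, 2, 3, 0 };
        int[][] pairs = Simulate(indexData, 11);

        // 6 indices per square and 11 squares: 66 pairs, i.e. 22 triangles.
        Console.WriteLine(pairs.Length);     // prints 66
        Console.WriteLine(pairs.Length / 3); // prints 22
    }
}
```

Without instancing, a straightforward loop would issue one `DrawIndexedPrimitives` call per square to cover the same triangles.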
And that is it for hardware instancing! Note that this is Windows only at the moment and will not work on the Xbox 360. Happy instancing! Sample code for this tutorial can be found below.