Friday, June 8, 2012

Fun With Optimization Part 1

I was working on the voxel landscape engine when I ran into a problem.  The engine is capable of a 8192x8192x128 size landscape, but if I tried to set the densities of a 4096x4096 area it would take about 5 minutes to create.  A 2048x2048 area would take about 20 seconds, so it seemed strange that it would take so much longer for something only 4 times larger.

At first I thought it might be the way I was setting the densities.  I had a function to set the density of a point on the voxel grid, which would check if it was outside the grid, if it was inside the grid it would check if a sector already exists, if not it would create one.  A sector is an 8x8x8 grid inside the larger grid, basically simple compression.  The larger grid is 1024x1024x16, if there's nothing inside then it's set to -1, otherwise it points to the sector which holds the densities.  Without this a 8192x8192x128 landscape would take up over 8 gigs of memory.

Back to the problem, I figured that the problem might be the way I was setting the densities, using a function with bounds checking and setting them in a way that wasn't very cache friendly.  I wrote a new set density function but it didn't help at all.  After adding some more debug text I realized the problem was with creating the sectors.

The method I'm using for sectors is an array of structs, and a separate bit field which keeps track of which sectors are active.  It's basically an array of unsigned integers where each bit represents whether a sector is active, 0 for inactive and 1 for active.  To find the first available sector I was using these two functions:

int voxel_getnextsector(void)
  {
  int sectornum;

  sectornum=0;
  while (sectornum<VOXEL_MAXNUMOFSECTORS)
    {
    if (!voxel_getsectoractive(sectornum))
      return(sectornum);
    sectornum++;
    }

  return(-1);
  }

int voxel_getsectoractive(int sectornum)
  {
  int count;
  int bitnum;

  count=sectornum>>5;
  bitnum=(sectornum&31);

  if (count>=VOXEL_MAXNUMOFSECTORS/32-1)
    return(0);

  return(((voxel_level.sectoractive[count]>>bitnum)&1));
  }

The first function loops through the maximum number of sectors until it finds an inactive sector, the second function will return whether the sector is currently active or not.  It didn't seem too slow, but I have space for over 1 million possible sectors.  As the sectors were filling up, these functions were getting slower and slower.  I tried inlining the second function since that was called quite a bit, but that didn't help much.  I needed to rewrite the first function.  Here's what it looks like:

int voxel_getnextsector(void)
  {
  int count;
  int bitnum;
  int sectornum;

  count=0;
  while (voxel_level.sectoractive[count]==0xFFFFFFFF && count<VOXEL_MAXNUMOFSECTORS/32-1)
    count++;

  if (count<VOXEL_MAXNUMOFSECTORS/32-1)
    {
    bitnum=0;
    while (((voxel_level.sectoractive[count]>>bitnum)&1)==1 && bitnum<31)
      bitnum++;

    sectornum=(count<<5)+bitnum;
    return(sectornum);
    }

  return(-1);
  }

Now instead of checking each bit individually with a function call, it checks a whole 32 bit int at a time, if all of the bits are set to 1 then it can skip to the next int.  This is at least 32 times faster, and ends up being more considering function call overhead and the extra operations needed for the original.  Now it takes a few seconds to set the densities, and I can get back to working on level of detail and all the other things I need to figure out for the voxel landscape.

No comments: