Using standalone CPU profilers

To simulate excess CPU usage, we will add a comment and a call to wasteCpuCycles() to the main.cc file of the preceding example project, directly before the existing return statement:

// waste time
wasteCpuCycles();
return app.exec();

The wasteCpuCycles() function will waste time with floating-point divisions in the following manner:

void wasteCpuCycles()
{
    size_t count = 10000000;
    double result = 0;
    for (size_t i = 0; i < count; ++i)
    {
        result += i / 2.33;
    }
    qDebug() << QString("Wasted %1 divisions, result=%2")
                    .arg(count).arg(result);
}
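Before reaching for a profiler, it can help to have a rough baseline of how long the loop takes on its own. The following is only a minimal sketch, assuming a Qt-free copy of the workload: the names wasteCpuCyclesPlain() and timeWasteMs() are hypothetical and not part of the example project, and the timing uses std::chrono::steady_clock from the standard library.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical Qt-free variant of wasteCpuCycles(): same loop, but the
// result is returned instead of logged, so the compiler cannot simply
// discard the computation.
double wasteCpuCyclesPlain(std::size_t count)
{
    double result = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        result += i / 2.33;
    }
    return result;
}

// Hypothetical helper: runs the workload once and returns the elapsed
// wall-clock time in milliseconds; the computed sum is stored in 'result'.
double timeWasteMs(std::size_t count, double& result)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    result = wasteCpuCyclesPlain(count);
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeWasteMs(10000000, result) gives a single wall-clock figure for the whole loop. A profiler complements this by showing where inside the program that time is spent.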

First, we will try out the Very Sleepy profiler. To install it, go to https://github.com/VerySleepy/verysleepy. At the time of writing, the last released version, v0.90, cannot read the debug information of newer MinGW compilers, so you will need the latest development build of the yet-unreleased 0.91 version. You can find it under the Download section of Readme.md as a link to an AppVeyor artifact (v0.90-154-g3220232 in my case). It contains an installer, so just start it and accept all defaults. After you start the profiler, you can either attach it to a running program from the list on the left-hand side or click File | Launch and select a program to start, as shown in the following screenshot:

You'll see that we have to specify the executable and its working directory; for the latter, just use the directory where the application is located. After clicking Launch, the application will be started under the control of the profiler, as seen in the following screenshot:

When you have all the required profiling data, click on the Stop button in the Very Sleepy dialog. The profiler will then collect the data and display it in a new window, as can be seen in the following diagram:

We see that Very Sleepy found our CPU-wasting function and shows it as the most CPU-intensive non-system function. In the lower panel, the source code is shown with each line annotated with its sampled time duration. We can see that most of the time was wasted in floating-point operations, and some on loop mechanics.
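Roughly speaking, a sampling profiler arrives at such figures by interrupting the program at regular intervals, noting which function is executing, and counting the hits. The following sketch illustrates only that counting step, under simplifying assumptions: each sample is reduced to a function name, and exclusivePercentages() is a hypothetical helper, not Very Sleepy's actual implementation or data format.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: given one function name per sample, returns the
// share of samples (in percent) that landed in each function.
std::map<std::string, double>
exclusivePercentages(const std::vector<std::string>& samples)
{
    std::map<std::string, double> percent;
    if (samples.empty())
    {
        return percent; // no samples, nothing to report
    }

    // Count how often each function was caught executing.
    std::map<std::string, std::size_t> hits;
    for (const auto& function : samples)
    {
        ++hits[function];
    }

    // Convert the hit counts into percentages of all samples.
    for (const auto& entry : hits)
    {
        percent[entry.first] = 100.0 * entry.second / samples.size();
    }
    return percent;
}
```

With three samples in wasteCpuCycles and one in another function, the helper attributes 75 percent of the samples to wasteCpuCycles. The same idea applied per source line gives the kind of annotations shown in the lower panel.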

Next, let's try out the CPU profiling support of the current CodeXL release. To install it, go to https://github.com/GPUOpen-Tools/CodeXL/releases/tag/v2.5 or the product site (https://gpuopen.com/compute-product/codexl/) and download the Windows installer (CodeXL_Win_2.5.67.exe). Install it on your computer and start the application.

To be able to profile the example application, you will have to create a new project by clicking on File | New Project and filling in the executable location. After that, you can start profiling by clicking the CPU Profile icon in the toolbar. When you think that you have all the required profiling data, click on the rectangular Stop icon next to the CPU Profile icon. CodeXL will then collect the data and display it in a new profiling session, which can be seen in the following screenshot:

We see that the CPU-wasting function is shown on the first row this time. This is because CodeXL will, by default, filter out system functions. Several other views organize the same data in different ways. However, CodeXL seemed to have a problem displaying the source code of the function and only displayed the disassembly in the Source/Disassembly tab. Nonetheless, the instructions were all annotated with sample rates, which gives us a nice insight into the workings of the generated assembly code.