Using functions in C++

The example in this chapter uses the techniques you have learned in this chapter to list all the files in a folder, and subfolders, in order of file size, giving a listing of the filenames and their sizes. The example is the equivalent of typing the following at the command line:

dir /b /s /os /a-d folder

Here, folder is the folder you are listing. The /s option recurses, /a-d removes folders from the list, and /os orders by size. The problem is that without the /b option we get information about each folder, but using it removes the file size in the list. We want a list of filenames (and their paths), their size, ordered by the smallest first.

Start by creating a new folder for this chapter (Chapter_05) under the Beginning_C++ folder. In Visual C++ create a new C++ source file and save it as files.cpp under this new folder. The example will use basic output and strings. It will take a single command line parameter; if more command-line parameters are passed, we just use the first one. Add the following to files.cpp:

    #include <iostream> 
    #include <string> 
    using namespace std; 

    int main(int argc, char* argv[]) 
    { 
        if (argc < 2) return 1; 
        return 0; 
    }

The example will use the Windows functions FindFirstFile and FindNextFile to get information about files that meet a file specification. These return data in a WIN32_FIND_DATAA structure, which holds the filename, the file size, and the file attributes. The functions also return information about folders, so we can test for subfolders and recurse. The WIN32_FIND_DATAA structure gives the file size as a 64-bit number in two parts: the upper and lower 32 bits. We will create our own structure to hold this information. At the top of the file, after the C++ include files, add the following:

    using namespace std; 

    #include <windows.h>
    struct file_size {   
        unsigned int high;   
        unsigned int low;
    };

The #include line brings in the Windows SDK header file so that you can access the Windows functions, and the structure is used to hold the information about a file's size. We want to compare files by their sizes. The WIN32_FIND_DATAA structure provides the size in two unsigned long members (one with the upper 4 bytes and the other with the lower 4 bytes). We could store this as a single 64-bit number, but instead, so that we have an excuse to write some operators, we store the size in our file_size structure. The example will print out file sizes and will compare file sizes, so we will write an operator to insert a file_size object into an output stream. Since we want to order the files by size, we also need an operator to determine whether one file_size object is greater than another.
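
To make the relationship between the two halves and the full 64-bit size concrete, here is a minimal sketch. The helpers to_uint64 and from_uint64 are hypothetical names used only for illustration (they are not part of the example), and the sketch assumes that unsigned int is 32 bits wide, as it is in Visual C++.

```cpp
#include <cstdint>

// Same layout as the chapter's file_size: two 32-bit halves.
struct file_size {
    unsigned int high;
    unsigned int low;
};

// Hypothetical helper: join the two halves into one 64-bit value.
std::uint64_t to_uint64(const file_size& fs)
{
    // Widen 'high' before shifting; shifting a 32-bit value by 32 bits
    // is undefined behavior.
    return (static_cast<std::uint64_t>(fs.high) << 32) | fs.low;
}

// Hypothetical helper: split a 64-bit value back into two halves.
file_size from_uint64(std::uint64_t size)
{
    file_size fs{};
    fs.high = static_cast<unsigned int>(size >> 32);
    fs.low  = static_cast<unsigned int>(size & 0xFFFFFFFFu);
    return fs;
}
```

A file of exactly 4 GB, for instance, would have high set to 1 and low set to 0.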

The code will use Windows functions to get information about the files, in particular their name and size. This information will be stored in a vector, so at the top of the file add the includes for <vector> and <tuple> after the include for <string>:

    #include <string> 
    #include <vector>
    #include <tuple>

The tuple class is needed so that we can store both a string (the filename) and a file_size object as each item in the vector. To make the code more readable add the following alias after the structure definition:

    using file_info = tuple<string, file_size>;
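
As a rough illustration of how such a tuple is built and read back, consider the following sketch. The helper names make_file_info, name_of, and size_of are hypothetical and exist only to show make_tuple and get in isolation; the example itself calls these Standard Library functions directly.

```cpp
#include <string>
#include <tuple>
#include <vector>

struct file_size {
    unsigned int high;
    unsigned int low;
};

using file_info = std::tuple<std::string, file_size>;

// Hypothetical helper: build one entry from a name and the two size halves.
file_info make_file_info(const std::string& name,
                         unsigned int high, unsigned int low)
{
    return std::make_tuple(name, file_size{high, low});
}

// Hypothetical helper: item 0 of the tuple is the filename.
const std::string& name_of(const file_info& info)
{
    return std::get<0>(info);
}

// Hypothetical helper: item 1 of the tuple is the file_size object.
const file_size& size_of(const file_info& info)
{
    return std::get<1>(info);
}
```

The index passed to get is a template parameter, so it must be a compile-time constant.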

Then just above the main function add the skeleton code for the function that will get the files in a folder:

    void files_in_folder( 
       const char *folderPath, vector<file_info>& files) 
    { 
    }

This function takes a reference to a vector and a folder path. The code will go through each item in the specified folder. If it is a file, it will store the details in the vector; otherwise, if the item is a folder it will call itself to get the files in that subfolder. Add a call to this function at the bottom of the main function:

    vector<file_info> files; 
    files_in_folder(argv[1], files);

The code has already checked that there is at least one command-line argument, and we use this as the folder to examine. The main function will print out the file information, so we declare a vector on the stack and pass it by reference to the files_in_folder function. This code does nothing so far, but you can compile it to make sure that there are no typos (remember to use the /EHsc switch).

Most of the work is carried out in the files_in_folder function. As a start, add the following code to this function:

    string folder(folderPath); 
    folder += "\\*"; 
    WIN32_FIND_DATAA findfiledata {}; 
    void* hFind = FindFirstFileA(folder.c_str(), &findfiledata); 

    if (hFind != INVALID_HANDLE_VALUE) 
    { 
       do 
       { 
       } while (FindNextFileA(hFind, &findfiledata)); 
       FindClose(hFind); 
    }

We will use the ANSI (narrow character) version of the functions, hence the trailing A on the structure and function names. The FindFirstFileA function takes a search path, and in this case, we use the name of a folder followed by a * wildcard, meaning everything in this folder. Notice that the Windows function wants a const char* parameter, so we use the c_str function on the string object.
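
The search pattern is ordinary string handling, so it can be sketched without any Windows calls. The helper below, make_search_pattern, is a hypothetical name and not part of the example; it assumes Windows-style paths and adds a separator only when the caller has not already supplied one.

```cpp
#include <string>

// Hypothetical helper: build the pattern that would be passed to
// FindFirstFileA. Assumes Windows-style paths.
std::string make_search_pattern(const char* folderPath)
{
    std::string pattern(folderPath);
    // Add a separator only if the path does not already end with one.
    if (!pattern.empty() && pattern.back() != '\\' && pattern.back() != '/')
        pattern += '\\';
    pattern += '*';   // wildcard: match every item in the folder
    return pattern;
}
```

Calling c_str on the returned string gives the const char* that the Windows function expects. The pointer is only valid while the string object is alive and unmodified.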

If the function call succeeds and it finds an item that meets this criterion, then the function fills in the WIN32_FIND_DATAA structure whose address is passed to it, and it also returns an opaque pointer which will be used to make subsequent calls on this search (you do not need to know what it points to). The code checks to see if the call was successful, and if so, it repeatedly calls FindNextFileA to get the next item until this function returns 0, indicating there are no more items. The opaque pointer is passed to FindNextFileA so that it knows which search is being continued. When the search is complete, the code calls FindClose to release whatever resources Windows allocated for the search.

The search will return both file and folder items; to handle each differently, we can test the dwFileAttributes member of the WIN32_FIND_DATAA structure. Add the following code in the do loop:

    string findItem(folderPath); 
    findItem += "\\"; 
    findItem += findfiledata.cFileName; 
    if ((findfiledata.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0) 
    { 
        // this is a folder so recurse 
    } 
    else 
    { 
        // this is a file so store information 
    }

The WIN32_FIND_DATAA structure contains just the name of the item relative to the folder being searched, so the first few lines prefix it with the folder path to create the full path. The following lines test to see if the item is a folder (directory) or a file. If the item is a file, then we simply add it to the vector passed to the function. Add the following to the else clause:

    file_size fs{}; 
    fs.high = findfiledata.nFileSizeHigh; 
    fs.low = findfiledata.nFileSizeLow; 
    files.push_back(make_tuple(findItem, fs));

The first three lines initialize a file_size structure with the size data, and the last line adds a tuple with the name of the file and its size to the vector. So that you can see the results of a simple call to this function, add the following to the bottom of the main function:

    for (const auto& file : files) 
    { 
        cout << setw(16) << get<1>(file) << " "  
            << get<0>(file) << endl; 
    }

This iterates through the items in the files vector. Each item is a tuple<string, file_size> object and to get the string item, you can use the Standard Library function, get, using 0 as the function template parameter, and to get the file_size object you call get with 1 as the function template parameter. The code calls the setw manipulator to make sure that the file sizes are always printed in a column 16 characters wide. To use this, you need to add an include for <iomanip> at the top of the file. Notice that get<1> will return a file_size object and this is inserted into cout. As it stands, this code will not compile because there is no operator to do this. We need to write one.

After the definition of the structure, add the following code:

    ostream& operator<<(ostream& os, const file_size& fs) 
    { 
        // save the formatting flags so they can be restored afterwards 
        ios_base::fmtflags flags = os.flags(); 
        unsigned long long ll = fs.low + 
            ((unsigned long long)fs.high << 32); 
        os << hex << ll; 
        os.flags(flags); 
        return os; 
    }

This operator will alter the ostream object, so we store the initial state at the beginning of the function and restore the object to this state at the end. Since the file size is a 64-bit number, we combine the two parts of the file_size object into a single 64-bit value and then print it out as a hexadecimal number.
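
The save-and-restore pattern can be checked in isolation by writing to a string stream. The following is a sketch under the assumption that the flags are restored with the flags member function, which replaces the whole flag set (setf, by contrast, only adds bits). The demo function is a hypothetical helper used purely for illustration.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

struct file_size {
    unsigned int high;
    unsigned int low;
};

// Sketch of an insertion operator that leaves the stream as it found it.
std::ostream& operator<<(std::ostream& os, const file_size& fs)
{
    // Save the formatting flags (number base, alignment, and so on).
    const std::ios_base::fmtflags saved = os.flags();
    const unsigned long long size =
        (static_cast<unsigned long long>(fs.high) << 32) | fs.low;
    os << std::hex << size;
    // Replace the whole flag set with the saved one.
    os.flags(saved);
    return os;
}

// Hypothetical helper: print a size followed by an ordinary int, to show
// that the int is still formatted as decimal afterwards.
std::string demo(const file_size& fs, int n)
{
    std::ostringstream out;
    out << fs << ' ' << n;
    return out.str();
}
```

With a size of high 0 and low 255, followed by the integer 16, demo returns "ff 16": the size is in hexadecimal, and the integer that follows is unaffected.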

Now you can compile and run this application. For example:

files C:\windows

This will list the names and sizes of the files in the windows folder.

There are two more things that need to be done: recurse subfolders and sort the data. Both are straightforward to implement. In the files_in_folder function, add the following code to the code block of the if statement:

    // this is a folder so recurse 
    string folder(findfiledata.cFileName); 
    // ignore . and .. directories 
    if (folder != "." && folder != "..") 
    { 
        files_in_folder(findItem.c_str(), files); 
    }

The search will return the . (current) folder and .. (parent) folder, so we need to check for these and ignore them. The next action is to recursively call the files_in_folder function to obtain the files in the subfolder. If you wish, you can compile and test the application, but this time it is best to test the code using the Beginning_C++ folder because recursively listing the Windows folder will produce a lot of files.
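
The shape of the recursion can be seen more clearly when the Windows calls are taken out of the picture. The sketch below walks a hypothetical in-memory tree instead of a real folder; the item structure and the collect_files function are assumptions made for illustration and are not part of the example or of the Windows API.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a folder entry; not a Windows type.
struct item {
    std::string name;
    bool is_folder;
    unsigned long long size;      // only meaningful for files
    std::vector<item> children;   // only meaningful for folders
};

// Collect the full path of every file under 'folder', depth first.
void collect_files(const item& folder, const std::string& path,
                   std::vector<std::string>& files)
{
    for (const item& child : folder.children)
    {
        const std::string childPath = path + "\\" + child.name;
        if (child.is_folder)
        {
            // Skip the self and parent entries; on a real file system,
            // following them would make the recursion never end.
            if (child.name != "." && child.name != "..")
                collect_files(child, childPath, files);
        }
        else
        {
            files.push_back(childPath);
        }
    }
}
```

As in files_in_folder, the results accumulate in a single vector that is passed by reference through every level of the recursion.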

The code returns the list of files in the order they were obtained, but we want to see them in order of file size. To do this we can use the sort function in the <algorithm> header, so add an include for this header after the include for <tuple>. In the main function, after the call to files_in_folder, add this code:

    files_in_folder(argv[1], files); 

    sort(files.begin(), files.end(), 
        [](const file_info& lhs, const file_info& rhs) { 
            return get<1>(rhs) > get<1>(lhs); 
        });

The first two parameters of the sort function indicate the range of items to sort. The third parameter is a predicate, and the function will pass two items from the vector to the predicate. The predicate must return true if the first item should be placed before the second (here, if the first is smaller than the second), and false otherwise.
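
The following sketch shows the same predicate on a simplified entry type. As an assumption made to keep it short, the size is stored as a plain 64-bit number rather than a file_size structure, and the names entry and sort_by_size are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Simplified entry: a name and a size held as a plain 64-bit number.
using entry = std::tuple<std::string, unsigned long long>;

// Sort ascending by size. The predicate returns true when lhs belongs
// before rhs, that is, when rhs is the larger of the two.
void sort_by_size(std::vector<entry>& entries)
{
    std::sort(entries.begin(), entries.end(),
        [](const entry& lhs, const entry& rhs) {
            return std::get<1>(rhs) > std::get<1>(lhs);
        });
}
```

Swapping rhs and lhs in the comparison would sort the entries in descending order instead.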

The predicate is provided by a lambda expression. There are no captured variables so the expression starts with [] and this is followed by the parameter list of the items being compared by the sort algorithm (passed by const reference, because they will not be changed). The actual comparison is carried out between the braces. Since we want to list the files in ascending order, we have to ensure that the second of the two is bigger than the first. In this code, we use the > operator on the two file_size objects. So that this code will compile, we need to define this operator. After the insertion operator add the following:

    bool operator>(const file_size& lhs, const file_size& rhs) 
    { 
        if (lhs.high > rhs.high) return true; 
        if (lhs.high == rhs.high) { 
            if (lhs.low > rhs.low) return true; 
        } 
        return false; 
    }

You can now compile the example and run it. You should find that the files in the specified folder and subfolders are listed in order of the size of the files.
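
As an aside, a two-part comparison like this can also be written with the Standard Library function tie from the <tuple> header. This is an alternative sketch, not the version used in the example: tie builds tuples of references, and tuples compare lexicographically, so the high parts are compared first and the low parts only break ties.

```cpp
#include <tuple>

struct file_size {
    unsigned int high;
    unsigned int low;
};

// Alternative sketch: compare (high, low) pairs lexicographically.
bool operator>(const file_size& lhs, const file_size& rhs)
{
    return std::tie(lhs.high, lhs.low) > std::tie(rhs.high, rhs.low);
}
```

This version should give the same results as the hand-written operator, and it scales more easily if further members ever need to take part in the comparison.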