Jun 2, 2010

Incremental Parsing Using Boost Program Options Library

I have to say I fell in love with boost::program_options the first time I used it while developing my own C++ MapReduce implementation. It can parse command-line parameters as well as configuration files, which makes it convenient for programs that support and expect a large number of options; a MapReduce program is a typical example.

A special use case for a command-line parser is one where a function needs to parse some options out of the command-line parameters and then pass the remaining parameters to another function, which parses other options. For example, the MapReduce runtime needs options like "num_map_workers" and "num_reduce_workers", while the rest of the program (the user-customized map and reduce functions) needs to parse application-specific options like "topic_dirichlet_prior" and "num_lda_topics". boost::program_options supports this kind of multi-round parsing. The key is allow_unregistered(), a member function of boost::program_options::command_line_parser, together with boost::program_options::collect_unrecognized(), which returns the tokens that the parser did not recognize. Here is a sample program. (For more explanation of this program, please refer to the official documentation of boost::program_options.)

#include <iostream>
#include <string>
#include <vector>

#include <boost/program_options/option.hpp>
#include <boost/program_options/options_description.hpp>
#include <boost/program_options/variables_map.hpp>
#include <boost/program_options/parsers.hpp>

using namespace std;
namespace po = boost::program_options;

int g_num_map_workers;
int g_num_reduce_workers;

// First round: parse the runtime options and return everything else.
vector<string> foo(int argc, char** argv) {
  po::options_description desc("Supported options");
  desc.add_options()
    ("num_map_workers", po::value<int>(&g_num_map_workers), "# map workers")
    ("num_reduce_workers", po::value<int>(&g_num_reduce_workers), "# reduce workers")
  ;
  po::variables_map vm;
  // allow_unregistered() keeps the parser from throwing on unknown options.
  po::parsed_options parsed =
    po::command_line_parser(argc, argv).options(desc).allow_unregistered().run();
  po::store(parsed, vm);
  po::notify(vm);

  cout << "The following options were parsed by foo:\n";
  if (vm.count("num_map_workers")) {
    cout << "num_map_workers = " << g_num_map_workers << "\n";
  }
  if (vm.count("num_reduce_workers")) {
    cout << "num_reduce_workers = " << g_num_reduce_workers << "\n";
  }

  // Hand back the unrecognized tokens (positional ones included).
  return po::collect_unrecognized(parsed.options, po::include_positional);
}

// Second round: parse application-specific options from the leftover tokens.
void bar(const vector<string>& rest_args) {
  po::options_description desc("Supported options");
  desc.add_options()
    ("apple", po::value<int>(), "# apples")
  ;
  po::variables_map vm;
  po::parsed_options parsed =
    po::command_line_parser(rest_args).options(desc).allow_unregistered().run();
  po::store(parsed, vm);
  po::notify(vm);

  cout << "The following options were parsed by bar:\n";
  if (vm.count("apple")) {
    cout << "apple = " << vm["apple"].as<int>() << "\n";
  }
}

int main(int argc, char** argv) {
  vector<string> rest_options = foo(argc, argv);

  cout << "The following cmd args could not be recognized by foo:\n";
  for (size_t i = 0; i < rest_options.size(); ++i) {
    cout << rest_options[i] << "\n";
  }

  bar(rest_options);
  return 0;
}
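
To make the idea behind the two rounds concrete, here is a minimal sketch that uses only the standard library. It is not how Boost implements allow_unregistered() or collect_unrecognized(); it only mimics the visible behavior for the simple "--name value" form, and the names RoundResult and parse_round are hypothetical ones made up for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical type (not part of Boost): the outcome of one parsing round.
struct RoundResult {
  std::map<std::string, std::string> parsed;  // recognized name -> raw value
  std::vector<std::string> rest;              // tokens left for later rounds
};

// Hypothetical helper (not part of Boost). Extracts "--name value" pairs whose
// name is listed in `known`. Every other token is copied to `rest` unchanged
// and in its original order, so a later round can parse it.
RoundResult parse_round(const std::vector<std::string>& args,
                        const std::set<std::string>& known) {
  RoundResult result;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& token = args[i];
    const bool is_long_option =
        token.size() > 2 && token.compare(0, 2, "--") == 0;
    const std::string name = is_long_option ? token.substr(2) : std::string();
    if (is_long_option && known.count(name) > 0 && i + 1 < args.size()) {
      result.parsed[name] = args[i + 1];
      ++i;  // skip the value that was just consumed
    } else {
      result.rest.push_back(token);  // unknown here; keep it for the next round
    }
  }
  return result;
}
```

Calling parse_round once with the runtime's option names and then again on the returned rest with the application's option names mirrors what foo and bar do above. The real library additionally handles forms such as "--name=value", short options, and typed values, which this sketch deliberately leaves out.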


Finally, I should mention that early Boost versions (e.g., 1.33.1, as packaged in Cygwin) have bugs in program_options that lead to a core dump when unknown options are encountered. The solution is to download and build your own Boost libraries. I just built 1.43.0 under Cygwin on my Windows computer.
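
When a system-provided Boost and a self-built one coexist, it can help to check which headers the compiler actually picks up. The sketch below is one way to do that, under a few assumptions: the compiler supports __has_include (standard since C++17, and offered as an extension by recent GCC and Clang), and the function name boost_version_seen is a hypothetical one chosen for this example. BOOST_LIB_VERSION is the macro that Boost defines in boost/version.hpp.

```cpp
#include <string>

// Include Boost's version header only if the compiler can find it, so the
// file still compiles on a machine without Boost.
#if defined(__has_include)
#  if __has_include(<boost/version.hpp>)
#    include <boost/version.hpp>
#  endif
#endif

// Hypothetical helper: reports the version of the Boost headers seen by this
// build, or "not found" if boost/version.hpp was not on the include path.
std::string boost_version_seen() {
#ifdef BOOST_LIB_VERSION
  return BOOST_LIB_VERSION;  // a string such as "1_43" for Boost 1.43.0
#else
  return "not found";
#endif
}
```

This only reports the headers on the include path. The program_options library that gets linked is selected separately by the linker flags, so both need to point at the same Boost build.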
