Chapter 7 The config4cpp and config4j Utilities

7.1 Introduction

The config4cpp and config4j utilities are command-line utilities that act as wrappers for their corresponding Config4* libraries.¹ These utilities serve several purposes.

First, when you have written or edited a configuration file, you can use config4cpp or config4j to check if the file has any syntax errors or, optionally, schema validation errors.

Second, the utilities provide a way for you to “play with” the Config4* API without having to write code. As such, these utilities can shorten the learning curve for developers.

Finally, the utilities make it possible for a UNIX shell script to retrieve information from a Config4* file. This makes it possible to use Config4* to configure shell script-based applications.

7.1.1 Basic Operation

The config4cpp and config4j utilities work identically so, for brevity, I discuss just config4cpp in this chapter.

Running config4cpp without any command-line arguments (or with the -h argument) prints a usage statement like that shown in Figure 7.1.

Figure 7.1: Usage statement for config4cpp

usage: config4cpp -cfg <source> <command> <options>

<command> can be one of the following:
  parse               Parse and report errors, if any
  validate            Validate <scope>.<name>
  dump                Dump <scope>.<name>
  dumpSec             Dump the security policy
  print               Print value of the <scope>.<name> variable
  type                Print type of the <scope>.<name> entry
  slist               List scoped names in <scope>.<name>
  llist               List local names in <scope>.<name>

<options> can be:
  -h                  Print this usage statement
  -set <name> <value> Preset name=value in configuration object
  -scope <scope>      Specify <scope> argument for commands
  -name <name>        Specify <name> argument for commands

  -secCfg <source>    Override default security policy
  -secScope <scope>   Scope for security policy

  -schemaCfg <source> Source that contains a schema
  -schema <full.name> Name of schema in '-schemaCfg <source>'

  -recursive          For llist, slist and validate (default)
  -norecursive        For llist, slist and validate
  -filter <pattern>   A filter pattern for slist and llist
  -types <types>      For llist, slist and validate

  -expandUid          For dump (default)
  -unexpandUid        For dump

<types> can be one of the following:
  string, list, scope, variables, scope_and_vars (default)

<source> can be one of the following:
  file.cfg       A configuration file
  file#file.cfg  A configuration file
  exec#<command> Output from executing the specified command

As the usage statement indicates, config4cpp provides the following commands: parse, validate, dump, print, type, slist, llist and dumpSec. Regardless of the command chosen, you must always use the -cfg <source> command-line argument to specify a source of configuration information. The source can be a file (specified with file.cfg or file#file.cfg) or a command (specified with exec#...) that, when executed, prints a configuration file to standard output. If the command contains spaces, then you need to enclose the command in double quotes, for example:

config4cpp -cfg exec#"curl -sS http://localhost/file.cfg" ...

7.1.2 Commonly Used Options

As discussed in Section 3.4, many Config4* operations take two parameters that, when combined, specify the fully-scoped name of an item in a configuration file. For example:

logDir = cfg.lookupString("foo", "log.dir");

The first parameter ("foo") specifies a scope and the second parameter ("log.dir") specifies a local name within that scope. When using the config4cpp utility, you use the -scope <...> and -name <...> command-line options to specify the scope and name parameters for the underlying operations. For example:

config4cpp -cfg example.cfg print -scope foo -name log.dir

The -scope <...> and -name <...> options both default to empty strings. And since, internally, Config4* merges the two parameters to form a fully-scoped name, you can specify the fully-scoped name with either one of the two command-line options, and let the other option have its default value of an empty string. For example:

config4cpp -cfg example.cfg print -name foo.log.dir
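
The merging rule described above can be sketched in a few lines of shell. This is only an illustration of the rule as stated in the text, not the library's own code; the function name merge_names is a hypothetical helper introduced for this example.

```shell
# Hypothetical helper: combine a scope and a local name into a
# fully-scoped name, treating an empty string as "not specified".
merge_names() {
  scope=$1
  name=$2
  if [ -z "$scope" ]; then
    printf '%s\n' "$name"
  elif [ -z "$name" ]; then
    printf '%s\n' "$scope"
  else
    printf '%s.%s\n' "$scope" "$name"
  fi
}

merge_names foo log.dir      # prints foo.log.dir
merge_names "" foo.log.dir   # prints foo.log.dir
merge_names foo.log.dir ""   # prints foo.log.dir
```

All three calls name the same entry, which is why the two config4cpp commands shown above are interchangeable.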

As discussed in Section 3.8, Config4* defines several constants that denote different types of entries found in a configuration file.

CFG_STRING. A string variable.
CFG_LIST. A list variable.
CFG_VARIABLES. A variable, regardless of whether it is a string or a list.
CFG_SCOPE. A scope.
CFG_SCOPE_AND_VARS. A scope or a variable.

Some operations take one of the above values as a parameter to specify which type(s) of configuration entries the operation should process. When using config4cpp, you use the -types <...> command-line option to specify one of the above constants; however, you remove the "CFG_" prefix from the name of the constant and put the remaining part of the name in lower case. For example, -types string denotes the CFG_STRING constant. The default value of this option is -types scope_and_vars.
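
The naming rule (drop the "CFG_" prefix, then convert to lower case) can be expressed as a small shell function. The function name types_option_value is a hypothetical helper used only for illustration; config4cpp itself does not need it.

```shell
# Hypothetical helper: map a Config4* type constant to the spelling
# expected on the config4cpp command line (strip "CFG_", then lower-case).
types_option_value() {
  printf '%s\n' "${1#CFG_}" | tr '[:upper:]' '[:lower:]'
}

types_option_value CFG_STRING      # prints string
types_option_value CFG_VARIABLES   # prints variables
```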

The remaining sections of this chapter discuss each of the commands provided by config4cpp.

7.2 The parse Command

The parse command instructs config4cpp to parse the configuration file specified by -cfg <source> and then terminate. For example:

config4cpp -cfg example.cfg parse

If there is an error in the file, then config4cpp prints an error message before it terminates. In this way, the parse command provides a way to check for syntax errors in a recently created or modified configuration file.
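
If you maintain several configuration files, a small shell function can run the parse command on each of them. The sketch below makes one assumption that you should verify on your own installation: that config4cpp exits with a non-zero status when parsing fails. The function name check_cfg_files is hypothetical.

```shell
# Hypothetical helper: parse each file and count the ones that fail.
# Assumption: config4cpp exits with a non-zero status when parsing fails.
check_cfg_files() {
  failures=0
  for file in "$@"; do
    if config4cpp -cfg "$file" parse; then
      echo "OK:     $file"
    else
      echo "FAILED: $file"
      failures=$((failures + 1))
    fi
  done
  # The function succeeds only if every file parsed cleanly.
  [ "$failures" -eq 0 ]
}
```

A script could then call, for example, check_cfg_files example.cfg foo.cfg and stop if the function reports a failure.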

If the file to be parsed is obtained by -cfg exec#"...", then the default security policy (shown in Figure 5.1) may not be permissive enough to allow the command to be executed. In such a case, you have two options.

If you just want to check whether the configuration file is syntactically valid, and you do not care about security policies, then you could execute the command yourself and save its output into a temporary file. Then you could run config4cpp on that temporary file:

command-that-prints-a-configuration-file > tmp.cfg
config4cpp -cfg tmp.cfg parse

Alternatively, if your aim is to check the suitability of a security policy, then you should create a file, say, securityPolicy.cfg that defines the three variables used to specify a security policy: allow_patterns, deny_patterns and trusted_directories. Then run config4cpp with the -secCfg <source> and -secScope <scope> command-line options:

config4cpp -cfg exec#"..." parse -secCfg securityPolicy.cfg \
           -secScope <scope>

The -secScope <scope> option specifies the scope that contains the three security-policy variables. If those variables are defined in the global scope then you can omit this command-line option because its value defaults to an empty string.

7.3 The validate Command

The validate command instructs config4cpp to parse a configuration file and then perform schema validation on a scope within the configuration file. If a validation error is encountered, then a descriptive error message is printed. This command may seem complex because it requires many command-line options, so I introduce it with an example.

Let’s assume the file myApplications.cfg (Figure 7.2) contains configuration information for several applications, and you wish to perform schema validation for information in scope foo of that file. To do this, you will need to define a schema, such as that provided by the example.fooSchema entry in the schemas.cfg file (Figure 7.3).

Figure 7.2: The file myApplications.cfg

foo {
  timeout = "5 seconds";
  log {
    level = "2";
    dir = "/tmp";
  }
  colour = "green";
  price_list = [
    # item       colour    price
    #----------------------------------
     "apple",   "red",    "EUR 0.50",
     "widget",  "green",  "EUR 0.76",
     "pen",     "blue",   "USD 2.99"
  ];
  int_list = ["1", "2", "3"];
  temperature = "29 C";
}

bar {
  ... # details omitted for brevity
}

Figure 7.3: The file schemas.cfg

example.fooSchema = [
  "@typedef colour = enum[red, green, blue]",
  "@typedef temperature = float_with_units[C, F, K]",
  "@typedef money = units_with_float[USD, EUR, GBP]",
  "timeout     = durationSeconds",
  "log         = scope",
  "log.level   = int[0, 3]",
  "log.dir     = string[4, 4]",
  "colour      = colour",
  "price_list  = table[item,string, colour,colour, price,money]",
  "int_list    = list[int]",
  "temperature = temperature"
];

example.barSchema = [ ... ]; # details omitted for brevity

You can perform the schema validation with the following command:

config4cpp -cfg myApplications.cfg validate \
           -scope foo \
           -schemaCfg schemas.cfg \
           -schema example.fooSchema \
           -recursive \
           -types scope_and_vars

Let’s examine each command-line option. The schema validation is performed on the scope specified by the -scope option in the file specified by the -cfg option. The schema used is the list of strings provided by the variable specified by the -schema option in the file specified by the -schemaCfg option. The -recursive option specifies that the schema validator should recurse into nested scopes (such as log). The -types option indicates whether the schema validator should perform validation checks for string variables (-types string), list variables (-types list), all variables (-types variables), scopes (-types scope), or everything (-types scope_and_vars).

The -recursive and -types scope_and_vars options are actually default values so they could have been omitted from the above example. If you use the -norecursive option, then the schema validation will examine only the specified scope—it will not recurse into nested scopes.
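
When one file holds the configuration for several applications, the validate command can be repeated for each scope. The following sketch assumes that the schema for scope X is stored in a variable called example.XSchema, which matches the two names used in Figure 7.3 but is otherwise just a convention chosen for this example. It also assumes that config4cpp exits with a non-zero status when validation fails. The function name validate_apps is hypothetical.

```shell
# Hypothetical helper: validate several scopes of one configuration file.
# Usage: validate_apps CFG_FILE SCHEMA_FILE SCOPE...
# Assumption: the schema for scope X is named example.XSchema.
# The -recursive and -types scope_and_vars defaults are left implicit.
validate_apps() {
  cfg_file=$1
  schema_file=$2
  shift 2
  status=0
  for scope in "$@"; do
    config4cpp -cfg "$cfg_file" validate \
               -scope "$scope" \
               -schemaCfg "$schema_file" \
               -schema "example.${scope}Schema" || status=1
  done
  return "$status"
}
```

For the files in Figures 7.2 and 7.3, the call would be validate_apps myApplications.cfg schemas.cfg foo bar. The loop keeps going after a failure so that all validation errors are reported in one run.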

Figure 7.4 shows the algorithm used in config4cpp and config4j to implement the validate command. The code is straightforward. It initializes some variables from command-line options. Then it creates two (initially empty) Configuration objects and parses the files specified by the -cfg and -schemaCfg options. Finally, it uses a SchemaValidator object to parse the specified schema and perform schema validation.

Figure 7.4: Algorithm used by the validate command

cfgSource = ...   // from -cfg <...>
scope = ...       // from -scope <...> (default is "")
name = ...        // from -name <...> (default is "")
schemaSource = ... // from -schemaCfg <...>
schemaName = ...  // from -schema <...>
isRecursive = ... // from -recursive (default) or -norecursive
types = ...       // from -types <...> (default is scope_and_vars)
try {
  cfg = Configuration::create();
  sv = new SchemaValidator();
  schemaCfg = Configuration::create();
  cfg.parse(cfgSource);
  schemaCfg.parse(schemaSource);
  sv.parseSchema(schemaCfg.lookupList(schemaName, ""));
  sv.validate(cfg, scope, name, isRecursive, types);
} catch(ConfigurationException ex) {
  System.err.println(ex.getMessage());
  System.exit(1);
}

The purpose of showing you the code in Figure 7.4 is to illustrate that config4cpp and config4j are just thin wrappers around the corresponding Config4* libraries. This knowledge is important for application developers, because it means they can use config4cpp or config4j to “play with” Config4* and its API without needing to write any code. Doing this can shorten the learning curve.

7.4 The dump Command

When Config4* parses a configuration file, it stores information about scopes and name=value pairs in hash tables. Config4* provides a dump() operation that converts information in the hash tables into the syntax of a Config4* file. The dump command of the config4cpp utility is a thin wrapper around the dump() operation.

Figure 7.5 shows a configuration file called foo.cfg, and Figure 7.6 shows the output obtained when I ran the following command on a Linux machine:

config4cpp -cfg foo.cfg dump

Figure 7.5: The file foo.cfg

foo {
  # This is a comment
  timeout = "10 minutes";
  log {
    level = "1";
    @if (osType() == "windows") {
      dir = ".";
    } @else {
      dir = getenv("HOME") + "/.foo/logs";
    }
  }
}

Figure 7.6: Output of the dump command

foo {
  timeout = "10 minutes";
  log {
    dir = "/home/cjmchale/.foo/logs";
    level = "1";
  }
}

If you compare Figures 7.5 and 7.6, you may notice that the output of dump differs from the input file in several ways. First, comments are not preserved. Second, constructs that provide adaptive configuration (Section 2.11), such as @if-then-@else statements, function calls and the concatenation operator ("+"), are not preserved. Finally, the order of items is not necessarily preserved; for example, the order of foo.log.level and foo.log.dir is swapped.

These differences are a result of how Config4* works. When parsing a file, Config4* discards comments and fully evaluates all expressions, so when a name=value entry is stored in a hash table, the value is the result of evaluating an expression rather than the expression itself. In addition, a hash table does not preserve the order in which items were added to it. Because of this, the dump() operation retrieves the items from a hash table in an unpredictable order, and then sorts the entries by name before processing them. This is why dump() outputs foo.log.dir before foo.log.level.
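
One practical consequence is that dump produces a normalised form of a configuration file: comments are gone, expressions are evaluated, and entries are sorted by name. The sketch below uses this to check whether two configuration sources have the same effective contents on the current machine. The function name same_effective_config is hypothetical, and the sketch assumes that config4cpp exits with a non-zero status if a source cannot be parsed.

```shell
# Hypothetical helper: succeed if two configuration sources produce
# identical dump output (that is, the same evaluated, sorted entries).
# Returns 2 if either source cannot be dumped.
same_effective_config() {
  dump_a=$(config4cpp -cfg "$1" dump) || return 2
  dump_b=$(config4cpp -cfg "$2" dump) || return 2
  [ "$dump_a" = "$dump_b" ]
}
```

Because expressions such as getenv("HOME") are evaluated at parse time, the comparison holds only for the environment in which the command is run.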

By default, the dump command dumps the contents of the root scope of a Configuration object. However, you can use the -scope and/or -name command-line options to instruct it to dump just a named scope or variable. For example,

config4cpp -cfg foo.cfg dump -scope foo.log

outputs the following:

foo.log {
  dir = "/home/cjmchale/.foo/logs";
  level = "1";
}

and:

config4cpp -cfg foo.cfg dump -name foo.log.dir

outputs the following:

foo.log.dir = "/home/cjmchale/.foo/logs";

The -expandUid and -unexpandUid options instruct dump how to process uid- entries. As an example of this, consider the employees.cfg file shown in Figure 7.7. The following command results in the output shown in Figure 7.8:

config4cpp -cfg employees.cfg dump -expandUid

Figure 7.7: The file employees.cfg

uid-employee {
  name = "John Smith";
  address = "...";
}
uid-employee {
  name = "Mary Jones";
  address = "...";
}

Figure 7.8: Result of dumping employees.cfg

uid-000000000-employee {
  address = "...";
  name = "John Smith";
}
uid-000000001-employee {
  address = "...";
  name = "Mary Jones";
}

The -expandUid option is actually the default, so it does not need to be explicitly stated. If you use the -unexpandUid option instead, then uid- entries are printed with unexpanded names.
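
The relationship between the expanded and unexpanded forms of a name can be illustrated with a one-line sed command. This is not how Config4* implements the conversion; it is a sketch that assumes the nine-digit counter format seen in Figure 7.8. The function name unexpand_uid_name is hypothetical.

```shell
# Hypothetical helper: strip the nine-digit counter that appears in the
# expanded names of Figure 7.8 (uid-000000001-employee -> uid-employee).
unexpand_uid_name() {
  printf '%s\n' "$1" | sed 's/uid-[0-9]\{9\}-/uid-/g'
}

unexpand_uid_name uid-000000001-employee        # prints uid-employee
unexpand_uid_name uid-000000001-employee.name   # prints uid-employee.name
```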

7.5 The dumpSec Command

The name of the dumpSec command is an abbreviation for “dump security”. This command displays the allow_patterns, deny_patterns and trusted_directories of the security policy. Usually, the items displayed will be those of the default security policy (see Figure 5.1). However, recall from Section 7.2 that you can use the -secCfg and -secScope options of config4cpp to specify a different security policy.

7.6 The print Command

The print command prints the value of a configuration variable specified by the -scope and/or -name options. For example, recall the foo.cfg file shown earlier (Figure 7.5). The following command:

config4cpp -cfg foo.cfg print -name foo.log.dir

displays:

/home/cjmchale/.foo/logs

The print command differs from the dump command in two ways. First, print works only on a variable, whereas dump works on a variable or a scope. Second, print displays only the value of a variable, but dump displays name=value in Config4* syntax. For example, the following command:

config4cpp -cfg foo.cfg dump -name foo.log.dir

displays:

foo.log.dir = "/home/cjmchale/.foo/logs";

The print command enables a UNIX shell script to access configuration variables in a Config4* file. For example, the following line in a shell script will create the directory specified by the foo.log.dir variable in the foo.cfg file:

mkdir -p `config4cpp -cfg foo.cfg print -name foo.log.dir`
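
In a longer script you may prefer a more defensive variant of that one-liner. The sketch below quotes the value, so that a directory name containing spaces is handled correctly, and stops if the lookup fails. It assumes that config4cpp exits with a non-zero status when the variable cannot be retrieved; the function name make_dir_from_cfg is hypothetical.

```shell
# Hypothetical helper: create the directory named by a configuration
# variable, stopping if the lookup fails or yields an empty value.
# Usage: make_dir_from_cfg CFG_FILE FULLY_SCOPED_NAME
make_dir_from_cfg() {
  dir=$(config4cpp -cfg "$1" print -name "$2") || return 1
  [ -n "$dir" ] || return 1
  mkdir -p "$dir"
}
```

For the file in Figure 7.5, the call would be make_dir_from_cfg foo.cfg foo.log.dir.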

If you print a list variable, then each item in the list is printed on a separate line. For example, assume the bar.cfg file contains the following line:

file_list = ["tmp.txt", "TO_DO.txt", "make.log"];

The following command:

config4cpp -cfg bar.cfg print -name file_list

displays:

tmp.txt
TO_DO.txt
make.log

This one-item-displayed-per-line property makes it easy for a UNIX shell script to process each item in a list, for example:

for file in `config4cpp -cfg bar.cfg print -name file_list`
do
  echo Processing $file
done
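
The for loop above splits the output on any whitespace, so a list item such as "TO DO.txt" would be processed as two separate words. A while-read loop is a common way to keep each line intact. In the sketch below, the function name process_items is hypothetical, and the printf command stands in for the output of config4cpp so that the example can be run without a configuration file.

```shell
# Hypothetical helper: handle one list item per input line, keeping
# spaces and backslashes inside an item intact.
process_items() {
  while IFS= read -r file; do
    echo "Processing $file"
  done
}

# In a real script the input would come from:
#   config4cpp -cfg bar.cfg print -name file_list | process_items
printf '%s\n' "tmp.txt" "TO DO.txt" | process_items
```

An item that itself contains a newline would still be split, because the one-item-per-line output format cannot represent it unambiguously.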

7.7 The type Command

The type command displays the type (string, list or scope) of an entry in a configuration file. For example:

config4cpp -cfg foo.cfg type -name foo

displays:

scope

and

config4cpp -cfg foo.cfg type -name foo.log.level

displays:

string

If the specified item does not exist in the configuration file then the type command displays:

no_value
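
A shell script can branch on the output of the type command before deciding how to read an entry. The following sketch handles the four outputs mentioned above; the function name describe_entry is hypothetical.

```shell
# Hypothetical helper: report what kind of entry a name refers to,
# based on the four outputs of the type command described in the text.
# Usage: describe_entry CFG_FILE FULLY_SCOPED_NAME
describe_entry() {
  entry_type=$(config4cpp -cfg "$1" type -name "$2")
  case $entry_type in
    string|list) echo "$2 is a $entry_type variable" ;;
    scope)       echo "$2 is a scope" ;;
    no_value)    echo "$2 is not defined" ;;
    *)           echo "unexpected type output: $entry_type" >&2
                 return 1 ;;
  esac
}
```

A script might use such a check to fall back on a default value when an optional entry is not defined.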

7.8 The slist and llist Commands

The slist command is a wrapper around listFullyScopedNames(). Likewise, llist is a wrapper around listLocallyScopedNames().

These commands list the fully- or locally-scoped names of entries in the scope specified by the -scope <...> and -name <...> command-line options. If you let those options have their default values (an empty string), then the commands will display a sorted list of entries in the root scope of the configuration file.

The -types <...> option specifies the types of items that the commands should list. For example, -types variables lists the names of variables, while -types scope lists the names of scopes. The default value of this option is scope_and_vars, which lists both variables and scopes.

The -recursive and -norecursive options specify whether the commands should recurse into nested scopes to list their entries. The default value is -recursive.

The following examples are based on the example.cfg configuration file in Figure 7.9.

Figure 7.9: The file example.cfg

example {
  foo {
    timeout = "infinite";
    log {
      dir = "/tmp";
      level = "1";
    }
  }
  bar {
    greeting = "Hello, world";
  }
}

The following command:

config4cpp -cfg example.cfg slist -scope example.foo

lists the entries in the example.foo scope and (recursively) in nested scopes:

example.foo.log
example.foo.log.dir
example.foo.log.level
example.foo.timeout

If you change slist to llist in the above command, and re-run it, then the output will be as follows:

log
log.dir
log.level
timeout

One of the parameters passed to the listFullyScopedNames() and listLocallyScopedNames() operations is an array of strings that specify filter patterns. Each filter pattern is a string in which "*" is a wildcard that matches zero or more characters. An entry will be included in the returned list only if: (1) the filter patterns array is empty (thus indicating that no filtering is performed); or (2) the (unexpanded form of the) entry’s name matches at least one of the filter patterns.
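
The two inclusion rules can be mimicked in shell with a case statement. This is only a sketch of the rules as stated above, not the library's matching code. In particular, shell patterns also give special meaning to "?" and "[...]", whereas the text describes "*" as the only wildcard, so the sketch is faithful only for patterns that avoid those characters. The function name passes_filters is hypothetical.

```shell
# Hypothetical helper: succeed if NAME passes the filter patterns.
# Usage: passes_filters NAME [PATTERN...]
# NAME should be the unexpanded form of the entry's name.
passes_filters() {
  name=$1
  shift
  # Rule (1): no patterns means no filtering is performed.
  [ "$#" -eq 0 ] && return 0
  for pattern in "$@"; do
    case $name in
      # Rule (2): at least one pattern matches the name.
      $pattern) return 0 ;;
    esac
  done
  return 1
}

passes_filters uid-friend "uid-*" && echo "uid-friend is listed"
passes_filters timeout "uid-*"    || echo "timeout is filtered out"
```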

You can use the -filter <...> command-line option to specify a filter. You can use this option multiple times to specify multiple filters.

Filter patterns can be very useful if you are writing an application that makes use of uid entries in a configuration file. As an example, consider a people.cfg configuration file that contains a mixture of uid-friend and uid-enemy entries within the people scope. The following command will list just the uid-friend entries:

config4cpp -cfg people.cfg llist -scope people -filter uid-friend

The following command will list all the "uid-" entries:

config4cpp -cfg people.cfg llist -scope people -filter "uid-*"

7.9 Summary

The config4cpp and config4j utilities provide command-line wrappers for operations in the Config4* libraries. These utilities serve a few purposes.

First, when you have written or edited a configuration file, you can use the parse command to check if the file has any syntax errors, or use the validate command to check it for schema validation errors.

Second, the utilities provide a way for you to experiment with the Config4* API without having to write code. As such, these utilities can shorten the learning curve for developers. For example:

The -secCfg and -secScope options enable you to experiment with defining your own security policies.
The validate command enables you to explore the syntax and semantics of the schema language.
The dump command enables you to check if adaptive configuration constructs behave the way you think they should.
The slist and llist commands, and their -filter option, provide a way for you to experiment with the listFullyScopedNames() and listLocallyScopedNames() operations. This can be useful if you need to implement a browser-type application or if you plan to work with uid entries.

Finally, the print command makes it possible for a UNIX shell script to retrieve information from a Config4* file. This makes it possible to use Config4* to configure shell script-based applications.

1: The config4cpp utility is a compiled application, while config4j is a Windows batch file or UNIX shell script that executes the main() operation of the org.config4j.Config4J class.