
Chapter 3  Overview of the Config4* API

3.1  Introduction

The C++ and Java APIs of Config4* are very similar, so this chapter discusses both of them side by side. All the functionality of Config4Cpp is defined in the config4cpp namespace. The functionality of Config4J is defined in the org.config4j package. To illustrate the API of Config4*, consider a configuration file that contains the following entries.

foo_srv {
    timeout ?= "2 minutes";
    log {
        dir ?= "C:\foo\logs";
        level ?= "0";
    };
};

Figures 3.1 and 3.2 show examples of using Config4Cpp and Config4J to access information in the above configuration file. In much of this chapter I discuss the APIs used in these figures.

Figure 3.1: Example of Using Config4Cpp
#include <locale.h>
#include <stdlib.h>
#include <iostream>
#include <config4cpp/Configuration.h>
using namespace std;
using namespace config4cpp;
...
setlocale(LC_ALL, "");
...
const char *    logDir;
int             logLevel, timeout;
const char *    scope = "foo_srv";
Configuration * cfg = Configuration::create();
try {
  cfg->parse(getenv("FOO_CONFIG"));
  logDir   = cfg->lookupString(scope, "log.dir");
  logLevel = cfg->lookupInt(scope, "log.level");
  timeout  = cfg->lookupDurationSeconds(scope, "timeout");
} catch(const ConfigurationException & ex) {
  cout << ex.c_str() << endl;
}
cfg->destroy();
Figure 3.2: Example of Using Config4J
import org.config4j.*;
...
String        logDir;
int           logLevel, timeout;
String        scope = "foo_srv";
Configuration cfg = Configuration.create();
try {
  cfg.parse(cfg.getenv("FOO_CONFIG"));
  logDir   = cfg.lookupString(scope, "log.dir");
  logLevel = cfg.lookupInt(scope, "log.level");
  timeout  = cfg.lookupDurationSeconds(scope, "timeout");
} catch(ConfigurationException ex) {
  System.out.println(ex.getMessage());
}

The correct behaviour of Config4Cpp depends on the locale being set correctly. Because of this, it is advisable to call setlocale() before invoking any Config4Cpp APIs. If you do this, then Config4Cpp will be able to handle characters defined in your locale, such as European accented characters or Japanese ideographs. If you neglect to call setlocale(), then Config4Cpp is likely to correctly process only characters in the 7-bit US ASCII character set.

3.2  Parsing Configuration Files

You create a configuration object by invoking the static create() operation on the Configuration class. The newly created configuration object is empty initially. You can populate it by invoking the parse() operation, which takes a file name as a parameter. The C++ example (Figure 3.1) calls the getenv() function to obtain the file-name parameter from an environment variable. For most of Java’s history, it has been difficult to access environment variables in Java applications, but Config4J provides a utility getenv() operation on the Configuration class to simplify such access.1

If parse() encounters any errors, then it throws an exception of type ConfigurationException. The C++ implementation of this class provides a c_str() operation you can use to access the exception’s message. Java developers can access the exception’s message in the usual Java way, that is, by calling getMessage(). In Java, ConfigurationException is a runtime exception.

3.3  Accessing Configuration Variables

Once a Configuration object has been created and populated, you can use operations such as lookupString() and lookupList() to retrieve the values of configuration variables. You can see examples of this in Figures 3.1 and 3.2.

Some additional operations with names of the form lookup<Type>() are provided that retrieve a string value and convert it to another data-type. For example, lookupInt() converts a string value to an integer and lookupBoolean() converts a string value to a boolean.

The lookupDurationSeconds() operation converts strings, for example, "10 seconds" or "2.5 minutes", into an integer that denotes the duration in seconds (it converts "infinite" to the integer value -1). You can use such durations to configure timeout values in applications. There are also lookupDurationMilliseconds() and lookupDurationMicroseconds() operations in case you prefer to have the result expressed in milliseconds or microseconds rather than in seconds.
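
To make the conversion concrete, here is a minimal Java sketch of the kind of string-to-seconds mapping described above. It is not the Config4J implementation: the class name DurationSketch, the small set of accepted units and the round-to-nearest behaviour are all assumptions made for illustration.

```java
// Illustrative sketch only; this is NOT how Config4J is implemented.
// DurationSketch and toSeconds() are hypothetical names.
final class DurationSketch {
    // Converts "<number> <units>" or "infinite" to a whole number of seconds.
    // Assumption: only seconds and minutes are recognised here, and
    // fractional results are rounded to the nearest second.
    static int toSeconds(String text) {
        String s = text.trim();
        if (s.equals("infinite")) {
            return -1; // "infinite" is represented by -1
        }
        String[] parts = s.split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("bad duration: '" + text + "'");
        }
        double amount = Double.parseDouble(parts[0]);
        double factor;
        switch (parts[1]) {
            case "second":
            case "seconds":
                factor = 1;
                break;
            case "minute":
            case "minutes":
                factor = 60;
                break;
            default:
                throw new IllegalArgumentException("bad units in '" + text + "'");
        }
        return (int) Math.round(amount * factor);
    }

    public static void main(String[] args) {
        System.out.println(toSeconds("10 seconds"));
        System.out.println(toSeconds("2.5 minutes"));
        System.out.println(toSeconds("infinite"));
    }
}
```

In a real application you would simply call lookupDurationSeconds() and let the library do this work.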

If a lookup operation fails—for example, lookupInt() might encounter an invalid integer—then it throws a ConfigurationException. The message contained in the exception explains what went wrong.

3.4  Scoped Names

Some Config4* operations take two parameters that, when combined, specify the fully-scoped name of a configuration variable. For example, in C++, you can access the value of foo_srv.log.dir with the following statement.

logDir = cfg->lookupString("foo_srv", "log.dir");

The example code in Figures 3.1 and 3.2 illustrates the intended purpose of this approach to identifying configuration variables. A variable, called scope, is initialized with the name of a configuration scope, and a configuration variable (such as log.dir) within that scope can be accessed by passing scope and the name of the variable as parameters to an accessor operation.

logDir = cfg->lookupString(scope, "log.dir");

Typically, the scope variable is obtained from a command-line argument. By rerunning an application with a different command-line argument, you can change the scope used to configure the application. For example, you might have one configuration scope for running an application without debugging diagnostics, and another scope that enables debugging diagnostics. Alternatively, you might have a separate scope for each user or for each instance of a replicated server application.
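
The way a scope and a local name combine can be sketched in a few lines of Java. The helper below is hypothetical (it is not part of the Config4J API) and only illustrates the joining rule implied by this section, under the assumption that an empty scope or an empty local name contributes nothing to the result.

```java
// Illustrative sketch only; fullyScopedName() is a hypothetical helper,
// not an operation provided by Config4J.
final class ScopedNameSketch {
    // Joins a scope and a local name with a period, skipping empty parts.
    static String fullyScopedName(String scope, String localName) {
        if (scope.isEmpty()) {
            return localName;
        }
        if (localName.isEmpty()) {
            return scope;
        }
        return scope + "." + localName;
    }

    public static void main(String[] args) {
        // Take the scope from the first command-line argument if there is
        // one; "foo_srv" is used here only as a stand-in default.
        String scope = (args.length > 0) ? args[0] : "foo_srv";
        System.out.println(fullyScopedName(scope, "log.dir"));
    }
}
```

Running the sketch with a different command-line argument changes the scope that prefixes every variable name, which is the effect described above.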

3.5  Presetting Configuration Variables

When Config4* is parsing a configuration file, it calls insertString() and insertList() to populate the Configuration object with name-value pairs. You can call those operations directly in your application code. One important reason for doing so is to populate a Configuration object with name-value pairs obtained from command-line arguments before parsing a configuration file. The Java code in Figure 3.3 illustrates how to do this.

Figure 3.3: Java example of presetting configuration variables
public static void main(String[] args) {
  String        logDir;
  int           logLevel, timeout;
  String        scope = "foo_srv";
  Configuration cfg = Configuration.create();
  try {
    //--------
    // Pre-populate the configuration object from
    // "-set name value" command-line options
    //--------
    for (int i = 0; i < args.length; i++) {
      if (args[i].equals("-set")) {
        if (i + 2 >= args.length) {
          usageError("Too few arguments after '-set'");
          System.exit(1);
        }
        cfg.insertString(scope, args[i+1], args[i+2]);
        i += 2; // skip the name and value that were just consumed
      } else {
        ... // processing for other command-line options
      }
    }

    //--------
    // Parse the config file and lookup config variables.
    //--------
    cfg.parse(cfg.getenv("FOO_CONFIG"));
    logDir   = cfg.lookupString(scope, "log.dir");
    logLevel = cfg.lookupInt(scope, "log.level");
    timeout  = cfg.lookupDurationSeconds(scope, "timeout");
  } catch(ConfigurationException ex) {
    System.out.println(ex.getMessage());
    System.exit(1);
  }
}

This tactic provides a simple way to integrate command-line options with information in a configuration file. To understand why, consider the configuration file shown at the start of this chapter, which is repeated below for convenience.

foo_srv {
    timeout ?= "2 minutes";
    log {
        dir ?= "C:\foo\logs";
        level ?= "0";
    };
};

The use of the conditional assignment operator ("?=") within the configuration file means that a variable will be assigned a value only if it does not already have a value. For example, running the code shown in Figure 3.3 with the command-line option "-set log.level 2" will change the log level from its default value of 0 to the value of 2.
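
The interaction between a preset value and the "?=" operator can be modelled with an ordinary map. The following Java sketch is a simplified model and not Config4J code: the class ConditionalAssignSketch and its operations are hypothetical, and the map stands in for the Configuration object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only; this class is hypothetical and is not part of
// Config4J. The map plays the role of the Configuration object.
final class ConditionalAssignSketch {
    private final Map<String, String> vars = new LinkedHashMap<>();

    // Models 'name = "value";' (and a preset): always stores the value.
    void assign(String name, String value) {
        vars.put(name, value);
    }

    // Models 'name ?= "value";': stores the value only if name is unset.
    void assignIfUnset(String name, String value) {
        vars.putIfAbsent(name, value);
    }

    String get(String name) {
        return vars.get(name);
    }

    public static void main(String[] args) {
        ConditionalAssignSketch cfg = new ConditionalAssignSketch();
        // Step 1: preset from the command line ("-set log.level 2").
        cfg.assign("foo_srv.log.level", "2");
        // Step 2: "parse" the file; every entry uses "?=".
        cfg.assignIfUnset("foo_srv.timeout", "2 minutes");
        cfg.assignIfUnset("foo_srv.log.dir", "C:\\foo\\logs");
        cfg.assignIfUnset("foo_srv.log.level", "0"); // ignored: already set
        System.out.println(cfg.get("foo_srv.log.level"));
        System.out.println(cfg.get("foo_srv.timeout"));
    }
}
```

The order matters: the preset must happen before parsing, because a conditional assignment in the file can only yield to a value that already exists.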

3.6  Variations of parse()

Earlier in this chapter (in Section 3.2) I said that you can call parse() to parse a configuration file. Actually, Config4* offers a lot of flexibility in parsing, as I now discuss.

3.6.1  Parsing Centralized Configuration

Let’s assume that, as shown in Figures 3.1 and 3.2, an application uses the FOO_CONFIG environment variable to specify the location of its configuration file. If the application is being used only by you and on only one computer then you can store the application’s configuration information in a file and set FOO_CONFIG to point to this.

FOO_CONFIG=/path/to/foo.cfg

A few months later you may want to use the application on several computers within the same office. You could copy the configuration file onto each of these computers but then you would end up with multiple configuration files to maintain. Alternatively, if there is a web server in your office, you could move the configuration file to it and set FOO_CONFIG on all the computers to retrieve this configuration file via curl.2

FOO_CONFIG="exec#curl -sS http://host/path/to/foo.cfg"

Recall from Section 2.10 that the adaptable configuration features in Config4* enable a configuration file to adapt its contents for different computers, operating systems or users. Because of this, a single configuration file stored, say, on a centralized web server can be used for all users of the Foo application within your organization.

3.6.2  Parsing Embedded Configuration

There is an overloaded version of parse() that takes two parameters. You can use this two-parameter version to parse configuration information that is stored in a string, as this C++ example illustrates.

const char * str = "message = \"Hello, World\";";
cfg->parse(Configuration::INPUT_STRING, str);

Constructing a configuration string manually is tedious for two reasons. First, as the above example illustrates, you have to escape double quotes with a backslash. Second, compilers place limits on the maximum length of string literals; if you wanted, say, a 50KB configuration string, then you would have to construct this by concatenating numerous smaller strings.

The config2cpp and config2j command-line utilities (discussed in Chapter 6) read an input configuration file and generate a C++ or Java class that stores the contents of the configuration file in an instance variable. The generated class automates the tedious escaping of double quotes and concatenating short string literals to produce a monolithic configuration string. You can access this configuration string by invoking the public getString() operation on the generated class.

The config2cpp and config2j utilities make it easy to generate a configuration string that can be embedded in an application. This can be useful in an embedded system that does not contain a file system.

3.6.3  Using Fallback Configuration

An important use of embedded configuration strings is to enable an application to have default configuration that can be overridden by an optional configuration file specified by, say, an environment variable or command-line argument. A primitive way to do this is shown below in Java syntax.

Configuration cfg = Configuration.create();
String cfgFile = cfg.getenv("FOO_CONFIG");
if (cfgFile != null) {
    cfg.parse(cfgFile);
} else {
    cfg.parse(Configuration.INPUT_STRING,
              EmbeddedConfig.getString());
}

This method is primitive because it is an either-or approach: the configuration is obtained from either a file or an embedded string. This is acceptable if there are only a handful of configuration variables. However, if the application uses hundreds of configuration variables, then it is not convenient for a user to have to write such a large configuration file when she might want only a few configuration variables to have non-default values.

It would be preferable to allow the configuration file to contain just a few variables and for the application to automatically “fall back” to an embedded configuration string for variables not specified in the configuration file. Config4* provides support for such fallback configuration; you can see an example of its use in Figure 3.4.

Figure 3.4: Fallback configuration
Configuration cfg = Configuration.create();
String cfgFile = cfg.getenv("FOO_CONFIG");
if (cfgFile != null) {
    cfg.parse(cfgFile);
}
cfg.setFallbackConfiguration(Configuration.INPUT_STRING,
                             EmbeddedConfig.getString());

Using fallback configuration involves three steps. First, you create an empty configuration object. Second, you parse a configuration source, if the user has specified one. Third, you call setFallbackConfiguration() to apply a fallback configuration object to the main configuration object. The fallback configuration object, which contains default values for all configuration variables used by the application, is typically created by invoking the getString() operation on a class that was generated by config2cpp or config2j.

The semantics of fallback configuration can be understood by considering the statement below.

str = cfg.lookupString(scope, "log.level");

The lookupString() operation first searches in the main configuration object for the log.level variable in the scope specified by the scope parameter. If the variable is not found, then the operation searches in the global scope of the fallback configuration object for the log.level variable. The global scope is used in the fallback configuration object because a scope denotes the name of an application but fallback configuration applies to all applications.
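
The two-stage search can be pictured with a pair of maps. This Java sketch is only a model of the semantics described above, not the Config4J implementation; the class FallbackSketch, its fields and the choice of exception are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Illustrative model only; FallbackSketch is hypothetical and is not part
// of Config4J.
final class FallbackSketch {
    // Main configuration: keys are fully-scoped names.
    final Map<String, String> mainCfg = new HashMap<>();
    // Fallback configuration: keys are names in its global scope.
    final Map<String, String> fallbackCfg = new HashMap<>();

    String lookupString(String scope, String localName) {
        String fullName = scope.isEmpty() ? localName : scope + "." + localName;
        // First, search the main configuration using the fully-scoped name.
        String value = mainCfg.get(fullName);
        if (value == null) {
            // Then search the global scope of the fallback configuration,
            // so the scope is deliberately not used here.
            value = fallbackCfg.get(localName);
        }
        if (value == null) {
            throw new NoSuchElementException("no value for '" + fullName + "'");
        }
        return value;
    }

    public static void main(String[] args) {
        FallbackSketch cfg = new FallbackSketch();
        cfg.mainCfg.put("foo_srv.timeout", "5 minutes");
        cfg.fallbackCfg.put("timeout", "2 minutes");
        cfg.fallbackCfg.put("log.level", "0");
        System.out.println(cfg.lookupString("foo_srv", "timeout"));
        System.out.println(cfg.lookupString("foo_srv", "log.level"));
    }
}
```

In the model, a value in the user's file hides the embedded default, while variables the user did not mention are still found in the fallback.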

3.7  Default Values

Although embedded fallback configuration is useful in applications, some people may think it is overkill if they just want to quickly hack together a short program that uses a few configuration variables. For this reason, Config4* provides an alternative mechanism, which is an extra optional parameter (denoting a default value) that can be passed to lookup operations. For example, the first Java statement below will throw an exception if the specified variable is missing from the configuration file, but the second statement will return "/tmp".

logDir = cfg.lookupString(scope, "log.dir");
logDir = cfg.lookupString(scope, "log.dir", "/tmp");

3.8  Listing the Contents of a Scope

You can invoke listFullyScopedNames() to get a sorted list of the names of all entries (that is, variables and scopes) contained within a scope.

String[] names = cfg.listFullyScopedNames(scope, "",
                                Configuration.CFG_SCOPE_AND_VARS, true);

The first two parameters to listFullyScopedNames() are a scope and local name within that scope. These parameters are combined to form a fully-scoped name, as discussed in Section 3.4. In practice, you typically use an empty string for the local name parameter, unless you want to get a listing of a nested scope within the main scope for an application.

The third parameter is an integer bit mask that specifies what kind of entries you want to be listed. The Configuration class defines Java integer constants or C++ enum values that you can use.

The final parameter indicates if listFullyScopedNames() should recurse into nested scopes (true) or just list entries in the stated scope (false).

At the start of this chapter, I showed a scope called foo_srv. The above call to listFullyScopedNames() for that scope returns the following list of strings.

foo_srv.log
foo_srv.log.dir
foo_srv.log.level
foo_srv.timeout

Calling the same operation but specifying false for the recursive parameter returns the following.

foo_srv.log
foo_srv.timeout

By calling the operation with a value other than CFG_SCOPE_AND_VARS, you can get a list of the names of just string variables (CFG_STRING), just list variables (CFG_LIST), both string and list variables (CFG_VARIABLES), or just scopes (CFG_SCOPE).

3.8.1  Local and Fully-scoped Names

When you call listFullyScopedNames(), all the strings in the returned list are fully-scoped names, so they have the name of the scope followed by a period ("foo_srv.") as a prefix. If you do not want this prefix then you can call listLocallyScopedNames() instead.

3.8.2  Determining the Type of an Entry

Once you get a list of names within a scope, you may want to iterate over the list of names and process each one by calling, say, lookupString() or lookupList(). Obviously, to know which of these operations you should call, you need to know the type of a named entry. You can determine this by calling cfg.type(scope, localName). The value returned from this operation is an integer constant that identifies the type of the entry, for example, CFG_STRING, CFG_LIST or CFG_SCOPE.

3.8.3  Filtering Results with Patterns

The listFullyScopedNames() and listLocallyScopedNames() operations can take an additional String or String[] parameter that specifies one or more wildcarded patterns.

String[] names = cfg.listLocallyScopedNames(scope, "",
                      Configuration.CFG_SCOPE_AND_VARS, true, "time*");

If you pass this extra parameter, then a name is included in the returned list only if the name matches at least one of the patterns. Within a pattern, "*" matches zero or more characters. For example, "time*" matches "timeout" but does not match "log.dir".
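
The matching rule stated above is easy to reproduce with the standard Java regular-expression classes. The sketch below is not the matcher used inside Config4J; WildcardSketch and its operations are hypothetical names, and the only rule it implements is the one given in this section: "*" matches zero or more characters and everything else matches literally.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; WildcardSketch is hypothetical and is not the
// pattern matcher used by Config4J.
final class WildcardSketch {
    // Returns true if the whole of name matches pattern, where "*" matches
    // zero or more characters and all other characters match themselves.
    static boolean matches(String name, String pattern) {
        StringBuilder regex = new StringBuilder();
        String[] literals = pattern.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // one "*" sat between two literals
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Returns true if name matches at least one of the patterns.
    static boolean matchesAny(String name, String... patterns) {
        for (String pattern : patterns) {
            if (matches(name, pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("timeout", "time*"));
        System.out.println(matches("log.dir", "time*"));
        System.out.println(matchesAny("log.dir", "time*", "log.*"));
    }
}
```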

3.9  Working with Uid Entries

Config4* provides operations that make it easy to access entries (variables and scopes) whose names start with a "uid-" prefix.

3.9.1  Expanded and Unexpanded Names

A name like uid-000000042-foo is said to be in its expanded form, while uid-foo is said to be in its unexpanded form. You can convert a name from its expanded form into its unexpanded form with the unexpand() operation.

String name = "uid-000000042-foo.bar.uid-000000043-acme";
String unexpandedName = cfg.unexpand(name);

After executing the above code, the unexpandedName variable has the value "uid-foo.bar.uid-acme".

Calling unexpand() on a string that does not contain "uid-" returns the same string. For example, calling unexpand("foo.bar.acme") returns "foo.bar.acme".

Curious readers may be wondering if there is an expand() operation that does the conversion the opposite way. Yes, there is; expand() embeds a different nine-digit number whenever it encounters "uid-" within the name. When Config4* is parsing a file, it calls expand() for every name it encounters. It is unlikely you will need to call expand() from your own code.

3.9.2  The uidEquals() Operation

The uidEquals() operation takes two parameters. It calls unexpand() on both of its parameters and returns true if the unexpanded names are identical.

name = ...;
if (cfg.uidEquals("uid-foo", name)) { ... }
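
The behaviour of unexpand() and uidEquals() described in this section can be approximated with a regular expression. The Java sketch below is an illustration rather than the Config4J implementation; the class UidSketch is hypothetical, and it assumes that the expanded form is always "uid-", then exactly nine digits, then "-", at the start of a name component.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; UidSketch is hypothetical and is not the
// Config4J implementation of unexpand() or uidEquals().
final class UidSketch {
    // "uid-", nine digits and "-" at the start of a name component, that
    // is, at the start of the name or immediately after a period.
    private static final Pattern EXPANDED =
            Pattern.compile("(^|\\.)uid-[0-9]{9}-");

    // Removes the nine-digit number from every expanded uid component.
    static String unexpand(String name) {
        return EXPANDED.matcher(name).replaceAll("$1uid-");
    }

    // Compares two names after converting both to their unexpanded form.
    static boolean uidEquals(String a, String b) {
        return unexpand(a).equals(unexpand(b));
    }

    public static void main(String[] args) {
        System.out.println(unexpand("uid-000000042-foo.bar.uid-000000043-acme"));
        System.out.println(unexpand("foo.bar.acme"));
        System.out.println(uidEquals("uid-foo", "uid-000000042-foo"));
    }
}
```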

3.9.3  Processing Uid Entries in Sequence

The listFullyScopedNames() and listLocallyScopedNames() operations return a sorted list of names. This guarantees that the relative order of uid names in the list reflects the order of those entries in the input configuration file. As a concrete example, consider the configuration file in Figure 3.5 (which, for convenience, is a copy of a figure from the previous chapter).

Figure 3.5: File of recipes
uid-recipe {
    name = "Tea";
    ingredients = ["1 tea bag", "cold water", "milk"];
    uid-step = "Pour cold water into the kettle";
    uid-step = "Turn on the kettle";
    uid-step = "Wait for the kettle to boil";
    uid-step = "Pour boiled water into a cup";
    uid-step = "Add tea bag to cup & leave for 3 minutes";
    uid-step = "Remove tea bag";
    uid-step = "Add a splash of milk if you want";
}
uid-recipe {
    name = "Toast";
    ingredients = ["Two slices of bread", "butter"];
    uid-step = "Place bread in a toaster and turn on";
    uid-step = "Wait for toaster to pop out the bread";
    uid-step = "Remove bread from toaster and butter it";
}

Let us assume we want to process each uid-recipe scope in order and, within each of these scopes, we want to process each uid-step in order. You can do this with the code in Figure 3.6.

Figure 3.6: Code to process the file of recipes
void processRecipeFile()
{
    String[]       recipeNames;
    Configuration  cfg;

    cfg = Configuration.create();
    cfg.parse("recipes.cfg");
    recipeNames = cfg.listLocallyScopedNames("", "",
                            Configuration.CFG_SCOPE,
                            false, "uid-recipe");
    for (int i = 0; i < recipeNames.length; i++) {
        processRecipe(cfg, recipeNames[i]);
    }
}
void processRecipe(Configuration cfg, String scope)
{
    String[]       ingredients;
    String         name;
    String[]       stepNames;
    
    name = cfg.lookupString(scope, "name");
    ingredients = cfg.lookupList(scope, "ingredients");
    ... // process name and ingredients
    stepNames = cfg.listLocallyScopedNames(scope, "",
                            Configuration.CFG_STRING,
                            false, "uid-step");
    for (int i = 0; i < stepNames.length; i++) {
        String step = cfg.lookupString(scope, stepNames[i]);
        ... // process step
    }
}

The processRecipeFile() operation parses a configuration file and calls listLocallyScopedNames() to obtain a sorted list of the uid-recipe scopes. Then it calls processRecipe() to process each of these scopes.

The body of processRecipe() calls lookupString() and lookupList() to get the values of the name and ingredients variables. Then it calls listLocallyScopedNames() to get a sorted list of the uid-step string variables, and uses a for-loop to process each of these in turn.
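
One way to see why a sorted list of uid names reflects file order is to note that the number embedded in an expanded name has a fixed width of nine digits, so alphabetical order and numeric order coincide. The Java sketch below demonstrates this property only; UidOrderSketch is hypothetical, and the assumption that numbers are handed out from a counter starting at zero is made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; UidOrderSketch is hypothetical. The counter
// starting at zero is an assumption, not a documented Config4* detail.
final class UidOrderSketch {
    private int counter = 0;

    // Embeds the next nine-digit, zero-padded number after "uid-".
    String expand(String unexpandedName) {
        if (!unexpandedName.startsWith("uid-")) {
            return unexpandedName;
        }
        String suffix = unexpandedName.substring("uid-".length());
        return String.format("uid-%09d-%s", counter++, suffix);
    }

    public static void main(String[] args) {
        UidOrderSketch sketch = new UidOrderSketch();
        List<String> fileOrder = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            fileOrder.add(sketch.expand("uid-step"));
        }
        // Scramble a copy, then sort it alphabetically.
        List<String> sorted = new ArrayList<>(fileOrder);
        Collections.reverse(sorted);
        Collections.sort(sorted);
        // The zero padding keeps, say, entry 10 after entry 2.
        System.out.println(sorted.equals(fileOrder));
    }
}
```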

3.10  Schema Validation

A schema is a blueprint or definition of a system. For example, a database schema defines the layout of a database: its tables, the columns within those tables, and so on. It is common for a schema to be written in the same syntax as the system it defines. For example, a database’s schema might be stored within a table of the database itself.

Another technology that uses schemas is XML. The first schema language for XML was called document type definition (DTD). Many people felt DTD was sufficient to define schemas for text-oriented XML documents, which tend to have a simple structure, but not flexible enough to define schemas for more structured, data-oriented XML documents. Because of this, several competing XML schema languages were defined, including XML Schema and RELAX NG.

By itself, a schema is not very useful; you also need to have a piece of software, called a schema validator, that can compare a system (database, XML file or whatever) against the system’s schema definition and report errors. Within the Config4* library is a class called SchemaValidator that, as its name suggests, implements a schema validator. In this section I provide a quick overview of this schema validator; you can find full details in Chapter 9.

Figure 3.7 shows a scope, foo, in a configuration file. Figure 3.8 shows some Java code that defines a schema for the foo scope, parses the configuration file and then uses the SchemaValidator class to compare the contents of the foo scope against the schema.

Figure 3.7: A configuration file to be validated
foo {
    idle_timeout = "2 minutes";
    log_level = "3";
    log_file = "/tmp/foo.log";
    price_list = [
        #  item      colour     price
        #----------------------------------
          "shirt",  "green",  "EUR 19.99",
          "jeans",  "blue",   "USD 39.99"
    ];
};
Figure 3.8: Code that performs schema validation
 1  import org.config4j.*;
 2  ...
 3  String scope = "foo";
 4  SchemaValidator sv = new SchemaValidator();
 5  String schema[] = new String[] {
 6      "@typedef colour = enum[red, green, blue]",
 7      "@typedef money = units_with_float[EUR, GBP, YEN, USD]",
 8      "idle_timeout = durationMilliseconds",
 9      "log_level = int[0, 5]",
10      "log_file = string",
11      "price_list = table[string,item, colour,colour, money,price]"
12  };
13  Configuration cfg = Configuration.create();
14  try {
15      cfg.parse(cfg.getenv("FOO_CONFIG"));
16      sv.parseSchema(schema);
17      sv.validate(cfg, scope, "");
18  } catch(ConfigurationException ex) {
19      System.out.println(ex.getMessage());
20  }

Within Figure 3.8, the schema is defined as an array of strings (lines 5–12). Within this schema definition, ignore the first two lines (strings starting with "@typedef") for the moment. The next line defines an entry called idle_timeout of type durationMilliseconds. You can see that this describes the idle_timeout variable within the foo scope in Figure 3.7. The next line defines an entry called log_level which is an integer in the range 0 to 5. The line after that defines log_file to be of type string. The types used so far (durationMilliseconds, int and string) are built-in types for the schema validator, so the definitions for the first three entries are straightforward.

The definition for price_list is more interesting. You can see from Figure 3.7 that this variable is a list of strings, but the list is formatted to look like a table with three columns. The schema definition defines this entry to be a table in which the first column is called item and is of type string, the second column is called colour and is of type colour, and the last column is called price and is of type money. The types colour and money are not built-in types for the schema validator. Instead, the lines in the schema starting with "@typedef" define these types. You can see that colour is defined to be an enum of three possible values (red, green or blue). The money type is defined to be a string of the form "<units> <float>" where the <units> can be one of: "EUR" "GBP" "YEN" or "USD".
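
To show what it means to treat a flat list of strings as a three-column table, here is a small Java sketch that walks the list three entries at a time and checks the colour and price columns. It is not the SchemaValidator implementation: TableSketch is hypothetical, its error messages are invented for the example, and using Double.parseDouble() to test for a float is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; TableSketch is hypothetical and does not show
// how SchemaValidator works internally. The messages are invented.
final class TableSketch {
    static final List<String> COLOURS = Arrays.asList("red", "green", "blue");
    static final List<String> UNITS = Arrays.asList("EUR", "GBP", "YEN", "USD");

    // Returns null if list is a valid (item, colour, price) table, or a
    // description of the first problem found. Rows are numbered from 1.
    static String check(String[] list) {
        if (list.length % 3 != 0) {
            return "the number of entries is not a multiple of 3";
        }
        for (int i = 0; i < list.length; i += 3) {
            int row = i / 3 + 1;
            // list[i] is the item column; any string is acceptable there.
            if (!COLOURS.contains(list[i + 1])) {
                return "bad colour '" + list[i + 1] + "' in row " + row;
            }
            if (!isMoney(list[i + 2])) {
                return "bad money '" + list[i + 2] + "' in row " + row;
            }
        }
        return null;
    }

    // Checks for the form "<units> <float>".
    static boolean isMoney(String s) {
        String[] parts = s.split(" ");
        if (parts.length != 2 || !UNITS.contains(parts[0])) {
            return false;
        }
        try {
            Double.parseDouble(parts[1]);
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] priceList = {
            "shirt", "green", "EUR 19.99",
            "jeans", "blue",  "USD 39.99"
        };
        System.out.println(check(priceList) == null ? "valid" : check(priceList));
    }
}
```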

After the configuration file has been parsed (line 15), the code uses a SchemaValidator object to parse the schema parameter and stores it in a more efficient format (line 16). Then the validate() operation (line 17) is used to validate the specified scope of the specified configuration object against the schema. If validate() encounters an error, then it reports the error by throwing an exception. The catch clause (lines 18–20) prints out the text of the exception.

3.10.1  Informative Error Messages

You can see from Figure 3.8 that the schema language (lines 5–12) is very compact and easy to understand, and that the API for using the schema validator (lines 16–17) is equally compact and easy to use. You may be wondering: if there are any errors in the configuration file, does the schema validator report easy-to-understand error messages? The answer is yes, as the following examples illustrate.

If you misspell log_level as logLevel then the schema validator reports the following error.

foo.cfg: the ’foo.logLevel’ variable is unknown

If log_level is set to "255" then the schema validator reports the following error.

foo.cfg: bad int value (’255’) for ’foo.log_level’:
outside the permitted range [0, 5]

If "car" appears instead of "green" in the colour column of price_list then the schema validator reports the following error.

foo.cfg: bad colour value (’car’) for the ’colour’ column
in row 1 of the ’foo.price_list’ table: should be one of:
’red’, ’green’, ’blue’

3.10.2  Schemas for Uid Entries

If you want to define a schema for a file that contains uid entries, then you specify the unexpanded form of uid names. For example, Figure 3.9 shows a schema for the recipes file in Figure 3.5.

Figure 3.9: Schema validation for the recipes file
String schema[] = new String[] {
    "uid-recipe = scope",
    "uid-recipe.name = string",
    "uid-recipe.ingredients = list[string]",
    "uid-recipe.uid-step = string"
};

3.11  Summary

The API of Config4* is simple. As demonstrated in Figures 3.1 and 3.2, a basic application needs just three steps to use Config4*: (1) create an empty Configuration object; (2) parse() a configuration file; and (3) call lookup<Type>() operations to access configuration variables in a type-safe manner. Doing that will enable the application to avail of some important benefits of Config4*, such as adaptable and centralized configuration.

Other features of Config4* can be accessed with a few extra operation calls.

These capabilities of Config4* are very powerful and flexible, yet the Config4* API remains extremely easy to use.


1
Section 4.2 explains why, before Java 1.5, it was difficult to access environment variables in Java, and how Config4J works around this difficulty.
2
The curl utility was discussed in Section 3.1.
