Chapter 2 Overview of Config4* Syntax

This chapter provides an overview of the syntax used in Config4* configuration files. A complete definition of the syntax is provided in Chapter 8.

2.1 Comments, Variables and Scopes

Figure 2.1 provides a simple example of a Config4* configuration file.

Figure 2.1: Example configuration file

 1  # this is a comment
 2  name = "Fred";
 3  greeting = "hello, " + name;
 4  some_names = ["Fred", "Mary", "John"];
 5  more_names = ["Sue", "Ann", "Kevin"];
 6  all_names = some_names + more_names;
 7  server.defaults {
 8      timeout = "2 minutes";
 9      log {
10          dir = "C:\foo\logs";
11          level = "0";
12      }
13  }
14  foo_srv {
15      @copyFrom "server.defaults";
16      log.level = "1";
17  }
18  bar_srv {
19      @copyFrom "server.defaults";
20      timeout = "30 seconds";
21  }

Comments, like the one shown in line 1, start with "#" and continue until the end of the line. Most of the lines in a configuration file contain assignment statements. These are of the form name=value, where the value can be a string (line 2) or a list of strings (line 4). You can use the "+" operator to concatenate both strings (line 3) and lists (line 6). Strings are usually delimited by double quotes.
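As a worked illustration of the "+" operator, the fragment below writes out the values that lines 3 and 6 of Figure 2.1 produce. It is only a sketch of the result and assumes nothing beyond what the figure defines.

```cfg
# Same value as line 3 of Figure 2.1
greeting = "hello, Fred";
# Same value as line 6 of Figure 2.1
all_names = ["Fred", "Mary", "John", "Sue", "Ann", "Kevin"];
```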

There are two ways to write a string. The first way (which is illustrated in Figure 2.1) is as a sequence of characters enclosed within double quotes. Within such a string, "%" acts as an escape character. For example, %n denotes a newline character, and %" denotes a double quote.

The second way to write a string is as a (possibly multi-line) sequence of characters enclosed between <% and %>. No escape sequences are recognised between <% and %>. The <%...%> notation is useful if you want to embed, say, a code segment in a configuration file. You can combine both forms of string by using the string concatenation ("+") operator.
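The following sketch shows the two forms of string side by side. The variable names (help_text, query and report) are hypothetical and chosen purely for illustration.

```cfg
# Quoted form: %" denotes a double quote and %n denotes a newline
help_text = "Type %"quit%" to exit.%n";

# Raw form: the text is taken literally, so the double quotes
# around admin need no escaping
query = <%
SELECT name FROM users WHERE role = "admin"
%>;

# The two forms combined with the "+" operator
report = "Query to run:%n" + query;
```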

A configuration file can contain named scopes (lines 7, 9, 14, and 18 in Figure 2.1). Scopes can be nested (line 9) and re-opened. The scoping operator is ".". For example, the name log.level refers to a variable called level inside a scope called log. You do not have to explicitly open a scope to define a variable or a nested scope within it. For example, line 7 opens the server.defaults scope without opening the outer server scope. Likewise, line 16 defines log.level without explicitly opening the log scope.
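A minimal sketch of re-opening a scope and of using the scoping operator is shown below. The scope and variable names are hypothetical.

```cfg
server {
    host = "localhost";
}

# Re-opening the scope adds to it, so host remains defined
server {
    port = "8080";
}

# Defines level inside server.log without explicitly opening the log scope
server.log.level = "2";
```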

2.2 Copying Default Values

All keywords (for example, @include, @if and @copyFrom) start with the "@" symbol: this ensures there can never be a clash between the name of a keyword and the name that you might wish to use for a configuration variable or scope.

The @copyFrom statement (lines 15 and 19 in Figure 2.1) copies the entire contents (variables and nested scopes) of the specified scope into the current scope. This provides a simple, yet effective, reuse mechanism. For example, if several applications use similar configuration values then you can put common values into one scope and then use the @copyFrom statement to copy these into application-specific configuration scopes. It is not an error to assign a new value to an existing variable. This makes it possible to override default values obtained via a @copyFrom statement.
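To make the effect concrete, the sketch below writes out what the foo_srv and bar_srv scopes of Figure 2.1 contain once the @copyFrom statements and the overriding assignments have been processed. It is an expansion by hand of the figure, not output produced by Config4*.

```cfg
# Effective contents of foo_srv (lines 14 to 17 of Figure 2.1)
foo_srv {
    timeout = "2 minutes";
    log {
        dir = "C:\foo\logs";
        level = "1";         # overridden by line 16
    }
}

# Effective contents of bar_srv (lines 18 to 21 of Figure 2.1)
bar_srv {
    timeout = "30 seconds";  # overridden by line 20
    log {
        dir = "C:\foo\logs";
        level = "0";
    }
}
```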

2.3 Including Other Files

An @include statement (not shown in Figure 2.1) includes the contents of another configuration file into the current one. For example:

@include "/tmp/foo.cfg";

You can use string concatenation to form the file name. For example:

@include fileToDir(configFile()) + "/subsystem1.cfg";
@include fileToDir(configFile()) + "/subsystem2.cfg";
@include fileToDir(configFile()) + "/subsystem3.cfg";

This example also uses fileToDir(configFile()), which is a combination of two built-in function calls that returns the name of the directory in which the configuration file being parsed resides. This technique of combining the @include command with these built-in functions enables you to split a (potentially) large amount of configuration information across several smaller files. Doing this can simplify the maintenance of configuration files.

Config4* has many built-in functions. You can find a complete list of them in Section 8.12. In Section 2.2, I mentioned that all keywords are prefixed with "@" to prevent the possibility of a clash between a keyword and the name that you might wish to use for a configuration variable or scope. For the same reason, all functions are suffixed with "(". Thus, fileToDir( is the start of a function call, but fileToDir is the name of a variable or scope.
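Going by the rule just stated, a fragment like the one below should be legal, although using a function's name for a variable is unlikely to be good style. The variable names are hypothetical.

```cfg
# No "(" follows, so this fileToDir is an ordinary variable
fileToDir = "just a string";

# "fileToDir(" is the start of a call to the built-in function
cfg_dir = fileToDir(configFile());
```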

2.4 Including the Output of Commands

The @include command can include not just files, but also the output resulting from executing arbitrary shell commands. For example, the curl utility (http://curl.haxx.se) is a command-line tool that can output the contents of a specified URL, such as a web page or a file at an FTP site. If you have curl installed on your computer, then a configuration file can have @include commands similar to those shown below.¹

@include "exec#curl -sS http://localhost/someFile.cfg";
@include "exec#curl -sS ftp://localhost/someFile.cfg";

As these examples illustrate, if the argument to an @include statement starts with "exec#" then the argument is executed as a shell command and the standard output from that command is included.

The ability to execute arbitrary commands is very flexible, but it poses a security risk. For example, we need to guard against a malicious person adding something like the following to a configuration file on Windows.

@include "exec#del /F /S /Q C:\";

Such a command would delete everything on the C: drive of the computer (somewhat similar to "exec#rm -rf /" on UNIX). Chapter 5 discusses the mechanism that Config4* provides to guard against such security threats.

2.5 Accessing the Environment

You can access environmental information in a configuration file. For example, you can use getenv("FOO_HOME") to access an environment variable called FOO_HOME.

install_dir = getenv("FOO_HOME");

You can use the replace() function to perform a search-and-replace, as the example below demonstrates.

install_dir = replace(getenv("FOO_HOME"), "\", "/");

In the above example, the replace() function replaces all occurrences of "\" with "/" in the specified string. This is a useful tactic when you run an application on Windows that insists on dealing with UNIX-style file and directory names.

You can use the exec("command") function to execute an external command and capture its standard output. For example, on both UNIX and Windows, the hostname command prints the name of the computer. You can access this information as shown in the following example:

url = "http://" + exec("hostname") + ":8080/";
log_dir = "/net/" + exec("hostname") + "/logs";

2.6 Temporary Variables

Sometimes you may want several variables to have values that share a common prefix. Rather than explicitly (re)stating the common prefix several times, you might decide to assign it to a temporary variable, use that temporary variable to help you define the “real” variables, and then finally @remove the temporary variable. The example below illustrates this.

_install_dir = getenv("FOO_HOME");
bin_dir = _install_dir + "/bin";
etc_dir = _install_dir + "/etc";
log_dir = _install_dir + "/logs";
@remove _install_dir;

As the above example illustrates, a convention is that the name of a temporary variable starts with an underscore. The @remove statement does what its name suggests: it removes the specified configuration variable.

You may wonder what is the point of removing a variable: why not just leave _install_dir in existence? The answer is that by insisting a configuration file contain only required variables, an application can make use of a schema validator that can perform extensive error checking on the contents of a configuration file. I will discuss schema validation later (Section 3.10).

2.7 The @if-then-@else Statement

By themselves, the exec() and getenv() functions (discussed earlier in this chapter) are of limited use. However, they become much more useful when combined with an @if-then-@else statement. You can see some examples of this in Figure 2.2.

Figure 2.2: Configuration file with advanced features

 1  production_hosts = ["pizza", "pasta", "zucchini"];
 2  test_hosts       = ["foo", "bar", "widget", "acme"];
 3  
 4  @if (exec("hostname") @in production_hosts) {
 5      server_x.port = "5000";
 6      server_y.port = "5001";
 7      server_z.port = "5002";
 8  } @elseIf (exec("hostname") @in test_hosts) {
 9      server_x.port = "6000";
10      server_y.port = "6001";
11      server_z.port = "6002";
12  } @else {
13      @error "This is not a production or test machine";
14  }
15  @if (osType() == "windows") {
16      tmp_dir = replace(getenv("TMP"), "\", "/");
17  } @else {
18      tmp_dir = "/tmp";
19  }

To reduce the chances of a mis-configured client application on a test machine accidentally communicating with a server application in production, some organizations use one set of server port numbers in testing, and a different set of server port numbers in production. Traditionally, this separation was accomplished by having one configuration file for test machines, and having another configuration file for production machines. However, the cascading @if-then-@else statement at lines 4–14 in Figure 2.2 shows it is possible to have a single configuration file that adapts itself to its environment.

The @error statement (line 13) instructs Config4* to stop parsing and instead report an error. This provides a way for a configuration file to report that it is being used outside of its intended domain.

The osType() function (line 15) returns a string, such as "windows" or "unix", that indicates the host operating system. If you want to check which variant of UNIX is being used then you can use exec("uname").
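A check on the UNIX variant might be sketched as follows. The variable name and directory paths are hypothetical. The sketch assumes that the output of exec("uname") can be compared directly against a string, in the same way that Figure 2.2 compares the output of exec("hostname") against list entries. On Linux systems, uname prints Linux.

```cfg
@if (osType() == "unix") {
    # uname is run only on UNIX hosts
    @if (exec("uname") == "Linux") {
        lib_dir = "/usr/lib/foo";
    } @else {
        lib_dir = "/opt/foo/lib";
    }
}
```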

2.8 Conditional @include and @copyFrom

By default, @include reports an error if the specified file does not exist. However, if you place @ifExists at the end of an @include statement, then @include does not complain about a non-existent file.

@include "/path/to/foo.cfg" @ifExists;

The conditional @include provides a way for an application’s configuration file to set default values and then include an optional user-specific configuration file to override default values. For example, the configuration file for a program called foo running on UNIX might be structured as shown below.²

# Set default configuration values
...
# Now optionally include user-specific overrides
@include getenv("HOME") + "/.foo.cfg" @ifExists;

You can use an "@ifExists" clause not just with an @include statement, but also with @copyFrom, as shown below.

override.pizza { ... }
override.pasta { ... }
foo_srv {
   # Set default values
   ...
   # Modify some values for particular hosts
   @copyFrom "override." + exec("hostname") @ifExists;
}

2.9 Append Assignment

The append assignment statement, which uses the "+=" operator, was introduced in version 1.2 of Config4*.

greeting = "Hello";
greeting += ", world";

The second line in the above example is equivalent to the line below.

greeting = greeting + ", world";

The append assignment statement is often used in conjunction with the "@copyFrom" command, as shown below.

app.defaults {
  options = ["default", "options"];
  ...
}
my_app {
  @copyFrom "app.defaults";
  options += ["extra options"];
}
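Written out by hand, the my_app scope above ends up with the following value for options (the "..." stands for whatever else was copied from app.defaults).

```cfg
# Effective contents of my_app after @copyFrom and +=
my_app {
    options = ["default", "options", "extra options"];
    ...
}
```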

2.10 Conditional Assignment

Config4* provides a way for an application to integrate command-line options with a configuration file. To illustrate this, consider an application that is started in the following manner.

myApp.exe -set username Fred -set password fgTR742 -cfg foo.cfg

The application could be written to perform the following steps during initialisation.

1. The application creates an (initially empty) configuration object.
2. The application examines its command-line options. Whenever it encounters an option of the form "-set name value", it inserts that name-value pair into the configuration object.
3. Finally, the application uses the configuration object to parse the file specified by the "-cfg file" command-line option.

The above algorithm ensures that the command-line options processed in step 2 become “preset” variables in the configuration object when the configuration file is parsed (step 3).

Within a configuration file, the ?= operator performs conditional assignment; it assigns a value to a variable only if the variable does not already have a value.

username ?= "";
password ?= "";

In this way, a configuration file can provide default values for some variables, and those default values can be overridden via command-line options on the application.
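As a sketch of the outcome, suppose foo.cfg contains the lines below and the application is started with the command line shown earlier. The log_level variable is hypothetical and is included only to show a default that is not overridden.

```cfg
# foo.cfg, parsed after "-set username Fred -set password fgTR742"
username  ?= "";     # already preset, so it remains "Fred"
password  ?= "";     # already preset, so it remains "fgTR742"
log_level ?= "1";    # not preset, so it becomes "1"
```

Note that a plain "=" assignment in the file would replace a preset value, because assigning a new value to an existing variable is not an error (Section 2.2).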

2.11 Centralizable and Adaptive Configuration

Some basic capabilities of Config4*, for example, name=value pairs and scopes, can be found in other configuration technologies. However, many of its other capabilities are not so common.

You can use getenv() to access a named environment variable, such as HOME or USERNAME. You can also use osType() to determine the operating system’s type.
You can use exec() to capture the output from executing an external command, such as hostname or (on UNIX) uname.
You can pass the results of getenv(), exec() or osType() as arguments to @include or @copyFrom statements, or use them in conditions in @if-then-@else statements.

These capabilities mean that one Config4* file can contain configuration for multiple users, running an application on multiple computers and multiple operating systems. Or to put it another way: a configuration file can “adapt” itself to its environment. I call this ability adaptive configuration.
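One way to sketch this adaptation is to build a scope name from the result of osType(). The scope and variable names below are hypothetical, and the sketch assumes that osType() returns either "windows" or "unix" on the hosts concerned.

```cfg
defaults.windows {
    tmp_dir = "C:/temp";
}
defaults.unix {
    tmp_dir = "/tmp";
}
my_app {
    # Copies defaults.windows or defaults.unix, depending on the host
    @copyFrom "defaults." + osType();
}
```

Adding @ifExists (Section 2.8) to the @copyFrom statement would make the copy optional on hosts for which no matching defaults scope has been defined.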

The ability of Config4* to parse not just a configuration file but also the output of external commands, such as curl, makes it possible for an organization to centralize the adaptive configuration files of Config4*-enabled applications. Such centralization can significantly reduce administration overheads, especially when a large organization deploys an application on hundreds, thousands, or even tens of thousands, of computers.

2.12 The uid- prefix

The discussion in this chapter so far has focussed on using Config4* to store configuration information, which, in essence, is simple data in the form of name=value pairs, optionally organised into scopes. In this section, I discuss an additional feature of Config4* that makes it possible to store more complex data in Config4* files, thus greatly expanding the potential range of uses of Config4*.

Let’s assume you want to store some information about employees in a configuration file. You might try writing the following.

employee { name = "John Smith"; ... }
employee { name = "Jane Doe"; ... }

However, that will not work. This is because the second occurrence of the employee scope re-opens the existing scope, so the details of Jane Doe overwrite those of John Smith. You could work around this by using a unique number as a suffix on the name of each scope.

employee_1 { name = "John Smith"; ... }
employee_2 { name = "Jane Doe"; ... }

This will work, but you have to keep track of the numbers that have been used already to ensure you do not accidentally reuse one of those numbers in the name of a new scope. Config4* eliminates this burden by treating an identifier (that is, the name of a scope or variable) in a special way if it starts with "uid-"; uid is an abbreviation for unique identifier. Consider the following file.

uid-employee { name = "John Smith"; ... }
uid-employee { name = "Jane Doe"; ... }

Config4* keeps a counter that starts at zero and is incremented for each identifier starting with "uid-". Config4* automatically renames these identifiers so that the counter (expressed as a nine-digit number) is embedded in them. For example, the first occurrence of uid-employee might be renamed as uid-000000000-employee, the next occurrence renamed as uid-000000001-employee, the next occurrence renamed as uid-000000002-employee and so on.³
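Assuming the counter is at zero when the file above is parsed, the renaming rule just described means the file is treated as if it had been written as follows.

```cfg
uid-000000000-employee { name = "John Smith"; ... }
uid-000000001-employee { name = "Jane Doe"; ... }
```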

You might be wondering why the unique number is always expressed as nine digits with leading zeros. The reason has to do with how Config4* is implemented. When Config4* parses a configuration file it stores all the entries (that is, variables and scopes) in hash tables. Hash tables provide a fast lookup mechanism but they do not preserve the order in which the entries were originally defined in the input file. However, the API of Config4* makes it easy for a program to get a sorted list of entries. Expressing uid numbers as nine digits with leading zeros guarantees that a sorted list of entries contains all the uid entries in the order in which they appeared in the input file. This makes it possible for a program to process uid entries in their original order, if desired.

As a slightly contrived example of the use of uid entries, consider a file that stores recipes, like that in Figure 2.3. Each recipe is stored in its own uid-recipe scope. I do not care about the order of recipes, but the "uid-" prefix frees me from the burden of having to think of a unique name for the scope of each recipe. Within a uid-recipe scope, the relative order of the ingredients and name entries is not important, so they do not have a "uid-" prefix. However, the steps in the recipe must be performed in strict sequence, so they have a "uid-" prefix.

Figure 2.3: File of recipes

uid-recipe {
    name = "Tea";
    ingredients = ["1 tea bag", "cold water", "milk"];
    uid-step = "Pour cold water into the kettle";
    uid-step = "Turn on the kettle";
    uid-step = "Wait for the kettle to boil";
    uid-step = "Pour boiled water into a cup";
    uid-step = "Add tea bag to cup & leave for 3 minutes";
    uid-step = "Remove tea bag";
    uid-step = "Add a splash of milk if you want";
}
uid-recipe {
    name = "Toast";
    ingredients = ["Two slices of bread", "butter"];
    uid-step = "Place bread in a toaster and turn on";
    uid-step = "Wait for toaster to pop out the bread";
    uid-step = "Remove bread from toaster and butter it";
}
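The sketch below shows how the entries in Figure 2.3 might be renamed. It assumes a single counter that starts at zero and is shared by every "uid-" identifier in the file, as the description above suggests, so the exact numbers are illustrative. Sorting the entries of a recipe by name yields its steps in their original order.

```cfg
uid-000000000-recipe {
    name = "Tea";
    ingredients = ["1 tea bag", "cold water", "milk"];
    uid-000000001-step = "Pour cold water into the kettle";
    uid-000000002-step = "Turn on the kettle";
    uid-000000003-step = "Wait for the kettle to boil";
    uid-000000004-step = "Pour boiled water into a cup";
    uid-000000005-step = "Add tea bag to cup & leave for 3 minutes";
    uid-000000006-step = "Remove tea bag";
    uid-000000007-step = "Add a splash of milk if you want";
}
uid-000000008-recipe {
    name = "Toast";
    ingredients = ["Two slices of bread", "butter"];
    uid-000000009-step = "Place bread in a toaster and turn on";
    uid-000000010-step = "Wait for toaster to pop out the bread";
    uid-000000011-step = "Remove bread from toaster and butter it";
}
```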

Although most readers will not be interested in using Config4* to store recipes, the issues I described in that example often occur in real-world systems. A typical case is Ant (http://ant.apache.org), which is a popular build system for Java-based applications (in much the same way that make is a popular build system for C and C++ applications). Ant reads a build specification from an XML file. The build file contains, among other things, a collection of target elements that are analogous to a “recipe” for compiling or packaging a unit of software. Within each target element there is an ordered collection of tasks, which are analogous to the ordered “steps” within a recipe. A target may also have a list of targets upon which it depends; in Config4* this could be expressed as a non-uid variable, similar to ingredients in Figure 2.3.
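To illustrate the analogy, an Ant-like target might be sketched in Config4* syntax as shown below. Every name in this fragment (uid-target, depends, uid-task, type and so on) is hypothetical. It is not a format that Ant or Config4* defines.

```cfg
uid-target {
    name = "compile";
    # A non-uid variable, similar to ingredients in Figure 2.3
    depends = ["init"];
    # Tasks must run in sequence, so they are uid entries
    uid-task { type = "mkdir"; dir = "build/classes"; }
    uid-task { type = "javac"; srcdir = "src"; destdir = "build/classes"; }
}
```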

2.13 Summary

Config4* has several features, such as name=value pairs, scopes and an @include statement, that are common to many other configuration technologies. However, Config4* provides additional capabilities that are rarer and very useful.

Adaptive configuration. A Config4* file can use getenv(), exec() and osType() to query its environment, and the results of these queries can be used in @if-then-@else, @include and @copyFrom statements. This enables a configuration file to adapt to its environment. In addition, conditional assignment (the ?= operator) enables a configuration file to take account of command-line arguments.
Centralised configuration. Config4* can parse not just a configuration file, but also the output of executing a command. Combining this capability with curl makes it feasible to store a configuration file in a centralised location, such as a web server. Such centralization can significantly reduce administration overheads, especially when a large organization deploys an application on hundreds or thousands of computers.
Uid entries. The "uid-" prefix makes it possible for Config4* to be used to store not just simple configuration files but also complex, structured data in which there may be multiple items of a similar nature, or in which a guaranteed ordering of items is important.

This chapter has presented an overview of the syntax used in a Config4* configuration file (you can find full details in Chapter 8). The next chapter provides an overview of the API provided by Config4* for C++ and Java programmers.

1: By default, curl prints diagnostics to standard error. The -s option instructs curl to be silent, but unfortunately, this option means that curl does not print error messages either. You can use -sS to instruct curl to print error messages but no other diagnostics.
2: In UNIX, the HOME environment variable specifies the “home” directory for a user, which is where a user normally stores personal files. By convention, the name of a configuration file for an application stored in this directory starts with ".", and is followed by the name of the application.
3: The use of a nine-digit number means that Config4* can cope with up to 10⁹ uid entries. This number is what most English-speaking countries call a billion, but many other countries call a thousand million (and they use the term billion to mean 10¹², that is, a million million): http://en.wikipedia.org/wiki/Long_and_short_scales. In the extremely unlikely event that you exceed the limitation of 10⁹ uid entries, the Config4* Maintenance Guide explains how you can make simple changes to the source code of Config4* to increase the limit.