Chapter 8 Configuration File Syntax

8.1 Introduction

This chapter discusses all the syntax acceptable in configuration files. Figure 8.1 provides a formal grammar for most of the syntax but, for brevity, the grammar omits some definitions. For example, the lexical definitions of comments, strings and identifiers are discussed in text rather than being defined in the grammar of Figure 8.1. Likewise, the string and list functions (denoted by StringFunction and ListFunction in the grammar) are discussed in text rather than being defined in the grammar.

Figure 8.1: Formal grammar of Config4* syntax

Notation: | denotes choice, [...] denotes an optional component,
{...}* denotes 0 or more repetitions, and (...) denotes grouping.

configFile     = StmtList
StmtList       = { Stmt }*
Stmt           = IDENTIFIER ( "=" | "?=" | "+=" ) StringExpr ";"
               | IDENTIFIER ( "=" | "?=" | "+=" ) ListExpr ";"
               | IDENTIFIER "{" StmtList "}" [ ";" ]
               | "@include" StringExpr [ "@ifExists" ] ";"
               | "@copyFrom" StringExpr [ "@ifExists" ] ";"
               | "@remove" IDENTIFIER ";"
               | "@error" StringExpr ";"
               | "@if" "(" Condition ")" "{" StmtList "}"
                 { "@elseIf" "(" Condition ")" "{" StmtList "}" }*
                 [ "@else" "{" StmtList "}" ]
                 [ ";" ]
StringExpr     = String { "+" String }*
String         = STRING
               | IDENTIFIER
               | StringFunction
ListExpr       = List { "+" List }*
List           = "[" StringExprList [ "," ] "]"
               | IDENTIFIER
               | ListFunction
StringExprList = empty
               | StringExpr { "," StringExpr }*
Condition      = OrCondition
OrCondition    = AndCondition { "||" AndCondition }*
AndCondition   = TermCondition { "&&" TermCondition }*
TermCondition  = [ "!" ] "(" Condition ")"
               | StringExpr "==" StringExpr
               | StringExpr "!=" StringExpr
               | StringExpr "@in" ListExpr
               | StringExpr "@matches" StringExpr

8.2 Comments

A comment starts with the "#" character and continues until the end of the line, as shown in the example below:

# This is a comment

Comments are removed by the lexical analyser, which is why they are not mentioned in the formal grammar in Figure 8.1.

8.3 Strings

There are two ways to write a STRING.

The first way is as a sequence of characters enclosed within double quotes. Within such a string, "%" acts as an escape character. The recognized escape sequences are as follows.

%n denotes a newline character.
%t denotes a TAB character.
%" denotes a double quote.
%% denotes a percent sign.

Many programming languages use a backslash ("\") as an escape character so the use of "%" may seem strange to some people. However, in my experience, using "\" as an escape character results in awkwardness when writing Windows-style directory names, such as C:\temp\foo.txt, which normally have to be written as C:\\temp\\foo.txt. Config4* uses "%" as the escape character to avoid this problem.
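
The escape rules above can be modelled in a few lines of Python. This is only a sketch of the described behaviour, not the Config4* lexer; the function name decode_escapes is hypothetical, and raising an error for an unrecognized escape sequence is an assumption.

```python
# Hypothetical helper that models the "%" escape sequences listed above.
# It illustrates the rules; it is not Config4*'s actual lexer.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "%": "%"}


def decode_escapes(body: str) -> str:
    """Decode the text that appears between the double quotes of a STRING."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "%":
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                # Assumption: an unrecognized escape is reported as an error.
                raise ValueError(f"invalid escape sequence at offset {i}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            # Every other character, including a backslash, is kept as-is.
            out.append(ch)
            i += 1
    return "".join(out)


windows_path = decode_escapes(r"C:\temp\foo.txt")  # backslashes pass through
message = decode_escapes("100%% sure%n")           # "%%" -> "%", "%n" -> newline
```

Because the backslash has no special meaning in this model, a Windows-style path needs no doubling of its separators.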

The second way to write a string is as a (possibly multi-line) sequence of characters enclosed between <% and %>. No escape sequences are recognised between <% and %>. If the <%...%> notation seems familiar to some readers it is because this notation is borrowed from Java Server Pages (JSP). The <%...%> notation is useful if you want to embed, say, a code segment in a configuration file.

You can combine both forms of string by using the string concatenation ("+") operator.

big_string = <%
    ... // some Java code
%> + "<%" + <%
    ... // some more Java code
%>;

8.4 Identifiers

An IDENTIFIER is a sequence of one or more of the following characters: upper- or lower-case letters, digits, a minus sign ("-"), an underscore ("_"), a colon (":"), a period ("."), a dollar sign ("$"), a question mark ("?"), a forward slash ("/") or a backslash ("\"). There are two comments to be made about this range of allowable characters.

First, one goal of Config4* is to support internationalization, so accented characters (such as "á" and "ö") and ideographs are permitted in an IDENTIFIER. Likewise, the digits permitted in an IDENTIFIER include the ASCII digits ("0" through "9") as well as digits used in other scripts.¹

Second, a Config4* IDENTIFIER should be able to support names not just in many human languages, but also names in many computer languages. For example, ensuring that Foo$Bar, X::Y::Z, and done? are valid identifiers makes it possible for a Config4* file to store meta-data about applications written in many popular programming languages, such as C++, Java, Perl and Ruby. Likewise, permitting "/" and "\" in identifiers enables a Config4* file to contain meta-data about file names and (a useful subset of) URLs.

Config4* applies special treatment to any identifier that starts with "uid-", for example, uid-foo. The "uid-" prefix denotes a unique identifier; you can read the motivation for such identifiers in Section 2.12. Config4* modifies the name of a "uid-" prefixed variable by inserting a sequence of nine digits and "-" after "uid-". For example, uid-foo might be changed to uid-000000042-foo. The nine-digit number starts at zero and is incremented by one for every encounter of an identifier that has a "uid-" prefix.

If Config4* encounters an identifier starting with "uid-<digits>-", then the digits are replaced with a newly generated nine-digit number. This is to ensure correct behaviour in pathological cases such as the following. Consider a configuration file that contains multiple uid-foo identifiers. If this file is parsed and the dump() operation is used to save the parsed file to, say, expanded-uid.cfg, then the newly written file may contain identifiers of the form uid-<digits>-foo. Now consider another file of the form:

uid-foo { ... };
uid-foo { ... };
uid-foo { ... };
@include "expanded-uid.cfg";

When parsing the above file, it is necessary to replace the digits of the uid-foo entries contained in the expanded-uid.cfg file to ensure they do not conflict with the expanded form of the uid-foo entries defined before the @include command.
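
The renaming rule for "uid-" identifiers can be sketched in Python as follows. The class name UidExpander is hypothetical, and the sketch handles only a local (unscoped) identifier; it models the behaviour described above rather than reproducing the Config4* source code.

```python
import re


class UidExpander:
    """Hypothetical model of how "uid-" identifiers are expanded."""

    def __init__(self) -> None:
        self.counter = 0  # the nine-digit number starts at zero

    def expand(self, identifier: str) -> str:
        if not identifier.startswith("uid-"):
            return identifier  # ordinary identifiers are left unchanged
        rest = identifier[len("uid-"):]
        # Drop an existing "<digits>-" part so that a fresh number is used.
        rest = re.sub(r"^\d+-", "", rest)
        expanded = f"uid-{self.counter:09d}-{rest}"
        self.counter += 1
        return expanded


expander = UidExpander()
names = [expander.expand(n)
         for n in ["uid-foo", "uid-foo", "uid-000000000-foo", "timeout"]]
# names == ["uid-000000000-foo", "uid-000000001-foo",
#           "uid-000000002-foo", "timeout"]
```

The third input already carries digits, as an entry read from expanded-uid.cfg would, yet it receives a new number and so cannot clash with the first entry.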

8.5 Assignment Statements

An unconditional assignment statement takes the form:

name = value;

A conditional assignment statement takes the form:

name ?= value;

A conditional assignment statement assigns a value to the specified variable only if the variable does not already have a value.

An append assignment statement takes the form:

name += value;

An append assignment statement appends the specified value to an already-existing variable.

Note that all three forms of the assignment statement are terminated with a semicolon (";"). A value can be either a string or a list of comma-separated strings inside matching "[" and "]":

local_domain = "bar.com";          # a string
some_fonts = ["Times", "Courier"]; # a list
some_fonts += ["Garamond"];

You can use the "+" operator to concatenate strings and lists.

host = "foo." + local_domain;
all_fonts = some_fonts + ["Arial", "Symbol"];

The above example also illustrates that one variable can be defined in terms of a previously defined variable. For example, the host variable is defined by concatenating together a string and the local_domain variable.
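
The three assignment operators can be modelled with a Python dictionary that maps variable names to values. This is a sketch under stated assumptions, not the Config4* implementation: the function names are hypothetical, and "+=" is assumed to be applied to an existing variable of the same type (string or list) as the value being appended.

```python
# Hypothetical model: a configuration is a dict from names to values,
# where each value is either a str or a list of str.
def assign(cfg, name, value):
    """name = value;   always sets the variable."""
    cfg[name] = value


def assign_if_unset(cfg, name, value):
    """name ?= value;  sets the variable only if it has no value yet."""
    if name not in cfg:
        cfg[name] = value


def append(cfg, name, value):
    """name += value;  appends to an already-existing variable."""
    # Assumption: the variable exists and has the same type as value.
    cfg[name] = cfg[name] + value


cfg = {}
assign(cfg, "local_domain", "bar.com")
assign_if_unset(cfg, "local_domain", "ignored.com")  # no effect: already set
assign(cfg, "some_fonts", ["Times", "Courier"])
append(cfg, "some_fonts", ["Garamond"])
assign(cfg, "host", "foo." + cfg["local_domain"])
# cfg["host"] == "foo.bar.com"
# cfg["some_fonts"] == ["Times", "Courier", "Garamond"]
```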

8.6 Scopes

A configuration file can contain named scopes. The following example defines a scope called server that contains several assignment statements.

server {
    name              = "bankSrv";
    timeout           = "2 minutes";
    diagnostics_level = "2";
}

You can optionally place a semicolon after the closing "}" of a scope. The reason for this is that a scope looks a bit like a class definition in C++ or Java. A semicolon appears after the class definition in C++ but not in Java.

class Foo { ... }; // C++
class Bar { ... }  // Java

Being flexible about whether or not a semicolon follows the closing "}" of a scope makes it easy for people who come from a C++ or Java background.

You cannot use an @include statement (discussed in Section 8.7) inside a scope. Instead, @include statements can be used only in the global scope.

The fully scoped name of a variable is its local name prefixed by the name of its enclosing scope and separated by ".". In the example at the start of this section, the fully scoped name of timeout is server.timeout. Use of scopes enables users to type local (that is, the short form of) names rather than the longer, fully scoped names. The example at the start of this section is equivalent to the following, more verbose example, which does not use scopes:

server.name              = "bankSrv";
server.timeout           = "2 minutes";
server.diagnostics_level = "2";

You can re-open scopes and nest them arbitrarily. For example:

outer {
    inner {
        foo = "Hello, world";
    };
};
outer.inner { # re-opening of scope
    bar = "Goodbye, world";
};

When a variable is used in an expression, the search for that variable usually starts at the current scope and works outwards. You can override this search order by prefixing the variable with a dot; this instructs Config4* to look for the specified variable in the global scope. For example, the value of outer.inner.food_1 below is "apples and oranges", while the value of outer.inner.food_2 is "apples and bananas".

fruit = "bananas";
outer {
    fruit = "oranges";
    inner {
        food_1 = "apples and " + fruit;
        food_2 = "apples and " + .fruit;
    };
};
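
The search order for variables can be sketched in Python. The sketch stores every variable under its fully scoped name in a dictionary; the function name resolve and this flat representation are assumptions made for illustration and are not taken from Config4* itself.

```python
def resolve(cfg, current_scope, name):
    """Return the fully scoped name that `name` refers to, or None."""
    if name.startswith("."):
        # A leading dot means: look only in the global scope.
        candidate = name[1:]
        return candidate if candidate in cfg else None
    parts = current_scope.split(".") if current_scope else []
    # Try the current scope first, then each enclosing scope, then global.
    for depth in range(len(parts), -1, -1):
        candidate = ".".join(parts[:depth] + [name])
        if candidate in cfg:
            return candidate
    return None


cfg = {"fruit": "bananas", "outer.fruit": "oranges"}
food_1 = "apples and " + cfg[resolve(cfg, "outer.inner", "fruit")]
food_2 = "apples and " + cfg[resolve(cfg, "outer.inner", ".fruit")]
# food_1 == "apples and oranges"
# food_2 == "apples and bananas"
```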

8.7 The @include Statement

An @include statement instructs Config4* to parse the specified configuration file.

@include "/tmp/foo.cfg";

By default, @include reports an error if the specified file does not exist. However, if you place "@ifExists" at the end of an @include statement then @include does not complain about a non-existent file.

@include "/tmp/foo.cfg" @ifExists;

The @include command can parse not just files, but also the output of executing an external command. This is done by using a string of the form "exec#command" as an argument to @include.

@include "exec#curl -sS http://localhost/someFile.cfg";

By default, @include reports an error if the specified command exits with an error status. You can instruct Config4* to ignore the unsuccessful execution of an @include command by placing "@ifExists" at the end of the @include statement.

@include "exec#curl -sS http://localhost/someFile.cfg" @ifExists;

Version 1.2 of Config4J introduces an additional, and Java-specific, form of the @include statement, in which the file to be included is specified on the classpath.

@include "classpath#path/to/file.cfg";

8.8 The @copyFrom Statement

The @copyFrom statement takes the following form:

@copyFrom "scope";

This command copies all the variables and nested scopes from the specified scope into the current scope. The typical use of this command is to copy default values from one scope into several other scopes, as Figure 8.2 shows.

Figure 8.2: Examples of the @copyFrom statement

acme {
    defaults {
        log {
            dir   = "C:\acme\logs";
            level = "0";
        };
        timeout = "2 minutes";
        thread_pool_size = "5";
    };
    app_1 {
        @copyFrom "acme.defaults";
    };
    app_2 {
        @copyFrom "acme.defaults";
        log.level = "1";
    };
    app_3 {
        @copyFrom "acme.defaults";
        thread_pool_size = "10";
    };
};

In this example, the acme.defaults scope contains all the configuration variables likely to have similar values in most of the applications (denoted by the scopes acme.app_1, acme.app_2 and acme.app_3). Then the scope for a particular application, for example, acme.app_1, uses the @copyFrom command to copy the values from the acme.defaults scope. Notice that the acme.app_2 and acme.app_3 scopes copy all the values from the acme.defaults scope and then selectively override some values.
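
The copy-then-override pattern of Figure 8.2 can be modelled in Python with nested dictionaries standing in for scopes. The function name copy_from is hypothetical, and the sketch simply overwrites any entry of the same name that already exists in the target scope, which is an assumption rather than documented Config4* behaviour.

```python
import copy


def copy_from(root, source_path, target):
    """Copy every variable and nested scope of `source_path` into `target`."""
    source = root
    for part in source_path.split("."):
        source = source[part]  # raises KeyError if the scope does not exist
    for name, value in source.items():
        # Deep copy, so that later overrides do not change the source scope.
        target[name] = copy.deepcopy(value)


acme = {
    "defaults": {
        "log": {"dir": r"C:\acme\logs", "level": "0"},
        "timeout": "2 minutes",
        "thread_pool_size": "5",
    },
    "app_2": {},
}
root = {"acme": acme}

copy_from(root, "acme.defaults", acme["app_2"])
acme["app_2"]["log"]["level"] = "1"  # selectively override one value
# acme["app_2"]["log"]["level"] == "1", while
# acme["defaults"]["log"]["level"] is still "0"
```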

When using the @copyFrom statement, you must specify the fully scoped name of the scope to be copied. For example, the @copyFrom statements in Figure 8.2 specify the scope as acme.defaults rather than as just defaults. If a configuration file contains deeply nested scopes, then specifying the fully scoped name of a scope to be copied can result in undesirable verbosity. However, Section 8.12.6 explains how the siblingScope() function can reduce such verbosity.

By default, @copyFrom reports an error if the specified scope does not exist. However, if you place "@ifExists" at the end of an @copyFrom statement then @copyFrom does not complain about a non-existent scope.

@copyFrom "acme.defaults" @ifExists;

The @ifExists form of the @copyFrom command can be used to override some variables based on, for example, the operating system, the user running the application or the host on which the application is running.

override.pizza { ... }
override.pasta { ... }
fooSrv {
   # Set default values
   ...
   # Modify some values for particular hosts
   @copyFrom "override." + exec("hostname") @ifExists;
}

8.9 The @if-then-@else Statement

Figure 8.3 shows some examples of @if-then-@else statements.

Figure 8.3: Configuration file with advanced features

production_hosts = ["pizza", "pasta", "zucchini"];
test_hosts       = ["foo", "bar", "widget", "acme"];

@if (exec("hostname") @in production_hosts) {
    server_x.port = "5000";
    server_y.port = "5001";
    server_z.port = "5002";
} @elseIf (exec("hostname") @in test_hosts) {
    server_x.port = "6000";
    server_y.port = "6001";
    server_z.port = "6002";
} @else {
    @error "This is not a production or test machine";
}
@if (osType() == "windows") {
    tmp_dir = replace(getenv("TMP"), "\", "/");
} @else {
    tmp_dir = "/tmp";
}

The conditions used in @if-then-@else statements can be in any of the following formats.

"string" == "another string"
"string" != "another string"
"string" @in ["a", "list", "of", "string"]
"string" @matches "pattern". Within the pattern, "*" is a wildcard that matches zero or more characters. For example, the condition "hello" @matches "*lo" evaluates to true.
condition && condition. This is the boolean AND of two conditions.
condition || condition. This is the boolean OR of two conditions.
(condition). The parentheses are used for grouping.
!(condition). This is the negation of a condition.
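
The @in and @matches tests can be sketched in Python. This is a model of the behaviour described above, not the Config4* implementation; the function names are hypothetical, and "*" is the only character treated as special in a pattern.

```python
import re


def in_list(text, items):
    """Model of: "string" @in ["a", "list", "of", "string"]"""
    return text in items


def matches(text, pattern):
    """Model of: "string" @matches "pattern", where "*" is a wildcard."""
    # Escape every piece of the pattern literally, then let each "*"
    # stand for zero or more characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


is_production = in_list("pizza", ["pizza", "pasta", "zucchini"])  # True
ends_with_lo = matches("hello", "*lo")                            # True
```

The && and || operators and the ! prefix correspond directly to Python's and, or and not applied to the results of these functions.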

8.10 The @error Statement

The @error statement instructs Config4* to stop parsing and instead report an error.

@error "Something has gone wrong";

Config4* reports the error by throwing an exception back to application code. The application code should communicate the exception’s text message to the user, for example, by writing the text message to a console or displaying it in a GUI dialog box.

8.11 The @remove Statement

The @remove statement removes a previously-defined variable or scope. To see why the @remove command might be useful, let us assume you want to specify the full path names of several log files that happen to reside in the same directory. It would be tedious to write the full path name of the directory for each log file. Instead, you can define a temporary variable called, say, _log_dir and use it as follows:

_log_dir = "/path/to/log/dir";
app1_log_file = _log_dir + "/app1.log";
app2_log_file = _log_dir + "/app2.log";
app3_log_file = _log_dir + "/app3.log";
@remove _log_dir;

A useful convention shown in the above example is to use an underscore ("_") at the start of the name of a temporary variable. This makes it easy to see which variables are “normal” variables and which are temporary ones that will be removed later.

You may be wondering why temporary variables should be removed at all. There are two reasons for this. First, unneeded variables clutter up a configuration file and so can cause confusion for users. Second, by insisting a configuration file contain only required variables, an application can make use of a schema validator (Chapter 9) that can perform extensive error checking on the contents of a configuration file.

8.12 Functions

Table 8.1 lists the functions that Config4* provides.

Table 8.1: Config4* functions

Function                              Return type  Section
configFile()                          string       8.12.5
configType("name")                    string       8.12.6
exec("command")                       string       8.12.3
exec("command", "default value")      string       8.12.3
fileToDir("/path/to/file.txt")        string       8.12.5
getenv("name")                        string       8.12.2
getenv("name", "default value")       string       8.12.2
isFileReadable("fileName.txt")        boolean      8.12.6
join(["list", "of", "string"], " ")   string       8.12.4
osDirSeparator()                      string       8.12.1
osPathSeparator()                     string       8.12.1
osType()                              string       8.12.1
readFile("/path/to/file.txt")         string       8.12.5
replace("\a\b\c", "\", "/")           string       8.12.4
siblingScope("name")                  string       8.12.6
split("red green blue", " ")          list         8.12.4

Config4* considers the opening "(" to be part of a function name, so you cannot place a space before it. For example, Config4* accepts the first statement below but reports an error for the second statement:

x = configFile();  # okay
y = configFile (); # error

Treating the opening "(" as being part of a function’s name might seem strange, but Config4* does this to guarantee that the names of functions do not conflict with the names of variables or scopes. This makes it possible for future versions of Config4* to provide additional functions without any risk of the newly added functions causing problems for existing configuration files.

The following subsections discuss the functions in logical groupings.

8.12.1 Querying the Operating System

Some of the built-in functions have names starting with "os", which indicates they return information about the operating system environment.

The osType() function returns "windows" if you are running on a Microsoft Windows-based computer, and "unix" if you are running on a UNIX-based computer.

The osDirSeparator() function returns the character that the operating system uses as a directory separator. This is "\" on Windows and "/" on UNIX.

The osPathSeparator() function returns the character that the operating system uses to separate a list of directories. This is ";" on Windows and ":" on UNIX.

8.12.2 Accessing Environment Variables

The getenv() function enables you to access an environment variable. This function can take either one or two parameters. The first parameter is the name of the environment variable to access:

example = getenv("FOO_HOME");

The second (and optional) parameter to this function is a default value that is used if the specified environment variable does not exist:

example = getenv("FOO_HOME", "/tmp");

If you do not specify a default value and the specified environment variable does not exist then Config4* reports an error:

someFile.cfg, line 12: cannot access the ’FOO_HOME’
environment variable
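
The one-parameter and two-parameter forms of getenv() can be modelled in Python as shown below. The function name cfg_getenv and the exception type are hypothetical; the sketch does not reproduce the file name and line number that Config4* includes in its error message.

```python
import os

_NO_DEFAULT = object()  # sentinel: distinguishes "no default" from any string


def cfg_getenv(name, default=_NO_DEFAULT):
    """Return an environment variable, a default, or report an error."""
    value = os.environ.get(name)
    if value is not None:
        return value
    if default is _NO_DEFAULT:
        raise LookupError(f"cannot access the '{name}' environment variable")
    return default


# The variable name below is made up and is assumed not to be set.
home = cfg_getenv("CONFIG4_SKETCH_UNSET_VARIABLE", "/tmp")  # falls back to "/tmp"
```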

8.12.3 Executing External Commands

The exec() function executes an external command and returns whatever text that command writes to its standard output. This function can take either one or two parameters. The first parameter is the external command to execute, as the following examples illustrate:

example_1 = exec("hostname");
example_2 = exec("ls /tmp");
example_3 = exec("ls " + getenv("HOME", "/") );

The second (and optional) parameter to this function is a default value that is used if Config4* cannot successfully execute the specified external command:

example = exec("hostname", "localhost");

If you do not specify a default value and Config4* cannot successfully execute the specified external command then Config4* reports an error:

someFile.cfg, line 3: exec("ls /x/y/z") failed:
ls: /x/y/z: No such file or directory
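
The exec() function follows the same pattern as getenv(), which the Python sketch below models with the standard subprocess module. The function name cfg_exec is hypothetical. Two points are assumptions rather than documented behaviour: a non-zero exit status is what counts as an unsuccessful execution, and the trailing newline of the output is removed so that the result can be compared with plain strings.

```python
import subprocess

_NO_DEFAULT = object()  # sentinel: distinguishes "no default" from any string


def cfg_exec(command, default=_NO_DEFAULT):
    """Run a command and return its standard output, or a default on failure."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode == 0:
        # Assumption: the trailing newline is not part of the returned value.
        return result.stdout.rstrip("\r\n")
    if default is _NO_DEFAULT:
        raise RuntimeError(f'exec("{command}") failed: {result.stderr.strip()}')
    return default


greeting = cfg_exec("echo hello")
fallback = cfg_exec("exit 3", "localhost")  # the command fails, so the default is used
```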

8.12.4 Manipulating Strings and Lists

The example below illustrates the split() and join() functions:

colours_and_spaces = "red green blue";
colour_list = split(colours_and_spaces, " ");
colours_and_commas = join(colour_list, ",");

The split() function takes two parameters. The first parameter is a string to be broken up into a list of smaller strings. The second parameter indicates a search string; the first string is broken into list elements at each occurrence of this search string. In the above example, colour_list is assigned the value ["red", "green", "blue"].

The join() function is the opposite of split(). It takes two parameters; the first parameter is a list and the second parameter is a string. The join() function concatenates all the elements of the list using the string as a separator. In the above example, colours_and_commas is assigned the value "red,green,blue".

In the above example, the overall effect of using split() and join() is to replace all spaces in a string with commas. To make this easier, Config4* provides a replace() function.

colours_and_commas=replace("red green blue", " ", ",");

The replace() function takes three string parameters: original, search and replacement. This function replaces all occurrences of the search string in the original string with the replacement string.
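
For readers who know Python, the three functions behave much like the built-in string methods split(), join() and replace(). The wrapper names below are hypothetical and serve only to show the parameter order used by Config4*; boundary cases, such as an empty search string, are not modelled.

```python
def cfg_split(text, separator):
    """split(text, separator): break text at each occurrence of separator."""
    return text.split(separator)


def cfg_join(items, separator):
    """join(list, separator): concatenate the elements, separator in between."""
    return separator.join(items)


def cfg_replace(original, search, replacement):
    """replace(original, search, replacement): replace all occurrences."""
    return original.replace(search, replacement)


colour_list = cfg_split("red green blue", " ")      # ["red", "green", "blue"]
colours_and_commas = cfg_join(colour_list, ",")     # "red,green,blue"
same_result = cfg_replace("red green blue", " ", ",")
```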

8.12.5 Files and Directories

The configFile() function does not take any parameters; it returns the name of the configuration file being parsed.

The fileToDir() function takes one parameter (the name of a file) and returns the name of the directory in which that file resides. The returned directory name is guaranteed to not have "/" or "\" at the end. For example, fileToDir("/tmp/foo.cfg") returns "/tmp". As Table 8.2 shows, the fileToDir() function works even for boundary cases, such as for files in the root directory of a file system.

Table 8.2: Example results of calling fileToDir()

filename            fileToDir(filename)
"/tmp/foo.cfg"      "/tmp" (UNIX and Windows)
"C:\tmp\foo.cfg"    "C:\tmp" (Windows only)
"foo.cfg"           "." (UNIX and Windows)
"/foo.cfg"          "/." (UNIX and Windows)
"\foo.cfg"          "\." (Windows only)
"C:\foo.cfg"        "C:\." (Windows only)

The combination fileToDir(configFile()) returns the directory in which the configuration file being parsed resides. This can be useful if you want to write a top-level configuration file that includes other configuration files that reside within the same directory.

@include fileToDir(configFile()) + "/file1.cfg";
@include fileToDir(configFile()) + "/file2.cfg";
@include fileToDir(configFile()) + "/file3.cfg";

This technique can work even if the configuration file is hosted on a web server and is being accessed through the curl utility. To see why, let’s assume the top-level configuration file is specified as:

exec#curl -sS http://myHost/foo/foo.cfg

Config4* will execute that command and then parse its output. During this parsing, the configFile() function returns:

exec#curl -sS http://myHost/foo/foo.cfg

The fileToDir() function does not check that its parameter is a valid file name; rather it just trims its parameter back to the last occurrence of "/", so the result of fileToDir(configFile()) is:

exec#curl -sS http://myHost/foo

The first @include statement in the example appends "/file1.cfg", so the @include statement becomes:

@include "exec#curl -sS http://myHost/foo/file1.cfg";
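
The trimming rule of fileToDir() can be sketched in Python. The function below is a model that reproduces the results listed in Table 8.2 and the curl example above; it is not the Config4* source code, and the name file_to_dir and the windows parameter are hypothetical.

```python
def file_to_dir(path, windows=False):
    """Model of fileToDir(): trim path back to its last directory separator."""
    # "\" counts as a separator only on Windows.
    separators = "/\\" if windows else "/"
    last = max(path.rfind(sep) for sep in separators)
    if last == -1:
        return "."  # no directory part, for example "foo.cfg"
    is_root = last == 0
    is_drive_root = windows and last == 2 and path[1] == ":"
    if is_root or is_drive_root:
        # Keep the root separator and add "." so that the result
        # does not end with a separator.
        return path[: last + 1] + "."
    return path[:last]


config_file = "exec#curl -sS http://myHost/foo/foo.cfg"
include_arg = file_to_dir(config_file) + "/file1.cfg"
# include_arg == "exec#curl -sS http://myHost/foo/file1.cfg"
```

The sketch never checks whether its argument names a real file, which is why it works equally well on an "exec#" string.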

One thing to keep in mind is that downloading a multi-part configuration file from a web server will be slower than downloading a monolithic configuration file. It will probably take just a fraction of a second longer to download the multi-part configuration file, so you might think that such an overhead is insignificant. However, in a large organization there might be thousands of users downloading their applications’ configuration files from the same web server. In such an organization, all those extra fractions of a second might add up to a significant overhead.

8.12.6 Miscellaneous Functions

The configType() function takes a string parameter that specifies the fully-scoped name of an entry in the configuration file. It returns the value "string" if the entry is a string variable, "list" if the entry is a list variable, "scope" if the entry is a scope, or "no_value" if there is no such entry.
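
The four possible results of configType() can be illustrated with a Python model in which scopes are nested dictionaries, string variables are str values and list variables are list values. The function name config_type and this representation are assumptions made for the sketch.

```python
def config_type(root, scoped_name):
    """Model of configType(): classify the entry with the given scoped name."""
    entry = root
    for part in scoped_name.split("."):
        if not isinstance(entry, dict) or part not in entry:
            return "no_value"  # there is no such entry
        entry = entry[part]
    if isinstance(entry, str):
        return "string"
    if isinstance(entry, list):
        return "list"
    return "scope"


root = {"server": {"name": "bankSrv", "fonts": ["Times", "Courier"]}}
kinds = [config_type(root, n)
         for n in ["server.name", "server.fonts", "server", "server.port"]]
# kinds == ["string", "list", "scope", "no_value"]
```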

The isFileReadable() function takes a string parameter that specifies the name of a file. It returns true if the file exists and is readable; it returns false otherwise. An example of the intended use of this function is shown below:

files_to_process = ["file1.txt", "file2.txt", "file3.txt"];
@if (isFileReadable("file4.txt")) {
    files_to_process = files_to_process + ["file4.txt"];
}

The siblingScope() function takes a string parameter that specifies the local name of a scope that is a sibling of the current scope. It returns the fully scoped name of the specified scope. This function is provided to simplify a common use case of the @copyFrom statement that is shown in Figure 8.4.

Figure 8.4: Verbose @copyFrom statements

acme.uk.london.sales {
    defaults {
        timeout = "2 minutes";
        log.level = "1";
    }
    app1 {
        @copyFrom "acme.uk.london.sales.defaults";
    }
    app2 {
        @copyFrom "acme.uk.london.sales.defaults";
        log.level = "0";
    }
    app3 {
        @copyFrom "acme.uk.london.sales.defaults";
        log.level = "0";
    }
}

It is common for the @copyFrom statement to be used to copy the contents of a scope that is at the same level of nesting, which I call a sibling scope. If the sibling scope is deeply nested in the configuration file, then, as shown in Figure 8.4, the @copyFrom statement can be quite verbose. If, later on, the scope hierarchy is renamed (perhaps by being copy-and-pasted to another part of the configuration file), then all the @copyFrom statements will have to be updated to specify the renamed sibling scope. Doing this can be tedious and error-prone.

Figure 8.5 shows the configuration file after it has been modified to make use of the siblingScope() function. The @copyFrom statements in this modified file are more concise and easier to visually verify for correctness. In addition, if the acme.uk.london.sales scope is renamed, then the @copyFrom statements will continue to work without any need for updating.

Figure 8.5: Using siblingScope() to get concise @copyFrom statements

acme.uk.london.sales {
    defaults {
        timeout = "2 minutes";
        log.level = "1";
    }
    app1 {
        @copyFrom siblingScope("defaults");
    }
    app2 {
        @copyFrom siblingScope("defaults");
        log.level = "0";
    }
    app3 {
        @copyFrom siblingScope("defaults");
        log.level = "0";
    }
}
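
The name computed by siblingScope() can be modelled in Python as follows. In Config4* the current scope is known implicitly to the parser; the sketch passes it in as an explicit parameter. The function name sibling_scope and the handling of a scope that has no parent are assumptions.

```python
def sibling_scope(current_scope, name):
    """Return the fully scoped name of the sibling scope called `name`."""
    # Strip the last component of the current scope to obtain its parent.
    parent, _, _ = current_scope.rpartition(".")
    # Assumption: a sibling of a top-level scope is itself a top-level scope.
    return f"{parent}.{name}" if parent else name


target = sibling_scope("acme.uk.london.sales.app1", "defaults")
# target == "acme.uk.london.sales.defaults"
```

Because the result is derived from the current scope, renaming acme.uk.london.sales changes the computed name automatically.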

1: Readers should be forewarned that some implementations of Config4* may have incomplete internationalization support. You can find a discussion of this in the Config4* Maintenance Guide.
