Chapter 3 The SchemaValidator and SchemaType Classes

3.1 The SchemaValidator Class

The public and protected operations of the SchemaValidator class are shown in Figure 3.1. First, I will discuss the public operations, and then the protected one.

Figure 3.1: The SchemaValidator class

package org.config4j;

public abstract class Configuration {
    // Type constants
    public static final int CFG_NO_VALUE      = 0;// bit masks
    public static final int CFG_STRING        = 1;// 0001
    public static final int CFG_LIST          = 2;// 0010
    public static final int CFG_SCOPE         = 4;// 0100
    public static final int CFG_VARIABLES     = 3;// STRING|LIST
    public static final int CFG_SCOPE_AND_VARS= 7;// STRING|LIST|SCOPE
    ...
}

public class SchemaValidator
{
    // Values for the forceMode parameter
    public static final int DO_NOT_FORCE   = 0;
    public static final int FORCE_OPTIONAL = 1;
    public static final int FORCE_REQUIRED = 2;

    public SchemaValidator();

    public void    setWantDiagnostics(boolean wantDiagnostics);
    public boolean getWantDiagnostics();

    public void parseSchema(String[] schema)
                                        throws ConfigurationException; 
    public void validate(
        Configuration    cfg,
        String           scope,
        String           name,
        boolean          recurseIntoSubscopes,
        int              typeMask,
        int              forceMode) throws ConfigurationException;
    public void validate(
        Configuration    cfg,
        String           scope,
        String           name,
        boolean          recurseIntoSubscopes,
        int              typeMask) throws ConfigurationException;

    public void validate(
        Configuration    cfg,
        String           scope,
        String           name,
        int              forceMode) throws ConfigurationException;
    public void validate(
        Configuration    cfg,
        String           scope,
        String           name) throws ConfigurationException;
    protected void registerType(SchemaType type)
                                        throws ConfigurationException; 
}

3.1.1 Public Operations

The getWantDiagnostics() and setWantDiagnostics() operations enable you to get and set a boolean property, the default value of which is false. If you set this to true, then detailed diagnostic messages will be printed to standard output during calls to parseSchema() and validate(). These diagnostic messages may be useful when debugging a schema.

The parseSchema() operation parses a schema definition and stores it in an efficient internal format. The schema is specified as an array of strings. The parseSchema() operation will throw an exception if the parser encounters a problem, such as a syntax error, when parsing the schema.

After you have created a SchemaValidator object and used it to parse a schema, you can then call validate() to validate (a scope within) a configuration file. If you want, you can call validate() repeatedly, perhaps to validate multiple configuration files. The validate() operation merges the scope and name parameters to form the fully-scoped name of the scope (within the cfg object) to be validated.

The validate() operation is overloaded several times so that some or all of its final three parameters can be omitted, in which case they are given default values.

The recurseIntoSubscopes parameter specifies whether validate() should validate only entries in the scope, or recurse down into sub-scopes to validate their entries too. The default value of this parameter is true.

The typeMask parameter is a bit mask that specifies which types of entries should be validated. For example, CFG_VARIABLES specifies that variables (but not scopes) should be validated. The default value for this parameter is CFG_SCOPE_AND_VARS.
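Because the type constants in Figure 3.1 are bit masks, a combined mask such as CFG_VARIABLES is simply the bitwise OR of the individual types. The sketch below copies the constant values from Figure 3.1 into a stand-alone class to show how an entry's type can be tested against a mask. The TypeMaskDemo class and its matches() helper are illustrative names of my own and are not part of Config4J.

```java
// Stand-alone illustration; the constant values are copied from Figure 3.1.
class TypeMaskDemo {
    static final int CFG_STRING         = 1; // 0001
    static final int CFG_LIST           = 2; // 0010
    static final int CFG_SCOPE          = 4; // 0100
    static final int CFG_VARIABLES      = CFG_STRING | CFG_LIST;     // 0011
    static final int CFG_SCOPE_AND_VARS = CFG_VARIABLES | CFG_SCOPE; // 0111

    // Hypothetical helper: true if an entry of the given type is
    // selected by the given mask.
    static boolean matches(int entryType, int typeMask) {
        return (entryType & typeMask) != 0;
    }

    public static void main(String[] args) {
        System.out.println(matches(CFG_LIST, CFG_VARIABLES));       // true
        System.out.println(matches(CFG_SCOPE, CFG_VARIABLES));      // false
        System.out.println(matches(CFG_SCOPE, CFG_SCOPE_AND_VARS)); // true
    }
}
```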

By default, validate() respects use of the @optional and @required keywords in the schema. However, if you specify FORCE_OPTIONAL for the forceMode parameter, then validate() will act as if all identifiers in the schema have the @optional keyword. Conversely, FORCE_REQUIRED makes validate() act as if all identifiers without a "uid-" prefix in the schema have the @required keyword.
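One way to picture the default values is to imagine each shorter overload delegating to the six-parameter form. The following is a stand-alone sketch of that pattern and not the Config4J source: ValidateDefaultsDemo is a hypothetical class, the cfg parameter is omitted, and the full form merely records the arguments it receives. Treating DO_NOT_FORCE as the default forceMode is my assumption, based on the statement that validate() respects @optional and @required by default.

```java
// Illustration of overloads that supply default arguments.
class ValidateDefaultsDemo {
    static final int CFG_SCOPE_AND_VARS = 7; // value from Figure 3.1
    static final int DO_NOT_FORCE       = 0; // value from Figure 3.1
    static final int FORCE_OPTIONAL     = 1; // value from Figure 3.1

    String lastCall; // records the arguments seen by the full form

    // Full form: a real validator would do the validation work here.
    void validate(String scope, String name, boolean recurseIntoSubscopes,
                  int typeMask, int forceMode) {
        lastCall = scope + "," + name + "," + recurseIntoSubscopes
                 + "," + typeMask + "," + forceMode;
    }

    // forceMode omitted (assumed default: DO_NOT_FORCE)
    void validate(String scope, String name, boolean recurseIntoSubscopes,
                  int typeMask) {
        validate(scope, name, recurseIntoSubscopes, typeMask, DO_NOT_FORCE);
    }

    // recurseIntoSubscopes and typeMask omitted
    void validate(String scope, String name, int forceMode) {
        validate(scope, name, true, CFG_SCOPE_AND_VARS, forceMode);
    }

    // All three optional parameters omitted
    void validate(String scope, String name) {
        validate(scope, name, true, CFG_SCOPE_AND_VARS, DO_NOT_FORCE);
    }

    public static void main(String[] args) {
        ValidateDefaultsDemo v = new ValidateDefaultsDemo();
        v.validate("", "log");
        System.out.println(v.lastCall); // ,log,true,7,0
        v.validate("", "log", FORCE_OPTIONAL);
        System.out.println(v.lastCall); // ,log,true,7,1
    }
}
```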

3.1.2 Using registerType() in a Subclass

Later, in Section 3.2, I will explain how you can implement new schema types. If you implement new schema types, then you will need to write a subclass of SchemaValidator to register those new schema types. Figure 3.2 illustrates how to do this.

Figure 3.2: A subclass of SchemaValidator

class SchemaTypeDate extends SchemaType { ... } // Define a new schema type
class SchemaTypeHex extends SchemaType { ... }  // Define a new schema type

public class ExtendedSchemaValidator
    extends org.config4j.SchemaValidator
{
    public ExtendedSchemaValidator() {
        super();
        registerType(new SchemaTypeDate());
        registerType(new SchemaTypeHex());
    }
}

Registration of new schema types is trivial: the constructor of the subclass calls the parent constructor, and then calls registerType() to register one instance of each of the new schema types.

Once you have implemented the ExtendedSchemaValidator class to register new schema types, your applications need only create an instance of ExtendedSchemaValidator (instead of SchemaValidator) to be able to make use of those new schema types.
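The registration pattern can be tried out without the Config4J library by using miniature stand-ins. In the sketch below, MiniSchemaType, MiniValidator and ExtendedMiniValidator are hypothetical classes that only mimic the shape of SchemaType, SchemaValidator and ExtendedSchemaValidator. The use of a map keyed by type name is an assumption made for illustration and says nothing about how Config4J stores its registered types.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for SchemaType: it only carries a type name.
class MiniSchemaType {
    private final String typeName;
    MiniSchemaType(String typeName) { this.typeName = typeName; }
    String getTypeName() { return typeName; }
}

// Hypothetical stand-in for SchemaValidator.
class MiniValidator {
    private final Map<String, MiniSchemaType> types = new TreeMap<>();

    MiniValidator() {
        // The base class registers its predefined types.
        registerType(new MiniSchemaType("string"));
        registerType(new MiniSchemaType("int"));
    }

    protected void registerType(MiniSchemaType type) {
        types.put(type.getTypeName(), type);
    }

    boolean knowsType(String name) { return types.containsKey(name); }
}

// The subclass adds one more type after the parent constructor has run.
class ExtendedMiniValidator extends MiniValidator {
    ExtendedMiniValidator() {
        super();
        registerType(new MiniSchemaType("hex"));
    }

    public static void main(String[] args) {
        System.out.println(new MiniValidator().knowsType("hex"));         // false
        System.out.println(new ExtendedMiniValidator().knowsType("hex")); // true
        System.out.println(new ExtendedMiniValidator().knowsType("int")); // true
    }
}
```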

3.2 The SchemaType Class

The SchemaValidator class performs very little of the validation work itself. Instead, it delegates most of this work to other classes, each of which is a subclass of SchemaType (shown in Figure 3.3).

Figure 3.3: The SchemaType class

package org.config4j;

public abstract class SchemaType implements Comparable
{
    public SchemaType(String typeName, int cfgType);

    public String  getTypeName();
    public int     getCfgType();
    public int     compareTo(Object o); // for interface Comparable

    abstract public void checkRule(
        SchemaValidator    sv,
        Configuration      cfg,
        String             typeName,
        String[]           typeArgs,
        String             rule) throws ConfigurationException;

    public void validate(
        SchemaValidator    sv,
        Configuration      cfg,
        String             scope,
        String             name,
        String             typeName,
        String             origTypeName,
        String[]           typeArgs,
        int                indentLevel) throws ConfigurationException;

    public boolean isA(
        SchemaValidator    sv,
        Configuration      cfg,
        String             value,
        String             typeName,
        String[]           typeArgs,
        int                indentLevel,
        StringBuffer       errSuffix);

    protected SchemaType findType(SchemaValidator sv, String name);

    protected final void callValidate(
        SchemaType         target,
        SchemaValidator    sv,
        Configuration      cfg,
        String             scope,
        String             name,
        String             typeName,
        String             origTypeName,
        String[]           typeArgs,
        int                indentLevel) throws ConfigurationException;

    protected final boolean callIsA(
        SchemaType         target,
        SchemaValidator    sv,
        Configuration      cfg,
        String             value,
        String             typeName,
        String[]           typeArgs,
        int                indentLevel,
        StringBuffer       errSuffix) throws ConfigurationException;
}

There is a separate subclass of SchemaType for each schema type. For example, the Config4J library contains SchemaTypeBoolean, which implements the boolean schema type, SchemaTypeInt, which implements the int schema type, and so on.

3.2.1 Constructor and Public Accessors

When the constructor of a subclass of SchemaType calls its parent constructor, the parameters specify the name of the schema type and the configuration entry’s type, which is one of: CFG_STRING, CFG_LIST or CFG_SCOPE. You can see an example of this in Figure 3.4.

Figure 3.4: Example constructor of a subclass of SchemaType

package org.config4j;

class SchemaTypeInt extends SchemaType {
    public SchemaTypeInt() {
        super("int", Configuration.CFG_STRING);
    }
    ...
}

The parameter values passed to the parent constructor are made available via the getTypeName() and getCfgType() operations shown in Figure 3.3.

The SchemaValidator class invokes registerType() to register an instance of each of the predefined schema types and, as previously shown in Figure 3.2, a subclass of SchemaValidator can invoke registerType() to register instances of additional schema types.

3.2.2 The checkRule() Operation

The SchemaValidator class invokes the checkRule() operation of an object representing a schema type when that type is encountered in a schema rule. I will illustrate this through the schema shown in Figure 3.5.

Figure 3.5: Example schema

1  String[] schema = new String[] {
2      "timeout = durationMilliseconds",
3      "fonts = list[string]",
4      "background_colour = enum[grey, white, yellow]",
5      "log = scope",
6      "log.dir = string",
7      "@typedef logLevel = int[0,3]",
8      "log.level = logLevel"
9  };

When parsing the first rule of the schema (line 2 of Figure 3.5), SchemaValidator invokes checkRule() on the object representing the durationMilliseconds schema type. When parsing the next rule (line 3), it invokes checkRule() on the object representing the list schema type, and so on.

Among the parameters passed to checkRule() is typeArgs (of type String[]), which contains the arguments, if any, for the type. This parameter will be an empty array for the rules in lines 2, 5 and 6 of Figure 3.5. For the rule in line 3, typeArgs will contain one string ("string"); and for the rule in line 4, it will contain three strings ("grey", "white" and "yellow"). You might think that typeArgs should be empty for the rule in line 8. However, the logLevel type used in line 8 was defined in line 7 to be int[0,3]. Because of this, when checkRule() is called for the rule in line 8, typeArgs will contain two strings ("0" and "3").
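To make the contents of typeArgs concrete, the sketch below splits the type expressions from Figure 3.5 into their arguments. It is an illustration only: TypeArgsDemo and typeArgsOf() are hypothetical names, the splitter is far simpler than a real schema parser (it assumes well-formed input with no quoted or nested arguments), and it expands a typedef name by one level only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustration only: a simplified splitter, not the parser used by Config4J.
class TypeArgsDemo {
    // Returns the arguments of a type expression such as "enum[grey, white]".
    // A typedef name is first replaced by its definition.
    static String[] typeArgsOf(String typeExpr, Map<String, String> typedefs) {
        String expr = typeExpr.trim();
        expr = typedefs.getOrDefault(expr, expr);
        int open = expr.indexOf('[');
        if (open < 0) {
            return new String[0]; // the type takes no arguments
        }
        String inner = expr.substring(open + 1, expr.lastIndexOf(']')).trim();
        if (inner.isEmpty()) {
            return new String[0];
        }
        String[] args = inner.split(",");
        for (int i = 0; i < args.length; i++) {
            args[i] = args[i].trim();
        }
        return args;
    }

    public static void main(String[] argv) {
        Map<String, String> typedefs = new HashMap<>();
        typedefs.put("logLevel", "int[0,3]"); // line 7 of Figure 3.5

        System.out.println(Arrays.toString(
            typeArgsOf("durationMilliseconds", typedefs)));      // []
        System.out.println(Arrays.toString(
            typeArgsOf("list[string]", typedefs)));              // [string]
        System.out.println(Arrays.toString(
            typeArgsOf("enum[grey, white, yellow]", typedefs))); // [grey, white, yellow]
        System.out.println(Arrays.toString(
            typeArgsOf("logLevel", typedefs)));                  // [0, 3]
    }
}
```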

The implementation of checkRule() must determine whether the strings in typeArgs are valid, and throw an exception containing a descriptive error message if not. For example:

The implementation of SchemaTypeInt.checkRule() throws an exception unless: (1) there are zero strings in typeArgs; or (2) there are two strings in typeArgs, both strings can be parsed as integers, and the first integer is smaller than or equal to the second integer.
The implementation of SchemaTypeList.checkRule() throws an exception unless there is exactly one string in typeArgs, and that string is the name of a schema type whose configuration entry’s type is CFG_STRING. This checkRule() operation invokes findType() to search for the specified schema type; findType() returns null if the type does not exist.
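The condition described above for the int type can be sketched as a stand-alone check. This is not the source code of SchemaTypeInt.checkRule(): IntRuleCheckDemo and checkIntTypeArgs() are hypothetical names, the wording of the messages is invented, and IllegalArgumentException is used as a stand-in for ConfigurationException so that the example compiles without the Config4J library.

```java
// Illustration of the argument check described for the int schema type.
class IntRuleCheckDemo {
    // Accepts zero arguments, or two integers with min <= max.
    // Throws IllegalArgumentException (a stand-in for
    // ConfigurationException) otherwise.
    static void checkIntTypeArgs(String[] typeArgs, String rule) {
        if (typeArgs.length == 0) {
            return; // "int" without a range is always acceptable
        }
        if (typeArgs.length != 2) {
            throw new IllegalArgumentException("schema error: the 'int' "
                    + "type should take either no arguments or 2 "
                    + "arguments (min, max) in rule '" + rule + "'");
        }
        int min;
        int max;
        try {
            min = Integer.parseInt(typeArgs[0].trim());
            max = Integer.parseInt(typeArgs[1].trim());
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("schema error: non-integer "
                    + "value for a range argument in rule '" + rule + "'");
        }
        if (min > max) {
            throw new IllegalArgumentException("schema error: min is "
                    + "greater than max in rule '" + rule + "'");
        }
    }

    public static void main(String[] args) {
        checkIntTypeArgs(new String[] {"0", "3"}, "log.level = int[0,3]");
        System.out.println("int[0,3] is an acceptable rule");
        try {
            checkIntTypeArgs(new String[] {"3", "0"}, "log.level = int[3,0]");
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```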

Deciding whether the typeArgs parameter contains acceptable strings is the primary purpose of checkRule(). Most of the other parameters are provided to help checkRule() make that decision and to format an informative exception message if necessary.

One of the demonstration applications provided with Config4J is called extended-schema-validator. That demo contains a class called SchemaTypeHex that implements a hex (hexadecimal integer) schema type. That class’s implementation of checkRule() is shown in Figure 3.6.

Figure 3.6: Implementation of SchemaTypeHex.checkRule()

public class SchemaTypeHex extends SchemaType
{
    public void checkRule(
        SchemaValidator    sv,
        Configuration      cfg,
        String             typeName,
        String[]           typeArgs,
        String             rule) throws ConfigurationException
    {
        int                len;
        int                maxDigits;
    
        len = typeArgs.length;
        if (len == 0) {
            return;
        } else if (len > 1) {
            throw new ConfigurationException("schema error: the '"
                        + typeName + "' type should take either no "
                        + "arguments or 1 argument (denoting "
                        + "max-digits) in rule '" + rule + "'");
        }
        try {
            maxDigits = cfg.stringToInt("", "", typeArgs[0]);
        } catch(ConfigurationException ex) {
            throw new ConfigurationException("schema error: "
                    + "non-integer value for the 'max-digits' "
                    + "argument in rule '" + rule + "'");
        }
        if (maxDigits < 1) {
            throw new ConfigurationException("schema error: the "
                    + "max-digits argument must be 1 or greater in "
                    + "rule '" + rule + "'");
        }
    }
    ...
}

The only parameter not used in the body of the operation is sv, which is of type SchemaValidator. That parameter is used by the checkRule() operation in the list, table and tuple types when invoking findType() to determine if items in typeArgs are names of types.

3.2.3 The isA() and validate() Operations

Subclasses of SchemaType should implement the isA() and validate() operations. However, the default implementation of isA() is suitable for list-based types, and the default implementation of validate() is suitable for string-based types. Because of this, a subclass of SchemaType needs to implement only one of these two operations.

3.2.3.1 String-based Types: isA()

If you are providing schema support for a string-based type, then you must implement the isA() operation. Among the parameters passed to this operation is a string called value; the isA() operation should return true if value can be parsed as the schema type. For example, the SchemaTypeInt.isA() operation returns true for "42" and returns false for "hello, world".

If isA() returns false, then the operation can optionally set the errSuffix parameter (which is of type StringBuffer) to be a descriptive message that explains why the string is not suitable. This message will be appended to an exception message.

Figure 3.7 illustrates how isA() might be implemented for a schema type that denotes hexadecimal integers. This implementation of isA() contains two straightforward checks. First, it checks whether value consists of hexadecimal digits. Second, if typeArgs specifies a maximum number of digits, then isA() checks whether this limit has been exceeded.

Figure 3.7: Implementation of isA() for a hex type

public class SchemaTypeHex extends SchemaType
{
    public boolean isA(
        SchemaValidator    sv,
        Configuration      cfg,
        String             value,
        String             typeName,
        String[]           typeArgs,
        int                indentLevel,
        StringBuffer       errSuffix) throws ConfigurationException
    {
        int                    maxDigits;

        if (!isHex(value)) {
            errSuffix.append("the value is not a hexadecimal number");
            return false;
        }
        if (typeArgs.length == 1) {
            //--------
            // Check if there are too many hex digits in the value
            //--------
            maxDigits = cfg.stringToInt("", "", typeArgs[0]);
            if (value.length() > maxDigits) {
                errSuffix.append("the value must not contain more "
                                 + "than " + maxDigits + " digits");
                return false;
            }
        }
        return true;
    }

    public static boolean isHex(String str) {
        ... // implementation will be shown later in this chapter
    }
    ...
}

3.2.3.2 List-based Types: validate()

Config4* has three built-in, list-based schema types: list, tuple and table. Each of these schema types takes arguments, for example:

String[] schema = new String[] {
    "@typedef money = units_with_float[\"£\", \"$\", \"€\"]",
    "fonts      = list[string]",
    "point      = tuple[float,x, float,y]",
    "price_list = table[string,product, money,price]"
};

Each of those list-based schema types implements validate() in a similar way, so I will discuss only the implementation for the table schema type, using the definition of price_list in the above example.

A call of cfg.lookupList(scope, name) is made to retrieve the value of the list variable from the configuration object.
The typeArgs parameter contains all the arguments to the schema type ("string", "product", "money" and "price" for the price_list variable in the example). Those pairs of strings define the types and names of columns within the table. The validate() operation checks that the length of the list is a multiple of the number of columns in the table’s definition.
Finally, validate() iterates over all the items in the list. For each item, validate() calls findType() for the item’s column type (obtained from typeArgs) to retrieve the item’s schema type; it invokes the isA() operation of that type, and throws an exception if isA() returns false.
The invocation of isA() is not made directly. Rather, it is made indirectly by invoking callIsA(), which is shown in Figure 3.3. Doing this ensures that diagnostic messages can be printed if diagnostics have been enabled on the SchemaValidator by calling setWantDiagnostics(true).
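The steps above can be sketched without the Config4J library. In the following illustration, TableCheckDemo and its operations are hypothetical names. A map from type names to predicates stands in for findType() and isA(), IllegalArgumentException stands in for ConfigurationException, and the money column type is replaced by a simple float check to keep the example short. The list is passed in directly rather than being retrieved with lookupList().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustration of the checks performed when validating a table.
class TableCheckDemo {
    // Stand-in for the registered schema types and their isA() operations.
    static Map<String, Predicate<String>> demoTypes() {
        Map<String, Predicate<String>> types = new HashMap<>();
        types.put("string", value -> true);
        types.put("float", TableCheckDemo::isFloat);
        return types;
    }

    static boolean isFloat(String value) {
        try {
            Double.parseDouble(value);
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    // typeArgs holds (columnType, columnName) pairs, for example
    // {"string", "product", "float", "price"}.
    static void validateTable(List<String> list, String[] typeArgs,
                              Map<String, Predicate<String>> types) {
        int numColumns = typeArgs.length / 2;
        if (numColumns == 0 || list.size() % numColumns != 0) {
            throw new IllegalArgumentException("the number of entries ("
                    + list.size() + ") is not a multiple of the number "
                    + "of columns (" + numColumns + ")");
        }
        for (int i = 0; i < list.size(); i++) {
            int col = i % numColumns;
            String colType = typeArgs[col * 2];
            String colName = typeArgs[col * 2 + 1];
            Predicate<String> isA = types.get(colType); // like findType()
            if (isA == null) {
                throw new IllegalArgumentException("unknown type '"
                        + colType + "'");
            }
            if (!isA.test(list.get(i))) {
                throw new IllegalArgumentException("bad " + colType
                        + " value ('" + list.get(i) + "') for the '"
                        + colName + "' column in row "
                        + (i / numColumns + 1));
            }
        }
    }

    public static void main(String[] args) {
        String[] typeArgs = {"string", "product", "float", "price"};
        validateTable(Arrays.asList("apple", "0.50", "pear", "0.75"),
                      typeArgs, demoTypes());
        System.out.println("price_list is valid");
    }
}
```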

If you want to implement schema support for a list-based type, then you should implement the validate() operation in a manner similar to that described above. I recommend that you examine the source code of the SchemaTypeList, SchemaTypeTable or SchemaTypeTuple class for concrete details.

3.3 Adding Utility Operations to a Schema Type

The infrastructure within Config4J to support a built-in data type is split over three classes:

The SchemaType<Type> class implements the schema validation infrastructure.
The SchemaValidator class calls registerType() to register each schema type.
The Configuration class provides operations with names of the form lookup<Type>(), is<Type>() and stringTo<Type>().

In this chapter, I have explained how you can provide schema validation support for a new type by writing a SchemaType<Type> class and registering it in a subclass of SchemaValidator. However, I have not yet explained how you can write a subclass of Configuration to implement the lookup<Type>(), is<Type>() and stringTo<Type>() operations.

The Configuration class is an abstract base class, and its static create() operation creates an instance of a hidden, concrete subclass. This enforces a separation between the public API and the implementation details of Config4*. Most of the time, this separation is beneficial. However, it has a drawback: you cannot write a subclass of Configuration to add additional operations, such as lookup<Type>(), is<Type>() and stringTo<Type>().

A good way to work around this drawback is to define the desired functionality as static operations in the SchemaType<Type> class. For example, if you are writing a class called SchemaTypeHex (for hexadecimal integers), then you can implement lookupHex(), isHex(), and stringToHex() as static operations in the SchemaTypeHex class. This is illustrated in Figure 3.8.

Figure 3.8: Utility operations in the SchemaTypeHex class

public class SchemaTypeHex extends SchemaType {
    public SchemaTypeHex() {
        super("hex", Configuration.CFG_STRING);
    }
    public static int lookupHex(
        Configuration    cfg,
        String           scope,
        String           localName) throws ConfigurationException
    {
        String str = cfg.lookupString(scope, localName);
        return stringToHex(cfg, scope, localName, str);
    }
    public static int lookupHex(
        Configuration    cfg,
        String           scope,
        String           localName,
        int              defaultVal) throws ConfigurationException
    {
        if (cfg.type(scope, localName) == Configuration.CFG_NO_VALUE) {
            return defaultVal;
        }
        String str = cfg.lookupString(scope, localName);
        return stringToHex(cfg, scope, localName, str);
    }
    public static int stringToHex(
        Configuration    cfg,
        String           scope,
        String           localName,
        String           str,
        String           typeName) throws ConfigurationException
    {
        try {
            return (int)Long.parseLong(str, 16);
        } catch(NumberFormatException ex) {
            throw new ConfigurationException(cfg.fileName()
                        + ": bad " + typeName + " value ('" + str
                        + "') specified for '"
                        + cfg.mergeNames(scope, localName) + "'");
        }
    }

    public static int stringToHex(
        Configuration    cfg,
        String           scope,
        String           localName,
        String           str) throws ConfigurationException
    {
        return stringToHex(cfg, scope, localName, str, "hex");
    }

    public static boolean isHex(String str)
    {
        try {
            Long.parseLong(str, 16);
            return true;
        } catch(NumberFormatException ex) {
            return false;
        }
    }
    ... // checkRule() and isA() were shown earlier in this chapter
}
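Figure 3.8 parses the string with Long.parseLong() and then casts the result to int. The stand-alone sketch below explores what that combination does for a few inputs, using only the standard Java library. HexParseDemo and its helper operations are hypothetical names. One plausible motivation for the cast, which is my assumption rather than something stated in the figure, is that Integer.parseInt() rejects eight-digit values with the top bit set, such as "ffffffff".

```java
// Explores the parsing approach used in Figure 3.8 with standard Java calls.
class HexParseDemo {
    // Same expression as in stringToHex(): parse as long, narrow to int.
    static int viaLong(String str) {
        return (int) Long.parseLong(str, 16);
    }

    // True if Integer.parseInt() would accept the string as base 16.
    static boolean acceptedByParseInt(String str) {
        try {
            Integer.parseInt(str, 16);
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(viaLong("ff"));                  // 255
        System.out.println(viaLong("ffffffff"));            // -1
        System.out.println(acceptedByParseInt("ffffffff")); // false
        // Values wider than 32 bits are silently truncated by the cast.
        System.out.println(viaLong("100000000"));           // 0
        // Long.parseLong() accepts a leading sign, so a check based on
        // it treats "-ff" as a valid hexadecimal string.
        System.out.println(viaLong("-ff"));                 // -255
    }
}
```

If truncation or signed input matters for your application, you could add a range check or a digits-only check to your own utility operations. Whether that is worthwhile depends on the values you expect in configuration files.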