SchemaReader (Network Learning Toolkit API Documentation Version: 1.4.0)

netkit.graph.io
Class SchemaReader

java.lang.Object
  netkit.graph.io.SchemaReader

public final class SchemaReader
extends java.lang.Object

This class reads the schema type information for a Graph and builds the data structures that fully describe a network of Nodes and Edges: their type information and their instance data. The schema file uses an extended ARFF format, which is line oriented. Valid lines contain comments, whitespace, or tags. A comment is a line whose first non-whitespace character is '%' or '#'; it runs to the end of the line.

Tags are directives beginning with an '@' character in column one. There are six primary tags and they are case-insensitive:

@NODETYPE
@ATTRIBUTE
@NODEDATA
@EDGETYPE
@REVERSIBLE
@EDGEDATA

The first three tags are used for describing Nodes and the latter three are for Edges. Because Edges state and enforce the type of Node they may connect to, the underlying Nodes for each EdgeType must be declared first.
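
The line rules above can be sketched in a few lines of Java. This is only an illustration of the described format, not the SchemaReader implementation; the class name SchemaLineClassifier and the Kind enum are hypothetical, and the sketch recognizes only the six primary tags.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of netkit: classifies one schema line.
final class SchemaLineClassifier {
    enum Kind { BLANK, COMMENT, TAG, INVALID }

    // The six primary tags listed in the class description.
    private static final Set<String> PRIMARY_TAGS = Set.of(
        "@NODETYPE", "@ATTRIBUTE", "@NODEDATA",
        "@EDGETYPE", "@REVERSIBLE", "@EDGEDATA");

    static Kind classify(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Kind.BLANK;
        }
        // A comment has '%' or '#' as its first non-whitespace character.
        char first = trimmed.charAt(0);
        if (first == '%' || first == '#') {
            return Kind.COMMENT;
        }
        // A tag must start with '@' in column one; tag names are case-insensitive.
        if (line.charAt(0) == '@') {
            String tag = line.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
            if (PRIMARY_TAGS.contains(tag)) {
                return Kind.TAG;
            }
        }
        return Kind.INVALID;
    }
}
```

For instance, classify("@nodetype Person") yields TAG, while the same text preceded by a space yields INVALID because the '@' is no longer in column one.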

A Node type is defined by the following sequence: exactly one line with a @NODETYPE tag, followed by any number of lines with @ATTRIBUTE tags, followed by exactly one line with a @NODEDATA tag.

The @NODETYPE tag names the node's type or attributes container. It is used to uniquely identify a Node type and distinguish it from other types. The line declaring it appears like this:

@NODETYPE name

There can be any number of @ATTRIBUTE tags per Node type. These lines begin with @ATTRIBUTE followed by the name of the attribute field and the third token on that line is the type of that attribute. The name of each attribute identifies that field among the attributes in one Node and must be unique in that context.

There are four valid attribute types: KEY, CONTINUOUS, DISCRETE, and CATEGORICAL. Type names are case-insensitive. REAL is a synonym for CONTINUOUS and INT is a synonym for DISCRETE. Each of the first three types is declared by writing the type name as the third token on the attribute line. For example:

@ATTRIBUTE field1 KEY
@ATTRIBUTE field2 CONTINUOUS
@ATTRIBUTE field3 REAL
@ATTRIBUTE field4 DISCRETE
@ATTRIBUTE field5 INT

The KEY type can appear only once per Node type and is the unique identifier for a particular Node instance. It corresponds in value to the Java String type, and each KEY value may appear only once. The CONTINUOUS type corresponds to the Java double type. The DISCRETE type also corresponds to the Java double type, but takes on whole number (int) values.
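
As a rough illustration of the three named types and their synonyms, the following sketch maps the third token of an @ATTRIBUTE line to a type. The class AttributeLineSketch and its Kind enum are hypothetical names introduced here; the exception type and the leniency about leading whitespace are assumptions, not documented behavior.

```java
import java.util.Locale;

// Hypothetical helper, not part of netkit: reads the type of a
// KEY, CONTINUOUS or DISCRETE attribute line.
final class AttributeLineSketch {
    enum Kind { KEY, CONTINUOUS, DISCRETE }

    static Kind parseKind(String attributeLine) {
        // Expected shape: @ATTRIBUTE name type
        String[] tokens = attributeLine.trim().split("\\s+");
        if (tokens.length != 3 || !tokens[0].equalsIgnoreCase("@ATTRIBUTE")) {
            throw new IllegalArgumentException(
                "Expected '@ATTRIBUTE name type': " + attributeLine);
        }
        // Type names are case-insensitive; REAL and INT are synonyms.
        switch (tokens[2].toUpperCase(Locale.ROOT)) {
            case "KEY":
                return Kind.KEY;
            case "CONTINUOUS":
            case "REAL":
                return Kind.CONTINUOUS;
            case "DISCRETE":
            case "INT":
                return Kind.DISCRETE;
            default:
                throw new IllegalArgumentException(
                    "Unknown attribute type: " + tokens[2]);
        }
    }
}
```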

The last type, CATEGORICAL, can take on a fixed set of token values. To declare a CATEGORICAL type the line looks like this:

@ATTRIBUTE field6 {token1,token2,token3,etc...}

The curly brace enclosed list of tokens represents the set of values that this type of field can take on. These values are converted to doubles internally. They are assigned whole number values starting at zero and incrementing by one for each extra token.
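
The numbering of CATEGORICAL tokens can be sketched as follows. The class CategoricalTokens is a hypothetical name, and the handling of whitespace, empty entries and duplicate tokens is an assumption made for the example; the format description above does not specify it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of netkit: assigns each token of a
// CATEGORICAL declaration a whole-number double, starting at zero.
final class CategoricalTokens {
    static Map<String, Double> parse(String declaration) {
        String text = declaration.trim();
        if (!text.startsWith("{") || !text.endsWith("}")) {
            throw new IllegalArgumentException(
                "Expected a brace-enclosed token list: " + declaration);
        }
        // LinkedHashMap keeps the tokens in declaration order.
        Map<String, Double> values = new LinkedHashMap<>();
        String body = text.substring(1, text.length() - 1);
        for (String raw : body.split(",")) {
            String token = raw.trim();
            if (token.isEmpty()) {
                continue; // assumption: empty entries are ignored
            }
            // Assumption: a repeated token keeps its first value.
            values.putIfAbsent(token, (double) values.size());
        }
        return values;
    }
}
```

With the declaration {red,green,blue}, the tokens map to 0.0, 1.0 and 2.0 in that order.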

The final tag for node types is the @NODEDATA tag. This tag declares that the node type is finalized and no more attribute fields will be added. It also specifies the filename from which the instance data for Nodes of this type will be read. The filename supplied may be an absolute or relative path. If a relative path is supplied, it is resolved against the location of the schema file being read. The @NODEDATA line looks like this:

@NODEDATA filename
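
The path rule for data files can be illustrated with java.nio.file. This is a sketch of the described behavior, not the code SchemaReader uses; DataFileResolver is a hypothetical name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of netkit: resolves a data filename
// from a schema against the directory that holds the schema file.
final class DataFileResolver {
    static Path resolve(Path schemaFile, String dataFilename) {
        Path data = Paths.get(dataFilename);
        if (data.isAbsolute()) {
            return data; // absolute paths are used as given
        }
        Path parent = schemaFile.toAbsolutePath().getParent();
        if (parent == null) {
            return data; // no parent directory to resolve against
        }
        // Relative paths are taken relative to the schema file's directory.
        return parent.resolve(data).normalize();
    }
}
```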

Note @RELATION is an alias for @NODETYPE and @DATA is an alias for @NODEDATA.
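
The required order of tags in a Node type definition, including the two aliases, can be checked with a small routine like the one below. NodeBlockValidator is a hypothetical name, and the input is assumed to contain only the tag lines of a single Node type, with comments and blank lines already removed.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of netkit: checks that tag lines follow
// the order @NODETYPE, zero or more @ATTRIBUTE, then @NODEDATA.
final class NodeBlockValidator {
    static boolean isValidNodeBlock(List<String> tagLines) {
        if (tagLines.size() < 2) {
            return false; // at least @NODETYPE and @NODEDATA are required
        }
        int last = tagLines.size() - 1;
        for (int i = 0; i <= last; i++) {
            String expected = (i == 0) ? "@NODETYPE"
                            : (i == last) ? "@NODEDATA"
                            : "@ATTRIBUTE";
            if (!expected.equals(canonicalTag(tagLines.get(i)))) {
                return false;
            }
        }
        return true;
    }

    // Upper-cases the tag and maps the aliases to their primary names.
    static String canonicalTag(String line) {
        String tag = line.trim().split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        switch (tag) {
            case "@RELATION":
                return "@NODETYPE";
            case "@DATA":
                return "@NODEDATA";
            default:
                return tag;
        }
    }
}
```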

The latter three tags describe an EdgeType and Edge connections. Lines with the @EDGETYPE tag supply the EdgeType name in the second token. The third token is the Edge's source Node type and the fourth token is the Edge's destination Node type. Node types correspond to the @NODETYPE tag defined above. These lines appear like so:

@EDGETYPE typeName sourceNodeType destinationNodeType
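
A minimal sketch of reading such a line and enforcing that both Node types were declared earlier might look like this. EdgeTypeLineSketch is a hypothetical name; treating Node type names as case-sensitive and reporting errors with IllegalArgumentException (a RuntimeException) are assumptions of the example.

```java
import java.util.Set;

// Hypothetical helper, not part of netkit: parses one @EDGETYPE line.
final class EdgeTypeLineSketch {
    final String name;
    final String sourceNodeType;
    final String destinationNodeType;

    private EdgeTypeLineSketch(String name, String source, String destination) {
        this.name = name;
        this.sourceNodeType = source;
        this.destinationNodeType = destination;
    }

    static EdgeTypeLineSketch parse(String line, Set<String> declaredNodeTypes) {
        // Expected shape: @EDGETYPE typeName sourceNodeType destinationNodeType
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 4 || !tokens[0].equalsIgnoreCase("@EDGETYPE")) {
            throw new IllegalArgumentException("Malformed @EDGETYPE line: " + line);
        }
        // Both endpoint Node types must already have been declared.
        for (int i = 2; i <= 3; i++) {
            if (!declaredNodeTypes.contains(tokens[i])) {
                throw new IllegalArgumentException(
                    "Node type not declared before use: " + tokens[i]);
            }
        }
        return new EdgeTypeLineSketch(tokens[1], tokens[2], tokens[3]);
    }
}
```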

The absence of the @REVERSIBLE tag indicates that the Graph is directed. If the tag appears, then each added Edge implies that another Edge in the opposite direction must also be added. If the @REVERSIBLE tag appears by itself, the same EdgeType is used for the reversed connection. This implies that the source and destination Nodes have the same type. If the @REVERSIBLE tag includes an optional name as the second token, that name is used for the reversed EdgeType and is added to the list of EdgeTypes in the Graph. In this case the Nodes do not need to have the same type, since the reversed connection gets its own EdgeType. These lines look like this:

@REVERSIBLE
or
@REVERSIBLE name
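
The effect of the two forms on the Edges that get added can be sketched as follows. ReversibleSketch is a hypothetical name, and Edges are shown as plain strings of the form type:from->to purely for illustration; the real Graph stores Edge objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of netkit: lists the Edges implied by
// one Edge in the data file, given the @REVERSIBLE setting.
final class ReversibleSketch {
    static List<String> edgesToAdd(String edgeType, String from, String to,
                                   boolean reversible, String reverseName) {
        List<String> edges = new ArrayList<>();
        edges.add(describe(edgeType, from, to));
        if (reversible) {
            // A bare @REVERSIBLE reuses the same EdgeType for the reverse
            // Edge; @REVERSIBLE name uses the named EdgeType instead.
            String reverseType = (reverseName == null) ? edgeType : reverseName;
            edges.add(describe(reverseType, to, from));
        }
        return edges;
    }

    private static String describe(String type, String from, String to) {
        return type + ":" + from + "->" + to;
    }
}
```

Without @REVERSIBLE only the forward Edge is produced; with it, a second Edge from the destination back to the source is produced as well.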

Finally, the @EDGEDATA tag names the file from which to read Edge data. Like the @NODEDATA tag, it accepts one filename token, which specifies where to get the Edges for this EdgeType. As with @NODEDATA, the filename may be an absolute or relative path. The @EDGEDATA line looks like this:

@EDGEDATA filename

Author: Kaveh R. Ghazi, Sofus A. Macskassy
See Also: Attribute, Attributes, TokenSet, ExpandableTokenSet, FixedTokenSet, EdgeType, Edge, Node, NodeReader, EdgeReaderRN, EdgeReaderGDA, SchemaWriter, NodeWriter, EdgeWriterRN

Constructor Summary
`SchemaReader()`

Method Summary
`static void`	`main(java.lang.String[] args)` A test driver for the class.
`static Graph`	`readGDASchema(java.io.File nodeFile, java.io.File edgeFile)` Overloaded entry point for `readGDASchema(Reader,Reader)`
`static Graph`	`readGDASchema(java.io.Reader nodeReader, java.io.Reader edgeReader)` Reads the Node and Edge information from GDA formatted input, constructs the data structures and instantiates all of the instance data.
`static Graph`	`readSchema(java.io.File file)` Overloaded entry point for `readSchema(Reader,String)`
`static Graph`	`readSchema(java.io.Reader reader, java.lang.String parentDirectory)` Reads the Graph information from a schema file, constructs the data structures and instantiates all of the instance data.
`static Graph`	`stressTest(int numFields, int numNodes, int numEdges)` This method conducts a stress test by creating a set of random nodes and edges and performs busy-work accessing the node and edge information.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

SchemaReader

public SchemaReader()

Method Detail

readGDASchema

public static Graph readGDASchema(java.io.File nodeFile,
                                  java.io.File edgeFile)

Overloaded entry point for readGDASchema(Reader,Reader)

Parameters:
    nodeFile - a File containing the GDA formatted Node instance data.
    edgeFile - a File containing the GDA formatted Edge instance data.
Returns:
    the constructed Graph object.
Throws:
    java.lang.RuntimeException - if any of the file format constraints are violated or the files cannot be read.

readGDASchema

public static Graph readGDASchema(java.io.Reader nodeReader,
                                  java.io.Reader edgeReader)

Reads the Node and Edge information from GDA formatted input, constructs the data structures and instantiates all of the instance data. This method is provided for backwards compatibility with the GDA file format.

Parameters:
    nodeReader - a Reader object containing the GDA formatted Node instance data.
    edgeReader - a Reader object containing the GDA formatted Edge instance data.
Returns:
    the constructed Graph object.
Throws:
    java.lang.RuntimeException - if any of the file format constraints are violated or the files cannot be read.

readSchema

public static Graph readSchema(java.io.File file)

Overloaded entry point for readSchema(Reader,String)

Parameters:
    file - a File from which the schema extended ARFF description is read; the parent directory of this File is where instance data files are searched for if relative paths are supplied in the schema.
Returns:
    the constructed Graph object.
Throws:
    java.lang.RuntimeException - if any of the file format constraints are violated or the file cannot be read.

readSchema

public static Graph readSchema(java.io.Reader reader,
                               java.lang.String parentDirectory)

Reads the Graph information from a schema file, constructs the data structures and instantiates all of the instance data.

Parameters:
    reader - a Reader object from which the schema extended ARFF description is read.
    parentDirectory - a String representing the directory in which to search for instance data files if the filenames supplied in the schema are relative paths.
Returns:
    the constructed Graph object.
Throws:
    java.lang.RuntimeException - if any of the file format constraints are violated or the file cannot be read.

stressTest

public static Graph stressTest(int numFields,
                               int numNodes,
                               int numEdges)

This method conducts a stress test by creating a set of random nodes and edges and performs busy-work accessing the node and edge information.

Parameters:
    numFields - determines how many fields are created in each Node.
    numNodes - determines how many Nodes to create in the Graph.
    numEdges - determines how many Edges to create in the Graph.
Returns:
    the constructed Graph object.

main

public static final void main(java.lang.String[] args)

A test driver for the class.

Parameters:
    args - an array of Strings holding the arguments for the test driver. If one argument is supplied, it names the schema file to use in the test and is passed to readSchema. If two arguments are supplied, they name the GDA formatted Node instance data and Edge instance data files and are passed to readGDASchema. If three arguments are supplied, they are integer values used as stress test inputs and are passed to stressTest, which creates nodes in memory using number-of-fields-per-node = arg0, number-of-nodes = arg1, number-of-edges = arg2. In this last case all Graph elements are populated randomly.
