java.lang.Object
  netkit.graph.io.SchemaReader
public final class SchemaReader
This class reads schema type information for a Graph and builds the data structures that completely describe a network of Nodes and Edges: their type information and their instance data. The schema file uses an extended ARFF format, which is line oriented. Valid lines may contain comments, whitespace or tags. A comment is a line whose first non-whitespace character is '%' or '#'; it is terminated by the end-of-line.
Tags are directives beginning with an '@' character in column one. There are six primary tags and they are case-insensitive:
@NODETYPE
@ATTRIBUTE
@NODEDATA
@EDGETYPE
@REVERSIBLE
@EDGEDATA
The first three tags are used for describing Nodes and the latter three are for Edges. Because Edges state and enforce the type of Node they may connect to, the underlying Nodes for each EdgeType must be declared first.
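The line rules above can be sketched as a small classifier. This is only an illustration of the description, not NetKit's actual parser; the class name SchemaLines and its methods are hypothetical, and treating every '@' line as a tag (including unknown ones) is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of netkit.graph.io.
final class SchemaLines {
    enum Kind { BLANK, COMMENT, TAG, OTHER }

    // The six primary tags named in the class description.
    static final Set<String> PRIMARY_TAGS = new HashSet<>(Arrays.asList(
        "@NODETYPE", "@ATTRIBUTE", "@NODEDATA",
        "@EDGETYPE", "@REVERSIBLE", "@EDGEDATA"));

    static Kind classify(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) return Kind.BLANK;
        // A comment starts with '%' or '#' as the first non-whitespace character.
        char first = trimmed.charAt(0);
        if (first == '%' || first == '#') return Kind.COMMENT;
        // Tags begin in column one, so inspect the untrimmed line.
        if (line.charAt(0) == '@') return Kind.TAG;
        return Kind.OTHER;
    }

    // Tag names are case-insensitive, so compare in upper case.
    static boolean isPrimaryTag(String line) {
        if (classify(line) != Kind.TAG) return false;
        String tag = line.split("\\s+")[0].toUpperCase(Locale.ROOT);
        return PRIMARY_TAGS.contains(tag);
    }
}
```

Under these assumptions an indented "@NODETYPE" line is classified as OTHER, because the '@' is not in column one.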
A Node type is defined by the following sequence: exactly one line with a @NODETYPE tag, followed by any number of lines with @ATTRIBUTE tags, followed by exactly one line with a @NODEDATA tag.
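As a rough sketch of that ordering rule, the check below takes the tags of one Node type block in file order and verifies the sequence. The class NodeBlockOrder is hypothetical and ignores the @RELATION and @DATA aliases for brevity.

```java
import java.util.List;

// Hypothetical validator for the tag order of a single Node type block.
final class NodeBlockOrder {
    static boolean isValid(List<String> tags) {
        int n = tags.size();
        // At minimum: one @NODETYPE line and one @NODEDATA line.
        if (n < 2) return false;
        if (!tags.get(0).equalsIgnoreCase("@NODETYPE")) return false;
        if (!tags.get(n - 1).equalsIgnoreCase("@NODEDATA")) return false;
        // Everything in between must be an @ATTRIBUTE line.
        for (int i = 1; i < n - 1; i++) {
            if (!tags.get(i).equalsIgnoreCase("@ATTRIBUTE")) return false;
        }
        return true;
    }
}
```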
The @NODETYPE tag names the node's type or attributes container. It is used to uniquely identify a Node type and distinguish it from other types. The line declaring it appears like this:
@NODETYPE name
There can be any number of @ATTRIBUTE tags per Node type. Each such line begins with @ATTRIBUTE, followed by the name of the attribute field; the third token on the line is the type of that attribute. The name of each attribute identifies that field among the attributes in one Node and must be unique in that context.
There are four valid attribute types: KEY, CONTINUOUS, DISCRETE and CATEGORICAL. The type names are case-insensitive. REAL is a synonym for CONTINUOUS and INT is a synonym for DISCRETE. The first three types are declared by writing the type name as the third token on the attribute line. For example:
@ATTRIBUTE field1 KEY
@ATTRIBUTE field2 CONTINUOUS
@ATTRIBUTE field3 REAL
@ATTRIBUTE field4 DISCRETE
@ATTRIBUTE field5 INT
The KEY type can appear only once per Node type and is the unique identifier for a particular Node instance. It corresponds in value to the Java String type, and each KEY value may appear only once. The CONTINUOUS type corresponds to the Java double type. The DISCRETE type also corresponds to the Java double type, but takes on only whole number (int) values.
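The mapping from the third token to one of the first three types, including the REAL and INT synonyms, might look like the following sketch. The enum and class names are hypothetical and do not reflect NetKit's own Attribute classes; rejecting lines with fewer than three tokens is an assumption.

```java
import java.util.Locale;

// Hypothetical parser for the non-categorical attribute types.
final class AttributeTypes {
    enum Type { KEY, CONTINUOUS, DISCRETE }

    // Type names are case-insensitive; REAL and INT are synonyms.
    static Type parse(String token) {
        switch (token.toUpperCase(Locale.ROOT)) {
            case "KEY":
                return Type.KEY;
            case "CONTINUOUS":
            case "REAL":
                return Type.CONTINUOUS;
            case "DISCRETE":
            case "INT":
                return Type.DISCRETE;
            default:
                throw new IllegalArgumentException("Unknown attribute type: " + token);
        }
    }

    // Expects "@ATTRIBUTE name type"; the type is the third token.
    static Type parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 3 || !tokens[0].equalsIgnoreCase("@ATTRIBUTE")) {
            throw new IllegalArgumentException("Not an attribute line: " + line);
        }
        return parse(tokens[2]);
    }
}
```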
The last type, CATEGORICAL, can take on a fixed set of token values. To declare a CATEGORICAL type the line looks like this:
@ATTRIBUTE field6 {token1,token2,token3,etc...}
The brace-enclosed list of tokens represents the set of values that this type of field can take on. These values are converted to doubles internally: they are assigned whole number values starting at zero and incrementing by one for each subsequent token.
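The token numbering can be illustrated with a short sketch that parses the brace-enclosed list and assigns 0.0, 1.0, 2.0 and so on in declaration order. The class CategoricalTokens is hypothetical (NetKit's own TokenSet classes are not shown here), and trimming whitespace around tokens and ignoring duplicates are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of mapping categorical tokens to double values.
final class CategoricalTokens {
    static Map<String, Double> parse(String spec) {
        String s = spec.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("Expected {a,b,c} but got: " + spec);
        }
        // LinkedHashMap keeps the declaration order of the tokens.
        Map<String, Double> values = new LinkedHashMap<>();
        String body = s.substring(1, s.length() - 1);
        for (String token : body.split(",")) {
            String name = token.trim();
            if (name.isEmpty()) continue;
            // The next value is the number of tokens seen so far: 0, 1, 2, ...
            values.putIfAbsent(name, (double) values.size());
        }
        return values;
    }
}
```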
The final tag for node types is the @NODEDATA tag. This tag declares that the node type is finalized and no more attribute fields will be added. It also specifies the filename from which the instance data for Nodes of this type will be read. The filename supplied may be an absolute or relative path. If a relative path is supplied, it is relative to the location of the schema file being read. The @NODEDATA line looks like this:
@NODEDATA filename
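The path rule for data files can be sketched with java.io.File: a relative filename is resolved against the directory of the schema file, while any other filename is used unchanged. The helper class DataFilePaths is hypothetical; its parentDirectory argument mirrors the parameter of readSchema(Reader,String).

```java
import java.io.File;

// Hypothetical helper showing how a data filename could be resolved.
final class DataFilePaths {
    static File resolve(String parentDirectory, String filename) {
        File file = new File(filename);
        // Relative paths are taken relative to the schema file's directory.
        return file.isAbsolute() ? file : new File(parentDirectory, filename);
    }
}
```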
Note: @RELATION is an alias for @NODETYPE and @DATA is an alias for @NODEDATA.
The latter three tags describe an EdgeType and Edge connections. Lines with the @EDGETYPE tag supply the EdgeType name in the second token. The third token is the Edge's source Node type and the fourth token is the Edge's destination Node type. Node types correspond to the @NODETYPE tag defined above. These lines appear like so:
@EDGETYPE typeName sourceNodeType destinationNodeType
The @REVERSIBLE tag, by its absence, indicates that the Graph is directed. If this line appears, then each added Edge implies that another Edge in the opposite direction must also be added. If the @REVERSIBLE tag appears by itself, then the same EdgeType is used for the reversed connection. This implies that the Nodes have the same type. If the @REVERSIBLE tag includes an optional name in the second token, then this name is used for the reversed EdgeType and is added to the list of EdgeTypes in the Graph. In this case, the Nodes do not need to have the same type since they get their own connection type. These lines look like this:
@REVERSIBLE
or
@REVERSIBLE name
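The effect of @REVERSIBLE on a single Edge can be sketched as follows. SimpleEdge is a hypothetical stand-in, not NetKit's Edge class, and a null reverse name is used here to represent a bare @REVERSIBLE line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how one declared Edge expands under @REVERSIBLE.
final class ReversibleEdges {
    static final class SimpleEdge {
        final String type;
        final String source;
        final String destination;

        SimpleEdge(String type, String source, String destination) {
            this.type = type;
            this.source = source;
            this.destination = destination;
        }
    }

    // reversible: whether an @REVERSIBLE line was present.
    // reverseName: the optional second token, or null for a bare @REVERSIBLE.
    static List<SimpleEdge> expand(String edgeType, boolean reversible,
                                   String reverseName, String source, String destination) {
        List<SimpleEdge> edges = new ArrayList<>();
        edges.add(new SimpleEdge(edgeType, source, destination));
        if (reversible) {
            // Without a name, the reversed Edge reuses the same EdgeType.
            String type = (reverseName == null) ? edgeType : reverseName;
            edges.add(new SimpleEdge(type, destination, source));
        }
        return edges;
    }
}
```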
Finally, the @EDGEDATA tag names the file from which to read Edge data. Like the @NODEDATA tag, this tag accepts one extra filename token, which specifies where to get the Edges for this EdgeType. Similarly, the filename may be an absolute or relative path. The @EDGEDATA line looks like this:
@EDGEDATA filename
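Putting the tags together, the sketch below embeds a small schema and checks the rule that the Node types of an EdgeType must be declared before the @EDGETYPE line. The schema contents (type names, attribute names and data file names) are invented for illustration, and the class EdgeTypeOrderCheck is hypothetical. It trims each line for simplicity and compares Node type names exactly; whether NetKit treats those names as case-sensitive is not stated above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical check that @EDGETYPE lines refer to already declared Node types.
final class EdgeTypeOrderCheck {
    // Invented example schema; the names and files are illustrative only.
    static final String SAMPLE_SCHEMA = String.join("\n",
        "% hypothetical schema, for illustration only",
        "@NODETYPE person",
        "@ATTRIBUTE id KEY",
        "@ATTRIBUTE age INT",
        "@ATTRIBUTE status {student,faculty,staff}",
        "@NODEDATA person-nodes.txt",
        "",
        "@EDGETYPE knows person person",
        "@REVERSIBLE",
        "@EDGEDATA knows-edges.txt");

    static boolean edgeTypesUseDeclaredNodes(String schema) {
        Set<String> declared = new HashSet<>();
        for (String line : schema.split("\n")) {
            String[] tokens = line.trim().split("\\s+");
            String tag = tokens[0].toUpperCase(Locale.ROOT);
            if (tag.equals("@NODETYPE") || tag.equals("@RELATION")) {
                if (tokens.length >= 2) declared.add(tokens[1]);
            } else if (tag.equals("@EDGETYPE")) {
                // Expected form: @EDGETYPE typeName sourceNodeType destinationNodeType
                if (tokens.length < 4) return false;
                if (!declared.contains(tokens[2]) || !declared.contains(tokens[3])) return false;
            }
        }
        return true;
    }
}
```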
See Also:
Attribute, Attributes, TokenSet, ExpandableTokenSet, FixedTokenSet, EdgeType, Edge, Node, NodeReader, EdgeReaderRN, EdgeReaderGDA, SchemaWriter, NodeWriter, EdgeWriterRN
Constructor Summary |
---|
SchemaReader() |
Method Summary | | |
---|---|---|
static void | main(java.lang.String[] args) | A test driver for the class. |
static Graph | readGDASchema(java.io.File nodeFile, java.io.File edgeFile) | Overloaded entry point for readGDASchema(Reader,Reader) |
static Graph | readGDASchema(java.io.Reader nodeReader, java.io.Reader edgeReader) | Reads the Node and Edge information from GDA formatted input, constructs the data structures and instantiates all of the instance data. |
static Graph | readSchema(java.io.File file) | Overloaded entry point for readSchema(Reader,String) |
static Graph | readSchema(java.io.Reader reader, java.lang.String parentDirectory) | Reads the Graph information from a schema file, constructs the data structures and instantiates all of the instance data. |
static Graph | stressTest(int numFields, int numNodes, int numEdges) | Conducts a stress test by creating a set of random nodes and edges and performing busy-work that accesses the node and edge information. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SchemaReader()
Method Detail |
---|
public static Graph readGDASchema(java.io.File nodeFile, java.io.File edgeFile)
Overloaded entry point for readGDASchema(Reader,Reader).
Parameters:
nodeFile - a File containing the GDA formatted Node instance data.
edgeFile - a File containing the GDA formatted Edge instance data.
Throws:
java.lang.RuntimeException - if any of the file format constraints are violated or the files cannot be read.
public static Graph readGDASchema(java.io.Reader nodeReader, java.io.Reader edgeReader)
Parameters:
nodeReader - a Reader object containing the GDA formatted Node instance data.
edgeReader - a Reader object containing the GDA formatted Edge instance data.
Throws:
java.lang.RuntimeException - if any of the file format constraints are violated or the files cannot be read.
public static Graph readSchema(java.io.File file)
Overloaded entry point for readSchema(Reader,String).
Parameters:
file - a File from which the extended ARFF schema description is read; the parent directory of this File is where instance data files are searched for if relative paths are supplied in the schema.
Throws:
java.lang.RuntimeException - if any of the file format constraints are violated or the file cannot be read.
public static Graph readSchema(java.io.Reader reader, java.lang.String parentDirectory)
Parameters:
reader - a Reader object from which the extended ARFF schema description is read.
parentDirectory - a String representing the directory in which to search for instance data files if the filenames supplied in the schema are relative paths.
Throws:
java.lang.RuntimeException - if any of the file format constraints are violated or the file cannot be read.
public static Graph stressTest(int numFields, int numNodes, int numEdges)
Parameters:
numFields - determines how many fields are created in each Node.
numNodes - determines how many Nodes to create in the Graph.
numEdges - determines how many Edges to create in the Graph.
public static final void main(java.lang.String[] args)
Parameters:
args - an array of Strings holding the arguments for the test driver. If one argument is supplied, it is the name of the schema file to use in the test, which is passed to readSchema. If two arguments are supplied, they are the names of the GDA formatted Node instance data and Edge instance data, which are passed to readGDASchema. If three arguments are supplied, they are integer values giving the stress test inputs, which are passed to stressTest: number-of-fields-per-node = arg0, number-of-nodes = arg1, number-of-edges = arg2. In this last case, all Graph elements are populated randomly.
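The argument handling described for the test driver can be sketched as a dispatch on the number of arguments. This is not the actual body of main; the class DriverDispatch and its Mode enum are hypothetical, and rejecting any other argument count is an assumption, since the behavior for that case is not documented here.

```java
// Hypothetical sketch of how the driver could choose what to run.
final class DriverDispatch {
    enum Mode { READ_SCHEMA, READ_GDA_SCHEMA, STRESS_TEST }

    static Mode modeFor(String[] args) {
        switch (args.length) {
            case 1:
                // args[0]: schema file name, passed to readSchema.
                return Mode.READ_SCHEMA;
            case 2:
                // args[0], args[1]: GDA Node and Edge data, passed to readGDASchema.
                return Mode.READ_GDA_SCHEMA;
            case 3:
                // numFields, numNodes, numEdges for stressTest; all must be integers.
                for (String arg : args) {
                    Integer.parseInt(arg); // throws NumberFormatException if not an int
                }
                return Mode.STRESS_TEST;
            default:
                throw new IllegalArgumentException(
                    "Expected 1, 2 or 3 arguments but got " + args.length);
        }
    }
}
```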