GO::OntologyProvider::OntologyParser(3)   User Contributed Perl Documentation
NAME
GO::OntologyProvider::OntologyParser - Provides API for retrieving data
from Gene Ontology files
SYNOPSIS
use GO::OntologyProvider::OntologyParser;
my $ontology = GO::OntologyProvider::OntologyParser->new(ontologyFile => "process.ontology");
print "The ancestors of GO:0006177 are:\n";
my $node = $ontology->nodeFromId("GO:0006177");
foreach my $ancestor ($node->ancestors) {
    print $ancestor->goid, " ", $ancestor->term, "\n";
}
$ontology->printOntology();
DESCRIPTION
GO::OntologyProvider::OntologyParser implements the interface defined
by GO::OntologyProvider, and parses a gene ontology file (GO) in plain
text (not XML) format. These files can be obtained from the Gene
Ontology Consortium web site, http://www.geneontology.org/. From the
information in the file, it creates a directed acyclic graph (DAG)
structure in memory. This means that GO terms are arranged into tree-
like structures where each GO node can have multiple parent nodes and
multiple child nodes.
This data structure can be used in conjunction with files in which
certain genes are annotated to corresponding GO nodes.
Each GO ID (e.g. "GO:1234567") has associated with it a GO node. That
GO node contains the name of the GO term, a list of the nodes directly
above the node ("parent nodes"), and a list of the nodes directly below
the current node ("child nodes"). The "ancestor nodes" of a certain
node are all of the nodes that are in a path from the current node to
the root of the ontology, with all repetitions removed.
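The definition of ancestor nodes can be illustrated with a minimal sketch that does not use this module at all. It assumes a toy ontology held in a plain hash that maps each ID to the IDs of its direct parents; the IDs, the hash layout, and the ancestors_of helper are all hypothetical, and the sketch assumes the node itself is not counted among its own ancestors.

```perl
use strict;
use warnings;

# Hypothetical toy ontology: each ID maps to the IDs of its direct parents.
my %parents_of = (
    'GO:A' => [],                 # root
    'GO:B' => ['GO:A'],
    'GO:C' => ['GO:A'],
    'GO:D' => ['GO:B', 'GO:C'],   # two parents, so two paths to the root
);

# Collect every node on any path from $id to the root, each one only once.
sub ancestors_of {
    my ($id, $parents_of) = @_;
    my %seen;
    my @queue = @{ $parents_of->{$id} || [] };
    while (defined(my $current = shift @queue)) {
        next if $seen{$current}++;    # already visited via another path
        push @queue, @{ $parents_of->{$current} || [] };
    }
    my @ancestors = sort keys %seen;
    return @ancestors;
}

print join(", ", ancestors_of('GO:D', \%parents_of)), "\n";
```

GO:A is reachable from GO:D through both GO:B and GO:C, but the %seen hash ensures it is reported only once, which is what "with all repetitions removed" refers to.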
The format of the GO file is as follows (taken from
http://www.geneontology.org/doc/GO.doc.html):
Comment lines:
Lines that begin ! are comment lines.
$ lines:
Lines in which the first non-space character is a $ either reflect the
domain and aspect of the ontology (i.e. $text) or mark the end of file
(i.e. the $ character on a line by itself).
Versioning:
The first lines of each file after any html header information (in
*.html files) always carry information about the version, the date of
last update, (optionally) the source of the file, the name of the
database, the domain of the file and the editors of the file, e.g.:
!
!Gene Ontology
![domain of file]
!
!editors: Michael Ashburner (FlyBase), Midori Harris (GO), Judith Blake (MGD)
!Leonore Reiser (TAIR), Karen Christie (SGD) and colleagues
!with software by Suzanna Lewis (FlyBase Berkeley).
Syntax:
Parent-child relationships between terms are represented by
indentation:
parent_term
 child_term
Instance relationship:
%term0
 %term1 % term2
To be read as term1 being an instance of term0 and also an instance of
term2.
Part of relationship:
%term0
 %term1 < term2 < term3
To be read as term1 being an instance of term0 and also a part of
term2 and term3.
Line syntax (showing the order in which items appear on a line; *
indicates optional item):
< | % term [; db cross ref]* [; synonym:text]* [ < | % term]*
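As a rough illustration of the line syntax above, the following sketch splits a single ontology line into its parts. It is not the parser used by this module: parse_ontology_line is a hypothetical helper, the GO ID in the usage example is made up, one leading space per level of depth is an assumption, and escaped characters inside term names are not handled.

```perl
use strict;
use warnings;

# Split one ontology line into its indentation depth, relationship symbol,
# term name, and the "; ..." annotations of the primary term.
# Simplification: additional "% term" / "< term" parents are returned
# unparsed in 'other_parents'.
sub parse_ontology_line {
    my ($line) = @_;
    return if $line =~ /^!/;        # comment line
    return if $line =~ /^\s*\$/;    # $ line (domain/aspect or end of file)

    my ($indent, $relation, $rest) = $line =~ /^( *)([%<])\s*(.*?)\s*$/
        or return;

    # Everything from the first " % " or " < " onwards names further parents.
    my ($primary, $others) = split /\s+(?=[%<]\s)/, $rest, 2;
    my ($term, @annotations) = split /\s*;\s*/, $primary;

    return {
        depth         => length $indent,
        relation      => $relation eq '%' ? 'instance' : 'part of',
        term          => $term,
        annotations   => \@annotations,
        other_parents => $others,
    };
}

my $parsed = parse_ontology_line("  %term1 ; GO:0000002 ; synonym:alias < term2");
print "$parsed->{term} ($parsed->{relation}), depth $parsed->{depth}\n";
```

Comment lines and $ lines yield nothing, so a caller can skip them with a simple defined check on the return value.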
Instance Constructor
new
This is the constructor for an OntologyParser object. It expects one
of two arguments: 'ontologyFile' or 'objectFile'. With an ontologyFile
argument, the file must be an ontology file created by the GO
Consortium, according to their file format. With an objectFile
argument, the file must contain an OntologyParser object that was
previously serialized to disk (see serializeToDisk).
Usage:
my $ontology = GO::OntologyProvider::OntologyParser->new(ontologyFile => $ontologyFile);
my $ontology = GO::OntologyProvider::OntologyParser->new(objectFile => $objectFile);
Instance Methods
printOntology
This prints out the ontology, with redundancies, to STDOUT. It does
not yet print out all of the ontology information (such as relationship
types). This method will likely be removed in a future version, so it
should not be relied upon.
Usage:
$ontologyParser->printOntology;
allNodes
This method returns an array of all the GO::Node objects that have been
created.
Usage:
my @nodes = $ontologyParser->allNodes;
rootNode
This returns the root node in the ontology.
Usage:
my $rootNode = $ontologyParser->rootNode;
nodeFromId
This public method takes a GOID and returns the GO::Node that it
corresponds to.
Usage:
my $node = $ontologyParser->nodeFromId($goid);
If the GOID does not correspond to a GO node, then undef will be
returned. Note if you try to call any methods on an undef, you will
get a fatal runtime error, so if you can't guarantee all GOIDs that you
supply are good, you should check that the return value from this
method is defined.
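One way to apply this check to a whole list of GOIDs is sketched below. The term_names_for helper is hypothetical and not part of this module; it only assumes an object that provides nodeFromId, and nodes that provide the term method shown in the SYNOPSIS.

```perl
use strict;
use warnings;

# Look up each GOID and separate the term names that were found from the
# IDs that have no node. $ontology is any object with a nodeFromId method,
# such as an OntologyParser.
sub term_names_for {
    my ($ontology, @goids) = @_;
    my (%term_for, @unknown);
    foreach my $goid (@goids) {
        my $node = $ontology->nodeFromId($goid);
        if (defined $node) {
            $term_for{$goid} = $node->term;
        }
        else {
            push @unknown, $goid;    # never call methods on undef
        }
    }
    return (\%term_for, \@unknown);
}
```

A caller would then write something like my ($termFor, $unknown) = term_names_for($ontologyParser, @goids); and report the contents of @$unknown instead of dying on the first bad ID.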
numNodes
This public method returns the number of nodes that exist within the
ontology.
Usage:
my $numNodes = $ontologyParser->numNodes;
serializeToDisk
Saves the current state of the OntologyParser object to a file, using
the Storable package. The object is saved in network order for
portability. Returns the name of the file. If no filename is provided,
the name of the file used for object construction (including its
directory, if one was given) is used, with .obj appended. If the object
was instantiated from a file that already has a .obj suffix and no
filename is provided, that same filename is reused.
This method currently causes a segmentation fault on MacOSX (at least
10.1.5 -> 10.2.3), with perl 5.6 and Storable 1.0.14, when trying to
store the process ontology. The failure occurs with both store and
nstore. It has not been investigated whether this is a perl problem or
a Storable problem (Storable contains a large amount of C code). No
segmentation fault occurs on Solaris, using perl 5.6.1 and Storable
1.0.13, so it remains unclear whether the cause lies in MacOSX, perl,
or Storable. Newer versions of both perl and Storable exist, and the
code should be tested with those as well.
Usage:
my $objectFile = $ontologyParser->serializeToDisk(filename=>$filename);
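For readers unfamiliar with Storable, the following sketch shows what saving "in network order" looks like with that package on its own. It does not call serializeToDisk; the hash is a made-up stand-in for the parser object, and the temporary file is used only to keep the example self-contained.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Stand-in data structure; the real method stores the parser object itself.
my $data = { 'GO:0000001' => { term => 'example term', parents => [] } };

# Temporary .obj file, removed automatically when the program exits.
my (undef, $filename) = tempfile(SUFFIX => '.obj', UNLINK => 1);

nstore($data, $filename);              # write in network byte order
my $restored = retrieve($filename);    # reads files written by store or nstore

print $restored->{'GO:0000001'}{term}, "\n";
```

Because nstore writes a byte-order-independent file, an object file written on one architecture should be readable on another, which is presumably the portability the method description has in mind.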
Authors
Gavin Sherlock; sherlock@genome.stanford.edu
Elizabeth Boyle; ell@mit.edu