VOCABULARY(1) User Contributed Perl Documentation VOCABULARY(1)
NAME
vocabulary -- extract vocabularies from Penn treebank files
SYNOPSIS
vocabulary [-NT ntfile] [-POS posfile] [-word wordfile] [-count]
[-binarized] [-verbose] file1 [file2...]
File1, file2 etc. are the names of Penn treebank files. If none are
specified, STDIN is used.
OPTIONS
NT Write the non-terminal node vocabulary to ntfile.
POS Write the part-of-speech vocabulary to posfile.
word
Write the word vocabulary to wordfile.
count
Print the frequency counts for each of the categories.
binarized
The input files are in binarized format.
verbose
Print filenames as they are processed.
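As an illustration only, the command line above could be mirrored in
Python with argparse. This is a hypothetical sketch, not the script's
own (Perl) option handling: the function name parse_command_line and
the sample filenames are assumptions, and only the flags listed in the
SYNOPSIS are used.

```python
import argparse


def parse_command_line(argv):
    """Parse the flags shown in the SYNOPSIS (hypothetical Python mirror)."""
    parser = argparse.ArgumentParser(prog="vocabulary")
    # Output destinations; each one is optional.
    parser.add_argument("-NT", metavar="ntfile")
    parser.add_argument("-POS", metavar="posfile")
    parser.add_argument("-word", metavar="wordfile")
    # Boolean switches.
    parser.add_argument("-count", action="store_true")
    parser.add_argument("-binarized", action="store_true")
    parser.add_argument("-verbose", action="store_true")
    # Zero or more treebank files; an empty list means "read STDIN".
    parser.add_argument("files", nargs="*")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Sample filenames are placeholders, not files shipped with the script.
    args = parse_command_line(
        ["-NT", "nt.txt", "-word", "-", "-count", "wsj_0001.mrg"]
    )
    print(args.NT, args.word, args.count, args.files)
```

A lone "-" is accepted as an option value here, which matches the
convention of using "-" to mean STDOUT.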
DESCRIPTION
Given a list of Penn treebank files, this script extracts the words,
parts of speech, and non-terminal node names and emits each in a
separate file in order of frequency.
Note that giving a "-" argument for any of ntfile, posfile, or wordfile
causes the results to be written to STDOUT.
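The extraction described above can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not the
script's actual implementation. It assumes the standard bracketed Penn
treebank format, keeps labels exactly as they appear (function tags
such as NP-SBJ are not stripped), treats any bracket holding a label
and one token as a part-of-speech/word pair, breaks frequency ties
alphabetically, and separates item and count with a tab. All function
names are hypothetical, and the -binarized format is not handled.

```python
import re
import sys
from collections import Counter

# A token is an opening bracket, a closing bracket, or a run of other text.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def count_vocabularies(text):
    """Return (non-terminal, part-of-speech, word) frequency counters."""
    nts, tags, words = Counter(), Counter(), Counter()
    tokens = TOKEN.findall(text)
    for i, tok in enumerate(tokens):
        if tok != "(" or i + 2 >= len(tokens):
            continue
        label, nxt = tokens[i + 1], tokens[i + 2]
        if label in ("(", ")"):
            continue  # unlabeled wrapper bracket, e.g. the outer "( (S ...) )"
        if nxt == "(":
            nts[label] += 1      # label followed by children: non-terminal
        elif nxt != ")":
            tags[label] += 1     # label followed by a token: POS tag + word
            words[nxt] += 1
    return nts, tags, words


def format_vocabulary(counter, with_counts=False):
    """List items by descending frequency (ties alphabetical, an assumption)."""
    ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{item}\t{n}" if with_counts else item for item, n in ranked]


def write_vocabulary(lines, path):
    """Write one item per line; a path of "-" means STDOUT."""
    text = "".join(line + "\n" for line in lines)
    if path == "-":
        sys.stdout.write(text)
    else:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)


if __name__ == "__main__":
    sample = ("( (S (NP (DT the) (NN cat)) "
              "(VP (VBD saw) (NP (DT the) (NN dog))) (. .)) )")
    nts, tags, words = count_vocabularies(sample)
    write_vocabulary(format_vocabulary(words, with_counts=True), "-")
```

Passing a filename instead of "-" to write_vocabulary writes the same
lines to that file, one vocabulary per file.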
AUTHOR
W.P. McNeill <billmcn@ssli.ee.washington.edu>
perl v5.20.2 2005-01-05 VOCABULARY(1)