DJVUSED(1) DjVuLibre-3.5 DJVUSED(1)
NAME
djvused - Multi-purpose DjVu document editor.
SYNOPSIS
djvused [options] djvufile
DESCRIPTION
Program djvused is a powerful command line tool for manipulating
multi-page documents, creating or editing annotation chunks, creating or
editing hidden text layers, pre-computing thumbnail images, and more.
The program first reads the DjVu document djvufile and executes a number
of djvused commands.
Djvused commands can be read from a specific file (when option -f is
specified), read from the command line (when option -e is specified),
or read from the standard input (the default).
OPTIONS
-v Cause djvused to print a command line prompt before reading
commands and a brief message describing how each command was
executed. This option is very useful for debugging djvused scripts
and also for interactively entering djvused commands on the
standard input.
-f scriptfile
Cause djvused to read commands from file scriptfile.
-e command
Cause djvused to execute the commands specified by the option
argument command. It is advisable to surround the djvused commands
with single quotes in order to prevent unwanted shell expansion.
-s Cause djvused to save the file djvufile after executing the
specified commands. This is similar to executing command save
immediately before terminating the program.
-u Cause djvused to print hidden text and annotations as UTF-8
instead of the default representation, which encodes non-ASCII
characters with octal escape sequences for maximal portability.
This option is convenient for manually editing or viewing the
djvused output. This option also causes the emission of a UTF-8
BOM under Windows.
-n Cause djvused to disregard save commands. This is useful for
debugging djvused scripts without overwriting files on your
disk.
DJVUSED EXAMPLES
There are many ways to use program djvused. The following examples
illustrate some common uses of this program.
Obtaining the size of a page
Command size outputs the width and height of the selected pages using an
HTML-friendly syntax. For instance, the following command prints the
size of page 3 of document myfile.djvu.
djvused myfile.djvu -e 'select 3; size'
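A program that needs the page size as numbers, rather than for an HTML page, has to parse this output. The following Python sketch assumes that each output line has the form width=W height=H, possibly followed by other name=value pairs with integer values. The helper names parse_size and page_size are hypothetical, and page_size assumes that djvused is on the PATH.

```python
import re
import subprocess

def parse_size(output):
    """Parse djvused 'size' output into a list of dicts, one per line.

    Assumes each line looks like 'width=W height=H', possibly followed
    by other name=value pairs whose values are integers.
    """
    return [
        {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", line)}
        for line in output.splitlines()
        if line.strip()
    ]

def page_size(djvufile, page):
    """Run djvused (assumed to be on PATH) and return the size of one page."""
    result = subprocess.run(
        ["djvused", djvufile, "-e", f"select {page}; size"],
        capture_output=True, text=True, check=True,
    )
    return parse_size(result.stdout)[0]
```

Passing the arguments as a list bypasses the shell, so the single quotes used in the command line above are not needed here.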
Extracting the hidden text
Command print-pure-txt outputs the text associated with a page or a
document. For instance, the following shell command outputs the text
for the entire document. Lines and pages are delimited by newline ("\n")
and form feed ("\f") characters respectively.
djvused myfile.djvu -e 'print-pure-txt'
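Because the delimiters are plain control characters, the output is easy to post-process. The following Python sketch assumes that djvused is on the PATH and that its output is UTF-8 encoded. The helper names are hypothetical. Treating a trailing form feed as a terminator rather than as the start of an empty page is also an assumption. The optional column, region, and paragraph delimiters described under command print-pure-txt are simply treated as line breaks.

```python
import subprocess

# Column (VT), region (GS) and paragraph (US) delimiters; see the
# description of command print-pure-txt.
STRUCTURE_DELIMITERS = ("\013", "\035", "\037")

def _split(text, sep):
    """Split text on sep, dropping the empty chunk after a trailing sep."""
    parts = text.split(sep)
    if parts and parts[-1] == "":
        parts.pop()
    return parts

def split_pages(text):
    """Return print-pure-txt output as a list of pages, each a list of lines.

    Pages are delimited by form feeds and lines by newlines; the optional
    column, region and paragraph delimiters are treated as line breaks.
    """
    for delimiter in STRUCTURE_DELIMITERS:
        text = text.replace(delimiter, "\n")
    return [_split(page, "\n") for page in _split(text, "\f")]

def document_text(djvufile):
    """Run djvused (assumed to be on PATH) and split the document text."""
    result = subprocess.run(["djvused", djvufile, "-e", "print-pure-txt"],
                            capture_output=True, encoding="utf-8", check=True)
    return split_pages(result.stdout)
```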
Command print-txt produces a more extensive output describing the
structure and the location of the text components. The syntax of this
output is described later in this man page. For instance, the following
shell command outputs extended text information for page 3 of document
myfile.djvu.
djvused myfile.djvu -e 'select 3; print-txt'
Extracting the annotations
Annotation data can be extracted using command print-ant. The syntax
of the annotation data is described later in this man page. For
instance, the following shell command outputs the annotation data for
the first page of document myfile.djvu.
djvused myfile.djvu -e 'select 1; print-ant'
Command print-ant only prints the annotations stored in the selected
component file. Command print-merged-ant also retrieves annotations
from all the component files referenced by the current page (using INCL
chunks) and prints the merged information.
Dumping/restoring annotations and text
Three commands, output-txt, output-ant, and output-all, produce djvused
scripts. For instance, the following shell command produces a djvused
script, myfile.dsed, that recreates all the text and annotation data in
document myfile.djvu.
djvused myfile.djvu -e 'output-all' > myfile.dsed
Script myfile.dsed is a text file that can be easily edited. The
following shell command then recreates the text and annotation
information in file myfile.djvu.
djvused myfile.djvu -f myfile.dsed -s
Extracting a page
Both commands save-page and save-page-with create a DjVu file
representing the selected component file of a document. The following
shell command, for instance, creates a file p05.djvu containing page 5
of document myfile.djvu.
djvused myfile.djvu -e 'select 5; save-page p05.djvu'
Each page of a document might import data from another component file
using the so-called inclusion (INCL) chunks. Command save-page then
produces a file with unresolved references to the imported data. Such
a file should then be made part of a multi-page document containing
the required data in other component files. On the other hand, command
save-page-with copies all the imported data into the output file. This
file is directly usable. Yet collecting several such files into a
multi-page document might lead to useless data replication.
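To split a whole document, the select and save commands can be generated for every page. The following Python sketch builds a djvused script that uses command n for the page count and save-page-with for each page. The function names and the output file-name pattern are hypothetical.

```python
import subprocess

def page_count(djvufile):
    """Return the page count printed by command n (djvused assumed on PATH)."""
    result = subprocess.run(["djvused", djvufile, "-e", "n"],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.split()[0])

def extraction_script(npages, pattern="p%03d.djvu"):
    """Build a djvused script that saves every page as a standalone file.

    save-page-with is used so that each output file also receives the
    data imported through INCL chunks. The default file-name pattern is
    arbitrary and contains no spaces.
    """
    return "".join(f"select {page}; save-page-with {pattern % page}\n"
                   for page in range(1, npages + 1))
```

For instance, you could write extraction_script(page_count("myfile.djvu")) to a file extract.dsed and run djvused myfile.djvu -f extract.dsed. This would create p001.djvu, p002.djvu, and so on. Option -s is not needed because the document itself is not modified.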
Pre-computing thumbnails
Command set-thumbnails constructs thumbnails that can later be
displayed by DjVu viewers. The following shell command, for instance,
computes thumbnails of size 64x64 pixels for all pages of file
myfile.djvu.
djvused myfile.djvu -e 'set-thumbnails 64' -s
DJVUSED COMMANDS
Command lines might contain zero, one, or more djvused commands and an
optional comment. Multiple djvused commands must be separated by a
semicolon character ';'. Comments are introduced by the '#' character
and extend until the end of the command line.
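When commands are generated by another program, it helps to assemble each command line in one place. The following Python sketch uses a hypothetical function name, command_line. It rejects special characters instead of escaping them, because this page does not describe a quoting mechanism for ';' or '#' inside a command.

```python
def command_line(commands, comment=None):
    """Join djvused commands into a single command line.

    Commands are separated by ';' and the optional comment follows '#'.
    Conservative assumption: a command containing ';', '#' or a newline
    is rejected rather than quoted.
    """
    commands = list(commands)
    for command in commands:
        if any(ch in command for ch in ";#\n"):
            raise ValueError(f"unsupported character in command {command!r}")
    if comment is not None and "\n" in comment:
        raise ValueError("a comment must fit on one line")
    line = "; ".join(commands)
    if comment is not None:
        line += " # " + comment
    return line
```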
Selection commands
Multi-page DjVu documents are composed of a number of component files.
Most component files describe a specific page of a document. Some
component files contain information shared by several pages such as
shared image data, shared annotations or thumbnails. Many djvused
commands operate on selected component files. All component files are
initially selected. The following commands are useful for changing the
selection.
n Print the total number of pages in the document.
ls List all component files in the document. Each line contains an
optional page number, a letter describing the component file
type, the size of the component file, and the identifier of the
component file. Component file type letters P, I, A, and T
respectively stand for page data, shared image data, shared
annotation data, and thumbnail data. Page numbers are only listed
for component files containing page data. When it is set, the
optional page title (see command set-page-title below) is displayed
after the component file identifier.
select [fileid]
Select the component file identified by argument fileid. Argument
fileid must be either a page number or a component file identifier.
The select command selects all component files when the argument
fileid is omitted.
select-shared-ant
Select a component file containing shared annotations. Only one
such component file is supported by the current DjVu software.
This component file usually contains annotations pertaining to
the whole document as opposed to specific pages. An error message
is displayed if there is no such component file.
create-shared-ant
Create and select a component file containing shared annotations.
This command only selects the shared annotation component file if
such a component file already exists. Otherwise it creates a new
shared annotation component file and makes sure that it is imported
by all pages in the document.
showsel
Shows the currently selected component files with the same format
as command ls.
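Since command showsel uses the same format as command ls, one parser can serve both. The following Python sketch assumes a particular line layout: an optional page number, a type letter, a size, and an identifier, followed by any extra text such as the page title. The exact column layout is an assumption, and the name parse_ls is hypothetical.

```python
import re

# Assumed layout: optional page number, type letter, size, identifier,
# then any extra text (such as a page title) up to the end of the line.
LS_LINE = re.compile(r"^\s*(?:(\d+)\s+)?([PIAT])\s+(\d+)\s+(\S+)\s*(.*)$")

def parse_ls(output):
    """Parse the output of command ls (or showsel) into a list of dicts."""
    entries = []
    for line in output.splitlines():
        match = LS_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        page, kind, size, ident, extra = match.groups()
        entries.append({
            "page": int(page) if page else None,
            "type": kind,
            "size": int(size),
            "id": ident,
            "extra": extra,
        })
    return entries
```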
Text and annotation commands
print-pure-txt
Print the text stored in the hidden text layer of the selected
pages. A similar capability is offered by program djvutxt.
Structural information is sometimes represented by control
characters. Text from different pages is delimited by form feed
characters ("\f"). Lines are delimited by newline characters
("\n"). Columns, regions, and paragraphs are sometimes delimited
by vertical tab ("\013"), group separators ("\035") and unit
separators ("\037") respectively.
print-txt
Prints extensive hidden text information for the selected pages.
This information describes the structure of the text on the document
page and locates the structural elements in the page
image. The syntax of this output is described later in this man
page.
remove-txt
Remove the hidden text information from the selected component
files. For instance, executing commands select and remove-txt
removes all hidden text information from the DjVu document.
set-txt [djvusedtxtfile]
Insert hidden text information into the selected pages. The
optional argument djvusedtxtfile names a file containing the
hidden text information. This file must contain data similar to
what is produced by command print-txt. When the optional argument
is omitted, the program reads the hidden text information
from the djvused script until reaching an end-of-file or a line
containing a single period.
output-txt
Prints a djvused script that reconstructs the hidden text information
for the selected pages. This script can later be edited
and executed by invoking program djvused with option -f.
print-ant
Prints the annotations of the selected component file. The
annotation data is represented using a simple syntax described
later in this document.
print-merged-ant
Merge the annotations stored in the selected component files
with the annotations imported from other component files such as
the shared annotation component file. The annotation data is
represented using a simple syntax described later in this
document.
remove-ant
Remove the annotation information from the selected component
files. For instance, executing commands select and remove-ant
removes all annotation information from the DjVu document.
set-ant [djvusedantfile]
Insert annotations into the selected component file. The
optional argument djvusedantfile names a file containing the
annotation data. This file must contain data similar to what is
produced by command print-ant. When the optional argument is
omitted, the program reads the annotation data from the djvused
script itself until reaching an end-of-file or a line containing
a single period.
output-ant
Print a djvused script that reconstructs the annotation information
for the selected pages. This script can later be edited
and executed by invoking program djvused with option -f.
print-meta
Print the meta-data part of the annotations for the selected
component file. This command displays a subset of the information
printed by command print-ant using a different syntax.
Meta-data are organized as key-value pairs. Each printed line
contains the key name, such as author or title, followed by a
tab character ("\t") and a double-quoted string representing the
UTF-8 encoded meta-data value.
remove-meta
Remove the meta-data part of the annotations of the selected
component files.
set-meta [djvusedmetafile]
Set the meta-data part of the annotations of the selected component
file. The remaining part of the annotations is left unchanged. The
optional argument djvusedmetafile names a file containing the
meta-data. This file must contain data similar to what is produced
by command print-meta. When the optional argument is omitted, the
program reads the meta-data from the djvused script itself until
reaching an end-of-file or a line containing a single period.
print-xmp
Print the XMP metadata string contained in the annotation chunk
of the selected component file. This command in fact displays a
subset of the information printed by command print-ant.
remove-xmp
Removes the XMP tag from the annotation chunk of the selected
component file.
set-xmp [xmpfile]
Set the XMP metadata part of the annotations of the selected
component file. The remaining part of the annotations is left
unchanged. The optional argument xmpfile names a file containing
the XMP metadata in a format similar to that produced by command
print-xmp. When the optional argument is omitted, the program reads
the XMP annotation data from the djvused script itself until
reaching an end-of-file or a line containing a single period.
output-all
Print a djvused script that reconstructs both the hidden text
and the annotation information for the selected pages. This
script can later be edited and executed by invoking program
djvused with option -f.
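Commands set-txt, set-ant, set-meta, set-xmp, and set-outline can all read their data from the script itself, up to a line containing a single period. The following Python sketch generates such a script for set-meta. Each entry is formatted like the output of print-meta: key, tab, double-quoted value. The sketch assumes that document-wide meta-data belongs in the shared annotation component file selected by create-shared-ant. The names q and set_meta_script are hypothetical.

```python
def q(text):
    """Quote text as a djvused string: '"' and '\\' are backslash-escaped,
    other non-printable or non-ASCII bytes become octal escapes."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (34, 92):           # double quote, backslash
            out.append("\\" + chr(byte))
        elif 32 <= byte < 127:         # other printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'

def set_meta_script(meta):
    """Return a djvused script storing document-wide meta-data.

    The data lines mimic print-meta output and are terminated by a
    line containing a single period.
    """
    lines = ["create-shared-ant", "set-meta"]
    lines += [f"{key}\t{q(value)}" for key, value in meta.items()]
    lines.append(".")
    return "\n".join(lines) + "\n"
```

The script could then be applied with a command such as djvused myfile.djvu -f meta.dsed -s, where meta.dsed is a hypothetical file name.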
Outline/bookmarks commands
print-outline
Print the outline of the document. Nothing is printed if the
document contains no outline.
remove-outline
Removes the outline from the document.
set-outline [djvusedoutlinefile]
Insert outline information into the document. The optional
argument djvusedoutlinefile names a file containing the outline
information. This file must contain data similar to what is
produced by command print-outline. When the optional argument
is omitted, the program reads the outline information from
the djvused script until reaching an end-of-file or a line
containing a single period.
Thumbnail commands
set-thumbnails sz
Compute thumbnails of size sz x sz pixels and insert them into the
document. DjVu viewers can later display these thumbnails very
efficiently without the need to download the data for each page.
Typical thumbnail sizes range from 48 to 128 pixels.
remove-thumbnails
Remove the pre-computed thumbnails from the DjVu document. New
thumbnails can then be computed using command set-thumbnails.
Save commands
The above commands only modify the memory image of the DjVu document.
The following commands provide means to save the modified data into the
file system.
save Save the modified DjVu document back into the input file
djvufile specified by the arguments of the program djvused. Nothing
is done if the DjVu file was not modified. Passing option -s to
program djvused is equivalent to executing command save before
exiting the program.
save-bundled filename
Save the current DjVu document as a bundled multi-page DjVu
document named filename. A similar capability is offered by
program djvmcvt.
save-indirect filename
Save the current DjVu document as an indirect multi-page DjVu
document. The index file of the indirect document will be named
filename. All other files composing the indirect document will
be saved into the same directory as the index file. A similar
capability is offered by program djvmcvt.
save-page filename
Save the selected component file into DjVu file filename. The
selected component file might import data from another component
file using the so-called inclusion (INCL) chunks. This command
then produces a file with unresolved references to imported
data. Such a file should then be made part of a multi-page
document containing the required data in other component files.
save-page-with filename
Save the selected component file into DjVu file filename. All
data imported from other component files is copied into the
output file as well. This command always produces a usable DjVu
file. On the other hand, collecting several such files into a
multi-page document might lead to useless data replication.
Miscellaneous commands
help Display a help message listing all commands supported by
djvused.
dump Display the EA IFF 85 structure of the document or of the
selected component file. A similar capability is offered by
program djvudump.
size Display the width and the height of the selected pages. The
dimensions of each page are displayed using a syntax suitable
for direct insertion into the <EMBED...></EMBED> tags.
set-page-title title
Sets a page title for the selected page. When page titles are
available, recent versions of the DjVuLibre viewers display
these page titles instead of page numbers and also accept them
in page selection options. Command ls can be used to see both
the page titles and page identifiers. To unset a page title,
simply make it equal to the page identifier.
DJVUSED FILE FORMATS
Djvused uses a simple parenthesized syntax to represent both
annotations and hidden text.
* This syntax is the native syntax used by DjVu for storing
annotations. Program djvused simply compresses the annotation data
using the bzz(1) algorithm.
* This syntax differs from the native syntax used by DjVu for storing
the hidden text. Program djvused performs the translations between
the compact binary representation used by DjVu and the easily
modifiable parenthesized syntax.
General syntax
Djvused files are ASCII text files. The legal characters in djvused
files are the printable ASCII characters and the space, tab, cr, and nl
characters. Using other characters has undefined results.
Djvused files are composed of a sequence of expressions separated by
blank characters (space, tab, cr, or nl). There are four kinds of
expressions, namely integers, symbols, strings, and lists.
Integers:
Integer numbers are represented by one or more digits, with the
usual interpretation.
Symbols:
Symbols, or identifiers, are sequences of printable ASCII
characters representing a name or a keyword. Acceptable characters
are the alphanumeric characters, the underscore "_", the minus
character "-", and the hash character "#". Names should not
begin with a digit or a minus character.
Strings:
Strings denote an arbitrary sequence of bytes, usually interpreted
as a sequence of UTF-8 encoded characters. Strings in djvused files
are similar to strings in the C language. They are surrounded by
double quote characters. Certain sequences of characters starting
with a backslash ("\") have a special meaning. A backslash followed
by "a", "b", "t", "n", "v", "f", "r", "\", or a double quote
character stands for the ASCII character BEL(007), BS(010),
HT(011), LF(012), VT(013), FF(014), CR(015), BACKSLASH(134), or
DOUBLEQUOTE(042) respectively, where the codes are given in octal.
A backslash followed by one to three octal digits stands for the
byte whose octal code is expressed by the digits. All other
backslash sequences are illegal. All non-printable ASCII characters
must be escaped.
Lists: Lists are sequences of expressions separated by blanks and
surrounded by parentheses. All expression types are acceptable
within a list, including sub-lists.
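The general syntax is small enough that a reader and a writer fit in a few dozen lines. The following Python sketch is an illustration, not DjVuLibre code. Strings are decoded as UTF-8. Symbols are returned as instances of a hypothetical Symbol class so that they stay distinct from strings. quote escapes every byte that is not printable ASCII, which resembles the djvused output produced without option -u. Only unsigned integers are recognized, following the description above.

```python
import re

NAMED = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
         "\\": 92, '"': 34}
TOKEN = re.compile(r'\s*(?:(\()|(\))|("(?:[^"\\]|\\.)*")|([^\s()"]+))')
ESCAPE = re.compile(r'\\([0-7]{1,3})|\\(.)|([^\\]+)')

class Symbol(str):
    """A djvused symbol, kept distinct from a string value."""

def quote(text):
    """Encode text as a djvused string literal."""
    by_code = {code: letter for letter, code in NAMED.items()}
    out = []
    for byte in text.encode("utf-8"):
        if byte in by_code:
            out.append("\\" + by_code[byte])
        elif 32 <= byte < 127:
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)  # octal escape for other bytes
    return '"' + "".join(out) + '"'

def unquote(literal):
    """Decode a djvused string literal (including its quotes) into text."""
    data = bytearray()
    for octal, letter, plain in ESCAPE.findall(literal[1:-1]):
        if octal:
            data.append(int(octal, 8))
        elif letter:
            data.append(NAMED[letter])  # KeyError: illegal escape sequence
        else:
            data += plain.encode("utf-8")
    return data.decode("utf-8")

def parse(text):
    """Parse djvused text into a list of ints, Symbols, strs and lists."""
    stack, pos = [[]], 0
    while (match := TOKEN.match(text, pos)):
        pos = match.end()
        lpar, rpar, string, atom = match.groups()
        if lpar:
            stack.append([])
        elif rpar:
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            item = stack.pop()
            stack[-1].append(item)
        elif string:
            stack[-1].append(unquote(string))
        elif atom.isascii() and atom.isdigit():
            stack[-1].append(int(atom))
        else:
            stack[-1].append(Symbol(atom))
    if text[pos:].strip() or len(stack) != 1:
        raise ValueError(f"syntax error near offset {pos}")
    return stack[0]
```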
Hidden text syntax
The building blocks of the hidden text syntax are lists representing
each structural component of the hidden text. Structural components
have the following form:
(type xmin ymin xmax ymax ... )
The symbol type must be one of page, column, region, para, line, word,
or char, listed here by decreasing order of importance. The integers
xmin, ymin, xmax, and ymax represent the coordinates of a rectangle
indicating the position of the structural component in the page.
Coordinates are measured in pixels and have their origin at the bottom
left corner of the page. The remaining expressions in the list are
either a single string representing the encoded text associated with
this structural component, or a sequence of structural components with
a lesser type.
The hidden text for each page is simply represented by a single
structural element of type page. Various levels of structural information
are acceptable. For instance, the page level component might only
specify a page level string, or might only provide a list of lines, or
might provide a full hierarchy down to the individual characters.
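Enclosing components must cover their children, so it is convenient to compute the rectangles of lines, paragraphs, and so on from those of the words. The following Python sketch assumes that word rectangles are already known, for instance from an OCR engine, and use the bottom-left origin described above. The names q, node, and render are hypothetical. The page rectangle is passed explicitly because it normally covers the whole page, not just the text.

```python
def q(text):
    """Quote text as a djvused string: '"' and '\\' are backslash-escaped,
    other non-printable or non-ASCII bytes become octal escapes."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (34, 92):           # double quote, backslash
            out.append("\\" + chr(byte))
        elif 32 <= byte < 127:         # other printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'

def node(kind, content, box=None):
    """A structural component: (kind, (xmin, ymin, xmax, ymax), content).

    content is either a string or a non-empty list of components of
    lesser type; when box is omitted it encloses all the children.
    """
    if box is None:
        xmins, ymins, xmaxs, ymaxs = zip(*(child[1] for child in content))
        box = (min(xmins), min(ymins), max(xmaxs), max(ymaxs))
    return (kind, box, content)

def render(component, indent=0):
    """Format a component in the djvused hidden text syntax."""
    kind, (xmin, ymin, xmax, ymax), content = component
    head = f"{' ' * indent}({kind} {xmin} {ymin} {xmax} {ymax}"
    if isinstance(content, str):
        return f"{head} {q(content)})"
    children = "\n".join(render(child, indent + 1) for child in content)
    return f"{head}\n{children})"
```

Suppose render(page) is written to a file page1.txt, a hypothetical name. Running djvused myfile.djvu -e 'select 1; set-txt page1.txt' -s would then store it as the hidden text of page 1.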
Outline/Bookmark syntax
The outline syntax is a single list of the form
(bookmarks ...)
The first element of the list is symbol bookmarks. The subsequent
elements are lists representing the top-level outline entries. Each
outline entry is represented by a list with the following form:
(title url ... )
The string title is the title of the outline entry. The destination
string url can be either an arbitrary percent-encoded URL, or composed
of the hash character ("#") followed by a page name or number, or
composed of the question mark character ("?") followed by cgi-style
arguments interpreted by the djvu viewer. The remaining expressions in
the list describe subentries of this outline entry.
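An outline is a tree, so it is naturally generated recursively. This minimal Python sketch makes two assumptions of its own: entries are represented as (title, url, children) tuples, and the helpers are named q and render_outline.

```python
def q(text):
    """Quote text as a djvused string: '"' and '\\' are backslash-escaped,
    other non-printable or non-ASCII bytes become octal escapes."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (34, 92):           # double quote, backslash
            out.append("\\" + chr(byte))
        elif 32 <= byte < 127:         # other printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'

def render_outline(entries):
    """Format outline entries as a djvused (bookmarks ...) expression.

    Each entry is a tuple (title, url, children) where children is a
    possibly empty list of entries of the same form.
    """
    def render(entry, indent):
        title, url, children = entry
        lines = [f"{' ' * indent}({q(title)} {q(url)}"]
        lines += [render(child, indent + 1) for child in children]
        return "\n".join(lines) + ")"
    return "(bookmarks\n" + "\n".join(render(e, 1) for e in entries) + ")"
```

The result could be saved to a file and installed with djvused myfile.djvu -e 'set-outline outline.txt' -s, where outline.txt is a hypothetical file name.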
Annotation syntax
Annotations are represented by a sequence of annotation expressions.
The following annotation expressions are recognized:
(background color)
Specify the color of the viewer area surrounding the DjVu image.
Colors are represented with the X11 hexadecimal syntax #RRGGBB.
For instance, #000000 is black and #FFFFFF is white.
(zoom zoomvalue)
Specify the initial zoom factor of the image. Argument zoomvalue
can be one of stretch, one2one, width, page, or composed of the
letter d followed by a number in range 1 to 999 representing a zoom
factor (for instance, d300 or d150).
(mode modevalue)
Specify the initial display mode of the image. Argument modevalue
is one of color, bw, fore, or back.
(align horzalign vertalign)
Specify how the image should be aligned on the viewer surface.
By default the image is located in the center. Argument horzalign
can be one of left, center, or right. Argument vertalign can be one
of top, center, or bottom.
(maparea url comment area ...)
Define a hyper-link for the specified destination.
Argument url can have one of the following forms:
href
(url href target)
where href is a string representing the destination and target
is a string representing the target frame for the hyper-link, as
defined by the HTML anchor tag <A>. The destination string href
can be either an arbitrary percent-encoded URL, or composed of
the hash character ("#") followed by a page name or number, or
composed of the question mark character ("?") followed by
cgi-style arguments interpreted by the djvu viewer. Page numbers
may be prefixed with an optional sign to represent a page
displacement. For instance the strings "#-1" and "#+1" can be used
to access the previous page and the next page.
Argument comment is a string that might be displayed by the
viewer when the user moves the mouse over the hyper-link.
Argument area defines the shape and the location of the
hyperlink. The following forms are recognized:
(rect xmin ymin width height)
(oval xmin ymin width height)
(poly x0 y0 x1 y1 ... )
(text xmin ymin width height)
(line x0 y0 x1 y1)
All parameters are numbers representing coordinates. Coordinates
are measured in pixels and have their origin at the bottom left
corner of the page.
The remaining expressions in the maparea list represent the
visual effect associated with the hyper-link.
A first set of options defines how borders are drawn for rect,
oval, polygon, or text hyperlink areas.
(none)
(xor)
(border color)
(shadow_in [thickness])
(shadow_out [thickness])
(shadow_ein [thickness])
(shadow_eout [thickness])
where parameter color has syntax #RRGGBB as described above, and
parameter thickness is an integer in range 1 to 32. The last
four border options are only supported for rect hyperlink areas.
The default border is a simple black line. Border options do
not apply to line areas.
When a border option is specified, the border becomes visible
when the user moves the mouse over the hyperlink. The border may
be made always visible by using the following option:
(border_avis)
The following two options may be used with rect hyperlink areas.
The complete area will be highlighted using the specified color
at the specified opacity (0-100, default 50).
(hilite color)
(opacity op)
This is often used with an empty URL for simply emphasizing a
specific segment of an image.
The following three options may be used with line areas to specify
an optional ending arrow, the line width, and the line color. The
default is a black line with width 1 and without arrow.
(arrow)
(width w)
(lineclr color)
Finally the following three options can be used with text areas.
The default background color is transparent. The default text
color is black. The pushpin option indicates that the text is
symbolized by a small pushpin icon. Clicking the icon reveals
the text.
(backclr bkcolor)
(textclr txtcolor)
(pushpin)
(metadata ... (key value) ... )
Define meta-data entries. Each entry is identified by a symbol
key representing the nature of the meta data entry. The string
value represents the value associated with the corresponding
key. Two sets of keys are noteworthy: keys borrowed from the
BibTeX bibliography system, and keys borrowed from the PDF
DocInfo metadata. BibTeX keys are always expressed in lowercase,
such as year, booktitle, editor, author, etc. DocInfo keys start
with an uppercase letter, such as Title, Author, Subject, Creator,
Producer, Trapped, CreationDate, and ModDate. The values associated
with the last two keys should be dates expressed according to
RFC 3339.
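Annotation expressions are plain text, so they are easy to generate. The following Python sketch shows three common cases, and all of its helper names are hypothetical. First, it converts a rectangle measured from the top-left corner of the page, as is common in image-editing tools, to the bottom-left origin used here. Second, it wraps that rectangle into a highlight-only maparea with an empty URL. Third, it writes a metadata expression whose dates are RFC 3339 timestamps, produced by datetime.isoformat on timezone-aware values.

```python
import re
from datetime import datetime, timezone

def q(text):
    """Quote text as a djvused string: '"' and '\\' are backslash-escaped,
    other non-printable or non-ASCII bytes become octal escapes."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (34, 92):           # double quote, backslash
            out.append("\\" + chr(byte))
        elif 32 <= byte < 127:         # other printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'

def rect_from_top_left(left, top, width, height, page_height):
    """Convert a rectangle measured from the top-left page corner into a
    (rect xmin ymin width height) area measured from the bottom-left."""
    return f"(rect {left} {page_height - top - height} {width} {height})"

def highlight(area, color="#FFFF00", opacity=50):
    """Build a maparea with empty URL and comment that only highlights."""
    if not re.fullmatch(r"#[0-9A-Fa-f]{6}", color):
        raise ValueError("color must use the #RRGGBB syntax")
    if not 0 <= opacity <= 100:
        raise ValueError("opacity must be in range 0-100")
    return f'(maparea "" "" {area} (hilite {color}) (opacity {opacity}))'

def metadata(entries):
    """Build a (metadata ...) expression; datetimes become RFC 3339 strings."""
    parts = []
    for key, value in entries.items():
        if isinstance(value, datetime):
            if value.tzinfo is None:
                raise ValueError(f"{key}: use a timezone-aware datetime")
            value = value.isoformat()
        parts.append(f"({key} {q(str(value))})")
    return "(metadata " + " ".join(parts) + ")"

if __name__ == "__main__":
    area = rect_from_top_left(100, 200, 300, 50, page_height=3300)
    print(highlight(area))
    print(metadata({"Title": "Notes",
                    "CreationDate": datetime(2005, 5, 22, 12, 0,
                                             tzinfo=timezone.utc)}))
```

Command size gives the page height needed by rect_from_top_left. A cautious workflow appends the generated expressions to the print-ant output for the page and stores the result back with set-ant. Passing them to set-ant on their own may discard the existing annotations of that component file.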
LIMITATIONS
The current version of program djvused only supports selecting one
component file or all component files. There is no way to select only a
few component files.
CREDITS
This program was initially written by Léon Bottou
<leonb@users.sourceforge.net> and was improved by Yann Le Cun
<profshadoko@users.sourceforge.net>, Florin Nicsa, Bill Riemers
<docbill@sourceforge.net> and many others.
SEE ALSO
djvu(1), djvutxt(1), djvmcvt(1), djvudump(1), bzz(1), and the Emacs
djvused front end djvu.el in the GNU ELPA repository.
DjVuLibre-3.5 5/22/2005 DJVUSED(1)