prof(1)
NAME
prof, pixstats - Analyzes profile data
SYNOPSIS
prof [options] [prog_name [PC-sampling_data_file]...]
prof -pixie [options] [prog_name [Addrs_file | Counts_file]...]
prof -pixstats [options] [prog_name [Addrs_file | Counts_file]...]
pixstats [options] [prog_name [Addrs_file | Counts_file]...]
OPERANDS
prog_name
       Name of the program executable to be profiled. This program
       should be compiled with the -g1, -g2, or -g3 option to obtain
       more complete profiling information. If the default symbol
       table level (-g0) has been used, line number information,
       static procedure names, and file names are unavailable to the
       profiling code.
PC-sampling_data_file
       Name of a profiling data file (default mon.out) produced by
       executing a program that has been linked with the cc -p
       command.
Counts_file
       Name of an instruction-counts file produced by executing a
       program that has been instrumented with pixie. If no
       Counts_file or Addrs_file is specified, prog_name.Counts is
       used if found in the current working directory.
Addrs_file
       Name of an instruction-address file produced when the
       executable or shared library object is instrumented with
       pixie. By default, the path of each object.Addrs file is
       recorded in the Counts_file, so the Addrs files do not need to
       be specified. The order of precedence for finding an
       Addrs_file is as follows: Addrs_file path specified on the
       command line, current directory, directory of the object
       specified in a command line argument, directory where pixie
       created it.
OPTIONS
For each prof option, you need to type only enough of the name to
distinguish it from the other options. If you do not specify any
options, prof uses -procedures by default. Always specify -pixie or
-pixstats when you process .Addrs and .Counts files.
The prof command accepts the following options:

       Causes the profiles for all shared libraries (if any) described
       in the data file(s) to be displayed, in addition to the profile
       for the executable.

       Causes the profiler to print the assembly instructions for each
       subroutine along with the cycle counts for each instruction.
       The subroutines are sorted from highest cycle count to lowest.
       The instructions for each subroutine are printed in order; they
       are not sorted by cycle count.
       When used without the -pixie option for a PC-sampling profile,
       the CPU time used by each instruction is presented in
       milliseconds. (For uprofile and kprofile, per-instruction
       sample counts are also provided for events other than time.)

       Alters the appropriate parts of the listing to reflect the
       clock speed of the CPU. By default, the cycle time of the
       processor on which the program was run is used. (Use this
       option only with the -pixie option.)

       Disassembles and shows the analyzed object code. (Use this
       option only with the -pixstats option.)

       Limits the disassembly to blocks with f% frequency. (Use this
       option only with the -pixstats option.)

       If you use one or more -exclude options, the profiler omits the
       specified procedure and its descendants from the listing. If
       any option uses an uppercase “E” (for “Exclude”), prof also
       omits that procedure from the base upon which it calculates
       percentages. To represent all of the variations of an
       overloaded C++ function name, you can specify just the part of
       the name up to but not including the “(”.

       Causes the profile for the named executable or shared library
       not to be printed. You can use this option multiple times in a
       single prof command.

       Produces a file with information that the compiler system can
       use to decide which parts of the program will benefit most from
       global optimization and which parts will benefit most from
       in-line procedure substitution (requires basic-block counting).
       (Use this option only with the -pixie option.) This option is
       for compilers whose -feedback option requires a feedback file
       (rather than an executable file) and that do not support the
       prof command's -update option. For compilers that support the
       -update option, better results can be achieved using that
       option instead of the (prof) -feedback option.

       Reports the most heavily used lines in descending order of use.

       Causes the profile for the named shared library to be printed,
       in addition to the profile for the executable. You can use this
       option multiple times in a single prof command.

       For each procedure, reports how many times the procedure was
       invoked from each of its possible callers (requires basic-block
       counting). For this listing, the -exclude and -only options
       apply to callees, but not to callers. (Use this option only
       with the -pixie option.)

       Changes the library directory search order for shared object
       libraries so that prof looks for them in dir before the library
       recorded in profile_file and the default library directories.
       You can specify multiple -Ldir switches to specify several
       directory names.

       Changes the library directory search order for shared object
       libraries so that prof never looks for them in the default
       library directories. Use this option when the default library
       directories should not be searched and only the directories
       specified by -Ldir are to be searched.

       Gives the lines in order of occurrence within procedures. The
       procedures are sorted in descending order of use.

       Sums the sampling data files (or, in pixie mode, the .Counts
       files) and writes the result into a new file with the specified
       name. The -only and -exclude options have no effect on the
       merged data.

       Uses 1 for each basic block count. (Use this option only with
       the -pixstats or -pixie option.)

       Prints each procedure's starting line number if source file
       information is available from the object file.

       If you use one or more -only options, the profile listing
       includes only the named procedures, rather than the entire
       program. If any option uses an uppercase “O” (for “Only”), prof
       uses only the named procedures, rather than the entire program,
       as the base upon which it calculates percentages. To represent
       all of the variations of an overloaded C++ function name, you
       can specify just the part of the name up to but not including
       the “(”.

       Selects pixie mode, as opposed to sampling mode.

       Selects generation of an alternative pixie-mode report for
       basic-block profiling data, as previously produced by the
       pixstats(1) command. All options of the previous version of
       pixstats(1) are recognized, for compatibility.

       Reports time spent per procedure (using data obtained from
       sampling or basic-block counting; the listing tells which one).
       For basic-block counting, this option also reports the number
       of invocations per procedure, including the aggregated
       invocations of any alternate entry points.

       Truncates listings after n lines (if n is an integer), after
       the first entry that represents less than n percent of the
       total (if n is followed immediately by a “%” character), or
       after enough entries have been printed to account for n percent
       of the total (if n is followed immediately by “cum%”). For
       example, “-quit 15” truncates each part of the listing after 15
       lines of text, “-quit 15%” truncates each part after the first
       line that represents less than 15 percent of the whole, and
       “-quit 15cum%” truncates each part after the line that brought
       the cumulative percentage above 15 percent.

       Reports all lines that never executed. (Use this option only
       with the -pixie option.)

       For -procedures and -invocations listings, prints cumulative
       statistics for the entire object file instead of for each
       procedure in the object.

       Generates more analysis of a program to provide a more accurate
       reading of cycles, instead of the default, which assumes each
       instruction executes in one cycle. The higher the number chosen
       from the arguments, the more accurate the reading, although the
       profiler runs more slowly, and memory-access delays are still
       not reflected. This option has little or no effect on EV6
       (21264) and later Alpha systems. (Use this option only with the
       -pixie option.)

       Updates the program executable (prog_name) with profiling
       information in the specified .Counts files, for use in future
       cc -feedback prog_name command(s). This option requires that
       prog_name was compiled with the -feedback prog_name option;
       otherwise, the update fails. This option does not generate a
       display unless another option forcing the display behavior is
       specified. (Use this option only with the -pixie option.)

       Prints the tool's version number.

       Prints a list of procedures that were never invoked (requires
       basic-block counting). (Use this option only with the -pixie
       option.)
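The three -quit forms described above can be sketched as follows. This is only an illustration of the truncation rules as worded in this reference page, not prof's actual code: the function name truncate_listing and the (name, value) entry layout are hypothetical, listing lines are modeled as one entry each, and the behavior when the cumulative percentage lands exactly on n is an assumption.

```python
def truncate_listing(entries, quit_arg):
    """Return the leading entries kept by a -quit style limit.

    entries  : list of (name, value) pairs, sorted in descending order
    quit_arg : "15" (lines), "15%" (per-entry share), "15cum%" (running share)
    Hypothetical helper; it only mirrors the wording of the -quit option.
    """
    total = sum(value for _, value in entries)

    if quit_arg.endswith("cum%"):
        limit = float(quit_arg[:-4])
        kept, running = [], 0.0
        for name, value in entries:
            kept.append((name, value))
            running += value
            # Stop after the entry that brings the running share up to n percent
            # (treating exact equality as "reached" is an assumption).
            if total and 100.0 * running / total >= limit:
                break
        return kept

    if quit_arg.endswith("%"):
        limit = float(quit_arg[:-1])
        kept = []
        for name, value in entries:
            kept.append((name, value))
            # The first entry below n percent is still printed, then we stop.
            if total and 100.0 * value / total < limit:
                break
        return kept

    # Plain integer: keep the first n entries.
    return entries[: int(quit_arg)]


sample = [("a", 50), ("b", 30), ("c", 12), ("d", 8)]
print(truncate_listing(sample, "2"))
print(truncate_listing(sample, "15%"))
print(truncate_listing(sample, "60cum%"))
```

With the sample data, the "15%" form keeps entry c (the first one under 15 percent) and drops d, while the "60cum%" form stops at b, where the running share first reaches 60 percent.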
DESCRIPTION
The prof command analyzes one or more data files generated by the
compiler's execution-profiling system and produces a listing. The prof
command can also combine those data files or produce a feedback file
that lets the optimizer take into account the program's run-time
behavior during a subsequent compilation. Profiling is a three-step
process:
       1.  Compile the program.
       2.  Execute the program.
       3.  Run prof to analyze the data.
The compiler system provides two kinds of profiling:
PC-sampling
       Interrupts the program periodically, recording the value of the
       program counter.
Basic-block counting
       Divides the program into blocks delimited by labels, jump
       instructions, and branch instructions. It counts the number of
       times each block executes.
The uprofile and kprofile tools provide a third kind of profiling,
performance counter sampling, which uses the on-chip performance
counters of the Alpha architecture.
The following sections describe how to perform the various kinds of
profiling.
PC-Sampling Profiles
To use PC-sampling, compile your program with the -p option (strictly
speaking, it is sufficient to use this option only when linking the
program). Then, run the program containing the profiling startup rou‐
tine that calls monstartup to allocate extra memory to hold the profil‐
ing data. If the program terminates normally or calls exit(2), it
records the data in a file at the end of execution.
If your program uses shared libraries, note that only its call-shared
portion is profiled in detail. Only the total time spent in each shared
library is recorded. To individually profile all library routines a
program uses, build the program with the -non_shared switch (by
default, the compiler produces a call-shared object unless -non_shared
is explicitly specified), or set the PROFFLAGS environment variable as
described in the Environment Variables section.
After running your program, use prof to analyze the PC-sampling data
file. For example:
       cc -c myprog.c
       cc -p -o myprog myprog.o
       myprog                      (generates mon.out)
       prof myprog mon.out
When you use prof for PC-sampling, the program name defaults to a.out.
The PC-sampling data file name defaults to mon.out; if you specify more
than one PC-sampling data file, prof reports the sum of the data.
PC-Sampling Environment Variables
You can use environment variables to change the default PC sampling and
profile data collection behavior. The variables are PROFDIR and
PROFFLAGS. The general form for setting these variables is:
       For C shell:       setenv varname "value"
       For Bourne shell:  varname="value"; export varname
       For Korn shell:    export varname=value
In the preceding example, varname can be one of the following:
PROFDIR
       This environment variable causes PC-sampling data files to be
       generated with unique file names in a specified directory.
       You specify a directory path as the value, and your prof
       results are placed in the file path/pid.progname, where path is
       the pathname, pid is the process ID of the executing program,
       and progname is the program name.
PROFFLAGS
       This environment variable can take any of the following values:

       Causes a separate data file to be generated for each thread.
       The name of the data file takes the following form:
       pid.sid.progname, where pid is the process ID of the program,
       sid is the sequence number of the thread, and progname is the
       name of the program being profiled.

       Causes the program to fully profile all the permanently loaded
       shared libraries, in addition to the nonshared or call-shared
       executable.

       Causes the program to profile only the named executable or
       shared library.

       Causes the program not to profile the named executable or
       shared library.

       Causes prof to change the ratio of text segment stride size to
       PC-sample counter buffer size, that is, the number of
       instructions that are counted together in a single counter
       word. The appropriate ratio involves a tradeoff of size versus
       precision. Strides of 1, 2, 4, and 8 are supported. A special
       stride of 0 causes a single PC-sample count to be recorded for
       each text segment.
       The default stride is 2 for the executable, and 0 for each of
       its shared libraries. If -all or -incobj is specified, all
       selected objects are profiled with the same stride.

       Automatically establishes monitor_signal(3) as the signal
       handler for the named signal, and causes monitor_signal(3) to
       zero the profile after it is written to a file. This allows a
       signal to be sent several times without the successive profiles
       overlapping, if the file is renamed. The asynchronous nature of
       a signal may cause small variations in the profile.
       Unrecognized signal names are ignored. The -threads option is
       ignored if combined with -sigdump.

       Specifies the directory path in which the profiling data file
       or files are created.

       Disables or enables the addition of the process ID number to
       the name of the profiling data file or files.
You can use the PROFDIR and PROFFLAGS environment variables together.
For more information, see the Programmer's Guide.
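The stride tradeoff described above (size versus precision) can be made concrete with a short sketch. It assumes only that Alpha instructions are 4 bytes wide and that counters are assigned to consecutive groups of "stride" instructions; the function names and the linear buffer layout are hypothetical and are not taken from the profiling library.

```python
INSTRUCTION_BYTES = 4  # Alpha instructions are a fixed 32 bits wide
SUPPORTED_STRIDES = (0, 1, 2, 4, 8)


def counter_index(pc, text_start, stride):
    """Map a sampled PC to a counter slot (assumed linear layout)."""
    if stride not in SUPPORTED_STRIDES:
        raise ValueError("unsupported stride: %r" % (stride,))
    if stride == 0:
        # Special stride 0: one count for the whole text segment.
        return 0
    return (pc - text_start) // (INSTRUCTION_BYTES * stride)


def counter_count(text_bytes, stride):
    """Number of counter words needed for a text segment of text_bytes."""
    if stride not in SUPPORTED_STRIDES:
        raise ValueError("unsupported stride: %r" % (stride,))
    if stride == 0:
        return 1
    instructions = text_bytes // INSTRUCTION_BYTES
    return -(-instructions // stride)  # ceiling division


# A 4096-byte text segment holds 1024 instructions.
for stride in (1, 2, 4, 8, 0):
    print(stride, counter_count(4096, stride))
```

Under these assumptions, the default stride of 2 halves the counter buffer relative to stride 1, at the cost of attributing each sample to a pair of instructions rather than a single one.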
Basic-Block Counting
To use basic-block counting, compile your program without the -p
option. Use the pixie program to translate your program into a
profiling version and generate a file (prog_name.Addrs) containing
block addresses. Then, run the pixie version of the program, which
(assuming the program terminates normally or calls exit(2)) will
generate a file (prog_name.Counts) containing block counts.
After running the pixie version of your program, use prof with the
-pixie option to analyze the .Addrs and .Counts files. Notice that you
must specify the name of your original program, not the name of the
pixie version. For example:
       cc -c myprog.c
       cc -o myprog myprog.o
       pixie myprog                (generates myprog.Addrs and myprog.pixie)
       myprog.pixie                (generates myprog.Counts)
       prof -pixie myprog myprog.Addrs myprog.Counts
When you use prof with the -pixie option, the .Addrs file name defaults
to prog_name.Addrs, and the .Counts file name defaults to
prog_name.Counts. Note that, when the .Counts file name defaults to
prog_name.Counts, prof does not attach any path prefix to prog_name,
and it looks for the file in the current working directory. If you
specify more than one .Counts file, prof reports the sum of the data.
For each shared library selected for profiling, the prof command
searches for an .Addrs file in the following locations if the file
location is not explicitly specified on the command line:
       1.  Current directory
       2.  Directory in which the object file is located, if the
           location of the object file is explicitly specified on the
           command line
       3.  Directory in which pixie created it, as recorded in the
           .Counts file
For each selected shared library, the prof command searches for an
object file in the following locations:
       1.  Directories specified in -Ldir options
       2.  Directory in which pixie found it, as recorded in the file,
           if the -L option is specified
       3.  Standard library search directories, as searched by ld, if
           the -L option is not specified
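The .Addrs search order described above can be sketched as a simple first-match lookup. This is an illustration of the documented precedence only, not prof's implementation: the function find_addrs_file, its parameter names, and the injectable exists predicate are hypothetical.

```python
import os


def find_addrs_file(object_name, explicit_path=None, object_dir=None,
                    recorded_dir=None, cwd=".", exists=os.path.isfile):
    """Return the first matching object.Addrs path, or None.

    Precedence (as documented): a path given on the command line, then the
    current directory, then the directory of the object if it was given on
    the command line, then the directory where pixie created the file.
    """
    if explicit_path is not None:
        return explicit_path

    file_name = os.path.basename(object_name) + ".Addrs"
    for directory in (cwd, object_dir, recorded_dir):
        if directory is None:
            continue  # this location is not applicable for this object
        candidate = os.path.join(directory, file_name)
        if exists(candidate):
            return candidate
    return None


# Demonstration with a fake file system instead of real files.
present = {os.path.join("recdir", "libfoo.so.Addrs")}
print(find_addrs_file("libfoo.so", object_dir="objdir",
                      recorded_dir="recdir", exists=present.__contains__))
```

Passing the exists predicate as a parameter keeps the sketch testable without creating files; a real lookup would simply use the default os.path.isfile.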
Basic-Block Statistics
Use the -pixstats option to get an alternative profile. All options of
the previous version of the pixstats(1) command are recognized, for
compatibility.
If a disassembly is requested, all basic blocks (or those whose execu‐
tion count exceeds the -dislimit percentage of total instructions) are
disassembled, in increasing address order. Each block is labeled with
its procedure name and any offset from the start of the procedure. For
each instruction, the relative estimated CPU cycle at which the
instruction executes is printed, plus its source line, address, binary
code, and assembly language. The total CPU cycles used by one execu‐
tion of the block, the number of times it was executed, and its per‐
centage of all instructions executed are printed at the end of the
block, following any line reporting a non-zero delay caused to a fol‐
low-on block.
The main report begins with a record of the command line. This is
followed by a summary of the program's behavior:
       Total CPU cycles used by the profiled objects, plus the
       equivalent number of seconds
       Total number of instructions executed
       Total delay caused by instructions executed in the preceding
       basic block
       Total integer and floating-point no-op, arithmetic and logical,
       logical, shift, load, store, load and store, load followed by
       load, load and store and fetch (data bus use), load and store
       relative to the stack or global pointers, floating-point,
       floating-point compare, conditional branch instructions
       executed (itemized). Also, total number of branch instructions
       executed whose target instruction is another branch; and total
       number of such branches that are estimated to be taken, rather
       than executing the next instruction in line.
       Total basic blocks, procedure calls, and branches that skip a
       single instruction that were executed.
Next, some ratios are printed:
       Stores : stores + loads
       Instructions : basic block
       Instructions : branches
       Backward branches : branches
       CPU cycles : procedure calls
       Instructions : procedure calls
       Integer no-ops : integer and floating-point no-ops
       Floating-point no-ops : integer and floating-point no-ops
       Floating-point pipeline interlocks : floating-point operators
Next, basic blocks are analyzed according to how many instructions they
contain. For each size, pixstats reports the execution count, its
percentage and cumulative percentage relative to both instructions and
basic blocks, the number of instructions contained in blocks of that
size, the percentage and cumulative percentage of this relative to all
instructions, and the CPU-cycle cost per instruction of blocks of that
size. Then, pixstats prints various averages and quartiles of basic
block size, plus the largest basic block execution count encountered
(to indicate the chance of integer overflow in the analysis).
Next, pixstats analyzes the number of registers (integer and floating-
point) that are saved on procedure entry (and restored on exit). It
prints the number of procedure entries that save a given number of reg‐
isters, and the percentage and cumulative percentage of this relative
to all procedure entries, all registers saved, and all instructions
executed. Finally, it prints some averages and ratios.
The next two tables contain information on the sizes of executed proce‐
dures' stack frames and the frequency of execution of each kind of
instruction. Frame sizes are reported in “bits”; for example, 6 bits
means a 32- to 48-byte stack frame. The number, percentage, and
cumulative percentage of executed calls to procedures with the given
frame size are printed. Similarly, the execution count is printed for each
machine instruction code, but this table is ordered by decreasing
usage.
The next four tables are similar. They provide information about the
size of literals used by various categories of Alpha instructions:
       ADD,SUB,CMP instructions
       AND,BIC,BIS,XOR,CMOV instructions
       MUL instructions
       SHIFT,EXT,INS,MSK,ZAP instructions
(Note that a table may be omitted if there is no use of literals in the
program for the particular instruction category). For each of these
tables the size of the literal is reported in bits (for example, 4 bits
means the literal is greater than or equal to 8 and less than 16).
The next six tables are similar. They contain information on the size
of the memory displacement from a base register:
       LDA displacement from 0 (used like a load immediate instruction)
       LDAH displacement from 0 (used like a load immediate high)
       Branch
       SP-based load/store (load or store within a stack frame)
       GP-based load/store (load or store within a global offset table)
       All load or store instructions
Again, the “size” of the displacement is reported in bits; for example,
6 bits means a 32 to 63 byte displacement. For both positive displace‐
ments (in the “0-extend” column) and negative displacements (in the
“1-extend” column), the execution count is printed along with percent‐
age and cumulative percentage. The summed cumulative percentage is
printed last (in the “Total” column).
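The "size in bits" convention used by the literal and displacement tables can be illustrated with a small sketch. It assumes the size is the number of significant bits of a non-negative value, which matches both examples in the text (4 bits for 8 through 15, 6 bits for 32 through 63); how pixstats sizes negative displacements is not stated here and is not modeled. The function names are hypothetical.

```python
def size_in_bits(value):
    """Number of significant bits in a non-negative literal or displacement."""
    if value < 0:
        raise ValueError("negative values are not modeled in this sketch")
    return value.bit_length()


def bucket_range(bits):
    """Smallest and largest non-negative values reported as `bits` bits."""
    if bits == 0:
        return (0, 0)
    return (2 ** (bits - 1), 2 ** bits - 1)


for value in (8, 15, 16, 32, 48, 63):
    print(value, size_in_bits(value))
print(bucket_range(4))
print(bucket_range(6))
```

Under this reading, the 32- and 48-byte stack frames mentioned for the frame-size table both fall in the 6-bit bucket.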
In the “static” analysis of instructions, each instruction is counted
once per executed basic-block. The “static” distribution will be the
same as the regular opcode distribution when -nocounts is specified.
Following “static” totals for instructions and basic blocks, the number
and percentage of each instruction code is listed.
The next two tables contain information on how many times each integer
and floating-point register was accessed, plus its percentage, ordered
by register number. For integer registers, the number and percentage of
uses as a base register in memory operations are also listed.
Finally, pixstats prints a flat profile of CPU cycles used by proce‐
dures. This includes the CPU cycles used by the procedure, the per‐
centage of the total, the cumulative percentage, the number of instruc‐
tions executed as part of the procedure, its average number of CPU
cycles per instruction, the number of calls made to the procedure, the
average number of CPU cycles per call, and the procedure name. If -num‐
bers is specified, the object and source file names and line number are
also printed.
Performance Counter Samples
After running the uprofile or kprofile utility to collect profiling
data for your program or the kernel, respectively, run prof to examine
the resulting mon.out or kmon.out file, as follows:
       For uprofile output:  prof prog_name mon.out
       For kprofile output:  prof /vmunix kmon.out
Use prof as for PC sampling, except that only the executable has a pro‐
file. Old performance counter sample data files, generated on versions
of the operating system prior to DIGITAL UNIX Version 4.0, must be ana‐
lyzed as if they contained PC-sampling data.
RESTRICTIONS
The -pixstats option models execution assuming a perfect memory system.
Memory system events such as cache misses will increase execution time
above the -pixstats predictions.
The set of statistics reported by the -pixstats option and the format
of the report are the same as for previous versions of the pixstats(1)
command, but note the following:
       The labels on disassembled basic blocks take the form
       procedure-name (or proc_at_0x... if no symbol is available) for
       an initial block and procedure-name+offset for subsequent
       blocks.
       All reported cycles reflect CPU pipeline interlocks, so they
       usually do not match the reported instruction counts.
       If not all the shared objects used by a program are profiled,
       the procedure-call counts may be smaller than the jsr/bsr
       instruction counts.
FILES
       Normal startup code
       Startup code for PC-sampling
       Library for PC-sampling
       Default kprofile data file
       Default PC-sampling data file
       Default uprofile data file
SEE ALSO
Introduction: prof_intro(1)
Commands: as(1), cc(1), gprof(1), pixie(1), uprofile(1), kprofile(1),
dxprof(1). (dxprof is available as an option.)
Functions: monitor(3), profil(2)
Programmer's Guide