Extensible Stats Diagnostics

September, 2010

Introduction

Starting with Release 13 of PRMan, per-frame statistics are output in a simple XML format. This opens the door to specialized processing and reporting of a wealth of render information, which is outlined in Using XML Frame Statistics, Section 5. This application note shows how to use Python scripts to harness the flood of data produced by PRMan and distill it down to the most potent information.


Transforming Statistics into Diagnostics

A base Python script, statsdiag.py, is provided as a starting point for creating a customized statistics processing solution. By default, it tries to call out symptoms of sub-optimal performance, printing a report of only those statistics that might indicate problems. This lets you quickly triage the results from many renders, concentrating on the problematic areas of slow renders without sifting through the statistics of efficient ones. If you have an efficient scene (or an inefficiency in something that is not checked), this script will not output anything. If there are checked areas with potential improvement, they will be reported.

Usage

Usage: statsdiag.py [options] stats.xml

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -o out.txt, --output=out.txt
                        output file (default is stdout)
  -M MODULES, --modules=MODULES
                        List of modules to import separated by ':'
  -a, --all             All statistics in the XML document to be printed out
                        in text format before any summary information

  Module Dependent Options:
    These options behave as indicated when the default module is used, and
    it is suggested that any extension modules follow their convention,
    but it is not required.

    -t, --timers        Enable timer summary.
    -m, --memory        Enable memory summary.
    -T, --totaltime     Enable reporting multi-frame total-time sums.
    -e, --expensive     Enable reporting of the most expensive elements from
                        the render.  This includes shaders and textures
                        measured with different metrics and also the biggest
                        users of time and memory in the pipeline
    -v, --verbose       Enable reporting of verbose output, forcing report
                        output even where performance is judged to be good.

Basic Report

If no options are provided, the script will go through the given statistics file and print out a report of any problematic areas detected. If no problems are detected, nothing is printed out.

Bounds Reports

A bounds report is used to call out any shaders that have inaccurate displacement bounds. When displacement is done within a shader, PRMan uses a bound to efficiently process the displacement. A bound that is too low can cause visual artifacts, because geometry outside the bound may not be rendered. A bound that is too high makes the bounding volume too big and prevents efficient culling of the geometry from calculations, degrading performance. For each shader, the report gives the number of objects whose bounds were too low or too high, along with the average ratio of actual displacement to the bound provided. Detailed bounds reports can be viewed by opening the XML statistics file in an appropriate web browser.

Cache Reports

There are several caches used by PRMan to accelerate access to various assets. If a cache is too small, its hit-rate goes down, rapidly decreasing performance. The diagnostics script uses the statistics to estimate a hit-rate and detect these problems. The hit-rate estimate discounts compulsory misses when possible, so the best hit-rate is a full 100%. This classification is performed for each of the caches reported in the statistics file.
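
The exact statistic names vary by cache, but the arithmetic of the estimate is roughly the following sketch (illustrative only, not the default module's code):

def estimatedHitRate(hits, misses, compulsoryMisses=0):
    # Discount compulsory (first-access) misses, which no cache size
    # can avoid, so that a well-sized cache can score a full 100%.
    lookups = hits + misses - compulsoryMisses
    if lookups <= 0:
        return None   # the cache was not used
    return 100.0 * hits / lookups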

Pushed Grids Report

This report compares the number of grids pushed to those diced. In the best case, a grid will be diced once and handled entirely in the bucket, resulting in a pushed to diced ratio of 0. If we assume that every grid doesn't quite fit into a single bucket, this could lead to pushing into each unprocessed neighbor bucket - on average about 4 pushes. Motion blur or extreme displacements can increase this number, but overly-conservative bounding boxes will also increase pushes. A ratio greater than 4 will be reported here, indicating that performance may be improved with a larger bucket size.

CPU Usage Report

This report compares the time spent by threads of the renderer doing computation to the actual wall-clock time of a render (times the number of threads). Any time not spent doing computation is time that a thread is waiting - it could be waiting for a disk read to complete, a lock to be acquired, or simply to be scheduled on the CPU while other processes/threads are running. If more than half of the render time is spent waiting, the value will be reported as wait bound.

Raw Text Statistics

As with statsinfo.py, the entire XML file can be converted to text using the -a command-line argument. The end of the raw text is marked with:

**END OF RAW STATS**

The regular report will follow this marker.

Timers and Memory

The -t and -m command line arguments add summary timing and memory usage information, respectively, in the same format as statsinfo.py.

Total Time

For a multi-frame XML statistics file, the total time used across all frames will be included if the -T command line option is included.

Highlighting Expensive Items

In order to improve render performance, you often need to know what most of your resources are being devoted to. The -e option enables reporting of the most expensive items in several categories. For a given resource, such as time spent shading, these reports print out the items that use the most of that resource until 50% of the resource is accounted for. Here is an example report from a run where total shading time was about 21 seconds:

Shader total time (seconds): most expensive
                                     Lunar_Bump_0[d]       6.01 s ---------- 28%
                                           Spot_0[l]       3.73 s ------     18%
                                         skylight[l]       2.74 s ----       13%
   |:
   |:.
   |::.
 % |:::..
   |::::::.....
   +-------------------
       Entries

In this case the three most expensive shaders are reported:

  • Lunar_Bump_0, which is a displacement shader
  • Spot_0, which is a light shader
  • skylight, which is also a light shader

Adding these up, we get about 12 seconds, just over half of the total time. Just to the right of the number of seconds are some dashes; the number of dashes is proportional to the number of seconds used by that shader, producing a sideways bar graph for visually comparing relative expense. All the way on the right is the percentage of the total that each shader contributed - a total of 59% here.

Below the individual item reports is an ASCII-art style summary graph showing the overall distribution of time across all shaders. For this run there were 19 shaders in all, and each gets a column of the graph. If there are more shaders than will fit in the width of the report, the script scales the graph down. The | and - characters represent the axes, where Y is the percentage of the item being measured and X is the sorted entries.

Verbose Mode

If you want a full report of everything the diagnostics script checks, good or bad, use the -v command line option. This prints a report for everything that is checked, and expensive item reports are extended to their full length instead of stopping at 50% of usage. For expensive item reports with few items, such as the shader time report above, the summary graph is omitted because the bar graph of dashes on the right is an adequate replacement:

Shader total time (seconds): all
                                     Lunar_Bump_0[d]       6.01 s ---------- 28%
                                           Spot_0[l]       3.73 s ------     18%
                                         skylight[l]       2.74 s ----       13%
                                      Chromium_FX[s]       1.59 s --          7%
                                       Plastic512[s]       1.48 s --          7%
                                      Combine_2_0[d]       1.16 s -           5%
                                   AdditiveFX_4_0[s]      0.950 s -           4%
                                        Layer_8_0[s]      0.860 s -           4%
                                        magicSurf[s]      0.838 s -           4%
                                       magicLight[l]      0.702 s -           3%
                              ChromeEnvironment_1[l]      0.361 s             2%
                                  Lunar_Surface_0[s]      0.223 s             1%
                             mtorDirectionalLight[l]      0.209 s             1%
                                    Plastic_grass[s]      0.202 s             1%
                                  Environment_6_0[l]      0.107 s             1%
                                   BrushedMetal_0[s]     0.0638 s             0%
                                     points_blinn[s]     0.0103 s             0%
                                      mtorLambert[s]   0.000480 s             0%
                                        Plastic_0[s]   7.30e-05 s             0%

However, if the output is more than 25 items long, the summary graph is included, providing an overall picture without scrolling through the entire list.


Customizing Diagnostics

Like the rest of PRMan, the diagnostic script is built to provide basic functionality along with a flexible framework for supporting user-generated extensions. The full source is provided, so any aspect may be changed by directly editing the source; however, the following modular approach is recommended.

Extension API

The diagnostic script is provided as three major components:

statsdiag.py
The command line interface to run diagnostics.
diagnosticModules/util.py
Python utility classes and methods to aid in writing extensions.
diagnosticModules/defaultModule.py
The default module that produces all of the default behaviors of the diagnostics except for the raw statistics output at the beginning.

Modules

The diagnostics are designed to be provided as modules. If no modules are provided on the command line, the default will be loaded from $RMANTREE/bin/diagnosticModules/defaultModule.py. You can create your own modules in Python and use them instead with the -M command line option. As an example, let's say we want to use two modules, one in:

$RMANTREE/bin/diagnosticModules/memMod.py

and the other in:

~/lib/myShaderMod.py

Assuming ~/lib/ is in your PYTHONPATH, you can use these modules with the command line:

statsdiag.py -M diagnosticModules/memMod:myShaderMod stats.xml

When loading a module, the statsdiag.py script looks for a method with the following signature and executes it:

def initializeExtension(options, filters, frameReports, globalReports, numFrames):

The options arg is the object produced by Python's optparse library when parsing the command line arguments. The filters, frameReports, and globalReports are used to customize the output. The numFrames argument provides the total number of frames in the XML file to be processed. The modules are initialized in the order given on the command line. To be composable, a module should only add to filters, frameReports, and globalReports, never remove entries or overwrite them with new instances.
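
For instance, a minimal module following this contract might look like the sketch below. NodeCounter is purely illustrative; note that the module appends to any existing filter list rather than assigning a new one, so filters added by other modules survive:

import diagnosticModules.util as util

# A trivial illustrative filter that counts "reyes" nodes per frame
# and reports the count.
class NodeCounter(util.BaseFrameInit):
    def newFrame(self):
        self.count = 0
    def summarize(self, node, options, frameNum):
        self.count += 1
    def reportFrameResults(self, options, frameNum):
        util.printf("reyes nodes seen: %d\n", self.count)

def initializeExtension(options, filters, frameReports, globalReports,
                        numFrames):
    counter = NodeCounter()
    # Append to the existing list for this node name, if any,
    # instead of overwriting it.
    filters.setdefault("reyes", []).append(counter)
    frameReports.append(counter)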

For each frame of the report, all of the filters are first run on the statistics for that frame in an unspecified order, then the per-frame reports are called to generate output in the order they are specified by the frameReports list. This two-pass approach allows you to collect statistics from several sub-trees and combine them in the output in a well-defined order. After all of the frame reports are generated, the global reports that can accumulate data across all frames are run in the order defined by the globalReports list.
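
In pseudocode, the processing order is roughly the following (an illustrative sketch of the behavior described above, not the actual statsdiag.py source; names like frames, allNodesIn, and filterAndReportObjects are hypothetical):

for frameNum, frame in enumerate(frames, 1):     # frameNum is 0 for a
                                                 # single-frame file
    for obj in filterAndReportObjects:           # reset per-frame state
        if hasattr(obj, "newFrame"):
            obj.newFrame()
    for node in allNodesIn(frame):               # pass 1: filters, in
        for f in filters.get(node.name, []):     # unspecified node order
            f.summarize(node, options, frameNum)
    for report in frameReports:                  # pass 2: frame reports,
        report.reportFrameResults(options, frameNum)  # in list order
    for g in globalReports:                      # per-frame accumulation
        g.accumulateFrame(options, frameNum)
for g in globalReports:                          # after all frames,
    g.reportGlobalResults(options)               # in list order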

The statistics XML file is parsed into data-structures like those described in Appendix B of Using XML Frame Statistics. They can be included in your module, along with other useful definitions, with the following import:

import diagnosticModules.util

Filters

A filter is used to gather data from sections of the statistics file. You specify the nodes a filter runs over by binding it in the filters dictionary, which maps node names (the name attribute in the XML statistics) to a list of filters to run on nodes of that name.

filters["<node name>"] = [myFilter1, myFilter2, myFilter3]

It is important to note that node names are not guaranteed to be unique, so a filter may be run on multiple nodes in the same frame. If a filter needs to tell such nodes apart, it must do so itself by inspecting the node contents (see the sketch following the method descriptions below).

Filters support the following methods:

newFrame(self)
An optional method called at the beginning of each frame, once for each time this object appears in either filters or frameReports; it is useful for per-frame counting or for resetting data values.
summarize(self, node, options, frameNum)

A method called whenever specified by the filters dictionary, this is expected to store information and formatted strings for later reporting; it should not produce output. The arguments are:

node
The statistics node that matched the name in filters
options
An options object from optparse containing command line options
frameNum
The number of the current frame being processed, starting from 1 for XML files with multiple frames, or 0 for a single frame file
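
Because node names may repeat, a summarize implementation will often inspect a node before recording anything. Here is a hypothetical sketch (SelectiveSummary is illustrative, not part of the default module):

import diagnosticModules.util as util

class SelectiveSummary(util.BaseFrameInit):
    def newFrame(self):
        self.diced = None
    def summarize(self, node, options, frameNum):
        # Same-named nodes may be visited more than once, so only
        # record data from nodes carrying the statistic of interest.
        for child in node:
            if child.name == "nGridsDiced":
                self.diced = child.value
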
Frame Reports

A frame report is used to output processed data on a frame-by-frame basis. It can use any local or global information available to generate output. It is recommended that you use diagnosticModules.util.printf, diagnosticModules.util.printstr, or write to sys.stdout to generate output that will print to the appropriate location. A frame report is guaranteed to run after all filters have been run on the current frame and after any frame reports that occur before it in the frameReports list. A frame report object can be added in initializeExtension like so:

frameReports.append(myReport)

A frame report can be any object that supports the following methods:

newFrame(self)
An optional method called at the beginning of each frame once for each time this object appears in either filters or frameReports, useful for counting or resetting
reportFrameResults(self, options, frameNum)

A method called after all filters have run on the current frame and after any earlier entries in the frameReports list; this is where the report's per-frame output should be generated. The arguments are:

options
An options object from optparse containing command line options
frameNum
The number of the current frame being processed, starting from 1 for XML files with multiple frames, or 0 for a single frame file

Global Reports

A global report is used to output processed data after all frames have been reported on. It can accumulate data across frames to produce a total report. As with frame reports, it is recommended that you use the provided printf or printstr, or write to sys.stdout, for output. It is guaranteed to run after all filters and frame reports for all frames have finished, and after any global reports that precede it in the globalReports list. A global report object can be added in initializeExtension like so:

globalReports.append(myGlobalReport)

A global report can be any object that supports the following methods:

accumulateFrame(self, options, frameNum)

A method called after all filters have run on a given frame. For multiple global reports, these methods are called in the order that the reports appear in the globalReports list. This should be used to accumulate frame values as they are processed.

options
An options object from optparse containing command line options
frameNum
The number of the current frame being processed, starting from 1 for XML files with multiple frames, or 0 for a single frame file
reportGlobalResults(self, options)

A method called after all frames have been processed. This is where calculations on the accumulated data, and all output, should occur. The arguments are:

options
An options object from optparse containing command line options.

Example Customizations

This section will demonstrate how to customize the diagnostics by going through several examples. The first example is for creating a simple report that extracts specific data from the XML statistics, while the rest walk through some of the reports included in diagnosticModules/defaultModule.py, highlighting the use of code provided in diagnosticModules/util.py.

Simple Line Report Module

The most basic customization involves leveraging a class called LineReport in util to create a module that prints out data per-frame, based on a format string and a list of statistics. Our module, dubbed customModule, will print out the frame number, the total number of frames in the XML file, the elapsed time, and the number of threads used to render. Here is the entire code for the module:

import diagnosticModules.util as util

def initializeExtension(options, filters, frameReports, globalReports,
                        numFrames):

    statsList = [("frameNumber", "frame-%d-"),
                 ("numberOfFrames", "(%d tot)"),
                 ("timers/totaltime.elapsed", "time %2.2f"),
                 ("options/threads", "threads %d")]

    reportFormatString = "%s of "+", ".join((len(statsList)-1)*["%10.10s"])+"\n"

    util.CreateLineReport(options, filters, frameReports, globalReports,
                        numFrames, statsList, reportFormatString)

Running this (assuming it is saved in $RMANTREE/bin/diagnosticModules) with:

statsdiag.py -M diagnosticModules/customModule stats.xml

will produce something like:

frame-1- of    (3 tot), time 21.14, threads 16
frame-2- of    (3 tot), time 20.02, threads 16
frame-3- of    (3 tot),  time 6.16, threads 16

The first few lines are straightforward, importing the utility code and declaring the function that will be called by statsdiag.py.

The next two assignments set up arguments for the call to util.CreateLineReport, which does all the heavy lifting. CreateLineReport takes the options, filters, frameReports, globalReports, and numFrames passed in to the initializeExtension method. In addition, it takes a list of tuples and a format string. The list is composed of tuples describing the statistics you want to extract from the XML file, in the order you want them to appear in the report. Each tuple consists of a path to the statistic and a format string; for each frame, the statistic is extracted and formatted according to that format string. The path does not need to be the entire path, but it must be long enough to uniquely identify the statistic to be extracted. In addition, there are two special values that can be used: frameNumber and numberOfFrames. These are filled in with the number of the current frame being processed and the total number of frames being processed. For multiple frames in a single file, frame numbering starts at 1. If there is only a single frame, it is numbered 0.

The reportFormatString argument specifies how each of the strings created from the extracted statistics will be combined together. The line shown is just an adjustable way of specifying the formatting string "%s of %10.10s, %10.10s, %10.10s\n".

By changing the format string, you can quickly re-target your data output for different applications. The above example is formatted for review, but it could easily be changed to ", ".join(len(statsList)*["%s"])+"\n" to create a comma separated value list to be used in a spreadsheet or plotting tool such as gnuplot. The data could even be formatted as an SQL database entry command using something like "INSERT INTO render_stats ("+ ", ".join(len(statsList)*["%s"])+")\n".
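
As a concrete sketch of such a retargeting (using the same statistics paths as above), a CSV-producing variant of the module might look like:

import diagnosticModules.util as util

def initializeExtension(options, filters, frameReports, globalReports,
                        numFrames):
    statsList = [("frameNumber", "%d"),
                 ("timers/totaltime.elapsed", "%f"),
                 ("options/threads", "%d")]
    # Plain comma-separated values, one line per frame.
    reportFormatString = ", ".join(len(statsList)*["%s"])+"\n"
    util.CreateLineReport(options, filters, frameReports, globalReports,
                          numFrames, statsList, reportFormatString)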

Basic Filter

To create your own custom filter, you have to define at least three things:

  1. an initializeExtension function to add your custom filter and report
  2. an object defining summarize to extract the data of interest, optionally with a newFrame method to reset the collection per frame
  3. an object defining reportFrameResults to format and output the data

The last two can even be the same object, as demonstrated in this example, which extends two simple classes from the util package to provide the resetting and reporting functionality:

class BaseFrameInit:
    def __init__(self):
        self.newFrame()
    def newFrame(self):
        pass

class SimpleFrameReportSummary(BaseFrameInit):
    def newFrame(self):
        self.report=[""]
    def reportFrameResults(self, options, frameNum):
        reporttxt = "".join(self.report)
        printstr(reporttxt)

The BaseFrameInit class ensures that, on creation, all data fields are reset using the same per-frame reset mechanism. SimpleFrameReportSummary defines a reporting mechanism that uses a list of strings named report as a string buffer. This list of strings is concatenated into a single string and sent to the appropriate output using the printstr function, also part of util. Any class that inherits from SimpleFrameReportSummary simply needs to append output strings to the report list, and the output will be reported and cleared at the appropriate times.

Here is the full code for the example filter:

import diagnosticModules.util as util

PUSHED_PER_DICED_REPORT_THRESHOLD = 4

class PushedGridsSummary(util.SimpleFrameReportSummary):
    """
    Collects and reports the amount that grids are pushed across buckets.
    """
    def summarize(self, section, options, frameNum):
        statNames = ["nGridsPushed","nGridsDiced"]
        [pushed,diced] = util.readSimpleChildren(statNames, section)
        pushedThreshold = PUSHED_PER_DICED_REPORT_THRESHOLD
        if(diced!=None and diced>0):
            ratio = float(pushed)/float(diced)
            if(ratio>pushedThreshold):
                quality = "MEMORY INTENSIVE"
            else:
                quality = "GOOD"
            if(ratio>pushedThreshold or options.verbose):
                util.lprintf(self.report,"Grids Pushed to Diced Ratio: "
                        " %4.2f      %s\n",ratio,quality)
        else:
            if(options.verbose):
                util.lprintf(self.report,"Grids Pushed to Diced Ratio: "
                        "           NOT USED\n")

def initializeExtension(options, filters, frameReports, globalReports,
                        numFrames):
    # Create the filter
    pushedGrids = PushedGridsSummary()

    # Set up the section->filter mapping for extracting statistics
    filters["reyes"] = [pushedGrids]

    # Set up the per frame reports
    frameReports.append(pushedGrids)

The first line imports the utility module to make SimpleFrameReportSummary available for use. Next, the filter PushedGridsSummary is defined and relies on SimpleFrameReportSummary functionality to produce reports. This means that the filter only needs to extract the data and put formatted strings into the report field of the filter.

The purpose of this filter is to examine the number of grids that are being pushed into other buckets. The REYES algorithm works most efficiently when the scene is easily rendered a single bucket at a time, and so PRMan attempts to split the geometry down into grids that fit in a single bucket. However, grids near the edge may cross bucket boundaries, or a displacement may move the grid, resulting in a "push" to another bucket where the grid will actually appear. The nGridsDiced statistic, found in the reyes section of the statistics, provides a count of the number of grids created. The nGridsPushed statistic counts the number of times a grid is pushed into another bucket. A grid will be diced once, but may be pushed multiple times. A high number of pushes relative to the amount of dicing could indicate a frame that is challenging to render efficiently, and may render faster with larger buckets. This filter will check that ratio and report if it is above a threshold.

The filter uses the utility function readSimpleChildren. This function takes a list of statistic names and a section of the statistics (a StatsList object). It looks for statistics that are the immediate children of the section passed in and returns their values. Which statistics to look for and the return order are specified by the name list. Python's unpacking is used to store the values in pushed and diced. These values are used to compute a ratio that is checked against the threshold defined at the beginning of the code.

The options object is also passed in, and verbose is True when the user invokes the verbose flag on the command line. The filter here always prints the ratio out when verbose is True, but only reports ratios larger than the threshold otherwise.

The filter uses the utility function lprintf for output. The function's name indicates that it is a "list-printf": the first argument is the list to append the formatted string to, the next argument is a Python format string, and the remaining arguments are used to construct the formatted string, as in a C-style printf.

The final block of code defines the module initialization function. First, a filter object is created. This object is added to the filters map under the reyes key because the nGridsPushed and nGridsDiced statistics are in the reyes section of the statistics. Each key must map to a list of filters to be run on that section, so we create a single element list containing our filter object. This object will handle both filtering and reporting, so it is also added to the end of the frameReports list, which is used by the script framework for creating the output report.

Combining Filter Results

The method used in the previous example works well when all of the data of interest fall in a single section in the XML statistics. Since that will not always be the case, this example demonstrates how to combine data from separate filters into a single report. This example is taken from the "wait-bound" report in the default filters. This report examines the amount of wall-clock time and compares it to the amount of CPU time used by the renderer. If the wall-clock time is greater than the CPU time, the time-difference between the two was spent waiting on something - I/O, locks, memory paging, et cetera - instead of actual render computation. In a multithreaded render, the computation time reported is accumulated across all threads, so the CPU time must be divided by the number of threads to make it comparable to the wall-clock time.

To create this report, we will have two filters - one for time and one for number of threads - and a per-frame report. The report will have access to both filters, and can pull the data required straight from them. Here is the source code for this example:

import diagnosticModules.util as util

DEFAULT_MIN_CPU_RATIO=.5

class ThreadSummary(util.BaseFrameInit):
    """
    Extracts the number of threads that were used for this execution.

    The number of threads is saved in numThreads.
    """
    def newFrame(self):
        self.numThreads = -1

    def summarize(self, section, options, frameNum):
        self.numThreads = section.value

class TotalFrameTimeSummary(util.BaseFrameInit):
    """
    Extracts the total time reported for each frame.

    The time is saved as a TimerStat object named totalTimer.
    """
    def newFrame(self):
        self.totalTimer = None

    def summarize(self, section, options, frameNum):
        for child in section:
            if child.name == "totaltime":
                self.totalTimer = child

class WaitBoundReport:
    def __init__(self, timeSummary, threadSummary):
        self.timeSummary = timeSummary
        self.threadSummary = threadSummary
    def reportFrameResults(self, options, frameNum):
        minCPURatio = DEFAULT_MIN_CPU_RATIO

        numThreads = float(self.threadSummary.numThreads)
        elapsed = float(self.timeSummary.totalTimer.elapsed)
        user = float(self.timeSummary.totalTimer.user)/numThreads
        system = float(self.timeSummary.totalTimer.system)/numThreads
        cpuRatio = (user+system)/elapsed
        if cpuRatio < minCPURatio:
            util.printf("CPU Report:  user-total ratio %4.2f"
                        "     LOW (Wait Bound)\n",cpuRatio)
        elif options.verbose:
            util.printf("CPU Report:  user-total ratio %4.2f"
                        "    GOOD (CPU Bound)\n",cpuRatio)

def initializeExtension(options, filters, frameReports, globalReports,
                        numFrames):
    totalFrameTimeSummary = TotalFrameTimeSummary()
    threadSummary = ThreadSummary()

    filters["timers"] = [totalFrameTimeSummary]
    filters["threads"] = [threadSummary]

    # Set up the per frame reports
    frameReports.append(WaitBoundReport(totalFrameTimeSummary,
                                        threadSummary))

The first two class definitions are basic filter definitions that only extract data of interest. The interesting sections for this example are the initializeExtension definition and the WaitBoundReport class. In initializeExtension, the two filters are created and placed in the filters map under the keys for their associated sections. Then both filters are passed to the __init__ of the WaitBoundReport class, which saves references in the two object attributes timeSummary and threadSummary. This gives the WaitBoundReport access to the data collected by the two filters via their attributes.

When the report constructs its output, it pulls the number of threads from the numThreads attribute of the threadSummary filter, where it was stored when the filter ran. The timeSummary filter contains a TimerStat object, which holds the elapsed, user, and system time for the run. The user and system times are added and divided by the number of threads to obtain the average CPU time per thread. The ratio of per-thread CPU time to elapsed (wall-clock) time is then compared against the threshold value to generate the appropriate output report. The util.printf function is used, which ensures the output is directed to the appropriate place - the console or a file, depending on the options provided on the command line. Since the framework ensures all filters for a frame run before all reports, the data values will be available to the report if they exist.

Filter with Separate Global Report

Generating a global report is similar to generating a per-frame report. This example generates a report detailing the time usage across all frames in an XML statistics file, using the TotalFrameTimeSummary filter from the previous example. There are two snippets of code that need to be added to the previous example to create this new global report. The first is the definition of the global report:

class TotalGlobalTimeReport:
    def __init__(self, totalFrameTimeSummary):
        self.elapsed = 0
        self.user = 0
        self.system = 0
        self.totalFrameTimeSummary = totalFrameTimeSummary
    def accumulateFrame(self, options, frameNum):
        if self.totalFrameTimeSummary.totalTimer:
            tmr = self.totalFrameTimeSummary.totalTimer
            for attr in ["user", "elapsed", "system"]:
                oldVal = getattr(self, attr)
                frameVal = getattr(tmr, attr, 0)
                setattr(self, attr, oldVal + frameVal)

    def reportGlobalResults(self, options):
        util.printf("Total-time sums (user,sys,elapsed):  %f  %f  %f\n",
                    self.user, self.system, self.elapsed)

The __init__ clears the attributes that will be used to accumulate the total times and saves a reference to the filter for the total frame time. As each frame is processed, the framework calls the accumulateFrame method. Like the per-frame reports, the accumulate methods are called after all filters have been run for the given frame. This means the filter's timer, with its user, elapsed, and system attributes, will be set for the current frame, ready to add to the global report's accumulators. The for loop shown leverages Python's dynamic attribute access functionality to avoid copy-and-paste code for accumulating the three values.

Finally, once all of the individual frames have been processed, the framework will run the reportGlobalResults method, formatting the accumulators nicely for output in this example. To ensure this happens for the filter, a TotalGlobalTimeReport object must be added to the globalReports list in the initializeExtension method. This is accomplished with the second code snippet:

globalReports.append(TotalGlobalTimeReport(totalFrameTimeSummary))

Notice that the constructor is given the totalFrameTimeSummary filter so that it will be able to accumulate the values as they are processed per-frame.

Creating Custom Expensive Item Reports

The previous examples demonstrated the basics of generating diagnostic reports based on render statistics. This example shows how to use another util feature, the BaseExpensiveReport class. As the name suggests, this base class is used to create all of the reports of the most expensive elements, complete with the ASCII-graphing functionality. This example describes the ExpensiveShaderReport class from the defaultModule.

An expensive item report is based on the idea that, given a list of items and their costs measured by some metric, the most expensive items can be printed until a significant fraction of usage is accounted for, without printing the entire list. A report that uses BaseExpensiveReport needs three components: a global report class that extends BaseExpensiveReport, a class to hold the data records consisting of item labels and costs, and a filter that generates those data records from the XML statistics.

This example will demonstrate building two reports of the most expensive shaders, measured both by time per point shaded and total time elapsed. Here is the entire source code, with explanations of individual portions to follow:

import diagnosticModules.util as util

class ShaderRPT:
    name = " "
    elapsed  = 0.0
    pointsShaded = 0
    type = ""

class ShaderSummary(util.BaseFrameInit):
    """
    Extracts reports of time elapsed and number of points shaded for shaders.

    All of the reports are ShaderRPT objects stored in lists in the
    reports dictionary of this object, keyed by shader type.  The
    type of each record is also available via the ShaderRPT.type
    value.
    """
    def newFrame(self):
        self.reports = {}

    def summarize(self, section, options, frameNum):
        for childSection in section:
            # filter to known sections, because plug-ins may
            # add other sections
            if(childSection.name in ["displacement", "surface", "volume",
                                     "light", "method", "imager"]):
                for shaderEntry in childSection:
                    if isinstance(shaderEntry, util.StatsList):
                        rpt = ShaderRPT()
                        rpt.name = shaderEntry.name
                        rpt.type = childSection.name
                        hasTimer = False
                        hasPointsShaded = False
                        for valEntry in shaderEntry:
                            if(valEntry.name=="shaderTimer"):
                                rpt.elapsed = valEntry.elapsed
                                hasTimer = True
                            elif(valEntry.name=="pointsShaded"):
                                rpt.pointsShaded = valEntry.value
                                hasPointsShaded = True
                        if hasTimer and hasPointsShaded:
                            if not childSection.name in self.reports:
                                self.reports[childSection.name] = []
                            self.reports[childSection.name].append(rpt)

class ExpensiveShaderReport(util.BaseExpensiveReport):
    rptAttrs = ["elapsed","pointsShaded"]
    def __init__(self, shdSummary):
        self.shaderSummary = shdSummary
        self.accumRpts = {}
    def accumulateFrame(self, options, frameNum):
        for shClass,rptList in self.shaderSummary.reports.items():
            for rpt in rptList:
                self.accumulateReport(rpt,self.accumRpts,
                                      "%s[%1.1s]"%(rpt.name,rpt.type),
                                      ShaderRPT)

    def reportGlobalResults(self, options):
        self.reportMajorResults("Shader average time (microseconds) "
                      "per point shaded:",
                      options, self.accumRpts.values(),
                      lambda rpt: float(rpt.elapsed)/float(rpt.pointsShaded),
                      1000000, "us/pt", .5)

        self.reportMajorResults("Shader total time (seconds):",
                                options, self.accumRpts.values(),
                                lambda rpt: rpt.elapsed, 1.0, "s", .5)

def initializeExtension(options, filters, frameReports, globalReports,
                        numFrames):
    # Create some filters
    shaderSummary = ShaderSummary()

    # Set up the section->filter mapping for extracting statistics
    filters["shaders"] = [shaderSummary]

    # Set up the global reports
    globalReports.extend([ExpensiveShaderReport(shaderSummary)])

The ShaderRPT class is a straightforward data record which will store data for each shader that is needed to compute the time metrics.

The ShaderSummary class is the filter that creates the ShaderRPT objects from the XML statistics. The shader statistics are divided up by shader type, so the filter loops through the known shader types and collects a record for each shader entry it finds.

The ExpensiveShaderReport class is the global report derived from BaseExpensiveReport. During accumulateFrame, the individual records extracted by the ShaderSummary filter are accumulated with records from previous frames. This is done by calling the BaseExpensiveReport.accumulateReport method:

self.accumulateReport(rpt,self.accumRpts,
                      "%s[%1.1s]"%(rpt.name,rpt.type),
                      ShaderRPT)

The arguments to this method are:

rpt
This is the current record to accumulate.
self.accumRpts
This maps from entry names to records to accumulate into.
"%s[%1.1s]"%(rpt.name,rpt.type)
This is an entry name for the current record. Here it is constructed from the shader name followed by the first letter of the type in brackets. Note that this becomes the new value for the name field in the accumulated reports.
ShaderRPT
This is the class that record entries are created from.

The accumulateReport method dynamically retrieves the attributes of the record entries and accumulates them into matching attributes in the accumRpts map. Entries are indexed by the given name; if no entry exists yet for a given name, one is created dynamically from the passed-in class.
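
The effect is roughly equivalent to this sketch of the method (illustrative; the actual util.py implementation may differ):

def accumulateReport(self, rpt, accumRpts, entryName, recordClass):
    # Create a fresh record the first time an entry name is seen.
    if entryName not in accumRpts:
        accumRpts[entryName] = recordClass()
        accumRpts[entryName].name = entryName
    accum = accumRpts[entryName]
    # Sum each attribute listed in rptAttrs into the accumulated record.
    for attr in self.rptAttrs:
        setattr(accum, attr, getattr(accum, attr) + getattr(rpt, attr))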

Once all of the individual entries are accumulated, the reportGlobalResults function generates two reports, the first based on time per point shaded and the second based on total shader time. The first call looks like this:

self.reportMajorResults("Shader average time (microseconds) "
              "per point shaded:",
              options, self.accumRpts.values(),
              lambda rpt: float(rpt.elapsed)/float(rpt.pointsShaded),
              1000000, "us/pt", .5)

The arguments are:

"Shader average time (microseconds) per point shaded:"
This is the header displayed at the beginning of the report.
options
This is the option object from the command line.
self.accumRpts.values()
This is the set of entries to sort and output.
lambda rpt: float(rpt.elapsed)/float(rpt.pointsShaded)
This is a lambda (anonymous) function that takes an entry record and returns its metric value. Here we want the elapsed time per point shaded, so that is what it returns.
1000000
This is the scaling factor the result is multiplied by before reporting; here it converts seconds to microseconds.
"us/pt"
This is a unit indicator string that will be appended after the value.
.5
This is the accumulation threshold at which output stops. Set at .5, it means the minimum number of items needed to account for at least half of the total for the given metric will be printed.

By simply changing the scaling, units, and lambda function, the same data can be used to report the most expensive shaders based on total time:

self.reportMajorResults("Shader total time (seconds):",
                        options, self.accumRpts.values(),
                        lambda rpt: rpt.elapsed, 1.0, "s", .5)

There are additional arguments to reportMajorResults, with default values, that can be used to adjust the plot output:

barWidth=10
This is the maximum width of the bar immediately following each item output.
plotHeight=5, plotWidth=75, plotStart=4
These are the height, width, and starting column respectively of the summary plot printed after the individual most expensive items list.
reportWidth=80
This is the report width to fit item entries into.
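
For example, the total-time report above could be widened like this (a sketch assuming the keyword arguments just listed):

self.reportMajorResults("Shader total time (seconds):",
                        options, self.accumRpts.values(),
                        lambda rpt: rpt.elapsed, 1.0, "s", .5,
                        barWidth=20, plotHeight=3, reportWidth=100)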

This base class provides a flexible starting point for quickly identifying the most demanding items in your renders.


Caveats, Limitations, Future Directions

Every production is different, and while every effort has been made to provide reasonable suggestions in the default module, render wranglers are encouraged to review and customize the decisions made in reporting problems. The various thresholds and constants used are defined at the top of defaultModule.py, with comments explaining them, to make tuning the values straightforward. The included statsdiag.py, diagnosticModules/util.py, and diagnosticModules/defaultModule.py are not meant to be comprehensive, but rather a framework with concrete examples to serve as a springboard for developing production- or farm-specific implementations. Also, while the default module was built to be reasonably robust, arbitrary XML may be inserted by custom extensions, so there may be cases where the default scripts need adjustments to handle customized XML output appropriately.


Frequently Asked Questions

Q: All of these diagnostics are wonderful, but what do I do if I'm trying to diagnose a frame that crashes PRMan and don't have a statistics file?

A: If you are running on a UNIX system (Linux, OS X), you can send the prman process a SIGHUP signal at any time during rendering to generate a statistics file for that point in time. Beware, though, that if prman is terminated too violently it may not write the final two closing tags of the statistics file. These can easily be added by hand to the end of a file that lacks them:

</rm:statsDoc>
</doc>
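
The signal itself can be sent from a shell while the render is running, for example (substitute the actual process ID):

kill -HUP <prman-pid>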

Q: I'm more of a visual person, so these text-based statistics just don't work that well for me. Is there a way I can generate a picture so I can see what is taking the most time?

A: A frame can be generated showing the CPU time spent per pixel. This can help you quickly identify which portions of a scene are the most costly. For details, see the discussion of the __CPUtime variable at the end of the Frame Buffer Control section of the Options documentation.


Pixar Animation Studios
Copyright © Pixar. All rights reserved.
Pixar® and RenderMan® are registered trademarks of Pixar.
All other trademarks are the properties of their respective holders.