    Add a minimally intrusive and easy-to-use kernel execution time profiler · 0c3147ce
    Pekka Jääskeläinen authored
    Setting POCL_TRACING=cq collects kernel execution times by force-enabling
    the command queue profiling feature, and dumps the collected statistics
    at exit through an atexit() handler. The purpose of this feature is to
    enable minimally intrusive profile collection: the profile data collector
    can choose when it gathers the timestamp data from the events.
    The impact on the observed execution profile is minimized by avoiding
    log writes, object copies, and similar work while collecting the data
    during execution.
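
As a rough illustration of the deferred collection idea described above, here is a minimal sketch in plain C. It does not use pocl or the OpenCL API: `mock_event`, `collector_retain`, `collector_total_ns`, `collector_dump`, and `collector_init` are hypothetical names invented for this example, and the fixed-size array is an assumption made to keep it short. The hot path only stores a pointer; timestamps are read and printed later, at exit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event that carries profiling timestamps. */
typedef struct {
    const char *kernel_name;
    uint64_t start_ns;
    uint64_t end_ns;
} mock_event;

#define MAX_EVENTS 1024

static const mock_event *retained[MAX_EVENTS];
static size_t retained_count = 0;

/* Hot path: remember the event and nothing else (no logging, no copying). */
static int collector_retain(const mock_event *ev) {
    if (retained_count >= MAX_EVENTS)
        return -1; /* table full; a real collector would flush instead */
    retained[retained_count++] = ev;
    return 0;
}

/* Cold path: read the timestamps only when the collector chooses to. */
static uint64_t collector_total_ns(void) {
    uint64_t total = 0;
    for (size_t i = 0; i < retained_count; ++i) {
        assert(retained[i]->end_ns >= retained[i]->start_ns);
        total += retained[i]->end_ns - retained[i]->start_ns;
    }
    return total;
}

/* Dump the aggregated statistics; intended to run once, at exit. */
static void collector_dump(void) {
    printf("%zu kernel commands, %llu ns total\n", retained_count,
           (unsigned long long)collector_total_ns());
}

/* Register the dump so it runs when the process exits. Returns 0 on success. */
static int collector_init(void) {
    return atexit(collector_dump);
}
```

A program would call `collector_init()` once at startup and `collector_retain()` for each finished command. With real OpenCL events, the timestamps would presumably come from the standard event profiling queries rather than from struct fields.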
    
    It relies on the standard event timestamps, which lets devices update
    them as (and when) they see fit during execution.
    
    The drawback is the accumulation of cl_object garbage, which should be
    taken into account when choosing the data collection interval; the
    collector should release the events and the extra data objects they hold
    often enough to keep memory consumption from becoming a problem.
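
One way to bound the garbage described above is to fold durations into running totals and release the events whenever a small buffer fills up. The sketch below is not pocl code and makes several assumptions: `mock_event`, `bounded_collector`, `bc_retain`, `bc_flush`, and `FLUSH_THRESHOLD` are hypothetical, and the `refcount` field merely stands in for what `clRetainEvent` and `clReleaseEvent` would do on real events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reference-counted event with timestamps. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
    int refcount;
} mock_event;

/* Assumed collection interval: flush after this many pending events. */
#define FLUSH_THRESHOLD 4

typedef struct {
    mock_event *pending[FLUSH_THRESHOLD];
    size_t pending_count;
    uint64_t total_ns;  /* accumulated execution time */
    uint64_t commands;  /* number of commands accounted for */
} bounded_collector;

/* Fold pending events into the totals and drop the references to them. */
static void bc_flush(bounded_collector *c) {
    for (size_t i = 0; i < c->pending_count; ++i) {
        mock_event *ev = c->pending[i];
        assert(ev->end_ns >= ev->start_ns);
        c->total_ns += ev->end_ns - ev->start_ns;
        c->commands++;
        ev->refcount--; /* stands in for clReleaseEvent */
    }
    c->pending_count = 0;
}

/* Keep a reference to the event; flush first if the buffer is full. */
static void bc_retain(bounded_collector *c, mock_event *ev) {
    if (c->pending_count == FLUSH_THRESHOLD)
        bc_flush(c);
    ev->refcount++; /* stands in for clRetainEvent */
    c->pending[c->pending_count++] = ev;
}
```

A larger threshold means fewer interruptions of the measured program but more live objects, so the value is a trade-off rather than a fixed recommendation.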
    
    The current version does not perform garbage collection. It assumes that
    keeping the OpenCL objects alive until exit is not a problem, which is
    clearly the case for most OpenCL programs: they are rather simple, and
    neither run for long nor launch many commands over their lifetime.
    
    The default profile data collector counts only kernel commands at the moment.
    Collecting statistics for data transfers would be a useful addition.