    Basic cl_khr_subgroups implementation for CPU · dda05ce4
    Pekka Jääskeläinen authored
    A subgroup always spans the X dimension of the work-group, and
    only one subgroup is in flight at a time. Thus there are Z*Y SGs
    per WG. Independent forward progress is not supported, because
    only one SG is in flight and making progress at a time, but the
    CTS test passes nevertheless.
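
The mapping described above can be sketched in plain C. This is a minimal illustration, not the code from this commit: the dim3_t type and the helper names are hypothetical, and the row-major numbering of subgroups (sub_group_id = z * Y + y) is an assumption, since the message only states that there are Z*Y subgroups per work-group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3D index/size type, used only for this sketch. */
typedef struct { size_t x, y, z; } dim3_t;

/* One subgroup per X row of the work-group, so Z*Y subgroups per WG. */
static size_t num_sub_groups(dim3_t local_size)
{
    return local_size.z * local_size.y;
}

/* A subgroup spans the whole X dimension. */
static size_t sub_group_size(dim3_t local_size)
{
    return local_size.x;
}

/* ASSUMPTION: subgroups are numbered row-major over (z, y). */
static size_t sub_group_id(dim3_t local_size, dim3_t local_id)
{
    assert(local_id.y < local_size.y && local_id.z < local_size.z);
    return local_id.z * local_size.y + local_id.y;
}

/* The position of a work-item inside its subgroup is its local X id. */
static size_t sub_group_local_id(dim3_t local_id)
{
    return local_id.x;
}
```

For example, with a local size of 8x4x2 this gives 8 subgroups of 8 work-items each, and the work-item at local id (3, 2, 1) is lane 3 of subgroup 6 under the assumed numbering.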
    
    Also adds preliminary shuffle and ballot implementations, which
    for now work only with uniform execution.
    
    The cross-WI data exchange is implemented via
    __pocl_{wg,local_mem}_alloca(), internal functions that
    dynamically allocate "local memory" (thread stack).
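
The idea of exchanging data between work-items through a dynamically allocated scratch buffer can be sketched as follows. This is an illustration under stated assumptions, not the commit's implementation: sg_shuffle and sg_ballot are hypothetical names, malloc() stands in for the internal alloca functions (whose signatures are not given here), the two loops model all lanes of one subgroup executing uniformly, and the ballot result is simplified to a single 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shuffle for one subgroup under uniform execution:
 * out[lane] receives the value held by lane src_lane[lane].
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int sg_shuffle(const int *values, const uint32_t *src_lane,
                      int *out, size_t sg_size)
{
    /* Stand-in for the dynamically allocated "local memory". */
    int *scratch = malloc(sg_size * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    /* Phase 1: every lane publishes its value to its own slot. */
    for (size_t lane = 0; lane < sg_size; ++lane)
        scratch[lane] = values[lane];

    /* Phase 2: every lane reads the slot of its source lane. Splitting
     * the work into two loops acts as the barrier between the phases. */
    for (size_t lane = 0; lane < sg_size; ++lane) {
        assert(src_lane[lane] < sg_size);
        out[lane] = scratch[src_lane[lane]];
    }

    free(scratch);
    return 0;
}

/* Hypothetical ballot: bit i of the result is set if lane i's predicate
 * is non-zero. SIMPLIFICATION: at most 32 lanes, one 32-bit mask. */
static uint32_t sg_ballot(const int *pred, size_t sg_size)
{
    assert(sg_size <= 32);
    uint32_t mask = 0;
    for (size_t lane = 0; lane < sg_size; ++lane)
        if (pred[lane])
            mask |= (uint32_t)1 << lane;
    return mask;
}
```

Both helpers rely on every lane reaching the call, which is why a scheme like this only works with uniform execution: a lane that skipped the call would never publish its value or predicate.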