Commit dda05ce4 authored by Pekka Jääskeläinen

Basic cl_khr_subgroups implementation for CPU

A subgroup always spans the X dimension of the work-group, and
only one subgroup is in flight at a time. Thus there are Y*Z SGs
per WG. The CTS test passes even though independent forward
progress is not supported, because only one SG is executing, and
therefore making progress, at any given time.
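
The mapping described above can be modelled on the host in plain C. This is
a minimal sketch, not the code from this commit: the `wg_dims` struct and the
`sg_*` helper names are hypothetical, and the order in which subgroups are
numbered (Y fastest, then Z) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work-group description, used only in this sketch. */
typedef struct {
  size_t x, y, z; /* local sizes of the work-group */
} wg_dims;

/* A subgroup spans the whole X dimension, so its size is the X local size. */
static size_t sg_size(wg_dims d) { return d.x; }

/* One subgroup per (y, z) pair, hence Y*Z subgroups per work-group. */
static size_t sg_count(wg_dims d) { return d.y * d.z; }

/* ASSUMPTION: subgroups are numbered with Y varying fastest, then Z. */
static size_t sg_id(wg_dims d, size_t lid_y, size_t lid_z)
{
  assert(lid_y < d.y && lid_z < d.z);
  return lid_z * d.y + lid_y;
}

/* Within a subgroup, the work-item index is simply its X local id. */
static size_t sg_local_id(wg_dims d, size_t lid_x)
{
  assert(lid_x < d.x);
  return lid_x;
}
```

For an 8x4x2 work-group this model gives 8 subgroups of 8 work-items each,
and the work-item at local id (5, 3, 1) is lane 5 of subgroup 7 under the
assumed numbering.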

Also adds preliminary shuffle and ballot implementations, which
for now work only with uniform execution.

The cross-WI data exchange is implemented via
__pocl_{wg,local_mem}_alloca(), internal functions that
dynamically allocate "local memory" (backed by the thread stack).
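
To illustrate why a shared scratch area is enough for shuffle and ballot
under uniform execution, here is a hedged host-side model in plain C. It
does not call the pocl-internal allocators. A stack array stands in for the
dynamically allocated "local memory", the function names and `SG_MAX_SIZE`
are hypothetical, and the ballot result is simplified to a single 32-bit
mask. The two loops model the two phases every work-item goes through:
publish a value, then read another lane's slot. This only works when all
work-items of the subgroup reach the call, which is the uniform-execution
restriction mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_MAX_SIZE 32 /* hypothetical upper bound for this sketch */

/* Shuffle: lane i receives the value held by lane src_lane[i]. */
static void sg_shuffle_uniform(const int *value, const size_t *src_lane,
                               int *result, size_t sg_size)
{
  int scratch[SG_MAX_SIZE]; /* stands in for the shared scratch buffer */
  assert(sg_size > 0 && sg_size <= SG_MAX_SIZE);

  /* Phase 1: every work-item publishes its own value. */
  for (size_t lane = 0; lane < sg_size; ++lane)
    scratch[lane] = value[lane];

  /* Phase 2: every work-item reads the slot of its source lane. */
  for (size_t lane = 0; lane < sg_size; ++lane) {
    assert(src_lane[lane] < sg_size);
    result[lane] = scratch[src_lane[lane]];
  }
}

/* Ballot: bit i of the result is set when lane i has a nonzero predicate. */
static uint32_t sg_ballot_uniform(const int *predicate, size_t sg_size)
{
  uint32_t mask = 0;
  assert(sg_size > 0 && sg_size <= SG_MAX_SIZE);
  for (size_t lane = 0; lane < sg_size; ++lane)
    if (predicate[lane])
      mask |= (uint32_t)1 << lane;
  return mask;
}
```

If some work-items skipped the call (divergent execution), their scratch
slots would hold stale or uninitialized data in phase 2, which is one way to
see why this approach is limited to uniform execution.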
parent 3a52cd71
871 additions and 322 deletions