0.8 unreleased
===============
[changes updated from bzr log until 2013-01-01]
0.7 January 2013
=================
This lists only the most interesting changes. Please
refer to the version control log for a full listing.
Overall
-------
- Support for LLVM 3.2.
- Multi-WI work group functions can now be generated
using loops which are only partially unrolled. This reduces
code size explosion with large WGs in comparison to
the full replication method.
- PowerPC 64 support (tested on Cell/Debian Sid/PS3).
- PowerPC 32 support (tested on Cell/Debian Sid/PS3).
- ARM v7 support (on Linux).
- Beginning of Cell SPU support (very experimental!).
- Most of the AMD APP SDK OpenCL examples now work and have been
added to the pocl test suite.
- Most of the Parboil benchmark cases added to the test
suite.
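The difference between full replication and partially unrolled work-item loops can be pictured with a plain C sketch. This is only an illustration, not the code pocl generates: the names ``scale_item`` and ``scale_workgroup`` and the unroll factor of 2 are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: the work done by a single work-item. */
static void scale_item(size_t local_id, const float *in, float *out)
{
    out[local_id] = 2.0f * in[local_id];
}

/* Work group function built as a work-item loop unrolled by 2.
   The code size is independent of local_size, whereas full
   replication would emit one copy of the body per work-item. */
void scale_workgroup(size_t local_size, const float *in, float *out)
{
    size_t i = 0;

    assert(in != NULL && out != NULL);

    /* Unrolled part: two work-items per loop iteration. */
    for (; i + 1 < local_size; i += 2) {
        scale_item(i, in, out);
        scale_item(i + 1, in, out);
    }
    /* Remainder iteration when local_size is odd. */
    for (; i < local_size; ++i)
        scale_item(i, in, out);
}
```

With full replication, a work group of 256 work-items would contain 256 copies of the kernel body; the loop form above contains two copies plus the remainder regardless of the work group size.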
Kernel Compiler Passes
----------------------
- Several miscompilations and compiler crashes fixed.
- Multiple bugs fixed in the work group vectorizer.
- Updated the metadata format pocl uses to pass information
to the vectorization and TCE backends to simplify debugging.
- Kernel pointer arguments are no longer always marked 'noalias' (restricted).
Doing this previously was based on a misunderstanding of the specification.
- ConstantGEPs to static variables generated from automatic
locals caused problems. They are now converted to normal GEPs
using a pass from the SAFECode project.
OpenCL Platform Layer implementations (OpenCL 1.2 Chapter 4)
------------------------------------------------------------
- clGetDeviceInfo now uses the hwloc lib for device property
queries. Many new queries implemented.
- clGetKernelInfo (initial implementation)
- clGetMemObjectInfo (initial implementation)
- clGetCommandQueueInfo (initial implementation)
- clReleaseDevice
- clRetainDevice
- Proper freeing of devices in clReleaseContext
The OpenCL Runtime Implementations (OpenCL 1.2 Chapter 5)
---------------------------------------------------------
- clBuildProgram: support for passing options to the compiler.
- clEnqueueMarker
OpenCL C Builtin Function Implementations (OpenCL 1.2 Section 6.12)
-------------------------------------------------------------------
- Atomic Functions (6.12.11)
- get_global_offset() was not linked correctly
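For context on why a broken get_global_offset() matters: in OpenCL 1.1 and later the global ID of a work-item includes the global offset given at enqueue time. The relation can be sketched in plain C for one dimension; the function name ``global_id`` is a hypothetical helper for this illustration, not part of pocl or the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID of a work-item in one dimension:
   global offset + group ID * local size + local ID. */
size_t global_id(size_t global_offset, size_t group_id,
                 size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return global_offset + group_id * local_size + local_id;
}
```

For example, with a global offset of 100, a local size of 8, group ID 2 and local ID 3, the global ID is 100 + 2 * 8 + 3 = 119.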
Framework
---------
- Made it possible to override the .cl -> .bc build command
called by clBuildProgram per device.
Device Drivers
--------------
- pthread/basic:
* extract CPU clock frequency from /proc/cpuinfo, if available
* return cl_khr_fp64 if doubles supported by the CPU
- ttasim: support for explicitly calling custom/special operations
through the vendor extensions API
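As a rough idea of what extracting the clock frequency involves, the sketch below parses the first "cpu MHz" field from cpuinfo-style text. It is not the driver's actual code: the function name ``parse_cpu_mhz`` is hypothetical, and the "cpu MHz : value" layout is an assumption that holds for x86 Linux; other architectures may name or format the field differently, which is why the feature is "if available".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the first "cpu MHz" value found in cpuinfo-style text,
   or -1.0 if the field is absent. Assumes the x86 Linux layout
   "cpu MHz<tabs>: <number>". */
double parse_cpu_mhz(const char *text)
{
    const char *p;

    assert(text != NULL);

    p = strstr(text, "cpu MHz");
    if (p == NULL)
        return -1.0;          /* field not present */

    p = strchr(p, ':');
    if (p == NULL)
        return -1.0;          /* malformed line */

    /* strtod skips the whitespace that follows the colon. */
    return strtod(p + 1, NULL);
}
```

A caller would read the contents of /proc/cpuinfo into a NUL-terminated buffer and pass it to the function, falling back to a default when -1.0 is returned.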
Misc.
-----
- Fixes for MacOSX builds.
- Fixed passing NULL as a buffer argument to clSetKernelArg.
- Fixed a major bug when launching the same kernel multiple times:
the arguments were not copied to the command object.
- Fixed several issues with ICD; it is now considered stable enough
to be used by default.
- Added initial optimized kernel library for X86_64/SSE.
- Preliminary support for ARM architectures on Linux
(briefly tested on MeeGo/Nokia N9).
- Multithreading at the work group granularity using pthreads.
- Tries to figure out the optimal maximum number of
threads for the system based on the available hardware
threads. Currently works only on Linux using the
/proc/cpuinfo interface.
- Region-based customized memory allocator for speeding up buffer
allocations.
- Most of the tricky work group barrier cases (barriers inside
for-loops etc.) are now supported.
- Support for local variables, also automatic locals.
- Reuse previous compilation results, if available.
- Automatic vectorization of work groups (multiple work items
in parallel).
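The idea behind a region-based allocator can be sketched as follows: one large region is reserved up front and buffers are carved out of it, which avoids a system allocation per buffer. The version below is a deliberately simplified bump-pointer variant with a whole-region reset, not pocl's allocator; all names (``region_t``, ``region_init``, ``region_alloc``, ``region_reset``) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char *base;  /* start of the pre-allocated region */
    size_t size;          /* total bytes in the region */
    size_t used;          /* bytes handed out so far */
} region_t;

void region_init(region_t *r, void *backing, size_t size)
{
    assert(r != NULL && backing != NULL);
    r->base = backing;
    r->size = size;
    r->used = 0;
}

/* Carve `bytes` out of the region at an offset that is a multiple
   of `align` (a power of two). Offsets are aligned relative to the
   region start, so the backing memory should itself be suitably
   aligned. Returns NULL when the region is exhausted. */
void *region_alloc(region_t *r, size_t bytes, size_t align)
{
    size_t start;

    assert(r != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);

    start = (r->used + align - 1) & ~(align - 1);
    if (start > r->size || bytes > r->size - start)
        return NULL;

    r->used = start + bytes;
    return r->base + start;
}

/* Release every allocation at once. */
void region_reset(region_t *r)
{
    assert(r != NULL);
    r->used = 0;
}
```

A bump pointer cannot free individual buffers; an allocator that must do so would need to track chunks inside the region, which this sketch leaves out.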
Miscellaneous
-------------
- Installable Client Driver (icd) support.
- Event profiling support (incomplete, works only for kernel and
buffer read/write/map/unmap events).
Known issues
------------
- Non-pointer struct kernel arguments fail due to varying ABIs
* https://bugs.launchpad.net/pocl/+bug/987905
- Always produces "fully unrolled" chains of work items for
work groups, causing code size explosion for large WGs.