Skip to content
Snippets Groups Projects
CHANGES 4.18 KiB
Newer Older
0.8  unreleased
===============

[changes updated from bzr log until 2013-01-01]

This lists only the most interesting changes. Please
refer to the version control log for a full listing.
- Multi-WI work group functions can be now generated
  using loops which are only partially unrolled. Reduces
  code size explosion with large WGs in comparison to
  the full replication method.
- PowerPC 64 support (tested on Cell/Debian Sid/PS3).
- PowerPC 32 support (tested on Cell/Debian Sid/PS3).
- ARM v7 support (on Linux)
- Beginning of Cell SPU support (very experimental!).
- Most of the AMD APP SDK OpenCL examples now work and have been
  added to the pocl test suite. 
- Most of the Parboil benchmark cases added to the test
  suite.

Kernel Compiler Passes
----------------------
- Several miscompilations and compiler crashes fixed.
- Multiple bugs fixed from the work group vectorizer.
- Updated metadata format pocl uses to pass information
  to vectorization and TCE backend to simplify debuging.
- Kernel pointer arguments are not always marked 'noalias' (restricted).
  Doing this previously was a specs misunderstanding.
- ConstantGEPs to static variables generated from automated
  locals caused problems. Now converting them to normal GEPs
  using a pass from the SAFECode project.
OpenCL Platform Layer implementations (OpenCL 1.2 Chapter 4)
-------------------------------------------------------
- clGetDeviceInfo now uses the hwloc lib for device property
  queries. Many new queries implemented.
- clGetKernelInfo (initial implementation)
- clGetMemObjectInfo (initial implementation)
- clGetCommandQueueInfo (initial implementation)
- clReleaseDevice
- clRetainDevice
- Proper freeing of devices in clReleaseContext

The OpenCL Runtime Implementations (OpenCL 1.2 Chapter 5)
---------------------------------------------------------
- clBuildProgram: support for passing options to the compiler.
OpenCL C Builtin Function Implementations (OpenCL 1.2 Section 6.12)
-------------------------------------------------------------------
- Atomic Functions (6.12.11)
- get_global_offset() was not linked correctly

Framework
---------

- Made it possible to override the .cl -> .bc build command
  called by clBuildProgram per device. 

Device Drivers
--------------

- pthread/basic: 
  * extract CPU clock frequency from /proc/cpuinfo, if available
  * return cl_khr_fp64 if doubles supported by the CPU
- ttasim: support for explicitly calling custom/special operations 
  through the vendor extensions API

Misc.
-----

- Fixes for MacOSX builds.
- Fixed passing NULL as a buffer argument to clSetKernelArguments
- Fixed a major bug when launching the same kernel multiple times:
  the arguments very not copied to the command object.
- Fixed several issues with ICD, it is now considered stable to be
  used by default.
0.6   August 2012
=================
Kernel library 
--------------

- Added initial optimized kernel library for X86_64/SSE.
- Preliminary support for ARM architectures on Linux
  (briefly tested on MeeGo/Nokia N9).

Pthread device driver
---------------------

- Multithreading at the work group granularity using pthreads.
- Tries to figure out the optimal maximum number of
  threads for the system based on the available hardware
  threads. Currently works only in Linux using the 
  /proc/cpuinfo interface.
- Region-based customized memory allocator for speeding up buffer 
  allocations.
Kernel compiler
---------------
- Most of the tricky work group barrier cases (barriers inside 
  for-loops etc) now supported.
- Support for local variables, also automatic locals.
- Reuse previous compilation results, if available.
- Automatic vectorization of work groups (multiple work items
  in parallel).
  
Miscellaneous  
-------------
- Installable Client Driver (icd) support.
- Event profiling support (incomplete, works only for kernel and
  buffer read/write/map/unmap events).

Known issues
------------

- Non-pointer struct kernel arguments fail due to varying ABIs
  * https://bugs.launchpad.net/pocl/+bug/987905
- Produces always "fully unrolled" chains of work items for
  work groups causing code size explosion for large WGs.