Commits · 514edd14a4c4dbcd66578da819446046fd73f293 · TRACES-public / Vortex / POCL

Dec 12, 2019
- add a paragraph discussing convert_T to 1.5 CHANGES · 514edd14
  Martin Krastev authored 5 years ago
  
  514edd14
- change vector sizes enumeration for less entropy vs original code · 5a4cb22a
  Martin Krastev authored 5 years ago
  
  5a4cb22a
Dec 11, 2019
- refactor integral-vector conversions for better autovectorizing codegen · a3ec2fa8
  Martin Krastev authored 5 years ago
  
  a3ec2fa8
Dec 03, 2019
- LLVM 10 support · e2eaf368
  KOLANICH authored 5 years ago
  
  Fixes: #772, #773, #774
  e2eaf368
Nov 16, 2019

clGetPlatformIDs return value fix · 8d618fec

It should return CL_SUCCESS in case num_platforms == NULL && num_entries
== 0.  At least Glow checks for availability of OpenCL (in general)
using these parameters.

Specs say:
"If platforms is not NULL, the num_entries must be greater than zero."

8d618fec
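
The quoted rule can be illustrated with a small stand-in function. This is a sketch rather than POCL's implementation: `mock_get_platform_ids`, the `MOCK_*` names and the single-platform assumption are hypothetical, and only the check quoted above is modeled.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the OpenCL types and status codes. */
typedef int mock_status;
typedef void *mock_platform_id;
#define MOCK_SUCCESS 0
#define MOCK_INVALID_VALUE (-30)

static int the_only_platform; /* dummy object for the single assumed platform */

/* Models only the quoted rule: a non-NULL platforms array needs
   num_entries > 0. A call with platforms == NULL and num_entries == 0
   is treated as a pure availability probe and succeeds. */
mock_status
mock_get_platform_ids (unsigned num_entries, mock_platform_id *platforms,
                       unsigned *num_platforms)
{
  if (platforms != NULL && num_entries == 0)
    return MOCK_INVALID_VALUE;
  if (platforms != NULL)
    platforms[0] = &the_only_platform;
  if (num_platforms != NULL)
    *num_platforms = 1;
  return MOCK_SUCCESS;
}
```

Under these assumptions, a probe such as `mock_get_platform_ids (0, NULL, NULL)` succeeds, while passing a real array together with `num_entries == 0` is rejected.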

Oct 19, 2019

Merge branch 'pocl-profiling' of https://github.com/parmance/pocl · f1dee8d8
Pekka Jääskeläinen authored 5 years ago

f1dee8d8

Add a minimally intrusive and easy-to-use kernel execution time profiler · 0c3147ce

Pekka Jääskeläinen authored 5 years ago

Setting POCL_TRACING=cq collects kernel execution times by force-enabling
the command queue profiling feature, and dumps the collected stats
at exit (atexit()). The purpose of this feature is to enable
minimally intrusive profile collection; the profile data collector can
choose when it gathers the time stamp data from the events.
The impact on the observed execution profile is minimized by not writing
logs, copying objects or doing similar work while collecting the data during
execution.

It relies on the standard event timestamps, so devices can update them
as (and when) they see fit during execution.

The drawback is the accumulation of cl_object garbage, which should be taken
into account in the data collection interval; the collector should release the
events and the extra data objects they hold often enough to keep
memory consumption from becoming a problem.

The current version does not perform garbage collection, but assumes
that keeping the live OpenCL objects until exit is not a problem.
This is clearly the case for most OpenCL programs, which are rather
simple: they neither run for long nor launch many commands over their lifetime.

The default profile data collector counts only kernel commands at the moment.
Collecting stats of data transfers would be a useful addition.

0c3147ce
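
The deferred collection described above can be sketched as follows. This is a minimal illustration, not the POCL collector: `trace_event`, `trace_record` and `trace_collect_total_ns` are hypothetical names, and plain struct fields stand in for the timestamps a real collector would read from the events' profiling info.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACED_EVENTS 1024

/* Hypothetical stand-in for an event carrying device-written timestamps. */
typedef struct
{
  const char *kernel_name;
  uint64_t start_ns;
  uint64_t end_ns;
} trace_event;

static const trace_event *retained[MAX_TRACED_EVENTS];
static size_t num_retained;

/* Hot path: only remember the event. No logging, no copying. */
int
trace_record (const trace_event *ev)
{
  if (num_retained == MAX_TRACED_EVENTS)
    return -1; /* the retained garbage has hit the cap */
  retained[num_retained++] = ev;
  return 0;
}

/* Cold path (e.g. an atexit() handler): read the timestamps, then drop
   the retained events. */
uint64_t
trace_collect_total_ns (void)
{
  uint64_t total = 0;
  for (size_t i = 0; i < num_retained; ++i)
    total += retained[i]->end_ns - retained[i]->start_ns;
  num_retained = 0; /* release everything that was retained */
  return total;
}
```

The fixed-size array makes the garbage accumulation explicit: a collector that never drains it eventually has to refuse new events, which is the trade-off the message describes.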

Oct 18, 2019
- Misc. cleanups and documentation · 790df884
  Pekka Jääskeläinen authored 6 years ago
  
  Except for the atomic decrement of cl_context_count instead of a non-atomic one.
  790df884
- Basic might be built separately from pthread · 6ddb6742
  Pekka Jääskeläinen authored 5 years ago
  
  6ddb6742
- Add pytorch/Glow to the test suite · 1decde78
  Pekka Jääskeläinen authored 5 years ago
  
  It (still) requires a noasserts LLVM build, thus not ready to be a tier1 test just yet.
  1decde78
Oct 16, 2019
- Merge branch 'misc-hsa-updates' of https://github.com/parmance/pocl · a9dce317
  Pekka Jääskeläinen authored 5 years ago
  
  a9dce317
Oct 15, 2019
- LLVM 7.0 fix · e36e453b
  Pekka Jääskeläinen authored 5 years ago
  
  e36e453b
- Merge branch 'misc-fixes' of https://github.com/parmance/pocl · 5dc44646
  Pekka Jääskeläinen authored 5 years ago
  
  5dc44646
Oct 14, 2019
- Updated web site for release. · 379331f8
  Pekka Jääskeläinen authored 5 years ago
  
  379331f8
Oct 12, 2019

[hsa] Update hsa-native (x86-64) known-good test set · f8d7360f
Pekka Jääskeläinen authored 6 years ago

f8d7360f
[hsa] memory leak fixes · 6f544e72
Pekka Jääskeläinen authored 5 years ago

6f544e72

[hsa] Do not worry about "system memory" with base profile · 59b225e3

Pekka Jääskeläinen authored 5 years ago

Only the full profile needs to be concerned with other allocations from
system memory. In the base profile, each device has its own global space
from which the mem objects are allocated.

59b225e3

format-branch · 7b14314a
Pekka Jääskeläinen authored 5 years ago

7b14314a

Fix pocl.barrier calls being removed too early · 8fa40e98

Henry Linjamäki authored 5 years ago

The Workgroup pass replaced the pocl.barrier declaration with an empty
definition, which caused barrier calls to be removed and
unwanted/illegal code duplication to happen in the subsequent standard
LLVM optimizations.

8fa40e98

Fix a warning on test_ldexp.cl · fa28c8de
Henry Linjamäki authored 5 years ago

Fix a warning on test_ldexp.cl when cl_khr_fp64 is not available.

fa28c8de

Do not enable vectorization for SPMD devices · e854d641

Pekka Jääskeläinen authored 5 years ago

If such devices want intra-WI vectorization, they can
run it in their target passes. This can have a
dramatic impact on WG IR compilation time.

e854d641

Optional calls to dump LLVM IR pass execution timing info · e8973c55
Henry Linjamäki authored 5 years ago

Helps find compilation time bottlenecks.

e8973c55
Also reset num_buffers to zero · cac6cd33
Pekka Jääskeläinen authored 5 years ago

Avoids a segfault if the freeing is invoked multiple times for
one reason or another.

cac6cd33
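
A minimal sketch of why resetting the count matters, assuming a container that frees its entries in a loop. `buffer_set` and `buffer_set_free` are hypothetical names; the actual POCL structure is not shown in this log.

```c
#include <stdlib.h>

/* Hypothetical container of heap-allocated buffers. */
typedef struct
{
  void **buffers;
  size_t num_buffers;
} buffer_set;

/* Safe to call repeatedly: after the first call there is nothing left
   to walk. */
void
buffer_set_free (buffer_set *set)
{
  for (size_t i = 0; i < set->num_buffers; ++i)
    free (set->buffers[i]);
  free (set->buffers); /* free(NULL) is a no-op */
  set->buffers = NULL;
  set->num_buffers = 0; /* otherwise a second call would index a freed array */
}
```

If `num_buffers` kept its old value, a second call would dereference `set->buffers` after it had been released, which is the kind of crash the message refers to.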

Do not run scalarizer · 016d2e2e

Pekka Jääskeläinen authored 5 years ago

It is unclear whether this is still beneficial with the vectorizers in
the latest LLVM versions. The logic should be integrated into the
loop vectorizer, which should selectively scalarize vector datatypes
and leave them intact in case it cannot produce better vectorization
across the loop iterations.

016d2e2e

Avoid (re)optimization of printf · a9033fb4

Pekka Jääskeläinen authored 5 years ago

As printf is optimized during builtin library generation, re-optimizing
it just slows down the compilation of every kernel that calls printf.
In general, we are not interested in printf's performance, since it is
typically used in debugging mode or in non-performance-critical
parts.

a9033fb4

Remove invalid setPreservesAll()s. · 246acdeb
Pekka Jääskeläinen authored 5 years ago

They seem illegal since we modify the functions.

246acdeb

Do not call instcombine explicitly anymore · 7d5a46d5

Pekka Jääskeläinen authored 5 years ago

The extra calls no longer seem to be needed for good-quality results with
current LLVM versions; they just slow down the WG function IR generation.

7d5a46d5

Don't internalize globals starting with "__wrap_" · 3a56d466

Henry Linjamäki authored 5 years ago

A use case is call replacement via the GNU linker switch --wrap. The
functions starting with "__wrap_" may not be referenced until the final
link, and LLVM optimizations may delete them if they are internalized.

3a56d466
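
The name check implied by this change can be sketched as a simple prefix test. `may_internalize` is a hypothetical helper for illustration; how POCL actually wires the check into its internalization step is not shown here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical predicate: may a global with this name be given
   internal linkage? */
bool
may_internalize (const char *symbol_name)
{
  static const char prefix[] = "__wrap_";
  /* Symbols reserved for ld --wrap are only referenced at final link,
     so they must stay externally visible. */
  return strncmp (symbol_name, prefix, sizeof prefix - 1) != 0;
}
```

For context, with `--wrap=malloc` GNU ld resolves undefined references to `malloc` to `__wrap_malloc`, and references to `__real_malloc` to the original `malloc`, so `__wrap_malloc` may have no callers in the IR before that link step.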

Fix an assertion triggered when replacing __cl_printf · 78ea22a8

Henry Linjamäki authored 5 years ago

Fix an LLVM assertion that was triggered when replacing calls to __cl_printf
with calls to __pocl_printf, due to a return value type mismatch.  LLVM changed
the return type of __cl_printf to void when no one was using the value,
which led to the issue.

78ea22a8

printf: fix arguments, have meaningful return value · cd33ff4a

Henry Linjamäki authored 5 years ago

- __cl_printf: Put valid arguments into the __pocl_printf_format_full()
  call so LLVM's interprocedural optimizations do not wreak havoc,
  e.g. turning the call into a trap call because the format string
  argument was NULL (as a placeholder).
- Actually return a possible error value instead of always returning
  zero in the __cl_printf and __pocl_printf functions.

cd33ff4a

matrix1: fixes to the test case · 1a85af95

Pekka Jääskeläinen authored 6 years ago

Fix the case where the max local size is larger than the global size. Also fix
a div by zero due to an illegal assertion. The div by zero got
triggered if the local wg is larger than the matrix size. It just
gets silenced by the FPE handler, which is installed in case any
of the CPU devices is built in.

1a85af95
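
One way to guard against this class of problem is to clamp the local size before using it as a divisor. The sketch below is an assumption about the general technique, not the actual change to the matrix1 test; `pick_local_size` and `num_work_groups` are hypothetical helpers, and the requirement that the local size divide the global size evenly is not modeled.

```c
#include <stddef.h>

/* Hypothetical helper: pick a local size that never exceeds the
   global size and is never zero. */
size_t
pick_local_size (size_t global_size, size_t max_local_size)
{
  size_t local = max_local_size < global_size ? max_local_size : global_size;
  return local == 0 ? 1 : local;
}

/* Work-group count; the divisor is at least 1, so this cannot divide
   by zero. */
size_t
num_work_groups (size_t global_size, size_t max_local_size)
{
  return global_size / pick_local_size (global_size, max_local_size);
}
```

With a global size of 4 and a device maximum of 64, the local size is clamped to 4 and a single work-group is launched, instead of an integer division producing zero groups.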

Sep 25, 2019
- Merge branch 'release_1_4' · 8c207564
  Michal Babej authored 5 years ago
  
  8c207564
- Update documentation · 21394986
  Michal Babej authored 5 years ago
  
  21394986
Sep 24, 2019
- Merge branch 'release_1_4' · d893992b
  Michal Babej authored 5 years ago
  
  d893992b
- Fixes to global memory size detection · e8af9811
  Michal Babej authored 5 years ago
  
  * fix getrlimit() use without CMake detection
  * fix rlimit_data applied only to max_mem_alloc_limit, instead of global_mem_size
  * fix computation in size_t, use cl_ulong instead, even on 32bit systems
  e8af9811
- Remove unused/broken configurations from Travis CI · 1bcbdba8
  Michal Babej authored 5 years ago
  
  LLVM 4 is not supported anymore, and the Clang build on Mac OS X seems broken because of an unknown compiler flag.
  1bcbdba8
- perform compile test to select -march or -mcpu for clang · ffa98c2c
  Andreas Beckmann authored 5 years ago
  
  ffa98c2c
- add custom_try_compile_clang_silent macro · f97288db
  Andreas Beckmann authored 5 years ago
  
  f97288db
- add printf tests for parameter passing · f821ea47
  Andreas Beckmann authored 5 years ago
  
  f821ea47
- enable --exclude-libs on all UNIX except Mac OS X · 1f552af2
  Andreas Beckmann authored 5 years ago
  
  1f552af2