- Dec 12, 2019
-
-
Martin Krastev authored
-
Martin Krastev authored
-
- Dec 11, 2019
-
-
Martin Krastev authored
-
- Dec 03, 2019
-
-
KOLANICH authored
Fixes: #772, #773, #774
-
- Nov 16, 2019
-
-
Pekka Jääskeläinen authored
It should return CL_SUCCESS when platforms == NULL && num_entries == 0. At least Glow checks for the availability of OpenCL (in general) using these parameters. The spec says: "If platforms is not NULL, the num_entries must be greater than zero."
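The argument check described above can be sketched as follows. This is a minimal illustration, not pocl's actual code: the `sketch_*` names and the single static platform are hypothetical, and the stand-in types only mirror the OpenCL ones so the block builds without OpenCL headers. It assumes the two error conditions the OpenCL specification lists for clGetPlatformIDs (num_entries is zero while platforms is not NULL, or both output pointers are NULL).

```c
#include <stddef.h>

/* Stand-ins for the OpenCL types and error codes so the sketch builds
   without OpenCL headers; the numeric values match CL/cl.h. */
typedef int sketch_int;
typedef unsigned int sketch_uint;
typedef void *sketch_platform_id;
#define SKETCH_SUCCESS 0
#define SKETCH_INVALID_VALUE (-30)

/* Hypothetical single-platform implementation, not pocl's actual code. */
static int the_platform;

sketch_int sketch_get_platform_ids(sketch_uint num_entries,
                                   sketch_platform_id *platforms,
                                   sketch_uint *num_platforms)
{
  /* num_entries == 0 is an error only when platforms is not NULL. */
  if (platforms != NULL && num_entries == 0)
    return SKETCH_INVALID_VALUE;
  /* At least one of the two output pointers must be given. */
  if (platforms == NULL && num_platforms == NULL)
    return SKETCH_INVALID_VALUE;

  if (platforms != NULL)
    platforms[0] = &the_platform;
  if (num_platforms != NULL)
    *num_platforms = 1;
  return SKETCH_SUCCESS;
}
```

With this ordering, a count-only query (no output array, num_entries of zero, a valid count pointer) succeeds, which is the kind of availability probe the message mentions.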
-
- Oct 19, 2019
-
-
-
Pekka Jääskeläinen authored
Setting POCL_TRACING=cq collects kernel execution times by force-enabling the command queue profiling feature, and dumps the collected stats at exit via atexit(). The purpose of this feature is to enable minimally intrusive profile collection; the profile data collector can choose when it gathers the timestamp data from the events. The impact on the observed execution profile is minimized by avoiding writing logs, copying objects and the like while collecting the data during execution. It relies on the standard event timestamps, so devices can update them as (and when) they see fit during the execution. The drawback is the accumulation of cl_object garbage, which should be taken into account in the data collection interval; the collector should release the events and the extra data objects they hold often enough to keep memory consumption from becoming a problem. The current version does not perform garbage collection, but assumes that keeping the live OpenCL objects until exit is not a problem. This is the case for most OpenCL programs, which are rather simple: they neither run for long nor launch many commands over their lifetime. The default profile data collector counts only kernel commands at the moment. Collecting stats of data transfers would be a useful addition.
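The deferred collection idea can be modelled roughly as below. This is only a sketch under stated assumptions: `sketch_event`, the fixed-size table and all function names are hypothetical and do not reflect pocl's internal data structures. The point is that the hot path stores a pointer and nothing else, while the timestamps are read later at a moment the collector chooses (here also at exit).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event that carries start/end timestamps
   (in nanoseconds), filled in by the device whenever it sees fit. */
typedef struct {
  const char *kernel_name;
  uint64_t start_ns;
  uint64_t end_ns;
} sketch_event;

#define SKETCH_MAX_EVENTS 1024
static sketch_event *retained[SKETCH_MAX_EVENTS];
static size_t retained_count;

/* Called when a command is enqueued: only stores a pointer, with no
   logging or copying, to keep the hot path cheap. Returns 0 on success. */
int sketch_retain_event(sketch_event *ev)
{
  if (retained_count == SKETCH_MAX_EVENTS)
    return -1;
  retained[retained_count++] = ev;
  return 0;
}

/* Sums the durations of all retained events and drops the references. */
uint64_t sketch_collect_total_ns(void)
{
  uint64_t total = 0;
  for (size_t i = 0; i < retained_count; ++i)
    total += retained[i]->end_ns - retained[i]->start_ns;
  retained_count = 0;
  return total;
}

/* Exit-time dump of whatever has not been collected yet. */
static void sketch_dump_at_exit(void)
{
  fprintf(stderr, "total kernel time: %llu ns\n",
          (unsigned long long)sketch_collect_total_ns());
}

/* Registers the exit-time dump; returns 0 on success like atexit(). */
int sketch_tracing_init(void)
{
  return atexit(sketch_dump_at_exit);
}
```

Calling `sketch_collect_total_ns()` periodically corresponds to the collector releasing its references often enough; never calling it corresponds to keeping everything alive until exit.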
-
- Oct 18, 2019
-
-
Pekka Jääskeläinen authored
Except for the atomic decrement of cl_context_count instead of non-atomic.
-
Pekka Jääskeläinen authored
-
Pekka Jääskeläinen authored
It (still) requires a noasserts LLVM build, so it is not ready to be a tier1 test just yet.
-
- Oct 16, 2019
-
-
- Oct 15, 2019
-
-
Pekka Jääskeläinen authored
-
-
- Oct 14, 2019
-
-
Pekka Jääskeläinen authored
-
- Oct 12, 2019
-
-
Pekka Jääskeläinen authored
-
Pekka Jääskeläinen authored
-
Pekka Jääskeläinen authored
Only the full profile needs to be concerned with other allocations from the system memory. In the base profile, each device has its own global space from which the mem objects are allocated.
-
Pekka Jääskeläinen authored
-
Henry Linjamäki authored
The Workgroup pass replaced the pocl.barrier declaration with an empty definition, which then caused barrier calls to be removed and unwanted/illegal code duplication to happen in the subsequent standard LLVM optimizations.
-
Henry Linjamäki authored
Fix a warning on test_ldexp.cl when cl_khr_fp64 is not available.
-
Pekka Jääskeläinen authored
If they desire intra-WI vectorization, they can launch it in their target passes. This can have a dramatic impact on WG IR compilation time.
-
Henry Linjamäki authored
Helps find the compilation time bottlenecks.
-
Pekka Jääskeläinen authored
Avoids a segfault if the freeing is invoked multiple times for one reason or another.
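One common way to make a free routine safe against repeated calls is to clear the pointer after releasing it. The sketch below illustrates that general technique; the `sketch_resource` type and its fields are hypothetical and this is not the change the commit itself made.

```c
#include <stdlib.h>

/* Hypothetical resource holder; the field names are illustrative. */
typedef struct {
  void *buffer;
  size_t size;
} sketch_resource;

/* Allocates the buffer; returns 0 on success and -1 on failure. */
int sketch_resource_alloc(sketch_resource *r, size_t size)
{
  r->buffer = malloc(size);
  r->size = r->buffer ? size : 0;
  return r->buffer ? 0 : -1;
}

/* Safe to call repeatedly: free(NULL) is a no-op, and the pointer is
   cleared after the first call. */
void sketch_resource_free(sketch_resource *r)
{
  free(r->buffer);
  r->buffer = NULL;
  r->size = 0;
}
```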
-
Pekka Jääskeläinen authored
It is unclear whether this is still beneficial with the vectorizers in the latest LLVM versions. The logic should be integrated into the loop vectorizer, which should selectively scalarize vector datatypes and leave them intact in case it cannot produce better vectorization across the loop iterations.
-
Pekka Jääskeläinen authored
As printf is optimized during builtin library generation, it just slows down the compilation of each kernel that calls printf. In general, we are not interested in printf's performance, since it is typically used in debugging mode or in non-performance-critical parts.
-
Pekka Jääskeläinen authored
They seem illegal since we modify the functions.
-
Pekka Jääskeläinen authored
The extra calls no longer seem to be needed for good quality results with current LLVM versions; they just slow down the WG function IR generation.
-
Henry Linjamäki authored
A use case is call replacement via the GNU linker switch --wrap. The functions starting with "__wrap_" may not be referenced until the final link, and LLVM optimizations may delete them if they are internalized.
-
Henry Linjamäki authored
Fix an LLVM assertion that was triggered when replacing calls to __cl_printf with __pocl_printf, due to a return value type mismatch. LLVM changed the return type of __cl_printf to void when no one was using the value, which led to the issue.
-
Henry Linjamäki authored
  - __cl_printf: Put valid arguments into the __pocl_printf_format_full() call so LLVM's interprocedural optimizations do not wreak havoc, e.g. turning the call into a trap call because the format string argument was NULL (as a placeholder).
  - Actually return a possible error value instead of always returning zero in the __cl_printf and __pocl_printf functions.
-
Pekka Jääskeläinen authored
Fix the case when the max local size is larger than the global size. Also fix a division by zero due to an illegal assertion. The division by zero got triggered if the local WG is larger than the matrix size. It just gets silenced by the FPE handler, which is installed in case any of the CPU devices is built in.
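One way to keep the local size within the global size, and the work-group count division well defined, is sketched below. The helper names are hypothetical and the divisor search is just one possible policy; it is not taken from the fixed test program.

```c
#include <stddef.h>

/* Returns the largest local size that is <= max_local, <= global and
   divides global evenly. Returns 0 only if global or max_local is 0. */
size_t sketch_pick_local_size(size_t global, size_t max_local)
{
  if (global == 0 || max_local == 0)
    return 0;
  size_t local = max_local < global ? max_local : global;
  /* Terminates at the latest when local reaches 1. */
  while (global % local != 0)
    --local;
  return local;
}

/* Number of work-groups; guards the division explicitly. */
size_t sketch_num_groups(size_t global, size_t local)
{
  return local == 0 ? 0 : global / local;
}
```

For a global size of 8 and a device maximum of 64, the local size is clamped to 8, giving one work-group instead of an integer division that yields zero groups.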
-
- Sep 25, 2019
-
-
Michal Babej authored
-
Michal Babej authored
-
- Sep 24, 2019
-
-
Michal Babej authored
-
Michal Babej authored
* fix getrlimit() use without CMake detection
* fix rlimit_data applied only to max_mem_alloc_limit, instead of global_mem_size
* fix computation in size_t, use cl_ulong instead, even on 32-bit systems
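The size computation can be sketched as below, assuming a POSIX system where getrlimit() and RLIMIT_DATA are available. The function names are hypothetical, and `uint64_t` stands in for cl_ulong (a 64-bit unsigned type) so the block builds without OpenCL headers; this is not pocl's actual code.

```c
#include <stdint.h>
#include <sys/resource.h>

/* Clamps a memory size to a soft resource limit. All arithmetic is done
   in a 64-bit type, so nothing is truncated on systems where size_t is
   only 32 bits wide. */
uint64_t sketch_clamp_to_limit(uint64_t mem_bytes, uint64_t soft_limit,
                               int limit_is_unlimited)
{
  if (limit_is_unlimited || soft_limit >= mem_bytes)
    return mem_bytes;
  return soft_limit;
}

/* Applies RLIMIT_DATA to a candidate global memory size. If the limit
   cannot be read, the size is returned unchanged. */
uint64_t sketch_apply_rlimit_data(uint64_t global_mem_bytes)
{
  struct rlimit lim;
  if (getrlimit(RLIMIT_DATA, &lim) != 0)
    return global_mem_bytes;
  return sketch_clamp_to_limit(global_mem_bytes, (uint64_t)lim.rlim_cur,
                               lim.rlim_cur == RLIM_INFINITY);
}
```

Keeping the clamp separate from the getrlimit() call makes the arithmetic testable without depending on the limits of the machine that runs it.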
-
Michal Babej authored
LLVM 4 is not supported anymore, and the Clang build on Mac OS X seems broken because of an unknown compiler flag.
-
Andreas Beckmann authored
-
Andreas Beckmann authored
-
Andreas Beckmann authored
-
Andreas Beckmann authored
-