diff --git a/CHANGE_LOG.TXT b/CHANGE_LOG.TXT
index 1c7e6ab15a..8c0b9a89ad 100644
--- a/CHANGE_LOG.TXT
+++ b/CHANGE_LOG.TXT
@@ -1,5 +1,39 @@
 //-----------------------------------------------------------------------------
 
+1.3.0    03/03/2014
+    - New features:
+        - CUB's collective (block-wide, warp-wide) primitives underwent a minor
+          interface refactoring:
+            - To provide the appropriate support for multidimensional thread blocks,
+              The interfaces for collective classes are now template-parameterized
+              by X, Y, and Z block dimensions (with BLOCK_DIM_Y and BLOCK_DIM_Z being
+              optional, and BLOCK_DIM_X replacing BLOCK_THREADS). Furthermore, the
+              constructors that accept remapped linear thread-identifiers have been
+              removed: all primitives now assume a row-major thread-ranking for
+              multidimensional thread blocks.
+            - To allow the host program (compiled by the host-pass) to
+              accurately determine the device-specific storage requirements for
+              a given collective (compiled for each device-pass), the interfaces
+              for collective classes are now (optionally) template-parameterized
+              by the desired PTX compute capability. This is useful when
+              aliasing collective storage to shared memory that has been
+              allocated dynamically by the host at the kernel call site.
+            - Most CUB programs having typical 1D usage should not require any
+              changes to accomodate these updates.
+    - Bug fixes:
+        - Fixed bug in cub::WarpScan (which affected cub::BlockScan and
+          cub::DeviceScan) where incorrect results (e.g., NAN) would often be
+          returned when parameterized for floating-point types (fp32, fp64).
+        - Workaround-fix for ptxas error when compiling with with -G flag on Linux
+          (for debug instrumentation)
+        - Misc. workaround-fixes for certain scan scenarios (using custom
+          scan operators) where code compiled for SM1x is run on newer
+          GPUs of higher compute-capability: the compiler could not tell
+          which memory space was being used collective operations and was
+          mistakenly using global ops instead of shared ops.
+
+//-----------------------------------------------------------------------------
+
 1.2.3    03/03/2014
     - Bug fixes:
         - Fixed access violation bug in DeviceReduce::ReduceByKey for non-primitive value types
diff --git a/README.md b/README.md
index a1a60911e3..3eae36af7b 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
+05/12/2014
+[CUB v1.3.0](download_cub.html)
+|
+    - New features:
+        - CUB's collective (block-wide, warp-wide) primitives underwent a minor
+          interface refactoring:
+            - To provide the appropriate support for multidimensional thread blocks,
+              The interfaces for collective classes are now template-parameterized
+              by X, Y, and Z block dimensions (with \p BLOCK_DIM_Y and \p BLOCK_DIM_Z being
+              optional, and \p BLOCK_DIM_X replacing \p BLOCK_THREADS). Furthermore, the
+              constructors that accept remapped linear thread-identifiers have been
+              removed: all primitives now assume a row-major thread-ranking for
+              multidimensional thread blocks.
+            - To allow the host program (compiled by the host-pass) to
+              accurately determine the device-specific storage requirements for
+              a given collective (compiled for each device-pass), the interfaces
+              for collective classes are now (optionally) template-parameterized
+              by the desired PTX compute capability. This is useful when
+              aliasing collective storage to shared memory that has been
+              allocated dynamically by the host at the kernel call site.
+            - Most CUB programs having typical 1D usage should not require any
+              changes to accomodate these updates.
+    - Bug fixes:
+        - Fixed bug in cub::WarpScan (which affected cub::BlockScan and
+          cub::DeviceScan) where incorrect results (e.g., NAN) would often be
+          returned when parameterized for floating-point types (fp32, fp64).
+        - Workaround-fix for ptxas error when compiling with with -G flag on Linux
+          (for debug instrumentation)
+        - Misc. workaround-fixes for certain scan scenarios (using custom
+          scan operators) where code compiled for SM1x is run on newer
+          GPUs of higher compute-capability: the compiler could not tell
+          which memory space was being used collective operations and was
+          mistakenly using global ops instead of shared ops.
+    - See the [change-log](CHANGE_LOG.TXT) for further details
+|
 04/01/2014
-[CUB v1.2.3](download_cub.html)
+CUB v1.2.3
 |
     - Bug fixes:
         - Fixed access violation bug in DeviceReduce::ReduceByKey for non-primitive value types
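
Note on the 1.3.0 entry: the "row-major thread-ranking" that the collectives now assume for multidimensional thread blocks can be illustrated with a small host-side sketch. This is not CUB code. The names RowMajorRank and CheckRanks are hypothetical, and the sketch assumes the usual convention that x varies fastest, then y, then z.

```cpp
#include <cassert>

// Hypothetical helper (not part of CUB): linear rank of thread (x, y, z)
// in a block of extent dim_x * dim_y * dim_z under row-major ordering.
// Assumption: x varies fastest, then y, then z.
constexpr int RowMajorRank(int x, int y, int z, int dim_x, int dim_y)
{
    return (z * dim_y + y) * dim_x + x;
}

// Walks a small 4 x 2 x 2 block in row-major order and checks that the
// ranks come out as 0, 1, 2, ... with no gaps or repeats.
inline void CheckRanks()
{
    const int DIM_X = 4, DIM_Y = 2, DIM_Z = 2;
    int expected = 0;
    for (int z = 0; z < DIM_Z; ++z)
        for (int y = 0; y < DIM_Y; ++y)
            for (int x = 0; x < DIM_X; ++x)
                assert(RowMajorRank(x, y, z, DIM_X, DIM_Y) == expected++);
}

// With BLOCK_DIM_Y == BLOCK_DIM_Z == 1 the rank reduces to x, which is
// consistent with typical 1D usage needing no changes.
static_assert(RowMajorRank(7, 0, 0, 128, 1) == 7, "1D rank equals x");
```

Under this convention, thread (1, 1, 0) in a 4 x 2 x 2 block has rank 5.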
diff --git a/test/test_block_scan.cu b/test/test_block_scan.cu
index 31f4d13c31..16b7bc6b4f 100644
--- a/test/test_block_scan.cu
+++ b/test/test_block_scan.cu
@@ -642,7 +642,7 @@ void Test(
#if defined(_WIN32) || defined(_WIN64)
// Accommodate ptxas crash bug (access violation) on Windows
- static const bool special_skip = (TEST_ARCH <= 130) && (Equals |