forked from ANL-CESAR/RSBench
CHANGES
=====================================================================
NEW IN VERSION 5
=====================================================================
- Added a new temperature dependence feature that
uses Doppler broadening to translate pole data from 0K to any
arbitrary material temperature. This is accomplished by evaluating
the Faddeeva function (via an exponential multiplied by the standard
C error function).
This Doppler mode is enabled by default in Version 5 and can be
disabled via the "-d" flag on the command line.
The performance impact varies by compiler but is relatively small
(a ~10-15% slowdown).
- Changed the default runtime parameters from 250 windows per nuclide
down to 100 windows per nuclide (increasing the number of poles
per lookup from an average of 4 to an average of 10). This
directly impacts performance, so expect lookups/sec to change in
Version 5 as compared to Version 4. This change was made because
more nuclides have had multipole data generated for them since
RSBench was originally written, so the estimates for these variables
have been updated to more accurately reflect the full-scale code.
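
For reference, the Faddeeva function mentioned in the first item is
conventionally defined as a scaled complex complementary error
function. Whether RSBench evaluates exactly this form is an
assumption; the changelog only says that an exponential is multiplied
by the C error function.

```latex
w(z) = e^{-z^{2}} \, \operatorname{erfc}(-iz), \qquad z \in \mathbb{C}
```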
=====================================================================
NEW IN VERSION 4
=====================================================================
- Fixed a minor bug that caused incorrect operation when compiled with
GCC (Intel worked fine). This was due to a bug in OpenMP causing
private variables to not be correctly copied into the parallel
region. The result was that the RNG was improperly initialized,
generating the same value every time, which sped up the calculation
significantly. Intel's OMP implementation was correct, so results
from Intel runs should show unchanged performance.
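
The affected code is not shown in this file, so the following is only
a sketch of one way to avoid this class of problem: pass the seed into
the region with firstprivate and declare the RNG state inside the
loop body, so that each iteration gets its own initialized copy. The
names first_draws and lcg_next, and the LCG itself, are hypothetical
stand-ins and not RSBench's RNG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in RNG: one step of a 64-bit linear congruential
 * generator. Not the generator RSBench uses. */
static uint64_t lcg_next(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state;
}

/* Writes the first draw of n independent streams into out[0..n-1].
 * base_seed is copied into every thread via firstprivate, and the
 * per-stream state is declared inside the loop, so it is private by
 * construction and always initialized. Without OpenMP enabled the
 * pragma is ignored and the loop runs serially with the same result. */
void first_draws(uint64_t base_seed, uint64_t *out, int n)
{
    assert(out != NULL && n >= 0);
    #pragma omp parallel for firstprivate(base_seed)
    for (int i = 0; i < n; i++) {
        uint64_t state = base_seed + (uint64_t)i; /* distinct seed per stream */
        out[i] = lcg_next(&state);
    }
}
```

Because every stream starts from a different, explicitly initialized
state, the draws differ from one another, which is the behavior the
GCC build was missing before this fix.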
=====================================================================
NEW IN VERSION 3
=====================================================================
- [Bug Fix] Fixed an issue with the window pole assignment
function (generate_window_params). There was a counter that was
not being properly incremented, causing some windows to be
assigned the same poles, and causing an issue with a border case
receiving many more poles than others. This issue has been fixed
so that all windows should now receive about the same number of
poles, and all windows should be unique. This change affects
performance slightly, showing a 10-15% speedup in version 3
vs. version 2.
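
As an illustration of the kind of counter involved, here is a minimal
sketch of an even pole-to-window split. It is not the actual
generate_window_params code: the Window struct, its fields, and the
function assign_windows are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window descriptor: inclusive range of pole indices. */
typedef struct {
    int start;
    int end;
} Window;

/* Splits n_poles poles across n_windows windows as evenly as possible.
 * The running counter `next` must advance by each window's pole count;
 * if it is not incremented, windows end up sharing the same poles and
 * the leftover poles pile up in a border window. */
void assign_windows(Window *w, int n_windows, int n_poles)
{
    assert(w != NULL && n_windows > 0 && n_poles >= n_windows);
    int base = n_poles / n_windows;   /* poles every window receives */
    int extra = n_poles % n_windows;  /* first `extra` windows get one more */
    int next = 0;                     /* index of the next unassigned pole */
    for (int i = 0; i < n_windows; i++) {
        int count = base + (i < extra ? 1 : 0);
        w[i].start = next;
        w[i].end = next + count - 1;
        next += count;                /* the increment that keeps windows unique */
    }
    assert(next == n_poles);          /* every pole assigned exactly once */
}
```

With 10 poles and 3 windows this yields the ranges 0-3, 4-6, and 7-9,
so no two windows overlap and the counts differ by at most one.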
=====================================================================
NEW IN VERSION 2
=====================================================================
- [Optimization] Moved the sigTfactor dynamic array allocation out
of the inner program loop and back up to the top, saving millions of
allocs/frees. This appears to increase performance
significantly (~33%) compared to v1.
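
The pattern behind this change can be sketched as follows. The
functions lookup, run_alloc_inside, and run_alloc_hoisted are
hypothetical and only stand in for the benchmark's inner loop; the
scratch buffer plays the role of the sigTfactor array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-lookup kernel: fills the scratch buffer and
 * returns the sum of what it wrote. */
static double lookup(double *scratch, int n, int i)
{
    double sum = 0.0;
    for (int k = 0; k < n; k++) {
        scratch[k] = (double)(i + k);
        sum += scratch[k];
    }
    return sum;
}

/* v1-style: one malloc/free pair per lookup. */
double run_alloc_inside(int lookups, int n)
{
    assert(lookups >= 0 && n > 0);
    double total = 0.0;
    for (int i = 0; i < lookups; i++) {
        double *scratch = malloc((size_t)n * sizeof *scratch);
        if (scratch == NULL)
            abort();
        total += lookup(scratch, n, i);
        free(scratch);
    }
    return total;
}

/* v2-style: allocate once before the loop and reuse the buffer. */
double run_alloc_hoisted(int lookups, int n)
{
    assert(lookups >= 0 && n > 0);
    double *scratch = malloc((size_t)n * sizeof *scratch);
    if (scratch == NULL)
        abort();
    double total = 0.0;
    for (int i = 0; i < lookups; i++)
        total += lookup(scratch, n, i);
    free(scratch);
    return total;
}
```

Both variants compute the same result; the hoisted version simply
makes one allocation instead of one per lookup. How much time this
saves depends on the allocator and workload.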
=====================================================================