Allocate space for locals together with stack #382

Merged
chfast merged 3 commits into master from locals_optimization on Sep 23, 2020

Conversation

Collaborator

@chfast chfast commented Jun 9, 2020

This moves memory management of args and locals to OperandStack, where
space for all of them can be allocated in a single go.
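
A minimal sketch of the idea, not the code from this PR: arguments, locals and the operand stack share one allocation laid out as [args | locals | stack]. The names `FrameSketch`, `local()`, `m_end` and the `Value` alias are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the interpreter's value type.
using Value = std::uint64_t;

// Illustrative only: one buffer laid out as [args | locals | operand stack].
class FrameSketch
{
    std::unique_ptr<Value[]> m_storage;
    Value* m_locals = nullptr;  // Start of args; locals follow directly after.
    Value* m_bottom = nullptr;  // First operand stack slot.
    Value* m_top = nullptr;     // One past the current top item.
    Value* m_end = nullptr;     // One past the last stack slot.

public:
    FrameSketch(const std::vector<Value>& args, std::size_t num_local_variables,
        std::size_t max_stack_height)
    {
        const std::size_t num_args = args.size();
        const std::size_t storage_size = num_args + num_local_variables + max_stack_height;

        // The single allocation. The array form of make_unique value-initializes,
        // so locals start at zero.
        m_storage = std::make_unique<Value[]>(storage_size);
        m_locals = m_storage.get();
        for (std::size_t i = 0; i < num_args; ++i)
            m_locals[i] = args[i];

        m_bottom = m_locals + num_args + num_local_variables;
        m_top = m_bottom;
        m_end = m_locals + storage_size;
    }

    // Args and locals share one index space, args first.
    Value& local(std::size_t index) noexcept
    {
        assert(m_locals + index < m_bottom);
        return m_locals[index];
    }

    std::size_t size() const noexcept { return static_cast<std::size_t>(m_top - m_bottom); }

    void push(Value v) noexcept
    {
        assert(m_top < m_end);
        *m_top++ = v;
    }

    Value pop() noexcept
    {
        assert(m_top > m_bottom);
        return *--m_top;
    }
};
```

With this layout a call needs one allocation instead of separate ones for locals and for the stack, and `pop()` can never reach into the locals because it stops at `m_bottom`.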

codecov bot commented Jun 9, 2020

Codecov Report

Merging #382 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #382   +/-   ##
=======================================
  Coverage   98.73%   98.74%           
=======================================
  Files          58       58           
  Lines        8703     8765   +62     
=======================================
+ Hits         8593     8655   +62     
  Misses        110      110           

Member

axic commented Jul 16, 2020

This seems to be very similar to #358.

What is the story with this?

@axic axic mentioned this pull request Sep 3, 2020
@chfast chfast force-pushed the locals_optimization branch 2 times, most recently from cd4815e to c28b01f Compare September 10, 2020 15:15
Collaborator Author

chfast commented Sep 10, 2020

I resurrected this one because I will need it in the near future.
The benchmarks are not great, but reviews are welcome anyway.

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     +0.0298         +0.0302            75            77            75            77
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    +0.0305         +0.0307          1140          1175          1140          1175
fizzy/execute/ecpairing/onepoint_mean                             -0.0227         -0.0225        389763        380929        389707        380931
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0390         -0.0389            98            94            98            94
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0325         -0.0325          1442          1396          1442          1396
fizzy/execute/memset/256_bytes_mean                               +0.0142         +0.0143             6             6             6             6
fizzy/execute/memset/60000_bytes_mean                             +0.0272         +0.0272          1385          1422          1385          1422
fizzy/execute/mul256_opt0/input0_mean                             +0.0244         +0.0245            24            25            24            25
fizzy/execute/mul256_opt0/input1_mean                             +0.0228         +0.0227            24            25            24            25
fizzy/execute/ramanujan_pi/33_runs_mean                           -0.0770         -0.0770           125           116           125           116
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0116         -0.0117            85            84            85            84
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.0262         -0.0262          1197          1166          1197          1166
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.0352         -0.0352            88            85            88            85
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0365         -0.0365          1212          1168          1212          1168
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0312         -0.0312         41338         40047         41338         40046
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.2578         -0.2578             0             0             0             0
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.0742         +0.0742             4             5             4             5
fizzy/execute/micro/factorial/10_mean                             -0.4361         -0.4361             0             0             0             0
fizzy/execute/micro/factorial/20_mean                             -0.4605         -0.4605             1             0             1             0
fizzy/execute/micro/fibonacci/24_mean                             -0.3686         -0.3686          7465          4714          7465          4713
fizzy/execute/micro/host_adler32/1_mean                           -0.2608         -0.2608             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         -0.0155         -0.0155             3             3             3             3
fizzy/execute/micro/host_adler32/1000_mean                        -0.0074         -0.0074            29            29            29            29
fizzy/execute/micro/spinner/1_mean                                -0.4394         -0.4394             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0629         -0.0629             9             8             9             8
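
For reading the table: assuming it follows the usual Google Benchmark comparison layout, the first two numeric columns are relative changes in time and CPU time, and the remaining four are the old and new absolute values. A small sketch of that calculation, where `relative_change` is a hypothetical helper and not part of any tool:

```cpp
#include <cassert>
#include <cmath>

// Relative change between an old and a new measurement.
// Negative means the new build is faster.
inline double relative_change(double old_value, double new_value) noexcept
{
    assert(old_value != 0.0);
    return (new_value - old_value) / std::abs(old_value);
}
```

For the fibonacci/24 row, `relative_change(7465, 4714)` gives roughly -0.3685, which matches the reported -0.3686 up to the rounding of the displayed absolute values.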

@chfast chfast requested review from axic and gumb0 September 10, 2020 15:19
@chfast chfast marked this pull request as ready for review September 10, 2020 15:29
Collaborator Author

chfast commented Sep 10, 2020

LTO builds look a bit better:

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0107         -0.0107            86            85            86            85
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0100         -0.0100          1299          1286          1299          1286
fizzy/execute/ecpairing/onepoint_mean                             -0.0579         -0.0579        434523        409343        434527        409348
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0549         -0.0549           107           101           107           101
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0615         -0.0615          1553          1458          1553          1458
fizzy/execute/memset/256_bytes_mean                               -0.0340         -0.0340             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             -0.0308         -0.0308          1593          1544          1593          1544
fizzy/execute/mul256_opt0/input0_mean                             -0.0275         -0.0275            28            27            28            27
fizzy/execute/mul256_opt0/input1_mean                             -0.0265         -0.0265            28            27            28            27
fizzy/execute/ramanujan_pi/33_runs_mean                           -0.0119         -0.0119           133           131           133           131
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0417         -0.0417            94            90            94            90
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.0391         -0.0391          1314          1262          1314          1262
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.0435         -0.0435            96            92            96            92
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0385         -0.0385          1323          1272          1323          1272
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0593         -0.0593         42790         40253         42790         40253
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.3254         -0.3254             0             0             0             0
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.0100         +0.0100             5             5             5             5
fizzy/execute/micro/factorial/10_mean                             -0.4073         -0.4073             0             0             0             0
fizzy/execute/micro/factorial/20_mean                             -0.4338         -0.4338             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             -0.3372         -0.3372          7620          5050          7620          5050
fizzy/execute/micro/host_adler32/1_mean                           -0.2280         -0.2280             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         +0.0298         +0.0298             3             3             3             3
fizzy/execute/micro/host_adler32/1000_mean                        -0.0593         -0.0593            31            29            31            29
fizzy/execute/micro/spinner/1_mean                                -0.4066         -0.4066             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0275         -0.0275            11            10            11            10

Collaborator

@gumb0 gumb0 left a comment

Looks good overall. It would be nice to add some tests with non-empty locals and args.

lib/fizzy/stack.hpp (resolved)
Collaborator

gumb0 commented Sep 11, 2020

Does it make sense to try to increase small_storage_size because it now includes not only stack, so might go above more often?

Collaborator Author

chfast commented Sep 11, 2020

Does it make sense to try to increase small_storage_size because it now includes not only stack, so might go above more often?

I tried with 1K and 2K sizes. The performance is not much better, but it causes stack overflow in some infinite call tests.
I think it is fine to leave it as is because #529 replaces "small storage" with "external storage" anyway. The main point of this PR is to separate storage management for args, locals and stack so we can experiment with storage allocation more easily.

@axic axic mentioned this pull request Sep 11, 2020
@chfast chfast force-pushed the locals_optimization branch 2 times, most recently from ab6bea2 to e6917fa Compare September 15, 2020 11:01
Collaborator Author

chfast commented Sep 15, 2020

We may consider renaming OperandStack type as it now manages:

  • operand stack,
  • locals,
  • arguments,
  • "stack space" allocation.

In the future it will probably be converted to an RAII type and cooperate with a "thread execution context", e.g. to bump the call depth.

Proposals:

  • ExecutionContext
  • FrameContext
  • Frame
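
The RAII direction mentioned above could look roughly like the following. This is a sketch under assumptions: `ThreadContextSketch`, `FrameGuard` and `nested_depth` are hypothetical names, and the real design may differ.

```cpp
#include <cassert>

// Hypothetical per-thread execution state.
struct ThreadContextSketch
{
    int depth = 0;  // Current call depth.
};

// Bumps the call depth for as long as the frame object is alive.
class FrameGuard
{
    ThreadContextSketch& m_ctx;

public:
    explicit FrameGuard(ThreadContextSketch& ctx) noexcept : m_ctx{ctx} { ++m_ctx.depth; }
    ~FrameGuard() noexcept { --m_ctx.depth; }

    FrameGuard(const FrameGuard&) = delete;
    FrameGuard& operator=(const FrameGuard&) = delete;
};

// Simulates `levels` nested calls and returns the depth seen by the innermost one.
inline int nested_depth(ThreadContextSketch& ctx, int levels)
{
    FrameGuard guard{ctx};
    if (levels <= 1)
        return ctx.depth;
    return nested_depth(ctx, levels - 1);
}
```

Because the destructor restores the depth, every exit path from a call (normal return, trap or exception) leaves the context consistent without explicit bookkeeping.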

Collaborator Author

chfast commented Sep 15, 2020

Does it make sense to try to increase small_storage_size because it now includes not only stack, so might go above more often?

I tried with 1K and 2K sizes. The performance is not much better, but it causes stack overflow in some infinite call tests.
I think it is fine to leave it as is because #529 replaces "small storage" with "external storage" anyway. The main point of this PR is to separate storage management for args, locals and stack so we can experiment with storage allocation more easily.

I also tried the small storage of 256 bytes, but I see no performance difference. So leaving it at 128.
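
For context, the small storage trade-off discussed here can be sketched as follows. The class name `SmallOrHeapStorage` and its members are assumptions for illustration; only the 128-byte figure comes from the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the interpreter's value type.
using Value = std::uint64_t;

class SmallOrHeapStorage
{
public:
    // Assumption: 128 bytes of inline storage, expressed as a number of Value slots.
    static constexpr std::size_t small_storage_size = 128 / sizeof(Value);

private:
    Value m_small_storage[small_storage_size];  // Lives inside the object itself.
    std::unique_ptr<Value[]> m_large_storage;   // Used only when the inline part is too small.

public:
    explicit SmallOrHeapStorage(std::size_t storage_size_required)
    {
        if (storage_size_required > small_storage_size)
            m_large_storage = std::make_unique<Value[]>(storage_size_required);
    }

    bool on_heap() const noexcept { return m_large_storage != nullptr; }

    Value* data() noexcept { return on_heap() ? m_large_storage.get() : m_small_storage; }
};
```

The inline buffer is part of every such object, so enlarging it makes each interpreter frame on the native stack bigger. That would be consistent with the stack overflows seen with the 1K and 2K sizes in deeply recursive tests.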

lib/fizzy/stack.hpp (outdated, resolved)
Collaborator

@gumb0 gumb0 left a comment

Looks good. It would also be nice to test rbegin() / rend() for the case with locals.
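
What such a check could look like, as a sketch only: `MiniStack` below is a hypothetical miniature, and it assumes that rbegin() points at the bottom of the operand stack and rend() one past the top item, so the range never includes locals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the interpreter's value type.
using Value = std::uint64_t;

// Hypothetical miniature: locals and operand stack share one vector.
class MiniStack
{
    std::vector<Value> m_storage;  // [locals | stack slots]
    std::size_t m_num_locals;
    std::size_t m_height = 0;

public:
    MiniStack(std::size_t num_locals, std::size_t max_height)
      : m_storage(num_locals + max_height), m_num_locals{num_locals}
    {}

    void push(Value v) noexcept
    {
        assert(m_num_locals + m_height < m_storage.size());
        m_storage[m_num_locals + m_height++] = v;
    }

    Value& local(std::size_t i) noexcept
    {
        assert(i < m_num_locals);
        return m_storage[i];
    }

    // Assumed semantics: iterate from the bottom of the operand stack upwards.
    const Value* rbegin() const noexcept { return m_storage.data() + m_num_locals; }
    const Value* rend() const noexcept { return rbegin() + m_height; }
};
```

A test would fill the locals with recognizable values, push a few items, and assert that the range [rbegin(), rend()) contains exactly the pushed items in bottom-to-top order.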

TEST(operand_stack, large)
{
    constexpr auto max_height = 33;
    OperandStack stack({}, 0, max_height);
    ASSERT_GT(address_diff(&stack, stack.rbegin()), 100) << "not allocated on the heap";
Member

@axic axic Sep 15, 2020

Why is this GT while the others are LT? Ah, I see "heap" in the message.

Collaborator Author

The 100 is arbitrary.
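
The test helper is not shown in this conversation. A plausible sketch of it, as an assumption rather than the actual implementation, is the absolute distance in bytes between two addresses:

```cpp
#include <cassert>
#include <cstdint>

// Guess at the helper: absolute byte distance between two addresses.
inline std::uintptr_t address_diff(const void* a, const void* b) noexcept
{
    const auto ua = reinterpret_cast<std::uintptr_t>(a);
    const auto ub = reinterpret_cast<std::uintptr_t>(b);
    return ua > ub ? ua - ub : ub - ua;
}
```

Under that reading, a small distance means the storage sits inside the stack object itself, while a distance above some threshold (here the arbitrary 100) suggests the storage was allocated elsewhere, such as on the heap.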

lib/fizzy/stack.hpp (outdated, resolved)
Member

@axic axic left a comment

Can you add a test where it tries to pop the stack to access the locals?

lib/fizzy/stack.hpp (outdated, resolved)
lib/fizzy/stack.hpp (outdated, resolved)
@@ -61,37 +67,63 @@ class OperandStack
    /// in the constructor after the m_storage.
    Value* m_top;

    Value* m_bottom;

    Value* m_locals;
Member

Would the order of these three members cause measurable speed differences?

Collaborator Author

I have corrected this. The new version may be slightly faster, but it is hard to tell for sure as the results are very close.

if (max_stack_height > small_storage_size)
    m_large_storage = std::make_unique<Value[]>(max_stack_height);
m_top = bottom() - 1;
const auto num_args = args.size();
Member

Perhaps it would make sense to type this one explicitly as size_t, because then we can be sure storage_size_required is size_t as well? Though I guess size() returns that anyhow.

Collaborator Author

This is fine now, but probably should be changed to uint32_t at some point.
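
To illustrate the typing question: if `num_args` is a `size_t`, adding 32-bit counts to it yields a `size_t` on common platforms, so the sum is computed at full width. The function name and the parameter types below are assumptions for the sketch, not the signatures from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: total number of slots needed for args, locals and stack.
inline std::size_t storage_size_required_sketch(const std::vector<std::uint64_t>& args,
    std::uint32_t num_local_variables, std::uint32_t max_stack_height) noexcept
{
    // Typed explicitly, as suggested in the review comment.
    const std::size_t num_args = args.size();

    // The usual arithmetic conversions promote the 32-bit operands to size_t.
    static_assert(
        std::is_same_v<decltype(num_args + num_local_variables + max_stack_height), std::size_t>);

    return num_args + num_local_variables + max_stack_height;
}
```

On a 64-bit target this means two values near the `uint32_t` maximum can be added without wrapping, which would not hold if all three operands were `uint32_t`.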

lib/fizzy/stack.hpp (outdated, resolved)
lib/fizzy/stack.hpp (outdated, resolved)
Collaborator Author

chfast commented Sep 21, 2020

Can you add a test where it tries to pop the stack to access the locals?

Extended existing tests to have this check.

This moves memory management of args and locals to OperandStack, where
space for all of them can be allocated in a single go.
@chfast chfast merged commit 33b91b9 into master Sep 23, 2020
@chfast chfast deleted the locals_optimization branch September 23, 2020 06:58
@chfast chfast mentioned this pull request Sep 30, 2020
3 participants