Code Structure

oyente.py

This is the main entry point to the program. Oyente is able to analyze smart contracts via the following inputs

solidity program
evm bytecode
remote contracts

Other configuration options include getting the input state, setting timeouts for z3, etc. (Check python oyente.py --help or global_params.py for the full list of configuration options available). These options are collated and set in the global_params module which will be used during the rest of the execution.

The contracts are then disassembled into opcodes using the evm disasm command.

After this, the symexec module is called with the disassembled file which carries out the analyses of the contracts for various vulnerabilities (TOD, timestamp-dependence, mishandled exceptions).

symExec.py

The analysis starts off with the build_cfg_and_analyze function. We break up the disasm file created by oyente.py into tokens using the native tokenize python module.

The collect_vertices and construct_bb functions identify the basic blocks in the program and we store them as vertices. Basic blocks are identified by using opcodes like JUMPDEST, STOP, RETURN, SUICIDE, JUMP and JUMPI as separators. Each basic block is backed by an instance of BasicBlock class defined in basicblock.py

After the basic blocks are created, we start to symbolically execute each basic block with the full_sym_exec function. We get the instructions stored in each basic block and execute each of them symbolically via the sym_exec_ins function. In this function, we model each opcode as closely as possible to the behaviour described in the ethereum yellow paper. Some interesting details regarding each class of opcodes is discussed below.

Model

The stack is modelled using a simple python list. The memory is modelled as a growing list. The maximum index of used by the memory list is stored as current_miu_i variable. The storage is stored as a python object as key-value pairs.

0s: Stop and Arithmetic Operations, 10s: Comparison & Bitwise Logic Operations

These group of opcodes is the most straightforward to implement. If one of the operands is symbolic, both of them are converted into a 256-bit symbolic variable. The arithmetic operation is carried out (symbolically, if the operands are symbolic) and the result is pushed on to the stack.

20s: SHA3

A generic symbolic variable is created to mimic the behaviour of the SHA3 opcode

30s: Environmental Information, 40s: Block Information

For most of these opcodes, a unique symbolic variable is generated to represent it (similar to SHA3). In some cases, to speed up the symbolic execution, concrete values for these opcodes are taken from the state.json file. This behaviour is enabled via the --state flag. We haven't found ways to robustly simulate CODECOPY and EXTCODESIZE symbolically yet.

40s: 50s: Stack, Memory, Storage and Flow Operations

New edges which are found during analysing the JUMP and JUMPI instructions are added to the call graph on the fly.

f0s: System operations

To handle the CALL and CALLCODE opcodes, we construct symbolic expressions to ensure there are enough funds in the sender's account and the sender's address is different from the receiver's address. If these conditions hold true, we update the corresponding global state.

After this, add this basic block to the list of already visited blocks and follow it to the next basic block. We also maintain the necessary path conditions required to get to the block in the path_conditions_and_vars variable. In case of instructions like JUMP, there is only one basic block to follow the program execution to. In other cases like JUMPI, we first check if the branch expression is provably True or False using z3. If not, we explore both the branches by adding the branch expression and the negated branch expression to the path_conditions_and_vars variable.

Callstack attack

Checking for the callstack attack is done by the check_callstack_attack function. If a CALL or a CALLCODE instruction is found without the SWAP4, POP, POP, POP, POP, ISZERO (or SWAP3 followed by 3 POP, etc.) following it, we flag it as being vulnerable to the callstack attack. This opcode sequence is the one generated by solc corresponding to the following recommended code pattern to prevent against the attack.

if (owner.send(amount)) {..}

Timestamp dependence attack

We find out if the path_conditions variable contains the symbolic variable corresponding to the block timestamp. If so, the program can be concluded to take a path in the program which makes use of the block timestamp, making it vulnerable to the Timestamp dependence attack.

Reentrancy bug

This presence of this bug is analysed in the check_reentrancy_bug function in analysis.py. At each CALL that is encountered, we obtain the path condition for the execution before the CALL is executed. We then check if such condition with updated variables (e.g., storage values) still holds (i.e., if the call can be executed again). If so, we consider this a vulnerability, since it is possible for the callee to re-execute the call before finishing it. We also consider the case that users use sender and transfer instead of call function. It is safe to use sender and transfer because of the limited gas as part of send and transfer. To check whethera contract is safe or not based on the gas as part of these functions, we set a threshold to 2300 which is the amount of gas that sender and transfer provide. And then comparing the gas sent along with these functions with the threshold. If the gas is greater than the threshold, we flag the contract as being vulnerable to the reentrancy attack. Otherwise, we flag it as being safe.

Concurrency bug

We track the sender, recepient and the value transferred at each CALL and SUICIDE instruction in the update_analysis function. If these values are different for different flows, we report the bug in the detect_money_concurrency function.

Assertion fails

This feature is active only if the option -a is used.

The feature verifies Solidity assertions, which tries to report assert fails if INVALID instruction is reachable in the program. Because INVALID can be caused from different cases other than assert, there would be some cases that lead to false positives due to the ambiguity between an INVALID generated by assert and other types of INVALID. Currently, we consider all INVALID instructions to be derived from assert except thoses that follow a sequences of JUMPDEST, CALLVALUE, ISZERO, PUSH, JUMPI instructions. To find the function that contains the assertion fails, we record the path that leads to the INVALID instruction. By using this path, we can trace back and find the top-level function that causes the failure in the check_assertions function in symExec.py.

vargenerator.py

This is a utility class to provide unqiue symbolic variables required for analysis

source_map.py

This is a utility class to map problematic opcodes into the source code

Tests

Testing opcodes in Oyente in order to check if opcodes are implemented correctly based on the final state of the storage and the memory. The tests are based on the VM tests of Ethereum.

The flow of testing:

Load test data (using the existing EVM tests in here)
Run oyente with the input was specified in the test data
Compare the results (storage, memory and gas) after running oyente with the results being specified in the test data
Report bugs

run_tests.py

This is the main entry point to the testing program. The program loads a specific test data file in folder test_evm/test_data/ and start running oyente.py with the input being specified in the loaded test data to get an exit code which is returned from oyente.py. From this exit code the testing program can report the bug

evm_unit_test.py

A utility class to extract concerned sections and fields (code, storage, out, gas and gas in exec section) in the test data, run the tests, compare the results and return an exit code

symExec.py

compare_storage_and_gas_unit_test(global_state, analysis) starts comparing the results and return an exit code after the final opcode is implemented

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

code.md

code.md

Code Structure

oyente.py

symExec.py

Model

0s: Stop and Arithmetic Operations, 10s: Comparison & Bitwise Logic Operations

20s: SHA3

30s: Environmental Information, 40s: Block Information

40s: 50s: Stack, Memory, Storage and Flow Operations

f0s: System operations

vargenerator.py

source_map.py

Tests

run_tests.py

evm_unit_test.py

symExec.py

Files

code.md

Latest commit

History

code.md

File metadata and controls

Code Structure

oyente.py

symExec.py

Model

0s: Stop and Arithmetic Operations, 10s: Comparison & Bitwise Logic Operations

20s: SHA3

30s: Environmental Information, 40s: Block Information

40s: 50s: Stack, Memory, Storage and Flow Operations

f0s: System operations

vargenerator.py

source_map.py

Tests

run_tests.py

evm_unit_test.py

symExec.py