This is the main entry point to the program. Oyente is able to analyze smart contracts via the following inputs
- solidity program
- evm bytecode
- remote contracts
Other configuration options include getting the input state, setting timeouts for z3, etc. (Check python oyente.py --help
or global_params.py
for the full list of configuration options available).
These options are collated and set in the global_params module which will be used during the rest of the execution.
The contracts are then disassembled into opcodes using the evm disasm
command.
After this, the symexec module is called with the disassembled file which carries out the analyses of the contracts for various vulnerabilities (TOD, timestamp-dependence, mishandled exceptions).
The analysis starts off with the build_cfg_and_analyze
function. We break up the disasm file created by oyente.py into tokens using the native tokenize python module.
The collect_vertices and construct_bb functions identify the basic blocks in the program and we store them as vertices. Basic blocks are identified by using opcodes like JUMPDEST
, STOP
, RETURN
, SUICIDE
, JUMP
and JUMPI
as separators. Each basic block is backed by an instance of BasicBlock class defined in basicblock.py
After the basic blocks are created, we start to symbolically execute each basic block with the full_sym_exec function. We get the instructions stored in each basic block and execute each of them symbolically via the sym_exec_ins function. In this function, we model each opcode as closely as possible to the behaviour described in the ethereum yellow paper. Some interesting details regarding each class of opcodes is discussed below.
The stack is modelled using a simple python list.
The memory is modelled as a growing list. The maximum index of used by the memory list is stored as current_miu_i
variable.
The storage is stored as a python object as key-value pairs.
These group of opcodes is the most straightforward to implement. If one of the operands is symbolic, both of them are converted into a 256-bit symbolic variable. The arithmetic operation is carried out (symbolically, if the operands are symbolic) and the result is pushed on to the stack.
A generic symbolic variable is created to mimic the behaviour of the SHA3 opcode
For most of these opcodes, a unique symbolic variable is generated to represent it (similar to SHA3). In some cases, to speed up the symbolic execution, concrete values for these opcodes are taken from the state.json file. This behaviour is enabled via the --state flag. We haven't found ways to robustly simulate CODECOPY
and EXTCODESIZE
symbolically yet.
New edges which are found during analysing the JUMP
and JUMPI
instructions are added to the call graph on the fly.
To handle the CALL
and CALLCODE
opcodes, we construct symbolic expressions to ensure there are enough funds in the sender's account and the sender's address is different from the receiver's address. If these conditions hold true, we update the corresponding global state.
After this, add this basic block to the list of already visited blocks and follow it to the next basic block. We also maintain the necessary path conditions required to get to the block in the path_conditions_and_vars
variable. In case of instructions like JUMP, there is only one basic block to follow the program execution to. In other cases like JUMPI
, we first check if the branch expression is provably True or False using z3. If not, we explore both the branches by adding the branch expression and the negated branch expression to the path_conditions_and_vars
variable.
- Callstack attack
Checking for the callstack attack is done by the check_callstack_attack function. If a CALL
or a CALLCODE
instruction is found without the SWAP4, POP, POP, POP, POP, ISZERO
(or SWAP3 followed by 3 POP, etc.) following it, we flag it as being vulnerable to the callstack attack. This opcode sequence is the one generated by solc corresponding to the following recommended code pattern to prevent against the attack.
if (owner.send(amount)) {..}
- Timestamp dependence attack
We find out if the path_conditions
variable contains the symbolic variable corresponding to the block timestamp. If so, the program can be concluded to take a path in the program which makes use of the block timestamp, making it vulnerable to the Timestamp dependence attack.
- Reentrancy bug
This presence of this bug is analysed in the check_reentrancy_bug
function in analysis.py. At each CALL that is encountered, we obtain the path condition for the execution before the CALL is executed. We then check if such condition with updated variables (e.g., storage values) still holds (i.e., if the call can be executed again). If so, we consider this a vulnerability, since it is possible for the callee to re-execute the call before finishing it.
We also consider the case that users use sender
and transfer
instead of call
function. It is safe to use sender
and transfer
because of the limited gas as part of send
and transfer
. To check whethera contract is safe or not based on the gas as part of these functions, we set a threshold to 2300 which is the amount of gas that sender
and transfer
provide. And then comparing the gas sent along with these functions with the threshold. If the gas is greater than the threshold, we flag the contract as being vulnerable to the reentrancy attack. Otherwise, we flag it as being safe.
- Concurrency bug
We track the sender, recepient and the value transferred at each CALL
and SUICIDE
instruction in the update_analysis
function. If these values are different for different flows, we report the bug in the detect_money_concurrency
function.
- Assertion fails
This feature is active only if the option -a
is used.
The feature verifies Solidity assertions, which tries to report assert
fails if INVALID instruction is reachable in the program. Because INVALID can be caused from different cases other than assert
, there would be some cases that lead to false positives due to the ambiguity between an INVALID generated by assert
and other types of INVALID. Currently, we consider all INVALID instructions to be derived from assert
except thoses that follow a sequences of JUMPDEST, CALLVALUE, ISZERO, PUSH, JUMPI instructions. To find the function that contains the assertion fails, we record the path that leads to the INVALID instruction. By using this path, we can trace back and find the top-level function that causes the failure in the check_assertions
function in symExec.py
.
This is a utility class to provide unqiue symbolic variables required for analysis
This is a utility class to map problematic opcodes into the source code
Testing opcodes in Oyente in order to check if opcodes are implemented correctly based on the final state of the storage and the memory. The tests are based on the VM tests of Ethereum.
The flow of testing:
- Load test data (using the existing EVM tests in here)
- Run oyente with the input was specified in the test data
- Compare the results (storage, memory and gas) after running oyente with the results being specified in the test data
- Report bugs
This is the main entry point to the testing program. The program loads a specific test data file in folder test_evm/test_data/
and start running oyente.py
with the input being specified in the loaded test data to get an exit code which is returned from oyente.py.
From this exit code the testing program can report the bug
A utility class to extract concerned sections and fields (code
, storage
, out
, gas
and gas
in exec
section) in the test data, run the tests, compare the results and return an exit code
compare_storage_and_gas_unit_test(global_state, analysis)
starts comparing the results and return an exit code after the final opcode is implemented