[x86][MC] Over-decode invalid instruction with mutual exclusive prefix and unmatch opcode #117306

Open
Mar3yZhang opened this issue Nov 22, 2024 · 1 comment
Labels
backend:X86 mc Machine (object) code

Mar3yZhang commented Nov 22, 2024

Work environment

Questions | Answers
OS/arch/bits | x86_64 Ubuntu 20.04
Architecture | x86_64
Source of Capstone | git clone, default on master branch.
Version/git commit | llvm-20git, f08278

Minimal PoC disassembler

#include <err.h>
#include <stdint.h>
#include <stdio.h>

#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>

// Assumed buffer size; the report does not show this definition.
#define MAX_OUTPUT_LENGTH 256

int main(int argc, char *argv[]){
    /*
       some input sanity check of hex string from argv
       (elided in the report; it is expected to fill raw_bytes and bytes_len)
    */
    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex and select the
    // alternate asm printer variant (Intel syntax on x86)
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                        LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    // Returns the number of bytes consumed, or 0 if no valid instruction was decoded
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));

    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    LLVMDisasmDispose(disasm);
    return 0;
}
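
The PoC above elides the step that turns the hex string from argv into the raw_bytes and bytes_len passed to LLVMDisasmInstruction. A minimal sketch of what that step might look like follows. The helper name parse_hex_bytes and its return convention are assumptions for illustration and are not taken from the report.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the original PoC): converts a hex string
 * such as "f2f0410fb7d6" into bytes. Returns the number of bytes written,
 * or -1 if the string is empty, has odd length, contains a non-hex
 * character, or does not fit into out. */
long parse_hex_bytes(const char *hex, uint8_t *out, size_t out_cap) {
    assert(hex != NULL && out != NULL);

    size_t n = strlen(hex);
    if (n == 0 || n % 2 != 0 || n / 2 > out_cap) {
        return -1;
    }

    for (size_t i = 0; i < n; i += 2) {
        unsigned value = 0;
        /* Two hex digits make one byte. */
        for (size_t j = 0; j < 2; j++) {
            unsigned char c = (unsigned char)hex[i + j];
            if (!isxdigit(c)) {
                return -1;
            }
            unsigned digit = isdigit(c) ? (unsigned)(c - '0')
                                        : (unsigned)(tolower(c) - 'a' + 10);
            value = value * 16 + digit;
        }
        out[i / 2] = (uint8_t)value;
    }
    return (long)(n / 2);
}
```

With a helper like this, main could declare a fixed-size uint8_t raw_bytes buffer, call parse_hex_bytes(argv[1], raw_bytes, sizeof(raw_bytes)), and exit with an error when the result is negative before initializing LLVM.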

Instruction bytes giving faulty results

f2 f0 41 0f b7 d6

Expected results

It should be:

Error: Unable to disassemble the input bytes.

Actual results

$./min_llvm_disassembler "f2f0410fb7d6"
        xacquire

Additional Logs, screenshots, source code, configuration dump, ...

This is similar to a verified bug in the Capstone engine. The bytes "f2f0410fb7d6" cannot be decoded into a valid x86 instruction, because the prefixes f2 and f0 are mutually exclusive and the LOCK prefix (f0) is applied to a register operation. LLVM MC nevertheless decodes them as the instruction xacquire. Other instruction decoders, such as Capstone, Zydis, and XED, reject this byte sequence. I am not sure whether the workaround in this pull request can fix this.
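
To make the second condition concrete, here is a rough sketch of how a decoder might detect a LOCK prefix on a register-form operand. It is not LLVM's decoder logic, and the function names are hypothetical. It assumes 64-bit mode, handles only one-byte opcodes and the two-byte 0F escape, and assumes the opcode is followed by a ModRM byte (true for 0f b7, but not for every opcode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if b is one of the x86 legacy prefixes. */
static int is_legacy_prefix(uint8_t b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE/XACQUIRE, REP/XRELEASE */
    case 0x2E: case 0x36: case 0x3E: case 0x26: /* segment overrides */
    case 0x64: case 0x65:
    case 0x66:                                  /* operand-size override */
    case 0x67:                                  /* address-size override */
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical check: returns 1 if the byte sequence carries a LOCK prefix
 * and its ModRM byte selects a register operand (mod == 3), else 0.
 * Assumes the opcode takes a ModRM byte; 0F 38 / 0F 3A escapes, VEX and
 * EVEX encodings are not handled in this sketch. */
int lock_with_register_operand(const uint8_t *bytes, size_t len) {
    assert(bytes != NULL);

    size_t i = 0;
    int has_lock = 0;

    /* Consume legacy prefixes, remembering whether LOCK (f0) was seen. */
    while (i < len && is_legacy_prefix(bytes[i])) {
        if (bytes[i] == 0xF0) {
            has_lock = 1;
        }
        i++;
    }

    /* Optional REX prefix (40..4f) in 64-bit mode. */
    if (i < len && (bytes[i] & 0xF0) == 0x40) {
        i++;
    }

    /* Optional two-byte opcode escape, then the opcode byte itself. */
    if (i < len && bytes[i] == 0x0F) {
        i++;
    }
    i++;

    if (i >= len) {
        return 0; /* no ModRM byte available */
    }

    uint8_t modrm = bytes[i];
    return has_lock && (modrm >> 6) == 3;
}
```

For the reported bytes, the scan consumes f2 and f0 as prefixes, 41 as REX, and 0f b7 as the opcode, leaving d6 as the ModRM byte. Its top two bits are 11, so the operand is a register and the check reports the combination.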

RKSimon added backend:X86 mc Machine (object) code and removed new issue labels Nov 22, 2024

llvmbot commented Nov 22, 2024

@llvm/issue-subscribers-backend-x86

Author: VinkyQZ (Mar3yZhang)

