Ghidra #3

Open
maidenone opened this issue Oct 29, 2020 · 32 comments

Comments

@maidenone

maidenone commented Oct 29, 2020

Previous work on RV32 and RV64:
https://delaat.net/rp/2019-2020/p49/report.pdf

https://reverseengineering.stackexchange.com/questions/22558/reversing-a-key-gen-firmware-for-risc-v

Ghidra releases do not support RISC-V, but if you install from source, it does.

@gamelaster
Member

I already tried nightly Ghidra (there are some repos with prebuilt Windows binaries), but sadly I wasn't able to configure Ghidra to correctly disassemble the blobs (yet).

@micahswitzer

I was able to get a nightly build partially working. The batch import didn't work properly, and it also doesn't support a few of the relocation types needed, but I was able to load a few object files manually.

There's a fork of Ghidra that has a few more updates to the RISCV module, but I haven't had the chance to check it out yet: https://github.com/mumbel/ghidra/tree/riscv

@stschake
Contributor

stschake commented Oct 29, 2020

I've used the out-of-tree version here: https://github.com/mumbel/ghidra_riscv

Seems to work okay, even if it doesn't have the exact ISA. In general, I'd recommend just giving Ghidra an ELF from the build instead of trying to get it to digest the raw object files.

This is the built processor; extract it to $GHIDRA_DIR/Ghidra/Processors:
ghidra_9.1.2_PUBLIC_20201029_ghidra_riscv.zip

@micahswitzer

Ah okay, I didn't realize that version existed. That's much cleaner.

I guess the reason I'd prefer reversing straight from the objects is so that we can more easily isolate the behavior of each API function. Maybe the goal is not so much to duplicate their API in C as it is to identify how to interact with the radios? I guess I'd like to see a clear goal established when it comes to the RE work.

@gamelaster
Member

@micahswitzer This is a very good question. That said, I think the easiest way to test whether our RE'd implementation works is just to reimplement their API.

@maidenone
Author

I have 3 BL602 boards on their way to me; I will make them remotely available if there are things we want to execute on real hardware. I also have an SDR that I can hook up to see what happens in the spectrum when we poke at registers.

@gamelaster
Member

@maidenone Well, how do you want to deal with flashing? AFAIK, the flash tools are closed-source.

@maidenone
Author

I have not thought about that, but given that it is a SiFive E24 core that uses JTAG, making OpenOCD talk to it should not be that hard?

I have poked around with OpenOCD and BMP code before to add new targets.

@WildCryptoFox

My understanding of Ghidra's decompiler is that it is written in C++ and doesn't depend on Java at all, but the linked repo appears to use Java? I don't (yet) have Ghidra or Java; could someone post some decompilation samples of the objects in this repository?

Yesterday, I experimented with adding RISC-V support to r2dec. The output is a naive translation of assembly to a pseudo-c; arguably not much better than the assembly itself. The result could be greatly improved with post-processing but r2dec isn't really designed for data-flow transformations, so this would be limited to trivial cases.

This might be enough but I'd rather invest time in a new decompiler, which can use deep data-flow analysis to simplify the result. I'd start with RVSDG, an optimization-friendly data-flow intermediate representation in SSA-form without the total order of control-flow graphs. Such a decompiler could naturally be repurposed as an optimizing recompiler.

Would anyone be interested in either working on improving an existing decompiler or working towards a new one?

@micahswitzer

Yes, the decompiler itself is written in C++. However, the processor specifications are written in a DSL called SLEIGH (which defines instruction semantics in terms of P-code), and the code that tells Ghidra how to load platform-specific relocations and DWARF information is written in Java.

So yes, you can run the decompiler without Java (I believe radare can do that), but it's much more useful if you use it within the context of Ghidra with all of the tools that Ghidra has to offer.

That being said, the RISCV module for Ghidra is not quite production ready. In my incredibly brief testing, I noted that most of the non-trivial relocation types were not implemented. I also read that there were some other issues with the pcode that caused Ghidra to misinterpret the meaning of the assembly (not the disassembly itself). I'm relatively new to the RISC-V ISA, but I'd be willing to see if I could at least implement the missing relocations which would greatly improve the usability of Ghidra for this project.

I'm also willing to lend another set of eyes on such an effort should someone else with more experience want to tackle this issue with another RE platform.

@stschake
Contributor

stschake commented Oct 30, 2020

I might be misunderstanding, but since the ELF here is built purely so the code can be pulled out of it for a raw flash image, it won't have any relocations - there isn't any code in the ROM that could load them anyway. So while the Ghidra RISCV processor doesn't support a lot of them, that doesn't matter if you load the ELF into it?

@micahswitzer

You are correct that there will be no relocations in the final binary loaded into ROM. However, since the library code references other internal functions and data structures, relocations are necessary to allow for flexibility during the final linking step.

I think you may be suggesting that we could simply compile and link a sample application which we could then RE since it would no longer have any relocations. If so, issue #6 suggests the same thing.
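For anyone new to RISC-V relocations, here is a minimal sketch of what resolving one common pair involves: R_RISCV_HI20 on a LUI and R_RISCV_LO12_I on the following ADDI, both computed from the symbol value plus addend (S + A) as described in the RISC-V ELF psABI. This is not code from Ghidra or from any fork mentioned in this thread; the function names are hypothetical, and the instruction words `lui a0, 0` / `addi a0, a0, 0` are just placeholders.

```c
#include <stdint.h>

/* Hypothetical helper: patch a LUI/ADDI pair the way a static linker
 * resolves R_RISCV_HI20 (on the LUI) and R_RISCV_LO12_I (on the ADDI)
 * against value = S + A. */
static void apply_hi20_lo12_i(uint32_t *lui, uint32_t *addi, uint32_t value)
{
    /* ADDI sign-extends its 12-bit immediate, so round the upper part by
     * +0x800 to compensate when bit 11 of the low part is set. */
    uint32_t hi20 = (value + 0x800u) >> 12;
    uint32_t lo12 = value & 0xfffu;

    *lui  = (*lui  & 0x00000fffu) | (hi20 << 12); /* U-type imm, bits 31:12 */
    *addi = (*addi & 0x000fffffu) | (lo12 << 20); /* I-type imm, bits 31:20 */
}

/* Compute the address the patched pair produces at run time. */
static uint32_t resolved_address(uint32_t lui, uint32_t addi)
{
    uint32_t upper = lui & 0xfffff000u;
    int32_t lower = (int32_t)(addi >> 20); /* raw 12-bit field, 0..0xfff */
    if (lower & 0x800)
        lower -= 0x1000;                   /* sign-extend like the CPU does */
    return upper + (uint32_t)lower;        /* wraps modulo 2^32 */
}
```

For example, patching the placeholder pair for a target of 0x42001ffc yields lui = 0x42002537 and addi = 0xffc50513: the upper part is rounded up to 0x42002000 because the ADDI immediate becomes -4.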

@Yangff
Contributor

Yangff commented Oct 31, 2020

Yes, please use the ELF I pushed to the blobs (or, if you want, compile them as @micahswitzer said). They should also contain all the symbols, and in my case Ghidra with the RISC-V plugin loads them mostly fine. The only problem is floating point, which I think we can handle manually.

However, the decompiled output seems to have a problem: as far as I can tell, some memory reads/writes are missing. I can see them in the assembly, but they disappear in the decompilation.

[Screenshot: Snipaste_2020-10-30_22-38-34]

@WildCryptoFox

WildCryptoFox commented Oct 31, 2020

Reko, a capstone-based decompiler, might be a candidate. Unfortunately it doesn't seem to understand at least RISC-V ELF relocations. I don't know how much work is needed.

@mumbel
Contributor

mumbel commented Nov 1, 2020

If you come across any disassembly/instruction issues (I just fixed a bug in c.fsw and c.fswsp) or any unimplemented ELF relocations (I haven't noticed any yet), feel free to file an issue on either of my repos. If it's a bigger non-RISC-V issue, I'd go ahead and file it with Ghidra's issues. There is an open bug being looked at if you come across a subtract popup.

Edit: this was done in my spare time, and I haven't used this module extensively, so there are likely bugs. If anything looks off, comments are welcome. Hopefully 9.2 will be out soon so you don't have to build Ghidra, but I would definitely at least use my RISCV/data/languages/, which Ghidra compiles at runtime if the .sla file is out of date or doesn't exist yet.
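For reference, C.FSWSP (one of the instructions fixed above) uses the compressed CSS format, whose immediate bits are shuffled, which makes it an easy place for decoder bugs. Below is a minimal decoding sketch based on the RVC encoding in the RISC-V unprivileged ISA spec; it is not the actual SLEIGH fix, and the struct and function names are hypothetical. It assumes RV32 (the BL602 is an RV32 part), since on RV64 the same bit pattern decodes as C.SDSP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of a C.FSWSP instruction (hypothetical layout). */
struct c_fswsp {
    unsigned rs2;    /* FP source register number, f0..f31 */
    uint32_t offset; /* zero-extended byte offset from sp, multiple of 4 */
};

/* Returns true and fills *out if the 16-bit parcel `insn` is C.FSWSP. */
static bool decode_c_fswsp(uint16_t insn, struct c_fswsp *out)
{
    uint32_t w = insn;

    /* Quadrant 2 (op = 0b10 in bits 1:0) with funct3 = 0b111 (bits 15:13). */
    if ((w & 0x3u) != 0x2u || ((w >> 13) & 0x7u) != 0x7u)
        return false;

    /* CSS immediate: bits 12:9 hold offset[5:2], bits 8:7 hold offset[7:6]. */
    out->offset = ((w >> 9) & 0xfu) << 2 | ((w >> 7) & 0x3u) << 6;
    out->rs2 = (w >> 2) & 0x1fu; /* bits 6:2 */
    return true;
}
```

As a usage example, 0xE622 decodes as `c.fswsp f8, 12(sp)`; a decoder that swaps the two immediate chunks would instead report a wrong offset.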

@WildCryptoFox

WildCryptoFox commented Nov 3, 2020

@Yangff phy_init as derived using Reko. Do you see any issues with this result? (see also #14)

The if(false) looks suspicious. I wonder if it is treating a read from memory that it doesn't know is MMIO as a constant? (cc @uxmal)

@Yangff
Contributor

Yangff commented Nov 3, 2020

The first condition should be

  uVar1 = ((__DATA_44c00000 >> 8 & 0xf) - 1 & 0xff) << 4;
  if ((uVar1 & 0xffffff8f) != 0) {
    assert_err("(((uint32_t)rxnssmax << 4) & ~((uint32_t)0x00000070)) == 0","module",0xa09);
  }
  __DATA_44c00820 = uVar1 | __DATA_44c00820 & 0xffffff8f;

as decompiled by ghidra.

This is actually an inlined function, and it should have a name like mdm_rxnssmax_setf.

void mdm_rxnssmax_setf(uint8_t rxnssmax) {
  assert_err((((uint32_t)rxnssmax << 4) & ~((uint32_t)0x00000070)) == 0);
  REG_PL_WR(0x44c00820,  ((uint32_t)rxnssmax << 4) | REG_PL_RD(0x44c00820) & ~((uint32_t)0x00000070) );
}

All the other if (false) conditions seem to have the same problem.
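The read-modify-write pattern behind these field setters can be checked on a host with a small sketch. Here the SDK's REG_PL_RD/REG_PL_WR accesses to 0x44c00820 are replaced by a plain volatile variable, assert() stands in for assert_err(), and the names fake_mdm_reg_820, RXNSSMAX_SHIFT, RXNSSMAX_MASK, and mdm_rxnssmax_setf_sketch are hypothetical. It also shows that Ghidra's 0xffffff8f mask is simply ~0x70, i.e. the field occupies bits 6:4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the register at 0x44c00820; on the chip,
 * REG_PL_RD/REG_PL_WR would access that address instead. */
static volatile uint32_t fake_mdm_reg_820;

#define RXNSSMAX_SHIFT 4u
#define RXNSSMAX_MASK  0x00000070u /* bits 6:4; ~0x70 == 0xffffff8f */

static void mdm_rxnssmax_setf_sketch(uint8_t rxnssmax)
{
    uint32_t field = (uint32_t)rxnssmax << RXNSSMAX_SHIFT;

    /* Same check as assert_err(): the value must fit in bits 6:4. */
    assert((field & ~RXNSSMAX_MASK) == 0);

    /* Read-modify-write: keep every other bit, replace bits 6:4. */
    fake_mdm_reg_820 = field | (fake_mdm_reg_820 & ~RXNSSMAX_MASK);
}
```

With the register preset to 0xffffffff, setting rxnssmax to 2 leaves 0xffffffaf: only bits 6:4 change.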

@WildCryptoFox

@Yangff Could you upload ghidra's decompilation results for the 3 ELFs for all who don't have ghidra setup?

@Yangff
Contributor

Yangff commented Nov 3, 2020

Yes, let me try.

@Yangff
Contributor

Yangff commented Nov 3, 2020

Added as #15.

@stschake
Contributor

stschake commented Nov 3, 2020

@Yangff I think the problem with disappearing writes is that Ghidra doesn't know about the memory-mapped PHY stuff. I fixed that by adding it to the Memory Map with Start 0x44c00000, Size 0xd000 and marking it Read/Write+Volatile. Check mdm_reset and you should then see two distinct writes instead of the previous coalesced one (which wouldn't have worked to reset the thing).

I've also attached my notes on the various PHY registers there:
phy.txt
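To illustrate why the Volatile flag matters, here is a minimal C sketch of a set-then-clear reset pulse. The register pointer and bit are parameters because the register and bit that mdm_reset actually uses are not identified here, and the function name is hypothetical. Marking a Ghidra memory block volatile tells the decompiler roughly what the `volatile` qualifier tells a C compiler.

```c
#include <stdint.h>

/* Hypothetical reset pulse: set `bit` in a memory-mapped register, then
 * clear it again. */
static void reset_pulse(volatile uint32_t *reg, uint32_t bit)
{
    /* Each access through a volatile lvalue is an observable side effect,
     * so both read-modify-write sequences must be emitted, in order.
     * If *reg were ordinary memory, the two statements could legally be
     * folded into a single store of (old & ~bit), which never pulses the
     * bit. That matches the single coalesced write seen before the PHY
     * region was marked volatile. */
    *reg = *reg | bit;
    *reg = *reg & ~bit;
}
```

On a host, a test can only check the final register value; the number of stores is what the volatile qualifier (or Ghidra's Volatile setting) guarantees.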

@micahswitzer

Yes, volatile memory regions are key for getting Ghidra to interpret mmio properly.

@stschake could you create a PR with that text file? I think it would be incredibly useful to keep a running list of registers and their functions as we continue to RE the blobs.

@stschake
Contributor

stschake commented Nov 3, 2020

I've sent pine64/bl602-docs#18

There is another MMIO peripheral at 0x44b00000 (up to ~0x44b09000) containing what the firmware calls mm, or MAC management.
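The two MMIO ranges identified so far can be kept in a small lookup table, for instance to label __DATA_44c00820-style addresses in exported decompiler output, or as a checklist when adding blocks to Ghidra's memory map. This is a hypothetical sketch: the table and function names are made up, and the MAC region size is approximate because its end was only given as ~0x44b09000.

```c
#include <stddef.h>
#include <stdint.h>

struct mmio_region {
    const char *name;
    uint32_t start;
    uint32_t size;
};

/* Regions mentioned in this thread; the MAC size is approximate. */
static const struct mmio_region bl602_mmio[] = {
    { "PHY",                 0x44c00000u, 0xd000u },
    { "MAC management (mm)", 0x44b00000u, 0x9000u },
};

/* Return the region name for `addr`, or NULL if it is not in the table. */
static const char *mmio_region_name(uint32_t addr)
{
    for (size_t i = 0; i < sizeof bl602_mmio / sizeof bl602_mmio[0]; i++) {
        const struct mmio_region *r = &bl602_mmio[i];
        /* Compare via subtraction so start + size can never overflow. */
        if (addr >= r->start && addr - r->start < r->size)
            return r->name;
    }
    return NULL;
}
```

For example, 0x44c00820 (the register written by mdm_rxnssmax_setf) maps to the PHY entry, while 0x44c0d000 falls just past the end of that block.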

@uxmal

uxmal commented Nov 4, 2020

The binaries @WildCryptoFox provided have exposed some bugs in Reko's RISC-V disassembler, specifically in the decoding of RISC-V compressed instructions. I'm working on fixes and will have something by the end of today.

@micahswitzer

I've implemented the relocations necessary to load the raw libraries/objects into Ghidra. I'm not 100% sure it works correctly, but everything was looking nice in my testing. I've attached a build of my version of the extension here. If you find any issues with relocations specifically, you can open an issue on my fork here.

Now that I have them working, I can finally start doing some actual RE!

@mumbel
Contributor

mumbel commented Nov 6, 2020

@micahswitzer Nice, that's a decent number of relocations (not sure why I left all those TODO comments on the ones I implemented; maybe they were untested). I hadn't come across the need to handle unlinked ELFs until now, and I'm guessing most of those go away after linking. Or did you see some unimplemented ones in the linked demos as well?

I'd fork NSA's Ghidra repo and submit a PR for the new additions. It probably wouldn't make it into 9.2 (which should be released soon, judging by their comments about it), but hopefully at least into 9.2.1. Not sure when you forked mine, but what is currently in my ghidra_riscv is what is in the Ghidra repo (just in-tree).

@mumbel
Contributor

mumbel commented Nov 13, 2020

9.2 was released today, and it includes RISC-V support.

@micahswitzer

@mumbel I just saw that. Great work on that feature, it will be very useful for this project!

I will probably spend some time this weekend cleaning up my code so that I can submit a PR as you suggested.

@lorenz

lorenz commented Nov 24, 2020

FYI: https://github.com/pine64/bl602-docs/tree/main/hardware_notes#rf-ip

I found most of the code as source deep in various SDKs that people posted. As far as I can see, most functions are in there, and for every function I checked, the behavior in the blob is the same as in the source.

@rpavlik

rpavlik commented Dec 10, 2020

So given the source discovery, I'm not sure if any decompiling is still needed (I want to contribute but I'm having a hard time figuring out what exactly the intermediate goals are), but FWIW, you can add the 3 ROM sections mentioned in the link above to the memory map in Ghidra, and you can load a slightly modified version of the SVD
soc602_reg.svd.txt

with a slightly modified version of https://leveldown.de/blog/svd-loader/ (just comment out the sys.exit() call for non-cortex-m cpus), and it appears to work OK.

@gamelaster
Member

Hi @rpavlik, at the moment there isn't any target, because we are waiting until Bouffalo officially …; according to this post, that should happen at the end of this month. After that, we will decide and focus on the spare blobs 😊

@sajattack

I cracked this binary open in Ghidra as soon as I found out about it. Eager to contribute if I have time.
