Fix wchar array encoding in structs #1348
base: dev
Thanks for the quick reaction!
As a follow-up, this problem also seems to occur with struct.ref assignments. Here is an example from my own project where the same problem occurs (when assigning fdata_obj.cFileName), even after my patch:
I'll await your reply to see whether this also needs its own patch or whether we need to find another solution with regard to the Python versions.
I tested that on Python 3.9 running on top of WSL, and it behaves like 3.8 (that is, it uses excessive padding), so it is not a matter of versions after all. I believe it could be a result of the default encoding: Python on POSIX sets it to UTF-8 by default [meaning that any string you provide is already implemented as a UTF-8 buffer under the hood, which gets re-padded by]. My Windows Python encoding is set to cp1252 (Western Europe), which might be the reason it works OK there.
>>> import locale ; locale.getpreferredencoding()
'UTF-8'
>>> import ctypes ; bytes(ctypes.create_unicode_buffer("hello", 8))
b'h\x00e\x00l\x00l\x00o\x00\x00\x00\x00\x00\x00\x00'
>>>
>>> import locale ; locale.getpreferredencoding()
'cp1252'
>>> import ctypes ; bytes(ctypes.create_unicode_buffer("hello", 8))
b'h\x00e\x00l\x00l\x00o\x00\x00\x00\x00\x00\x00\x00'
So I am a bit confused here. I am pretty much convinced this is an encoding / locale issue, but I cannot figure out how to determine that in order to work around it.
Yeah, I'm pretty sure it has to do with the encoding of arrays. When I tested it with some dummy code that called bytes() separately for each ctypes.c_wchar, it did result in proper encoding (with proper \x00 padding). Calling bytes() on a ctypes_c_wchar_array object results in double encoding (with \x00\x00\x00 padding).
I think it depends on whether your Python build is compiled using UCS2 or UCS4. Can you check which type you are running?
Both show a result of ctypes.sizeof(ctypes.c_wchar): one returns 4 ("extra padding"), while the other returns 2 ("normal padding"). I do believe this has something to do with the locale, but I can't figure it out.
I've spent quite some time following this up, and the most logical conclusion I can think of is that ctypes follows the OS's native wchar_t representation: Linux and macOS represent a wchar_t as 4 bytes, while Windows represents it as 2 bytes.
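For anyone who wants to check this on their own host, here is a minimal sketch (not code from this PR). It assumes only the standard library: ctypes.c_wchar maps to the platform's wchar_t, so the size of the buffer depends on the OS rather than on the locale, while an explicit "utf-16-le" encode always yields 2-byte code units.

```python
import ctypes

# c_wchar maps to the C wchar_t type: typically 2 bytes on Windows,
# 4 bytes on Linux and macOS. The locale encoding plays no part here.
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# The native buffer is 8 wchar_t elements wide, so its byte length
# depends on the host platform.
buf = ctypes.create_unicode_buffer("hello", 8)
native = bytes(buf)

# Encoding explicitly gives the Windows layout on every host:
# 8 UTF-16LE code units = 16 bytes, zero padded.
utf16 = "hello".encode("utf-16-le").ljust(8 * 2, b"\x00")

print(wchar_size, len(native), len(utf16))
```

On a host where wchar_size is 2 the two byte strings should be identical; where it is 4, the native buffer is twice as long, which matches the "extra padding" described above.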
No doubt about that.
I just pushed a fix that adds a wrapper class to substitute c_wchar arrays and represent them as an array of c_ubytes under the hood. This should ensure that they are always handled as UCS2 regardless of the host OS. I tested it: it passes the struct tests of qiling and it works with the binary I'm emulating 👍
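To illustrate the idea (this is not the code from the commit), here is a rough sketch of storing a fixed-size wide-character field as raw c_ubyte bytes. The names wchar_array, set_wstr, get_wstr and DemoStruct are hypothetical and exist only for this example; the actual wrapper class in the PR may look quite different.

```python
import ctypes

WCHAR_SIZE = 2  # a Windows WCHAR is one 2-byte UTF-16LE code unit

def wchar_array(nchars):
    """Hypothetical helper: array type holding nchars wide chars as raw bytes."""
    return ctypes.c_ubyte * (nchars * WCHAR_SIZE)

def set_wstr(arr, text):
    """Write text into arr as UTF-16LE, zero filling the remainder."""
    data = text.encode("utf-16-le")
    if len(data) > len(arr):
        raise ValueError("string does not fit in the array")
    ctypes.memset(arr, 0, len(arr))
    ctypes.memmove(arr, data, len(data))

def get_wstr(arr):
    """Read arr back as a string, stopping at the first 2-byte null."""
    raw = bytes(arr)
    for i in range(0, len(raw), WCHAR_SIZE):
        if raw[i:i + WCHAR_SIZE] == b"\x00\x00":
            raw = raw[:i]
            break
    return raw.decode("utf-16-le")

class DemoStruct(ctypes.Structure):
    # Hypothetical struct, used only to show that the field size is exact.
    _fields_ = [
        ("dwFileAttributes", ctypes.c_uint32),
        ("cFileName", wchar_array(8)),  # 16 bytes, no extra terminator
    ]

fd = DemoStruct()
set_wstr(fd.cFileName, "hello")
print(bytes(fd.cFileName))
print(get_wstr(fd.cFileName))
```

Because the field is sized as exactly nchars * 2 bytes, the struct layout should be the same on Windows, Linux and macOS hosts, and the fields that follow it stay aligned.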
Why did you add +2 for null terminator? I think you are confusing it with |
My bad, that indeed causes misalignment. For my use case that didn't matter, but I'll change it to the proper length in bytes in a new commit. I have a different question about some of the other Windows struct types: could you elaborate on why you internally represent a lot of the string pointer types as the STRING type in qiling/os/windows/api.py? For example
Tagging an argument as However, I will add this to my fix backlog [probably through OS resolver handlers]. |
Checklist
Which kind of PR do you create?
Coding convention?
Extra tests?
Changelog?
Target branch?
One last thing
Structs that use the ctypes.c_wchar_array type were being double encoded, causing the bytes to be padded with '\x00\x00\x00' instead of just '\x00'. I added a few lines to struct.py that fix this.
Example code that shows the problem by reading KUSER_SHARED_DATA->NtSystemRoot: