-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sign extending an i8 to i32 gives an unexpected result (LLVM-403) #102
Comments
@ivmarkov FYI |
@jothan What target you build your program for? What CPU: Xtensa or RISCV? |
@gerekon I should have mentioned, this is on Xtensa / xtensa-esp32-espidf. |
Full related code dump. mod netif {
pub fn extract_ip(value: &sys::esp_ip_addr_t) -> Option<IpAddr> {
match value.type_ as _ {
sys::ESP_IPADDR_TYPE_V4 => Some(
Ipv4Addr::from(u32::from_be(unsafe { value.u_addr.ip4.addr }).to_be_bytes()).into(),
),
sys::ESP_IPADDR_TYPE_V6 => Some(extract_ipv6(unsafe { value.u_addr.ip6.addr }).into()),
_ => None,
}
}
fn extract_ipv6(value: [u32; 4]) -> Ipv6Addr {
let mut out = [0; 16];
value
.into_iter()
.map(u32::from_be)
.map(u32::to_be_bytes)
.zip(out.array_chunks_mut())
.for_each(|(i, o)| {
*o = i;
});
Ipv6Addr::from(out)
}
}
pub mod dns {
use std::{ffi::CString, mem::ManuallyDrop, ops::DerefMut};
use netif::extract_ip;
use super::*;
const LWIP_DNS_ADDRTYPE_IPV4: u8 = 0;
const LWIP_DNS_ADDRTYPE_IPV6: u8 = 1;
extern "C" {
fn dns_gethostbyname_addrtype(
hostname: *const c_char,
addr: *mut sys::esp_ip_addr_t,
found: Option<
unsafe extern "C" fn(
name: *const c_char,
ipaddr: Option<NonNull<sys::esp_ip_addr_t>>,
callback_arg: *mut c_void,
),
>,
callback_arg: *mut c_void,
dns_addrtype: u8,
) -> sys::err_t;
}
type IpSender = async_oneshot::Sender<Result<sys::esp_ip_addr_t, Error>>;
#[derive(thiserror::Error, Debug, Clone, Copy)]
pub enum Error {
#[error("Invalid name")]
InvalidName,
#[error("DNS lookup error")]
LookupError,
}
pub struct Name(CString);
impl TryFrom<&str> for Name {
type Error = Error;
fn try_from(value: &str) -> Result<Self, Self::Error> {
value.to_owned().try_into()
}
}
impl TryFrom<String> for Name {
type Error = Error;
fn try_from(mut value: String) -> Result<Self, Self::Error> {
value.push(0 as char);
Ok(Name(
CString::from_vec_with_nul(value.into()).map_err(|_| Error::InvalidName)?,
))
}
}
#[expect(dead_code)]
pub async fn resolve_ipv4(name: &Name) -> Result<Ipv4Addr, Error> {
let ip = resolve(&name.0, LWIP_DNS_ADDRTYPE_IPV4).await?;
match ip {
IpAddr::V4(ip) => Ok(ip),
_ => Err(Error::LookupError),
}
}
#[expect(dead_code)]
pub async fn resolve_ipv6(name: &Name) -> Result<Ipv6Addr, Error> {
let ip = resolve(&name.0, LWIP_DNS_ADDRTYPE_IPV6).await?;
match ip {
IpAddr::V6(ip) => Ok(ip),
_ => Err(Error::LookupError),
}
}
/// Call resolve_raw from the TCP/IP thread
async fn resolve(name: &CStr, addr_type: u8) -> Result<IpAddr, Error> {
let (tx, rx) = async_oneshot::oneshot();
struct Ctx(*const c_char, u8, IpSender);
unsafe impl Send for Ctx {}
let mut ctx = ManuallyDrop::new(Ctx(name.as_ptr(), addr_type, tx));
unsafe { sys::esp_netif_tcpip_exec(Some(cb), ctx.deref_mut() as *mut _ as *mut c_void) };
unsafe extern "C" fn cb(ctx: *mut c_void) -> sys::esp_err_t {
let Ctx(name, addr_type, tx) = unsafe { std::ptr::read(ctx as *mut Ctx) };
unsafe { resolve_raw(name, addr_type, tx) };
sys::ESP_OK
}
rx.await
.map_err(|_| Error::LookupError)?
.and_then(|ip| extract_ip(&ip).ok_or(Error::LookupError))
}
unsafe fn resolve_raw(name: *const c_char, addr_type: u8, tx: IpSender) {
let mut ip: sys::esp_ip_addr_t = Default::default();
let tx: *mut IpSender = Box::into_raw(Box::new(tx));
let ret = unsafe {
dns_gethostbyname_addrtype(name, &mut ip, Some(cb), tx as *mut c_void, addr_type)
};
unsafe extern "C" fn cb(
_name: *const c_char,
ipaddr: Option<NonNull<sys::esp_ip_addr_t>>,
callback_arg: *mut c_void,
) {
let mut tx = unsafe { Box::from_raw(callback_arg as *mut IpSender) };
let ipaddr = ipaddr
.map(|ip| unsafe { ip.read() })
.ok_or(Error::LookupError);
let _ = tx.send(ipaddr);
}
// -5 gets "sign-extended" into 251 when doing i8 as i32
// Codegen bug ? https://github.com/espressif/llvm-project/issues/102
const OK: i8 = sys::err_enum_t_ERR_OK as i8;
const INPROGRESS: i8 = sys::err_enum_t_ERR_INPROGRESS as i8;
const ARG: i8 = sys::err_enum_t_ERR_ARG as i8;
match ret {
OK => {
let mut tx = unsafe { Box::from_raw(tx) };
let _ = tx.send(Ok(ip));
}
INPROGRESS => (),
ARG => {
let mut tx = unsafe { Box::from_raw(tx) };
let _ = tx.send(Err(Error::InvalidName));
}
_ => {
panic!("Unexpected result from DNS resolution system.");
}
}
}
} |
Can not reproduce this in C. @MabezDev any thoughts? Have you seen this? |
@jothan Is it possible to get disassembly for let ret: i8 = unsafe { dns_gethostbyname_addrtype(...) };
let ret_i32 = ret as i32; ? |
Problematic rust code with disassembly. Making sign_extender never inline makes the issue go away. Also, I'm having trouble getting xtensa-esp32-elf-objdump to show decoded instructions for this function for some reason. It shows disassembly for most other functions, it seems. #[no_mangle]
unsafe fn resolve_raw(name: *const c_char, addr_type: u8, tx: IpSender) {
let mut ip: sys::esp_ip_addr_t = Default::default();
let tx: *mut IpSender = Box::into_raw(Box::new(tx));
let ret = unsafe {
dns_gethostbyname_addrtype(name, &mut ip, Some(cb), tx as *mut c_void, addr_type)
};
unsafe extern "C" fn cb(
_name: *const c_char,
ipaddr: Option<NonNull<sys::esp_ip_addr_t>>,
callback_arg: *mut c_void,
) {
let mut tx = unsafe { Box::from_raw(callback_arg as *mut IpSender) };
let ipaddr = ipaddr
.map(|ip| unsafe { ip.read() })
.ok_or(Error::LookupError);
let _ = tx.send(ipaddr);
}
// -5 gets "sign-extended" into 251 when doing i8 as i32
// Codegen bug ? https://github.com/espressif/llvm-project/issues/102
match sign_extender(ret) {
sys::err_enum_t_ERR_OK => {
let mut tx = unsafe { Box::from_raw(tx) };
let _ = tx.send(Ok(ip));
}
sys::err_enum_t_ERR_INPROGRESS => (),
sys::err_enum_t_ERR_ARG => {
let mut tx = unsafe { Box::from_raw(tx) };
let _ = tx.send(Err(Error::InvalidName));
}
v => {
log::error!("crash: {v}");
panic!("Unexpected result from DNS resolution system.");
}
}
}
#[inline(always)]
fn sign_extender(v: i8) -> i32 {
v as i32
} 40104158 <resolve_raw>:
.map_err(|_| Error::LookupError)?
.and_then(|ip| extract_ip(&ip).ok_or(Error::LookupError))
}
#[no_mangle]
unsafe fn resolve_raw(name: *const c_char, addr_type: u8, tx: IpSender) {
40104158: 62010136
4010415c: 388100c1
40104160: e006ad31
let mut ip: sys::esp_ip_addr_t = Default::default();
40104164: 8a0c0008
40104168: 28814b0c
4010416c: 0008e030
40104170: 47520a7d
40104174: 82074904
40104178: e380ffa0
let tx: *mut IpSender = Box::into_raw(Box::new(tx));
let ret = unsafe {
dns_gethostbyname_addrtype(name, &mut ip, Some(cb), tx as *mut c_void, addr_type)
4010417c: 3942c110
40104180: ad394381
40104184: dd06bd02
40104188: 0008e007
4010418c: 1a87087c
let _ = tx.send(ipaddr);
}
// -5 gets "sign-extended" into 251 when doing i8 as i32
// Codegen bug ? https://github.com/espressif/llvm-project/issues/102
match sign_extender(ret) {
40104190: 87b87c29
40104194: fa56401a
40104198: 11617203
sys::err_enum_t_ERR_OK => {
let mut tx = unsafe { Box::from_raw(tx) };
4010419c: 4b18c162
let _ = tx.send(Ok(ip));
401041a0: 00c1b2a6
401041a4: a3818c1c
401041a8: 0008e02f
401041ac: 4182080c
401041b0: 39378118
401041b4: 06bd07ad
401041b8: 720003c6
}
sys::err_enum_t_ERR_INPROGRESS => (),
sys::err_enum_t_ERR_ARG => {
let mut tx = unsafe { Box::from_raw(tx) };
401041bc: 180c1161
let _ = tx.send(Err(Error::InvalidName));
401041c0: b20c5182
401041c4: 328118c1
401041c8: e007ad39
401041cc: c1a20008
401041d0: 39308144
401041d4: 1d0008e0
v => {
log::error!("crash: {v}");
panic!("Unexpected result from DNS resolution system.");
}
}
}
401041d8: 81d1a9f0
v => {
401041dc: 088830cd
log::error!("crash: {v}");
401041e0: 0c048816
401041e4: 0ca18908
401041e8: 81717917
401041ec: 6189392b
401041f0: c1829179
401041f4: 81818938
401041f8: f1893929
401041fc: 8934c182
40104200: 3927a1e1
40104204: e030c681
40104208: 61a20008
4010420c: 82082c15
40104210: 24911461
40104214: 13619239
40104218: 92126182
4010421c: c1a21161
40104220: 44c1c218
40104224: bd30c081
40104228: 0008e007
4010422c: f0 .byte 0xf0
4010422d: 41 .byte 0x41
... |
Hmm. Looks stragne. Could you send me program binary? |
@jothan I'm not able to repro this, a full repo (preferably minimized as much as possible) would be appreciated. In the mean time, if you print What is the size of |
A separate repo with build instructions is best, as we can try out various things to narrow it down further. |
@MabezDev here you go: https://github.com/jothan/llvm-sign-extension-bug I got it down to 250 lines and it is still fully runnable and reproduces the bug. |
Good news, objdump disassembly is now working. ASM dump: 400d5904 <resolve_raw>:
.map_err(|_| Error::LookupError)?
.and_then(|ip| extract_ip(&ip).ok_or(Error::LookupError))
}
#[no_mangle]
unsafe fn resolve_raw(name: *const c_char, addr_type: u8, tx: IpSender) {
400d5904: 010136 entry a1, 128
400d5907: 00c162 addi a6, a1, 0
400d590a: eabe81 l32r a8, 400d0404 <_stext+0x3e4> (400dadd8 <<esp_idf_sys::bindings::esp_mqtt_client_config_t_network_t as core::default::Default>::default>)
let mut ip: sys::esp_ip_addr_t = Default::default();
400d590d: 06ad mov.n a10, a6
400d590f: 0008e0 callx8 a8
400d5912: 8a0c movi.n a10, 8
400d5914: e9ca81 l32r a8, 400d003c <_stext+0x1c> (400d52ac <alloc::alloc::exchange_malloc>)
400d5917: 0008e0 callx8 a8
400d591a: 0a7d mov.n a7, a10
400d591c: 044752 s8i a5, a7, 4
400d591f: 0749 s32i.n a4, a7, 0
400d5921: ffa082 movi a8, 255
let tx: *mut IpSender = Box::into_raw(Box::new(tx));
let ret = unsafe {
dns_gethostbyname_addrtype(name, &mut ip, Some(cb), tx as *mut c_void, addr_type)
400d5924: 10e380 and a14, a3, a8
400d5927: eab8c1 l32r a12, 400d0408 <_stext+0x3e8> (400d59dc <llvm_sign_extension::resolve_raw::cb>)
400d592a: eab881 l32r a8, 400d040c <_stext+0x3ec> (400f0998 <dns_gethostbyname_addrtype>)
400d592d: 02ad mov.n a10, a2
400d592f: 06bd mov.n a11, a6
400d5931: 07dd mov.n a13, a7
400d5933: 0008e0 callx8 a8
400d5936: 087c movi.n a8, -16
//log::info!("ret: {ret}");
// -5 gets "sign-extended" into 251 when doing i8 as i32
// Codegen bug ? https://github.com/espressif/llvm-project/issues/102
match sign_extender(ret) {
400d5938: 291a87 beq a10, a8, 400d5965 <resolve_raw+0x61>
400d593b: b87c movi.n a8, -5
400d593d: 401a87 beq a10, a8, 400d5981 <resolve_raw+0x7d>
400d5940: 03fa56 bnez a10, 400d5983 <resolve_raw+0x7f>
sys::err_enum_t_ERR_OK => {
let mut tx = unsafe { Box::from_raw(tx) };
400d5943: 116172 s32i a7, a1, 68
400d5946: 18c162 addi a6, a1, 24
let _ = tx.send(Ok(ip));
400d5949: a64b addi.n a10, a6, 4
400d594b: 00c1b2 addi a11, a1, 0
400d594e: 8c1c movi.n a12, 24
400d5950: e9ce81 l32r a8, 400d0088 <_stext+0x68> (4000c2c8 <memcpy>)
400d5953: 0008e0 callx8 a8
400d5956: 080c movi.n a8, 0
400d5958: 184182 s8i a8, a1, 24
400d595b: eaad81 l32r a8, 400d0410 <_stext+0x3f0> (400d2cd4 <async_oneshot::sender::Sender<T>::send>)
400d595e: 07ad mov.n a10, a7
400d5960: 06bd mov.n a11, a6
400d5962: 0003c6 j 400d5975 <resolve_raw+0x71>
}
sys::err_enum_t_ERR_INPROGRESS => (),
sys::err_enum_t_ERR_ARG => {
let mut tx = unsafe { Box::from_raw(tx) };
400d5965: 116172 s32i a7, a1, 68
400d5968: 180c movi.n a8, 1
let _ = tx.send(Err(Error::InvalidName));
400d596a: 0c5182 s16i a8, a1, 24
400d596d: 18c1b2 addi a11, a1, 24
400d5970: eaa881 l32r a8, 400d0410 <_stext+0x3f0> (400d2cd4 <async_oneshot::sender::Sender<T>::send>)
400d5973: 07ad mov.n a10, a7
400d5975: 0008e0 callx8 a8
400d5978: 44c1a2 addi a10, a1, 68
400d597b: eaa681 l32r a8, 400d0414 <_stext+0x3f4> (400d4cb8 <core::ptr::drop_in_place<alloc::boxed::Box<async_oneshot::sender::Sender<core::result::Result<esp_idf_sys::bindings::_ip_addr,llvm_sign_extension::Error>>>>>)
400d597e: 0008e0 callx8 a8
v => {
log::error!("crash: {v}");
panic!("Unexpected result from DNS resolution system.");
}
}
}
400d5981: f01d retw.n
v => {
400d5983: d1a9 s32i.n a10, a1, 52
400d5985: e9f481 l32r a8, 400d0158 <_stext+0x138> (3ffb368c <log::MAX_LOG_LEVEL_FILTER>)
400d5988: 0888 l32i.n a8, a8, 0
log::error!("crash: {v}");
400d598a: 048816 beqz a8, 400d59d6 <resolve_raw+0xd2>
400d598d: 080c movi.n a8, 0
400d598f: a189 s32i.n a8, a1, 40
400d5991: 170c movi.n a7, 1
400d5993: 7179 s32i.n a7, a1, 28
400d5995: eaa081 l32r a8, 400d0418 <_stext+0x3f8> (3f4006a8 <_flash_rodata_start+0x588>)
400d5998: 6189 s32i.n a8, a1, 24
400d599a: 9179 s32i.n a7, a1, 36
400d599c: 38c182 addi a8, a1, 56
400d599f: 8189 s32i.n a8, a1, 32
400d59a1: ea9e81 l32r a8, 400d041c <_stext+0x3fc> (400e5534 <core::fmt::num::imp::<impl core::fmt::Display for i32>::fmt>)
400d59a4: f189 s32i.n a8, a1, 60
400d59a6: 34c182 addi a8, a1, 52
400d59a9: e189 s32i.n a8, a1, 56
400d59ab: ea9da1 l32r a10, 400d0420 <_stext+0x400> (3f4006b0 <_flash_rodata_start+0x590>)
400d59ae: e9ed81 l32r a8, 400d0164 <_stext+0x144> (4014d494 <log::__private_api::loc>)
400d59b1: 0008e0 callx8 a8
400d59b4: 1561a2 s32i a10, a1, 84
400d59b7: 381c movi.n a8, 19
400d59b9: 146182 s32i a8, a1, 80
400d59bc: e9eb91 l32r a9, 400d0168 <_stext+0x148> (3f400614 <_flash_rodata_start+0x4f4>)
400d59bf: 136192 s32i a9, a1, 76
400d59c2: 126182 s32i a8, a1, 72
400d59c5: 116192 s32i a9, a1, 68
400d59c8: 18c1a2 addi a10, a1, 24
400d59cb: 44c1c2 addi a12, a1, 68
400d59ce: e9e781 l32r a8, 400d016c <_stext+0x14c> (400dab54 <log::__private_api::log>)
400d59d1: 07bd mov.n a11, a7
400d59d3: 0008e0 callx8 a8
400d59d6: 0041f0 break 1, 15 |
I've updated my bug repo to use automatically generated bindings for the main FFI call, reducing the possibility of error on my previous manual function declaration of |
Hmm, I do not see problem here. After call to 400d592a: eab881 l32r a8, 400d040c <_stext+0x3ec> (400f0998 <dns_gethostbyname_addrtype>)
400d592d: 02ad mov.n a10, a2
400d592f: 06bd mov.n a11, a6
400d5931: 07dd mov.n a13, a7
400d5933: 0008e0 callx8 a8
400d5936: 087c movi.n a8, -16
//log::info!("ret: {ret}");
// -5 gets "sign-extended" into 251 when doing i8 as i32
// Codegen bug ? https://github.com/espressif/llvm-project/issues/102
match sign_extender(ret) {
400d5938: 291a87 beq a10, a8, 400d5965 <resolve_raw+0x61>
400d593b: b87c movi.n a8, -5 Looks like the problem is in how that value is passed from |
@gerekon Using the automatically generated binding made no change. |
Ahh, I see. Then need to look at |
|
It is my understanding that a2 gets loaded with 0x000000fb. Later, since it is a call8, the return value ends up in a10 and gets compared:
From my limited understanding of Xtensa assembly, it would seem that it gets compared against 0xfffffffb that gets loaded into a8. The comparison then ends up non-matching and chaos ensues. |
From section 8.1.4 of the Xtensa ISA manual:
So, according to this, the C code is returning the value as if it was a u8 (zero extended) and this would be the source of the problem. |
I should have quoted 8.1.5 instead, but it's the same principle:
|
Mr Obvious here. Just to mention that the ESP-IDF C code is (still) built with xtensa-GCC. Not sure where the ESP-IDF folks are with the effort to make ESP-IDF buildable and operable with clang. So we have two backends in play here: xtensa-GCC for the ESP-IDF C code, and xtensa-llvm for Rust. |
@ivmarkov Yes, so it seems that the xtensa-GCC folks should have a look at this. I'd venture a guess that both GCC and LLVM want to be sticking to the ISA manual calling convention with regards to signed byte value returns. Meanwhile, I'm looking at the C code right now to see if it somehow ends up with an accidental u8 return type when compiled. |
I checked that with Clang and it worked OK. But here we need to compile program calling C function from lib/obj compiled with GCC :-). |
Which should still work as both compilers need to agree on the calling convention. This is not just a problem for Rust "esp-idf"-based crates calling into GCC-compiled ESP-IDF. |
I remember running into the same difference between calling conventions between GCC and xcc (Cadence proprietary compiler) some time in 2016 — whether |
#include <stdint.h>
int8_t secret_code(void)
{
return -5;
} Compiling it with GCC gives:
If this gets patched out, would it be a flag day type thing ? You would essentially need to not mix code compiled with an older compiler. |
https://github.com/espressif/gcc does not have issues enabled, does anybody know the proper place to report a bug on xtensa-gcc ? |
@jothan No need to open the issue in espressif/gcc, (almost) all the relevant people from Espressif are already participating here. |
I'm in good hands then, thanks ! |
cc @Lapshin |
I agree, it looks like the code generated by the gcc does not match the ABI. It is working with the other code generated by the gcc because it does zero/sign extension every time it uses the values that are narrower than a register.
Unless that zero/sign extension on use is removed the code compiled in accordance with the ABI should still interoperate with the old code. This is somewhat related to the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116467 |
I'm relieved there is a way forward, and that the code generated by GCC extends values as needed and is therefore robust as to what it accepts. Should LLVM match this behaviour (extend on demand) ? Adding an instruction or two to ignore the upper bits seems like a decent compromise to avoid undefined behaviour. Is there a "normal" way to parametrize calling conventions for LLVM ? Should there be a separate calling convention to enable this behaviour ? |
Checklist
How often does this bug occurs?
always
Expected behavior
I'm doing some work with FFI using esp-idf-svc and related crates. I am calling dns_gethostbyname_addrtype (from the TCP-IP thread), the return value is an i8 for all intents and purposes. I am then matching this return code with constants such as esp-idf-sys::err_enum_t_ERR_INPROGRESS (-5) that are i32.
Actual behavior (suspected bug)
Using the latest esp-rs 1.81 toolchain, ret_i32 ends up with a value of 251, as if I did (ret as u8 as i32) instead of -5.
Converting an i8 to an i32 in a constant context gives the expected result. I have not yet managed to minify this problem into a smaller case.
Error logs or terminal output
No response
Steps to reproduce the behavior
(tentative, unknown if this affects all similar cases)
Project release version
esp-rs 1.81 toolchain
System architecture
Intel/AMD 64-bit (modern PC, older Mac)
Operating system
Linux
Operating system version
Debian Trixie
Shell
Bash
Additional context
No response
The text was updated successfully, but these errors were encountered: