Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract and parse unexporsed attributes #459

Closed
wants to merge 1 commit into from

Conversation

flier
Copy link
Contributor

@flier flier commented Jan 29, 2017

As you know, C/C++ support a lot of extentions base on the attributes. I'm working on a patch to extract and parse those unexporsed attributes in LLVM.

Until now, it could extract attribute with Vec<cexpr::token::Token>, but only parse and translate unused, cold and deprecated attribute, maybe support aligned later when RFC 1358 ready.

For example

struct foo { int x[2] __attribute__ ((unused, aligned (8), deprecated)); };

Will be translate to

#[repr(C)]
#[derive(Debug, Copy)]
pub struct foo {
    #[deprecated]
    #[doc = "__attribute__(aligned(8))"] // should be #[repr(align=8)]
    #[allow(dead_code)]
    pub x: [::std::os::raw::c_int; 2usize],
}

Besides, I prefer to add a global helper functions for constructor and destructor attributes, seems Rust doesn't care those, and cause some library doesn't work very well

@highfive
Copy link

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

Copy link
Contributor

@emilio emilio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions regarding this change:

  • What's the actual advantage (aside from documentation) right now with this patch? Do you plan to expand it? What's the whole purpose of tracking all these attributes that rust doesn't understand?

  • Does this work when the attribute is the expansion of a macro? (I believe it should, but just to be sure). Please test this use case too.

  • I usually use tokenization as a last resort approach to anything. It uses to be a relatively poor solution, and indeed tokenization was buggy and returns off-by-one values in old versions of libclang (I fixed this in llvm-mirror/clang@3b61b92). Why using it here instead of adding APIs to libclang for the stuff we can handle? Generally libclang is really receptive with patches to expand it.

@@ -49,6 +54,11 @@ impl Enum {
&self.variants
}

/// The special attributes of enumeration
pub fn attributes(&self) -> &Vec<Attribute> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: &[Attribute], here and everywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • My original intent is to solve aligned and constructor/destructor attributes, because when I use bindgen with some libraries, I have to patch the generated binding files. So, I think there should be a more general mechanism to solve those Unexposed Attribute.
    For the aligned struct, I realized there are not a perfect solution for Rust before RFC 1358, but I will submit a workaound base on padding fields later. (in same pull request will be ok?)
    Those document attributes just an example about the patch, because I need your review and comments about the mechanism itself, before I submit all those changes.

  • The expansion of a macro doesn't work, because the parser failed to resolve __attribute__

#define __rte_packed __attribute__((__packed__))

struct rte_memseg {
	uint32_t nchannel;          /**< Number of channels. */
	uint32_t nrank;             /**< Number of ranks. */
} __rte_packed;

DEBUG:bindgen::ir::context: builtin_or_resolved_ty: Type(, kind: Invalid, decl: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None), canon: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None)), Some(Cursor(__rte_packed kind: macro definition, loc: tests/headers/attributes.hpp:33:9, usr: Some("c:attributes.hpp@704@macro@__rte_packed"))), None
DEBUG:bindgen::ir::context: Not resolved, maybe builtin?
DEBUG:bindgen::ir::context: builtin_or_resolved_ty: Type(, kind: Invalid, decl: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None), canon: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None)), Some(Cursor(__rte_packed kind: macro definition, loc: tests/headers/attributes.hpp:33:9, usr: Some("c:attributes.hpp@704@macro@__rte_packed"))), None
DEBUG:bindgen::ir::context: Not resolved, maybe builtin?
DEBUG:bindgen::ir::ty: from_clang_ty: ItemId(515), ty: Type(, kind: Invalid, decl: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None), canon: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None)), loc: Some(Cursor(__rte_packed kind: macro definition, loc: tests/headers/attributes.hpp:33:9, usr: Some("c:attributes.hpp@704@macro@__rte_packed")))
DEBUG:bindgen::ir::ty: currently_parsed_types: []
WARN:bindgen::ir::ty: invalid type Type(, kind: Invalid, decl: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None), canon: Cursor( kind: NoDeclFound, loc: builtin definitions, usr: None))
DEBUG:bindgen::ir::item: Unhandled cursor kind CXCursorKind(502): Cursor(__rte_packed kind: macro expansion, loc: tests/headers/attributes.hpp:60:3, usr: None)

  • I choose to use cexpr because some attribute support use expression as argument
struct align {
	int foo;
} __attribute__((aligned(2 * sizeof(long long))));

I plan to use cexpr to evalute it later, like clang did

static void handleAlignedAttr(Sema &S, Decl *D, const AttributeList &Attr) {
  // check the attribute arguments.
  if (Attr.getNumArgs() > 1) {
    S.Diag(Attr.getLoc(), diag::err_attribute_wrong_number_arguments)
      << Attr.getName() << 1;
    return;
  }

  if (Attr.getNumArgs() == 0) {
    D->addAttr(::new (S.Context) AlignedAttr(Attr.getRange(), S.Context,
               true, nullptr, Attr.getAttributeSpellingListIndex()));
    return;
  }

  Expr *E = Attr.getArgAsExpr(0);
  if (Attr.isPackExpansion() && !E->containsUnexpandedParameterPack()) {
    S.Diag(Attr.getEllipsisLoc(),
           diag::err_pack_expansion_without_parameter_packs);
    return;
  }

  if (!Attr.isPackExpansion() && S.DiagnoseUnexpandedParameterPack(E))
    return;

  if (E->isValueDependent()) {
    if (const auto *TND = dyn_cast<TypedefNameDecl>(D)) {
      if (!TND->getUnderlyingType()->isDependentType()) {
        S.Diag(Attr.getLoc(), diag::err_alignment_dependent_typedef_name)
            << E->getSourceRange();
        return;
      }
    }
  }

  S.AddAlignedAttr(Attr.getRange(), D, E, Attr.getAttributeSpellingListIndex(),
                   Attr.isPackExpansion());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we do have the information of the alignment already (it's on the type's layout), why do we need to rely on parsing the attribute ourselves?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what we have would be the equivalent of calling alignof(T), which I think it's all you need for this work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, for the the struct layout, alignof(T) have enough information for most of scenes

On the other hand, sometime we need field layout beyond its type.

For example, when parse a packet header with fixed size prefix,

struct header
{
    char proto;
    unsigned int size __attribute__ ((packed));
    unsigned char data[] __attribute__ ((aligned(8)));
} __attribute__ ((aligned));

We hard to generate the right padding bytes without knowledge of the field layout.

Another option seems could use clang_Cursor_getOffsetOfField and add a offset property to the field. (I'm not familiar with LLVM API, just found it minutes ago)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, copied diff was cut, here is the full diff

diff --git a/src/clang.rs b/src/clang.rs
index 9cf5143..9af4b5c 100644
--- a/src/clang.rs
+++ b/src/clang.rs
@@ -475,6 +475,17 @@ impl Cursor {
         unsafe { clang_CXXField_isMutable(self.x) != 0 }
     }

+    /// Get the offset of the field represented by the Cursor.
+    pub fn offset_of_field(&self) -> Result<usize, LayoutError> {
+        let offset = unsafe { clang_Cursor_getOffsetOfField(self.x) };
+
+        if offset < 0 {
+            Err(LayoutError::from(offset as i32))
+        } else {
+            Ok(offset as usize)
+        }
+    }
+
     /// Is this cursor's referent a member function that is declared `static`?
     pub fn method_is_static(&self) -> bool {
         unsafe { clang_CXXMethod_isStatic(self.x) != 0 }
diff --git a/src/ir/comp.rs b/src/ir/comp.rs
index bad661d..e511b51 100644
--- a/src/ir/comp.rs
+++ b/src/ir/comp.rs
@@ -102,6 +102,8 @@ pub struct Field {
     bitfield: Option<u32>,
     /// If the C++ field is marked as `mutable`
     mutable: bool,
+    /// The offset of the field
+    offset: usize,
 }

 impl Field {
@@ -111,7 +113,8 @@ impl Field {
                comment: Option<String>,
                annotations: Option<Annotations>,
                bitfield: Option<u32>,
-               mutable: bool)
+               mutable: bool,
+               offset: usize)
                -> Field {
         Field {
             name: name,
@@ -120,6 +123,7 @@ impl Field {
             annotations: annotations.unwrap_or_default(),
             bitfield: bitfield,
             mutable: mutable,
+            offset: offset,
         }
     }

@@ -152,6 +156,11 @@ impl Field {
     pub fn annotations(&self) -> &Annotations {
         &self.annotations
     }
+
+    /// The offset of the field
+    pub fn offset(&self) -> usize {
+        self.offset
+    }
 }

 impl CanDeriveDebug for Field {
@@ -532,7 +541,7 @@ impl CompInfo {
         cursor.visit(|cur| {
             if cur.kind() != CXCursor_FieldDecl {
                 if let Some((ty, _)) = maybe_anonymous_struct_field {
-                    let field = Field::new(None, ty, None, None, None, false);
+                    let field = Field::new(None, ty, None, None, None, false, 0);
                     ci.fields.push(field);
                 }
                 maybe_anonymous_struct_field = None;
@@ -555,7 +564,8 @@ impl CompInfo {
                                                        None,
                                                        None,
                                                        None,
-                                                       false);
+                                                       false,
+                                                       0);
                                 ci.fields.push(field);
                             }
                         }
@@ -572,6 +582,7 @@ impl CompInfo {
                     let annotations = Annotations::new(&cur);
                     let name = cur.spelling();
                     let is_mutable = cursor.is_mutable_field();
+                    let offset = cur.offset_of_field().unwrap(); // TODO

                     // Name can be empty if there are bitfields, for example,
                     // see tests/headers/struct_with_bitfields.h
@@ -585,7 +596,8 @@ impl CompInfo {
                                            comment,
                                            annotations,
                                            bit_width,
-                                           is_mutable);
+                                           is_mutable,
+                                           offset);
                     ci.fields.push(field);

                     // No we look for things like attributes and stuff.
@@ -748,7 +760,7 @@ impl CompInfo {
         });

         if let Some((ty, _)) = maybe_anonymous_struct_field {
-            let field = Field::new(None, ty, None, None, None, false);
+            let field = Field::new(None, ty, None, None, None, false, 0);
             ci.fields.push(field);
         }

Copy link
Contributor Author

@flier flier Jan 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang_Type_getOffsetOf and clang_Cursor_getOffsetOfField seems use similar implementation

long long clang_Type_getOffsetOf(CXType PT, const char *S) {
  // check that PT is not incomplete/dependent
  CXCursor PC = clang_getTypeDeclaration(PT);
  long long Error = validateFieldParentType(PC,PT);
  if (Error < 0)
    return Error;
  if (!S)
    return CXTypeLayoutError_InvalidFieldName;
  // lookup field
  ASTContext &Ctx = cxtu::getASTUnit(GetTU(PT))->getASTContext();
  IdentifierInfo *II = &Ctx.Idents.get(S);
  DeclarationName FieldName(II);
  const RecordDecl *RD =
        dyn_cast_or_null<RecordDecl>(cxcursor::getCursorDecl(PC));
  // verified in validateFieldParentType
  RD = RD->getDefinition();
  RecordDecl::lookup_result Res = RD->lookup(FieldName);
  // If a field of the parent record is incomplete, lookup will fail.
  // and we would return InvalidFieldName instead of Incomplete.
  // But this erroneous results does protects again a hidden assertion failure
  // in the RecordLayoutBuilder
  if (Res.size() != 1)
    return CXTypeLayoutError_InvalidFieldName;
  if (const FieldDecl *FD = dyn_cast<FieldDecl>(Res.front()))
    return Ctx.getFieldOffset(FD);
  if (const IndirectFieldDecl *IFD = dyn_cast<IndirectFieldDecl>(Res.front()))
    return Ctx.getFieldOffset(IFD);
  // we don't want any other Decl Type.
  return CXTypeLayoutError_InvalidFieldName;
}

long long clang_Cursor_getOffsetOfField(CXCursor C) {
  if (clang_isDeclaration(C.kind)) {
    // we need to validate the parent type
    CXCursor PC = clang_getCursorSemanticParent(C);
    CXType PT = clang_getCursorType(PC);
    long long Error = validateFieldParentType(PC,PT);
    if (Error < 0)
      return Error;
    // proceed with the offset calculation
    const Decl *D = cxcursor::getCursorDecl(C);
    ASTContext &Ctx = cxcursor::getCursorContext(C);
    if (const FieldDecl *FD = dyn_cast_or_null<FieldDecl>(D))
      return Ctx.getFieldOffset(FD);
    if (const IndirectFieldDecl *IFD = dyn_cast_or_null<IndirectFieldDecl>(D))
      return Ctx.getFieldOffset(IFD);
  }
  return -1;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the explanation is that getFieldOffset returns the offset in bits, according to grepping LLVM's source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will submit another pull request base on LLVM API later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alignment solution has submitted #468 base on the LLVM API, if that PR could be merged, this one could be simplify, only for document improvement.

@@ -1,3 +1,6 @@
#include <stdint.h>
#include <stddef.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't include system headers in tests since bindgen will produce different output depending on the content of them, so tests will never pass. If you definitely need them, add another file to the bindgen-integration crate.

@bors-servo
Copy link

☔ The latest upstream changes (presumably #443) made this pull request unmergeable. Please resolve the merge conflicts.

@bors-servo
Copy link

☔ The latest upstream changes (presumably #471) made this pull request unmergeable. Please resolve the merge conflicts.

@emilio
Copy link
Contributor

emilio commented Feb 13, 2017

@flier is this still needed/wanted? As I said, I'd like to know a bit more about the rationale for this, since apart from alignment which was fixed in #468 I don't see an immediate benefit from this.

@flier
Copy link
Contributor Author

flier commented Feb 14, 2017

@flier is this still needed/wanted? As I said, I'd like to know a bit more about the rationale for this, since apart from alignment which was fixed in #468 I don't see an immediate benefit from this.

@emilio Yes, my major requirement is to solve struct alignment issue, since #468 has been merged, this PR only provides some document improvements. So, its depend on your decision.

@emilio
Copy link
Contributor

emilio commented May 24, 2017

I don't think it's worth to have this open right now.

I see a use case for it combined with the ParseCallbacks stuff (for example, you could use them to generate actual rust annotations, like #[must_use], or combined with custom derive...).

@upsuper, do you think this would be a nice way to fix the addreffed stuff? If we add a custom attribute to the functions/parameters, we can audit it.

This would need a pretty big rebase though, so closing for now.

@emilio emilio closed this May 24, 2017
@upsuper
Copy link
Contributor

upsuper commented May 24, 2017

I think it would be helpful to add it to ParseCallbacks somehow, and it should probably take the #[must_use] case and this alignment case into consider. (Although we are probably not going to use addrefed stuff from bindings due to the ABI issue...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants