add GAT layer #3

Open
rkurchin opened this issue Aug 24, 2020 · 6 comments

@rkurchin (Member)

I'd like to implement a graph attention mechanism a la this paper.
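
For reference (paraphrasing the paper from memory, so double-check against the original), the per-edge attention coefficients in eq. 3 and the subsequent aggregation are

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}\big(\vec{a}^{\,\top}[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j]\big)\big)}{\sum_{k \in \mathcal{N}_i}\exp\big(\mathrm{LeakyReLU}\big(\vec{a}^{\,\top}[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_k]\big)\big)}, \qquad \vec{h}_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i}\alpha_{ij}\,\mathbf{W}\vec{h}_j\Big)$$

with multi-head outputs concatenated (or averaged at the final layer).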

@rkurchin added the enhancement (New feature or request) label Aug 24, 2020
@rkurchin self-assigned this Aug 24, 2020
@rkurchin (Member Author)

Question: Is there a way to do this for input graphs with variable size (number of nodes)?

@rkurchin (Member Author) commented Aug 24, 2020

It seems that they do so in this paper (CODE HERE). I had trouble figuring this out (they also have the extra complication of the clustering step), but this seems to be a way to do it. My best summary of the procedure (a rough code sketch follows the list below):

NOTATION:

number of attention heads: h
number of input features per node: l
number of output features per node: m

LAYER PROPERTIES

a_l, a_r (the result of splitting the concatenated attention vector from eq. 3 of Velickovic et al. into two separate parts): [1 x h x m]
dense-layer weight matrix W: [l x mh]

LAYER INPUTS:

graph with n nodes
feature matrix feat_in: [n x l]

LAYER ACTIONS (my commentary in parentheses):

  1. apply dense layer: feat = feat_in x W: [n x l] x [l x mh] => [n x mh], reshape to [n x h x m] (this is the usual dealio)
  2. compute attention coefficients: element-wise multiply feat by a_l and by a_r, yielding e_l and e_r: [n x h x m] * [1 x h x m] => [n x h x m] (for an undirected graph, the left and right parts are the same, but there are these updates of graph.srcdata and graph.dstdata that I'm still puzzling through; I don't think they break the symmetry, though...)
  3. sum this along the final dimension and unsqueeze: [n x h x 1] for each of e_l and e_r (why...?)
  4. compute edge attention: the message on each edge is the sum of the source node's e_l and the destination node's e_r (again, for an undirected graph we've just doubled it)
  5. apply the (LeakyReLU) activation to these messages, then softmax over each node's incoming edges, still storing the results as messages on the edges (fine, basically just a normalization)
  6. pass messages along edges to each node, multiplying features (element-wise) by attention: [n x h x m] * [n x h x m], then sum along the first per-node dimension, which I think is h, so the end result is [n x m] as desired (so much element-wise multiplication...)
  7. apply output activation
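
To make the shapes concrete, here's a rough NumPy sketch of steps 1-7. This is my own scaffolding, not the DGL implementation: the function names (`gat_layer`, `leaky_relu`), the edge-list representation, the ReLU output activation, and the head-averaging at the end are all placeholder choices; only the tensor names/shapes (`feat_in`, `W`, `a_l`, `a_r`, `e_l`, `e_r`, n, h, l, m) follow the notation above.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(feat_in, W, a_l, a_r, edges):
    """Shape-level sketch of steps 1-7 above (single layer, dense NumPy, no autograd).

    feat_in : [n, l]    input node features
    W       : [l, h*m]  dense-layer weights
    a_l, a_r: [1, h, m] attention parameters (the two halves of the eq. 3 vector)
    edges   : list of (src, dst) pairs; an undirected graph gets both directions,
              and every node should have at least one incoming edge (e.g. a self-loop)
    """
    n = feat_in.shape[0]
    h, m = a_l.shape[1], a_l.shape[2]

    # 1. dense layer, then reshape: [n, l] @ [l, h*m] -> [n, h, m]
    feat = (feat_in @ W).reshape(n, h, m)

    # 2.-3. attention coefficients: element-wise multiply, sum over the feature dim -> [n, h, 1]
    e_l = (feat * a_l).sum(axis=-1, keepdims=True)
    e_r = (feat * a_r).sum(axis=-1, keepdims=True)

    # 4.-5. per-edge attention: activation of e_l[src] + e_r[dst],
    #       then softmax over each destination node's incoming edges
    src = np.array([s for s, _ in edges])
    dst = np.array([d for _, d in edges])
    e = np.exp(leaky_relu(e_l[src] + e_r[dst]))  # [n_edges, h, 1]
    denom = np.zeros((n, h, 1))
    np.add.at(denom, dst, e)                     # sum of exp() per destination node
    alpha = e / denom[dst]                       # normalized attention weight per edge

    # 6. message passing: weight source features by attention, sum into destination nodes
    out = np.zeros((n, h, m))
    np.add.at(out, dst, alpha * feat[src])

    # 7. output activation (placeholder: ReLU) and reduce over heads -> [n, m]
    return np.maximum(out.mean(axis=1), 0.0)

# tiny usage example: a 3-node path graph 0-1-2, with both directions per bond
rng = np.random.default_rng(0)
n, l, h, m = 3, 4, 2, 5
feat_in = rng.normal(size=(n, l))
W, a_l, a_r = rng.normal(size=(l, h * m)), rng.normal(size=(1, h, m)), rng.normal(size=(1, h, m))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(gat_layer(feat_in, W, a_l, a_r, edges).shape)  # (3, 5)
```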

What I wonder...

Does this end up being different from a convolution? If so, how?

Since the attention is trainable per-head and per-output-feature, what are we really doing? I think the message passing should somehow encode structure, since a node with more neighbors will have a larger sum of input stuff, but how would the attention tensors ever "learn" about that in order to weight things appropriately?

@rkurchin added the longterm (longer-term idea, likely won't be implemented in the near term) label and removed the enhancement (New feature or request) label Aug 31, 2020
@rkurchin (Member Author)

Summary/my takeaways from a chat with Shaojie about this just now:

  • can think of this as a fully-trainable, weighted convolution: how much prior information about graph structure/connectivity is assumed is totally up to the user
  • because of this, ALL positional information MUST be encoded in the feature vector (or imposed as strict conditions/masking on the attention weights; see the sketch after this list) if it is important at all
  • there also must be a consistent indexing scheme for nodes such that if the same structure with the same node labels is fed in, an identical set of index labels comes out
  • I don't think this is necessarily able to do exactly what I had envisioned "out-of-the-box," namely discerning patterns about different local structures (i.e. connectivity patterns) in a graph; or at least not without combining it with something else like EigenPooling or graph clustering and subsequent labeling of subgraphs/supernodes with information about their structure
  • NB that training of self-attention can be unstable/sensitive
  • definitely a powerful technique that could add a lot, but a lot of things would need to be updated about how graph-building and featurization are done, so probably not a short-term priority
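
On the masking point above: a standard way to impose structure as a hard constraint (my illustration, not anything from the papers or DGL code discussed here; `masked_softmax` and the dense [n x n] logit layout are assumptions for the sketch) is to set disallowed attention logits to -inf before the softmax:

```python
import numpy as np

def masked_softmax(logits, adj):
    """logits: [n, n] raw attention scores; adj: [n, n] boolean adjacency (True = allowed pair).
    Disallowed pairs get -inf, so they receive exactly zero attention weight.
    Assumes every row has at least one True entry (e.g. include self-loops)."""
    masked = np.where(adj, logits, -np.inf)
    ex = np.exp(masked - masked.max(axis=-1, keepdims=True))  # stabilized softmax
    return ex / ex.sum(axis=-1, keepdims=True)
```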

@rkurchin mentioned this issue Oct 30, 2020
@thazhemadam (Member)

Bump. Now that we've made some pretty significant changes w.r.t. how we do graph building and featurization, it's probably a good time to discuss how we could go about this?

@rkurchin (Member Author) commented Oct 4, 2021

I would be very psyched for this to be implemented. Realistically, I won't personally have the bandwidth in the super near future (i.e. the next ~4-6 weeks), but I'm certainly happy to discuss and/or sketch this out in more detail for whoever ends up taking a crack at it, be that future me, current/future you, or someone else entirely! :)

@jonathanmfung

Here is another paper: "Crystal graph attention networks for the prediction of stable materials"

Paper: doi:10.1126/sciadv.abi7948
Python Code: hyllios/CGAT
Data: materialscloud:2021.222

I believe the novelty here is "replacing the precise bond distances with embeddings of graph distances", but correct me if I'm wrong.
