General doubts of node prediction model #352

ecastillot · 2022-03-16T15:03:40Z


Hi everyone,
I have several doubts about my node prediction model. I will first describe the problem and then ask my questions at the end.

Here is the structure of my graph. Each node has 360 features, corresponding to a time series of 360 samples with amplitudes between 0 and 1. Nodes are connected according to a specific condition. The important thing is that there are three types of edges, and each edge is weighted by the inverse of the distance between the two stations that recorded the time series. On the other hand, two nodes can only be linked if their time series were recorded on the same day. Finally, the goal is to predict, for each node, a label that is another time series of 360 samples with amplitudes between 0 and 1.

This was implemented as follows. Let "i" be the index of day "i". The graph for that day, G_i, is defined by [x_i, e_i, a_i] together with the target y_i, where:

x_i = (n_nodes, 360)_i -> 360 features for each node recorded on day "i".
e_i = (n_links, 3)_i -> 3 features for each edge, to distinguish between the edge types.
a_i = (n_nodes, n_nodes)_i -> the adjacency matrix, weighted by the inverse of the distance.
y_i = (n_nodes, 360)_i -> 360 is the length of the label.
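To make the shapes above concrete, here is a minimal NumPy sketch of one daily graph. It is only an illustration: the station coordinates, the fully connected topology and the random edge types are assumptions made for the example, not the real linking condition, and the edge features are simply listed in row-major order of the non-zero entries of a_i.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 4                                      # hypothetical number of stations on day i
coords = rng.uniform(0.0, 100.0, (n_nodes, 2))   # hypothetical station positions

# Node features and labels: one 360-sample series per node, amplitudes in [0, 1].
x_i = rng.uniform(0.0, 1.0, (n_nodes, 360))
y_i = rng.uniform(0.0, 1.0, (n_nodes, 360))

# Pairwise distances, then inverse-distance weights with a zero diagonal.
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
a_i = np.zeros_like(dist)
off_diag = ~np.eye(n_nodes, dtype=bool)
a_i[off_diag] = 1.0 / dist[off_diag]

# One-hot edge type for every non-zero entry of a_i, in row-major order.
rows, cols = np.nonzero(a_i)
edge_type = rng.integers(0, 3, size=rows.shape[0])  # hypothetical edge types
e_i = np.eye(3)[edge_type]

print(x_i.shape, a_i.shape, e_i.shape, y_i.shape)
```

With 4 fully connected nodes there are 12 directed links, so e_i has one row per non-zero entry of a_i.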

To use this in Spektral, I created a custom dataset as recommended here. The read method returns a list of graphs, one per day.

data = MyDataset(..., transforms=NormalizeAdj(symmetric=True))
F = data.n_node_features # Dimension of node features ->360
S = data.n_edge_features # Dimension of edge features -> 3
n_out = data.n_labels # Dimension of the target -> 360

Nodes of different days must not exchange information. Therefore, I am going to use DisjointLoader to load disconnected graphs.

data_tr = data[idx_tr]  # idx_tr is the list of indices of the graphs to load
loader_tr = DisjointLoader(data_tr,...,node_level=True)

To build the model, I first thought of applying traditional NN layers (Conv1D, NiN and LSTM) to the node features to extract high-level representations. The resulting node features are then passed to the GNN layers.

```
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import (BatchNormalization, Bidirectional, Conv1D,
                                     Dropout, Flatten, LSTM, MaxPooling1D)
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
from spektral.layers import ECCConv, GATConv

class MyModel(Model):
    def __init__(self):
        super().__init__()

    def call(self, inputs):
        # x.shape: (N_nodes, 360)   a.shape: (N_nodes, N_nodes)
        # e.shape: (N_edges, 3)     i.shape: (N_nodes,)
        # I think N_nodes is the sum of the nodes of the different graphs.
        # That would be a problem. How do I control N_nodes so that only
        # nodes of the same day are used?
        x, a, e, i = inputs
        x = tf.expand_dims(x, axis=1)  # (N_nodes, 1, 360)
        x = Conv1D(....)(x)  # CNN
        x = MaxPooling1D()(x)
        x = Bidirectional(LSTM(...),...)(x)  # RNN
        x = Conv1D(filters, 1 ,..)(x)  # NiN
        x = BatchNormalization()(x)
        x = Flatten()(x)  # The new features of the nodes

        x = ECCConv(32, activation="relu")([x, a, e])
        x = BatchNormalization()(x)
        x = Dropout(0.5)(x)
        x = GATConv(n_out, activation="sigmoid")([x, a])
        return x

model = MyModel()
optimizer = Adam(learning_rate=learning_rate)
model.compile(optimizer=optimizer, loss=BinaryCrossentropy(),
              metrics=["binary_accuracy"])
model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch,
          epochs=100)
```
When I run the model no errors happen but I'm not sure what I'm doing.

Doubts:

1. When I unpack x, a, e, i = inputs, do x, a and e come from disconnected graphs? I ask because, after the traditional NN blocks, I pass them to ECCConv, so I don't know whether disconnected graphs are sharing information, which is what I don't want. How do I know that the new embedding is fed only by connected nodes of the corresponding graph (i.e., nodes of the same day)? Nowhere am I using the index i that tells me which graph each node belongs to.
2. Am I connecting the traditional NNs and the GNNs correctly? Is "tf.expand_dims(x, axis=1)" the correct way to begin? 1D convolution on sequences expects a 3D input, but I am not sure about the new shape (N_nodes, 1, 360). Why not (1, N_nodes, 360)?
3. I don't know if ECCConv is a good choice, because it computes learning weights to predict edge features. But what happens to the adjacency matrix weights? In my graph these weights are important too, and I don't know if they are being considered.
4. For prediction I also use a DisjointLoader, loader_te = DisjointLoader(data_te,,node_level=True). To predict, I use predictions = model.predict(loader_te.load(), steps=loader_te.steps_per_epoch). How can I get the predictions for each day's graph?

Thank you for your support
Regards.

danielegrattarola (Maintainer) · 2022-03-16T15:52:50Z

Hey,

thanks for this very detailed question, it makes it really easy to answer.

So to answer point by point:

DisjointLoader will take the graphs in a batch (list) and combine them so that they are represented by their disjoint union. As a result, the adjacency matrix a has a non-zero entry at position (i, j) only if BOTH i and j belong to the same graph. You can print out the matrix and see that it is block-diagonal (see example here). This also means that message passing only happens between nodes that belong to the same graph: there is no exchange of information between different graphs because there is no edge (i, k) where i is in one graph and k in another.
If you don't trust the math or the implementation, to verify that this is actually happening you can try to create custom inputs (for example, a graph with all features equal to 1 and one with all features equal to 1000) and see the effect of convolution on the two graphs. It should be pretty evident.
It also makes sense that you are not using the batch index i anywhere since this is only required when doing pooling or global pooling. Here you are computing a node-to-node function so there's no need for the batch index, the separation between graphs is guaranteed by the adjacency matrix.
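The sanity check described above can be sketched in plain NumPy, without Spektral. This is only an illustration under stated assumptions: the two small graphs are made up, the block-diagonal matrix is built by hand rather than by DisjointLoader, and a simple mean over neighbours stands in for a real graph convolution.

```python
import numpy as np

# Two hypothetical daily graphs: 3 nodes and 2 nodes, fully connected, no self-loops.
a1 = np.ones((3, 3)) - np.eye(3)
a2 = np.ones((2, 2)) - np.eye(2)
n1, n2 = a1.shape[0], a2.shape[0]

# Disjoint union: block-diagonal adjacency, stacked node features, batch index.
a = np.zeros((n1 + n2, n1 + n2))
a[:n1, :n1] = a1
a[n1:, n1:] = a2
x = np.concatenate([np.full((n1, 1), 1.0), np.full((n2, 1), 1000.0)])
i = np.array([0] * n1 + [1] * n2)

# One step of mean aggregation over neighbours (stand-in for a graph convolution).
deg = a.sum(axis=1, keepdims=True)
out = (a @ x) / deg

print(a)
print(out.ravel())
```

If information leaked between the two graphs, the nodes of the first graph would end up with values far above 1. Here they stay at exactly 1 while the nodes of the second graph stay at 1000.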
I would say no, the dimension expansion should be on axis=-1. Conv1D expects inputs of shape (batch, steps, input_dim) and returns (batch, new_steps, filters) (see here).
However, you are passing to Conv1D a tensor of shape (batch, 1, 360), which means "a batch of time series of length 1 with 360 features" (note that batch == n_nodes as far as we're concerned). From your description, you have "a batch of time series of length 360 with 1-dimensional features", so the correct shape should be (batch, 360, 1).
You don't want (1, n_nodes, 360) because the meaning there would be "one time series of length n_nodes with 360 features".
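The three layouts can be compared with a short NumPy sketch. np.expand_dims is used here as a stand-in for tf.expand_dims, and the sliding window of size 3 is only an assumed kernel size, chosen to show which axis a 1D convolution moves along.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_nodes = 5  # hypothetical number of nodes in the disjoint batch
x = np.random.default_rng(0).uniform(size=(n_nodes, 360))

as_in_question = np.expand_dims(x, axis=1)   # (5, 1, 360): 1 step, 360 channels
as_suggested = np.expand_dims(x, axis=-1)    # (5, 360, 1): 360 steps, 1 channel
single_series = np.expand_dims(x, axis=0)    # (1, 5, 360): 1 series, 5 steps

# A kernel of size 3 slides along the steps axis (axis 1) of each node's series.
windows = sliding_window_view(as_suggested[..., 0], 3, axis=1)

print(as_in_question.shape, as_suggested.shape, single_series.shape)
print(windows.shape)
```

With the suggested layout every node keeps its own series and the windows run along the 360 time samples, which is the behaviour described above.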
Regarding ECC:

> it computes learning weights to predict edge features

No, it predicts weights to transform node features. The edge features remain unchanged.
In other words, while a message from node i to node j is typically computed as W @ x_i, ECC computes W as a function of the edge features, e.g., W(i, j) = f(e_ij), so the message becomes W(i, j) @ x_i.

And you are correct, ECC doesn't consider the edge weights of the adjacency matrix. If these are important, you should consider adding them as an extra edge attribute so that overall data.n_edge_features == 4.
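As a rough illustration of the message W(i, j) @ x_i and of the extra edge attribute, here is a NumPy sketch. It is not Spektral's ECCConv: the function ecc_message, the feature sizes and the single linear map theta that plays the role of f are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
f_in, f_out = 4, 2  # hypothetical input/output node feature sizes


def ecc_message(x_sender, e, theta):
    """Message W(e) @ x_sender, where W(e) comes from a linear map of e."""
    w = (e @ theta).reshape(f_out, f_in)
    return w @ x_sender


x_i = rng.normal(size=f_in)            # features of the sending node
edge_type = np.array([0.0, 1.0, 0.0])  # one-hot edge type (3 edge features)

# With 3 edge features the message cannot depend on the adjacency weight.
theta3 = rng.normal(size=(3, f_out * f_in))
msg = ecc_message(x_i, edge_type, theta3)

# With the inverse distance appended as a 4th edge feature, it can.
theta4 = rng.normal(size=(4, f_out * f_in))
msg_near = ecc_message(x_i, np.append(edge_type, 0.5), theta4)
msg_far = ecc_message(x_i, np.append(edge_type, 0.01), theta4)

print(msg.shape, msg_near, msg_far)
```

The two edges have the same type but different weights, and only the 4-feature version produces different messages for them.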
If you need individual graph predictions, it's easier to iterate over the loader manually:

loader_te = DisjointLoader(data_te, node_level=True, epochs=1, batch_size=1)
for data in loader_te:
   inputs, targets = data
   pred = model(inputs, training=False)
   # Do whatever with `pred`
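If a batch holds more than one graph, the predictions could also be split per graph using the batch index i that comes with the inputs. The sketch below uses NumPy with made-up values: the index array and the prediction matrix are hypothetical stand-ins for what the loader and the model would return.

```python
import numpy as np

# Hypothetical disjoint batch: 3 graphs with 2, 3 and 1 nodes.
i = np.array([0, 0, 1, 1, 1, 2])
pred = np.arange(6 * 360, dtype=float).reshape(6, 360)  # stand-in predictions

# One (n_nodes_in_graph, 360) array per graph, in batch order.
per_graph = [pred[i == g] for g in np.unique(i)]

for g, p in enumerate(per_graph):
    print(g, p.shape)
```

This relies on the rows of the prediction matrix being in the same order as the batch index, which holds for a node-to-node model like the one in this thread.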

Let me know if you need more help!


ecastillot (Author) · 2022-03-16T18:47:08Z

Hey @danielegrattarola. Thanks for your quick and detailed reply.

I understood almost everything. The only thing I don't understand is why ECC doesn't consider the weights of the adjacency matrix. How, then, can the model learn how important each neighbour is to a node?

If I understand your explanation, ECC computes the message from node i to node j as W(i, j) @ x_i, where W(i, j) is computed from the edge features e_ij. But it must still use the adjacency matrix to know which nodes are neighbours, whether it is weighted or not. If it is weighted, the layer could also learn which neighbours it should pay more attention to, and so aggregate the neighbour messages appropriately. Am I right?

1 reply

danielegrattarola Mar 16, 2022
Maintainer

You are right that in principle it could do that. It's just that this particular formulation from the paper does not use the weights of the adjacency matrix anywhere, so every neighbor has the same importance.

If you want to include that information, you can either re-implement the layer to compute a weighted aggregation of the messages using the weights of the adjacency matrix, or you can add the weights as an edge feature like I was saying above.
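The first option, a weighted aggregation of the messages, might look like the NumPy sketch below. It is a hedged illustration, not a re-implementation of ECCConv: the adjacency values are invented, and the random msgs tensor stands in for the per-edge messages W(i, j) @ x_i.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, f_out = 3, 2

# Hypothetical weighted adjacency (inverse distances), zero diagonal.
a = np.array([[0.0, 0.5, 0.1],
              [0.5, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

# msgs[j, i] stands in for the message sent from node i to node j.
msgs = rng.normal(size=(n_nodes, n_nodes, f_out))

# Plain sum over neighbours: the adjacency is only used as a 0/1 mask.
mask = (a > 0).astype(float)
unweighted = np.einsum("ji,jif->jf", mask, msgs)

# Weighted sum: each message is scaled by the corresponding adjacency weight.
weighted = np.einsum("ji,jif->jf", a, msgs)

print(unweighted)
print(weighted)
```

In the weighted version, a closer station (larger inverse distance) contributes more to the aggregated features of its neighbour.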

There is no "correct" way of doing things so if you need a particular feature it's usually OK to just implement it how you want it.

ecastillot · 2022-03-17T12:33:45Z

ecastillot
Mar 17, 2022
Author

Thanks @danielegrattarola. It is much clearer to me now.
As you suggest, I am going to add the weights as an edge feature.

