Node predictions using GCN #441

MarcosZ · 2023-11-21T20:39:30Z

MarcosZ
Nov 21, 2023

I'm attempting to solve the traveling salesman problem with the standard GCN implementation (I've read some papers on message-passing networks and wanted to try this) by training against the lengths of optimal tours.

I'm generating my training data, and many of the labels will be 0 because the corresponding actions (cities to visit) are no longer selectable given the current tour selection.

For example, in a 10-node configuration, I might have an optimal tour (computed by networkx) of [1,2,3,4,5,6,7,8,9]. For my first set of labels, I compute the optimal tour length from each of the 10 nodes as the starting point, so one starting node ends up more favorable than the rest. I then want to estimate the subtour lengths as each node is selected. Each datapoint is paired with an updated node attribute vector that tracks which indices have been visited: it starts as np.ones(10), and the entry for each selected vertex is set to 0.
My next training datapoint would hold the subtour lengths for [2,3,4,5,6,7,8,9], since 1 was already selected in the previous state.
The exact problem configuration isn't too important. What matters is that in this next datapoint the tour length at index 1 is 0, because that node can't be revisited. (It could instead be a very high value to signal that it should not be selected, since this is a minimization problem.) The final dataset for a single graph solution is a 2D array like this:

[num, num, num, num, num, num, num, num, num, num] # first label: all cities are still selectable
[0, num, num, num, num, num, num, num, num, num]
[0, 0, num, num, num, num, num, num, num, num]
[0, 0, 0, num, num, num, num, num, num, num]
[0, 0, 0, 0, num, num, num, num, num, num]
[0, 0, 0, 0, 0, num, num, num, num, num]
[0, 0, 0, 0, 0, 0, num, num, num, num]
[0, 0, 0, 0, 0, 0, 0, num, num, num]
[0, 0, 0, 0, 0, 0, 0, 0, num, num]
[0, 0, 0, 0, 0, 0, 0, 0, 0, num]
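The label layout above can be generated mechanically from the "visited" vector. The following is a minimal sketch, not the code used in this thread: `availability_masks`, the selection order 0..9, and the random `raw_lengths` values (standing in for `num`) are all assumptions made for illustration.

```python
import numpy as np

def availability_masks(order, n_nodes):
    """Return one mask per decision step: 1 = still selectable, 0 = already visited."""
    mask = np.ones(n_nodes)
    masks = []
    for node in order:
        masks.append(mask.copy())  # state *before* `node` is selected
        mask[node] = 0.0           # mark `node` as visited for later steps
    return np.stack(masks)

n_nodes = 10
order = list(range(n_nodes))       # hypothetical selection order 0, 1, ..., 9
masks = availability_masks(order, n_nodes)

# Placeholder values standing in for the real subtour lengths ("num" above).
rng = np.random.default_rng(0)
raw_lengths = rng.uniform(1.0, 5.0, size=(n_nodes, n_nodes))

# Visited entries become 0, reproducing the staircase pattern of the table.
labels = raw_lengths * masks
```

Each row of `masks` doubles as the node attribute vector for that step, so the labels and the attributes are zero in exactly the same positions by construction.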

My question is about a GCN configured as follows:

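from spektral.data import DisjointLoader
from spektral.models import GCN
from tensorflow.keras.losses import MSE
from tensorflow.keras.optimizers import Adam

# Linear output activation because the targets are real-valued tour lengths
model = GCN(n_labels=dataset.n_labels, output_activation='linear', dropout_rate=0.1, channels=64)
model.compile(
    optimizer=Adam(1e-3),
    loss=MSE,
)

# 80/20 train/validation split
cutoff = int(dataset.n_graphs * 0.8)

# Important to set node_level to True for multiple graphs
loader_tr = DisjointLoader(dataset[:cutoff], node_level=True)
loader_va = DisjointLoader(dataset[cutoff:], node_level=True)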

What is the best way to handle this intentionally "missing" data and tie it to the node attributes, which mark that an index is 0 for a reason (it has already been selected)? I suspect a linear output activation can't do this on its own, since the prediction can't simply drop to 0 from one state to the next. Is there a layer in Spektral that can fuse the node attributes with the output by multiplying them against the prediction vector? I'm worried the network won't learn properly because the 0's in the labels aren't being associated with the corresponding node attribute also being 0. I've trained a network on this data and it never learns to set those values to 0, although it does predict lower values for them than for the rest. Presumably this is because of the linear activation.
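One common way to handle intentionally missing targets, independent of any particular Spektral layer, is to multiply the predictions by the visited mask and to exclude masked entries from the loss. The snippet below is a NumPy sketch of that arithmetic only; `masked_mse` and the sample values are hypothetical, and wiring the same computation into a Keras loss or model is left as an assumption rather than shown.

```python
import numpy as np

def masked_mse(y_true, y_pred, mask):
    """Mean squared error computed over selectable nodes only (mask == 1)."""
    sq_err = (y_true - y_pred) ** 2 * mask
    return sq_err.sum() / np.maximum(mask.sum(), 1.0)  # avoid dividing by zero

mask = np.array([0.0, 0.0, 1.0, 1.0, 1.0])    # nodes 0 and 1 already visited
y_true = np.array([0.0, 0.0, 3.0, 4.0, 5.0])  # labels, 0 where visited
y_pred = np.array([1.5, 2.0, 3.0, 4.0, 6.0])  # raw linear outputs

plain = np.mean((y_true - y_pred) ** 2)       # also penalises visited nodes
masked = masked_mse(y_true, y_pred, mask)     # ignores visited nodes

# Multiplying by the mask forces visited entries to exactly 0 at prediction time.
masked_pred = y_pred * mask
```

With this split, the network only has to regress the lengths of selectable nodes, and the hard zeros come from the mask rather than from the learned weights.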

Just curious about how to leverage node attributes properly in a non-linear node prediction problem. Maybe a different activation function?

Open to any suggestions, and I can post more information if this isn't enough!

Thanks,
Marcos

MarcosZ · 2023-11-28T11:14:06Z

MarcosZ
Nov 28, 2023
Author

I think the strange behavior I'm seeing comes from the GCNConv layer. I'm now trying to replicate a GNN from a paper using the MessagePassing layer, because it supports sum aggregation. How can I tell what the equivalent aggregation is in GCNConv, since it's built on the Conv layer?
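For comparison, the standard GCN propagation rule aggregates neighbors through a symmetrically normalized adjacency with self-loops, D^-1/2 (A + I) D^-1/2, which behaves like a degree-weighted average rather than a plain sum. The NumPy sketch below contrasts the two on a tiny path graph; the helper name `gcn_propagation` and the toy graph are made up for illustration, and it assumes GCNConv is fed an adjacency normalized this way.

```python
import numpy as np

def gcn_propagation(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

# Path graph 0 - 1 - 2 with one scalar feature per node.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.0], [2.0], [4.0]])

sum_agg = adj @ x                   # sum over neighbors (no self-loop)
gcn_agg = gcn_propagation(adj) @ x  # normalized, self-loop included

print(sum_agg.ravel())
print(gcn_agg.ravel())
```

The sum grows with the number of neighbors, while the normalized version stays on the scale of the input features, which matters when the target is an unbounded quantity such as a tour length.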

0 replies

MarcosZ · 2023-12-01T06:11:53Z

MarcosZ
Dec 1, 2023
Author

I realized that the real issue was that I was using complete graphs paired with a GCN. I retrained using incomplete graphs generated by networkx and switched to GeneralGNN. This gave much more reproducible results, in line with what I had read in the literature.
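One plausible reason complete graphs work poorly with GCN-style propagation: on an unweighted complete graph, the normalized adjacency with self-loops is the same for every node, so every node receives the identical average of all features. The sketch below illustrates this under that unweighted assumption; the 5-node graphs and the `gcn_propagation` helper are hypothetical and not taken from the thread.

```python
import numpy as np

def gcn_propagation(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

n = 5
x = np.arange(n, dtype=float).reshape(-1, 1)  # features 0, 1, 2, 3, 4

# Complete graph: every node is connected to every other node.
complete = np.ones((n, n)) - np.eye(n)

# Sparse alternative: a ring where each node has two neighbors.
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = 1.0
    ring[(i + 1) % n, i] = 1.0

out_complete = gcn_propagation(complete) @ x  # every row is the global mean
out_ring = gcn_propagation(ring) @ x          # rows depend on local neighborhood

print(out_complete.ravel())
print(out_ring.ravel())
```

After one such step on the complete graph the nodes are indistinguishable, whereas the sparser graph keeps per-node differences that later layers can use.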

0 replies
