dk-series distance #243

Merged
merged 12 commits into from
Aug 13, 2019

Collaborator

@sdmccabe sdmccabe commented Aug 2, 2019

First pass at the dk-series (2k-series) distance. **Do not merge without
discussion.** This is essentially the `DegreeDivergence`, but instead of the
degree distribution it's the distribution of edges between degree-labelled
nodes. Some outstanding questions and concerns:

1. This is not memory-efficient because it uses NxN dense matrices.
2. Have I understood the dk-series correctly? That is, does the `dk2_series`
function return something meaningful?
3. I'm not sure if the dk-series is defined for directed graphs. For simplicity
I have coerced to undirected graphs.
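
To make question 2 concrete, here is a minimal sketch of what a 2k-series count could look like: for every undirected edge, record the sorted pair of its endpoints' degrees. This is an illustration, not the `dk2_series` code in this PR. `dk2_counts` is a hypothetical name, it takes a plain edge list rather than a networkx graph, and it assumes a simple graph with no self-loops or duplicate edges.

```python
from collections import Counter


def dk2_counts(edges):
    """Count undirected edges by the sorted degree pair of their endpoints.

    Hypothetical helper for illustration; assumes a simple undirected graph
    given as a list of (u, v) tuples with no self-loops or repeated edges.
    """
    # First pass: node degrees from the edge list.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Second pass: label each edge by the degrees of its two endpoints.
    counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((degree[u], degree[v])))
        counts[pair] += 1
    return dict(counts)


# Path graph 0-1-2-3: the end nodes have degree 1, the inner nodes degree 2.
path_edges = [(0, 1), (1, 2), (2, 3)]
path_dk2 = dk2_counts(path_edges)
```

Sorting each degree pair is one way to treat the graph as undirected, so an edge between a degree-1 and a degree-2 node is counted once under `(1, 2)`.
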
@sdmccabe sdmccabe marked this pull request as ready for review August 12, 2019 15:52
netrd/distance/__init__.py Outdated

"""

def dk2_series(G):
Collaborator

Being pedantic: can we put this outside the distance class? This is a prime candidate for refactoring if/when #174 ever becomes a thing.

netrd/distance/dk2_distance.py Outdated
D1 = np.zeros((N, N))
D2 = np.zeros((N, N))

for (i, j), k in G1_dk.items():
Collaborator

Is it worth it to look into making all of these use sparse matrices? Pretty sure COO matrices would speed this up by a bunch. Even with small N, COO matrices would avoid this loop.

Collaborator Author

I knew you would call me out on this! I'll look into it. How would it avoid the loop, though?

Collaborator

Uuuuh I guess it wouldn't avoid it, but at least we would be off-loading it to scipy, and I trust them to profile their loops.

Collaborator Author

OK, that's fine. I think it should be pretty straightforward to change.

Collaborator Author

I wound up using DOK matrices instead because it seemed most convenient. I'm open to reconsidering this if people have strong opinions about sparse matrix formats.
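
As a rough illustration of the two sparse options discussed in this thread, the sketch below fills a DOK matrix with an explicit loop and builds the same matrix in COO format by handing the index and value arrays to scipy in one call. The dictionary `dk` and the size `N` are made-up stand-ins for `G1_dk` and the real matrix dimension, not values from the PR.

```python
from scipy.sparse import coo_matrix, dok_matrix

# Made-up 2k-series counts keyed by degree pair (illustrative only).
dk = {(1, 2): 2, (2, 2): 1}
N = 4  # assumed matrix dimension, large enough to index every degree

# DOK: supports item assignment, so the Python loop stays but no dense
# N x N array is allocated.
D_dok = dok_matrix((N, N))
for (i, j), k in dk.items():
    D_dok[i, j] = k

# COO: pass values and (row, col) indices at once; the iteration over
# entries happens inside scipy rather than in a Python loop.
rows, cols = zip(*dk.keys())
D_coo = coo_matrix((list(dk.values()), (rows, cols)), shape=(N, N))
```

Either format stores only the nonzero degree pairs, which addresses the memory concern about dense NxN matrices raised in the PR description.
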

@leotrs
Collaborator

leotrs commented Aug 12, 2019

Awesome! Just curious: did you find a paper that uses this? If so, please add the reference at the top. If not, please add some general reference to dk series at the top.

(Curious how you decided to use JSD?)
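
For context on the JSD question, a Jensen-Shannon divergence between two normalized 2k-series distributions could be sketched as below. This is a generic textbook formulation with base-2 logarithms, not the code in this PR; `normalize` and `jensen_shannon_divergence` are hypothetical names, and whether netrd applies any further transformation to the result is not stated in this thread.

```python
from math import log2


def normalize(counts):
    """Turn a dict of nonnegative counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two distributions given as dicts.

    Keys missing from one distribution are treated as probability zero.
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {key: 0.5 * (p.get(key, 0.0) + q.get(key, 0.0)) for key in keys}

    def kl_to_mixture(a):
        # KL(A || M); zero-probability terms contribute nothing.
        return sum(a[key] * log2(a[key] / m[key]) for key in a if a[key] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Two made-up 2k-series count dictionaries (illustrative only).
p = normalize({(1, 2): 2, (2, 2): 1})
q = normalize({(1, 3): 3})
jsd_same = jensen_shannon_divergence(p, p)
jsd_disjoint = jensen_shannon_divergence(p, q)
```

With base-2 logarithms the value is bounded between 0 (identical distributions) and 1 (disjoint supports), which is one reason it is convenient for comparing distributions whose supports differ.
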

@sdmccabe
Collaborator Author

There's no paper (besides the dk-series papers); it's part of the graphend project.

@sdmccabe
Collaborator Author

> (Curious how you decided to use JSD?)

Gonna tag @jkbren in on this. He might also have thoughts on names.

@sdmccabe
Collaborator Author

I'm cool with merging. The outstanding issue is the name. I see a couple possibilities here:

  • dk2
  • dk2Distance
  • dk2Series

The second is fine with me; the third might be better? I don't like the first.

@sdmccabe
Collaborator Author

Other names raised:

  • dk2Divergence
  • JointDegreeDivergence

Another possibility is to make it a general dkSeries distance: if k==1, call DegreeDivergence; if k==2, run this code; for anything else, raise a NotImplementedError.
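
A minimal sketch of that dispatch, assuming the two existing distances are passed in as plain callables. `dk_series_dist`, `dist_1k`, and `dist_2k` are hypothetical names used only for illustration, not the netrd API.

```python
def dk_series_dist(G1, G2, k, dist_1k, dist_2k):
    """Dispatch a dk-series distance on k.

    dist_1k and dist_2k are stand-ins for the degree-distribution distance
    and the 2k-series distance; both take (G1, G2) and return a number.
    """
    if k == 1:
        # 1k-series: delegate to the degree-distribution distance.
        return dist_1k(G1, G2)
    if k == 2:
        # 2k-series: the distance introduced in this PR.
        return dist_2k(G1, G2)
    raise NotImplementedError(f"dk-series distance is not implemented for k={k}")
```

Delegating for k==1 rather than folding the two classes together keeps the degree-based distance available under its own name.
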

@leotrs
Collaborator

leotrs commented Aug 12, 2019

Let's do the last thing!

@sdmccabe
Collaborator Author

Should we merge DegreeDivergence into this, or just call it? My instinct is the latter, since people unfamiliar with the dk series might be confused.

@leotrs
Collaborator

leotrs commented Aug 12, 2019 via email

@sdmccabe sdmccabe changed the title from "dk2-series distance" to "dk-series distance" Aug 12, 2019
@sdmccabe
Collaborator Author

Can someone double-check the docs to make sure it all still reads right under the newly expanded goal of the module? Also, I ran the tests with k==1 and k==2 as the default argument and both passed.

@sdmccabe
Collaborator Author

If someone other than me ends up merging this, please squash and merge; this PR wound up being a ton of commits.

@leotrs leotrs merged commit 4d9c927 into netsiphd:master Aug 13, 2019
@sdmccabe sdmccabe deleted the dk-dist branch August 13, 2019 13:43
@sdmccabe
Collaborator Author

Leaving for my own reference: I will try to add some doc tweaks to clarify that d==1 is explicitly calling DegreeDivergence under the hood, and write a test that covers both values of d (#202).
