Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bfs iterator for multiple source nodes #382

Merged
merged 16 commits into from
Jun 26, 2024
69 changes: 30 additions & 39 deletions src/iterators/bfs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -35,15 +35,16 @@ end
BFSVertexIteratorState

`BFSVertexIteratorState` is a struct to hold the current state of iteration
in BFS which is needed for the `Base.iterate()` function. A queue is used to
keep track of the vertices which will be visited during BFS. Since the queue
can contains repetitions of already visited nodes, we also keep track of that
in a `BitVector` so that to skip those nodes.
in BFS which is needed for the `Base.iterate()` function. We use two vectors,
one for the current level nodes and one from the next level nodes to visit
the graph. Since the queue can contains repetitions of already visited nodes,
Tortar marked this conversation as resolved.
Show resolved Hide resolved
we also keep track of that in a `BitVector` so that to skip those nodes.
Tortar marked this conversation as resolved.
Show resolved Hide resolved
"""
mutable struct BFSVertexIteratorState
visited::BitVector
queue::Vector{Int}
neighbor_idx::Int
curr_level::Vector{Int}
next_level::Vector{Int}
node_idx::Int
n_visited::Int
end

Expand All @@ -58,14 +59,16 @@ First iteration to visit vertices in a graph using breadth-first search.
function Base.iterate(t::BFSIterator{<:Integer})
visited = falses(nv(t.graph))
visited[t.source] = true
return (t.source, BFSVertexIteratorState(visited, [t.source], 1, 1))
state = BFSVertexIteratorState(visited, [t.source], Int[], 0, 0)
return Base.iterate(t, state)
end

function Base.iterate(t::BFSIterator{<:AbstractArray})
visited = falses(nv(t.graph))
visited[first(t.source)] = true
state = BFSVertexIteratorState(visited, copy(t.source), 1, 1)
return (first(t.source), state)
curr_level = unique(s for s in t.source)
visited[curr_level] .= true
state = BFSVertexIteratorState(visited, curr_level, Int[], 0, 0)
return Base.iterate(t, state)
end

"""
Expand All @@ -74,36 +77,24 @@ end
Iterator to visit vertices in a graph using breadth-first search.
"""
function Base.iterate(t::BFSIterator, state::BFSVertexIteratorState)
graph, visited, queue = t.graph, state.visited, state.queue
while !isempty(queue)
if state.n_visited == nv(graph)
return nothing
end
# we visit the first node in the queue
node_start = first(queue)
if !visited[node_start]
visited[node_start] = true
state.n_visited += 1
return (node_start, state)
end
# which means we arrive here when the first node was visited.
neigh = outneighbors(graph, node_start)
if state.neighbor_idx <= length(neigh)
node = neigh[state.neighbor_idx]
# we update the idx of the neighbor we will visit,
# if it is already visited, we repeat
state.neighbor_idx += 1
if !visited[node]
push!(queue, node)
state.visited[node] = true
state.n_visited += 1
return (node, state)
# we fill nodes in this level
if state.node_idx == length(state.curr_level)
state.n_visited += length(state.curr_level)
state.n_visited == nv(t.graph) && return nothing
@inbounds for node in state.curr_level
Tortar marked this conversation as resolved.
Show resolved Hide resolved
for adj_node in outneighbors(t.graph, node)
if !state.visited[adj_node]
push!(state.next_level, adj_node)
state.visited[adj_node] = true
end
end
else
# when the first node and its neighbors are visited
# we remove the first node of the queue
popfirst!(queue)
state.neighbor_idx = 1
end
length(state.next_level) == 0 && return nothing
state.curr_level, state.next_level = state.next_level, empty!(state.curr_level)
state.node_idx = 0
end
# we visit all nodes in this level
state.node_idx += 1
@inbounds node = state.curr_level[state.node_idx]
Tortar marked this conversation as resolved.
Show resolved Hide resolved
return (node, state)
end
4 changes: 2 additions & 2 deletions test/iterators/bfs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
end
end
nodes_visited = collect(BFSIterator(g2, [1, 6]))
@test nodes_visited == [1, 2, 3, 6, 5, 7, 4]
@test nodes_visited == [1, 6, 2, 3, 5, 7, 4]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these tests are sensitive to the ordering at each level, which is an implementation detail, can you rework them to make them independent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure if the ordering on each level could be considered an implementation detail, if this is it we could speed-up the running time by 2x by ordering the level nodes (for cache locality reasons I presume), but this is actually wrong because you want to follow strictly how bfs works, which is to look for each of the neighbors of a certain node in the previous level and just then go to the next node of the previous level

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you consider two graph structures where the neighbors of each vertex make up the same (mathematical) set but are stored in different orders, the BFS algorithm will return different things for node_visited. And they will both be correct. So our tests should be agnostic to that

Copy link
Contributor Author

@Tortar Tortar Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I mean I think you are right that we should be agnostic to that, just to clarify what I mean:

         0 
      /      \
    1         2
   /  \     /  \
  3    4    5    6

given a graph like this if we start at node 0 it is okay to have e.g. 0, 1, 2, 3, 4, 5, 6 or 0, 2, 1, 5, 6, 3, 4 but not 0, 1, 2, 5, 6, 3, 4 or 0, 1, 2, 5, 3, 4, 6.

But I think this is what you are actually saying in your last comment, so we need to have tests which are okay with all acceptable versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. It's a bit of a pain so I'm not making it strictly necessary for the PR to be merged, but essentially in your example we would want to check that the returned vector has the form [.|..|....] where the first subset is {0}, the second is {1, 2} and the third is {3, 4, 5, 6} but in any order

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your definition iterates sligthly differently than what I had in mind...but it's totally okay I wanted just to understand which one was preferable and indeed parallelizing the algorithm effectively would require to drop mine. So let's go with yours, I think that we can also get a 2x speed-up by sorting with that :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly biased by a recent bachelor project I supervised on... parallel BFS ;) check out the repo of my interns https://github.com/KassFlute/ParallelGraphs.jl for a multithreaded and even BLAS-ified version of BFS that is much faster than the one here! ping @KassFlute and @AntoineBut

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gentle ping @Tortar if you want to adjust the tests so that I can merge while it's still fresh in our minds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be okay now 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

nodes_visited = collect(BFSIterator(g2, [8, 1, 6]))
@test nodes_visited == [8, 9, 1, 2, 3, 6, 5, 7, 4]
@test nodes_visited == [8, 1, 6, 9, 2, 3, 5, 7, 4]
end
Loading