Quick Find / Quick Union (Connected Nodes)

Setup

This week I learned about the "Quick Find" or "Quick Union" algorithm. Imagine an NxN grid of nodes, some of which are connected by lines. A connection can be interpreted as accessibility: if two nodes are connected, you can get from one to the other. Every node is accessible to itself: to get where you already are, stay there. Also, If you can get from A to B, you can go back from B to A. And if you can get from A to B and from B to C, then you can get from A to C. As a consequence, the connection between nodes divides the grid into regions of mutually accessible nodes. You can travel from any node in a given region to any other node in that region -- but not to any nodes outside that region (exercise to reader -- proof by contradiction).

The problem has two parts. First, find a way to represent this grid structure and the accessibility relation; second, use your schema to efficiently calculate whether two given nodes are accessible to each other.

Solution

A. Representation

The data structure can be represented elegantly as an array. Each index identifies a node, and the value of each index identifies that node's nearest connection. I said "nearest," but the designation is arbitrary. It falls out of the definition of accessibility that if a node is connected to one node in a network, it is connected to every other node in the network (hence "network"). Identifying a "nearest" connection is just a convenient way of getting a network up and running. The rest falls out of the method we use to join two nodes together.

Joining

When I issue the command to connect two nodes, I can go ahead and identify one or the other as the closest relative. For example, if I want to connect nodes 3 and 5, it doesn't matter whether I say 5 is nearest to 3 or 3 is nearest to 5. At the end of the operation, both will be nearest to the same thing. To clarify even further -- in the beginning, each node is nearest to itself. Then I connect two nodes -- again, say 3 and 5. Before the operation, 3 is nearest itself, 3, and 5, 5. After the operation, either 3 will be nearest 5 or 5 will be nearest 3, but both will be nearest the same thing. That's the connection. (Another way to think of it: both are assigned to the same network, and the network is arbitrarily identified with one or the other.)

But now I deem fit to bring a third item into the loop. I've connected 3 and 5; say now I connect 5 and 7. It won't be enough to set 5 nearest to 7 (or 7 nearest to 5). That will leave 3 off by itself. The problem that presents itself is this: to preserve the relation of accessibility on connection, I need to connect not only the two nodes in question, but everything connected to either of the nodes. The simplest way to do this, conceptually, is to set, say, 5 closest to 7, then look through the rest of the nodes, setting everything (that was formerly closest to 5) to 7 as well. This preserves the relationship.

But as so often happens in programming, what turns out to be the simplest solution to a problem, conceptually speaking, ends up being very costly, computationally. (There are many kinds of simplicity -- something easy to understand can be hard to execute -- and vice versa). Having the computer loop through the entire array each time it connects two nodes is not an efficient use of processing power. Future posts will try to elaborate on the methods I learned for improving execution.

B. Discovery?

This gets tossed off at the end of the discussion, almost as a note. "What about the original problem," you ask. "We wanted to tell whether one node is accessible from another -- but you spent all this time rehearsing how to connect them." That's the beautiful bit! Once you've connected all your nodes and done all the house-keeping involved in connecting them, testing connections is easy. Just check whether the two nodes are nearest the same thing -- or, if you like, whether they're in the same network. That's simply a matter of accessing the value at an index on the array -- an operation that comes practically for nothing. (Figuring out how they're connected or, say, the shortest route between them is quite another matter.)

Coding Journal

Search This Blog