Skip to main content

Quick Find / Quick Union (Connected Nodes)

Setup

This week I learned about the "Quick Find" or "Quick Union" algorithm. Imagine an NxN grid of nodes, some of which are connected by lines. A connection can be interpreted as accessibility: if two nodes are connected, you can get from one to the other. Every node is accessible to itself: to get where you already are, stay there. Also, If you can get from A to B, you can go back from B to A. And if you can get from A to B and from B to C, then you can get from A to C. As a consequence, the connection between nodes divides the grid into regions of mutually accessible nodes. You can travel from any node in a given region to any other node in that region -- but not to any nodes outside that region (exercise to reader -- proof by contradiction).

The problem has two parts. First, find a way to represent this grid structure and the accessibility relation; second, use your schema to efficiently calculate whether two given nodes are accessible to each other.

Solution

A. Representation

The data structure can be represented elegantly as an array. Each index identifies a node, and the value of each index identifies that node's nearest connection. I said "nearest," but the designation is arbitrary. It falls out of the definition of accessibility that if a node is connected to one node in a network, it is connected to every other node in the network (hence "network"). Identifying a "nearest" connection is just a convenient way of getting a network up and running. The rest falls out of the method we use to join two nodes together.
Joining
When I issue the command to connect two nodes, I can go ahead and identify one or the other as the closest relative. For example, if I want to connect nodes 3 and 5, it doesn't matter whether I say 5 is nearest to 3 or 3 is nearest to 5. At the end of the operation, both will be nearest to the same thing. To clarify even further -- in the beginning, each node is nearest to itself. Then I connect two nodes -- again, say 3 and 5. Before the operation, 3 is nearest itself, 3, and 5, 5. After the operation, either 3 will be nearest 5 or 5 will be nearest 3, but both will be nearest the same thing. That's the connection. (Another way to think of it: both are assigned to the same network, and the network is arbitrarily identified with one or the other.)

But now I deem fit to bring a third item into the loop. I've connected 3 and 5; say now I connect 5 and 7. It won't be enough to set 5 nearest to 7 (or 7 nearest to 5). That will leave 3 off by itself. The problem that presents itself is this: to preserve the relation of accessibility on connection, I need to connect not only the two nodes in question, but everything connected to either of the nodes. The simplest way to do this, conceptually, is to set, say, 5 closest to 7, then look through the rest of the nodes, setting everything (that was formerly closest to 5) to 7 as well. This preserves the relationship.

But as so often happens in programming, what turns out to be the simplest solution to a problem, conceptually speaking, ends up being very costly, computationally. (There are many kinds of simplicity -- something easy to understand can be hard to execute -- and vice versa). Having the computer loop through the entire array each time it connects two nodes is not an efficient use of processing power. Future posts will try to elaborate on the methods I learned for improving execution.

B. Discovery?

This gets tossed off at the end of the discussion, almost as a note. "What about the original problem," you ask. "We wanted to tell whether one node is accessible from another -- but you spent all this time rehearsing how to connect them." That's the beautiful bit! Once you've connected all your nodes and done all the house-keeping involved in connecting them, testing connections is easy. Just check whether the two nodes are nearest the same thing -- or, if you like, whether they're in the same network. That's simply a matter of accessing the value at an index on the array -- an operation that comes practically for nothing. (Figuring out how they're connected or, say, the shortest route between them is quite another matter.)

Comments

Popular posts from this blog

Getting Geodata From Google's API

The apps I'm going to be analyzing are part of Dr. Charles Severance's MOOC on Python and Databases and work together according to the following structure (which applies both in this specific case and more generally to any application that creates and interprets a database using online data). The data source, in this case, is Google's Google Maps Geocoding API.  The "package" has two components: geoload.py  and geodump.py .  geoload.py  reads a list of locations from a file -- addresses for which we would like geographical information -- requests information about them from Google, and stores the information on a database ( geodata.db ).  geodump.py  reads and parses data from the database in JSON, then loads that into a javascript file.  The javascript is then used to create a web page on which the data is visualized as a series of points on the world-map.  Dr. Severance's course focuses on Python, so I'm only going to work my way through ...

Compiling and Executing Java Files With -cp

I decided I was going to "man up" and figure out how to compile a java program with an external dependency from the command line instead of relying on an IDE-- the DOS command line, to be more specific. I ran into a few problems: 1.  The external dependency was given to me as a java file.  I experimented compiling it as a .jar, but I wasn't sure how to import a class from a .jar, so I ended up compiling it into a class. 2.  When I tried to run the file, I got an error saying that the class had been compiled with a different version of Java than my JRE.  The Internet told me to check my path variable for Java.  It sure looked like it was pointing to the latest JRE (and the same version of Java as my compiler).  I asked the Internet again and found the following command: for %I in (java.exe) do @echo %~$PATH:I I'm not exactly sure what the syntax of that magic command is (intuitively it's returning the path that executes when I run the "java" com...