Problems Come In Three Sums

Odyssey

Today was a somewhat frustrating day.  I got held up four hours this morning with algorithms.  I've been trying to work through bits of Princeton's Algorithms course on Coursera.  So this morning I decided I was going to try to start following along with the code examples in Java.  Well, I started with a sudo apt-get java-sdk (or equivalent) -- simple, right?  Then I created my first "Hello World" application (a blast back to my high school days), figured out how to compile it with javac, and things were moving along.  Then I tried to follow along with the algorithm we were learning about, a brute-force solution to the Three Sum problem, which included the following line of code:

In in = new In(args[0]);

I tried to compile it, naive as I was, and got a "cannot find symbol" error (for that "In").  Stack Overflow -- oh, the Princeton course uses its own custom-made libraries.  So I'll just add them to my working directory and import them, right?  But they're in a .jar file, and I didn't know how to use the classpath.  "We have our own programming environment for Java at Princeton Algorithms."  But I didn't install it correctly.
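In case anyone else gets stuck at the same spot: the In class is just the course's file-reading convenience, and the brute-force triple loop doesn't actually need it.  Here's a sketch of the same idea using only the standard library -- the class name, the hardcoded array, and the comments are my own, not the course's code:

```java
// ThreeSumPlain.java -- brute-force count of triples that sum to zero,
// using only the standard library (no algs4.jar, no classpath fuss).
// Compile and run with:  javac ThreeSumPlain.java && java ThreeSumPlain
public class ThreeSumPlain {
    // Counts triples (i < j < k) with a[i] + a[j] + a[k] == 0.
    public static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    public static void main(String[] args) {
        int[] a = { 30, -40, -20, -10, 40, 0, 10, 5 };
        System.out.println(count(a));  // prints 4
    }
}
```

The tradeoff is that you hardcode (or hand-roll the reading of) the input instead of using the course's tidy In abstraction -- but you also get to compile with plain javac and move on with your day.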

And so begins the four-hour Odyssey, which ends with me installing Eclipse, then installing it again, then, at the end of the day, installing it again.  Because it turns out there are two versions of Eclipse in the Ubuntu App Store -- and you want the one that says it isn't secure and will mess with all your files.  Because if you install the other one, you might work through the AWT tutorial to get practice using Eclipse to import libraries, try to run your application, and get horrible runtime errors.  And it all ends around 10:00pm tonight with me obsessively trying to figure out how to add a .desktop file to my shared applications, because when you don't install Eclipse from the app store, it doesn't automatically add itself to the launcher.  And then there's no quick and easy way to uninstall files.  And I REALLY HATE HAVING RANDOM FILES FROM OLD INSTALLS HANGING AROUND ON MY COMPUTER.  I have been known to do a clean wipe from time to time just to get rid of junk.

Three Sums

The double entendre isn't lost on me.

Today's lecture was an introduction to modeling the performance of algorithms.  It gets slightly mathematical, and as with all math, I won't understand any of it until I watch it several more times obsessively (which is why I never finish anything).  But I'll see if I can remember a few basic concepts.

Experimental method.  One way to gauge how efficient an algorithm is: use the experimental method, treating the algorithm as a natural process (running on a computer somewhere).  So I think of the algorithm as a black box, tell my computer to run it with different data sets or parameters, and measure how fast it runs.  Then I can use -- I think it's linear regression? -- or anyway fancy methods to plot out my data (how fast it ran) and come up with a theory (projection) of its efficiency.  As far as data analysis goes, the running time often grows as a power of the input size, so you would use a log-log graph to make the relationship into a straight line.  The advantage is that the slope of that line is a constant -- the exponent of the power law -- which serves as a nice benchmark for the algorithm's efficiency.  The time an algorithm will take to execute given N inputs, on this model, is often going to be something like

A * N^C

where A represents the speed of the computer (and other constant factors) and C represents just how good or bad the algorithm is (you want it to be a small number).
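A practical version of this experiment -- as I understand it -- is the doubling test: run the algorithm at N, 2N, 4N, and so on, and take log base 2 of the ratio of successive times; that log-ratio converges on the exponent C.  Here's my own sketch of the idea (the workload, sizes, and names are my assumptions, not code from the course), using a deliberately cubic triple loop as the thing being measured:

```java
// Sketch of a doubling test: time a workload at N, 2N, 4N, ...
// and estimate the exponent C from lg(time ratio) between runs.
public class DoublingTest {
    // A deliberately cubic workload (a bare triple loop), standing in
    // for whatever algorithm you actually want to measure.
    static long cubicWork(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    ops++;
        return ops;  // equals (n choose 3)
    }

    public static void main(String[] args) {
        double prev = 0;
        for (int n = 125; n <= 1000; n *= 2) {
            long start = System.nanoTime();
            cubicWork(n);
            double elapsed = (System.nanoTime() - start) / 1e9;
            // lg(T(2N) / T(N)) should approach the exponent C (about 3 here)
            if (prev > 0)
                System.out.printf("N=%4d  time=%.4fs  ratio=%.1f  lg=%.1f%n",
                        n, elapsed, elapsed / prev,
                        Math.log(elapsed / prev) / Math.log(2));
            prev = elapsed;
        }
    }
}
```

The small N values run too fast to time reliably (JIT warm-up and clock resolution add noise), so in practice you would start the doubling once each run takes a noticeable fraction of a second.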

Mathematical Models.  But Donald Knuth (I remember him from philosophy -- think he wrote a logic textbook too) thought we could also figure out how efficient an algorithm is mathematically.  The first idea is that we can actually take an inventory of all the basic operations that a computer performs and get a rough estimate of the time it takes the computer to perform each one.  Of course, we may have to determine this experimentally -- especially because modern computers won't do these operations (I mean +, -, and so on) in a uniform way (I think).

The second idea, though, is that it will get pretty complicated and irksome to determine how many operations are being performed, say, when you have loops nested inside loops -- so we can just count the most important things (an idea that goes back to Turing in a somewhat different context -- calculating how long it should take a human to perform).  And the important things end up being array accesses.  So when you're working with an algorithm, a nice thing to think about is how many times it accesses (reads from, writes to) the array you're using.  If you're using an array.  I'm thinking that's pretty much what goes inside the big-O notation -- the count of array accesses in particular cases, but in general the count of however many times the important operation is performed, as a function of the input size.
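To make the counting idea concrete, here's a little illustration of my own (not code from the course): instrument the three-sum triple loop to tally its array accesses.  Each pass through the innermost body touches the array three times -- a[i], a[j], a[k] -- so the total comes out to exactly 3 * (n choose 3), which grows like n^3/2:

```java
// Tallies array accesses in the three-sum inner loop -- an illustration
// of the cost-model idea of counting only the important operation.
public class AccessCount {
    public static long countAccesses(int[] a) {
        int n = a.length;
        long accesses = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++) {
                    accesses += 3;                 // a[i], a[j], a[k]
                    if (a[i] + a[j] + a[k] == 0) {
                        // a matching triple -- not what we're counting here
                    }
                }
        return accesses;  // equals 3 * (n choose 3), ~ n^3 / 2
    }

    public static void main(String[] args) {
        int[] a = new int[100];
        System.out.println(countAccesses(a));  // prints 485100, i.e. 3 * C(100,3)
    }
}
```

Once you have that count as a formula in n, you throw away the constant factor and the lower-order terms, and the n^3 that's left is the shape the big-O notation records.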

Job Search

Last note on the job search.  Not going well so far.  Job searching is just horrible -- you can't help but feel others are constantly intimating that you aren't good enough, that you're nothing to write HR about.  And then I say, "From when I graduated college 20 years ago to now, I've never found a company that thought I had merit."  But I have to tell myself it isn't like that.  It's just a matter of trust -- companies, understandably, can't really trust anyone who applies to them.  They have to come up with elaborate methods of testing the applicants.  And then they also need filters, because there are just so many of us.  The result isn't really the best for anybody -- because the tests just aren't that great.  But the companies need to protect themselves and justify themselves -- and I've got to keep throwing myself at the wall until I find the little slit that lets me slip through.  And in the meantime I have to keep learning.

So think of it as a matter of statistics.  People in my situation apply for jobs, and a certain percentage of them find a job after a certain amount of time -- and I imagine that percentage climbs as the time increases.  So I'm just part of that statistic and I have to roll the dice.  And in the meantime keep it interesting by, well, learning new things, writing this journal, and trying to make interesting connections.  And if I had to get another job, I'd get another job -- but if I keep at this, and if I make room for it (since I'm not necessarily going to have the luxury of studying it full time if worse comes to worst)...I have to get there eventually.

And even aside from that -- the tests I fail or the challenges I botch -- it's a software error, not a problem with the hardware.  I didn't study math or computer science -- but that just means I've put less time into some things, more time into other things.  And the longer I put time into this, the less meaningful that lack is and the smaller the gap becomes.  But I'm back to chasing math and science after years in the humanities.  And that's not a bad thing.  I refuse to be confined to one area or to be written off as possessing only one kind of talent.  In the end, all of these acquisitions are going to amount to something.  Even if just an interesting person.  No one will mind another interesting person.
