Problems Come In Three Sums

Odyssey

Today was a somewhat frustrating day.  I got held up four hours this morning with algorithms.  I've been trying to work through bits of Princeton's Algorithms course on Coursera.  So this morning I decided I was going to try to start following along with the code examples in Java.  Well, I started with a sudo apt-get java-sdk (or equivalent) -- simple, right?  Then I created my first "Hello World" application (a blast back to my high school days), figured out how to compile it with javac, and things were moving along.  Then I tried to follow along with the algorithm we were learning about, a brute-force solution to the Three Sum problem, which included the following line of code:

In in = new In(args[0]);

I tried to compile it, naive as I was, and got a "cannot find symbol" error (for that "In").  Stack Overflow -- oh, the Princeton course uses its own custom-made libraries.  So I'll just add them to my working directory and import them, right?  But they're in a .jar file, and I didn't know how to use the classpath.  "We have our own programming environment for Java at Princeton Algorithms."  But I didn't install it correctly.
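In case anyone else gets stuck at the same spot: the In class is just the course's file-reading convenience, and the brute-force triple loop doesn't actually need it.  Here's a sketch of the same idea using only the standard library -- the class name, the hardcoded array, and the comments are my own, not the course's code:

```java
// ThreeSumPlain.java -- brute-force count of triples that sum to zero,
// using only the standard library (no algs4.jar, no classpath fuss).
// Compile and run with:  javac ThreeSumPlain.java && java ThreeSumPlain
public class ThreeSumPlain {
    // Counts triples (i < j < k) with a[i] + a[j] + a[k] == 0.
    public static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    public static void main(String[] args) {
        int[] a = { 30, -40, -20, -10, 40, 0, 10, 5 };
        System.out.println(count(a));  // prints 4
    }
}
```

The tradeoff is that you hardcode (or hand-roll the reading of) the input instead of using the course's tidy In abstraction -- but you also get to compile with plain javac and move on with your day.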

And so begins the four-hour Odyssey, which ends with me installing Eclipse, then installing it again, then, at the end of the day, installing it again.  Because it turns out there are two versions of Eclipse in the Ubuntu App Store -- and you want the one that says it isn't secure and will mess with all your files.  Because if you install the other one, you might work through the AWT tutorial to get practice using Eclipse to import libraries, try to run your application, and get horrible runtime errors.  And it all ends around 10:00pm tonight with me obsessively trying to figure out how to add a .desktop file to my shared applications, because when you don't install Eclipse from the app store, it doesn't automatically add itself to the launcher.  And then there's no quick and easy way to uninstall files.  And I REALLY HATE HAVING RANDOM FILES FROM OLD INSTALLS HANGING AROUND ON MY COMPUTER.  I have been known to do a clean wipe from time to time just to get rid of junk.

Three Sums

The double entendre isn't lost on me.

Today's lecture was an introduction to modeling the performance of algorithms.  It gets slightly mathematical, and as with all math, I won't understand any of it until I watch it several more times obsessively (which is why I never finish anything).  But I'll see if I can remember a few basic concepts.

Experimental method.  One way to gauge how efficient an algorithm is: use the experimental method, treating the algorithm as a natural process (running on a computer somewhere).  So I think of the algorithm as a black box, tell my computer to run it with different data sets or parameters, and measure how fast it runs.  Then I can use -- I think it's linear regression? -- or anyway fancy methods to plot out my data (how fast it ran) and come up with a theory (projection) of its efficiency.  As far as data analysis goes, the running time often grows as a power of the input size, so you would use a log-log graph to make the relationship into a straight line.  The advantage is that the slope of that line is a constant -- the exponent of the power law -- which serves as a nice benchmark for the algorithm's efficiency.  The time an algorithm will take to execute given N inputs, on this model, is often going to be something like

A * N^C

where A represents the speed of the computer (and other constant factors) and C represents just how good or bad the algorithm is (you want it to be a small number).
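A practical version of this experiment -- as I understand it -- is the doubling test: run the algorithm at N, 2N, 4N, and so on, and take log base 2 of the ratio of successive times; that log-ratio converges on the exponent C.  Here's my own sketch of the idea (the workload, sizes, and names are my assumptions, not code from the course), using a deliberately cubic triple loop as the thing being measured:

```java
// Sketch of a doubling test: time a workload at N, 2N, 4N, ...
// and estimate the exponent C from lg(time ratio) between runs.
public class DoublingTest {
    // A deliberately cubic workload (a bare triple loop), standing in
    // for whatever algorithm you actually want to measure.
    static long cubicWork(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    ops++;
        return ops;  // equals (n choose 3)
    }

    public static void main(String[] args) {
        double prev = 0;
        for (int n = 125; n <= 1000; n *= 2) {
            long start = System.nanoTime();
            cubicWork(n);
            double elapsed = (System.nanoTime() - start) / 1e9;
            // lg(T(2N) / T(N)) should approach the exponent C (about 3 here)
            if (prev > 0)
                System.out.printf("N=%4d  time=%.4fs  ratio=%.1f  lg=%.1f%n",
                        n, elapsed, elapsed / prev,
                        Math.log(elapsed / prev) / Math.log(2));
            prev = elapsed;
        }
    }
}
```

The small N values run too fast to time reliably (JIT warm-up and clock resolution add noise), so in practice you would start the doubling once each run takes a noticeable fraction of a second.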

Mathematical Models.  But Donald Knuth (I remember him from philosophy -- think he wrote a logic textbook too) thought we could also figure out how efficient an algorithm is mathematically.  The first idea is that we can actually take an inventory of all the basic operations that a computer performs and get a rough estimate of the time it takes the computer to perform each one.  Of course, we may have to determine this experimentally -- especially because modern computers won't do these operations (I mean +, -, and so on) in a uniform way (I think).

The second idea, though, is that it will get pretty complicated and irksome to determine how many operations are being performed, say, when you have loops nested inside loops -- so we can just count the most important things (an idea that goes back to Turing in a somewhat different context -- calculating how long it should take a human to perform).  And the important things end up being array accesses.  So when you're working with an algorithm, a nice thing to think about is how many times it accesses (reads from, writes to) the array you're using.  If you're using an array.  I'm thinking that's pretty much what goes inside the big-O notation -- the count of array accesses in particular cases, but in general the count of however many times the important operation is performed, as a function of the input size.
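To make the counting idea concrete, here's a little illustration of my own (not code from the course): instrument the three-sum triple loop to tally its array accesses.  Each pass through the innermost body touches the array three times -- a[i], a[j], a[k] -- so the total comes out to exactly 3 * (n choose 3), which grows like n^3/2:

```java
// Tallies array accesses in the three-sum inner loop -- an illustration
// of the cost-model idea of counting only the important operation.
public class AccessCount {
    public static long countAccesses(int[] a) {
        int n = a.length;
        long accesses = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++) {
                    accesses += 3;                 // a[i], a[j], a[k]
                    if (a[i] + a[j] + a[k] == 0) {
                        // a matching triple -- not what we're counting here
                    }
                }
        return accesses;  // equals 3 * (n choose 3), ~ n^3 / 2
    }

    public static void main(String[] args) {
        int[] a = new int[100];
        System.out.println(countAccesses(a));  // prints 485100, i.e. 3 * C(100,3)
    }
}
```

Once you have that count as a formula in n, you throw away the constant factor and the lower-order terms, and the n^3 that's left is the shape the big-O notation records.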

Job Search

Last note on the job search.  Not going well so far.  Job searching is just horrible -- you can't help but feel others are constantly intimating that you aren't good enough, that you're nothing to write HR about.  And then I say, "From when I graduated college 20 years ago to now, I've never found a company that thought I had merit."  But I have to tell myself it isn't like that.  It's just a matter of trust -- companies, understandably, can't really trust anyone who applies to them.  They have to come up with elaborate methods of testing the applicants.  And then they also need filters, because there are just so many of us.  The result isn't really the best for anybody -- because the tests just aren't that great.  But the companies need to protect themselves and justify themselves -- and I've got to keep throwing myself at the wall until I find the little slit that lets me slip through.  And in the meantime I have to keep learning.

So think of it as a matter of statistics.  People in my situation apply for jobs, and a certain percentage of them find a job after a certain amount of time -- and I imagine that percentage climbs as the time increases.  So I'm just part of that statistic and I have to roll the dice.  And in the meantime keep it interesting by, well, learning new things, writing this journal, and trying to make interesting connections.  And if I had to get another job, I'd get another job -- but if I keep at this, and if I make room for it (since I'm not necessarily going to have the luxury of studying it full time if worse comes to worst)...I have to get there eventually.

And even aside from that -- the tests I fail or the challenges I botch -- it's a software error, not a problem with the hardware.  I didn't study math or computer science -- but that just means I've put less time into some things, more time into other things.  And the longer I put time into this, the less meaningful that lack is and the smaller the gap becomes.  But I'm back to chasing math and science after years in the humanities.  And that's not a bad thing.  I refuse to be confined to one area or to be written off as possessing only one kind of talent.  In the end, all of these acquisitions are going to amount to something.  Even if just an interesting person.  No one will mind another interesting person.
