Adam Smith

The Ugliest Hack I've Ever Pulled Off

June, 2007

Machine learning (6.867) was my favorite class at MIT. I just ran across the report from my final project in that class, which I've uploaded to Scribd: Friendship Prediction on Facebook.

A Hack
As part of my project, I wrote a web site that allowed someone to type in their name and get back a list of people I thought they were friends with in real life but not on Facebook. I put together the web site between about 10pm and 8am the day the report was due. [1]

Facebook Friendship Prediction - The Machine


The web site was the ugliest hack I've ever pulled off; it was the final hour and I just needed it to work. Once someone entered their name, a task record was created in a MySQL table. I had a Java process polling the DB for new requests. Once it pulled a request, that Java process would create 6,000 feature vectors, one for each person at MIT that the querying user might be friends with. Those were saved to a file. Then I needed to invoke a program called Weka to evaluate the feature vectors and output yes or no for each one. Launching it with a Shell()-style call from Java wasn't working, so I had the Java app write out a Windows batch file with the appropriate command. I wrote a VB app to poll for batch files and execute them as they appeared. [2]
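For comparison, here is a minimal sketch of launching an external program directly from Java with ProcessBuilder. It is not what the site did (the batch file and VB poller were the actual workaround), and I don't know that it would have fixed the original problem. The class name ExternalCommand and the log file argument are made up for illustration. Sending the child's output to a file avoids one common cause of stuck child processes, which is a pipe buffer filling up because nothing reads it.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class ExternalCommand {

    /**
     * Runs the given command, writes its combined stdout/stderr to logFile,
     * waits for it to finish, and returns its exit code.
     */
    static int run(List<String> command, File logFile) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        pb.redirectOutput(logFile);   // output goes to a file, so no pipe can fill up
        try {
            Process process = pb.start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for command", e);
        }
    }
}
```

The command would be passed as a list, one element per argument, rather than as a single string, which sidesteps most quoting problems.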

I had another Java app poll for result files, parse them, and put them into the DB.
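One pass of that result poller might look roughly like the sketch below. The original code isn't shown, so everything specific here is an assumption: the .result extension, the one-line-per-candidate "name,yes" / "name,no" format, and the names ResultPoller, parseFriends, and pollOnce. The database write is left out; the method just returns what it parsed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the file format and names are assumptions.
final class ResultPoller {

    /** Returns the names on lines of the assumed form "name,yes"; other lines are ignored. */
    static List<String> parseFriends(List<String> lines) {
        List<String> friends = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2 && parts[1].trim().equalsIgnoreCase("yes")) {
                friends.add(parts[0].trim());
            }
        }
        return friends;
    }

    /** One polling pass: parses and then deletes every *.result file in dir. */
    static Map<String, List<String>> pollOnce(Path dir) {
        Map<String, List<String>> results = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.result")) {
            for (Path file : files) {
                results.put(file.getFileName().toString(),
                            parseFriends(Files.readAllLines(file)));
                Files.delete(file); // consume the file so it is not processed twice
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```

A real poller would call pollOnce in a loop with a short sleep between passes and write each entry to the DB before moving on.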

Each request, end to end, would take a couple of minutes if there wasn't any other load. The first Java app kept about 1.8 GB of data in RAM that it needed to determine how close two people were in the friendship network, including things like how many photos they were both in, what their genders were, and so on. (See the paper for the full details!)
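To give a flavor of one such feature, here is a small sketch of counting the photos two people both appear in, treated as the size of a set intersection. The in-memory layout (one set of photo IDs per person) and the names PairFeatures and sharedCount are assumptions for illustration; the paper describes the real feature set.

```java
import java.util.Set;

// Hypothetical sketch of a single pairwise feature.
final class PairFeatures {

    /** Counts the elements present in both sets, iterating over the smaller one. */
    static <T> int sharedCount(Set<T> a, Set<T> b) {
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = (smaller == a) ? b : a;
        int count = 0;
        for (T item : smaller) {
            if (larger.contains(item)) {
                count++;
            }
        }
        return count;
    }
}
```

With hash sets, each lookup is roughly constant time, so one pair costs time proportional to the smaller set. That matters when the same computation is repeated for thousands of candidate pairs per request.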

Meanwhile, the client was shown a page with a <META REFRESH..> tag, so every 20 seconds the browser would reload the page and invoke PHP to poll the MySQL DB for results.

Ah, the beauty of throwaway code!

Notes

[1] One of my favorite essays talks about the productive pressure of a deadline. Indeed.
[2] Here's the main part of the VB app!
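Private Sub Timer1_Timer()
    ' Rescan the watched folder for newly written command files.
    File1.Refresh
    For i = 0 To File1.ListCount - 1
        ' Read the command out of the file, then delete the file.
        Path = File1.Path & "\" & File1.List(i)
        Open Path For Input As #1
        Input #1, toexec
        Close #1
        Kill Path

        ' Pick a random batch file name and remove any stale file with that name.
        Dim k As Integer
        Math.Randomize
        k = Int(Math.Rnd() * 984)
        On Error Resume Next
        Kill "c:\a" & k & ".bat"
        On Error GoTo 0

        ' Write the command into the batch file and run it.
        Open "c:\a" & k & ".bat" For Output As #2
        Print #2, toexec
        Print #2, ""
        Close #2
        Shell ("c:\a" & k & ".bat")
    Next
End Sub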
