Archive for June 26th, 2008

An online Mark Twain project

June 26, 2008

About the site:

Mark Twain Project Online applies innovative technology to more than four decades’ worth of archival research by expert editors at the Mark Twain Project. It offers unfettered, intuitive access to reliable texts, accurate and exhaustive notes, and the most recently discovered letters and documents.

Its ultimate purpose is to produce a digital critical edition, fully annotated, of everything Mark Twain wrote. MTPO is a collaboration between the Mark Twain Papers and Project of The Bancroft Library, the California Digital Library, and the University of California Press.

Via Maud.

If you really want your children to read something,

June 26, 2008

For God’s sake, keep it to yourself. Guard that book like a precious secret. Shelve it judiciously. And under no circumstance put it on a list.

From this must-read picture essay called Compulsory reading; link via Maud.

Update on the obsolescence of the scientific method

June 26, 2008

Remember the recent post I wrote about Wired editor Chris Anderson’s article on how the scientific method is becoming obsolete now that large amounts of data are available? In that post, I conceded that it might be possible to develop some technologies without recourse to the underlying science:

At a more fundamental level, in spite of what Chris Anderson has to say, science is about explanations, coherent models and understanding. In my opinion, all of what Anderson shows is that, if you have enough data, you can develop technologies without having a clear handle on the underlying science; however, it is wrong to call these technologies science, and argue that you can do science without coherent models or mechanistic explanations.

Cosma Shalizi at Three-Toed Sloth (who knows more about these models than I do) sets the record straight and shows how the development of some technologies is impossible without a proper grounding in science, in this eminently quotable post (which I am going to quote almost in its entirety):

I recently made the mistake of trying to kill some waiting-room time with Wired. (Yes, I should know better.) The cover story was a piece by editor Chris Anderson, about how having lots of data means we can just look for correlations by data mining, and drop the scientific method in favor of statistical learning algorithms. Now, I work on model discovery, but this struck me as so thoroughly, and characteristically, foolish — “saucy, ignorant contrarianism”, indeed — that I thought I was going to have to write a post picking it apart. Fortunately, Fernando Pereira (who actually knows something about machine learning) has said, crisply, what needs to be said about this. I hope he won’t mind (or charge me) if I quote him at length:

I like big data as much as the next guy, but this is deeply confused. Where does Anderson think those statistical algorithms come from? Without constraints in the underlying statistical models, those “patterns” would be mere coincidences. Those computational biology methods Anderson gushes over all depend on statistical models of the genome and of evolutionary relationships. Those large-scale statistical models are different from more familiar deterministic causal models (or from parametric statistical models) because they do not specify the exact form of observable relationships as functions of a small number of parameters, but instead they set constraints on the set of hypotheses that might account for the observed data. But without well-chosen constraints — from scientific theories — all that number crunching will just memorize the experimental data.

I might add that anyone who thinks the power of data mining will let them write a spam filter without understanding linguistic structure deserves the in-box they’ll get; and that anyone who thinks they can overcome these obstacles by chanting “Bayes, Bayes, Bayes”, without also employing exactly the kind of constraints Pereira mentions, is simply ignorant of the relevant probability theory.
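To make the "memorize the experimental data" point concrete, here is a toy sketch of my own (not from Pereira or Shalizi). It assumes a made-up "true law" y = 2x + 1 with Gaussian noise, and compares a hypothetical lookup-table "model" with no constraints against a model constrained to be a straight line. All names (`make_data`, `fit_memorizer`, `fit_line`) are invented for illustration, and the lookup table is only a crude stand-in for an unconstrained hypothesis space, not the large-scale statistical models Pereira describes.

```python
import random


def make_data(n, rng):
    """Noisy samples from an assumed 'true law' y = 2x + 1 (illustrative only)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)) for x in xs]


def fit_memorizer(train):
    """Unconstrained 'model': a lookup table of the training points.

    For an x it has never seen, it can only fall back on the mean training y.
    """
    table = dict(train)
    fallback = sum(y for _, y in train) / len(train)
    return lambda x: table.get(x, fallback)


def fit_line(train):
    """Constrained model: ordinary least squares for y = a*x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def mse(model, data):
    """Mean squared prediction error of a model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)       # fixed seed so the run is repeatable
train = make_data(50, rng)
test = make_data(50, rng)    # fresh points from the same law

memorizer = fit_memorizer(train)
line = fit_line(train)

print("memorizer: train MSE", mse(memorizer, train), "test MSE", mse(memorizer, test))
print("line:      train MSE", mse(line, train), "test MSE", mse(line, test))
```

The lookup table reproduces its training data perfectly and has nothing to say about new points, while the line, which bakes in an assumption about the form of the relationship, does worse on the training set and far better on fresh data. That assumption is what plays the role of the "well-chosen constraints" here.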

Have fun!

Identifying great engineers and on project postmortems

June 26, 2008

Here are a couple of links via the Stack Overflow podcast:

[1] Steve Yegge writes about probably the only reliable way of identifying great engineers: working with them. Along the way, he also has some things to say about educational systems, their reliance on memory, and how that kills understanding:

So we all think we’re smart for different reasons. Mine was memorization. Smart, eh? In reality I was just a giant, uncomprehending parrot. I got my first big nasty surprise when I was in the Navy Nuclear Power School program in Orlando, Florida, and I was setting historical records for the highest scores on their exams. The courses and exams had been carefully designed over some 30 years to maximize and then test “literal retention” of the material. They gave you all the material in outline form, and made you write it in your notebook, and your test answers were graded on edit-distance from the original notes. (I’m not making this up or exaggerating in the slightest.) They had set up the ultimate parrot game, and I happily accepted. I memorized the entire notebooks word-for-word, and aced their tests.

They treated me like some sort of movie star — that is, until the Radar final lab exam in electronics school, in which we had to troubleshoot an actual working (well, technically, not-working) radar system. I failed spectacularly: I’d arguably set another historical record, because I had no idea what to do. I just stood there hemming and hawing and pooing myself for three hours. I hadn’t understood a single thing I’d memorized. Hey man, I was just playing their game! But I lost. I mean, I still made it through just fine, but I lost the celebrity privileges in a big way.

Having a good memory is a serious impediment to understanding. It lets you cheat your way through life. I’ve never learned to read sheet music to anywhere near the level I can play (for both guitar and piano.) I have large-ish repertoires and, at least for guitar, good technique from lots of lessons, but since I could memorize the sheet music in one sitting, I never learned how to read it faster than about a measure a minute. (It’s not a photographic memory – I have to work a little to commit it to memory. But it was a lot less work than learning to read the music.) And as a result, my repertoire is only a thousandth what it could be if I knew how to read.

My memory (and, you know, overall laziness) has made me musically illiterate.

[2] Mike Gunderloy on the need for project postmortems, with nine suggestions for effective ones:

The difference between average programmers and excellent developers is not a matter of knowing the latest language or buzzword-laden technique. Rather, it can boil down to something as simple as not making the same mistakes over and over again. Fortunately, there’s a powerful tool that any developer can use to help learn from the past: the project postmortem.

Take a look!