Archive for June, 2008

On curious learning

June 27, 2008

Bertrand Russell on the benefits of curious learning in his In Praise of Idleness:

Curious learning not only makes unpleasant things less unpleasant, but also makes pleasant things more pleasant. I have enjoyed peaches and apricots more since I have known that they were first cultivated in China in the early days of the Han dynasty; that Chinese hostages held by the great King Kaniska introduced them to India, whence they spread to Persia, reaching the Roman Empire in the first century of our era; that the word ‘apricot’ is derived from the same Latin source as the word ‘precocious’, because the apricot ripens early; and that the A at the beginning was added by mistake, owing to a false etymology. All this makes the fruit taste much sweeter.

The book is an interesting read; some of Russell’s philosophy is akin to that of Hardy in A Mathematician’s Apology, namely, that the less useful a piece of research is, the better (which one of my friends recently called the Brahminical attitude towards research).

An online Mark Twain project

June 26, 2008

About the site:

Mark Twain Project Online applies innovative technology to more than four decades’ worth of archival research by expert editors at the Mark Twain Project. It offers unfettered, intuitive access to reliable texts, accurate and exhaustive notes, and the most recently discovered letters and documents.

Its ultimate purpose is to produce a digital critical edition, fully annotated, of everything Mark Twain wrote. MTPO is a collaboration between the Mark Twain Papers and Project of The Bancroft Library, the California Digital Library, and the University of California Press.

Via Maud.

If you really want your children to read something,

June 26, 2008

For God’s sake, keep it to yourself. Guard that book like a precious secret. Shelve it judiciously. And under no circumstance put it on a list.

From this must-read picture essay called Compulsory reading; link via Maud.

Update on scientific methodology obsoleteness

June 26, 2008

Remember the recent post I wrote about Wired editor Chris Anderson’s article on how the scientific method is becoming obsolete now that large volumes of data are available? In that post, I conceded that it might be possible to develop some technologies without recourse to the underlying science:

At a more fundamental level, in spite of what Chris Anderson has to say, science is about explanations, coherent models and understanding. In my opinion, all of what Anderson shows is that, if you have enough data, you can develop technologies without having a clear handle on the underlying science; however, it is wrong to call these technologies science, and argue that you can do science without coherent models or mechanistic explanations.

Cosma Shalizi at Three-Toed Sloth (who knows more about these models than I do) sets the record straight and shows how the development of some technologies is impossible without a proper grounding in science, in this eminently quotable post (which I am going to quote almost in its entirety):

I recently made the mistake of trying to kill some waiting-room time with Wired. (Yes, I should know better.) The cover story was a piece by editor Chris Anderson, about how having lots of data means we can just look for correlations by data mining, and drop the scientific method in favor of statistical learning algorithms. Now, I work on model discovery, but this struck me as so thoroughly, and characteristically, foolish — “saucy, ignorant contrarianism”, indeed — that I thought I was going to have to write a post picking it apart. Fortunately, Fernando Pereira (who actually knows something about machine learning) has said, crisply, what needs to be said about this. I hope he won’t mind (or charge me) if I quote him at length:

I like big data as much as the next guy, but this is deeply confused. Where does Anderson think those statistical algorithms come from? Without constraints in the underlying statistical models, those “patterns” would be mere coincidences. Those computational biology methods Anderson gushes over all depend on statistical models of the genome and of evolutionary relationships. Those large-scale statistical models are different from more familiar deterministic causal models (or from parametric statistical models) because they do not specify the exact form of observable relationships as functions of a small number of parameters, but instead they set constraints on the set of hypotheses that might account for the observed data. But without well-chosen constraints — from scientific theories — all that number crunching will just memorize the experimental data.

I might add that anyone who thinks the power of data mining will let them write a spam filter without understanding linguistic structure deserves the in-box they’ll get; and that anyone who thinks they can overcome these obstacles by chanting “Bayes, Bayes, Bayes”, without also employing exactly the kind of constraints Pereira mentions, is simply ignorant of the relevant probability theory.
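Pereira's point about constraints can be seen even in a toy setting. What follows is a minimal sketch in Python, not anything taken from his post or Shalizi's: the data, the deterministic +1/-1 stand-in for measurement noise, and the names `fit_memorizer` and `fit_line` are all assumptions made purely for illustration. One "model" has no constraints at all and simply looks up the nearest training point; the other is constrained to be a straight line.

```python
# Toy comparison: an unconstrained model memorizes, a constrained one generalizes.
# All data and function names here are hypothetical, chosen for illustration only.

def make_data():
    # Assumed true relationship: y = 2x + 1. Training labels carry a
    # deterministic +1/-1 term standing in for measurement noise.
    train = [(float(i), 2.0 * i + 1.0 + (1.0 if i % 2 == 0 else -1.0))
             for i in range(20)]
    # Test points sit between the training points and use the noise-free truth.
    test = [(i + 0.25, 2.0 * (i + 0.25) + 1.0) for i in range(20)]
    return train, test

def fit_memorizer(train):
    # No constraints: answer with the label of the nearest training point.
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def fit_line(train):
    # Constraint: the relationship must be a straight line (least squares).
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mse(predict, data):
    # Mean squared error of a predictor over (x, y) pairs.
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

train, test = make_data()
memorizer = fit_memorizer(train)
line = fit_line(train)

print("train MSE  memorizer:", mse(memorizer, train), " line:", mse(line, train))
print("test MSE   memorizer:", mse(memorizer, test), " line:", mse(line, test))
```

The memorizer reproduces the training data perfectly, noise included, and then does far worse than the line on points it has not seen. The line "knows" something about the shape of the relationship, which is exactly the kind of constraint the memorizer lacks.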

Have fun!

Identifying great engineers and on project postmortems

June 26, 2008

Here are a couple of links via the Stack Overflow podcast:

[1] Steve Yegge writes about probably the only reliable way of identifying great engineers: working with them. Along the way, he also has some things to say about educational systems, their reliance on memory, and how that kills understanding:

So we all think we’re smart for different reasons. Mine was memorization. Smart, eh? In reality I was just a giant, uncomprehending parrot. I got my first big nasty surprise when I was in the Navy Nuclear Power School program in Orlando, Florida, and I was setting historical records for the highest scores on their exams. The courses and exams had been carefully designed over some 30 years to maximize and then test “literal retention” of the material. They gave you all the material in outline form, and made you write it in your notebook, and your test answers were graded on edit-distance from the original notes. (I’m not making this up or exaggerating in the slightest.) They had set up the ultimate parrot game, and I happily accepted. I memorized the entire notebooks word-for-word, and aced their tests.

They treated me like some sort of movie star — that is, until the Radar final lab exam in electronics school, in which we had to troubleshoot an actual working (well, technically, not-working) radar system. I failed spectacularly: I’d arguably set another historical record, because I had no idea what to do. I just stood there hemming and hawing and pooing myself for three hours. I hadn’t understood a single thing I’d memorized. Hey man, I was just playing their game! But I lost. I mean, I still made it through just fine, but I lost the celebrity privileges in a big way.

Having a good memory is a serious impediment to understanding. It lets you cheat your way through life. I’ve never learned to read sheet music to anywhere near the level I can play (for both guitar and piano.) I have large-ish repertoires and, at least for guitar, good technique from lots of lessons, but since I could memorize the sheet music in one sitting, I never learned how to read it faster than about a measure a minute. (It’s not a photographic memory – I have to work a little to commit it to memory. But it was a lot less work than learning to read the music.) And as a result, my repertoire is only a thousandth what it could be if I knew how to read.

My memory (and, you know, overall laziness) has made me musically illiterate.

[2] Mike Gunderloy on the need for project postmortems, with nine suggestions for effective ones:

The difference between average programmers and excellent developers is not a matter of knowing the latest language or buzzword-laden technique. Rather, it can boil down to something as simple as not making the same mistakes over and over again. Fortunately, there’s a powerful tool that any developer can use to help learn from the past: the project postmortem.

Take a look!

Painter of signs!

June 25, 2008

It is a fairly big yard as only yards without any large, permanent structures can be. Of course, there is a small thatched roof to house the power supply main (and, probably also to accommodate a cot for the owner of the place, when he comes by — but, today there is no cot to be seen). The place is dotted with trees of various sizes; these trees are neatly lined up along the circumference to mark the boundary of the place; all but one huge rain tree are neem.

The place is buzzing with activity; at one corner, an old woman is cutting leaves to feed a goat; another is sitting a bit away from her in the sun; she uses the corner of her saree to cover her head; the goat-feeding lady shouts “Hey! Sit in the shadow! What is this rain-sun, rain-sun, you are saying? It is too hot”. At the diagonally opposite corner, surrounding a huge mound of coconuts (with their hard outer skins removed) sit four men; three huge bamboo poles are stretched in front of them. The men use the strips of land between the poles as bins to separate the coconuts into three groups; they take coconuts; tap them with their fingers; shake and listen to the sound; sometimes they tap two coconuts together or one of them against the stone; and then, using some strange mechanism, which I could never fathom, throw the coconuts into one of the strips. As the separation proceeds, even my untrained eye can pick out the general characteristics of each pile; probably, in the market, each pile will be priced differently. At the adjacent side, several men, all in their fifties, are removing the upper layer of the coconut skins by impaling them ever so slightly on the crowbars. Nearby, a couple of children are playing; one of the men notices that they are breaking the coconuts open; he tells them not to; their mother comes by, picks up those coconuts, gives the water to them, and then takes them away. Finally, at one corner stands a tractor; it is not even clear how the tractor was parked in that place, in that crooked manner, in the first place. Several women and men, lined up from one end to the other with huge baskets full of compost on their heads, transfer it from the mound at one corner to the tractor carrier.

The place is filled with the fragrance of drying copra; it is also full of noises — the indistinct murmur as a result of the different conversations taking place at various parts of the yard, the bleating of goats, the cawing of crows, and the sounds of at least a few different species of birds, the screeching of squirrels, and the movie songs that waft (on and off) in the air from some TV or radio. Occasionally, a goods train also passes by just behind the yard, adding to the sights and sounds of the place.

He sits at one corner with his tools, which consist of two aluminium bowls filled with purple and dark green dyes, a tin stencil and a couple of brushes. He spreads nearly fifteen or twenty jute bags; putting the stencil on them, with one brush he marks the first line of the name in purple and the remaining two lines in green with the other. This is the first time I notice how the jute bags get the names painted on them. And, thus, he — the painter of signs — completes the scenery at a coconut mandi at eleven in the morning!

Exquisite is the word!

June 24, 2008

You must take a look at this wash basin! Oh, it is ever so lovely!

A recommendation for The Craftsman

June 24, 2008

Rex at Savage Minds:

I also recently finished reading Richard Sennett’s The Craftsman, which I would highly recommend to all and sundry. In The Craftsman Sennett explores how “the craft of making physical things provides insight into the techniques of experience that can shape our dealings with others”. By charting out a sort of phenomenology of working with the hands he attempts to understand how we can best work with each other. It’s a vindication of craft over art, of workmanship over ‘inspiration’ in a truly American idiom—written with a homespun clarity which is also truly elegant. The chapter comparing three different recipes for stuffed boneless chicken took my breath away.

Take a look!

What is a tribe?

June 24, 2008

Following the recent Gujjar agitation for recognition as a scheduled tribe, Andre Beteille looks at the question of how to define a tribe from a sociological viewpoint:

Leaving aside the rivalries among Meenas, Gujjars and Jats, can the claims of the Gujjars, or any community, to be designated as a scheduled tribe be judged any longer on merit, or on objective grounds? Does expert or professional opinion on the subject count any more? The problem is not simply that the subject itself is replete with ambiguity, but that professional opinion on such subjects bends so easily to the prevailing political winds.

What was so striking about the claims and counter-claims made over the designation of the Gujjars as a scheduled tribe, was the absence of any serious discussion of what we should mean by the term ‘tribe’. Does a tribe have any specific features as a social formation, or can any social formation be designated as a tribe because it once had, or is presumed to have had, the characteristics of a tribe even though its social composition and organization have in the meantime changed substantially?

Anthropologists have written about tribes for well over a hundred years. It was, in fact, one of the key concepts of their discipline in its formative years. No one will claim that all anthropologists have reached complete agreement on the definition of tribe, but that does not mean that no yardstick exists for deciding which groups may be regarded as tribes. One reason why anthropologists shifted their attention away from tribes is that in the world as a whole there are today fewer communities that can be reasonably characterized as tribes than there were even a hundred years ago.

In October 1960, that is, nearly 50 years ago, the Seminar magazine brought out an issue on ‘Tribal India’. In my contribution to that issue, I had suggested criteria for the definition of tribe, and, like several of the other contributors, including N.K. Bose and Verrier Elwin, had drawn attention to the many changes in tribal life that had already become visible. The criteria proposed by me were that a tribe should be more or less self-contained as a community, and that it should be relatively small and compact, and relatively undifferentiated and unstratified. Like the other contributors, I too had pointed out that what we had in India were not so much tribes in their pristine form as tribes that were in transition to a different mode of organization.

Significant changes have taken place in the character and composition of many of the groups that continue to be designated as tribes. Despite the changes they have undergone since independence, there is little prospect of any of them being declassified and removed from the list of STs. As a matter of fact, new groups have been added to the list, so that the officially designated tribal population increased significantly as a proportion of the total population between 1951 and 2001. It is said that groups such as the Meenas and the Gujjars were organized as tribes at some time in the past. This is almost certainly true: the Burgundians and the Lombards were also tribes at a certain time, and the Germany about which the Roman historian, Tacitus, wrote was inhabited mainly by tribes. All this changed elsewhere in due course of time: only in India, once a tribe, always a tribe.

The piece also describes at least one documented case of the conflict between the “Sanskritisation” tendency of caste groups, and the political forces that favour a “backward”-isation tendency:

In a book entitled First We Are People, the Swedish anthropologist, Stefan Molund, described the changing social position of a caste called the Koris in Uttar Pradesh. The Koris had, before independence, been grouped with the SCs. This, their leaders felt, compromised their dignity by tainting them with the stigma of pollution. They successfully petitioned the government to have their name removed from the list. But shortly after independence, their new leaders realized that they had foregone the special benefits in education and employment by asking for a change of status. So they made another plea, again successfully, to be re-included in the SC list. The Koris are not the only community to have gone through this kind of forward and backward movement.

An interesting piece; take a look!

PS: Though I subscribe to the Telegraph, I missed the piece till I saw the link at Churmuri.

Has scientific method become obsolete?

June 24, 2008

An article in Wired by Chris Anderson argues that it has:

The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.

There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

Learning to use a “computer” of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

There’s no reason to cling to our old ways. It’s time to ask: What can science learn from Google?

The same issue has several other articles about areas where petabytes of data are the norm: crop predictions, monitoring epidemics, visualization of big data and so on.

However, I still do not see this kind of “science without models” succeeding in all areas of science; from the examples that are discussed, I see that this type of methodology might be very useful in cases where there are far too many parameters, and most of them are not controllable.

At a more fundamental level, in spite of what Chris Anderson has to say, science is about explanations, coherent models and understanding. In my opinion, all of what Anderson shows is that, if you have enough data, you can develop technologies without having a clear handle on the underlying science; however, it is wrong to call these technologies science, and argue that you can do science without coherent models or mechanistic explanations.
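The "it could just be a coincidence" caveat in Anderson's own summary of the scientific method is easy to demonstrate. Here is a minimal sketch in Python, assuming 200 series of 10 points each (hypothetical numbers, picked only to keep the run short). Every series is independent random noise, so there is no mechanism connecting any two of them, and yet a search over all pairs turns up a strong correlation.

```python
# Toy illustration: search enough unrelated series and a strong
# correlation appears by chance. Sizes and seed are arbitrary assumptions.
import random
from itertools import combinations

def pearson(a, b):
    # Plain Pearson correlation coefficient of two equal-length sequences.
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
# 200 independent series of pure Gaussian noise, 10 observations each.
series = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]

# Look at every pair and keep the strongest correlation found.
best = max(abs(pearson(a, b)) for a, b in combinations(series, 2))
print("pairs examined:", 200 * 199 // 2)
print("strongest |correlation| among unrelated series:", round(best, 3))
```

With nearly twenty thousand pairs to choose from, the strongest one looks impressive even though nothing connects the two series. More data of this kind produces more such pairs, not fewer, which is why the patterns alone do not amount to an explanation.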