DCblog: March 2007

Saturday, 24 March 2007

On di(a)ereses

Another question from a Fight for English reader:

'On your discussion of the hyphen: I came across a modern American non-fiction book (sadly, I can't now locate it) where the diaeresis was regularly used, instead of the hyphen, for "coöperate" and the like. I don't know whether this was a given publisher's house style, an author's preference, or even how widespread this usage is. It would be fun to know how this came about, and how much it is used on each side of the Atlantic. I rather like its arcane nature, whilst also find it intrusive!'

Marks to distinguish letters and words, as an aid to reading, go back to classical times. The term diaeresis (earlier diæresis, US dieresis) derives from a Greek word meaning 'divide' or 'separate'. In English the practice goes back to the end of the 16th century, when most of the modern punctuation marks started to be systematically used. The earliest reference to the term in the OED is 1611, where it is used to distinguish adjacent vowels in such words as queuë - the intention being to ensure that the vowel sequence was not pronounced as a diphthong.

These days, the vowel-separating function seems to have largely died out, being replaced by the hyphen. I still see it occasionally as a pronunciation guide in such words as naïve. And it's still quite often found in proper names, such as Noël or the Brontës. Some people are quite proud of it. I know a woman who, when asked her name, says 'It's Chloë with two dots'.

I don't know of any statistical study of contemporary usage, I'm afraid, or whether different things are happening between the US and UK. Certainly, if any modern book used it frequently, it would have to be an authorial decision, bucking the trend, and there for a stylistic purpose. I've seen it, for instance, in representations of dialect speech and also in some artificial languages (as in science fiction), where it helps make the speech look alien. Nice subject for a small study.

On -ise vs -ize

A US correspondent, having read The Fight for English, raises this question.

'American spellings: I note with some interest that the OUP house style is to use "-ize" instead of "-ise". I wonder: does this come from Webster (unlikely, I'd have thought!)? Does it suggest an English pre-American practice? Or is it just OUP preference? I guess I shall never know!'

The -ize spelling was preferred by classical scholars, especially in the 16th century, for verbs which came into English from Greek and Latin, and that etymological argument has fostered the use of z ever since. The USA and Canada adopted it from the outset. And the editors of the Oxford English Dictionary opted for it, at the end of the 19th century, partly on etymological grounds (a z is used in Greek and Latin) and partly on phonological grounds (that the letter better reflects the sound). 'In this dictionary', they say, 'the termination is uniformly written -ize'. This influenced Henry Hart, who compiled his 'Rules for Compositors and Readers' at the press in Oxford. He opens his first booklet with a section on spellings, and adopts the -ize spellings used in Murray's dictionary. And Murray, in turn, had been influenced by Dr Johnson, whose Dictionary has agonize, analyze, anatomize, and so on.

So where did the -ise alternatives come from? Some of the words (such as baptize) were spelled with both an s and a z from their earliest days in Middle English. The trend to spell all such verbs with s began when verbs came into English with increasing frequency from French, where the suffix was -iser. A verb of this kind borrowed directly from French, it was argued, should be spelled with -ise, to reflect that source. Some felt it important to maintain a spelling link between related words, such as analyse and analyst. And during the 19th century, this usage grew.

The problem, of course, is that it is often unclear whether a verb has come into English from French or from Latin. Confusion led 19th-century printers to try to sort it out, and they did this by imposing a uniform rule for all such verbs where alternatives exist. Hart opted for -ize. But several other publishers - perhaps in an effort to distinguish themselves from Oxford - opted for -ise. They may also have been influenced by the fact that there are fewer exceptions if you go for the -ise rule. Several verbs can only appear in -ise (such as advise, revise, surprise...), and you have to remember what they are.
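The 'fewer exceptions' point can be made concrete with a toy sketch. The code below is purely illustrative: the function name and the exception set are assumptions made for this example, and the set holds only the three verbs mentioned above, not a full list.

```python
# Illustrative sketch only: a toy -ise -> -ize normaliser.
# ISE_ONLY is a deliberately tiny, hypothetical sample (the three verbs
# named in the post); a real house-style rule would need a much longer list.
ISE_ONLY = {"advise", "revise", "surprise"}


def to_ize(word: str) -> str:
    """Return the -ize spelling, unless the word is a listed -ise-only verb."""
    lower = word.lower()
    if lower in ISE_ONLY or not lower.endswith("ise"):
        return word  # leave exceptions and non -ise words untouched
    return word[:-3] + "ize"  # swap the final "ise" for "ize"


for verb in ["organise", "advise", "baptize"]:
    print(verb, "->", to_ize(verb))
```

With such a short list, words like promise or exercise would be wrongly changed, which is exactly the burden of the -ize rule: every exception has to be remembered. An -ise rule needs no such list.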

World usage varies. -ize is the overall preference in North America; -ise in Australia. Usage in the UK is mixed, with -ise beating -ize in a ratio of 3:2. There's a nice discussion of current trends in Pam Peters' Cambridge Guide to English Usage. Usage is certainly changing. Some publishers these days are adopting a more relaxed attitude: they don't mind which authors use, as long as they are consistent. Personally, having had my usage pushed first one way and then the other by publishers over the years, I've given up having a preference!

Saturday, 17 March 2007

On relevance in advertising

Sarah sends a comment which asks about online contextual advertising. She asks: Why are those snappy short ads working so well in certain cases, and not at all in others? And is this influencing other parts of our language use at all?

This is in fact an area of applied internet linguistics which I've spent a lot of time on in the last ten years. You can read up more about it at the site www.crystalsemantics.com - but essentially my procedure avoids the problem you've noticed by providing a full lexical specification of the content of a page. To see why this is needed, consider the following example.

A few years ago there was a page on CNN reporting a street stabbing in Chicago. The ads down the side said such things as 'Buy the best knives here', 'Get knives on eBay', and so on! The stupid software had found the word 'knife' and assumed that the page was about knives, and automatically assigned cutlery ads to it. No-one was happy, least of all the cutlery firms, who certainly didn't want their product to be associated with homicides.

To avoid this crazy result, my approach analyses all the content words (strictly, 'lexemes') on the page and weights them in terms of relevance. For the CNN page, a word like 'knife' would be outranked by the cluster of other words on the page that relate to crime. It then classifies the page using a set of around 1500 categories derived from the taxonomy I developed when working on the Cambridge encyclopedia family (there's an earlier post about that). It would conclude that this is a page about a crime - specifically, a homicide. It might also conclude that it was about some other things, too, such as policing or urban renewal. (Web pages are usually multi-thematic.)
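The weighting idea can be sketched in a few lines. This is only a rough illustration, not the actual software: the lexicon, the category names, the weights, and the function name are all invented for the example, and a real system would work with word senses and collocations across some 1500 categories.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: each content word maps to (category, weight) pairs.
# All entries and weights are invented for illustration.
LEXICON = {
    "knife": [("cutlery", 1.0), ("crime", 1.0)],
    "stabbing": [("crime", 2.0)],
    "victim": [("crime", 1.5)],
    "arrested": [("crime", 1.5), ("policing", 1.0)],
    "police": [("crime", 1.0), ("policing", 2.0)],
}


def classify(text: str, top_n: int = 2) -> list[str]:
    """Sum category weights over all known words and return the top categories."""
    scores = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for category, weight in LEXICON.get(word, []):
            scores[category] += weight
    return [category for category, _ in scores.most_common(top_n)]


page = "Police arrested a man after a stabbing; the victim was attacked with a knife."
print(classify(page))  # ['crime', 'policing']
```

Here 'knife' still contributes to a cutlery score, but the cluster of crime-related words outweighs it, so the page is classified by its themes rather than by one keyword.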

Any advertiser wanting to place an ad alongside this report would want it to be relevant to the page - an ad about crime prevention, say, or careers in the police force. All advertisers have to do is apply the same classification system to their ad portfolio, and the software picks out the relevant ads. It's a simple principle, but it works very well, and is beginning to be widely used by the company now developing it, adpepper media.
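The matching step can be sketched as follows, assuming the page and each ad have already been tagged with categories from one shared scheme. The ad texts, category names, and function name are hypothetical and chosen only for illustration.

```python
# Hypothetical ad portfolio: each ad is tagged with categories from the
# same classification scheme that is applied to pages.
ADS = {
    "Buy the best knives here": {"cutlery"},
    "Crime prevention tips": {"crime"},
    "Careers in the police force": {"policing"},
}


def relevant_ads(page_categories: set[str], ads: dict[str, set[str]]) -> list[str]:
    """Return the ads that share at least one category with the page."""
    return sorted(ad for ad, cats in ads.items() if cats & page_categories)


# A page classified as being about crime and policing:
print(relevant_ads({"crime", "policing"}, ADS))
```

Because the knife ad shares no category with a page about a homicide, it is never offered, whatever individual words appear on the page.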

The principle is simple, but the linguistics took a long time to develop - ten years, in fact. Every sense of every content word in a college-sized English dictionary had to be investigated and assigned to the relevant encyclopedic category, and significant collocations also had to be identified. The initial task took a team of lexicographers several years, and the software engineering took another team several years more. Indeed, the refining of the approach is still going on, to make sure it is fast enough and robust enough to cope with commercial demands, which might run to hundreds of millions of page-analyses and ad-assignments a day.

Incidentally, the same procedure can be used for other internet applications, such as improving search-engine relevance, automatic document classification, and internet security. It's difficult to get the big firms and organizations to run with these new ideas, though, I find. They are very set in their ways, and prefer to carry on using their familiar methods (even if they don't work that well) rather than invest in new strategies. For instance, a couple of years ago I developed a method (called 'Chatsafe') for tracking paedophile gambits in conversations, based on this sort of lexical analysis. It worked fine, and I thought it would be welcomed by the Powers That Be concerned with this sort of thing, such as the Home Office or chatroom companies. But despite a lot of talk, nobody picked it up, so it's stayed on the shelf.

Is this influencing language use in general? I don't see much sign of that. I talk about the extent to which the internet is influencing current usage in my Language and the Internet, and also in A Glossary of Textspeak and Netspeak. Although the internet is linguistically revolutionary in certain respects, the impact it has so far had on actual usage in a language is pretty limited.

On 'Living On' again

The staged reading of 'Living On' that I mentioned in my last post will take place in the Khalili Theatre of the School of Oriental and African Studies (between Russell Square and Malet Street) on the evening of Monday 23rd April, starting at 6.30. It is the opening event of 'Endangered Languages Week'. The evening should end at about 9.30.

That week looks really interesting. It is being organized by the Endangered Languages Academic Programme in the Linguistics Department of SOAS. There will be a workshop, round-table and films on the Tuesday; an Open Day on the Wednesday; and a workshop and film on the Thursday. Publicity will be going out shortly from SOAS.

Monday, 5 March 2007

On 'Living On'

A correspondent writes to ask what the story is behind my play 'Living On', and whether it has had a production yet.

It came out of a conversation with Greg Doran, now associate director at the Royal Shakespeare Company. In the late 1990s, Greg was in North Wales directing a play, and he gave me a call. While in the US he had read an article I had written on endangered languages for the Library of Congress magazine, Civilization, and he thought this was an excellent subject for a play. I wholeheartedly agreed, as it had long been my view that the best way to communicate the issue to a wide public would be to get the artists of the world to deal with it in their different genres. And fiction, as Disraeli once said, 'stands the best chance of influencing opinion'.

Greg came across to Holyhead, and we spent an afternoon exploring possibilities. The only professional playwright we knew who had approached the subject was Harold Pinter, in 'Mountain Language', but that was a twenty-minute piece with a particular political angle, and not an exploration of the general theme of language death. Other playwrights, it seemed, either displayed little interest or little knowledge - hardly surprising, given the fact that the extent of the world language crisis had become known, even in linguistics, only five years before. As I knew the subject and had had some playwriting experience, he suggested I take the job on. The idea was to put the play on at the new theatre in Keswick, with which Greg was associated.

I set to with enthusiasm. I created a 'last speaker' of a language, Shalema, invented a language, Tamasa, for him to speak, and gave him a cultural background which was a fusion of notions derived from several endangered-language communities around the world. The plot revolves around the interaction between him and a field linguist, Derek, who has been documenting his language, and a British Council officer, Miranda, who works in the city where Shalema lives. All has been going well, but then Shalema refuses to cooperate any further...

The play was nearly finished when Greg moved to Stratford to become associate director at the RSC. Shakespeare - I suppose, not unreasonably - then took priority. I remember complaining at the time that 'Shakespeare had already had his chance, and it was time to let a new generation have a go...'! But although Greg gave me some excellent feedback about the writing, he wasn't able to take the idea any further.

I completed the play nonetheless, and through my linguist-turned-actor son Ben made contact with Bob Wolstenholme, a London-based director who was interested in taking the project forward. Bob worked with me on the script, and the final version is the result of this revision, along with other revisions which took place after various staged readings around the world. I've been able to try it out with audiences in Australia, Brazil, India, and Mexico, as well as in several parts of Europe, often tying it in with a lecture on language death, so I'm pretty confident that it 'works'. Indeed, I was dismayed/delighted to meet someone recently who recalled a combined lecture/play reading from a few years ago. 'I can't remember what you talked about in the lecture,' she said, 'but I remember the play very well!'

The text of 'Living On' is freely available to any group, and I have often sent it out. It is therefore possible that amateur readings or productions have taken place in some parts of the world. Any profits received from a commercial production should be assigned to a local endangered languages association, if there is one, and failing that, to the Foundation for Endangered Languages in the UK. A published version of the play will be available in due course, once it has had a full staged production. There's a staged reading planned in London during the week of 23rd April 2007, as part of 'Endangered Languages Week' at the School of Oriental and African Studies in Malet Street. There may be a full production in London in 2008. It is not exactly mainstream theatre, however, so I am not holding my breath.

A couple of performance notes. The play uses a culturally diverse cast, and a strong ethnic identity is needed for the character of Shalema. I had Morgan Freeman in mind, when I was writing, but no particular racial group is assumed: the characters could be from virtually any part of the world. The rainforest setting used in the script could be altered to any other, without this affecting the plot. The two leading white characters also have a regional background: Derek is Welsh and Miranda is Irish. For parts of the world where the allusions to the 'Celtic fringe' may not be especially meaningful, the text could be adapted to incorporate alternatives, also without this affecting the plot. The same point applies to any proposed translations. The play involves music and choreography/movement, for which appropriate specialists would need to be involved.

'Living On' will readily adapt for a screenplay; indeed, the physical events which are depicted are in some ways easier to display through the medium of film. These events, though, would be difficult to portray on radio, and some rewriting and plot adaptation would be necessary for any radio performance.

Sunday, 4 March 2007

On alliterating, or not

An alliteration addict from Brazil has asked whether the notion of alliteration has to be purely sound-based. He had circulated a poem in which every line contained at least three alliterative words, and received a comment from one reader that some of his lines weren't really alliterative at all because they began with the same letter but not always with the same sound. For example, one of his lines reads 'Hope and humility are honorable'; another reads 'Always in artistic appreciations'.

In fact the earliest uses of the term in English refer only to letters, not sounds. The first citation in the OED is 1656, a definition which says that alliteration is 'a figure in Rhetorick, repeating and playing on the same letter'. One of the first quotations given in that dictionary is 'Apt Alliteration's artful aid' (1763), where the four instances of letter a represent four different sounds.

The OED definition doesn't actually answer the question. It defines alliteration as: 'The commencing of two or more words in close connexion, with the same letter, or rather the same sound'. Alliterate, however, has no qualification: 'To begin with the same letter or group of letters'.

Certainly, the way alliteration has been used in oral poetry (from Old English times) and in the oral performance of poetry has privileged the auditory sense of the term, and that is the dominant use today. Few people, I think, would view a sequence such as 'Peter the philosopher saw a ptarmigan' as alliterative.

What is the difference, then, between 'Always in artistic appreciations', which does sound alliterative, and 'Peter the philosopher...', which doesn't? The amount of phonetic similarity between the sounds. The phonetic values of the a letters are all relatively open vowels; the phonetic values of the p letters range from a plosive to a fricative to zero.

There is always an element of subjectivity in a judgement about alliteration (as indeed also about other effects, such as assonance and rhyme). Whether two sounds are perceived to alliterate depends on how close they are to each other, whether the two syllables are stressed or not, and - the issue here - whether the sounds are sufficiently phonetically similar to be perceived to be 'the same'. There are therefore no grounds for a 'black and white' interpretation of the notion. It would be unwise to insist that sounds in alliteration have to be always phonetically identical. A lot of effects that we recognize as alliterative (such as the OED example) would be excluded if we did.

There is another issue, of course: is it aesthetically acceptable to mix the two systems - graphic and phonic - in the same poem? That's perhaps more a question for literary criticism than linguistics. But if the aim of alliteration is not just to sound nice, but to relate word-meanings or to reinforce a poetic structure (such as parallelism between lines), then I don't see why not. Empson and others used to say that similarities of sound prompt similarities of sense. Similarities of shape can do the same thing.

Saturday, 3 March 2007

On new branches of applied linguistics

A student, interested in maybe doing some linguistic research one day, writes to ask if there are any totally unexplored regions on the linguistics planet. I remember thinking about this for a paper I gave to the British Association for Applied Linguistics in 2002. A number of topics (or questions, if you prefer) came to mind where, even if the occasional study exists, there is certainly no established body of research. For instance:

There is as yet no field of 'theatrical linguistics' - or maybe 'theatrical phonetics' would be a better label - to answer questions like 'Why are some actors’ vocal performances more effective than others?' 'What was it exactly that made John Gielgud’s voice so memorable?' 'What phonetic techniques and materials might be devised to help actors and directors?' Actor son Ben tells me that, when he was training, it was really hard finding high-quality taped materials on regional accents, at the level of detail he would need for a particular character.

There is as yet no field of 'musical linguistics' to answer questions like: 'Why are some languages suitable to opera and not others?' 'Why is English the language of pop music?' 'Is there something about the structure of English which makes it suit rock-and-roll, or reggae?' I wonder if one could ever devise a more linguistically representative and diverse (i.e. non-English) Eurovision song contest?

There is as yet no field of 'forensic internet linguistics'. I went to a conference in Brussels in 2002 on Internet security in the face of increased threats from hacking, fraud, and cyberterrorism. A wide range of questions was being addressed to do with methods of spam exclusion, porn filtering, linguistic identification of forged messages, and so on, all of which presupposed a descriptive linguistic frame of reference for what I have elsewhere called ‘Internet linguistics’, and which hardly yet exists.

And there are other undeveloped branches of Internet linguistics. 'What kind of language should we use on the Internet?' 'How can Internet language be taught to children?' 'How does the arrival of the Internet impact on children’s abilities to read and write?' This might one day be a field called 'applied educational internet linguistics'.

Very little relevant work has been done. Even some very basic questions haven't been addressed - such as how to describe the linguistic features in use. To explore such topics as the difference between Gielgud’s and Olivier’s voices would need a fuller phonetic transcription of tones of voice than we currently have. Similarly, current transcriptions are not really capable of investigating musicological questions. For instance, how on earth does one transcribe musical quotations in speech - cases where a musical extract is given a generalized linguistic interpretation?

A common contemporary example is the theme from Jaws. The jocular expression of an approaching dangerous social situation is often conveyed by its ominous low-pitched glissando quavers. Try transcribing that. Or (to take other examples I have heard over the months in conversational settings - not always very well performed, but sufficiently recognizable for me to note them down): the theme from The Twilight Zone, Dr Who, Dragnet, the shower-room scene in Psycho, Laurel and Hardy’s clumsy walk music, the riff in Close Encounters of the Third Kind, and the opening motif of Beethoven’s Fifth Symphony. The extract may be highly stereotyped and brief. Someone who arrives in a room with something special may accompany it with ‘Ta-raa’, or the racecourse riff, or the whistled motif from Clint Eastwood’s Spaghetti Western films, or the chase music from a Keystone Kops film. Devotees of The Prisoner cult TV series introduce its musical motifs into their speech to the point where one would dearly like to see the balloon guardians of The Village appear and hustle them away!

I had a personal close encounter with a new linguistics field just a couple of years ago. At the beginning of 2004 I was approached to help the Shakespeare's Globe company in London put on a production of Romeo and Juliet in 'original pronunciation' - that is, as close as possible to how the words would have been pronounced in Shakespeare's time. I wrote it all up afterwards in a book, Pronouncing Shakespeare. But until then, I had never conceived that there might be a field called, roughly, 'applied historical theatrical linguistics'.
