Some correspondents have been contributing to my last post incognito. It was a post about a point of usage in which, it began to emerge, there was an interesting usage divide between British and American English. The situation is probably more complex than that, with such factors as age, gender, and social context being relevant as well as regional origin. It is also very important to establish the relevance, if any, of the contributors' language background. Without a sociolinguistic perspective of this kind, it is impossible to interpret what people are saying. 'I say this' or 'I never say this' is useless without knowing who 'I' is.
And this local issue reflects the main problem presented by the Internet, when it comes to interpreting language data. It's often said that the Internet is the largest linguistic corpus ever, and this is a goldmine for linguists. Well, up to a point, Lord Copper. Because it is also the largest anonymous linguistic corpus there has ever been, and this is an immense frustration for linguists. I take it as axiomatic, these days, that a linguistic analysis has to be sociolinguistically and pragmatically informed. If we want to explain linguistic patterns, as opposed to just describing them, we need to answer the question 'why'. Traditionally, linguistics had its focus on the what and when and where (descriptive, historical, and dialectological perspectives). Today we want to know why a usage occurs. What type of person uses it, in what situation? What was the intention behind using it and what was the effect? It is questions of this kind that sociolinguistics, stylistics, and pragmatics seek to answer. And they can't be answered without basic data, which is what the Internet so often does not provide. The fact that most contributions on the Internet are incognito, or pseudocognito, makes serious sociolinguistic investigation impossible. On the Internet, as the New Yorker cartoon once said, nobody knows you're a dog.
I'm well aware that there are some situations - some social networking domains, for example - where the opposite is the case. People tell the world everything about themselves. But there are still problems. Three, in particular. First, not everything we read can be trusted: false identities are all over the place, in which people adopt alternative ages, genders, roles... Second, saying too much about oneself is almost as problematic as saying too little, as nobody has got the time to trawl through a pile of (linguistically) irrelevant data about hobbies, likes and dislikes, and so on, in order to extract those values which relate to sociolinguistically relevant parameters. And third, linguists have spent a lot of time refining their investigative procedures in recent decades, so that they know the right kind of questions to ask, when approaching a usage issue, and these questions may not be addressed in the information people offer about themselves.
We do not yet have detailed linguistic accounts of the consequences of anonymity. All that is clear is that traditional theories don’t account for it. Try applying Gricean maxims of conversation to the Internet: our speech acts should be truthful (maxim of quality), brief (maxim of quantity), relevant (maxim of relation), and clear (maxim of manner). Take quality: Do not say what you believe to be false; Do not say anything for which you lack evidence. Which world was Grice living in? A pre-Internet world, evidently. Analyses in pragmatics traditionally assume that human beings are nice. The Internet has shown that a lot of them are not. Is a paedophile going to be truthful, brief, relevant and clear? Are the people sending us tempting offers from Nigeria - beautifully pilloried in Neil Forsyth’s recent book, Delete This at your Peril (2010)? Are extreme-views sites (such as racist hate sites) going to follow Geoffrey Leech’s maxims of politeness (tact, generosity, approbation, modesty, agreement, sympathy)? If brevity was the soul of the Internet, we would not have such coinages as bloggorhea and twitterhea.
I've just come back from a splendid corpus linguistics conference in Oslo (ICAME 32) where this was among the issues being addressed. The paper I gave will be up on my website shortly, but it raises more questions than it answers. Maybe one day the Internet as a whole will provide linguistically sophisticated metadata, but I'm not holding my breath. And there may be a limit to what can be provided, given the collaborative nature of many Web pages, such as those we see on Wikipedia, which are often sociolinguistically heterogeneous, reflecting contributions from people of diverse backgrounds. Stylistic conglomerates are emerging as a consequence. None of this helps the poor sociolinguist.
Can anything be done to improve the situation? Well, one small thing is that usage forums could start by demanding greater explicitness when usage issues are raised. And so, from now on, I will not publish contributions to my blog on points of usage that are sociolinguistically incognito. What is relevant to the debate will vary. Sometimes it will be regional background (as in the last post), sometimes it will be age, or gender, or occupation. But there needs to be something, and I hope we will see similar things happening in other usage forums, so that, gradually, a sociolinguistically more informed Internet climate evolves.