Languages of the world. Or are some of them dialects?


Whether a tongue should be classified as a language or a dialect is a matter of much heated debate. In modern society, there are perhaps no linguistic issues as contentious as the status of Sicilian, Flemish or Ebonics (just look up 'Oakland ebonics controversy'). Inquisitive minds have wondered about the languagehood of Scots and Welsh, which, although not representing sovereign states, bear little resemblance to English, their common linguistic neighbour (particularly Welsh). Similar questions have been posed regarding the diverse languages of Germany and Italy. And the abundance of Internet know-alls 'correcting' others' language-dialect judgements should be palpable to long-time users of 'teh Interwebz'.

In recent years, traditional dialect vs. language distinctions have been brought under scrutiny. Some of them have been challenged by the linguist and the citizen alike. One high-profile case is that of Cantonese. Radically departing from Mandarin's sound patterns, word formation rules and sentence structures, Cantonese is slowly losing its status as a substandard dialect, and the Hong Kong Government's attempt to restore this waning image was greeted with widespread public backlash.

The factors behind the Cantonese movement are complex, a mixed bag of social, political and economic forces. Yet the driving force is clear: It is the desire to create a new linguistic identity. When asked for an objective reason for reclassification, though, the response from proponents is, unequivocally, the lack of mutual intelligibility - the ability to understand each other - between Mandarin and Cantonese (perhaps along with a few dubious cultural claims thrown in for good measure).

Is mutual intelligibility as good a touchstone as we make it out to be? Is there merit in traditional language-dialect distinctions? And the million-dollar question: Are X and Y separate languages or dialects of the same language?

Language: Not Just a Dialect with an Army and a Navy

Max Weinreich's old adage is perhaps the most oft-repeated summary of traditional language-dialect distinctions. A dialect, he claims, gains the status of language if the speech community forms a nation-state with its own separate army and navy. A variant of this definition adds a national flag, which doesn't change it a lot.

This traditional notion is easily challenged, though it does hold to an extent. The dialects of Chinese, after all, are all under the protection of the People's Liberation Army. Scandinavia is divided into three countries, Denmark, Norway and Sweden, so the Scandinavian languages are considered separate. Since the dissolution of Yugoslavia, Serbo-Croatian, once a single language, has been splitting into Serbian, Croatian, Bosnian and Montenegrin, despite the absence of major differences.

But wait. Mainland and Taiwanese Mandarin aren't considered separate languages, even though their speakers have different armies and navies. The Bai language, which is closely related to the varieties of Chinese, is considered a language in its own right, despite the fact that it lacks national defence. The many nations of the Arab world, which speak mutually unintelligible varieties of Arabic, do not share armed forces, yet are not considered to be speakers of separate languages.

The worn-out canard is really a misrepresentation of conventional dialect-language distinctions. True, politics does play a major role in assigning arbitrary boundaries between dialect and language. Yet the correspondence between language status and sovereign state status is far from perfect.


Belgium has its own army, but Flemish is traditionally classified as a dialect of Dutch, and nobody has ever tried to reclassify Belgian French as a language either.



A Common Linguistic Identity

Now, it may be tempting to define a language as an entity with a single standardised variant. Chinese, after all, has only one standardised language based on Pekingese Mandarin. This variety of Mandarin, known as Putonghua, is studied by all Chinese schoolchildren and is the official lingua franca, not to mention the high-prestige variety. Serbian and Croatian are now regulated by independent bodies, which seek to widen the divide between the two. This is how they became independent languages.

Linguists use the concepts of autonomy and heteronomy to distinguish between dialects and languages in this sense. Low German is heteronomous: It defers to a standard variety, Standard German. Parisian French, by contrast, is autonomous: It is the national standard, with official dictionaries, grammars and a regulating body, the Académie française.

Yet life is still not so simple. The lack of regulation cannot entail the lack of language status, or there would have been no languages before the likes of the Académie française popped up. One could argue that de facto standards also count, but this does not render the definition invincible. The tongues spoken by Chinese ethnic minorities, for example, are not standardised, but we remain keen on partitioning them into languages. Moreover, British, American, Canadian and Australian English are all dialects of the same English language, in spite of different standards in different nations.

Costa Rican Spanish: A student flipping through a dictionary - a sign of standardisation - with an 'official guide' to Costa Rican Spanish in the foreground. Credit: The LEAF Project


The Dialect Continuum

In recent years, a popular notion, even among non-linguists who would not otherwise look into dialectology, is that if two speech varieties are mutually intelligible, then they are dialects of a common language; if not, they are distinct languages. To the untrained ear, this may sound appealing and unambiguous, but even a slightly closer inspection reveals that the criterion creates more problems for itself than it resolves.

One of the greatest difficulties faced by this method of taxonomy is the dialect continuum. Sometimes, it is easy for A to understand B, B to understand C and C to understand D, but not for A to understand D. This is because the differences between the idiolects (dialects of individuals) are cumulative.
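To see how small differences can add up along a continuum, here is a minimal Python sketch. It assumes a deliberately crude model that is not part of any real dialectological method: each variety sits at an invented position on a line, and two varieties count as intelligible when the gap between them stays under a made-up threshold.

```python
# Toy model of a dialect continuum. The positions and the threshold are
# invented for illustration; they are not measurements of real varieties.
positions = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
THRESHOLD = 1.5  # hypothetical cut-off: larger gaps block understanding


def intelligible(x: str, y: str) -> bool:
    """Two varieties understand each other if the gap between them is small."""
    return abs(positions[x] - positions[y]) <= THRESHOLD


# Every pair of neighbours along the continuum passes the test...
neighbours_ok = all(
    intelligible(x, y) for x, y in [("A", "B"), ("B", "C"), ("C", "D")]
)
# ...but the two ends are too far apart, because the gaps accumulate.
ends_ok = intelligible("A", "D")

print(neighbours_ok, ends_ok)
```

In this toy setup, intelligibility fails to be transitive: each step along the chain is small, yet the total distance from A to D exceeds the cut-off.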

The Romance languages - including French, Spanish, Portuguese, Italian and Romanian - are a great example. We usually think of the Romance languages as discrete blocks, each with its own diverse, yet mutually intelligible, dialects. The real-world situation is far from so simple. People living on either side of the Spain-France border understand each other quite well, even though their tongues are considered dialects of different languages. In fact, Romance languages like Portuguese, Spanish, French and Italian form a dialect continuum called the West Romance Dialect Continuum.

The Romance Continuum

The problem isn't just faced by linguistic geographers, either. Sometimes, within the same geographic location at the same time, the tongues spoken by the population can be extremely diverse. Now, few would consider Cockney to be a language separate from the Queen's English, but in other places, low-prestige tongues can be very distinct from the high-prestige ones. Our best examples are probably creoles, like Jamaican Creole (known as Jamaican Patois to its speakers) and Hawaiian Creole. The presenter of an excellent YouTube series introducing Jamaican Creole makes it clear from her first video (around 01:00) that Jamaican Creole is not broken English and not slang, but an independent language.

But don't be quick to conclude that the low-prestige variety is a language yet. Between the tongue of the upper echelons of society (known as the superstrate language) and the vulgar vernacular (known as the substrate language), there is again a continuum. The divide between the gutter and the bourgeoisie is not a sharp one.[2] In a way, even that presenter isn't talking to you in 100% English: Her accent shows Jamaican influence, after all. And despite her efforts to deter you from treating Jamaican Creole as broken English, it's not hard to see why such a misunderstanding arose.


Jamaican Creole isn't the only example. Citing an influential book on sociolinguistics, Wikipedia shows us an array of ways to say 'I gave him one' in Guyana. It starts from the clearly English 'I gave him one' and ends with the clearly non-English 'me bin gee am wan', along with 16 other varieties in between, in varying degrees of Englishness: 'a give him wan', 'a did gee him wan', 'me bin gi ee wan' and so on. Linguists call this a post-creole continuum, and it's still unclear where the Creole ends and English starts. Do people who say 'a give him wan' speak an English dialect, and people who say 'me bin gi ee wan' speak in a creole language? Then what is the status of 'a did gee him wan'? Where do we draw the line, and why?

What Are 'Languages'?

So far, we've been using names like 'English' and 'Jamaican Creole' as if they were concrete linguistic entities that exist in the world. After all, you don't tell your friends you're learning a tongue spoken in Madrid - you tell them you're learning Spanish, unwittingly presupposing the existence of a putative entity known as 'Spanish'. Yet, as we have seen above, the grey areas are humongous. To properly distinguish dialects and languages, we need to know what exactly we are referring to when we say 'English' and 'Jamaican Creole'.

Now, if we're going to talk about standard languages, then perhaps there is an entity called English. It comprises a standardised lexicon (codified in a standard dictionary) and a standardised grammar. An example of rigid standardisation is Putonghua ('common speech', a form of standardised Mandarin). The Chinese government has gone to tremendous lengths to make the standard tongue as homogeneous as possible, in the hopes of fostering communication between the linguistically diverse regions of China.

But while the concept of an entity called 'Putonghua' may exist in the books, actual users of Putonghua hardly follow its conventions to the letter. Non-standard uses abound: Ta bei zisha le! (He was 'suicided') is a textbook example of linguistic innovation. The construction was created as a sardonic expression for murders covered up as suicides. Nor can we assume that people's mental lexicons are merely subsets of the standard lexicon. Technical jargon, regional vocabulary and remnants of Classical Chinese are just some examples of non-dictionary words uttered by real people, even in 'Putonghua' speech.

In computer science, a high-level programming language like Java or C++ is used by humans to write instructions. These instructions are turned into machine language by various means, including interpretation and compilation. For example, your browser interprets JavaScript to perform actions on dynamic web pages. Imagine an interpreted programming language that is interpreted slightly differently on every computer. That's what a human language is; we don't have the luxury of having the same interpreter on each machine.
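The analogy can be made concrete with a small Python sketch. It is only a caricature: the two 'speakers', their lexicon entries and the interpret function are all hypothetical, and a real mental lexicon is of course nothing like a lookup table.

```python
# Two hypothetical speakers, each with a private "interpreter": a mental
# lexicon mapping words to meanings. All entries are illustrative only.
speaker_one = {"chips": "fried potato sticks", "biscuit": "sweet baked snack"}
speaker_two = {"chips": "thin potato crisps", "biscuit": "soft bread roll"}


def interpret(utterance: str, lexicon: dict) -> list:
    """Look up each word in one speaker's lexicon; unknown words stay opaque."""
    return [lexicon.get(word, "<unknown>") for word in utterance.split()]


# The same "program" runs on both "machines" but yields different results.
print(interpret("chips biscuit", speaker_one))
print(interpret("chips biscuit", speaker_two))
```

The same string of words is valid input for both speakers, yet each one assigns it a different meaning, which is roughly the situation the paragraph above describes.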

What can we make of all of this? Perhaps the noted linguist Charles Hockett has the answer.


A sign in a Chinese school saying, 'Promote our race and our culture. Everyone speak Putonghua!'

Conceptualising Languages

Each person has her own way of speaking, known to linguists as an idiolect. The individual's grammatical quirks are captured by her mental grammar. The set of words she knows - including eggcorns (lexical 'errors' like eggcorn for acorn and doggy-dog world for dog-eat-dog world), which are highly idiosyncratic features - constitutes her mental lexicon.

A language is a collection of similar idiolects. A dialect is defined in a similar manner. The difference between language and dialect, then, is a matter of degree: Idiolects in a dialect are less diverse than in a language.

How, then, are we to measure diversity? We can, once again, use the concept of mutual intelligibility. Hockett introduced the ideas of L-simplex and L-complex, both of which denote collections of idiolects.[1] An L-simplex is a collection of idiolects where everyone understands everyone else, i.e. every pair of idiolects is mutually intelligible. The diagram below shows an L-simplex. Each idiolect is represented by a letter, and a line between two letters indicates mutual intelligibility.

An L-simplex

A chain is a series of idiolects in which adjacent idiolects are mutually comprehensible, but non-adjacent idiolects don't have to be. Dialect continua, which we saw earlier, are good examples of chains.


An L-complex is a group of idiolects that are connected to each other in some way, whether directly or through a chain of intermediate idiolects.
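These three notions map naturally onto a graph, with idiolects as nodes and mutual intelligibility as edges. The Python sketch below is one possible reading, not Hockett's own formalism: the idiolect names and links are invented, and it assumes that an L-complex is a set of idiolects joined directly or through chains (a connected component, in graph terms).

```python
from itertools import combinations

# Hypothetical idiolects and mutual-intelligibility links (invented data).
links = {
    ("a", "b"), ("a", "c"), ("b", "c"),  # a, b and c all understand each other
    ("c", "d"), ("d", "e"),              # a chain running c - d - e
    ("x", "y"),                          # a separate, unconnected pair
}


def connected(p, q):
    """Mutual intelligibility is symmetric here, so check both orders."""
    return (p, q) in links or (q, p) in links


def is_simplex(group):
    """L-simplex: every pair in the group is mutually intelligible."""
    return all(connected(p, q) for p, q in combinations(group, 2))


def is_chain(sequence):
    """Chain: each idiolect is intelligible with the next one in line."""
    return all(connected(p, q) for p, q in zip(sequence, sequence[1:]))


def complexes(idiolects):
    """Split idiolects into groups joined directly or through chains."""
    remaining, groups = set(idiolects), []
    while remaining:
        frontier = [remaining.pop()]
        group = set(frontier)
        while frontier:
            current = frontier.pop()
            reachable = {i for i in remaining if connected(current, i)}
            remaining -= reachable
            group |= reachable
            frontier.extend(reachable)
        groups.append(group)
    return groups


print(is_simplex(["a", "b", "c"]))     # a, b, c form an L-simplex
print(is_chain(["a", "c", "d", "e"]))  # a reaches e only through a chain
print(complexes("abcdexy"))            # two separate L-complexes
```

In this invented data, a and e belong to the same L-complex even though they cannot understand each other directly, while x and y form a complex of their own.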


Now that we know what L-simplexes and L-complexes are, we know what English and French refer to. French is a largely homogeneous collection of idiolects, so it's very close to an L-simplex, where perhaps only the most distant idiolects are disconnected from each other. English is a tad more diverse than French, an L-complex with more disconnected idiolects. According to Hockett, French and Italian form a sort of dumbbell shape: two L-simplexes on either side, with a small number of mutually comprehensible idiolects connecting them.

Chinese, in contrast, is a huge L-complex, comprised of many small L-simplexes, like the Cantonese dialect spoken in Guangzhou, Hong Kong and their neighbours, or the Shanghainese dialect spoken in (surprisingly) Shanghai. The same can be said of Arabic.

We can even start thinking of dialects in terms of L-complexes and L-simplexes. Many dialects are L-simplexes - like General American English, for example. This isn't always the case. The Yue dialect of Chinese is made up of several L-simplexes that form an L-complex. I speak the 'standard' variety, Cantonese, but I could glean very little information from a YouTube video in Hoiping, another Yue dialect.

The concepts of 'language' and 'dialect', 'English' and 'Quebec French' do not, from a scientific angle, exist. They are not technical constructs; at a technical level, perhaps only L-complexes and L-simplexes are meaningful. But we still haven't seen the whole picture yet...

When Mutual Intelligibility Fails

So far, we have assumed mutual intelligibility is all-or-nothing. In our diagrams, idiolects were either connected or disconnected; there was no in-between. In reality, though, many idiolects are only mutually comprehensible to a certain extent.

If you listen to the Glasgow dialect on YouTube, you'll see why. Sometimes, you can pick up some information here and there. Sometimes, it sounds eerily English-like, but you don't have a clue what they're talking about.

I share similar sentiments about the Toishanese dialect. In fact, it sometimes sounds like Cantonese with a heavy accent that I'm used to. (Presumably, some of the immigrants I grew up listening to spoke with a Toishanese accent!)

In some parts of Africa, neighbouring tongues are referred to as 'two-day languages', 'one-week languages' and so on. A two-day language is close to your own, so you can adapt to it in two days. This sort of half-intelligibility between dialects leads us to the conclusion that mutual intelligibility isn't black and white, but a continuum with many shades of grey.

Intelligibility isn't always mutual, either. Danes understand Norwegians better than the other way around. This can be due to several factors, most notably the distinction between acquired intelligibility and inherent intelligibility. Two dialects may not be inherently intelligible, but those who speak a low-prestige variety are likely to have gained more exposure to the high-prestige variety, so they can understand the high-prestige variety better. By contrast, speakers of the high-prestige variety have less exposure to the low-prestige one, hindering their understanding of the low-prestige variety.

A Toishanese would understand what I say better than vice versa because Cantonese is the de facto prestige dialect, used for trade and commerce. A Toishanese in Hong Kong probably hears more Cantonese in a day than I will hear Toishanese in my lifetime. Even if the Toishanese doesn't speak Cantonese, he's probably developed enough comprehension skills to cope with daily tasks. 
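Both complications, degree and direction, can be captured by replacing the yes/no link with a pair of scores. The Python sketch below is a toy illustration only: the comprehension scores are invented to mirror the anecdote above, and the cut-offs are arbitrary choices, not established values.

```python
# Hypothetical comprehension scores from 0.0 to 1.0, stored as
# scores[listener][speaker]. The numbers are invented, not measured.
scores = {
    "Toishanese speaker": {"Cantonese speaker": 0.9},
    "Cantonese speaker": {"Toishanese speaker": 0.4},
}


def understands(listener, speaker, threshold):
    """One-way intelligibility: does the listener clear the cut-off?"""
    return scores[listener][speaker] >= threshold


def mutually_intelligible(a, b, threshold):
    """Mutual intelligibility requires both directions to clear the cut-off."""
    return understands(a, b, threshold) and understands(b, a, threshold)


pair = ("Toishanese speaker", "Cantonese speaker")
# The verdict depends entirely on where the arbitrary line is drawn.
print(mutually_intelligible(*pair, threshold=0.85))
print(mutually_intelligible(*pair, threshold=0.30))
```

With a strict cut-off the pair counts as two 'languages'; with a lenient one it counts as two 'dialects'. Nothing about the speakers has changed, only the threshold.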

The concept of mutual intelligibility is itself laden with complexities and vagueness. To define a good criterion for mutual intelligibility is already a formidable task. To devise a non-arbitrary distinction between language and dialect based on this already muddy concept is a Herculean, if not impossible, mission.

'Languages': Not so indispensable after all

Having seen all of these difficulties in defining 'languages' and 'dialects', perhaps we can come to a conclusion: At a certain technical level, the concepts of 'languages' and 'regional/social dialects' are perhaps not useful or interesting at all. The only 'lect' that is helpful is the idiolect, which represents the tacit knowledge of the speakers who speak it: The grammatical rules they have internalised, the vocabulary they have (mostly unconsciously) memorised...

Our conclusion is consistent with what linguists of the Chomskyan tradition have argued for over the last half-century. Linguistics should focus on the internal state of language users' minds that allows them to use language: The I-language. Much work in the past fifty years has focused on uncovering the principles that govern our intuitions about language. 58 years later, the search is still on, and we have yet to uncover the true nature of our language-processing mind, though many psychologists and neuroscientists have joined this quest.

Note how careful I have been throughout the article to differentiate between 'languages' and 'language'. 'Languages' are varieties of human speech, like French or Burmese; 'language' is the sum of all 'languages', the ability of our species to communicate verbally (or, in the case of signed languages, manually) a wide range of expressions. Chomskyan linguists study I-language, not I-languages. (The technicalities involved in I-language and what Chomskyans actually study are too complex for me to go into in this article, and are perhaps more suited for another one, so I will not digress further.)

Now, I am not saying the concept of 'languages' is useless. From an ordinary perspective, and from the perspective of sociolinguists who study language as a social construct, the concept of 'languages' is convenient. It would be ridiculous to tell others that you're learning the tongue of the Parisian people, rather than standard French.

Ethnologue, the largest language catalogue in the world, follows a set of principles to distinguish dialects and languages. It would be unthinkable for them to document idiolects instead. To partition tongues into dialects, they use a mutual intelligibility criterion - 85% inherent intelligibility. To quantify mutual intelligibility - and a fortiori inherent intelligibility, which must be uncontaminated by external factors - is hard indeed, but I don't doubt their intentions or the scientific methods they have employed when they compiled their stats.

The concept of 'language' ought to be defended on practical grounds, whether for classification, easy appellation or, from the government's perspective, standardisation. I don't want to diminish its value in these regards. The fact simply remains that, pragmatic uses aside, 'languages' are an abstraction, and an arbitrary, artificial one at that. Technically, they do not exist.

Epilogue: The One Distinction We Have to Make

Having dispensed with the dialect/language distinction, we can say that all idiolects are ultimately varieties - or dialects, if that's what you want to call them - of language. Let's call this thingamajig Language, with a capital L. In the Chomskyan tradition, it is often conjectured that if a Martian were to arrive on Earth, he would find that all earthlings speak a single Language, although each person speaks a slightly different 'dialect' thereof. This is due to the many similarities, or linguistic universals, that human languages share.

Now that we see Language as a whole, an interesting next step is to see what properties Language possesses, what differentiates it from other semiotic or communicative systems, what causes the Martians to perceive it as different from their own communicative or semiotic systems. These attributes allow us to delineate what is Language and what isn't. This line is probably much more interesting than some arbitrary language/dialect distinction.

Let me conclude this article by saying that research in this line of enquiry has been very fruitful in recent years. Scientists have teased out a set of general properties that belong to human language and not the communication systems of our brethren in the animal kingdom. Traces of these properties in the vocalisation of primates have begun to shed light on the evolution of our own Language. Principles universal to all human language have been proposed, tested, confirmed and refuted. Although fields like cognitive science, evolutionary linguistics and generative grammar are still in their infancy, their rapid growth in a matter of decades attests to their fecundity. Perhaps, in the years to come, we will unravel more secrets of mind and language than our ancestors ever imagined.
