History

Stories are the most important thing in the world. Without stories, we wouldn’t be human beings at all.

(Philip Pullman)

The Lemizh language originated in the region of Lemaria, which stretches from Moldavia and the southern Ukraine to the river Don. The dialect spoken to the north of the Danube Delta prevailed against other variants and is considered the standard language today. But let’s start from the beginning, with some sketches of the modern language’s ancestors.

Contrary to the grammar, I will be using some linguistic terms here that might not be widely known, as well as a fair number of IPA symbols. Please look them up on Wikipedia or elsewhere on the internet if you don’t know them already.

The code next to each title is the corresponding HTML language tag.

Proto-Indo-European `x-ine`

Proto-Indo-European (PIE) was spoken during the early fourth millennium BCE, probably in the Pontic-Caspian steppe to the northeast of the Black Sea, by a people who had tamed the horse and knew the wheel, but were illiterate. It is the common ancestor of a number of major historical and modern languages, which served as a basis for reconstructing the protolanguage in great detail. The descendants are categorised into ten branches (disregarding a few unclassified languages): Lemizh, †Anatolian, †Tocharian, Iranian, Hellenic, Celtic, Waldaiic, Sabellic, Armenian, and Albanian.

While it is widely regarded as a descendant of a poorly reconstructed language dubbed the Hemetera, a minority view PIE as one of the primordial languages that originated from the Blessing of Babel.

Phonology

PIE had to make do with only three genuine vowels, but many of the continuant consonants could form syllables as well. Only the plosives, of which there were as many as fifteen, and the sibilant *s lacked this capability.

	Vowels	Glides	Liquids	Nasals	Laryngeals	Sibilant	Labial plosives	Dental plosives	Palatal plosives	Velar plosives	Labiovelar plosives
Syllabic	e o *a	i u	l̥ r̥	m̥ n̥	h̥₁ h̥₂ *h̥₃	—	—
Nonsyllabic	—	i̯ u̯	l r	m n	h₁ h₂ *h₃	*s	p b *bʰ	t d *dʰ	k̑ g̑ *g̑ʰ	k g *gʰ	kʷ gʷ *gʷʰ

All syllabic sounds except the laryngeals had long counterparts, but it is not clear which of these were genuine long vowels (if any), and which were originally short vowels with a following laryngeal, e.g. *ih₁ > *ī. A vowel plus a glide might or might not have been analysed as a diphthong (*e͜u/*eu̯?) by native speakers.

*bʰ *dʰ etc. were breathy voiced plosives as in Sanskrit ‘Buddha’. The laryngeals *h₁ *h₂ *h₃ were likely pronounced [h] [χ/ħ] [ɣʷ], respectively, in their nonsyllabic forms, and [ə] [ɐ] [ɵ] when syllabic. The second laryngeal caused adjacent *e to be coloured to *a, while the third laryngeal caused o-colouring of this vowel.

PIE had a pitch-accent system, with the prominent syllable having higher pitch than the surrounding ones. There is no way of predicting the position of the accent in a word, so we will mark it whenever possible: *nás‑os ‘nose’. Such a feature is called a mobile accent.

Morphology and syntax

Verbs, nouns and adjectives had a basically tripartite common structure: root, suffix and ending. Each of these parts had a core vowel which was an *e, sometimes an *a, in its basic form or full grade, and could change (ablaut) to *ē/*ā (lengthened grade), *o (o-grade), *ō (lengthened o-grade), or vanish altogether, often with syllabification of a neighbouring consonant (zero grade), depending on the word’s grammatical form: *bʰer‑ • *bʰor‑ • *bʰēr‑ • *bʰōr‑ • *bʰr‑/bʰr̥‑ ‘carry, bear’. The root contained the lexeme. Suffix and ending of verbs encoded grammatical information, while nouns and adjectives often had derivational suffixes. Depending on the last sound of the suffix, we distinguish thematic forms – ending in an ablauting vowel – and athematic ones – ending in a consonant, including a syllabic glide.
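The ablaut grades of a root can be illustrated with a small sketch. It only swaps the core vowel of a root that has exactly one *e; roots with core *a and the syllabification of neighbouring consonants in the zero grade (*bʰr̥‑) are not modelled. The function name `ablaut_grades` and the grade labels used as dictionary keys are made up for this illustration.

```python
# Core vowel of each ablaut grade, for roots whose full grade has *e.
GRADES = {
    "full": "e",
    "o": "o",
    "lengthened": "ē",
    "lengthened o": "ō",
    "zero": "",  # the core vowel vanishes altogether
}


def ablaut_grades(root):
    """Return the five ablaut grades of a root with a single core vowel e."""
    if root.count("e") != 1:
        raise ValueError("expected a root with exactly one core vowel e")
    return {name: root.replace("e", vowel) for name, vowel in GRADES.items()}


for name, form in ablaut_grades("bʰer").items():
    print(f"{name}: *{form}-")
```

For *bʰer‑ this yields the non-syllabic forms listed above, from the full grade *bʰer‑ down to the zero grade *bʰr‑.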

Verbs inflected for tense/aspect (present, imperfect, aorist, perfect, and possibly pluperfect), mood (indicative, imperative, subjunctive, optative, and possibly desiderative), voice (active and mediopassive), person and number (singular, dual and plural); nouns and adjectives for case (eight of them) and number, and adjectives additionally for gender and comparison. The language also had pronouns and various types of particles, such as pre- and postpositions, conjunctions, and interjections.

The default word order might have been SOV, but this is disputed.

Proto-Lemizh `x-lmp`

Proto-Lemizh is the ancestor of two major present-day languages, Lemizh and Volgan. It is very poorly attested, in the form of some papyri found near the northwestern shore of the Black Sea, to the north of the Dniester Liman, dated about 2700 BCE (_4C0 in the Lemizh calendar), notably one enigmatic fragment relating to a squirrel. (Tortoises came into play much later.) Most of what we know about the protolanguage, however, is inferred from PIE, Old Lemizh, loanwords in neighbouring languages, and place names.

Proto-Lemizh became a distinct language probably in the early or mid third millennium BCE, well before Proto-Anatolian.

Phonology

Proto-Lemizh had a standard vowel inventory a e i o u, with long counterparts for all five. The occurrence of diphthongs is disputed: while w is usually reconstructed as [v], it might actually have been [u̯] after vowels as these combinations show vocalic reflexes in Old Lemizh. Regarding the consonants, the language partly kept the PIE phonemic distinction between palatals and velars. It also featured some unusual affricates.

	Labial	Dental	Alveolar	Postalveolar	Palatal	Velar	Glottal (pharyngeal?)
Liquids		l, r
Nasals	m	n
Plosives	p • b	t • d			k • g
Fricatives	f • w /v/	th /θ/ • dh /ð/	s • z	sh /ʃ/ • zh /ʒ/	ç • j /ʝ/	x • gh /ɣ/	h • ɦ
Affricates	pf • bw		ts • dz		kç • gj	kx • ggh

Accent was fixed (predictable): it fell on the penultimate syllable unless the word contained a long vowel or diphthong, in which case that syllable received the accent. Proto-Lemizh likely had stress accent, meaning increased volume of the prominent syllable.
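The accent rule can be sketched as follows, under several assumptions that go beyond the text: words arrive already split into syllables, long vowels are written with a macron, the disputed diphthongs are ignored, the first long vowel wins if a word has more than one, and a monosyllable is accented on its only syllable. The function names are hypothetical.

```python
import unicodedata

MACRON = "\u0304"  # combining macron, visible after NFD decomposition


def has_long_vowel(syllable):
    """True if the syllable contains a vowel written with a macron."""
    return MACRON in unicodedata.normalize("NFD", syllable)


def accented_syllable(syllables):
    """Return the index of the accented syllable in a pre-syllabified word."""
    # Assumption: the first long vowel attracts the accent.
    for index, syllable in enumerate(syllables):
        if has_long_vowel(syllable):
            return index
    # Otherwise the penultimate syllable (or the only one) is accented.
    return max(len(syllables) - 2, 0)


# The syllable division of *oçtōw 'eight' is an assumption of this sketch.
print(accented_syllable(["oç", "tōw"]))
```

The input format keeps the sketch independent of any particular syllabification algorithm, which the text does not describe.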

Diachronically (i.e. regarding its development from PIE), the language is remarkable in having a consonantal triple reflex, i.e. *h₁ > h, *h₂ > x, *h₃ > f, unless preceded by a vowel. The presence of vowel-initial Proto-Lemizh words such as *oçtōw ‘eight’ has given headaches to linguists who object to PIE words beginning in vowels and prefer reconstructing *h₃ek̑tṓu̯ to *ok̑tṓu̯, subsequently trying to explain where the expected PLem *f has gone. We will wriggle out of this difficulty by ignoring the arguments against initial vowels in PIE.
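The triple reflex can be written down as a tiny rewrite rule. This is only a sketch: it treats any of the letters a, e, i, o, u (with or without diacritics) as a vowel, leaves laryngeals after a vowel untouched because their outcome is not described here, and models no other sound change. The name `laryngeal_reflexes` is made up for this illustration.

```python
import unicodedata

# Subscript digits 1, 2, 3 mapped to the Proto-Lemizh reflexes of the laryngeals.
REFLEX = {"\u2081": "h", "\u2082": "x", "\u2083": "f"}
VOWELS = set("aeiou")  # assumption: glides written with i/u count as vowels


def laryngeal_reflexes(pie):
    """Replace laryngeals by h, x, f unless they are preceded by a vowel."""
    text = unicodedata.normalize("NFD", pie)
    out = []
    prev_is_vowel = False
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "h" and nxt in REFLEX:
            # After a vowel the laryngeal is kept as is (not modelled).
            out.append(ch + nxt if prev_is_vowel else REFLEX[nxt])
            prev_is_vowel = False
            i += 2
            continue
        if not unicodedata.combining(ch):
            # Combining diacritics do not change what the previous sound was.
            prev_is_vowel = ch in VOWELS
        out.append(ch)
        i += 1
    return unicodedata.normalize("NFC", "".join(out))


print(laryngeal_reflexes("h₁ed"))      # initial laryngeal becomes h
print(laryngeal_reflexes("h₃ek̑tṓu̯"))  # the reconstruction that predicts PLem *f
```

Applied to *h₃ek̑tṓu̯, the rule produces exactly the initial *f whose absence in *oçtōw causes the headaches mentioned above.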

Morphology and syntax

Not much is known about the language’s morphology, except that it was highly inflected like its ancestor. Verb stems generally continued the zero grade of PIE athematic present forms, most commonly root presents (*negʷ‑ ‘grow dark’, zero grade *n̥gʷ‑ > angw‑), nasal-infix presents (*bʰei̯d‑ ‘split’, *bʰi‹n›d‑ > bwind‑), and presents with the ending *‑sk̑‑ (*mer‑ ‘die’, *mr̥‑sk̑‑ > marsk‑), but also some reduplicated presents (*bʰer‑ ‘carry, bear’, *bʰi‑bʰr‑ > bwimbr‑ ‘give birth to’). However, reflexes of Narten presents (full-grade forms) also occurred frequently (*h₁ed‑ ‘eat’ > hedh‑). Resultative verbs sometimes continued PIE perfect forms (*bʰei̯h₂‑ ‘come to fear’, *bʰe‑bʰih₂‑ > bwembī‑ ‘fear’). Nouns and adjectives mostly generalised the strong (nominative) stem. Word endings seem to have undergone far-reaching innovation.

Word order varied, albeit maybe only in poetic usage.

Old Lemizh `x-lmo`

Old Lemizh is a fairly well attested language. Its earliest known documents were probably written around 2100 BCE (_280) along the northern and western shores of the Black Sea. Due to some archaeologists’ sloppy work, dating is uncertain.

The Old Lemizh people were seafarers and loved mathematics and poetry. They were a proud and lofty people, but to no avail.

Phonology

Old Lemizh added y ([ɯ] or [ə]), ö and ü (both long vowels) to the vowel inventory, and lost the long vowels save ē and ī, as well as the voiced glottal fricative ɦ. n was already pronounced [ŋ] unless followed by t, d, s, or z. Probably at a late stage of Old Lemizh, gh split into the velar [ɣ] and the uvular [ʁ], although a later date for this change cannot be ruled out. Following this, the palatals were merged with the velars.

As in Proto-Lemizh, accent fell on the penultimate, or on a long vowel or diphthong if present. Old Lemizh definitely had stress accent.

Morphology and syntax

The language had highly regular grammatical endings. Verbal endings encoded tense, person, number and voice; participles were inflected for tense, voice and comparison; and nouns, for case and number. Many nouns were derived from verbs by a simple exchange of the ending. Adjectives were lost early on, being replaced with participles (‘white’ > ‘being white’). Pronouns and a variety of particles completed the word inventory.

There were seven tenses: pre-past, past, post-past, present, pre-future, future, and post-future; four of which mainly expressed dependent clauses’ relationships to their main clause. In addition to the three Indo-European persons, the language innovated a ‘fourth person’, actually an impersonal form. The PIE dual was lost, so we are left with singular and plural. The five voices were really combinations of voice and aspect: active, direct passive (turning the accusative object into the subject), indirect passive (turning the dative object into the subject), and direct and indirect passive perfect. Old Lemizh featured the three familiar degrees of comparison, positive, comparative and superlative; but infinitives were formally participles with a special fourth comparison ending. Finally, the language knew eight cases, only partly corresponding to the present plot cases, and supplemented with a number of prepositions.

Despite its case endings, everyday language had a rigid SVO word order; and modifiers (participles and genitive attributes) followed the head. Poetry had a much freer word order. Finite subordinate clauses interestingly had their subject in the case of the clause: the subject of a local clause was in the locative case without having a local meaning in itself. This is thought to be a generalisation of the old accusative and infinitive (Lat accusativus cum infinitivo) construction.

Ghean `x-gh`

Ghean [ˈɣɛən], often erroneously pronounced [ˈɡiːən] or [ˈɡe͜ɪən], is a language with no known genetic relationships. It was spoken by a people of unknown origin and doubtful morals, who subdued the Lemizh tribes in around 1000 BCE (1C0) and ruled for three infamous generations.

The term ‘Ghean’ was coined, quite pompously, in Early New Lemizh times: je (then pronounced [ʝε]), inner nominative of the temporal verb ja ‘having done’, i.e. ‘those who have done’. The language’s endonym (what the Gheans called it themselves) is unknown.

Phonology

Ghean sample sentence: oətTⁿö̂i pə̄a aəxshshˡāo ‘All hail the king!’

The vowels were a, o, ə, e, ö, i; each of them with a corresponding long vowel and a diphthong (aə, oə, əə, ei, öi, ii). The ‘diphthongs’ əə and ii, as well as sequences of two identical short vowels such as aa, were likely pronounced like the ‘true’ long vowels but more open. Each vowel or diphthong, as well as the unpronounced zero vowel that occurred word-initially and in the second component of some compounds, could be followed by a consonant or consonant cluster of up to three sounds. There were seven pronunciation types of such clusters, each of which only contained consonants from a limited set of eight. Ghean orthography reflected this by having only eight consonant letters. A consonant cluster could be marked with one of six modifiers which determined the pronunciation type. The modifiers’ names do not always reflect the actual pronunciations associated with them.

This is the transliteration and pronunciation scheme:

Modifier	Bilabial		Dental		Postalveolar		Uvular
Modifier	p	f	t	s	ᴛ	sh	q	x
(none)	p	ɸ	t	s̟	t̠	ʃ	q	χ
ⁿ (nasal)	pⁿ	ɸ	tⁿ	s̟	t̠ⁿ	ʃ	qⁿ	χ
ʳ (trilled)	ʙ	ʙ̥	r	r̥	r̠	r̠̥	ʀ	ʀ̥
ˇ (voiced)	b	β	d	z̟	d̠	ʒ	ɢ	ʁ
ˡ (lateral)	bʷ	β	dˡ	z̟	d̠ˡ	ʒ	ɢˠ	ʁ
ʱ (breathy voiced)	bʱ	m	dʱ	n	d̠ʱ	n̠	ɢʱ	ɴ
̑ (implosive)	ɓ	∅	ɗ	∅	ɗ̠	∅	ʛ	∅

Here are some illustrations of Ghean moræ, i.e. vowel (including the zero vowel) plus optional consonant or consonant cluster:

a = /a/, f = /ɸ/, tˇ = /d/, esʱ = /ɛn/, öfᴛʱ = /œmd̠ʱ/, ȫsqʳ = /øːr̥ʀ/.
Nasal and lateral release apparently were only pronounced at the end of a consonant cluster: ətpⁿ = /ətpⁿ/, ə̄tshˡ = /ɯːdˡʒ/, [ɯːdʒ]? – however, ᴛsˡ = /d̠ˡz̟/ to judge from ᴛsˡə > Middle Lemizh lzyr ‘green’.
The consonant letters f, s, sh and x were silent in implosive clusters, but still occurred in some words, possibly for etymological reasons: oəps ̑ = /ɔ͜əɓ/.
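The transliteration scheme amounts to a lookup table: each of the eight consonant letters is pronounced according to the modifier of its cluster. The following sketch encodes the table as given and converts a cluster letter by letter. The ASCII modifier names and the function `cluster_to_ipa` are inventions of this example, vowels are left out, and the restriction of nasal and lateral release to the end of a cluster is not modelled.

```python
# Columns of the table: the eight Ghean consonant letters.
LETTERS = ["p", "f", "t", "s", "ᴛ", "sh", "q", "x"]

# Rows of the table, keyed by an ASCII name for each pronunciation type.
# An empty string stands for a silent letter (∅ in the table).
PRONUNCIATION = {
    "none":      ["p", "ɸ", "t", "s̟", "t̠", "ʃ", "q", "χ"],
    "nasal":     ["pⁿ", "ɸ", "tⁿ", "s̟", "t̠ⁿ", "ʃ", "qⁿ", "χ"],
    "trilled":   ["ʙ", "ʙ̥", "r", "r̥", "r̠", "r̠̥", "ʀ", "ʀ̥"],
    "voiced":    ["b", "β", "d", "z̟", "d̠", "ʒ", "ɢ", "ʁ"],
    "lateral":   ["bʷ", "β", "dˡ", "z̟", "d̠ˡ", "ʒ", "ɢˠ", "ʁ"],
    "breathy":   ["bʱ", "m", "dʱ", "n", "d̠ʱ", "n̠", "ɢʱ", "ɴ"],
    "implosive": ["ɓ", "", "ɗ", "", "ɗ̠", "", "ʛ", ""],
}


def cluster_to_ipa(letters, modifier="none"):
    """Convert a list of consonant letters to IPA for the given modifier."""
    row = PRONUNCIATION[modifier]
    return "".join(row[LETTERS.index(letter)] for letter in letters)


print(cluster_to_ipa(["t"], "voiced"))          # d
print(cluster_to_ipa(["f", "ᴛ"], "breathy"))    # md̠ʱ
print(cluster_to_ipa(["p", "s"], "implosive"))  # ɓ
```

Passing the letters as a list sidesteps the question of how to tokenise the digraph sh inside a cluster.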

Ghean was a register tonal language; the individual moræ were marked by their relative pitch level for grammatical purposes:

Vowel type	low tone	neutral tone	mid-high tone	high tone
short	—	a	ạ	—
long	à	ā	á	â
diphthong	àə	aə	áə	âə

A short overview of the Ghean script is available as a PDF here.

Morphology and syntax

Verbs and nominals (a combined noun/adjective/participle part of speech) consisted of – at least – three moræ (or two when used as main predicates). Pronouns had one mora less, and particles had two less (meaning there were particles that consisted solely of consonants). The penultimate mora of a verb, nominal or pronoun carried grammatical information:

Verbs had a short vowel or diphthong in this position, which expressed the three persons, two numbers, questions and the imperative. The consonant cluster encoded various tense/aspect combinations.
Nominals had a long vowel expressing inner case. The consonant cluster was basically a case suffix in the modern Lemizh sense. It also expressed number, although plural marking was not obligatory in nominals.
In pronouns, the penultimate inflected for person and number.

All three were also inflected for outer case, which was encoded in the ultimate mora; the main predicate lacked this part. Tone expressed grammatical level (as well as the imperative and vocative); rising by more than one level additionally employed lengthening of the outer case vowel. Ghean comparison handling is not well understood.

The word order was VSO, but also VOS, and head before modifier, as we would expect given the existence of level. The language featured brackets in the sense of Modern Lemizh grammar, but no coordinations.

(Standard) Middle Lemizh `x-lmm`

The Gheans discouraged the use of the natives’ language, but obviously tolerated Lemizh words (or rather word stems) to stand in for unfamiliar Ghean ones. The grammar of simple sentences was easy enough to learn for the Lemizh, as they were used to case endings and head-first phrases, and likely still knew VSO sentences from poetry. After two or three generations, the natives must have spoken a mixed language or creole with a more or less Ghean grammar but an abundance of Lemizh words, especially outside the core vocabulary. This is a quite unusual development as most creoles draw their lexicon mainly from the dominant group, and tend to be grammatically more innovative. (The Tanzanian language Mbugu might have had a somewhat similar development with more or less analogous outcomes.)

After the mysterious disappearance of the Gheans, Lemizh patriots tried to revive their old language, which failed spectacularly for the grammar but reintroduced many Lemizh words of the core vocabulary. The Ghean hexadecimal counting system stuck. The modern Lemizh alphabet is an invention of this period.

Phonology

Under the influence of Ghean, Middle Lemizh (re)introduced a number of diphthongs and long vowels, but shortened U. Unless the uvular fricative [ʁ] dates from before the Ghean conquest, it is also a consequence of the language contact. The labiodental fricatives became bilabial, and h was lost or merged into x.

Middle Lemizh continued the Ghean tonal system mainly in word endings, while turning stem syllables that were accented in Old Lemizh to ones with a low tone.

Morphology and syntax

Middle Lemizh had verbs, nominals, pronouns and particles, following the Ghean model. The penultimate mora carried the same grammatical information as in Ghean, and, as we would expect, outer case was expressed by the ultimate mora.

Syntax continued to be level-based, a feature that of course lasts until today. As in Ghean, brackets expressed adjective and participle attributes as well as relative clauses. Coordinations were implemented subsequently.

Late Middle Lemizh `x-lml`

During the seventh to third centuries BCE, in a period still classified as pre-Late Middle Lemizh, diphthongs were simplified and long vowels shortened while retaining their low tone, now marked orthographically with a ‘`’: leemin‑ > lèmin‑ ‘make Lemizh’. R [ʁ] attained its present pronunciation of [ɹ].

By definition, we speak of Late Middle Lemizh from the time pronouns and tenses started to be used relative to the respective predicate rather than the main predicate, the principle of relativity. This development dates from about 200 BCE (500).

Early New Lemizh `x-lmn`

The cover of the Tlöngö̀l in Penguin Classics

For a millennium and a half, Lemizh sources remained almost silent, and the few texts are orthographically and grammatically inconsistent. These are the Lost Years, from which the language re-emerged in a phonologically considerably altered shape. The major grammatical changes were still to come.

The Tlöngö̀l (lit. ‘the reason for enduring / for plucking up courage’, NLem tlOna < Koi τλῆναι < PIE *telh₂‑ ‘lift up, take upon oneself’), a pathetic epic novel of no literary relevance whatsoever (though Jorge Luis Borges, in one of his lesser-known essays, defends it), nevertheless triggered a new literary high and defined a language standard that is still palpable in Modern Lemizh. It was published in 1351 CE (AFE), which is the formal birth date of New Lemizh.

Phonology

During the Lost Years, a number of phonological changes had taken place. These included (very roughly in the following order)

syncope (elimination) of certain unstressed vowels (one rule being that vowels with Middle Lemizh low tone never syncopated),
contraction of nasal + plosive to a nasal (e.g. mg > n),
metathesis of nasal + fricative under certain circumstances, sometimes leaving the place of articulation behind (mj > wn),
various changes involving liquids, including metathesis, but also contraction (Ld, Lt > L, Lz > R, Lc > r),
two consecutive different plosives became plosive + fricative or fricative + plosive unless there was an adjacent fricative,
two consecutive different fricatives contracted when part of a larger consonant cluster,
of two consecutive plosives or nasals, the second was eliminated (lèmin‑ > *lemn‑ > lem‑),
double fricatives and liquids (ff cc RR etc.) were simplified,
words starting in a fricative or plosive + liquid + nasal lost the liquid,
remaining clusters of four consonants, and those of the type plosive + fricative + plosive, were broken up by an epenthetic vowel,
and fricative + plosive and plosive + fricative clusters underwent anticipatory voicing assimilation.

The exact rules have been studied extensively, but detailing them would go beyond the scope of this overview.

At this stage, Lemizh had simplified the earlier tonal system, resulting in the modern two-way pitch-accent system for expressing level.

Morphology

The inner factive case had already been in existence for some time to form verbal, or gerund-like, nouns. The Tlöngö̀l popularised dependent clauses headed by such nouns, replacing finite clauses, and also introduced verbal nouns as main predicates with adverbial clauses. This development led to the eventual extinction of the verb. Pronouns lost their status as a separate part of speech, leaving us with nouns (which today are known as verbs because of their verbal stems) and a small number of particles, principally ‘yes, no(t), and, inclusive or, exclusive or’.

Modern Lemizh `x-lm`

There is no exact definition for the beginning of the Modern Lemizh period. The present language differs significantly from the one we know from the Tlöngö̀l, but the main changes are spread out over several centuries.

Phonology

The most obvious development after the Tlöngö̀l is the formation of poststems, starting in the mid-18th century (C00). They have three major sources: first, the inner case vowel replaced the last vowel of the stem, leaving the following consonants (if any) to form the poststem. Liquids and nasals in this position were either eliminated (elision), switched places with the inner case vowel (metathesis), or became fricatives (fortition), following a number of rather complicated rules. In some stative verbs, the perfect ending -s turned into a poststem; and in certain nominal verbs the singular and plural endings -r -l,* going back to Ghe ᴛʳə /r̠ə/ ‘one’ and ᴛˡi /d̠ˡɪ/ ‘several’, became the poststems -c -j, respectively (under the fortition rules: whence lemÌc., from the collective singular). As words with zero poststem came to be viewed as female, and those with non-zero poststem as male, ‘gender change’ can also occur by elimination or addition of poststems: this is the source of the final consonant of sxnèz. ‘Sun’, which is male in Indo-European mythology. Addition of a poststem also has the function of distinguishing words that would otherwise be homophones: for example j‑a > jàx. ‘move’ vs. the relative pronoun jà.. Sometimes an Early New Lemizh word has two modern descendants with different poststem formations. Often, one of them is an everyday word, while the other is a technical term, e.g. rOsy ‘frost’ > rÌs. ‘frost’ (regular poststem) / rOsÌc. ‘crystal’ (poststem from singular) or canxwy ‘dust’ > cnÌxw. ‘dust’ (regular poststem) / canxwÌ. ‘dark matter’ (no poststem). These mechanisms are still productive, at least for assimilating loanwords.

The only other notable sound shift was the dissimilation of consonant clusters beginning with a plosive. This shift is the reason why Modern Lemizh is entirely without affricates.

* mnemonic: ‘singular’ and ‘plural’

Lexicon and morphology

What with the Middle Lemizh Renaissance, and the influx of predominantly Indo-European loanwords over the course of three millennia (including those introduced or popularised by the Tlöngö̀l), we now have a language with a largely Indo-European lexicon.

Conversely, starting with the Ghean occupation, augmented by subsequent grammatical simplifications, and maybe completed by the extinction of particles, the Lemizh language finally arrived at a thoroughly un-IE and highly unlikely regular grammar. The chances for this to have happened are two to the power of two hundred and seventy-six thousand seven hundred and nine to one against. The future, however, will doubtlessly introduce new irregularities.
