lemÌc. Lemizh grammar and dictionary


Stories are the most important thing in the world. Without stories, we wouldn’t be human beings at all.

(Philip Pullman)

The Lemizh language originated in the region of Lemaria, which stretches from Moldavia and the southern Ukraine to the river Don. The dialect spoken to the north of the Danube Delta prevailed against other variants and is considered the standard language today. But let’s start from the beginning, with some sketches of the modern language’s ancestors.

Unlike in the grammar, I will be using some linguistic terms here that might not be widely known, as well as a fair number of IPA symbols. If you don’t know them already, please look them up on Wikipedia or elsewhere on the internet.

The code next to each title is the corresponding HTML language tag.

Proto-Indo-European x-ine

Proto-Indo-European (PIE) was spoken during the early fourth millennium BCE, probably in the Pontic-Caspian steppe to the northeast of the Black Sea, by a people who had tamed the horse and knew the wheel, but were illiterate. It is the common ancestor of a number of major historical and modern languages, which served as a basis for reconstructing the protolanguage in great detail. The descendants are categorised into ten branches (disregarding a few unclassified languages): Lemizh, Anatolian, Tocharian, Iranian, Hellenic, Celtic, Waldaiic, Sabellic, Armenian, and Albanian.

While PIE is widely regarded as a descendant of a poorly reconstructed language dubbed the Hemetera, a minority holds it to be one of the primordial languages that originated from the Blessing of Babel.


PIE had to make do with only three genuine vowels, but many of the continuant consonants could form syllables as well. Only the plosives, of which there were as many as fifteen, and the sibilant *s lacked this capability.

Syllabic: *e *o *a; *i *u; *l̥ *r̥; *m̥ *n̥; *h̥₁ *h̥₂ *h̥₃
Nonsyllabic: *i̯ *u̯; *l *r; *m *n; *h₁ *h₂ *h₃; *s; *p *b *bʰ; *t *d *dʰ; *k̑ *g̑ *g̑ʰ; *k *g *gʰ; *kʷ *gʷ *gʷʰ

All syllabic sounds except the laryngeals had long counterparts, but it is not clear which of these were genuine long vowels (if any), and which were originally short vowels with a following laryngeal, e.g. *ih₁ > *ī. A vowel plus a glide might or might not have been analysed as a diphthong (*e͜u/*eu̯?) by native speakers.

*bʰ *dʰ etc. were breathy voiced plosives as in Sanskrit ‘Buddha’. The laryngeals *h₁ *h₂ *h₃ were likely pronounced [h] [χ/ħ] [ɣʷ], respectively, in their nonsyllabic forms, and [ə] [ɐ] [ɵ] when syllabic. The second laryngeal caused adjacent *e to be coloured to *a, while the third laryngeal caused o-colouring of this vowel.

PIE had a pitch-accent system, with the prominent syllable having higher pitch than the surrounding ones. There is no way of predicting the position of the accent in a word, so we will mark it whenever possible: *nás‑os ‘nose’. Such a feature is called a mobile accent.

Morphology and syntax

Verbs, nouns and adjectives had a basically tripartite common structure: root, suffix and ending. Each of these parts had a core vowel which was an *e, sometimes an *a, in its basic form or full grade, and could change (ablaut) to *ē/*ā (lengthened grade), *o (o-grade), *ō (lengthened o-grade), or vanish altogether, often with syllabification of a neighbouring consonant (zero grade), depending on the word’s grammatical form: *bʰer‑ • *bʰor‑ • *bʰēr‑ • *bʰōr‑ • *bʰr‑/bʰr̥‑ ‘carry, bear’. The root contained the lexeme. Suffix and ending of verbs encoded grammatical information, while nouns and adjectives often had derivational suffixes. Depending on the last sound of the suffix, we distinguish thematic forms – ending in an ablauting vowel – and athematic ones – ending in a consonant, including a syllabic glide.
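
For a root like *bʰer‑, the grades amount to swapping out the core vowel. The following Python function is only a minimal sketch under simplifying assumptions: the root is written without asterisk and hyphen, it contains exactly one core *e, and the zero grade is produced by plain deletion, so the syllabified variant *bʰr̥‑ and roots with core *a are not handled. The function and dictionary names are hypothetical.

```python
# Sketch (assumption): a PIE root with exactly one core *e, written without
# the asterisk and hyphen; its grades are formed by swapping that vowel.
GRADE_VOWELS = {
    "full grade": "e",
    "lengthened grade": "ē",
    "o-grade": "o",
    "lengthened o-grade": "ō",
    "zero grade": "",  # plain deletion; syllabification (*bʰr̥-) not modelled
}


def ablaut(root: str) -> dict[str, str]:
    """Return each ablaut grade of a root whose only core vowel is 'e'."""
    if root.count("e") != 1:
        raise ValueError(f"expected exactly one core *e in {root!r}")
    return {grade: root.replace("e", vowel) for grade, vowel in GRADE_VOWELS.items()}


if __name__ == "__main__":
    for grade, form in ablaut("bʰer").items():
        print(f"{grade}: *{form}-")
```

Run on "bʰer", this lists the five forms given above, with the zero grade in its unsyllabified shape *bʰr‑.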

Verbs inflected for tense/aspect (present, imperfect, aorist, perfect, and possibly pluperfect), mood (indicative, imperative, subjunctive, optative, and possibly desiderative), voice (active and mediopassive), person and number (singular, dual and plural); nouns and adjectives for case (eight of them) and number, and adjectives additionally for gender and comparison. The language also had pronouns and various types of particles, such as pre- and postpositions, conjunctions, and interjections.

The default word order might have been SOV, but this is disputed.

Further reading

Proto-Lemizh x-lmp

A carved squirrel

Proto-Lemizh is the ancestor of two major present-day languages, Lemizh and Volgan. It is very poorly attested in the form of some papyri found near the northwestern shore of the Black Sea, north of the Dniester Liman, and dated to about 2700 BCE (_4C0 in the Lemizh calendar), notably one enigmatic fragment relating to a squirrel. (Tortoises came into play much later.) Most of what we know about the protolanguage, however, is inferred from PIE, Old Lemizh, loanwords in neighbouring languages, and place names.

Proto-Lemizh became a distinct language probably in the early or mid third millennium BCE, well before Proto-Anatolian.


Proto-Lemizh had a standard vowel inventory a e i o u, with long counterparts for all five. The occurrence of diphthongs is disputed: while w is usually reconstructed as [v], it might actually have been [u̯] after vowels as these combinations show vocalic reflexes in Old Lemizh. Regarding the consonants, the language partly kept the PIE phonemic distinction between palatals and velars. It also featured some unusual affricates.

Consonants, by place of articulation from front to back (labial, dental, alveolar, postalveolar, palatal, velar, glottal or pharyngeal?):
Liquids: l, r
Plosives: p • b; t • d; k • g
Fricatives: f • w /v/; th /θ/ • dh /ð/; s • z; sh /ʃ/ • zh /ʒ/; ç • j /ʝ/; x • gh /ɣ/; h • ɦ
Affricates: pf • bw; ts • dz; kç • gj; kx • ggh

Accent was fixed (predictable): it fell on the penultimate syllable unless there was a long vowel or diphthong in the word, in which case it received the accent. Proto-Lemizh likely had stress accent, meaning increased volume of the prominent syllable.
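
Because the position of the accent is predictable, the rule can be written as a short function. This Python sketch is hypothetical and rests on several assumptions: the word arrives already split into syllables, a macron vowel is the only mark of length it recognises (diphthongs, whose status is disputed, are not detected), a monosyllable stresses its only syllable, and a word with more than one long vowel raises an error, since the rule as stated does not decide that case. The syllabification of *oçtōw ‘eight’ below is also an assumption, and "ka ta ma" is a placeholder, not a real word.

```python
# Sketch of the Proto-Lemizh / Old Lemizh accent rule (assumptions noted inline).
LONG_VOWELS = set("āēīōū")  # assumption: length is marked only by a macron


def accent_index(syllables: list[str]) -> int:
    """Index of the accented syllable in a hand-syllabified word."""
    long_positions = [
        i for i, syllable in enumerate(syllables)
        if any(ch in LONG_VOWELS for ch in syllable)
    ]
    if len(long_positions) > 1:
        # The rule as stated assumes at most one long vowel or diphthong.
        raise ValueError("more than one long vowel: rule does not decide")
    if long_positions:
        return long_positions[0]
    # Otherwise the penultimate; a monosyllable stresses its only syllable.
    return max(len(syllables) - 2, 0)


def mark_accent(syllables: list[str]) -> str:
    """Join the syllables, placing IPA ˈ before the accented one."""
    stressed = accent_index(syllables)
    return "".join(
        "ˈ" + syllable if i == stressed else syllable
        for i, syllable in enumerate(syllables)
    )


print(mark_accent(["oç", "tōw"]))       # → oçˈtōw (long vowel wins)
print(mark_accent(["ka", "ta", "ma"]))  # → kaˈtama (penultimate)
```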

Diachronically (i.e. regarding its development from PIE), the language is remarkable in having a consonantal triple reflex, i.e. *h₁ > h, *h₂ > x, *h₃ > f, unless preceded by a vowel. The presence of vowel-initial Proto-Lemizh words such as *oçtōw ‘eight’ has given headaches to linguists who object to PIE words beginning in vowels and prefer reconstructing *h₃ek̑tṓu̯ to *ok̑tṓu̯, subsequently trying to explain where the expected PLem *f has gone. We will wriggle out of this difficulty by ignoring the arguments against initial vowels in PIE.
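
The triple reflex is a context-sensitive substitution. The following Python function is a rough, hypothetical sketch assuming that forms are written in the transcription used here (nonsyllabic laryngeals as h₁ h₂ h₃, no asterisk) and that only the listed letters count as vowels. Laryngeals after a vowel, as well as the syllabic *h̥, are left untouched because their reflexes are not described in this passage. Applied to the reconstruction *h₃ek̑tṓu̯, it yields the initial f that Proto-Lemizh *oçtōw lacks.

```python
# Sketch (assumptions): forms use h₁ h₂ h₃ for the nonsyllabic laryngeals and
# the vowel letters below; the reflexes after vowels are not modelled.
REFLEXES = {"h₁": "h", "h₂": "x", "h₃": "f"}
VOWELS = set("aeiouāēīōū")


def laryngeal_reflexes(form: str) -> str:
    """Replace each laryngeal not preceded by a vowel with its PLem consonant."""
    out: list[str] = []
    i = 0
    while i < len(form):
        pair = form[i:i + 2]
        preceded_by_vowel = i > 0 and form[i - 1] in VOWELS
        if pair in REFLEXES and not preceded_by_vowel:
            out.append(REFLEXES[pair])
            i += 2
        else:
            out.append(form[i])
            i += 1
    return "".join(out)


print(laryngeal_reflexes("h₁ed"))      # → hed
print(laryngeal_reflexes("h₃ek̑tṓu̯"))  # initial h₃ becomes f
print(laryngeal_reflexes("bʰeh₂"))     # → bʰeh₂ (after a vowel: unchanged)
```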

Morphology and syntax

Not much is known about the language’s morphology, except that it was highly inflected like its ancestor. Verb stems generally continued the zero grade of PIE athematic present forms, most commonly root presents (*negʷ‑ ‘grow dark’, zero grade *n̥gʷ‑ > angw‑), nasal-infix presents (*bʰei̯d‑ ‘split’, *bʰi‹n›d‑ > bwind‑), and presents with the ending *‑sk̑‑ (*mer‑ ‘die’, *mr̥‑sk̑‑ > marsk‑), but also some reduplicated presents (*bʰer‑ ‘carry, bear’, *bʰi‑bʰr‑ > bwimbr‑ ‘give birth to’). However, reflexes of Narten presents (full-grade forms) also occurred frequently (*h₁ed‑ ‘eat’ > hedh‑). Resultative verbs sometimes continued PIE perfect forms (*bʰei̯h₂‑ ‘come to fear’, *bʰe‑bʰih₂‑ > bwembī‑ ‘fear’). Nouns and adjectives mostly generalised the strong (nominative) stem. Word endings seem to have undergone far-reaching innovation.

Word order varied, albeit maybe only in poetic usage.

Further reading

Old Lemizh x-lmo

Old Lemizh is a fairly well attested language. Its earliest known documents were probably written around 2100 BCE (_280) along the northern and western shores of the Black Sea. Due to some archaeologists’ sloppy work, dating is uncertain.

The Old Lemizh people were seafarers and loved mathematics and poetry. They were a proud and lofty people, but to no avail.


Old Lemizh added y ([ɯ] or [ə]), ö and ü (both long vowels) to the vowel inventory, and lost the long vowels save ē and ī, as well as the voiced glottal fricative ɦ. n was already pronounced [ŋ] unless followed by t, d, s, or z. Probably at a late stage of Old Lemizh, gh split into the velar [ɣ] and the uvular [ʁ], although a later date for this change cannot be ruled out. Following this, the palatals were merged with the velars.

As in Proto-Lemizh, accent fell on the penultimate, or on a long vowel or diphthong if present. Old Lemizh definitely had stress accent.

Morphology and syntax

The language had highly regular grammatical endings. Verbal endings encoded tense, person, number and voice; participles were inflected for tense, voice and comparison; and nouns, for case and number. Many nouns were derived from verbs by a simple exchange of the ending. Adjectives were lost early on, being replaced with participles (‘white’ > ‘being white’). Pronouns and a variety of particles completed the word inventory.

There were seven tenses: pre-past, past, post-past, present, pre-future, future, and post-future; four of which mainly expressed dependent clauses’ relationships to their main clause. In addition to the three Indo-European persons, the language innovated a ‘fourth person’, actually an impersonal form. The PIE dual was lost, so we are left with singular and plural. The five voices were really combinations of voice and aspect: active, direct passive (turning the accusative object into the subject), indirect passive (turning the dative object into the subject), and direct and indirect passive perfect. Old Lemizh featured the three familiar degrees of comparison, positive, comparative and superlative; but infinitives were formally participles with a special fourth comparison ending. Finally, the language knew eight cases, only partly corresponding to the present plot cases, and supplemented with a number of prepositions.

Despite its case endings, everyday language had a rigid SVO word order; and modifiers (participles and genitive attributes) followed the head. Poetry had a much freer word order. Finite subordinate clauses interestingly had their subject in the case of the clause: the subject of a local clause was in the locative case without having a local meaning in itself. This is thought to be a generalisation of the old accusative and infinitive (Lat accusativus cum infinitivo) construction.

Further reading

Ghean x-gh

Ghean [ˈɣɛən], often erroneously pronounced [ˈɡiːən] or [ˈɡe͜ɪən], is a language with no known genetic relationships. It was spoken by a people of unknown origin and doubtful morals, who subdued the Lemizh tribes around 1000 BCE (1C0) and ruled for three infamous generations.

The term ‘Ghean’ was coined, quite pompously, in Early New Lemizh times: je (then pronounced [ʝε]), inner nominative of the temporal verb ja ‘having done’, i.e. ‘those who have done’. The language’s endonym (what the Gheans called it themselves) is unknown.


Ghean sample sentence: oətTⁿö̂i pə̄a aəxshshˡāo ‘All hail the king!’

The vowels were a, o, ə, e, ö, i, each with a corresponding long vowel and a diphthong (aə, oə, əə, ei, öi, ii). The ‘diphthongs’ əə and ii, as well as sequences of two identical short vowels such as aa, were likely pronounced like the ‘true’ long vowels but more open. Each vowel or diphthong, as well as the unpronounced zero vowel that occurred word-initially and in the second component of some compounds, could be followed by a consonant or consonant cluster of up to three sounds. There were seven pronunciation types of such clusters, each of which only contained consonants from a limited set of eight. Ghean orthography reflected this by having only eight consonant letters. A consonant cluster could be marked with one of six modifiers which determined the pronunciation type. The modifiers’ names do not always reflect the actual pronunciations associated with them.

This is the transliteration and pronunciation scheme:

ʳ (trilled): ʙ ʙ̥ r r̠̥ ʀ ʀ̥
ˇ (voiced): b β d ʒ ɢ ʁ
ˡ (lateral): β d̠ˡ ʒ ɢˠ ʁ
ʱ (breathy voiced): m n d̠ʱ ɢʱ ɴ
 ̑ (implosive): ɓ ɗ ɗ̠ ʛ

Here are some illustrations of Ghean moræ, i.e. vowel (including the zero vowel) plus optional consonant or consonant cluster:

Ghean was a register tonal language; the individual moræ were marked by their relative pitch level for grammatical purposes:

Vowel typelow toneneutral tonemid-high tonehigh tone

A short overview of the Ghean script is available as a PDF.

Morphology and syntax

Verbs and nominals (a combined noun/adjective/participle part of speech) consisted of – at least – three moræ (or two when used as main predicates). Pronouns had one mora less, and particles had two less (meaning there were particles that consisted solely of consonants). The penultimate mora of a verb, nominal or pronoun carried grammatical information:

All three were also inflected for outer case, which was encoded in the ultimate mora; the main predicate lacked this part. Tone expressed grammatical level (as well as the imperative and vocative); rising by more than one level additionally employed lengthening of the outer case vowel. Ghean comparison handling is not well understood.

The word order was VSO, but also VOS, and head before modifier, as we would expect given the existence of level. The language featured brackets in the sense of Modern Lemizh grammar, but no coordinations.

Further reading

(Standard) Middle Lemizh x-lmm

The Gheans discouraged the use of the natives’ language, but obviously tolerated Lemizh words (or rather word stems) to stand in for unfamiliar Ghean ones. The grammar of simple sentences was easy enough to learn for the Lemizh, as they were used to case endings and head-first phrases, and likely still knew VSO sentences from poetry. After two or three generations, the natives must have spoken a mixed language or creole with a more or less Ghean grammar but an abundance of Lemizh words, especially outside the core vocabulary. This is a quite unusual development as most creoles draw their lexicon mainly from the dominant group, and tend to be grammatically more innovative. (The Tanzanian language Mbugu might have had a somewhat similar development with more or less analogous outcomes.)

After the mysterious disappearance of the Gheans, Lemizh patriots tried to revive their old language, which failed spectacularly for the grammar but reintroduced many Lemizh words of the core vocabulary. The Ghean hexadecimal counting system stuck. The modern Lemizh alphabet is an invention of this period.


Under the influence of Ghean, Middle Lemizh (re)introduced a number of diphthongs and long vowels, but shortened U. Unless the uvular fricative [ʁ] dates from before the Ghean conquest, it is also a consequence of the language contact. The labiodental fricatives became bilabial, and h was lost or merged into x.

Middle Lemizh continued the Ghean tonal system mainly in word endings, while turning stem syllables that were accented in Old Lemizh to ones with a low tone.

Morphology and syntax

Middle Lemizh had verbs, nominals, pronouns and particles, following the Ghean model. The penultimate mora carried the same grammatical information as in Ghean, and, as we would expect, outer case was expressed by the ultimate mora.

Syntax continued to be level-based, a feature that of course lasts until today. As in Ghean, brackets expressed adjective and participle attributes as well as relative clauses. Coordinations were implemented subsequently.

Further reading

Late Middle Lemizh x-lml

During the seventh to third centuries BCE, in a period still classified as pre-Late Middle Lemizh, diphthongs were simplified and long vowels shortened while retaining their low tone, now marked orthographically with a ‘`’: leemin‑ > lèmin‑ ‘make Lemizh’. R [ʁ] attained its present pronunciation of [ɹ].

By definition, we speak of Late Middle Lemizh from the time pronouns and tenses started to be used relative to their own predicate rather than to the main predicate (the principle of relativity). This development dates from about 200 BCE (500).

Early New Lemizh x-lmn

The cover of the Tlöngö̀l in Penguin Classics

For a millennium and a half, Lemizh sources remained almost silent, and the few texts are orthographically and grammatically inconsistent. These are the Lost Years, from which the language re-emerged in a phonologically considerably altered shape. The major grammatical changes were still to come.

The Tlöngö̀l (lit. ‘the reason for enduring / for plucking up courage’; NLem tlOnaKoi, cf. Gk τλῆναι, PIE *telh₂‑ ‘lift up, take upon oneself’), a pathetic and, in literary terms, utterly irrelevant epic novel (but Jorge Luis Borges, in one of his lesser-known essays, defends it), has nevertheless triggered a new literary high and defined a language standard that is still palpable in Modern Lemizh. It was published in 1351 CE (AFE), which is the formal birth date of New Lemizh.


During the Lost Years, a number of phonological changes had taken place. These included (very roughly in the following order)

  1. syncope (elimination) of certain unstressed vowels (one rule being that vowels with Middle Lemizh low tone never syncopated),
  2. contraction of nasal + plosive to a nasal (e.g. mg > n),
  3. metathesis of nasal + fricative under certain circumstances, sometimes leaving the place of articulation behind (mj > wn),
  4. various changes involving liquids, including metathesis, but also contraction (Ld, Lt > L, Lz > R, Lc > r),
  5. two consecutive different plosives became plosive + fricative or fricative + plosive unless there was an adjacent fricative,
  6. two consecutive different fricatives contracted when part of a larger consonant cluster,
  7. of two consecutive plosives or nasals, the second was eliminated (lèmin‑ > *lemn > lem),
  8. double fricatives and liquids (ff cc RR etc.) were simplified,
  9. words starting in a fricative or plosive + liquid + nasal lost the liquid,
  10. remaining clusters of four consonants, and those of the type plosive + fricative + plosive, were broken up by an epenthetic vowel,
  11. and fricative + plosive and plosive + fricative clusters underwent anticipatory voicing assimilation.
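
Rules 7 and 8 lend themselves to simple pattern substitution. The Python below is a hypothetical sketch, not a model of the actual rules: it assumes a simplified letter classification (this overview does not state which letters count as plosives, nasals, fricatives or liquids), applies each rule once with non-overlapping left-to-right matching, and starts from the already syncopated *lemn, because rule 1 is not modelled.

```python
import re

# Hypothetical, simplified letter classes: the overview does not spell out
# which letters of the orthography count as plosives, nasals, and so on.
PLOSIVES_NASALS = "[pbtdkgmn]"
FRICATIVES_LIQUIDS = "[fvszcjxlrLR]"


def rule_7(form: str) -> str:
    """Of two consecutive plosives or nasals, eliminate the second."""
    return re.sub(f"({PLOSIVES_NASALS}){PLOSIVES_NASALS}", r"\1", form)


def rule_8(form: str) -> str:
    """Simplify double fricatives and liquids (ff, cc, RR, ...)."""
    return re.sub(f"({FRICATIVES_LIQUIDS})\\1", r"\1", form)


def apply_rules_7_and_8(form: str) -> str:
    # Applied in the order of the list: rule 7 first, then rule 8.
    return rule_8(rule_7(form))


print(apply_rules_7_and_8("lemn"))  # → lem
```

How longer runs such as three plosives in a row were treated is not stated; the single non-overlapping pass used here is merely one possible reading.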

The exact rules have been studied extensively, but detailing them would go beyond the scope of this overview.

At this stage, Lemizh had simplified the earlier tonal system, resulting in the modern two-way pitch-accent system for expressing level.


The inner factive case had already been in existence for some time to form verbal, or gerund-like, nouns. The Tlöngö̀l popularised dependent clauses headed by such nouns, replacing finite clauses, and also introduced verbal nouns as main predicates with adverbial clauses. This development led to the eventual extinction of the verb. Pronouns lost their status as a separate part of speech, leaving us with nouns (which today are known as verbs because of their verbal stems) and a small number of particles, principally ‘yes, no(t), and, inclusive or, exclusive or’.

Further reading

Modern Lemizh x-lm

There is no exact definition for the beginning of the Modern Lemizh period. The present language differs significantly from the one we know from the Tlöngö̀l, but the main changes are spread out over several centuries.


The most obvious development after the Tlöngö̀l is the formation of poststems, starting in the mid-18th century (C00). They have three major sources. First, the inner case vowel replaced the last vowel of the stem, leaving the following consonants (if any) to form the poststem. Liquids and nasals in this position were either eliminated (elision), switched places with the inner case vowel (metathesis), or became fricatives (fortition), following a number of rather complicated rules. In some stative verbs, the perfect ending -s turned into a poststem; and in certain nominal verbs the singular and plural endings -r -l,* going back to Ghe ᴛʳə /r̠ə/ ‘one’ and ᴛˡi /d̠ˡɪ/ ‘several’, became the poststems -c -j, respectively (under the fortition rules: whence lemÌc., from the collective singular). As words with zero poststem came to be viewed as female, and those with non-zero poststem as male, ‘gender change’ can also occur by elimination or addition of poststems: this is the source of the final consonant of sxnèz. ‘sun’, which is male in Indo-European mythology. Addition of a poststem also serves to distinguish words that would otherwise be homophones: for example j‑a > x. ‘move’ vs. the relative pronoun jà.. Sometimes an Early New Lemizh word has two modern descendants with different poststem formations. Often, one of them is an everyday word, while the other is a technical term, e.g. rOsy ‘frost’ > rÌs. ‘frost’ (regular poststem) / rOsÌc. ‘crystal’ (poststem from singular) or canxwy ‘dust’ > cnÌxw. ‘dust’ (regular poststem) / canxwÌ. ‘dark matter’ (no poststem). These mechanisms are still productive, at least for assimilating loanwords.

The only other notable sound shift was the dissimilation of consonant clusters beginning with a plosive. This shift is the reason why Modern Lemizh is entirely without affricates.

* mnemonic: ‘singular’ ends in r, ‘plural’ in l

Lexicon and morphology

What with the Middle Lemizh Renaissance, and the influx of predominantly Indo-European loanwords over the course of three millennia (including those introduced or popularised by the Tlöngö̀l), we now have a language with a largely Indo-European lexicon.

Conversely, starting with the Ghean occupation, augmented by subsequent grammatical simplifications, and maybe completed by the extinction of particles, the Lemizh language finally arrived at a thoroughly un-IE and highly unlikely regular grammar. The chances for this to have happened are two to the power of two hundred and seventy-six thousand seven hundred and nine to one against. The future, however, will doubtlessly introduce new irregularities.

Further reading