Orthography & alphabet

Orthography

Version: 1.0
Last Updated: May 26, 2026

This section is part of our living public grammar of Avar and is regularly updated to reflect the latest linguistic research, database enhancements, and orthographic analyses.

The writing system of modern Avar is Cyrillic-based and is highly phonetic. However, because Avar possesses 50 distinct phonemes but must represent them using standard Cyrillic characters, the orthography relies heavily on multi-letter combinations (digraphs and trigraphs) and the specialized auxiliary character known as the Palochka (Ӏ).


Historical Development of the Avar Script

Throughout its history, the Avar language has been written in three main scripts, reflecting the broader socio-political and cultural shifts in the Caucasus:

1. The Arabic-Based Script (Ajam)

With the introduction and consolidation of Islam in Dagestan from the 15th century onward, Avar began to be written in a modified Arabic script called Ajam (гӀажам). Over the centuries, local Dagestanian scholars adapted the script to represent the unique Caucasian consonants (such as ejectives and laterals) by adding custom diacritic dots and lines to Arabic letters. Ajam was widely used for religious texts, correspondence, historical chronicles, and poetry until the early Soviet period.

2. The Latin Alphabet (1928–1938)

In 1928, as part of the Soviet Union-wide "Latinization" (Janalif) campaign for minority languages, a new Latin-based alphabet was devised for Avar. The Latin script introduced specialized characters (such as z, ç, q, ƣ) and diacritics to represent the large Caucasian consonant inventory. Although phonetically precise, this script was used for only a decade.

3. The Cyrillic Alphabet (1938–Present)

In 1938, Soviet policy shifted toward "Cyrillization," and a new Cyrillic alphabet was designed for Avar. Rather than creating new, non-standard letters, the designers used standard Russian Cyrillic characters combined in pairs (digraphs) or modified by a vertical stroke called the Palochka (Ӏ). This remains the official orthography of the Avar literary language today.


The Modern Alphabet

The modern Avar alphabet is conventionally said to consist of 46 distinct graphemic units (33 Russian letters + 13 native digraphs). However, some internal project documentation and linguistic sources cite 45 letters, which typically occurs when the Palochka (Ӏ) is treated purely as a diacritic modifier rather than an independent letter. Additionally, the critical lateral affricate phoneme [лӀ] is entirely absent from the official alphabet, forcing writers to use workarounds.

Letter | IPA | Letter | IPA | Letter | IPA | Letter | IPA
А а | [a] | И и | [i] | П п | [p] | Х х | [χ]
Б б | [b] | Й й | [j] | Р р | [r] | ХӀ хӀ | [ħ]
В в | [w] / [v] | К к | [k] | С с | [s] | Ц ц | [t͡s]
Г г | [g] | КӀ кӀ | [kʼ] | Т т | [t] | ЦӀ цӀ | [t͡sʼ]
Гъ гъ | [ʁ] | Къ къ | [q͡χʼː] | ТӀ тӀ | [tʼ] | Ч ч | [t͡ʃ]
Гь гь | [h] | Кь кь | [t͡ɬʼː] | У у | [u] | ЧӀ чӀ | [t͡ʃʼ]
ГӀ гӀ | [ʕ] | Л л | [l] | Ф ф | [f] | Ш ш | [ʃ]
Д д | [d] | Лъ лъ | [ɬ] / [ɬʼ] | Хъ хъ | [q͡χː] | Щ щ | [ʃː]
Е е | [e] / [je] | М м | [m] | Хь хь | [x] | Ъ ъ | [ʔ]
Ё ё | [jo] | Н н | [n] | Э э | [e] | Ы ы | [ɨ]
Ж ж | [ʒ] | О о | [o] | Ю ю | [ju] | Я я | [ja]
З з | [z] | Ӏ (Palochka) |

Note: The letters Ф ф, Э э, and Ы ы appear almost exclusively in Russian loanwords. The letter combination пӀ is not officially in the alphabet, but is used in literary text for the ejective bilabial stop in the exclamation гьопӀа [hopʼa].


Digraphs, Trigraphs, and the Palochka (Ӏ)

Because the basic Cyrillic alphabet does not have enough characters to cover the 50 phonemes of Avar, the orthography employs digraphs (two-letter combinations) and trigraphs (three-letter combinations).

Digraphs

Digraphs represent single phonemes, not consonant clusters. They are formed by combining a basic Cyrillic consonant with either a modifier letter (such as ь or ъ) or the Palochka (Ӏ).

  • Pharyngeals and Uvulars (гъ, гӀ, хъ, хӀ): Represent sounds articulated in the pharynx or at the uvula.
  • Velars and Laryngeals (гь, хь): Represent glottal or velar fricatives.
  • Ejectives (кӀ, тӀ, цӀ, чӀ, пӀ): The Palochka (Ӏ) acts as an ejective marker, modifying a voiceless stop or affricate into its glottalized counterpart.
  • Lateral fricative (лъ): Represents the voiceless lateral sound [ɬ].

Trigraphs

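Trigraphs are used to represent the tense (strong) ejective stops and affricates and the tense lateral fricative. In practice, each of the combinations below is a doubled digraph and is therefore written with four characters: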

  • кӀкӀ [kʼː] — Tense ejective stop: кӀкӀал [kʼːal] "gorge"
  • цӀцӀ [t͡sʼː] — Tense ejective dental affricate: цӀцӀе [t͡sʼːe] "goat"
  • чӀчӀ [t͡ʃʼː] — Tense ejective postalveolar affricate: чӀчӀва [t͡ʃʼːwa] "string"
  • лълъ [ɬː] — Tense voiceless lateral fricative: лълъел [ɬːel] "water (gen.)"
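
Because digraphs and doubled digraphs each stand for a single phoneme, software that processes Avar text generally has to split words into grapheme units rather than individual characters. The following is a minimal sketch of a greedy longest-match splitter. The function name `split_graphemes` and the `MULTIGRAPHS` list are illustrative assumptions, the list covers only the combinations mentioned in this section, and the input is assumed to be lowercase and to use the palochka U+04C0.

```python
P = "\u04C0"  # CYRILLIC LETTER PALOCHKA

# Multi-character graphemes mentioned in this section (lowercase only).
# Assumption: this list is illustrative, not an exhaustive inventory.
MULTIGRAPHS = [
    f"к{P}к{P}", f"ц{P}ц{P}", f"ч{P}ч{P}", "лълъ",   # doubled digraphs
    "гъ", "гь", f"г{P}", f"к{P}", "къ", "кь", "лъ",  # digraphs
    f"т{P}", "хъ", "хь", f"х{P}", f"ц{P}", f"ч{P}", f"п{P}",
    "кк", "чч", "цц", "сс", "хх",                    # doubled plain letters
]
# Try longer units first so that a doubled digraph wins over its first half.
MULTIGRAPHS.sort(key=len, reverse=True)


def split_graphemes(word: str) -> list[str]:
    """Split a word into grapheme units by greedy longest match."""
    units = []
    i = 0
    while i < len(word):
        for unit in MULTIGRAPHS:
            if word.startswith(unit, i):
                units.append(unit)
                i += len(unit)
                break
        else:
            # No multi-character unit starts here: emit a single character.
            units.append(word[i])
            i += 1
    return units
```

For example, `split_graphemes("макьу")` returns `['м', 'а', 'кь', 'у']`. A greedy match cannot tell a doubled letter that marks a tense consonant from two identical letters that merely meet at a morpheme boundary, so a real pipeline would likely need lexical information on top of this.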

The "Palochka" (Ӏ) and Normalization

The Cyrillic letter Palochka (Ӏ, U+04C0; its lowercase form ӏ is U+04CF) is a crucial character in Avar. Because it is absent from standard Russian keyboards, writers frequently substitute it with lookalike characters such as:

  • Latin capital I (U+0049) or lowercase l (U+006C)
  • Numeral 1 (U+0031)
  • Cyrillic dotted І (U+0406), the Byelorussian-Ukrainian letter sometimes called "decimal I"

These substitutions cause major search and processing failures in digital lexicography. Computational processing must normalize all variants to the canonical Cyrillic Palochka (Ӏ) before tokenization or transcription can take place.
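
The normalization step described above can be sketched as a single regular-expression substitution. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `normalize_palochka`, `LOOKALIKES`, and `BASES` are hypothetical, U+04C0 is taken as the canonical form, and a lookalike is treated as a palochka only when it directly follows one of the consonants that form palochka digraphs in this section.

```python
import re

PALOCHKA = "\u04C0"  # CYRILLIC LETTER PALOCHKA, used here as the canonical form

# Lookalikes listed above: Latin I, Latin l, digit 1, Cyrillic І (U+0406).
# Assumption: the lowercase palochka U+04CF is folded to U+04C0 as well.
LOOKALIKES = "Il1\u0406\u04CF"

# Assumption: a lookalike is a palochka only directly after a consonant
# that forms a palochka digraph in this section (г, к, п, т, х, ц, ч).
BASES = "гкптхцчГКПТХЦЧ"

_PATTERN = re.compile(f"(?<=[{BASES}])[{LOOKALIKES}]")


def normalize_palochka(text: str) -> str:
    """Replace palochka lookalikes that follow a digraph base consonant."""
    return _PATTERN.sub(PALOCHKA, text)
```

Restricting the substitution to positions after a base consonant leaves ordinary Latin text and standalone numbers untouched, at the cost of possible false positives where a real digit follows a Cyrillic consonant (for example in list labels). Whether that trade-off is acceptable depends on the corpus.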


Known Coverage Gaps (Missing Graphemes)

Avar orthography exhibits several systemic "gaps" where the official alphabet fails to provide dedicated letters for distinct phonemes, resulting in graphemic ambiguity:

1. Unwritten Geminates (Tense Consonants)

Of the 10 tense (strong) consonant phonemes in Avar, only three are consistently written in standard spelling (чч, кк, кӀкӀ). Six of the remaining tense consonants (цц, цӀцӀ, сс, хх, чӀчӀ, лълъ) are written with double letters only under specific conditions:

  • To resolve ambiguity in minimal pairs:
    • иц [it͡s] "moth" vs. ицц [it͡sː] "spring"
    • мах [maχ] "birch" vs. махх [maχː] "iron"
    • си [si] "tower" vs. сси [sːi] "dignity"
  • When different word forms coincide.
  • Otherwise, they are written with single letters (e.g. writing цӀ for both weak [t͡sʼ] and strong [t͡sʼː]), leaving pronunciation to be derived from context.

2. The Lateral Affricate Problem ([лӀ])

Avar has a distinct voiceless lateral affricate phoneme [лӀ], which is completely absent from the official alphabet (the combination лӀ is not written). Instead, writers must use the lateral fricative symbols лъ or лълъ to represent it. Consequently, the combination лъ carries a triple functional load (representing weak lateral fricative [лъ], lateral affricate [лӀ], and palatalized [lʲ] in Russian loanwords):

  • лъар [ɬar] "river" (pronounced as lateral fricative [ɬ])
  • лъутизе [ɬʼu.ti.ze] "to run away" (pronounced as lateral affricate [лӀ])

Orthographic Principles

Avar spelling is governed by three primary linguistic principles:

1. The Phonetic Principle (Primary)

The core of Avar spelling is phonetic: words are written exactly as they are pronounced. This is highly effective in Avar because:

  • Vowels do not undergo qualitative reduction (unstressed а remains pronounced as [a]).
  • Consonants undergo very limited devoicing or voicing at internal boundaries.
  • The overwhelming majority of letters represent a single, stable phonetic value.

2. The Morphological Principle (Limited)

The morphological principle (where spelling remains constant despite pronunciation changes) applies in a small number of inflected words. For example:

  • хъабарча "sheepskin coat" has a voiced [b], but in the genitive хъабчил [q͡χːap.t͡ʃil] "of a sheepskin coat", the consonant devoices to [p] before voiceless ч, yet the spelling retains б.
  • гъеж "arm" has a voiced [ʒ]; in the locative гъежда [ʁeʃ.da] "on the arm", it is pronounced as voiceless [ʃ] but still written with ж.

3. The Conventional / Arbitrary Principle

Certain orthographic rules are established purely by convention to resolve dialectal variation or simplify writing:

The о/у Constraint in Disyllabic Words

In disyllabic words with stress on the first syllable, the final unstressed vowel is often ambiguous between [o] and [u], depending on the dialect. The literary norm establishes a strict rule: always write -у in two-syllable words, and write -о only in words of three or more syllables.

  • наку [ˈna.ku] "knee"
  • макьу [ˈma.t͡ɬʼːu] "sleep, dream"
  • гъеду [ˈʁe.du] "crow"
  • могоро [mo.ˈgo.ro] "club, cudgel" (three syllables, write -о)
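
The о/у convention lends itself to a simple spelling check. The sketch below approximates the syllable count by counting vowel letters and flags two-syllable words spelled with final -о. The function names are illustrative assumptions, and the check cannot see stress, so it over-applies the rule to any disyllabic word regardless of where the stress falls.

```python
VOWEL_LETTERS = set("аеёиоуыэюя")


def count_syllables(word: str) -> int:
    """Approximate the syllable count as the number of vowel letters."""
    return sum(ch in VOWEL_LETTERS for ch in word.lower())


def follows_final_vowel_rule(word: str) -> bool:
    """Return False for a two-syllable word spelled with final -о."""
    return not (count_syllables(word) == 2 and word.lower().endswith("о"))
```

With these definitions, `follows_final_vowel_rule("наку")` and `follows_final_vowel_rule("могоро")` both return `True`, while a hypothetical misspelling such as `"нако"` returns `False`.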

Single Consonants in Ergative Suffixes

By convention, ergative case suffixes are written with single consonants even though they are phonetically pronounced as tense (strong) consonants:

  • васас "boy (ergative)" — written with single -с, pronounced with tense [sː] as [wa.saˈsː].
  • ясалъ "girl (ergative)" — written with single -лъ, pronounced with tense [ɬː] as [ja.saˈɬː].
  • дица "I (ergative)" — written with single -ц-, pronounced with tense [t͡sː] as [di.ˈt͡sːa].

Orthography of Russian Loanwords

The spelling of Russian loanwords in Avar is characterized by a stark distinction between two historical layers, reflecting a shift from oral-based linguistic integration to formal, written-based schooling and administrative standardization.

1. The Older Layer: Oral and Phonetic Adaptation

Borrowings that entered Avar before the educational and orthographic standardization of 1938 did so primarily through oral contact. Because these words were integrated by ear, they were fully adapted to Avar phonology, and this adaptation was reflected directly in their spelling:

  • панар [panar] — from Russian фонарь (lantern/flashlight). Avar lacked the voiceless labiodental fricative [f] and the soft sign, so the word was adapted using Avar native stops and spelling.
  • чамадан [t͡ʃa.ma.dan] — from Russian чемодан (suitcase).
  • карзинка [kar.zin.ka] — from Russian корзинка (basket).

In this older layer, the spelling matches the actual, adapted Avar pronunciation, in accordance with the phonetic principle of Avar orthography.

2. The Newer Layer: Post-1938 Orthographic Preservation

From 1938 onward, with the introduction of standard Cyrillic and the rise of universal bilingual education, new Russian loanwords entered Avar primarily through the written channel (books, newspapers, administration). The official orthographic codification established a strict rule: all newer Russian borrowings must preserve their original Russian spelling exactly, using Russian letters and combinations that are not native to Avar:

  • Russian orthographic preservation: Words are written with soft signs (ь), silent characters, Russian vowel endings, and non-native letters (e.g. председатель, телефон, объект, календарь).
  • Avar phonetic adaptation in speech: Although the Russian spelling is preserved in writing, speakers adapt these words to Avar phonological rules in speech. Non-native vowels are substituted, soft signs are ignored, consonants remain hard, and the stress usually shifts to the second syllable:
    • телефон is written exactly as in Russian, but pronounced [ti.ˈli.pun].
    • председатель is written exactly as in Russian, but pronounced [pir.ˈsi.da.tel].
    • календарь is written exactly as in Russian, but pronounced [ka.lin.ˈdar].

This creates a systemic orthographic conflict unique to the modern literary language: the written form is copied from Russian to maintain orthographic prestige, while the spoken form remains adapted to Avar phonology.