Mission & Goals

Why AvarLab exists, who it serves, and what we are building.

Mission

AvarLab's mission is to build open, sustainable digital infrastructure for Avar: a single platform that connects a dictionary, a corpus, morphological tools, and community contributions. Languages without a digital presence risk marginalisation among younger generations. We believe computational tools are instruments of cultural survival, and that linguistic resources should be built with and for speaker communities, not just about them.

Who we build for

  • Native speakers and heritage communities — a modern dictionary that understands all forms of a word, not just the base form.
  • Language learners and teachers — grammar tables, real-text examples, audio, and idioms in one place.
  • Linguists and fieldworkers — a searchable morphological database and annotated corpus covering all parts of speech.
  • NLP researchers — structured, exportable datasets for training computational models.

What we are building

  • A living dictionary — covering not just lemmas but all inflected forms, with real corpus examples and audio.
  • An annotated corpus — a growing collection of Avar texts with automatic grammatical annotation, open for research use.
  • Community tools — ways for native speakers to contribute words, validate entries, and upload pronunciations.
  • Historical depth — bringing older Avar texts written in Arabic and Latin scripts into the digital record.
  • Speech and accessibility tools — foundational work toward pronunciation resources, a custom Avar keyboard, and speech technologies.

Roadmap

AvarLab is developed as part of a doctoral research programme at Universitat Pompeu Fabra (2025–2028). Phases are indicative.

  • 2025–2026

    Complete the morphological generator, launch the platform publicly, release a custom Avar keyboard, and publish the first annotated corpus.

  • 2026–2027

    Expand corpus coverage with historical texts, open community validation, and begin collecting audio recordings from native speakers.

  • 2027–2028

    Train language models on AvarLab-generated data, prototype speech recognition, and release all datasets openly.

Out of scope

  • Prescribing "correct" spelling — we document attested usage rather than enforce norms.
  • A commercial product — AvarLab is and will remain a research and community platform.
