Computational Approaches to Arabic Script-based Languages



The Arabic script is one of the most widespread writing systems in the world. It is used to write Arabic across a region extending from North Africa through the Near East and the Middle East, and it has also been adopted for other languages such as Persian, Pashto, and Urdu. Speakers of these languages have extended the Arabic script to represent sounds of their languages that do not exist in Arabic.
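As a small illustration of such extensions, the sketch below lists four letters that Persian adds to the basic Arabic alphabet, together with their Unicode code points and character names. The selection of letters and the names `PERSIAN_ADDITIONS` and `describe` are our own choices for this example; it is not an exhaustive inventory, and other languages add different letters.

```python
import unicodedata

# Illustrative subset (an assumption of this example, not a complete list):
# four letters Persian adds to the basic Arabic alphabet.
PERSIAN_ADDITIONS = ["\u067E", "\u0686", "\u0698", "\u06AF"]


def describe(chars):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


# Print one line per letter: its code point followed by its Unicode name.
for code_point, name in describe(PERSIAN_ADDITIONS):
    print(code_point, name)
```

All four letters are encoded in the same Unicode Arabic block as the basic Arabic letters, which is one reason tools written for Arabic often need only modest changes to handle the characters of related orthographies.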

Introduction to CAASL2 workshop [Slides]
The slides introduce Arabic script-based languages and discuss the importance of the writing system by looking at differences in phrasal boundary recognition in Tajiki Persian (extended Cyrillic script) vs. Farsi/Dari Persian (extended Arabic script). They also give some statistics on submissions to the two CAASL workshops.

Languages using the Arabic script:
  • Arabic
  • Azerbaijani
  • Baluchi
  • Eastern Cham
  • Comorian
  • Dogri
  • Hausa
  • Kashmiri
  • Kurdish
  • Lahnda
  • Pashto
  • Persian (Iranian and Dari)
  • Punjabi
  • Sindhi
  • Uighur
  • Urdu

Languages that previously used the Arabic script:
  • Coptic
  • Indonesian
  • Ingush
  • Kirghiz
  • Malay
  • Susu
  • Tajik
  • Turkish
  • Turkmen
  • Uzbek
  • Wolof


  • Arabic script on Omniglot
  • Arabic alphabet on Wikipedia
  • Arabic Language Script, Indiana University, Near Eastern Languages and Cultures
  • Unicode Scripts and Languages: contains a list of languages using the Arabic script

Please email us if you see an error on this page or if we have failed to include an Arabic script-based language.