Optical character recognition of handwritten Arabic using hidden Markov models

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

The problem of optical character recognition (OCR) of handwritten Arabic has not yet been satisfactorily solved. In this paper, an Arabic OCR algorithm is developed based on hidden Markov models (HMMs) combined with the Viterbi algorithm, yielding improved and more robust character recognition at the sub-word level. Integrating HMMs into the recognizer is a further step along the OCR research directions currently being pursued in the literature. The proposed approach exploits the structure of Arabic characters, in addition to their extracted features, to achieve higher recognition rates. Useful statistical information about the Arabic language is first extracted and then used to estimate the probabilistic parameters of the HMM. A new custom HMM implementation is developed in this study: the transition matrix is built from a large collected corpus, and the emission matrix is built from the results obtained with the extracted character features. Recognition is performed with the Viterbi algorithm, which determines the most probable sequence of sub-words. Because the model recognizes Arabic text at the sub-word level, the recognition rate is no longer tied to the worst recognition rate of any individual character but instead benefits from the overall structure of the Arabic language. Numerical results indicate that the proposed algorithms can potentially yield a large improvement in recognition.
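The abstract describes the pipeline only at a high level: a transition matrix estimated from a corpus, an emission matrix derived from the results of a feature-based character classifier, and Viterbi decoding at the sub-word level. The sketch below shows one way these pieces could fit together; it is not the authors' implementation. Assumptions (ours, not the paper's): hidden states are character labels, each observation is the label produced by a separate feature-based classifier, transitions are character-bigram frequencies with add-one smoothing, emissions are smoothed confusion counts of (true character, classifier output) pairs, and all function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def estimate_transitions(corpus, alphabet, alpha=1.0):
    """Hypothetical helper: start and transition log-probabilities from a corpus.

    corpus: iterable of sub-words, each a sequence of character labels.
    Add-alpha (Laplace) smoothing gives unseen bigrams a small nonzero probability.
    """
    start_counts = Counter()
    pair_counts = defaultdict(Counter)
    for word in corpus:
        if not word:
            continue
        start_counts[word[0]] += 1
        for prev, cur in zip(word, word[1:]):
            pair_counts[prev][cur] += 1
    n = len(alphabet)
    total_starts = sum(start_counts.values())
    log_start = {c: math.log((start_counts[c] + alpha) / (total_starts + alpha * n))
                 for c in alphabet}
    log_trans = {}
    for p in alphabet:
        row_total = sum(pair_counts[p].values())
        log_trans[p] = {c: math.log((pair_counts[p][c] + alpha) / (row_total + alpha * n))
                        for c in alphabet}
    return log_start, log_trans


def estimate_emissions(labelled_outputs, alphabet, symbols, alpha=1.0):
    """Hypothetical helper: emission log-probabilities P(classifier output | true char).

    labelled_outputs: iterable of (true_char, classifier_output) pairs from training data.
    """
    counts = defaultdict(Counter)
    for true_char, predicted in labelled_outputs:
        counts[true_char][predicted] += 1
    m = len(symbols)
    log_emit = {}
    for c in alphabet:
        row_total = sum(counts[c].values())
        log_emit[c] = {s: math.log((counts[c][s] + alpha) / (row_total + alpha * m))
                       for s in symbols}
    return log_emit


def viterbi(observations, alphabet, log_start, log_trans, log_emit):
    """Most probable hidden character sequence for one sub-word (log-space Viterbi)."""
    if not observations:
        return []
    # score[c]: best log-probability of any state path ending in state c
    score = {c: log_start[c] + log_emit[c][observations[0]] for c in alphabet}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for c in alphabet:
            best_prev = max(alphabet, key=lambda p: score[p] + log_trans[p][c])
            new_score[c] = score[best_prev] + log_trans[best_prev][c] + log_emit[c][obs]
            pointers[c] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state
    path = [max(alphabet, key=lambda c: score[c])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical toy data: Latin letters stand in for Arabic character classes;
    # "b" and "t" play the role of letters sharing a base shape and differing only in dots.
    alphabet = ["a", "b", "t"]
    corpus = ["ba"] * 8 + ["ta"] * 2
    labelled = ([("b", "b")] * 6 + [("b", "t")] * 4 +
                [("t", "t")] * 6 + [("t", "b")] * 4 + [("a", "a")] * 10)
    log_start, log_trans = estimate_transitions(corpus, alphabet)
    log_emit = estimate_emissions(labelled, alphabet, alphabet)
    print(viterbi(["t", "a"], alphabet, log_start, log_trans, log_emit))  # ['b', 'a']
```

With this toy data the classifier reads the first character as "t", but the corpus makes "b" much more likely at the start of a sub-word, so the decoder returns `['b', 'a']`. Working in log space avoids numerical underflow when sub-words grow longer.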

Original language: English
Title of host publication: Optical Pattern Recognition XXII
State: Published - 2011
Event: Optical Pattern Recognition XXII - Orlando, FL, United States
Duration: Apr 28, 2011 to Apr 29, 2011

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 8055
ISSN (Print): 0277-786X

Conference

Conference: Optical Pattern Recognition XXII
Country/Territory: United States
City: Orlando, FL
Period: 04/28/11 to 04/29/11

Keywords

  • Arabic OCR
  • Character recognition
  • OCR
  • Viterbi algorithm
  • hidden Markov models (HMMs)
