Arabic Handwriting Text Offline Recognition Using the HMM Toolkit (HTK)

Hicham El Moubtahij(1*), A Halli(2), Khalid Satori(3)

(1) USMBA-Fez University, Morocco
(2) USMBA-Fez University, Morocco
(3) USMBA-Fez University, Morocco
(*) Corresponding author



Recognition of handwritten Arabic text still lacks precise solutions. A good handwritten Arabic recognition system faces many difficulties, such as the unlimited variation in human handwriting, the similarity between different character shapes, and the dependence of a character's shape on its position in the word. This paper presents a recognition system for handwritten Arabic text. It decomposes the text image into text line images, extracts a set of simple statistical features from a narrow window that slides along each text line, and then feeds the resulting feature vectors to the Hidden Markov Model Toolkit (HTK). HTK is a portable toolkit originally intended for building speech recognition systems. In the recognition stage, the concatenation of characters to form words is modelled by simple lexical models, each word is modelled by a stochastic finite-state automaton (SFSA), and the concatenation of words into sentences is modelled by an n-gram language model. The proposed system is applied to a data corpus built from text line examples from “Arabic-Numbers”, which contains 1905 sentences and 47 words. The sentences were written by 5 different people.
Handwritten Arabic Text; Hidden Markov Model Toolkit (HTK); Stochastic Finite-State Automaton
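
The sliding-window feature extraction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual statistical features, the window width, or the sliding direction, so everything below is an assumption made for illustration: the function name `sliding_window_features`, the two features (ink density and normalised vertical centre of ink), the default window width and step, and the left-to-right scan. A binarised text line is assumed to be given as rows of 0/1 pixels, with 1 meaning ink.

```python
def sliding_window_features(line_image, window_width=3, step=1):
    """Return one feature vector per window position along a text line.

    line_image: list of rows, each a list of 0/1 pixels (1 = ink).
    The two features are illustrative assumptions, not the paper's set:
      - ink density inside the window
      - vertical centre of the ink, normalised to [0, 1]
    """
    height = len(line_image)
    width = len(line_image[0]) if height else 0
    features = []
    # Slide the window over the columns (left to right here; an assumption).
    for x in range(0, width - window_width + 1, step):
        ink = 0            # number of ink pixels in the window
        weighted_rows = 0  # sum of row indices over all ink pixels
        for y in range(height):
            row_ink = sum(line_image[y][x:x + window_width])
            ink += row_ink
            weighted_rows += y * row_ink
        density = ink / (height * window_width)
        if ink and height > 1:
            centre = (weighted_rows / ink) / (height - 1)
        else:
            centre = 0.0
        features.append([density, centre])
    return features
```

Each text line then becomes a sequence of such vectors, which is the kind of observation sequence an HMM toolkit consumes. Writing the vectors in HTK's binary parameter file format is a separate step and is not shown here.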

