Handwritten Word Searching by Means of Speech Commands Using Deep Learning Techniques

Javier Orlando Pinzón-Arenas; Robinson Jiménez-Moreno; César Giovany Pachón-Suescún

doi:10.15866/iremos.v12i4.17166

Javier Orlando Pinzón-Arenas^(1), Robinson Jiménez-Moreno^(2*), César Giovany Pachón-Suescún^(3)

^(*) Corresponding author


Abstract

Word searching has attracted considerable interest, and numerous techniques have been developed for it. However, most implementations either require the target text to be digitized or preprocessed beforehand, so they cannot run in real time, or do not locate the requested word within the text or image, leaving users to find it on their own. This paper therefore proposes an application that locates handwritten words in real time: a webcam acquires images continuously, and the application searches for the word requested by the user, reporting how many times the word appears in the image and where each occurrence is located. The work focuses on detecting 10 words in Spanish. Two recognition systems are implemented. The first performs speech recognition, so that the search term is spoken rather than typed on a keyboard; a convolutional neural network with 90% accuracy recognizes the word the user says and passes it to the application. The second detects and locates words in the image acquired by the webcam using a Faster R-CNN, validated with 98.9% accuracy on the words found. Real-time tests of the application show that it correctly identifies the word spoken by the user and locates each occurrence in the captured image with high precision.
Copyright © 2019 Praise Worthy Prize - All rights reserved.
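
The search step described in the abstract (take the word recognized from speech, then report how many times it appears in the frame and where) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Detection` type, the `find_word` helper, the `min_score` threshold, the box layout, and the example words are all assumptions made for this example, and the speech CNN and Faster R-CNN themselves are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected word in a frame (hypothetical container, not from the paper)."""
    label: str                       # word class predicted by the detector
    score: float                     # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]   # assumed layout: x, y, width, height in pixels


def find_word(spoken_word: str,
              detections: List[Detection],
              min_score: float = 0.5) -> List[Detection]:
    """Return the detections whose label matches the spoken word.

    The length of the result is the number of occurrences, and each
    returned item carries the location of one occurrence.
    min_score is an assumed confidence cut-off, not a value from the paper.
    """
    target = spoken_word.strip().lower()
    return [d for d in detections
            if d.label.lower() == target and d.score >= min_score]


if __name__ == "__main__":
    # Placeholder detections for a single frame; the words are hypothetical.
    frame_detections = [
        Detection("casa", 0.97, (10, 20, 80, 30)),
        Detection("perro", 0.95, (120, 22, 90, 30)),
        Detection("casa", 0.91, (15, 140, 82, 31)),
    ]
    hits = find_word("casa", frame_detections)
    print(f"occurrences: {len(hits)}")
    for hit in hits:
        print(f"  at {hit.box} (score {hit.score:.2f})")
```

In a real-time setting this filter would presumably run once per captured frame, with the spoken word held fixed until the user issues a new command.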

Keywords

CNN; Faster R-CNN; Handwriting Recognition; Speech Recognition; Word Searching
