
Proceedings Paper

On-line handwritten text categorization

Paper Abstract

As new devices that accept or produce on-line documents emerge, facilities for managing such documents, such as topic spotting, are required. This means that we should be able to perform text categorization of on-line documents. The textual data in on-line documents can be extracted through on-line recognition, a process that introduces noise, i.e. errors, into the resulting text. This work reports experiments on the categorization of on-line handwritten documents based on their textual content. We analyze the effect of the word recognition rate on categorization performance by comparing the performance of a categorization system on texts obtained through on-line handwriting recognition with its performance on the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared. A subset of the Reuters-21578 corpus, consisting of more than 2000 handwritten documents, was collected for this study. Results show that the accuracy loss is not significant, and that the precision loss is significant only for recall values of 60%-80%, depending on the noise level.
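
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the term-frequency features, the cosine similarity, the position-wise word recognition rate, the toy training set, and all function names below are assumptions made for illustration only. The sketch categorizes a ground-truth text and a noisy "recognized" version of the same text with a small kNN classifier, and measures how many words the hypothetical recognizer got right.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Majority vote among the k training documents most similar to `text`."""
    query = tf_vector(text)
    ranked = sorted(train, key=lambda doc: cosine(query, tf_vector(doc[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def word_recognition_rate(truth, recognized):
    """Fraction of ground-truth words recognized correctly.

    Simplifying assumption: both texts have the same number of tokens,
    so words are compared position by position (no alignment step).
    """
    truth_words = truth.split()
    recognized_words = recognized.split()
    if not truth_words:
        return 0.0
    matches = sum(1 for t, r in zip(truth_words, recognized_words) if t == r)
    return matches / len(truth_words)


# Toy training set (hypothetical; only the category names echo Reuters topics).
TRAIN = [
    ("oil prices crude barrel opec", "crude"),
    ("crude oil production barrel", "crude"),
    ("wheat grain harvest tonnes", "grain"),
    ("grain export wheat corn", "grain"),
]

TRUTH = "opec raises crude oil prices"
NOISY = "opec raisas crudo oil prices"  # simulated recognition errors

if __name__ == "__main__":
    print("word recognition rate:", word_recognition_rate(TRUTH, NOISY))
    print("category (ground truth):", knn_predict(TRAIN, TRUTH))
    print("category (recognized):  ", knn_predict(TRAIN, NOISY))
```

In this toy setting the noisy text loses two of its five words yet still shares enough terms with the training documents to receive the same label, which is the kind of effect the experiments quantify on real data. A realistic setup would need proper word alignment, term weighting, and a held-out evaluation set.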

Paper Details

Date Published: 19 January 2009
PDF: 11 pages
Proc. SPIE 7247, Document Recognition and Retrieval XVI, 724709 (19 January 2009); doi: 10.1117/12.804355
Sebastián Peña Saldarriaga, LINA, CNRS, Univ. de Nantes (France)
Christian Viard-Gaudin, IRCCyn, CNRS, Univ. de Nantes (France)
Emmanuel Morin, LINA, CNRS, Univ. de Nantes (France)

Published in SPIE Proceedings Vol. 7247:
Document Recognition and Retrieval XVI
Kathrin Berkner; Laurence Likforman-Sulem, Editor(s)

© SPIE.