Proceedings Paper

Character recognition in the presence of occluding clutter

Paper Abstract

Many documents contain (free-hand) underlining, "COPY" stamps, crossed out text, doodling and other "clutter" that occlude the text. In many cases, it is not possible to separate the text from the clutter. Commercial OCR solutions typically fail for cluttered text. We present a new method for finding the clutter using path analysis of points on the skeleton of the clutter/text connected component. This method can separate the clutter from the text even for fairly complex clutter shapes. Even with good localization of occluding clutter, it is difficult to use feature-based recognition for occluded characters, simply because the clutter affects the features in various ways. We propose a new algorithm that uses adapted templates of the font in the document that can be used for all forms of occlusion of the character. The method finds the simulated localization of the corresponding clutter in the templates and compares the unaffected parts of the templates and the character. The method has proved highly successful even when much of the character is occluded. We present examples of clutter localization and character recognition with occluded characters.
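The template comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary images that are already aligned and of equal size, a clutter mask that has already been localized, and a simple pixel-agreement score. The names `masked_match_score` and `recognize` and the toy 5x5 templates are hypothetical.

```python
import numpy as np

def masked_match_score(char_img, template, clutter_mask):
    """Fraction of unoccluded pixels on which character and template agree.

    Assumption: all three arrays are boolean, aligned, and the same shape.
    """
    char_img = np.asarray(char_img, dtype=bool)
    template = np.asarray(template, dtype=bool)
    valid = ~np.asarray(clutter_mask, dtype=bool)  # pixels not covered by clutter
    n_valid = int(valid.sum())
    if n_valid == 0:
        return 0.0  # character fully occluded: no evidence either way
    agree = (char_img == template) & valid
    return float(agree.sum()) / n_valid

def recognize(char_img, clutter_mask, templates):
    """Return the best-scoring label and all scores (hypothetical helper)."""
    scores = {label: masked_match_score(char_img, t, clutter_mask)
              for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy templates: "I" is a centre column, "L" is a left column plus bottom row.
tmpl_I = np.zeros((5, 5), dtype=bool)
tmpl_I[:, 2] = True
tmpl_L = np.zeros((5, 5), dtype=bool)
tmpl_L[:, 0] = True
tmpl_L[4, :] = True

# Simulated clutter: a horizontal stroke through row 2, drawn over an "L".
clutter = np.zeros((5, 5), dtype=bool)
clutter[2, :] = True
observed = tmpl_L | clutter

best, scores = recognize(observed, clutter, {"I": tmpl_I, "L": tmpl_L})
print(best, scores)
```

Applying the same mask to every template means only the unaffected parts of the character contribute to the score, so the clutter stroke neither helps nor penalizes any candidate.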

Paper Details

Date Published: 19 January 2009
PDF: 13 pages
Proc. SPIE 7247, Document Recognition and Retrieval XVI, 72470I (19 January 2009); doi: 10.1117/12.805855
Knut T. Fosseide, Lumex AS (Norway)
Lars Aurdal, Lumex AS (Norway)

Published in SPIE Proceedings Vol. 7247:
Document Recognition and Retrieval XVI
Kathrin Berkner; Laurence Likforman-Sulem, Editor(s)
