Proceedings Volume 7247

Document Recognition and Retrieval XVI

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 19 January 2009
Contents: 10 Sessions, 35 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2009
Volume Number: 7247

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7247
  • Invited Presentation
  • Segmentation
  • Retrieval and Text Categorization
  • Recognition I
  • Invited Presentation
  • Writer or Script Identification
  • Recognition II
  • Segmentation and Restoration
  • Image Processing
  • Interactive Paper Session
Front Matter: Volume 7247
This PDF file contains the front matter associated with SPIE-IS&T Proceedings Volume 7247, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Invited Presentation
Pseudo-color enhanced x-ray fluorescence imaging of the Archimedes Palimpsest
Uwe Bergmann, Keith T. Knox
A combination of x-ray fluorescence and image processing has been shown to recover text characters written in iron gall ink on parchment, even when obscured by gold paint. Several leaves of the Archimedes Palimpsest were imaged using rapid-scan x-ray fluorescence imaging performed at the Stanford Synchrotron Radiation Lightsource of the SLAC National Accelerator Laboratory. A simple linear show-through model is shown to successfully separate different layers of text in the x-ray images, making the text easier for scholars to read.
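The "simple linear show-through model" can be pictured as inverting a small linear mixing system per pixel. Below is a minimal sketch under the assumption of two registered fluorescence channels and a known 2x2 mixing matrix; the actual model and calibration used for the Palimpsest images are described in the paper and not reproduced here.

```python
import numpy as np

def separate_layers(channel_a, channel_b, mixing):
    """Invert a hypothetical 2x2 linear show-through model pixel-wise.

    channel_a, channel_b: registered 2-D fluorescence images.
    mixing: assumed 2x2 matrix mapping the two ink layers to the two channels.
    Returns the two recovered layer images.
    """
    stacked = np.stack([channel_a.ravel(), channel_b.ravel()]).astype(float)  # (2, N)
    layers = np.linalg.inv(mixing) @ stacked                                  # unmix per pixel
    return (layers[0].reshape(channel_a.shape),
            layers[1].reshape(channel_a.shape))
```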
Segmentation
Text-image alignment for historical handwritten documents
S. Zinger, J. Nerbonne, L. Schomaker
We describe our work on text-image alignment in the context of building a historical document retrieval system. We aim at aligning images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines to detect words in an image of handwritten text, we need a training set: images of words with their transcriptions. We present our results on aligning words from the images of handwritten lines with their corresponding text transcriptions. Alignment based on the longest spaces between portions of handwriting serves as a baseline. We then show that relative lengths, i.e. the proportions of words within their lines, can be used to improve the alignment results considerably. To take the relative word length into account, we define expressions for the cost function that has to be minimized when aligning text words with their images. We apply right-to-left alignment as well as alignment based on exhaustive search. The quality assessment of these alignments shows correct results for 69% of words from 100 lines, or 90% when partially correct and correct alignments are combined.
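As a rough illustration of the relative-length idea, the sketch below scores candidate cut points by how well the pixel-width proportions of the resulting segments match the character-count proportions of the transcribed words, and picks the best subset by exhaustive search. The paper's exact cost function and search strategies may differ.

```python
import itertools
import numpy as np

def alignment_cost(cuts, line_width, words):
    """Squared error between relative pixel widths and relative word lengths
    (an assumed form of the cost; the paper defines its own expressions)."""
    bounds = [0, *cuts, line_width]
    pixel_props = np.diff(bounds) / line_width
    text_props = np.array([len(w) for w in words], dtype=float)
    text_props /= text_props.sum()
    return float(((pixel_props - text_props) ** 2).sum())

def best_alignment(candidate_gaps, line_width, words):
    """Exhaustive search over subsets of candidate gap positions,
    one of the strategies mentioned in the abstract."""
    return min(itertools.combinations(sorted(candidate_gaps), len(words) - 1),
               key=lambda cuts: alignment_cost(cuts, line_width, words))
```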
Document boundary determination using structural and lexical analysis
Kazem Taghva, Marc-Allen Cartright
The document boundary determination problem is the process of identifying individual documents in a stack of papers. In this paper, we report on a classification system for automation of this process. The system employs features based on document structure and lexical content. We also report on experimental results to support the effectiveness of this system.
Segmentation of continuous document flow by a modified backward-forward algorithm
Th. Meilender, A. Belaïd
This paper describes a segmentation method for continuous document flow. A document flow is a list of successive scanned pages, put in a production chain, representing several documents without an explicit separation mark between them. To separate the documents for recognition, the content of the successive pages must be analyzed to identify the boundary pages of each document. The method proposed here is similar to the variable horizon models (VHM) or multi-grams used in speech recognition. It consists of maximizing the flow likelihood given the Markov models of all the constituent elements. As computing this likelihood over the whole flow is NP-complete, the solution consists in evaluating it over windows of reduced observations. The first results obtained on homogeneous flows of invoices reach more than 75% precision and 90% recall.
Retrieval and Text Categorization
Retrieval of historical documents by word spotting
Nikoleta Doulgeri, Ergina Kavallieratou
The implementation of word spotting is not an easy procedure, and it becomes even harder in the case of historical documents since it requires character recognition and indexing of the document images. A general technique for word spotting is presented, independent of OCR, that automatically represents the user's text queries as word images and compares them with the word images extracted from the document images. The proposed system does not require training. The only required preprocessing task is the alphabet determination. Global shape features are used to describe the words. They are very general in order to capture the form of the word and are appropriately normalized to cope with the usual problems of variation in resolution, word width, and font. A novel technique that makes use of an interpolation method is presented. In our experiments, we analyze the system's dependence on its parameters and show that its performance is similar to that of trainable systems.
Enriching a document collection by integrating information extraction and PDF annotation
Brett Powley, Robert Dale, Ilya Anisimoff
Modern digital libraries offer all the hyperlinking possibilities of the World Wide Web: when a reader finds a citation of interest, in many cases she can now click on a link to be taken to the cited work. This paper presents work aimed at providing the same ease of navigation for legacy PDF document collections that were created before the possibility of integrating hyperlinks into documents was ever considered. To achieve our goal, we need to carry out two tasks: first, we need to identify and link citations and references in the text with high reliability; and second, we need the ability to determine physical PDF page locations for these elements. We demonstrate the use of a high-accuracy citation extraction algorithm which significantly improves on earlier reported techniques, and a technique for integrating PDF processing with a conventional text-stream based information extraction pipeline. We demonstrate these techniques in the context of a particular document collection, this being the ACL Anthology; but the same approach can be applied to other document sets.
Locating and parsing bibliographical references in HTML medical articles
Jie Zou, Daniel Le, George R. Thoma
Bibliographical references that appear in journal articles can provide valuable hints for subsequent information extraction. We describe our statistical machine learning algorithms for locating and parsing such references in HTML medical journal articles. Reference locating identifies the reference sections and then decomposes them into individual references. We formulate reference locating as a two-class classification problem based on text and geometric features. An evaluation conducted on 500 articles from 100 journals achieves near-perfect precision and recall rates for locating references. Reference parsing identifies components, e.g. author, article title, and journal title, from each individual reference. We implement and compare two reference parsing algorithms. One relies on sequence statistics and trains a Conditional Random Field. The other focuses on local feature statistics and trains a Support Vector Machine to classify each individual word, and then a search algorithm systematically corrects low-confidence labels if the label sequence violates a set of predefined rules. The overall performance of these two reference parsing algorithms is about the same: above 99% accuracy at the word level, and over 97% accuracy at the chunk level.
On-line handwritten text categorization
As new innovative devices that accept or produce on-line documents emerge, management facilities for these kinds of documents, such as topic spotting, are required. This means that we should be able to perform text categorization of on-line documents. The textual data available in on-line documents can be extracted through on-line recognition, a process which introduces noise, i.e. errors, into the resulting text. This work reports experiments on the categorization of on-line handwritten documents based on their textual contents. We analyze the effect of the word recognition rate on categorization performance by comparing the performance of a categorization system on texts obtained through on-line handwriting recognition and on the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared in this work. A subset of the Reuters-21578 corpus consisting of more than 2000 handwritten documents has been collected for this study. Results show that the accuracy loss is not significant, and the precision loss is only significant for recall values of 60%-80%, depending on the noise level.
Recognition I
Improvement of Arabic handwriting recognition systems; combination and/or reject?
Haikal El Abed, Volker Märgner
In this paper, we present a comparison between two different combination schemes for improving the performance of Arabic handwriting recognition systems. Several recognition systems (here considered as black-box systems) are used from the participating systems of the Arabic handwriting recognition competition at ICDAR 2007. The outputs of these systems provide the input to our combination schemes. The first combination scheme is based on fixed fusion using logical rules, while the second is based on trainable rules. After normalization of the recognition confidences and combination of the outputs, the improvement is evaluated in terms of the recognition rates of a multi-classifier system with or without reject. The participating systems use sets a to e of the IfN/ENIT database for training, and we use set f for tests. Applying the combination rules, the results show a high recognition rate of about 95% without reject, which corresponds to an improvement in recognition rates of between 8% and 15% compared to the results at the ICDAR 2007 competition.
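A hedged sketch of decision-level fusion under the setup the abstract suggests: each black-box recognizer supplies a confidence per candidate, confidences are min-max normalized, and the normalized scores are combined either by a fixed sum rule or with trained weights. The competition systems' actual score formats and the trained rules are assumptions here.

```python
import numpy as np

def normalize(scores):
    """Min-max normalize one recognizer's confidence vector."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

def fuse(score_lists, weights=None):
    """score_lists: one confidence vector per recognizer, all over the same candidates.
    Uniform weights give a fixed sum rule; learned weights give a trainable rule."""
    normed = np.stack([normalize(s) for s in score_lists])
    w = np.ones(len(score_lists)) if weights is None else np.asarray(weights, float)
    fused = w @ normed
    return int(np.argmax(fused))    # index of the winning candidate
```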
A robust model for on-line handwritten Japanese text recognition
Bilan Zhu, Xiang-Dong Zhou, Cheng-Lin Liu, et al.
This paper describes a robust model for on-line handwritten Japanese text recognition. The method evaluates the likelihood of candidate segmentation paths by combining scores for character pattern size, inner gap, character recognition, single-character position, pair-character position, likelihood of the candidate segmentation point, and linguistic context. The path score is insensitive to the number of candidate patterns, and the optimal path can be found by Viterbi search. In experiments on handwritten Japanese sentence recognition, the proposed method yielded superior performance.
Online computation of similarity between handwritten characters
Oleg Golubitsky, Stephen M. Watt
We are interested in the problem of curve identification, motivated by problems in handwriting recognition. Various geometric approaches have been proposed, with one of the most popular being "elastic matching." We examine the problem using distances defined by inner products on functional spaces. In particular we examine the Legendre and Legendre-Sobolev inner products. We show that both of these can be computed in online constant time. We compare both with elastic matching and conclude that the Legendre-Sobolev distance measure provides a competitive alternative to elastic matching, being almost as accurate and much faster.
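For readers unfamiliar with Sobolev-type distances, the sketch below evaluates d(f, g) = sqrt(∫(f-g)² dt + μ∫(f'-g')² dt) numerically on sampled coordinate curves; μ is a hypothetical weight. The paper instead works with truncated Legendre(-Sobolev) series coefficients, which is what makes online constant-time evaluation possible.

```python
import numpy as np

def sobolev_distance(f, g, t, mu=0.1):
    """Numerical Sobolev-type distance between two sampled curves f(t) and g(t).

    f, g: 1-D arrays of samples of one coordinate of the pen trace.
    t:    sample parameter values (e.g. arc length or time).
    mu:   hypothetical weight on the derivative term.
    """
    d = np.asarray(f, float) - np.asarray(g, float)
    dd = np.gradient(d, t)                       # derivative of the difference
    return float(np.sqrt(np.trapz(d * d, t) + mu * np.trapz(dd * dd, t)))
```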
Invited Presentation
Advanced topics in character recognition and document analysis: research works in intelligent image and document research lab, Tsinghua University
Character recognition and document retrieval are still very interesting research areas, although great progress in performance has been made over the last decades. Advanced research topics in character recognition and document analysis are introduced in this paper, including further research at Tsinghua University on handwritten Chinese character recognition, multilingual character recognition, and writer identification. In handwritten Chinese character recognition, a special cascade MQDF classifier is discussed for unconstrained cursive handwritten Chinese character recognition and an optimum handwritten strip recognition algorithm is introduced. In writer identification, content-dependent and content-independent algorithms are discussed. In multilingual character recognition, the THOCR multilingual document recognition system, covering Japanese, Korean, Tibetan, Mongolian, Uyghur, and Arabic, is introduced.
Writer or Script Identification
Comparison of statistical models for writer verification
A novel statistical model for determining whether a pair of documents, a known and a questioned one, were written by the same individual is proposed. The goal of this formulation is to learn the specific uniqueness of style in a particular author's writing, given the known document. Since there are often insufficient samples to extrapolate a generalized model of a writer's handwriting based solely on the document, we instead generalize over the differences between the author and a large population of known different writers. This is in contrast to an earlier proposed model in which probability distributions were set a priori without learning. We report the performance of the model along with a comparison to the older, non-learning model, which shows significant improvement.
Online writer identification using alphabetic information clustering
Writer identification is a topic of much renewed interest today because of its importance in applications such as writer adaptation, routing of documents and forensic document analysis. Various algorithms have been proposed to handle such tasks. Of particular interest are the approaches that use allographic features [1-3] to perform a comparison of the documents in question. The allographic features are used to define prototypes that model the unique handwriting styles of the individual writers. This paper investigates a novel perspective that takes alphabetic information into consideration when the allographic features are clustered into prototypes at the character level. We hypothesize that alphabetic information provides additional clues which help in the clustering of allographic prototypes. An alphabet information coefficient (AIC) has been introduced in our study and the effect of this coefficient is presented. Our experiments showed an increase in writer identification accuracy from 66.0% to 87.0% when alphabetic information was used in conjunction with allographic features on a database of 200 reference writers.
Recognition II
Using synthetic data safely in classification
When is it safe to use synthetic training data in supervised classification? Trainable classifier technologies require large representative training sets consisting of samples labeled with their true class. Acquiring such training sets is difficult and costly. One way to alleviate this problem is to enlarge training sets by generating artificial, synthetic samples. Of course this immediately raises many questions, perhaps the first being "Why should we trust artificially generated data to be an accurate representative of the real distributions?" Other questions include "When will training on synthetic data work as well as, or better than, training on real data?" We distinguish between sample space (the set of real samples), parameter space (all samples that can be generated synthetically), and finally, feature space (the set of samples in terms of finite numerical values). In this paper, we discuss a series of experiments in which we produced synthetic data in parameter space, that is, by convex interpolation among the generating parameters for samples, and showed that we could amplify real data to produce a classifier that is as accurate as a classifier trained on real data. Specifically, we have explored the feasibility of varying the generating parameters for Knuth's Metafont system to see if previously unseen fonts could also be recognized. We also varied parameters for an image quality model. We have found that training on interpolated data is for the most part safe, that is to say it never produced more errors. Furthermore, the classifier trained on interpolated data often improved class accuracy.
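A minimal sketch of amplification by convex interpolation in parameter space, assuming each real sample is described by a vector of generating parameters (Metafont-style); rendering the interpolated parameters back into glyph images is not shown.

```python
import numpy as np

def interpolate_parameters(p1, p2, lam):
    """Convex combination lam*p1 + (1-lam)*p2 of two generating-parameter vectors."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    return lam * p1 + (1.0 - lam) * p2

def amplify(training_params, factor=3, seed=0):
    """Generate synthetic parameter vectors between random pairs of real samples."""
    rng = np.random.default_rng(seed)
    params = np.asarray(training_params, float)
    synthetic = []
    for _ in range(factor * len(params)):
        i, j = rng.choice(len(params), size=2, replace=False)
        synthetic.append(interpolate_parameters(params[i], params[j], rng.random()))
    return np.array(synthetic)
```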
Combination of dynamic Bayesian network classifiers for the recognition of degraded characters
In this paper we investigate the combination of DBN (Dynamic Bayesian Network) classifiers, either independent or coupled, for the recognition of degraded characters. The independent classifiers are a vertical HMM and a horizontal HMM whose observable outputs are the image columns and the image rows, respectively. The coupled classifiers, presented in a previous study, associate the vertical and horizontal observation streams into single DBNs. The scores of the independent and coupled classifiers are then combined linearly at the decision level. We compare the different classifiers (independent, coupled, or linearly combined) on two tasks: the recognition of artificially degraded handwritten digits and the recognition of real degraded old printed characters. Our results show that coupled DBNs perform better on degraded characters than the linear combination of independent HMM scores. Our results also show that the best classifier is obtained by linearly combining the scores of the best coupled DBN and the best independent HMM.
Character recognition in the presence of occluding clutter
Many documents contain (free-hand) underlining, "COPY" stamps, crossed out text, doodling and other "clutter" that occlude the text. In many cases, it is not possible to separate the text from the clutter. Commercial OCR solutions typically fail for cluttered text. We present a new method for finding the clutter using path analysis of points on the skeleton of the clutter/text connected component. This method can separate the clutter from the text even for fairly complex clutter shapes. Even with good localization of occluding clutter, it is difficult to use feature-based recognition for occluded characters, simply because the clutter affects the features in various ways. We propose a new algorithm that uses adapted templates of the font in the document that can be used for all forms of occlusion of the character. The method finds the simulated localization of the corresponding clutter in the templates and compares the unaffected parts of the templates and the character. The method has proved highly successful even when much of the character is occluded. We present examples of clutter localization and character recognition with occluded characters.
Multi-font printed Mongolian document recognition system
Mongolian is one of the major ethnic languages in China. Large amounts of printed Mongolian documents need to be digitized for digital libraries and various applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some particular characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation points by analyzing the properties of projections and connected components. As Mongolian has different font-types, which are categorized into two major groups, the segmentation parameters are adjusted for each group. A font-type classification method for the two font-type groups is introduced. For the recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on test samples from practical documents with multiple font-types and mixed scripts.
Segmentation and Restoration
Resolution independent skew and orientation detection for document images
In large-scale scanning applications, orientation detection of the digitized page is necessary for subsequent procedures to work correctly. Several existing methods for orientation detection use the fact that in Roman-script text, ascenders are more likely to occur than descenders. In this paper, we propose a different approach to page orientation detection that uses this information. The main advantage of our method is that it is more accurate than the widely used methods we compare against, while being independent of scan resolution. Another interesting aspect of our method is that it can be combined with our previously published method for skew detection to obtain a single-step skew and orientation estimate of the page image. We demonstrate the effectiveness of our approach on the UW-I dataset and show that our method achieves an accuracy above 99% on this dataset. We also show that our method is robust to different scanning resolutions and can reliably detect page orientation for documents rendered at 150, 200, 300, and 400 dpi.
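For context only, here is a toy version of the ascender/descender cue the abstract refers to, not the authors' resolution-independent method: in upright Roman-script text, more ink tends to lie above the x-height band than below the baseline.

```python
import numpy as np

def orientation_vote(line_images):
    """Vote on page orientation from binary single-line images (foreground = 1).

    Compares ink mass in the top third (ascenders, capitals) against the bottom
    third (descenders) of each line; this is a rough illustration of the cue only.
    """
    votes = 0
    for line in line_images:
        profile = line.sum(axis=1).astype(float)
        third = max(len(profile) // 3, 1)
        top, bottom = profile[:third].sum(), profile[-third:].sum()
        votes += 1 if top >= bottom else -1
    return "upright" if votes >= 0 else "rotated 180 degrees"
```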
Text line extraction in free style document
This paper addresses text line extraction in free-style documents, such as business cards, envelopes, and posters. In free-style documents, global properties such as character size and line direction can hardly be inferred, which reveals a grave limitation of traditional layout analysis. The 'line' is the most prominent and the highest-level structure in our bottom-up method. First, we apply a novel intensity function, based on gradient information, to locate text areas where the gradient within a window has large magnitude and varied directions, and we split such areas into text pieces. We build a probability model of lines consisting of text pieces via statistics on training data. For an input image, we group text pieces into lines using a simulated annealing algorithm with a cost function based on the probability model.
Simultaneous detection of vertical and horizontal text lines based on perceptual organization
Claudie Faure, Nicole Vincent
A page of a document is a set of small components which are grouped by a human reader into higher-level components, such as lines and text blocks. Document image analysis is aimed at detecting these components in document images. We propose encoding local information by considering the properties that determine perceptual grouping. Each connected component is labelled according to the location of its nearest neighbouring connected component. These labelled components constitute the input of a rule-based incremental process. Vertical and horizontal text lines are detected without prior assumption on their direction. Touching characters belonging to different lines are detected early and discarded from the grouping process to avoid line merging. The tolerance for grouping components increases in the course of the process until the final decision. After each step of the grouping process, conflict resolution rules are activated. This work was motivated by the automatic detection of figure-caption pairs in the documents of the historical collection of the BIUM digital library (Bibliotheque InterUniversitaire Medicale). The images used in this study belong to this collection.
Efficient shape-LUT classification for document image restoration
In previous work we showed that Look Up Table (LUT) classifiers can be trained to learn patterns of degradation and correction in historical document images. The effectiveness of the classifiers is directly proportional to the size of the pixel neighborhood they consider. However, the computational cost increases almost exponentially with the neighborhood size. In this paper, we propose a novel algorithm that encodes the neighborhood information efficiently using a shape descriptor. Using shape descriptor features, we are able to characterize the pixel neighborhoods of document images with far fewer bits and thus obtain an efficient system with significantly reduced computational cost. Experimental results demonstrate the effectiveness and efficiency of the proposed approach.
Image Processing
Camera-based document image mosaicing using LLAH
In this paper we propose a mosaicing method for camera-captured document images. Since document images captured using digital cameras suffer from perspective distortion, their alignment is a difficult task for previous methods. In the proposed method, correspondences between feature points are calculated using the image retrieval method LLAH. Document images are aligned using a perspective transformation whose parameters are estimated from the correspondences. Since LLAH is invariant to perspective distortion, feature points can be matched without compensating for perspective distortion. Experimental results show that document images captured by a digital camera can be stitched using the proposed method.
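A sketch of the alignment step alone, assuming LLAH retrieval has already produced matched feature points between two captures (at least four correspondences); the LLAH matching itself and a proper blending scheme are not reproduced.

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b, pts_a, pts_b, canvas_size):
    """Warp img_b onto the plane of img_a from matched points.

    pts_a, pts_b: Nx2 float32 arrays of corresponding feature points (N >= 4).
    canvas_size:  (width, height) of the output mosaic canvas.
    """
    H, _mask = cv2.findHomography(pts_b, pts_a, cv2.RANSAC, 5.0)   # robust estimate
    warped_b = cv2.warpPerspective(img_b, H, canvas_size)
    base = cv2.warpPerspective(img_a, np.eye(3), canvas_size)      # place img_a on canvas
    mosaic = base.copy()
    region = warped_b > 0                                          # naive overlay for illustration
    mosaic[region] = warped_b[region]
    return mosaic
```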
Mark detection from scanned ballots
Analyzing paper-based election ballots requires finding all marks added to the base ballot. The position, size, shape, rotation and shade of these marks are not known a priori. Scanned ballot images have additional differences from the base ballot due to scanner noise. Different image processing techniques are evaluated to see under what conditions they are able to detect what sorts of marks. Basing mark detection on the difference of raw images was found to be much more sensitive to the mark darkness. Converting the raw images to foreground and background and then removing the form produced better results.
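A minimal sketch of the form-removal strategy that the abstract reports as working better, assuming 8-bit grayscale images that are already registered to the base ballot; the techniques and thresholds actually evaluated in the paper may differ.

```python
import cv2
import numpy as np

def detect_marks(scanned_gray, blank_form_gray):
    """Binarize scan and blank ballot, remove the (dilated) form, keep the rest as marks."""
    _, fg_scan = cv2.threshold(scanned_gray, 0, 255,
                               cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    _, fg_form = cv2.threshold(blank_form_gray, 0, 255,
                               cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    fg_form = cv2.dilate(fg_form, np.ones((3, 3), np.uint8))   # tolerate small misalignment
    return cv2.subtract(fg_scan, fg_form)                      # remaining foreground = marks
```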
Interactive Paper Session
Improving semi-text-independent method of writer verification using difference vector
The semi-text-independent method of writer verification based on the linear framework is a method that can use all the characters of two handwriting samples to discriminate between writers when the text contents are known. The handwriting samples are allowed to have only small numbers of, or even totally different, characters. This fills the gap between the classical text-dependent and text-independent methods of writer verification. Moreover, in this paper the identity of each character is exploited by the semi-text-independent method. Two types of standard templates, generated from many writer-unknown handwritten samples and printed samples of each character, are introduced to represent the content information of each character. The difference vectors of the character samples are obtained by subtracting the standard templates from the original feature vectors and are used to replace the original vectors in the process of writer verification. By removing a large amount of content information and retaining the style information, the verification accuracy of the semi-text-independent method is improved. On a handwriting database involving 30 writers, when the query handwriting and the reference handwriting are each composed of 30 distinct characters, the average equal error rate (EER) of writer verification reaches 9.96%. When the handwriting samples contain 50 characters, the average EER falls to 6.34%, which is 23.9% lower than the EER obtained without the difference vectors.
Restoring warped document image through segmentation and full page interpolation
In camera-based optical character recognition (OCR) applications, warping is a primary problem. Warped document images should be restored before they are recognized by traditional OCR algorithms. This paper presents a novel restoration approach, which first estimates the baselines and vertical character direction based on rough line and character segmentation, then selects several key points and determines their restoration mapping from the estimation step, and finally performs Thin-Plate Spline (TPS) interpolation on the full page image using these key-point mappings. The restored document image is expected to have straight baselines and upright character direction. This method can restore arbitrary local warping while keeping the restoration result natural and smooth, and consequently improves the performance of the OCR application. Experiments on several camera-captured warped document images show the effectiveness of this approach.
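As a sketch of the full-page interpolation step, the snippet below builds a dense backward map from sparse key-point correspondences with a thin-plate-spline radial basis function; the key-point selection and baseline estimation described above are assumed to have been done already, and scipy's Rbf stands in for the paper's TPS formulation.

```python
import numpy as np
from scipy.interpolate import Rbf

def tps_backward_map(src_pts, dst_pts, out_shape):
    """src_pts: Nx2 key points in the warped image; dst_pts: their restored positions.

    Returns dense map_x, map_y usable with cv2.remap(warped, map_x, map_y, INTER_LINEAR)
    to resample the warped page onto the restored grid of shape out_shape = (rows, cols).
    """
    fx = Rbf(dst_pts[:, 0], dst_pts[:, 1], src_pts[:, 0], function='thin_plate')
    fy = Rbf(dst_pts[:, 0], dst_pts[:, 1], src_pts[:, 1], function='thin_plate')
    ys, xs = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    return fx(xs, ys).astype(np.float32), fy(xs, ys).astype(np.float32)
```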
Identification of forgeries in handwritten petitions for ballot propositions
Sargur Srihari, Veshnu Ramakrishnan, Manavender Malgireddy, et al.
Many governments have some form of "direct democracy" legislation procedure whereby individual citizens can propose various measures creating or altering laws. Generally, such a process is started with the gathering of a large number of signatures. There is interest in whether or not there are fraudulent signatures present in such a petition, and if so what percentage of the signatures are indeed fraudulent. However, due to the large number of signatures (tens of thousands), it is not feasible to have a document examiner verify the signatures directly. Instead, there is interest in creating a subset of signatures where there is a high probability of fraud that can be verified. We present a method by which a pairwise comparison of signatures can be performed and subsequent sorting can generate such subsets.
Simultaneous segmentation and recognition of Arabic printed text using linguistic concepts of vocabulary
Mohamed Ben Halima, Adel M. Alimi
In this paper, we propose a new approach to Arabic printed text analysis and recognition. This approach is based on linguistic concepts of Arabic vocabulary. For the text, we categorize the words into decomposable words (derived from a root) and indecomposable words (not derived from a root) and put forth morpho-syntactic characterization hypotheses for each word. For the decomposable words, we attempt to recognize the word's basic morphemes: antefix, prefix, infix, suffix, postfix and root, in contrast to existing approaches, which are usually based on recognizing the word entity with a holistic approach.
Comparison of Niblack inspired binarization methods for ancient documents
Khurram Khurshid, Imran Siddiqi, Claudie Faure, et al.
In this paper, we present a new sliding-window-based local thresholding technique, 'NICK', and give a detailed comparison of some existing sliding-window-based thresholding algorithms with our method. The proposed method aims at achieving better binarization results, specifically for ancient document images. NICK is inspired by Niblack's binarization method and exhibits robustness and effectiveness when evaluated on low-quality ancient document images.
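For orientation, here is the classic Niblack rule T = m + k·s (local mean plus k times local standard deviation) that NICK builds on; NICK modifies the variance term to cope better with low-contrast ancient pages, and its exact formula is given in the paper, so only the Niblack baseline is sketched.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(gray, window=19, k=-0.2):
    """Niblack local thresholding: T = local_mean + k * local_std over a sliding window."""
    img = gray.astype(np.float64)
    mean = uniform_filter(img, window)
    mean_sq = uniform_filter(img * img, window)
    std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
    threshold = mean + k * std
    return ((img > threshold) * 255).astype(np.uint8)   # white background, black ink
```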
Figure content analysis for improved biomedical article retrieval
Daekeun You, Emilia Apostolova, Sameer Antani, et al.
Biomedical images are invaluable in medical education and in establishing clinical diagnoses. Clinical decision support (CDS) can be improved by combining biomedical text with automatically annotated images extracted from relevant biomedical publications. In a previous study on the feasibility of automatically classifying images by combining figure captions and image content for usefulness in finding clinical evidence, we reported 76.6% accuracy using supervised machine learning. Image content extraction is traditionally applied to entire images or to pre-determined image regions. Figure images in articles vary greatly, which limits the benefit of whole-image extraction beyond gross categorization for CDS. However, text annotations and pointers on the images indicate regions of interest (ROI) that are then referenced in the caption or in the discussion in the article text. We previously reported 72.02% accuracy in localizing text and symbols, but we did not take advantage of the referenced image locality. In this work we combine article text analysis and figure image analysis to localize pointers (arrows, symbols) and extract the ROIs they indicate, which can then be used to measure meaningful image content and associate it with the identified biomedical concepts for improved (text and image) content-based retrieval of biomedical articles. Biomedical concepts are identified using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus. Our methods achieve an average precision and recall of 92.3% and 75.3%, respectively, in identifying pointing symbols in images from a randomly selected image subset made available through the ImageCLEF 2008 campaign.
A semi-supervised learning method to classify grant support zone in web-based medical articles
Xiaoli Zhang, Jie Zou, Daniel X. Le, et al.
Traditional classifiers are trained from labeled data only. Labeled samples are often expensive to obtain, while unlabeled data are abundant. Semi-supervised learning can therefore be of great value by using both labeled and unlabeled data for training. We introduce a semi-supervised learning method named decision-directed approximation combined with Support Vector Machines to detect zones containing information on grant support (a type of bibliographic data) from online medical journal articles. We analyzed the performance of our model using different sizes of unlabeled samples, and demonstrated that our proposed rules are effective to boost classification accuracy. The experimental results show that the decision-directed approximation method with SVM improves the classification accuracy when a small amount of labeled data is used in conjunction with unlabeled data to train the SVM.
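A generic self-training loop in the spirit of decision-directed approximation is sketched below (the paper's exact update rules are not reproduced): the current SVM labels the unlabeled zones, high-confidence predictions are kept as pseudo-labels, and the classifier is retrained.

```python
import numpy as np
from sklearn.svm import SVC

def self_training_svm(X_lab, y_lab, X_unlab, rounds=5, conf=0.9):
    """Iteratively augment a labeled set with confident pseudo-labels and retrain an SVM."""
    X, y = np.asarray(X_lab, float), np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab, float)
    clf = SVC(probability=True).fit(X, y)
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= conf                 # confident predictions only
        if not keep.any():
            break
        X = np.vstack([X, X_unlab[keep]])
        y = np.concatenate([y, clf.classes_[proba[keep].argmax(axis=1)]])
        X_unlab = X_unlab[~keep]
        clf = SVC(probability=True).fit(X, y)            # retrain on the enlarged set
    return clf
```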
Layout-free dewarping of planar document images
Masakazu Iwamura, Ryo Niwa, Akira Horimatsu, et al.
For user convenience, the processing of document images captured by a digital camera has attracted much attention. However, most existing processing methods require an upright image such as one captured by a scanner. Therefore, we have to cancel the perspective distortion of a camera-captured image before processing. Although there are rectification methods for this distortion, most of them work under certain assumptions on the layout: the borders of the document are available, text lines are parallel, a stereo camera or a video sequence is required, and so on. In this paper, we propose a layout-free rectification method which requires none of the above assumptions. We confirm the effectiveness of the proposed method by experiments.
Watermarking ancient documents based on wavelet packets
Med Neji Maatouk, Ola Jedidi, Najoua Essoukri Ben Amara
Ancient documents represent an important part of our individual and collective memory. In addition to their preservation, the digitization of these documents may offer users a great number of services, such as remote look-up and browsing of rare documents. However, once in digital form, the documents are likely to be modified or pirated. Therefore, we need to develop techniques for protecting images stemming from ancient documents. Watermarking appears to be one of the promising solutions. Nevertheless, the performance of a watermarking procedure depends on a balance between robustness and invisibility. Thus, the choice of the insertion domain and mode, as well as of the carrier points of the signature, is decisive. We propose in this work a method for watermarking images stemming from ancient documents based on wavelet packet decomposition. The insertion is carried out on the maximum amplitude ratios found in the best decomposition basis, which is determined beforehand according to an entropy criterion. This work is part of a project on digitizing ancient documents in cooperation with the National Library of Tunis (BNT).
Script identification of handwritten word images
This paper describes a system for script identification of handwritten word images. The system is divided into two main phases, training and testing. The training phase performs moment-based feature extraction on the training word images and generates their corresponding feature vectors. The testing phase extracts moment features from a test word image and classifies it into one of the candidate script classes using information from the trained feature vectors. Experiments are reported on handwritten word images from three scripts: Latin, Devanagari and Arabic. Three different classifiers are evaluated over a dataset consisting of 12,000 word images in the training set and 7,942 word images in the testing set. Results show significant strength in the approach, with all the classifiers achieving a consistent accuracy of over 97%.
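As an illustration of a moment-based word descriptor, the sketch below computes log-scaled Hu moments for a binary word image; the paper's actual moment set and the three evaluated classifiers are not specified here, so Hu moments serve only as a stand-in.

```python
import cv2
import numpy as np

def word_moment_features(word_img_binary):
    """Return a 7-dimensional, log-scaled Hu-moment feature vector for a binary word image."""
    m = cv2.moments(word_img_binary.astype(np.uint8), binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    # Log-scale normalization keeps the widely varying moment magnitudes comparable.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```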