Publication: Krupiński, R., Lech, P., Tecław, M., Okarma, K.: Binarization of Degraded
Document Images with Generalized Gaussian Distribution. In: Rodrigues,
J.M.F., et al. (eds.) Computational Science - ICCS 2019. Lecture Notes in Computer Science,
vol. 11540, pp. 177-190. Springer International Publishing (2019).
Abstract. One of the most crucial steps of
preprocessing of document images subjected to further text recognition is their
binarization, which influences significantly obtained OCR results. Since for degraded
images, particularly historical documents, classical global and local thresholding
methods may be inappropriate, a challenging task of their binarization is still
up-to-date. In the paper a novel approach to the use of Generalized Gaussian
Distribution for this purpose is presented. Assuming the presence of distortions, which
may be modelled using the Gaussian noise distribution, in historical document images, a
significant similarity of their histograms to those obtained for binary images corrupted
by Gaussian noise may be observed. Therefore, extracting the parameters of Generalized
Gaussian Distribution, distortions may be modelled and removed, enhancing the quality of
input data for further thresholding and text recognition. Due to relatively long
processing time, its shortening using the Monte Carlo method is proposed as well. The
presented algorithm has been verified using well-known DIBCO datasets leading to very
promising binarization results.
https://link.springer.com/chapter/10.1007/978-3-030-22750-0_14
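
The two ingredients named in the abstract, the Generalized Gaussian Distribution and a Monte Carlo speed-up, can be illustrated with a minimal sketch. The abstract does not give the authors' parametrization or sampling scheme, so everything here is an assumption: `ggd_pdf` uses the common textbook density with location `mu`, scale `alpha` and shape `beta`, and `monte_carlo_histogram` is a hypothetical helper that estimates a grey-level histogram from randomly drawn pixels rather than from the whole image.

```python
import math
import random


def ggd_pdf(x, mu, alpha, beta):
    """Density of a Generalized Gaussian Distribution (textbook form, assumed).

    mu is the location, alpha > 0 the scale, beta > 0 the shape.
    beta = 2 gives a Gaussian, beta = 1 a Laplacian.
    """
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def monte_carlo_histogram(image, n_samples, rng=None):
    """Estimate a normalized 256-bin histogram from randomly drawn pixels.

    `image` is a list of rows of integer grey levels in 0..255.
    Hypothetical helper: the paper's actual sampling scheme may differ.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    counts = [0] * 256
    for _ in range(n_samples):
        row = rng.randrange(height)
        col = rng.randrange(width)
        counts[image[row][col]] += 1
    return [count / n_samples for count in counts]


if __name__ == "__main__":
    # Synthetic "document": dark text level 40 on a bright background level 200.
    image = [[40 if (r + c) % 5 == 0 else 200 for c in range(100)] for r in range(100)]
    hist = monte_carlo_histogram(image, 2000, random.Random(0))
    print("estimated share of text pixels:", hist[40])
    print("GGD density at the mode (Gaussian case):", ggd_pdf(0.0, 0.0, math.sqrt(2.0), 2.0))
```

In a sketch like this, the sampled histogram would stand in for the full one when fitting the distribution parameters, trading a small estimation error for fewer pixel reads.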