Comparative analysis of Software accuracy for OCR in examples of documents digitalized by different resolutions during digitization process

Komparativna analiza tačnosti Software-a za OCR u primjerima digitaliziranih dokumenata sa različitim rezolucijama prilikom digitalizacije

Author(s): Dragan Golubović
Subject(s): Museology & Heritage Studies
Published by: Arhiv Bosne i Hercegovine
Keywords: OCR software; digitization; Abby Fine Reader 2.0

Summary/Abstract: The paper presents an experiment whose primary aim is to demonstrate the effectiveness of OCR software on documents digitized at different resolutions (200 DPI, 400 DPI and 600 DPI). According to some indicators, the accuracy of reading, that is, of recognizing characters within a text, should be about 90%. This will be checked against the number of alphabetic or numeric characters processed by OCR, and by comparing the percentage of correctly read characters with the percentage of incorrectly read ones. The secondary aim is to use the experiment to identify the most suitable components for the digitization process, taking into account the scanning time and the amount of computer storage needed to preserve the digitized document permanently. In the experiment, an EPSON GT 20.000 scanner is used to convert documents from analog to digital form, and the OCR software applied to the digitized documents is Abby Fine Reader 2.0.

A wide range of OCR (Optical Character Recognition) software is available on today's IT market, but only a few products can process both Cyrillic and Latin alphabet characters. The most common is Abby Fine Reader, along with Omnipage and Readiris. The usability of OCR-processed text is reflected in the time needed to manually correct all remaining errors. A page from the magazine Nada (1897) is used in this experiment. It was digitized at three different resolutions (200 DPI, 400 DPI and 600 DPI). The primary objective of the experiment was to show the percentage accuracy of the OCR software when converting characters from an image to readable text. The secondary objective was to use the experiment to identify the most suitable components for the digitization process, considering the scanning time and the amount of storage needed for permanent preservation of the digitized record.
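The abstract does not give the exact formula used to score the OCR output, so the following is only a minimal sketch of one common way to express character accuracy as a percentage. It assumes a manually corrected reference transcription is available and counts errors as the character-level edit distance between that reference and the OCR text. The function names `levenshtein` and `character_accuracy` are hypothetical and do not come from the paper.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `reference` into `hypothesis`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR text
                current[j - 1] + 1,      # extra character in the OCR text
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed metric: share of reference characters recognized correctly,
    as a percentage, floored at zero."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 100.0 * (len(reference) - errors) / len(reference))
```

Under this assumed metric, running `character_accuracy` on the transcription of the same page scanned at 200, 400 and 600 DPI would give three percentages that can be compared directly with the roughly 90% figure mentioned in the abstract.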

  • Issue Year: 2009
  • Issue No: 1
  • Page Range: 139-145
  • Page Count: 7
  • Language: Bosnian