
All About Optical Character Recognition

Published on 05 August 13

If you want a digital copy of a magazine article, printed book or newspaper clipping, you can always put it through a scanner to get a facsimile image. Sometimes, though, an image is unsuitable for the purpose at hand.

To begin with, images aren't searchable. If you scan a business card onto your computer, for instance, a search of your contacts won't find the name or number printed on it. Printed matter captured as an image isn't editable, either. The question is - how do you get a digital copy of printed matter without spending time typing it out by hand?


The answer is optical character recognition (OCR) - the technology that allows a computer to take printed words captured as images, work out which words those images represent and produce machine-readable text. This text, just like content typed into word processing software, is searchable and editable.


How does OCR software turn images of words into machine-readable text?


More than one kind of OCR technology exists.


At its most basic, optical character recognition does exactly what the name suggests - it tries to recognize words one letter or character at a time.

Optical word recognition is more advanced - it tries to recognize whole words at a time.


OCR technology works best on printed words - not handwriting. Intelligent character recognition (ICR) and intelligent word recognition (IWR) attempt to recognize handwritten text, including cursive. At this time, these more advanced technologies work reliably only on certain kinds of handwriting.


Traditionally, one of the greatest challenges facing the designers of OCR software has been to make sure that the software isn't confused by print artifacts - lines on the page, spots, pages scanned at an angle, colored text and so on. Modern OCR software is built with special algorithms to overcome these problems - de-skewing algorithms to straighten images, binarization algorithms to turn a color image into black and white, and de-speckling algorithms to clean up spots and other artifacts.
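As a rough illustration of two of these clean-up steps, here is a minimal Python sketch of binarization and de-speckling on a tiny image held as nested lists. The function names, the fixed threshold of 128 and the "no ink neighbours" rule are assumptions chosen for this example, not the method of any particular OCR product. De-skewing is left out because it needs image rotation, which is better handled by an imaging library.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    # Integer form of the common 0.299 / 0.587 / 0.114 luma weights.
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Mark pixels darker than the threshold as ink (1), the rest as paper (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray_image]


def despeckle(binary_image):
    """Clear ink pixels that have no ink pixel among their 8 neighbours."""
    height = len(binary_image)
    width = len(binary_image[0]) if height else 0
    cleaned = [row[:] for row in binary_image]  # work on a copy
    for y in range(height):
        for x in range(width):
            if binary_image[y][x] != 1:
                continue
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbours += binary_image[ny][nx]
            if neighbours == 0:
                cleaned[y][x] = 0  # isolated speck, treat as paper
    return cleaned
```

A real scanner image would usually need an adaptive threshold rather than a fixed one, since lighting and paper color vary across the page.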


Once an image is cleaned up, the software uses various techniques to compare each character in the image to a stored set of character shapes and arrive at the closest possible match.
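The comparison against stored shapes can be sketched in a few lines of Python. This is a toy version only: the 3x5 templates, the names TEMPLATES, pixel_distance and recognize, and the choice of counting differing pixels are all assumptions made for illustration. Commercial engines generally rely on more elaborate techniques.

```python
# Hypothetical 3x5 bitmaps for three letters; "1" is ink, "0" is paper.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def pixel_distance(glyph, template):
    """Count the pixels at which two same-sized bitmaps differ."""
    return sum(a != b
               for glyph_row, template_row in zip(glyph, template)
               for a, b in zip(glyph_row, template_row))


def recognize(glyph, templates=TEMPLATES):
    """Return the closest template's character and its pixel distance."""
    best = min(templates, key=lambda ch: pixel_distance(glyph, templates[ch]))
    return best, pixel_distance(glyph, templates[best])


# A "T" whose bottom pixel was lost in scanning still matches "T" best.
noisy_t = ("111", "010", "010", "010", "000")
print(recognize(noisy_t))  # ('T', 1)
```

The distance returned alongside the character gives a crude confidence measure: the larger it is, the less the glyph resembles anything in the stored set.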


Accurate OCR offers great benefits



To improve recognition accuracy, software vendors often create special versions of their software for specific applications. Software aimed at lawyers, for instance, may come with a database of legal expressions and terminology. Access to such a database helps OCR software check how likely a particular word, phrase or sentence is to appear, and so choose between competing readings. Special software exists for the medical industry, too. Several document processing businesses that specialize in OCR technology exist today to deliver professional scanning services to these industries.
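One simple way to picture the role of such a database is a post-processing pass that snaps a doubtful OCR reading to the nearest term in a domain word list. The sketch below uses Python's standard difflib module. The four-word LEGAL_TERMS list, the correct_word helper and the 0.8 similarity cutoff are assumptions for this example, and a real product would likely also weigh how common each term is.

```python
import difflib

# Hypothetical, tiny stand-in for a legal terminology database.
LEGAL_TERMS = ["affidavit", "defendant", "plaintiff", "subpoena"]


def correct_word(word, lexicon=LEGAL_TERMS, cutoff=0.8):
    """Return the closest lexicon term, or the word unchanged if none is close."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, lexicon=LEGAL_TERMS):
    """Apply correct_word to every whitespace-separated word."""
    return " ".join(correct_word(w, lexicon) for w in text.split())


# "plaintift" is a plausible misreading of "plaintiff" (f read as t).
print(correct_text("the plaintift filed an affidavit"))
```

A high cutoff keeps ordinary words that merely resemble a legal term from being rewritten, at the cost of leaving badly garbled readings untouched.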

When applied to large-scale projects such as Google's plan to digitize the world's libraries, OCR technology can bring knowledge to the masses at low cost. Businesses and individuals use OCR in imaginative and empowering ways, too. OCR software helps visually impaired people find their way around the world. The banking industry uses it to process checks quickly, and Internet security businesses build CAPTCHA systems around its limits, using distorted text that OCR software struggles to read, to protect websites and individuals online.


Janifar is a computer scientist and researcher. She enjoys passing on her insights through blogging. Visit the Scanning Services Vancouver link to learn more about scanning in that area.














