Photo Detection in Historical Documents

27 Feb 2014

We have continued to improve our handwriting detection and recognition tools. In doing so, we stumbled upon another exciting new feature that we think will help change the way people learn about their family history. We are excited to share that we have developed the ability to very easily extract pictures, photographs and other images from our historical books. It’s not exactly like stumbling upon penicillin, but we were pleasantly surprised at how accurately we are able to identify these images!

Notice the red outlines in the examples below.

[Three example pages, each with its detected image outlined in red]

The next step for us will be not only to extract the image but also to read the associated caption, so that our community members can search for information about the image. In the vast majority of cases, the caption describing the image is relatively easy for our search engine to identify, for the following reasons:

  • its proximity to the image
  • additional whitespace around the block of text
  • the caption may also have different type characteristics from the page content (font size, weight, casing, etc.)

What is particularly exciting about this discovery is that when we put the finishing touches on this technology, we’ll be able to add image-specific search capabilities to Mocavo. This development will open up a whole new realm of exciting discoveries for our community. Stay tuned!