Google Play Books Ate My Apostrophes

Update 10 April: It pays to report problems like the one described below to Google’s customer support. Seven weeks ago I discovered the problem. One week ago I reported it. Today the problem was suddenly gone, probably because Google updated the two ebooks involved and pushed new versions of the files to my phone.

I usually shop around for a good price when I buy e-books, and lately Google’s bookstore has received my custom. It’s not a very high-profile store – you see, this isn’t the well-known Google Books, where they offer scanned paper books in your browser. This is something called, clunkily, Google Play Books or Books On Google Play, where you can get copy-protected e-books for off-line reading.

A funny thing about this service is that many or all of Google’s e-book files contain original bitmaps scanned from paper books [or are they PDF images of the layout?]. You can toggle between the real e-book, which is the product of Optical Character Recognition probably followed by human proofreading, and the scanned pages. This won’t do you much good on a little phone screen, but anyway.

Now, the two most recent books I bought from Google Play Books have a strange glitch. When I complained about it to customer service, I received prompt, friendly help. When none of their suggested fixes worked, I was offered a refund. So this is not a disgruntled customer blog entry. Still, the problem is so strange that I want to blog about it just as a technical conundrum.

On my Android smartphone, the OCRed texts in my e-book copies of Adam Roberts’ Jack Glass (2012) and Neal Stephenson’s REAMDE (2011) have lost all their apostrophes. All their quotation marks. All their long dashes. And all their diacritic characters. When Stephenson writes “naïveté”, my e-book says “navet”, which is French for turnip. When the problem first showed up, in Roberts’ book, I actually thought he wrote non-standard English as a futuristic device.

When you run operating systems in non-English language modes, like Swedish or even Chinese, you get used to misidentified characters, with ÅÄÖÜ becoming all kinds of junk symbols. But this doesn’t look like a case of that. Google’s reader software is just quietly omitting some of the most common characters in English novels!
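To make the difference between those two failure modes concrete, here is a minimal Python sketch. It is only a guess at the kind of processing that could produce these symptoms, not a description of what Google’s reader actually does, and the function names are my own. The assumption is that curly apostrophes, curly quotation marks, long dashes and accented letters all lie outside plain 7-bit ASCII, so a step that throws away non-ASCII characters would remove exactly the characters that are missing from my books.

```python
def drop_non_ascii(text: str) -> str:
    """Silently discard every character outside 7-bit ASCII.

    Hypothetical stand-in for whatever the reader software might be doing.
    """
    return text.encode("ascii", errors="ignore").decode("ascii")


def mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1.

    This is the classic "junk symbols" failure, shown for contrast.
    """
    return text.encode("utf-8").decode("latin-1")


# Accented letters vanish without a trace.
print(drop_non_ascii("naïveté"))   # navet

# The curly apostrophe (U+2019) vanishes too; curly quotes and the
# long dash (U+2014) would go the same way.
print(drop_non_ascii("isn’t"))     # isnt

# Misidentified characters look different: each letter turns into junk
# instead of disappearing.
print(repr(mojibake("ÅÄÖ")))
```

The first function loses characters quietly, which matches what I see on my phone. The second replaces each letter with two junk characters, which is the behaviour I am used to from misconfigured language settings and which is not what is happening here.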

The problem isn’t new. I’ve found references to it on-line dating back to December 2010. Strangely, most of the complaints are about science fiction novels. Dear Reader, what’s your take on this?