
Machine Learning and Destroying the Library of Alexandria

Level 12

Hey THWACKers! Welcome back for week 2 in machine learning (ML). In my last post, Does Data Have Ethics? Data Ethic Issues and Machine Learning, you may have noticed I mentioned "evil" four times, but also mentioned "good" four times. Well, you're in luck. After all that talk about evil and ethics, I want to share with you some good that's been happening in the world.

But who can talk about goodness without mentioning the dark circumstances “the machines” don't want you to know about?

For those who aren't familiar, the Library of Alexandria was a place of wonder that held an immense store of knowledge, documentation, and more. But what happened? It was DESTROYED.

In preparation for this topic, and because I wanted to mention some very specific library destructions over the years, I found this great source on Wikipedia so you can see just how much of our history has been lost.

Some notable events were:

  • ALL the artifacts, libraries, and more destroyed by ISIS
  • The 200+ years’ worth of artifacts, documents, and antiquities destroyed in the National Museum of Brazil fire
  • The very recent fire at Notre Dame, where the flames are barely out even as this topic smolders within me
  • The comet that passes every 1,200 years, breaks apart, and destroys this sleepy Japanese town (OK, so this one’s from an anime movie, but natural disasters are disasters all the same.)


Image: Screen capture from the movie “Your Name” (Original title: Kimi no na wa) 50:16

https://www.imdb.com/title/tt5311514/

But how can machine learning help with this? Because I'm sure you all think “the machines” will cause the next level of catastrophe and destruction, right?

I’d like to introduce you to someone I'm honored to know, whose work has inspired growth and change, and can not only preserve the past but also enlighten the future.

This inspiration is Tkasasagi, who has been setting the ML world on fire with natural language processing and evolutionary changes to the transcription of ancient, Edo-era, and cursive hiragana.

To give you a sense of the significance of this, there's a quote from last June, "If all Japanese literature and history researchers in the whole country help transcribing all pre-modern books in Japan, it will only take us 2000 years per persons to finish it."

Let's put that into perspective: there are countless tomes of knowledge, learning, information, education, and more that document the history and growth of Japanese culture and of the nation itself, an island nation in a region with some of the most active volcanoes and most frequent earthquakes in the world. It's only a matter of time before more of this information suffers from natural disasters and is lost to the winds of time. So what can be done about this? How can it be preserved? That's exactly the exciting piece I'm so happy to share with you.

Here, in the first epoch of this transcription project, machine learning does an OK job… but is it a complete job? Not in the least. Fast forward a few weeks, though, and the results are staggering and impressive (even if nowhere near complete).


Images: https://twitter.com/tkasasagi/status/1036094001101692928

Now, some of you may feel (justifiably so) that this is impressive growth in such a short amount of time, and I would agree. Not to mention that the model is now working at >99% accuracy, which is impressive in its own right.


Image: https://twitter.com/tkasasagi/status/1115862769612599296

But the story doesn't end there—it continues literally day by day. (Feel free to follow Tkasasagi and learn about these adventures in real time.)

Every day, every little advancement in technologies like this, through natural language processing (NLP), computer vision (CV), and convolutional neural networks (CNNs), grows the industry as a whole. You and I, as consumers of this technology, will find our everyday activities getting easier, and one day these capabilities will simply be seen as commonplace. For example, how many of you use, or have used, the image translation feature of Google Translate to read text in another language, or WeChat's built-in translation between Chinese and English?

We are light-years beyond where we were just a few years ago, and every day it gets better. Efforts like these just continue to make things better, and better, and better.

How was that for using our machines for good and not the darkest of evils? I'm excited—aren't you?

18 Comments
Level 16

Seems like you could lose all your digital data even more easily. I know I've dropped my phone in the lake before.

Level 12

Or you could do what I do with my IP68 waterproof phone... when something gets spilled on it at a restaurant, I take it into the bathroom... take it out of its case... get some soap and water and clean it up, then dry it off, and I'm good to go.

Level 16

I need a waterproof phone

Level 12

Pretty much every iPhone since the iPhone 7 is rated IP67 or IP68, and many Android phones are as well.

It's very weird that the Pixel 2 and Pixel 3 are water resistant (IP67 and IP68, respectively), but the Pixel 3a is NOT!

Level 12

No backups? I back mine up to my desktop computer at least weekly, and the desktop is in turn backed up to a separate hard drive more than once a day. When I have a day recording absolutely essential data, I'll back it up that night.

The bigger risk is accessing that data. The software you used to create the data store becomes obsolete: it can no longer read the old data, or it can't run at all, so the data is stored but inaccessible. How many of you still have reel-to-reel tape drives? Microfiche readers? Those 8-track-like tape readers (I never knew the name, even though I stored thousands upon thousands of those tapes)? And that's just the data generated in the past few decades. And passwords! If you found an old mobile phone of yours and managed to charge it, could you recall the old password?

Paper has disadvantages, but it also has advantages: it can be read centuries later, it doesn't need batteries, and tampering can be readily detected.

Level 12

Without even going back centuries, think about the number of lost recordings at the BBC or other television networks: cases where they recorded over old tapes. It happens at home, too: "Oh, I recorded over your wedding film with this sports game!"

It's happened in art, too, with paintings being painted over and masterpieces later found hidden beneath the surface of the canvas.

Though I figure if everything goes in a handbasket... we can always rely upon our Zip drives to get us back up and running!

Level 16

The phone I ruined was an iPhone 5, but it was in an IP68-rated OtterBox case. It leaked while I was snorkeling in 6 feet of water, ruining the phone. I was in Jamaica and didn't have any way to back up my data, so... it was lost.

MVP

Level 12

"... we can always rely upon our Zip drives to get us back up and running!"  I wasted some money on those things.

Level 12

An AI could destroy large amounts of data, but data backups in multiple locations can also save data.

I want to know more about my family history, but my family was living in Chicago before and during the Great Chicago Fire, so genealogical records were lost. If those records had been digitized (obviously impossible more than a century ago) and stored in multiple locations, ideally in a nonvolatile format kept offline so it can't be damaged by malware, then no fire or other regional catastrophe could have destroyed them.

When I learned the hard way that I need to back up and protect important personal files, I started my own system. I keep those files on my notebook computer's SSD, of course, but also back them up to a Raspberry Pi at home that I set up as a file server. It uses less than $1 worth of electricity per year, so it's far cheaper to run than a traditional server. That server then uses rsync over an SSH tunnel to sync to a second Raspberry Pi that I set up elsewhere in North America. They are far enough apart that a single high-altitude electromagnetic pulse could not destroy both. If I had someone to host it, I would put a third Pi in the Eastern Hemisphere.

Sure, there are benefits to using a free or paid service to store these files, but I like the privacy and control I have. And in a worst-case situation where a Pi dies, the data is still on its external hard drive (salvaged from an old notebook computer), which uses an encrypted filesystem.
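The exact command behind that Pi-to-Pi sync isn't given, so here is a minimal Python sketch of an rsync-over-SSH offsite copy, assuming rsync is installed on both machines and SSH key login already works. The function names, host name, user, paths, and port are all hypothetical placeholders, not the commenter's actual setup.

```python
import shlex
import subprocess


def build_rsync_command(src_dir, remote_user, remote_host, remote_dir,
                        ssh_port=22, mirror=False):
    """Build an rsync argument list that copies src_dir to a remote host over SSH.

    Hypothetical helper: every argument is a placeholder for your own setup.
    """
    # -a: archive mode (recursive; keeps permissions, timestamps, symlinks)
    # -z: compress data while it is in transit
    cmd = ["rsync", "-az"]
    if mirror:
        # --delete removes files on the far end that no longer exist locally.
        # Handy, but it also propagates accidental deletions, so it is off by default.
        cmd.append("--delete")
    # -e tells rsync to carry the transfer over SSH on the given port
    cmd += ["-e", f"ssh -p {ssh_port}"]
    # A trailing slash on the source means "copy this directory's contents"
    cmd.append(src_dir.rstrip("/") + "/")
    cmd.append(f"{remote_user}@{remote_host}:{remote_dir}")
    return cmd


def run_backup(cmd, dry_run=True):
    """Print the shell-quoted command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if rsync exits with an error
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only
    command = build_rsync_command("/srv/backup", "pi", "offsite-pi.example.net",
                                  "/mnt/backup")
    run_backup(command)  # dry run: prints the command without running it
```

Scheduled from cron with `dry_run=False`, something like this approximates the home-server-to-offsite-Pi step. Leaving `--delete` off by default means the offsite copy keeps files you deleted at home, trading some disk space for protection against an accidental (or malicious) wipe.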

Level 14

Well said and illustrated by examples. One more for the destroyed-forever category: during the French Revolution, many documents and artifacts were destroyed, leaving huge voids in historical research.

I do use the Google Translate feature for some personal work I'm doing (French and Latin to English). A useful tool.

Level 12

A number of you may appreciate this image, which shows that this is not an uncommon problem: our own English has been plagued by growth and inconsistency over time.

[Image]

I heard about machine learning for speech in an audiobook. They played samples of the "learning process," and it sounded like the machine was making "swallowing/breathing" sounds.

It's very impressive what machine learning can do today, and scary that sometimes we have no real explanation or full understanding of what the machine is actually doing when it learns.

Level 14

Thanks for the article!  Interesting stuff.  Reading about such historically significant treasures being destroyed is so sad. 

Level 13

Very interesting post.  Thanks for sharing.  I always wondered how they were going to handle OCR for pictographic languages like Japanese.

MVP

Thanks for the article.

Level 20

We were just beginning to implement neural networks while I was in college... it's finally getting somewhere now.

Level 13

Thanks. Interesting article.

About the Author
Founder at Remedy8 Security, Technology Evangelist, vExpert, EMC Elect, BDA, CISSP, MCT, Cloud, Ninja, Vegan, Father, Cat, Humorist, Author