Machine Learning and Destroying the Library of Alexandria

cxi over 4 years ago 3 minute read time

Hey THWACKers! Welcome back for week 2 in machine learning (ML). In my last post, Does Data Have Ethics? Data Ethic Issues and Machine Learning, you may have noticed I mentioned "evil" four times, but also mentioned "good" four times. Well, you're in luck. After all that talk about evil and ethics, I want to share with you some good that's been happening in the world.

But who can talk about goodness without mentioning the dark circumstances “the machines” don't want you to know about?

For those who aren't familiar, the Library of Alexandria was a place of wonder that held vast stores of knowledge and documentation. But what happened? It was DESTROYED.

In preparation for this topic, and because I wanted to mention some very specific library destructions over the years, I found this great source on Wikipedia so you can see just how much of our history has been lost.

Some notable events were:

ALL the artifacts, libraries, and more destroyed by ISIS
The 200+ years’ worth of artifacts, documents, and antiquities destroyed in the National Museum of Brazil fire
The very recent fire at Notre Dame, where the fires are hardly even out while this topic smolders within me
The Comet Disaster that breaks off and destroys this sleepy Japanese town every 1,200 years (OK, so this one’s from an anime movie, but natural disasters are disasters all the same.)

Image: Screen capture from the movie “Your Name” (Original title: Kimi no na wa) 50:16

https://www.imdb.com/title/tt5311514/

But how can machine learning help with this? Because I'm sure you all think “the machines” will cause the next level of catastrophe and destruction, right?

I’d like to introduce you to someone I'm honored to know, whose work has inspired growth and change, and can be used not only to preserve the past but also to enlighten the future.

This inspiration is Tkasasagi, who has been setting the ML world on fire with natural language processing and evolutionary changes to the transcription of Ancient, Edo era, and cursive Hiragana.

To give you a sense of the significance of this, there's a quote from last June, "If all Japanese literature and history researchers in the whole country help transcribing all pre-modern books in Japan, it will only take us 2000 years per persons to finish it."

Let's put that into perspective: there are countless tomes of knowledge, learning, information, education, and so much more that document the history and growth of Japan's culture and nation. Japan is an island nation in a region with some of the most active volcanoes and most frequent earthquakes in the world. It's only a matter of time before more of this information falls victim to natural disasters and is lost to the winds of time. But what can be done about this? How can this be preserved? That's exactly the exciting piece that I'm so happy to share with you.

Here in the first epoch of this transcription project, machine learning does an OK job… but is it a complete job? Not in the least. Fast forward a few weeks, though, and the results are staggering and impressive (even if nowhere near complete).

Images: https://twitter.com/tkasasagi/status/1036094001101692928

Now some of you may feel (justifiably so) that this is impressive growth in such a short amount of time, and I would agree. Not to mention the model is working with >99% accuracy at this point, which is impressive in its own right.

Image: https://twitter.com/tkasasagi/status/1115862769612599296

But the story doesn't end there—it continues literally day by day. (Feel free to follow Tkasasagi and learn about these adventures in real time.)

Every day, each little advancement in technologies like this, through natural language processing (NLP), computer vision (CV), and convolutional neural networks (CNNs), continues to grow the industry as a whole. You and I, as consumers of this technology, will eventually find our everyday activities easier, and one day these capabilities will simply be seen as commonplace. For example, how many of you are using, or have used, the image translation function of Google Translate to display text in another language, or used WeChat's conversion of Chinese into English or vice versa?

We are light-years beyond where we were just a few years ago, and efforts like these continue to make things better and better every day.

How was that for using our machines for good and not the darkest of evils? I'm excited—aren't you?
