My top-down approach to machine learning

I had been interested in ML (machine learning) and DL (deep learning) for a while, but never got around to actually studying it or building anything that used or needed it. Luckily, Verico, our client, came up with a real-world problem that needed to be solved with machine learning. As an initial step, an incubator was set up at NSBM Green, with a plan to have 10 students with an interest in research and ML get started on it. After bouncing ideas around a few times among our team and a few other people at Embla, we came up with an initial assignment for the interns: find a way to extract text from any image. The interns were divided into two teams in the hope that they would come up with two different approaches, but in order to actually guide them and follow up on what they were doing, we had to read some ML books and do some research ourselves.

My first approach was to read up on the basics in some e-books on Amazon Kindle. This took me a while, and I realized that I get bored really fast with any sort of reading, especially when it involves a lot of theory and math. Next, I started following MIT Deep Learning, an initiative by MIT to open-source their learning materials and lectures, which I thought was really awesome! I watched a few videos, but the problem was that one lecture was the length of an average movie, and I found it very difficult to sit through them. Having said that, the lectures were very informative and had a lot to offer, and I hope to go through most of them in the future. So I figured that this passive learning approach wasn't my style and took a different one: the top-down approach.

With a top-down approach, the goal is to learn by building significant pieces of real solutions to problems, or by experimenting, without first knowing the theory or how the internals work. Think of it like using a computer: the levels of abstraction and the interfaces provided by a modern OS (operating system) allow us to use a computer without knowing anything about the hardware or how the software works. I followed the same concept. I researched general-purpose machine learning models that were already implemented, trained, and readily available. AWS Machine Learning caught my attention, so I limited my research to that and focused on learning about the AI services they offer. The image below shows a useful, if a bit outdated, overview of their AI/ML services.

Outdated overview

Since my real interest was in image processing, I narrowed my focus further to Amazon Rekognition. Amazon describes it as follows: "Amazon Rekognition is based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images and videos daily, and requires no machine learning expertise to use. Amazon Rekognition is a simple and easy to use API that can quickly analyze any image or video file stored in Amazon S3."

This seemed great, but I wanted to see what it allowed us to do and what was possible with it. Its key features included these:

Image recognition and classification

Image recognition and expressions

You can find more use cases and features on the Rekognition home page and features page. However, the following matched the exact use case I was interested in: text recognition in an image, like the one here.

Image recognition and text capture

This convinced me that the service had exactly what I was looking for and could end up being a great solution for my use case. It also allowed me to narrow my focus further within the Rekognition service. Next, it was a matter of reading the documentation. Since I was most comfortable with JavaScript, I chose the AWS JavaScript SDK; there are several other SDKs available for different languages, so you may use whichever you are comfortable with.
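To give a rough idea of what working with text detection looks like, here is a minimal TypeScript sketch. It is not the code from my PoC. The types are local stand-ins that mirror the fields I believe the Rekognition DetectText response contains (TextDetections, DetectedText, Type, Confidence), while the extractLines helper, the 80% confidence threshold, and the sample data are all made up for illustration.

```typescript
// Local stand-ins mirroring the fields of a Rekognition DetectText response.
// Only the fields used below are modelled.
type DetectionType = "LINE" | "WORD";

interface TextDetection {
  DetectedText?: string;
  Type?: DetectionType;
  Confidence?: number; // percentage, 0 to 100
}

interface DetectTextResponse {
  TextDetections?: TextDetection[];
}

// Hypothetical helper: keep only whole lines of text that the service
// is reasonably confident about. The default threshold is an assumption.
function extractLines(
  response: DetectTextResponse,
  minConfidence: number = 80
): string[] {
  return (response.TextDetections ?? [])
    .filter((d) => d.Type === "LINE" && (d.Confidence ?? 0) >= minConfidence)
    .map((d) => d.DetectedText ?? "")
    .filter((text) => text.length > 0);
}

// With the AWS SDK for JavaScript v3, the response would come from
// something along these lines (not executed here):
//   const client = new RekognitionClient({ region: "us-east-1" });
//   const response = await client.send(
//     new DetectTextCommand({ Image: { Bytes: imageBytes } })
//   );

// Made-up sample data, shaped like a DetectText result.
const sampleResponse: DetectTextResponse = {
  TextDetections: [
    { DetectedText: "HELLO WORLD", Type: "LINE", Confidence: 99.1 },
    { DetectedText: "HELLO", Type: "WORD", Confidence: 99.3 },
    { DetectedText: "WORLD", Type: "WORD", Confidence: 98.9 },
    { DetectedText: "smudge", Type: "LINE", Confidence: 41.0 },
  ],
};

console.log(extractLines(sampleResponse));
```

The service reports both whole lines and the individual words inside them, so filtering on Type avoids returning the same text twice, and the confidence cut-off drops low-quality guesses.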

Next, it was a matter of implementing it, or in my case creating a proof of concept (PoC), to learn how to use it and check whether it actually does what it says! I chose to implement it as a RESTful JSON API using the Serverless Framework, running on AWS Lambda functions with the Node.js runtime (FaaS architecture). I then implemented a React client to consume that API. If you are interested in the implementation details of a similar architecture, you can find out more about a serverless implementation in this article that I published on Medium. The PoC itself is here: https://github.com/DasithKuruppu/MLTextRecognition, along with a working demo of the same implementation. It's a bit scuffed, I know, but it's just a PoC and I haven't put much effort into it. Hopefully it helps someone with similar interests get started on ML :).

Now that I know what ML can do, I'm interested in diving a bit deeper and trying to implement an open-source ML model with features similar to those offered by AWS Rekognition, and I hope to document/blog the whole process if I have the time. Let me know your thoughts, feedback, and comments below; I would really appreciate it!
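For anyone who wants a starting point, the Lambda side of a JSON API like this might look roughly like the TypeScript sketch below. It is not copied from the repository. The request shape (a JSON body with an imageBase64 field), the makeHandler factory, the DetectText function type, and the error messages are all assumptions made for illustration, and the event and result interfaces are trimmed-down stand-ins for the API Gateway proxy shapes.

```typescript
// Trimmed-down stand-ins for the API Gateway proxy event and result.
interface ProxyEvent {
  body: string | null;
}

interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical detector: takes a base64-encoded image and resolves to the
// lines of text found in it. In a real function this would wrap the
// Rekognition call; here it is injected so the handler can run without AWS.
type DetectText = (imageBase64: string) => Promise<string[]>;

function jsonResponse(statusCode: number, payload: unknown): ProxyResult {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Builds the Lambda handler around whichever detector is supplied.
function makeHandler(detectText: DetectText) {
  return async (event: ProxyEvent): Promise<ProxyResult> => {
    const badRequest = jsonResponse(400, {
      error: "Body must be a JSON object with an imageBase64 string",
    });
    if (!event.body) return badRequest;

    let imageBase64: unknown;
    try {
      imageBase64 = JSON.parse(event.body).imageBase64;
    } catch {
      return badRequest;
    }
    if (typeof imageBase64 !== "string" || imageBase64.length === 0) {
      return badRequest;
    }

    try {
      const lines = await detectText(imageBase64);
      return jsonResponse(200, { lines });
    } catch {
      // The upstream detection service failed or rejected the image.
      return jsonResponse(502, { error: "Text detection failed" });
    }
  };
}

// Usage with a stub detector; in a real project the handler would be
// exported and wired to an HTTP event in the Serverless configuration.
const handler = makeHandler(async () => ["HELLO WORLD"]);

handler({ body: JSON.stringify({ imageBase64: "aGVsbG8=" }) }).then((res) =>
  console.log(res.statusCode, res.body)
);
```

Passing the detector in as a parameter keeps the HTTP handling separate from the AWS call, which makes it easy to try the handler locally with a stub before deploying anything.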