To borrow Andrew Ng’s metaphor of a spaceship and the fuel it needs, anyone on a mission to build a first-class AI-powered product needs vast amounts of data to feed the machines. That need is dictated by the sheer number of parameters in the models computers are expected to master. And to tell the truth, obtaining a sufficient set of labeled, structured training samples is often harder than picking and applying an algorithm. At PerceptionBox, we want to share some unexpected data sources with you and explain how you can use machine learning to improve your business.
The more sophisticated the problem, the more parameters a model has and the more data it needs. Being creative about costly big data, however, is the way to go. By now you might be wondering how to find the datasets required to train the machines if you cannot simply feed them big data, which, in the case of scientific or medical records for example, can be prohibitively expensive. Here is what we have to say on the matter.
Although you usually cannot use machine learning without any data at all, there are several general ways ML and AI can power up your business:
- Find the data you need among the many free datasets on the web that suit typical machine learning system development.
There are plenty of places where you can get a dataset useful to your business, whether for training an ML system or for classic data analytics. Check Google Dataset Search, Kaggle, government data portals, or any other source.
- If you need industry-specific information to train your machine learning algorithms, and crumbs of this priceless information are scattered across multiple web pages and websites, you can simply scrape it.
There are plenty of cloud and on-premise tools, such as Octoparse, ParseHub, or Dexi.io, that can help you do it quickly, easily, and automatically.
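If you would rather roll your own, even Python's standard library is enough for a first pass. The sketch below is purely illustrative (the class name, the CSS class `price`, and the sample markup are all made up): it pulls prices out of an inline HTML snippet with `html.parser`, whereas a real crawler would fetch live pages and respect each site's terms of use.

```python
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    """Collects the text of every element whose class attribute is 'price'."""
    def __init__(self):
        super().__init__()
        self._grab = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if ("class", "price") in attrs:
            self._grab = True

    def handle_data(self, data):
        if self._grab:
            self.prices.append(data.strip())
            self._grab = False

# A stand-in for a fetched product page; a real scraper would download this HTML.
page = """
<ul>
  <li><span class="name">Blue dress</span><span class="price">$49</span></li>
  <li><span class="name">Red shoes</span><span class="price">$89</span></li>
</ul>
"""

parser = PriceExtractor()
parser.feed(page)
print(parser.prices)  # ['$49', '$89']
```

For anything beyond a toy page, a dedicated tool from the list above (or a library built for scraping) will handle pagination, JavaScript, and error recovery far more gracefully.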
- Plenty of ready-to-use ML solutions are at your disposal. You can easily enhance your business with high-end technologies such as speech recognition, computer vision, recommendation and matching engines, and more.
- If none of these approaches helps, then deep learning combined with a few special techniques comes into play.
Unpretentious Deep Learning
Depending on the project focus, industry, and detailed requirements, slightly different technologies or algorithms will be preferable, but that choice is up to your machine learning developers. The general recommendation, though, is to use the latest and most advanced deep learning algorithms. Below is a slightly modified slide from the aforementioned Andrew Ng that shows why deep learning has become so popular lately.
Now that we have looked at ways to start a project with limited amounts of data at hand, let us take a deeper dive into alternative methods of working with the data you have:
A good technique to start from is transfer learning. Suppose you have to solve a task that is highly specific to a certain domain, so the datasets the project needs are nowhere to be found. Building new training models from scratch would require additional resources, which is costly.
However, think of the models used in previous projects that share the domain-specific component. Why not employ those in the project you are dealing with now? This method is widely known as transfer learning. Think of the knowledge handover from a person leaving a company to a new associate.
“Transfer learning is an up-and-coming technique that allows us to transfer the knowledge learned in one dataset and apply it to another dataset.”
- Bradley Arsenault
Like a training model, a new hire inherits a foundation from the departing senior associate and builds future knowledge on top of it.
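As a minimal sketch of the idea (the tasks, features, and numbers below are all made up for illustration), warm-starting a simple model from weights learned on a related, data-rich task can be written out in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear(X, y, epochs=200, lr=0.1, w_init=None):
    """Logistic regression via gradient descent; w_init allows warm-starting."""
    w = np.zeros(X.shape[1]) if w_init is None else w_init.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)       # gradient step
    return w

# "Source" task: plenty of labelled data; the rule depends on features 0 and 1.
X_src = rng.normal(size=(500, 5))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(float)
w_src = train_linear(X_src, y_src)

# "Target" task: a related rule, but only 20 labelled examples.
X_tgt = rng.normal(size=(20, 5))
y_tgt = (X_tgt[:, 0] + 0.8 * X_tgt[:, 1] > 0).astype(float)

# Transfer: warm-start from the source weights and fine-tune briefly,
# instead of training from scratch on the tiny target set.
w_transfer = train_linear(X_tgt, y_tgt, epochs=20, w_init=w_src)

# Evaluate on fresh data drawn from the target distribution.
X_eval = rng.normal(size=(1000, 5))
y_eval = (X_eval[:, 0] + 0.8 * X_eval[:, 1] > 0).astype(float)
acc = (((1.0 / (1.0 + np.exp(-X_eval @ w_transfer))) > 0.5) == y_eval).mean()
print(f"target accuracy after transfer: {acc:.2f}")
```

In real deep learning projects the same pattern applies at a larger scale: a network pretrained on a big generic dataset is reused as a feature extractor, and only the last layers are fine-tuned on the small domain-specific set.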
If you are handling uncertainty caused by a small amount of training data, domain expertise and unsupervised learning (clustering) can be the keys. Say you have a hypothesis but are still unsure about the final outcome of the research, so target features are not provided in the training samples. The idea then is to build a classification that can be applied to the raw data. Unsupervised learning starts with unlabeled data; its basic goal is to find the underlying structure of a dataset and group the raw material.
Think programmatic ads: imagine a marketing professional who needs to judge the historical purchasing patterns of a community (or even a nation). An ad engine would cluster those who splurge on shoes on the one hand and those who drop by the organic store every weekend to shop for healthy whole-grain foods on the other. In the end, marketing gurus get better insights and can target their ad campaigns more precisely.
Clustering is the essence here: putting events, people, or things together based on their similarity to each other, with the resulting clusters used as insights to feed the machines.
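To make the shoe-splurgers-versus-organic-shoppers picture concrete, here is a toy two-cluster k-means in plain Python. The spend figures are invented, and the farthest-point initialization is a deliberate simplification of what real libraries such as scikit-learn do (they use k-means++ and multiple restarts):

```python
def nearest(p, centroids):
    """Index of the centroid closest to point p (by squared distance)."""
    return min(range(len(centroids)),
               key=lambda i: (p[0] - centroids[i][0]) ** 2
                             + (p[1] - centroids[i][1]) ** 2)

def kmeans2(points, iters=10):
    """Two-cluster k-means. Simplified init: the first point plus the
    point farthest from it, which is fine for well-separated toy data."""
    far = max(points, key=lambda p: (p[0] - points[0][0]) ** 2
                                    + (p[1] - points[0][1]) ** 2)
    centroids = [points[0], far]
    for _ in range(iters):
        groups = [[], []]
        for p in points:                      # assignment step
            groups[nearest(p, centroids)].append(p)
        centroids = [(sum(q[0] for q in g) / len(g),   # update step
                      sum(q[1] for q in g) / len(g)) for g in groups]
    return [nearest(p, centroids) for p in points]

# Toy purchase profiles: (spend on shoes, spend on organic groceries).
shoppers = [(9.1, 1.0), (8.7, 0.8), (9.5, 1.3),   # shoe splurgers
            (0.9, 8.2), (1.2, 9.0), (0.7, 8.8)]   # organic-store regulars
labels = kmeans2(shoppers)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The two label groups that come out are exactly the audience segments a marketer would then target with different campaigns.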
Finally, you might also look into reinforcement learning. It is about trial and error: efforts leading to self-discovery in an interactive environment, where feedback from prior actions serves as a guide and a map. By learning from experience, a set of carrots and sticks helps a machine refine its approach and decide on the best possible step to take next.
Reinforcement learning is a great step toward a machine learning future in which ML systems are trained not by humans but by other ML systems.
To illustrate the concept, consider a chemistry lab and the way scientists have taken the optimization of chemical reactions to the next level with deep reinforcement learning. The core idea of a study undertaken at Stanford University was to create a model that iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome.
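The carrot-and-stick loop is easiest to see in tabular Q-learning. Below is a self-contained sketch on a made-up five-cell corridor world; nothing here comes from the Stanford study, it only shows the trial-and-error mechanics (act, observe a reward, nudge the value estimate, repeat):

```python
import random

# A 5-cell corridor: start in cell 0, reward of +1 for reaching cell 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]

for _ in range(300):                        # episodes of trial and error
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit what we know, sometimes explore.
        if rng.random() < EPSILON:
            a = rng.randrange(2)
        else:
            a = 1 if Q[s][1] >= Q[s][0] else 0
        nxt, r = step(s, a)
        target = r if nxt == GOAL else r + GAMMA * max(Q[nxt])
        Q[s][a] += ALPHA * (target - Q[s][a])   # nudge toward the target
        s = nxt

policy = [1 if Q[s][1] >= Q[s][0] else 0 for s in range(N_STATES)]
print(policy)  # greedy action per state; 1 means "move right"
```

After a few hundred episodes the greedy policy moves right from every non-goal cell, which is exactly the carrot the environment was dangling; a chemical-reaction optimizer follows the same loop with experimental conditions as actions and reaction yield as the reward.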
Right off the Bat
What if you decide to build on existing training material available from tech giants and data enthusiasts? Let us see what is out there for you.
From Cloud Video Intelligence to the Cloud Translation API, Google Cloud AI offers products your business can be enhanced with. Google has an enormous amount of data to train its ML systems, so if you want to play hard, take a look at Google's machine learning services to bring unmatched scale and speed to your business.
Microsoft Azure offers a variety of AI-powered Cognitive Services that can bring almost magical abilities into your business: complex information and data mapping, speech and text recognition, sentiment evaluation, and more. A pay-as-you-go model lets you try things out and pay only for what you use while you scale your business on the go.
Serving developers of all categories, Amazon has built a comprehensive set of tools for scientists of different levels and project expertise. Among the testimonials you can find praise from such renowned businesses as Vodafone, Expedia, Ryanair, and many more. Apart from its paid products, Amazon offers the AWS Free Tier and, specifically, SageMaker for professionals who want a fully managed platform to create, train, and run machine learning models.
A Taste of Machine Learning
There are many ways to use ready-made machine learning APIs or custom machine learning software in combination with third-party tools or your internal data.
To give you a brief idea of how such tools can improve your business with machine learning, imagine that you own a large women's apparel ecommerce website with a variety of brands and models in stock. One of the biggest problems for every ecommerce company is managing stock correctly to minimize stock balance (reducing both storage costs and lost profits). So one of the main questions confronting every large online store is which brands and models, and in what quantities, to order for the coming seasons. The right choice can significantly lower stock balance and boost revenue!
Now let's imagine you want to harness the full power of machine learning and data science to understand which of your current brands and models your customers like, and which models will be popular in the coming seasons. A quite likely situation, isn't it?
To achieve the best results for your business plan and stock planning, you may gather, mine, or extract the following data:
- Get the complete sales history of all your goods from your CRM or stock management system. We assume that every item in your store is fully described (type, category, model, color, features, etc.).
- Download all customer comments related to specific items from your website.
- Use social media monitoring tools (such as Hootsuite and/or Google Alerts), or go directly to the Twitter API, for opinion mining, gathering all mentions of every apparel brand and model.
- Check the RSS feeds of fashion magazines and popular beauty bloggers and gather all articles related to a specific category or niche.
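Once the sales history is exported, even a simple aggregation already reveals a lot. A minimal sketch with hypothetical rows (the brands, categories, and unit counts below are invented):

```python
from collections import Counter

# Hypothetical rows exported from a CRM or stock management system:
# (brand, category, color, units_sold)
sales = [
    ("Brand A", "dress", "blue", 120),
    ("Brand B", "dress", "red", 80),
    ("Brand A", "shoes", "black", 200),
    ("Brand C", "dress", "blue", 150),
]

units_by_brand = Counter()
units_by_color = Counter()
for brand, category, color, units in sales:
    units_by_brand[brand] += units
    units_by_color[color] += units

print(units_by_brand.most_common(1))  # [('Brand A', 320)]
print(units_by_color.most_common(1))  # [('blue', 270)]
```

These simple totals are exactly the kind of aggregate indicators that later feed clustering and demand-forecasting models.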
Once you have gathered all this data and prepared it for further use, you can apply a few machine learning techniques:
- Use data clustering techniques to find out which types of clothes, features, colors, etc. are more popular and sell better.
- Use Google Cloud Translation to spot trends in different languages and countries (you can use this information later to target your offering better if you operate worldwide).
- Use Amazon Comprehend to examine every mention of a product, model, type (say, a dress), or color, and understand how positive or negative the text of the mention or review is.
- Identify entities in product reviews or social media mentions and link distinct entities to each other by associating the text with additional information on the web using Azure Text Analytics.
- Combine everything you have gathered from the steps above to find out which products, types of clothes, colors, and so on WILL BE more popular with particular segments of your audience.
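Calling a managed service such as Amazon Comprehend requires credentials and a live account, so as a local stand-in, here is a toy lexicon-based sentiment scorer that mimics the POSITIVE/NEGATIVE/NEUTRAL labels such services return. The word lists are, of course, invented and far too small for production use:

```python
# Tiny hand-rolled lexicons -- a stand-in for the trained models behind
# a managed sentiment API.
POSITIVE = {"love", "great", "perfect", "comfortable", "beautiful"}
NEGATIVE = {"hate", "cheap", "tight", "faded", "awful"}

def sentiment(review: str) -> str:
    """Label a review by counting positive and negative words."""
    words = review.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "POSITIVE" if score > 0 else "NEGATIVE" if score < 0 else "NEUTRAL"

print(sentiment("I love this dress, the fit is perfect"))   # POSITIVE
print(sentiment("The color faded after one wash, awful"))   # NEGATIVE
```

A real service adds language detection, negation handling, and confidence scores on top of this basic idea, which is why the managed APIs are worth their price once review volume grows.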
All of the above is possible without enormous amounts of data, though you do need a really skilled machine learning developer to pull it off (finding the data channels, pre-training data preparation such as data cleansing and building aggregate indicators, custom model training, and overall system assembly). But we hope you now understand that you don't need an astonishing amount of data to benefit from the power of ML.
Whether you are new to the industry, just taking baby steps in machine learning, or have already tasted the ML magic that cuts costs, there are alternative sources and techniques to power up your business and make the most of this emerging technology to boost your revenue.