5 Ways to Blow Machine Learning Sales

If you think machine learning is a panacea for every business challenge and sell it as such, you’re doing it wrong. The best way to jeopardize your business is to go all in on machine learning by following the five tips below.

Sequence Labeling on a Structured Data

Sequence labeling is one of the classic ML tasks. It covers well-studied problems such as Part-of-Speech (POS) tagging, Named Entity Recognition (NER), address parsing, and more. Here I want to discuss two related topics: tokenization, and satisfying the constraints imposed by the structure of the input document.
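
To make the link between tokenization and labeling concrete, here is a minimal sketch using address parsing as the example. It is an illustration only and is not taken from the post itself: the regex tokenizer, the BIO tagging scheme, the helper names `tokenize` and `bio_labels`, and the sample address with its spans are all assumptions.

```python
import re


def tokenize(text):
    """Split text into (token, start, end) triples: runs of word characters
    or single punctuation marks, with their character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_labels(tokens, spans):
    """Map character-level (start, end, label) spans onto tokens as BIO tags.

    A token fully inside a span gets B-<label> if it starts the span and
    I-<label> otherwise; tokens outside every span get "O".
    """
    labels = []
    for _, start, end in tokens:
        tag = "O"
        for span_start, span_end, name in spans:
            if start >= span_start and end <= span_end:
                tag = ("B-" if start == span_start else "I-") + name
                break
        labels.append(tag)
    return labels


# Hypothetical example: an address with hand-annotated character spans.
text = "12 Main St, Springfield"
spans = [(0, 2, "NUMBER"), (3, 10, "STREET"), (12, 23, "CITY")]

tokens = tokenize(text)
labels = bio_labels(tokens, spans)
for (token, _, _), label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

The choice of tokenizer matters because labels attach to tokens: if a token straddles a span boundary, no clean tag can be assigned to it, which is one reason tokenization and labeling are usually designed together.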

Building Datasets, Part 1

There was a time when working with big data was not technically possible because our compute resources couldn’t handle the amount of information involved. Beyond that, it took a while for use cases to develop around massive computing resources, so it wasn’t even considered a worthy pursuit. Fifteen years ago, I remember creating machine-learning algorithms using only a handful of data points and then tweaking feature representations for weeks. Back then, it was quite challenging to process the 20 Newsgroups dataset and its 19 thousand news items.

Even as recently as five years ago, the situation hadn’t improved much. At that time, I worked on putting a learning system with a continuous feedback loop into production. To fit the budget, we could only train the Random Forest with 5,000 examples – only a few days of data. Using such a small dataset alone would not have produced the desired results, so we had to implement many tricks to keep ‘some’ past data alongside the continuous feed of new data to keep everything running smoothly.
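
The post does not say which tricks were used. As one possible illustration of keeping some past data under a fixed training budget, the sketch below combines a sliding window of recent examples with a reservoir sample of older ones. Everything here is an assumption made for the example: the `Reservoir` class, the 4,000/1,000 split of the 5,000-example budget, and the integer stream standing in for real training examples.

```python
import random
from collections import deque


class Reservoir:
    """Fixed-size uniform random sample of a stream (reservoir sampling)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep the new example with probability capacity / seen,
            # replacing a uniformly chosen stored one.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example


# Hypothetical budget split: 4,000 recent examples plus 1,000 sampled old ones.
recent = deque(maxlen=4000)
past = Reservoir(capacity=1000)

for example in range(10_000):  # stand-in for the continuous feed
    if len(recent) == recent.maxlen:
        # The oldest recent example is about to be evicted: offer it to the reservoir.
        past.add(recent[0])
    recent.append(example)

training_set = past.items + list(recent)
print(len(training_set))
```

With this layout the training set never exceeds the budget, the newest data is always fully represented, and every older example has had the same chance of being retained.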

Cost of Machine Learning

Exactly how much does deep learning cost? And are those prices fixed, or can they be optimized? Let me compare some cloud hardware and get down to dollars and cents to uncover some answers.
