Monte Carlo Tree Search (MCTS) is an important algorithm behind many major successes of recent AI applications, such as AlphaGo’s landmark victory over Lee Sedol in 2016.
In this post, we first cover uninformed search, in which we simply traverse the whole search space to find the optimum; this includes depth-first search and breadth-first search. We then describe how the MCTS algorithm works, and finally apply it to a toy problem: finding the most rewarding leaf node of a binary tree.
Uninformed search, as its name suggests, is a family of generic search algorithms which…
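As a sketch of the two uninformed traversals mentioned above (the tree representation here is illustrative):

```python
from collections import deque

def bfs(root, children):
    """Breadth-first traversal: visit nodes level by level via a FIFO queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order

def dfs(root, children):
    """Depth-first traversal: follow one branch to the bottom first via a LIFO stack."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Reverse so the leftmost child is popped (visited) first.
        stack.extend(reversed(children.get(node, [])))
    return order

# A small binary tree: node 1 has children 2 and 3; node 2 has children 4 and 5.
tree = {1: [2, 3], 2: [4, 5]}
```

The only difference between the two is the order in which the frontier is consumed: a queue gives level-by-level traversal, a stack gives branch-first traversal.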
Regardless of the modelling choices, recommender systems ultimately output a ranked list of items. It is therefore important to evaluate ranking quality directly, rather than relying on proxy metrics such as mean squared error.
In recommender settings, the hit ratio is simply the fraction of users for which the correct answer is included in the recommendation list of length L.
As one can see, the larger L is, the higher the hit ratio becomes, because there is a higher chance that the correct answer is included in the recommendation list. …
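A minimal sketch of hit ratio at L, with made-up users and items:

```python
def hit_ratio_at_L(recommendations, true_items, L):
    """Fraction of users whose held-out item appears in their top-L list."""
    hits = sum(1 for user, true_item in true_items.items()
               if true_item in recommendations[user][:L])
    return hits / len(true_items)

# Illustrative data: each user's ranked recommendations and held-out true item.
recs = {"alice": ["a", "b", "c"], "bob": ["c", "d", "e"]}
truth = {"alice": "b", "bob": "e"}
```

With this data, the hit ratio rises from 0.5 at L = 2 to 1.0 at L = 3, illustrating the monotonic dependence on L described above.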
Cross entropy loss is commonly used in classification tasks both in traditional ML and deep learning.
Note: logit here refers to the unnormalized output of a neural network, as in the Google ML glossary. Admittedly, though, this term is overloaded, as discussed in this post.
In this figure, the raw unnormalized output of a neural network is converted into probabilities by a softmax function.
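A minimal sketch of the softmax and the cross entropy loss on top of it (the logit values are illustrative):

```python
import math

def softmax(logits):
    """Convert raw network outputs (logits) into probabilities that sum to 1."""
    # Subtract the max for numerical stability; it does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Negative log probability assigned to the true class."""
    return -math.log(probs[true_index])

p = softmax([2.0, 1.0, 0.1])  # largest logit gets the largest probability
loss = cross_entropy(p, 0)
```

The loss is small when the model puts high probability on the true class and grows without bound as that probability approaches zero.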
The goal of a recommender system (RS) is to match items to users. The starting point is a user-item matrix filled with values representing either explicit feedback (user-provided ratings) or implicit feedback (count of clicks, number of visits, watch time, etc.). One can also pose this as a matrix-filling problem: given the known entries in the user-item matrix, how do we fill in the missing entries under all sorts of constraints?
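As a toy illustration with made-up numbers, the matrix-filling view looks like this (the column-mean fill at the end is deliberately naive):

```python
import numpy as np

# Rows are users, columns are items; NaN marks an unobserved entry.
# Values could be explicit ratings (1-5) or implicit counts -- made up here.
R = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [4.0, 2.0, np.nan, 1.0],
    [np.nan, np.nan, 4.0, 5.0],
])

observed = ~np.isnan(R)  # the known entries

# A deliberately naive "fill": replace each missing entry with its item's mean.
# Real recommenders fill the matrix under stronger constraints (e.g. low rank).
col_means = np.nanmean(R, axis=0)
R_filled = np.where(np.isnan(R), col_means, R)
```

The constraints (low rank, similarity between users or items, etc.) are what distinguish one recommendation method from another; the problem framing stays the same.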
Given the recent, much-hyped progress of deep-learning-based approaches in AI research, especially in NLP and CV, it is worth going back to the fundamentals. Many fundamental methods, such as neighbourhood-based methods, remain surprisingly effective in many areas, even today. We take a brief look at the nearest neighbour method and its applications in density estimation and recommender systems. User-based and item-based collaborative filtering are discussed. In the end, the cold start and data sparsity issues are also touched upon.
Let’s start with the problem of density estimation. In plain words, density estimation is the construction…
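As a rough illustration of the nearest-neighbour flavour of density estimation, here is a 1-D sketch with made-up data: the density at a point x is approximated as k divided by n times the volume of the smallest neighbourhood containing the k nearest samples.

```python
def knn_density_1d(x, samples, k):
    """Estimate density at x as k / (n * V), where V is the length of the
    smallest interval around x containing its k nearest samples."""
    n = len(samples)
    dists = sorted(abs(s - x) for s in samples)
    radius = dists[k - 1]          # distance to the k-th nearest sample
    volume = 2 * radius            # in 1-D, the "volume" is the interval length
    return k / (n * volume)

# Made-up data: a cluster near 0.2 and two stray points near 1.
data = [0.1, 0.2, 0.25, 0.9, 1.1]
```

The estimate is high where samples are packed densely (the neighbourhood is small) and low where they are spread out.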
Immutability is highly encouraged in Scala applications. A first class in Scala often starts with the difference between val and var. In the naive example where the variable points to a primitive data type like an Integer, it all seems obvious, but what if the variable points to a more complex data type, such as a class with a mixture of var and val members?
Let’s dive in with a small example walk-through.
val is conveniently called “value”, implying its immutability, whereas var is called “variable”, implying its mutability. …
What are we really talking about when we talk about AI?
The term “Artificial Intelligence” was coined in the years after World War II, at the 1956 Dartmouth Conference. John McCarthy persuaded the other attendees (among them Marvin Minsky, Claude Shannon, etc.) to accept AI as the name for this promising field.
(Anecdotes: Claude Shannon again? He is just everywhere. And Marvin Minsky? Isn’t he the villain who discouraged neural network research and contributed to the AI winter of the 1970s?)
Compared to physics and other fields of science and engineering, AI is one of the newest fields, as Russell & Norvig claim in their…
Let’s start by outlining some of the main benefits of using an ML pipeline.
The last point might not be immediately obvious to some. Imagine that one day your model needs to train on a large amount of data. You may have no choice but to use a distributed framework like Spark. Using an ML pipeline will make migrating to Spark a breeze. One…
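To make the idea concrete, here is a deliberately minimal, hand-rolled pipeline sketch; real libraries such as scikit-learn’s Pipeline or Spark ML’s Pipeline offer fitting, parameter search, persistence, and much more.

```python
class Pipeline:
    """Chain processing steps behind a single interface: swapping a step,
    or swapping the whole backend, leaves the calling code unchanged."""
    def __init__(self, steps):
        self.steps = steps

    def run(self, data):
        for step in self.steps:
            data = step(data)
        return data

# Illustrative steps: min-max scale to [0, 1], then round.
def scale(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def round2(xs):
    return [round(x, 2) for x in xs]

pipe = Pipeline([scale, round2])
```

Because every step exposes the same interface, replacing these in-memory functions with distributed Spark stages changes the steps, not the structure of the program.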
LSH (locality-sensitive hashing) is a technique for finding similar documents in a large corpus. Here we use “documents” in a broad sense, to mean any data that can be represented as a set, such as a shopping basket containing a set of grocery items.
LSH functions group similar items into the same buckets. Before we can do that, we first need to define what “similar” means, which is why each family of LSH functions is associated with a particular distance measure.
Loosely speaking, for a given…
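For example, for Jaccard similarity on sets, the classic LSH building block is MinHash. A toy sketch (the parameters and baskets are illustrative, and Python’s built-in hash is randomized per process, so a real implementation would use fixed hash functions):

```python
import random

def minhash_signature(items, num_hashes=50, seed=0):
    """Summarize a set by its minimum hash under several salted hash functions."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    # For each salt, keep the minimum hash over the set's elements.
    return [min(hash((salt, x)) for x in items) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

basket_a = {"milk", "bread", "eggs", "butter"}
basket_b = {"milk", "bread", "eggs", "jam"}  # true Jaccard = 3/5 = 0.6
```

The key property is that the probability of two signatures matching at any position equals the Jaccard similarity of the underlying sets, so short signatures stand in for large sets.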
A Bloom filter is a probabilistic data structure designed to tell you whether an element is in a set, in a highly memory- and time-efficient manner. In that sense it is similar to a set, but with an adjustable false positive rate.
A brilliant example is provided in this blog post.
A Bloom filter is an especially good fit for large sets, where it is hard to fit the whole set into main memory as-is, and where a certain rate of false positives is acceptable.
One typical example is account name selection when creating your Gmail…
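A minimal sketch of the idea in Python (the sizes and hashing scheme are illustrative; real implementations derive the number of bits and hash functions from the desired false positive rate):

```python
import hashlib

class BloomFilter:
    """Bit array plus k hash functions: may yield false positives,
    but never false negatives."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
```

Note the asymmetry: a negative answer is always correct, while a positive answer is only probably correct, which is exactly what makes the structure so compact.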
Machine Learning & Software Engineer in Amsterdam, Holland