Unlabeled Data

What Does Unlabeled Data Mean?

Unlabeled data refers to pieces of data that have not been tagged with labels identifying their characteristics, properties or classifications. Unlabeled data is typically used in machine learning, most notably in unsupervised learning.

Techopedia Explains Unlabeled Data

In unsupervised machine learning, the machine learning program operates by evaluating sets of unlabeled data. Because the data does not have labels, the program has to distinguish each piece of data based on its properties and characteristics.

One of the best ways to explain this is by using the fruit bowl metaphor. Suppose the machine learning program is learning to identify three different kinds of fruit: bananas, grapes and apples. If the data in the initial training set is labeled, the machine learning program learns from those labels and then matches each successive image to one of the three categories.
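The labeled case can be illustrated with a minimal sketch. It assumes each fruit image has already been reduced to two made-up numeric features, hue and elongation, each scaled from 0 to 1. The feature values, the function names fit_centroids and predict, and the choice of a nearest-centroid classifier are all assumptions made for illustration; real programs typically use richer features and models.

```python
from collections import defaultdict
from math import dist

# Hypothetical features, each scaled to [0, 1]: (hue, elongation).
# Every example carries a label, so this is labeled data.
labeled_fruit = [
    ((0.16, 0.90), "banana"),
    ((0.18, 0.85), "banana"),
    ((0.01, 0.10), "apple"),
    ((0.03, 0.15), "apple"),
    ((0.78, 0.20), "grapes"),
    ((0.80, 0.25), "grapes"),
]

def fit_centroids(examples):
    """Average the feature vectors that share a label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(axis) / len(axis) for axis in zip(*points))
        for label, points in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the new item."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = fit_centroids(labeled_fruit)
prediction = predict(centroids, (0.17, 0.88))  # a long, yellow item
print(prediction)  # prints "banana"
```

Because the labels are supplied up front, the program only has to decide which known category a new item most resembles.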

If, however, none of the data pieces are labeled with the three fruit names, the machine learning program will need to work by evaluating each image and looking at characteristics such as color (yellow, red or purple), shape (long and thin, round or clustered) and other features, then grouping together the images that share similar characteristics.
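The unlabeled case can be sketched in a similar way. The example below assumes each image has been reduced to two made-up features, hue and elongation, and groups the items with a plain k-means loop. The feature values, the function name k_means and the choice of k-means itself are assumptions for illustration; it is one of several clustering techniques that could be used here.

```python
from math import dist

# Hypothetical features, each scaled to [0, 1]: (hue, elongation).
# No fruit names are attached, so this is unlabeled data.
unlabeled_fruit = [
    (0.16, 0.90), (0.01, 0.10), (0.78, 0.20),
    (0.18, 0.85), (0.03, 0.15), (0.80, 0.25),
]

def k_means(points, k, iterations=10):
    """Group points into k clusters using plain k-means."""
    centroids = list(points[:k])  # deterministic start: the first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda i: dist(p, centroids[i])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignments

clusters = k_means(unlabeled_fruit, k=3)
print(clusters)  # prints [0, 1, 2, 0, 1, 2]
```

The output contains only cluster numbers. The program discovers that the items fall into three groups, but without labels it has no way to know that those groups are called bananas, apples and grapes.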

From this example, it is easy to see how labeled data makes it much easier for machine learning algorithms to arrive at decisions. However, sophisticated unsupervised machine learning programs working with unlabeled data can also produce highly accurate results.