Hands-On Predictive Analytics with Python
上QQ阅读APP看书,第一时间看更新

Methodology

For the problem we have defined, the target is the price of the diamond, and our features will be the nine remaining columns in our dataset: carat, cut, color, clarity, x, y, z, depth, and table.

Since we are talking about prices, the type of variable we want to predict is a continuous variable; it can take (in principle) any numeric value within a range. (Of course, we are talking about a practical definition of continuity, not a strictly mathematical definition.) Since we are predicting a continuous variable, we are trying to solve a regression problem; in predictive analytics, when the target is a numerical variable, we are within a category of problems known as regression tasks.

This is a whole category of problems and we will talk about them in Chapter 4, Predicting Numerical Values with Machine Learning. Perhaps you are already familiar with the term linear regression, which is very popular in statistics; however, these terms should not be confused, as the latter refers to a specific statistical technique and the former to a whole category of machine learning problems.

For now, it will be enough to say that the methodology will consist mainly of the following: building a regression model with the price of the diamond as a target.