Predict a Value with Python Machine Learning

This blog article shows you how to predict a value with machine learning using Python. I use diamond prices as an example.

Diamonds have long fascinated us with their brilliance, rarity, and timeless allure. Whether you’re a jewelry enthusiast, a gemologist, or simply curious about the world of diamonds, understanding their pricing can be both intriguing and practical. In this short blog, we’ll explore how machine learning can help predict diamond prices based on their carat weight.

The Data

Imagine you have a dataset containing information about various diamonds. Each diamond is described by two columns, one input feature and one target value to predict:

Carat Weight: The weight of the diamond (measured in carats).
Price: The price of the diamond (in dollars).

Our goal is to build a model that can predict the price of a diamond based on its carat weight. Let’s dive into the steps!

Step 1: Data Collection

First, gather data. You might have a CSV file named “diamondsprice.csv” (the name used in the code below) with a carat column for carat weight and a price column for price. Load this data into a Pandas DataFrame.
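As a minimal sketch of this step, the snippet below reads a tiny inline CSV through io.StringIO so that it runs without any file on disk. The three rows are made-up illustration values, not real diamond data, and the column check is an optional extra. With a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real CSV file (illustration only).
sample_csv = io.StringIO(
    "carat,price\n"
    "0.30,500\n"
    "0.70,2500\n"
    "1.00,5000\n"
)

diamond_data = pd.read_csv(sample_csv)

# Confirm the expected columns are present before modeling.
required_columns = {"carat", "price"}
missing = required_columns - set(diamond_data.columns)
if missing:
    raise ValueError(f"CSV is missing columns: {sorted(missing)}")

print(diamond_data.head())
print(diamond_data.dtypes)
```

Checking the column names early gives a clearer error than a KeyError later in the script.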

Step 2: Data Splitting

Split the data into training and testing sets. We’ll use 80% for training and 20% for testing.
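To see what an 80/20 split does to the array shapes, here is a small sketch on ten made-up samples. The values are placeholders chosen only to make the shapes easy to read.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up carat values and prices, for illustration only.
X = np.arange(1, 11, dtype=float).reshape(-1, 1) / 10  # shape (10, 1)
y = np.arange(1, 11, dtype=float) * 1000               # shape (10,)

# test_size=0.2 holds out 20% of the rows; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
```

Fixing random_state means the same rows land in the test set on every run, which makes results comparable between experiments.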

Step 3: Train a Linear Regression Model

We’ll use a simple linear regression model. This model assumes a linear relationship between carat weight and price. Initialize the model and fit it to the training data.
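A fitted linear regression is just a slope and an intercept. The sketch below fits a model to made-up points that lie exactly on the line price = 4000 * carat - 500 (an assumed relationship for illustration, not a real pricing rule) and reads the learned line back from the model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points on the line price = 4000 * carat - 500 (illustration only).
carat = np.array([0.5, 1.0, 1.5, 2.0]).reshape(-1, 1)
price = 4000 * carat.ravel() - 500

model = LinearRegression()
model.fit(carat, price)

# coef_ holds one slope per feature; intercept_ is the constant term.
slope = model.coef_[0]
intercept = model.intercept_
print(f"price = {slope:.2f} * carat + {intercept:.2f}")
```

On real data the points will not sit exactly on a line, so the slope and intercept are the best least-squares fit rather than an exact recovery.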

Step 4: Make Predictions

Now comes the exciting part! Use the trained model to predict prices for new diamonds based on their carat values.
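One detail that often trips people up is the input shape: predict() expects a 2D array with one row per diamond and one column per feature. The sketch below uses the same made-up line as an assumption (price = 4000 * carat - 500) and predicts prices for two new carat values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows price = 4000 * carat - 500 exactly.
carat_train = np.array([[0.5], [1.0], [1.5], [2.0]])
price_train = np.array([1500.0, 3500.0, 5500.0, 7500.0])

model = LinearRegression().fit(carat_train, price_train)

# predict() expects shape (n_samples, n_features),
# so a flat list of carat values is reshaped into one column.
new_carats = np.array([0.75, 1.25]).reshape(-1, 1)
predicted_prices = model.predict(new_carats)

for carat, price in zip(new_carats.ravel(), predicted_prices):
    print(f"{carat:.2f} carat -> ${price:,.2f}")
```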

Step 5: Evaluate the Model

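Calculate metrics like Mean Squared Error (MSE) and R-squared to assess the model’s performance on the test set. A lower MSE means smaller prediction errors, and an R-squared closer to 1 means carat weight explains more of the variation in price.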
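To make the two metrics less of a black box, this sketch computes them by hand with NumPy and compares the results with scikit-learn. The four true and predicted values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions (illustration only).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 9.0])

# MSE: the average of the squared prediction errors.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R-squared: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))  # both 0.5
print(r2_manual, r2_score(y_true, y_hat))             # both 0.9
```

Because MSE is in squared price units, taking its square root gives an error figure in dollars, which is often easier to interpret.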

Visualizing the Results

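To visualize our predictions, let’s create a scatter plot. The x-axis represents carat weight, the y-axis represents price, the blue points are the actual test-set prices, and the red line shows our regression model’s prediction. The complete script below covers all five steps and then draws the plot.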

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load the diamond dataset from the CSV file
diamond_data = pd.read_csv("diamondsprice.csv")

# Extract features (carat) and target (price)
X = diamond_data["carat"].values.reshape(-1, 1)
y = diamond_data["price"].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")

# Plot the regression line
plt.scatter(X_test, y_test, color="blue", label="Actual")
plt.plot(X_test, y_pred, color="red", label="Predicted")
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamond Price Prediction")
plt.legend()
plt.show()

# Example prediction for a new diamond with carat=1.5
new_diamond_carat = 1.5
predicted_price = model.predict([[new_diamond_carat]])
print(f"Predicted price for a {new_diamond_carat:.2f} carat diamond: ${predicted_price[0]:,.2f}")

Remember that in a real-world scenario, you’d need to preprocess data, handle missing values, and fine-tune your model. But this example should give you a starting point!
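As one possible starting point for that preprocessing, the sketch below drops rows with missing values and filters out rows with non-positive carat or price. The sample rows and the filtering rule are assumptions for illustration. Real data may call for imputing values or other cleaning instead.

```python
import numpy as np
import pandas as pd

# Made-up rows with typical problems: missing values and an invalid carat.
raw = pd.DataFrame(
    {
        "carat": [0.5, np.nan, 1.2, -1.0, 2.0],
        "price": [1500.0, 3000.0, np.nan, 2000.0, 7500.0],
    }
)

# Drop rows where either column is missing.
clean = raw.dropna(subset=["carat", "price"])

# Keep only physically sensible rows (positive weight and price).
clean = clean[(clean["carat"] > 0) & (clean["price"] > 0)]
clean = clean.reset_index(drop=True)

print(f"Kept {len(clean)} of {len(raw)} rows")
print(clean)
```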

Source code download: https://github.com/chanmmn/python/tree/main/2024/MachineLearningPredict/?WT.mc_id=DP-MVP-36769

Reference: https://scikit-learn.org/stable/

About chanmingman

Since March 2011, when Microsoft Live Spaces migrated to WordPress (http://www.pcworld.com/article/206455/Microsoft_Live_Spaces_Moves_to_WordPress_An_FAQ.html), this blog has had over 1 million viewers. More than 50% of the blog is about how to resolve error messages, especially for Microsoft products. The blog also has a lot of guidance on how to get started with certain Microsoft technologies. The blog also serves as a memory aid for me. The blog is never meant to give people consulting services or silver bullet solutions. It is a contribution to the community. Thanks for your support over the years. Ming Man has been a Microsoft MVP since 2006. He is a software development manager for a multinational company. With 25 years of experience in the IT field, he has developed systems using Clipper, COBOL, VB5, VB6, VB.NET, Java and C#. He has been using Visual Studio (.NET) since the Beta back in 2000. He and the team have developed many projects using the .NET platform, such as SCM and HR-based applications. He is familiar with the N-Tier design of business applications and is also an expert with database experience in MS SQL, Oracle and AS/400.
