This blog article shows how to predict a value with machine learning in Python, using diamond prices as an example.
Diamonds have long fascinated us with their brilliance, rarity, and timeless allure. Whether you’re a jewelry enthusiast, a gemologist, or simply curious about the world of diamonds, understanding their pricing can be both intriguing and practical. In this short blog, we’ll explore how machine learning can help predict diamond prices based on their carat weight.
The Data
Imagine you have a dataset containing information about various diamonds. Each diamond is described by two key features:
- Carat Weight: The weight of the diamond (measured in carats).
- Price: The price of the diamond (in dollars).
Our goal is to build a model that can predict the price of a diamond based on its carat weight. Let’s dive into the steps!
Step 1: Data Collection
First, gather data. This example uses a CSV file named “diamondsprice.csv” with a “carat” column for carat weight and a “price” column for price. Load this data into a Pandas DataFrame.
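Before modelling, it can help to confirm that the file loaded as expected. The sketch below is only an illustration: it reads a small in-memory stand-in for the CSV file (the rows are made up, not taken from the real dataset) and checks that the "carat" and "price" columns are present.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; these rows are invented for illustration.
csv_text = """carat,price
0.23,326
0.31,335
0.70,2757
1.01,4500
1.50,9800
"""

# pd.read_csv accepts a file path or any file-like object.
diamond_data = pd.read_csv(io.StringIO(csv_text))

# Confirm the columns the model relies on are actually present.
required_columns = {"carat", "price"}
missing_columns = required_columns - set(diamond_data.columns)
if missing_columns:
    raise ValueError(f"Missing columns: {sorted(missing_columns)}")

print(diamond_data.shape)   # (rows, columns)
print(diamond_data.dtypes)  # both columns should be numeric
```

With a real file, you would pass the file path to pd.read_csv instead of the StringIO object.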
Step 2: Data Splitting
Split the data into training and testing sets. We’ll use 80% for training and 20% for testing.
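To see what the 80/20 split does in practice, here is a minimal sketch on ten made-up diamonds (the numbers are illustrative only). Fixing random_state makes the shuffle repeatable between runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten invented diamonds: carat weights (as a column) and prices.
X = np.array([0.2, 0.3, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 2.0]).reshape(-1, 1)
y = np.array([350, 400, 650, 1100, 2400, 3600, 4700, 6200, 9500, 15000])

# test_size=0.2 holds back 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With ten rows, eight end up in the training set and two in the test set.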
Step 3: Train a Linear Regression Model
We’ll use a simple linear regression model. This model assumes a linear relationship between carat weight and price. Initialize the model and fit it to the training data.
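With a single feature, the ordinary least squares line has a simple closed form: the slope is the covariance of carat and price divided by the variance of carat, and the intercept makes the line pass through the means. The sketch below works this out in plain Python on made-up data, purely to show what "fitting" means here. The helper name fit_line is hypothetical and not part of scikit-learn.

```python
def fit_line(carats, prices):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(carats)
    mean_x = sum(carats) / n
    mean_y = sum(prices) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carats, prices))
    sxx = sum((x - mean_x) ** 2 for x in carats)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented, exactly linear data: price = 4000 * carat - 1000.
carats = [0.5, 1.0, 1.5, 2.0]
prices = [1000.0, 3000.0, 5000.0, 7000.0]

slope, intercept = fit_line(carats, prices)
print(f"price = {slope:.2f} * carat + {intercept:.2f}")
```

Real diamond prices will not sit exactly on a line, so the fitted line is the one that minimizes the squared errors rather than one that passes through every point.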
Step 4: Make Predictions
Now comes the exciting part! Use the trained model to predict prices for new diamonds based on their carat values.
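One detail that is easy to trip over: scikit-learn's predict expects a two-dimensional array with one row per diamond and one column per feature. The sketch below trains on made-up data and predicts prices for two new carat values at once.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented, exactly linear training data: price = 4000 * carat - 1000.
X_train = np.array([[0.5], [1.0], [1.5], [2.0]])
y_train = np.array([1000.0, 3000.0, 5000.0, 7000.0])
model = LinearRegression().fit(X_train, y_train)

# reshape(-1, 1) turns a flat list of carats into a single-column 2D array.
new_carats = np.array([0.8, 1.2]).reshape(-1, 1)
predicted = model.predict(new_carats)

for carat, price in zip(new_carats.ravel(), predicted):
    print(f"{carat:.2f} carat -> ${price:,.2f}")
```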
Step 5: Evaluate the Model
Calculate metrics like Mean Squared Error (MSE) and R-squared to assess the model’s performance.
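To make the two metrics concrete, the sketch below computes them by hand on four made-up values. MSE is the average squared difference between actual and predicted prices. R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function names are hypothetical; in practice you would use the scikit-learn functions.

```python
def mse_manual(actual, predicted):
    """Average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_manual(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented values: every prediction is off by exactly 500 dollars.
actual = [1000.0, 3000.0, 5000.0, 7000.0]
predicted = [1500.0, 2500.0, 5500.0, 6500.0]

mse = mse_manual(actual, predicted)
r2 = r2_manual(actual, predicted)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
```

Note that MSE is in squared dollars, so taking its square root gives an error in the same unit as the price.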
Visualizing the Results
To visualize our predictions, let’s create a scatter plot. The x-axis represents carat weight, the y-axis represents price, and the red line shows our regression model’s prediction.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Load the diamond dataset from the CSV file
diamond_data = pd.read_csv("diamondsprice.csv")
# Extract features (carat) and target (price)
X = diamond_data["carat"].values.reshape(-1, 1)
y = diamond_data["price"].values
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
# Plot the regression line
plt.scatter(X_test, y_test, color="blue", label="Actual")
plt.plot(X_test, y_pred, color="red", label="Predicted")
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Diamond Price Prediction")
plt.legend()
plt.show()
# Example prediction for a new diamond with carat=1.5
new_diamond_carat = 1.5
predicted_price = model.predict([[new_diamond_carat]])
print(f"Predicted price for a {new_diamond_carat:.2f} carat diamond: ${predicted_price[0]:,.2f}")

Remember that in a real-world scenario, you’d need to preprocess data, handle missing values, and fine-tune your model. But this example should give you a starting point!
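As one small example of that preprocessing, here is a minimal sketch of dropping rows with missing values before training. The DataFrame is made up for illustration, and dropping rows is only one option; filling in missing values is another.

```python
import numpy as np
import pandas as pd

# Invented data with one missing carat and one missing price.
raw = pd.DataFrame(
    {
        "carat": [0.3, np.nan, 1.0, 1.5],
        "price": [400.0, 900.0, np.nan, 9800.0],
    }
)

# Count missing values per column before deciding how to handle them.
print(raw.isna().sum())

# Keep only rows where both the feature and the target are present.
clean = raw.dropna(subset=["carat", "price"]).reset_index(drop=True)
print(clean)
```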
Source code download: https://github.com/chanmmn/python/tree/main/2024/MachineLearningPredict/?WT.mc_id=DP-MVP-36769
Reference: https://scikit-learn.org/stable/