Car Price Prediction Machine Learning Model

by | Jan 18, 2023 | Coding Projects | 0 comments

Introduction

A car price prediction machine learning model is a type of algorithm that uses historical data on car sales and features to predict the price of a car. The model is trained on a dataset of car information such as make, model, year, mileage, condition, and the corresponding sale price. Once trained, the model can be used to predict the sale price of a car based on its features. Common techniques for creating a car price prediction model include linear regression, decision trees, and random forests.
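As a minimal sketch of the idea, the snippet below fits a linear regression on a small synthetic dataset. The data, the two features (`age_years`, `kms_driven`) and the price formula are all made up for illustration and are not taken from the dataset used later in this article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, made-up data: price falls with age and mileage (illustration only).
rng = np.random.default_rng(0)
age_years = rng.uniform(0, 15, size=200)
kms_driven = rng.uniform(1_000, 150_000, size=200)
price = 20.0 - 0.8 * age_years - 0.00005 * kms_driven + rng.normal(0, 0.5, size=200)

# Train on the historical (feature, price) pairs.
X = np.column_stack([age_years, kms_driven])
model = LinearRegression().fit(X, price)

# Predict the price of a hypothetical 5-year-old car with 40,000 km on it.
predicted = model.predict(np.array([[5.0, 40_000.0]]))[0]
print(round(predicted, 2))
```

The same fit/predict pattern applies to decision trees and random forests; only the estimator class changes.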

 

Objectives

The objectives behind building this car price prediction machine learning model are:

  • To predict the price of a car so that buyers can choose one that fits their needs and their price range.
  • To help businesses in the automobile industry set prices that meet users' expectations and grow their businesses accordingly.
  • To use historical data to train the model and make predictions on new, unseen data.

Requirements

To build a car price prediction model using Python, you will need the following:

  • A dataset of car information: This dataset should include features such as make, model, year, mileage, and condition, as well as the corresponding sale price.
  • Python programming language: You must install Python on your computer to build the model.
  • Required Libraries: You will need to install libraries such as numpy, pandas, scikit-learn, matplotlib, and seaborn. These libraries are used in Python for data manipulation, visualization, and machine learning.
  • Jupyter Notebook/ IDE: You will need a development environment such as Jupyter Notebook or an IDE to write and run the code for the model.
  • Understanding of Machine Learning concepts and Python programming.
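One way to confirm the environment is ready is to try importing each library and print its version. This is only a convenience sketch; the `versions` dictionary is a name introduced here for illustration. Note that scikit-learn is imported under the name `sklearn`. Missing packages can usually be installed with `pip install numpy pandas scikit-learn matplotlib seaborn`.

```python
import importlib

# Import name of each required library (scikit-learn is imported as "sklearn").
required = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]

versions = {}
for name in required:
    try:
        module = importlib.import_module(name)
        versions[name] = getattr(module, "__version__", "unknown")
    except ImportError:
        versions[name] = None  # not installed

for name, version in versions.items():
    print(name, version if version is not None else "is not installed")
```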

Source Code

import seaborn as sns

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

%matplotlib inline

df=pd.read_csv('car data.csv')

df.shape

print(df['Seller_Type'].unique())

print(df['Fuel_Type'].unique())

print(df['Transmission'].unique())

print(df['Owner'].unique())

##check missing values

df.isnull().sum()

df.describe()

final_dataset=df[['Year','Selling_Price','Present_Price','Kms_Driven','Fuel_Type','Seller_Type','Transmission','Owner']].copy() # copy so that adding columns does not act on a view of df

final_dataset.head()

final_dataset['Current Year']=2022

final_dataset.head()

final_dataset['no_year']=final_dataset['Current Year']- final_dataset['Year']

final_dataset.head()

final_dataset.drop(['Year'],axis=1,inplace=True)

final_dataset.head()

final_dataset=final_dataset.drop(['Current Year'],axis=1)

final_dataset.head()

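# one-hot encode the categorical columns so that every feature is numeric

final_dataset=pd.get_dummies(final_dataset,drop_first=True)

final_dataset.corr()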

sns.pairplot(final_dataset)

#get correlations of the numeric features in the dataset

corrmat = df.corr(numeric_only=True)

top_corr_features = corrmat.index

plt.figure(figsize=(20,20))

#plot heat map

g=sns.heatmap(df[top_corr_features].corr(),annot=True,cmap="RdYlGn")

final_dataset.head()

X=final_dataset.iloc[:,1:] # independent feature

y=final_dataset.iloc[:,0] # dependent feature (selling price)

X.head()

y.head()

# feature importance

from sklearn.ensemble import ExtraTreesRegressor

model = ExtraTreesRegressor()

model.fit(X,y)

print(model.feature_importances_) # according to the value this tells us the importance of features

#plot graph to better visualize feature importances

feat_importances = pd.Series(model.feature_importances_, index=X.columns)

feat_importances.nlargest(5).plot(kind='barh')

plt.show()

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

X_train.shape

from sklearn.ensemble import RandomForestRegressor

regressor=RandomForestRegressor()

# hyperparameter candidates for the random forest

# Number of trees in random forest

n_estimators = [int(x) for x in np.linspace(start = 100, stop = 1200, num = 12)]

print(n_estimators)

# Number of features to consider at every split

max_features = [1.0, 'sqrt'] # 1.0 uses all features ('auto' is no longer accepted by recent scikit-learn versions)

# Max number of levels in the tree

max_depth = [int(x) for x in np.linspace(5, 30, num = 6)]

# max_depth.append(None)

# Min number of samples that are required to split a node

min_samples_split = [2, 5, 10, 15, 100]

# Min number of samples that are required at each leaf node

min_samples_leaf = [1, 2, 5, 10]

from sklearn.model_selection import RandomizedSearchCV

#Randomized Search CV

# Create the random grid

random_grid = {'n_estimators': n_estimators,

'max_features': max_features,

'max_depth': max_depth,

'min_samples_split': min_samples_split,

'min_samples_leaf': min_samples_leaf}

print(random_grid)

# Using the random grid to search best hyper-parameters

# First create the base model to tune

rf = RandomForestRegressor()

# Random search of parameters using 5-fold cross validation

# search across 10 different combinations

rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid,scoring='neg_mean_squared_error', n_iter = 10, cv = 5, verbose=2, random_state=42,n_jobs=1)

rf_random.fit(X_train,y_train)

rf_random.best_params_

rf_random.best_score_

predictions=rf_random.predict(X_test)

from sklearn import metrics

print('MAE:', metrics.mean_absolute_error(y_test, predictions))

print('MSE:', metrics.mean_squared_error(y_test, predictions))

print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

import pickle

# open a file where you want to store the model; the with block closes it automatically

with open('random_forest_regression_model.pkl', 'wb') as file:

    # dump the fitted search object (including the best model) to that file

    pickle.dump(rf_random, file)
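A saved model can later be loaded back and used for prediction without retraining. The sketch below shows the round trip on a small stand-in model trained on made-up data; `demo_model.pkl`, `X_demo` and `y_demo` are hypothetical names used only here. In this project the object being saved would be `rf_random`, and the file would be `random_forest_regression_model.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in model trained on made-up data (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 3))
y_demo = X_demo.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Save the model to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back and check that it gives the same predictions.
with open(path, "rb") as f:
    loaded = pickle.load(f)

same = np.allclose(model.predict(X_demo), loaded.predict(X_demo))
print(same)
```

Pickle files should only be loaded from sources you trust, and ideally with the same scikit-learn version that was used to save them.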

Explanation of the Code

1. First, we imported the necessary libraries and loaded the dataset.

2. Then, we checked the dataset for null values with df.isnull().sum(). Any missing values would need to be removed or imputed before training.

3. We kept only the columns that are useful for model building, replaced the Year column with the age of the car (no_year), and dropped the rest.

4. Next, we split the data into training and test sets and trained a Random Forest Regressor, using Randomized Search CV to select the best hyperparameters for the model.

5. We also created plots (a pair plot, a correlation heat map, and a feature importance chart) to get insights from the dataset.

6. Finally, we predicted prices for the test set and evaluated the predictions with MAE, MSE, and RMSE.
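The steps above can be sketched end to end on a small synthetic dataset. Everything in this example is an assumption made for illustration: the `cars` data is randomly generated, the price formula is invented, and the search space is far smaller than the one in the article so that it runs quickly.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Made-up data that loosely mimics the columns used in the article.
rng = np.random.default_rng(42)
n = 120
cars = pd.DataFrame({
    "Present_Price": rng.uniform(3, 20, n),
    "Kms_Driven": rng.integers(1_000, 100_000, n),
    "no_year": rng.integers(0, 15, n),
    "Fuel_Type": rng.choice(["Petrol", "Diesel"], n),
})
cars["Selling_Price"] = (
    cars["Present_Price"] * (0.9 - 0.04 * cars["no_year"]) + rng.normal(0, 0.3, n)
)

# One-hot encode the categorical column, then split features and target.
features = pd.get_dummies(cars.drop(columns="Selling_Price"), drop_first=True, dtype=int)
target = cars["Selling_Price"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=0
)

# Small randomized search over a few hyperparameters with 3-fold cross validation.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 2, 5],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_grid,
    n_iter=4,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test set.
predictions = search.predict(X_test)
mae = metrics.mean_absolute_error(y_test, predictions)
rmse = np.sqrt(metrics.mean_squared_error(y_test, predictions))
r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_)
print("MAE:", mae, "RMSE:", rmse, "R2:", r2)
```

RMSE is always at least as large as MAE, so a large gap between the two suggests that a few predictions are far off.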

Output

[Image: output of the car price prediction machine learning model]

Conclusion

We have successfully built the car price prediction machine learning model. It predicts the price of a car from the features in our dataset, which can help individuals select the car best suited to their needs and budget. It can also help businesses in the automobile industry price their cars and grow accordingly.

 
