Linear Regression – Problem Statements


(1) Marketing Promotion: TV vs. Radio vs. Social Media

  • Each row corresponds to an independent marketing promotion where the business uses TV, social media, radio, and influencer promotions to increase sales.
  • The features in the data are:
    • TV promotional budget (in “Low,” “Medium,” and “High” categories)
    • Social media promotional budget (in millions of dollars)
    • Radio promotional budget (in millions of dollars)
    • Sales (in millions of dollars)
    • Influencer size (in “Mega,” “Macro,” “Micro,” and “Nano” categories)
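A linear regression needs numeric inputs, so the TV budget and influencer categories have to be encoded first. The sketch below is one possible approach, not part of the dataset description. The column names `TV`, `Social Media`, `Radio`, and `Influencer` are hypothetical, and ordering influencer size from Nano to Mega is an assumption.

```python
# Hypothetical column names; rename them to match the actual CSV header.
TV_LEVELS = {"Low": 0, "Medium": 1, "High": 2}
# Assumed ordering of influencer size, smallest audience first.
INFLUENCER_LEVELS = {"Nano": 0, "Micro": 1, "Macro": 2, "Mega": 3}


def encode_row(row: dict) -> list:
    """Convert one promotion record into a numeric feature vector."""
    return [
        float(TV_LEVELS[row["TV"]]),                  # ordinal TV budget
        float(row["Social Media"]),                   # millions of dollars
        float(row["Radio"]),                          # millions of dollars
        float(INFLUENCER_LEVELS[row["Influencer"]]),  # ordinal influencer size
    ]


if __name__ == "__main__":
    sample = {"TV": "High", "Social Media": 2.5, "Radio": 1.2, "Influencer": "Micro"}
    print(encode_row(sample))  # [2.0, 2.5, 1.2, 1.0]
```

Ordinal codes assume equal spacing between levels. If that seems doubtful, one-hot encoding each category is the usual alternative. Sales remains the target variable.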

(2) USA Optimal Product Price Prediction Dataset

  • This dataset contains product prices from Amazon USA, with a focus on price prediction.
  • With enough data on which price points sell the most, you can train machine learning models to predict the optimal price for a product based on its features and product name.
  • Your objective is to create a prediction model that will assist sellers in pricing their products within the optimal price range to generate the most sales.
  • The dataset includes various data points, such as the number of reviews, ratings, best seller status, and items sold last month.
  • You can choose specific criteria (e.g., treating a product with over 100 reviews as optimally priced) and then divide the dataset into optimally and suboptimally priced products.
  • By using techniques such as vectorizing product names and features, you can train a model to suggest the optimal price for a product, which sellers and businesses may find valuable.
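The labeling and vectorizing steps above could look roughly like this in plain Python. The `name` and `reviews` keys and the helper names are hypothetical, the 100-review cutoff is the example criterion from the text, and a bag-of-words count is only one simple way to vectorize product names.

```python
import re
from collections import Counter


def label_optimal(products, min_reviews=100):
    """Copy each product dict, adding optimal=True when it has more than min_reviews reviews."""
    return [dict(p, optimal=p["reviews"] > min_reviews) for p in products]


def tokenize(name):
    """Lower-case a product name and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", name.lower())


def build_vocabulary(names):
    """Sorted list of every distinct word that appears in the product names."""
    return sorted({word for name in names for word in tokenize(name)})


def vectorize(name, vocabulary):
    """Bag-of-words vector: the count of each vocabulary word in `name`."""
    counts = Counter(tokenize(name))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    products = [
        {"name": "USB-C Charging Cable", "reviews": 250},  # made-up example rows
        {"name": "Wireless Mouse", "reviews": 40},
    ]
    labeled = label_optimal(products)
    vocab = build_vocabulary(p["name"] for p in products)
    for p in labeled:
        print(p["optimal"], vectorize(p["name"], vocab))
```

The resulting count vectors can be combined with numeric fields such as ratings before training a regression model on price.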

(3) Experience Salary Dataset

  • This dataset contains information on the relationship between work experience (in months) and corresponding monthly salaries (in thousands of dollars) of employees across various industries.
  • It is designed to help data enthusiasts and aspiring data scientists practice linear regression techniques by analyzing and modelling salary predictions based on experience.
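For a single feature, the least-squares line has a closed form: the slope is the sum of cross-deviations of experience and salary divided by the sum of squared deviations of experience, and the intercept makes the line pass through the two means. A minimal sketch, assuming the two columns have already been loaded into plain lists (months of experience, monthly salary in thousands of dollars); the function name is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope
```

Calling `fit_simple_linear_regression(months, salaries)` returns `(intercept, slope)`, where the slope is the estimated monthly salary increase (in thousands of dollars) per additional month of experience.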

(4) Melody Metrics: Decoding Song Popularity

  • Dive deep into the rhythm of data with “Melody Metrics”, a curated dataset designed to unravel the mystique behind song popularity.
  • Harness the power of machine learning to explore the harmonic interplay between various song attributes and their influence on a track’s success.
  • Whether you’re a data maestro or just starting your symphony in data science, this dataset offers a melodious challenge to fine-tune your skills.
  • train.csv: This file sets the stage, comprising 80% of the total data. Each row echoes a unique song, resonating with features like song length, tempo, genre, and more, culminating in the ‘Popularity’ score—a measure of the song’s success, waiting to be predicted.
  • test.csv: The crescendo of your model’s performance! This file houses the remaining 20% of the data. While it shares the same features as the training set, the ‘Popularity’ score remains a mystery, urging you to predict and harmonize with the hidden patterns.
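Because test.csv does not include the Popularity score, a model cannot be scored on it locally. One common workaround, assumed here rather than prescribed by the dataset, is to hold back part of train.csv as a validation set. The helper names are hypothetical.

```python
import csv
import random


def load_rows(path):
    """Read a CSV file such as train.csv into a list of dicts, one per song."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def holdout_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train_rows, validation_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

For example, `train_rows, val_rows = holdout_split(load_rows("train.csv"))`. Note that `csv.DictReader` returns every value as a string, so numeric features and Popularity need converting with `float` before fitting.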

(5) Used Car Price Prediction

  • This dataset contains information about used cars.
  • This data can be used for many purposes, such as price prediction, which makes it a good example of linear regression in machine learning.
  • The columns in the given dataset are as follows:
    • name
    • year
    • selling_price
    • km_driven
    • fuel
    • seller_type
    • transmission
    • Owner
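A multiple linear regression can be fit from the normal equations, (X^T X) w = X^T y. The sketch below solves them with plain Gaussian elimination so it has no dependencies. It assumes the numeric columns year and km_driven have already been pulled into feature lists, with selling_price as the target; text columns such as fuel or transmission would need encoding first. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_linear_regression(X, y):
    """Return [intercept, w1, w2, ...] minimizing squared error, via (X^T X) w = X^T y."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(weights, x):
    """Intercept plus the weighted sum of the feature values."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
```

In practice, `numpy.linalg.lstsq` or scikit-learn's `LinearRegression` is more numerically robust than forming X^T X directly, especially when features differ in scale as much as year and km_driven do.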

(6) Salary Prediction – Simple Linear Regression

  • Predict a candidate’s salary based on their years of experience.
  • The dataset contains two columns:
    • YearsExperience
    • Salary
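The same model can also be fit iteratively with gradient descent, which repeatedly moves the intercept and slope against the gradient of the mean squared error. This is a sketch for illustration rather than a recommended method for a two-column dataset, where the closed-form least-squares solution is exact. The learning rate and epoch count are illustrative and would need tuning, or feature scaling, for real YearsExperience and Salary values; the function name is hypothetical.

```python
def fit_gradient_descent(x, y, learning_rate=0.01, epochs=5000):
    """Fit y = b0 + b1 * x by batch gradient descent on mean squared error."""
    n = len(x)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        errors = [(b0 + b1 * xi) - yi for xi, yi in zip(x, y)]
        # Gradients of MSE = (1/n) * sum(error ** 2) with respect to b0 and b1.
        grad_b0 = 2.0 / n * sum(errors)
        grad_b1 = 2.0 / n * sum(e * xi for e, xi in zip(errors, x))
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1
```

If the learning rate is too large the updates diverge; if it is too small, many more epochs are needed to approach the least-squares solution.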

(7) Student Performance Prediction

  • The Student Performance Dataset is designed to examine the factors that influence students’ academic performance.
  • The dataset consists of 10,000 student records, with each record containing information about various predictors and a performance index.
  • Input variables:
    • Hours Studied: The total number of hours spent studying by each student.
    • Previous Scores: The scores obtained by students in previous tests.
    • Extracurricular Activities: Whether the student participates in extracurricular activities (Yes or No).
    • Sleep Hours: The average number of hours of sleep the student had per day.
    • Sample Question Papers Practiced: The number of sample question papers the student practiced.
  • Target variable:
    • Performance Index: A measure of the overall performance of each student. The performance index represents the student’s academic performance and has been rounded to the nearest integer. The index ranges from 10 to 100, with higher values indicating better performance.
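Two preprocessing steps fit this data: mapping Extracurricular Activities from Yes/No to 1/0, and standardizing the numeric predictors, which sit on very different scales (hours versus previous test scores). Standardizing does not change ordinary least-squares predictions, but it makes coefficients comparable and helps gradient-based training. A minimal sketch with hypothetical helper names:

```python
from statistics import mean, pstdev


def encode_yes_no(value):
    """Map 'Yes'/'No' (case-insensitive, surrounding spaces ignored) to 1/0."""
    normalized = value.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"expected Yes or No, got {value!r}")


def standardize_columns(rows):
    """Z-score each column of a list of numeric rows.

    Returns (scaled_rows, means, stds); constant columns get std 1.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds
```

The means and standard deviations computed on the training rows should be reused to scale validation or test rows, so no information leaks from held-out data.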

(8) Boston House Price Prediction

  • Given a set of features that describe an area of Boston, the machine learning model must predict the house price for that area.
  • In this dataset, each row describes a Boston town or suburb. There are 506 rows and 13 attributes (features) with a target column (price).
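With only 506 rows, a single train/test split can give a noisy estimate of model quality. k-fold cross-validation, an assumption here rather than part of the problem statement, trains on k - 1 folds and validates on the remaining one, rotating through all k folds. A dependency-free sketch of the index bookkeeping, with a hypothetical function name:

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds get one extra row so every row is used exactly once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size
```

Averaging a metric such as RMSE over the validation folds gives a steadier estimate than one split. scikit-learn's `KFold` provides the same kind of splitting.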

(9) Graduate Admission Prediction

  • This dataset was created to predict graduate admissions from an Indian perspective.
  • The dataset contains several parameters considered important during the application for master’s programs.
  • The parameters included are:
    • GRE Scores (out of 340)
    • TOEFL Scores (out of 120)
    • University Rating (out of 5)
    • Statement of Purpose and Letter of Recommendation Strength (out of 5)
    • Undergraduate GPA (out of 10)
    • Research Experience (either 0 or 1)
    • Chance of Admit (ranging from 0 to 1), the value to predict
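Since each score’s maximum is stated above, one simple option is to divide by that maximum so every feature lies between 0 and 1. A linear model can also predict a Chance of Admit outside 0 to 1, so clamping the prediction is a reasonable safeguard. The dictionary keys below (for example `SOP` and `LOR` as separate columns) are hypothetical and should be matched to the real file.

```python
# Maximum possible value of each score, as stated in the dataset description.
# The key names are hypothetical; match them to the real column headers.
MAX_SCORES = {
    "GRE Score": 340,
    "TOEFL Score": 120,
    "University Rating": 5,
    "SOP": 5,
    "LOR": 5,
    "CGPA": 10,
}


def scale_applicant(applicant):
    """Divide each score by its maximum so it lies in [0, 1]; Research is already 0 or 1."""
    features = [applicant[name] / top for name, top in MAX_SCORES.items()]
    features.append(float(applicant["Research"]))
    return features


def clip_probability(prediction):
    """Clamp a linear-model prediction to the valid Chance of Admit range [0, 1]."""
    return min(1.0, max(0.0, prediction))
```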

(10) Salary Prediction Based On Position And Levels

  • This dataset has 3 columns and 10 rows describing a company’s position levels and the salary it offers at each level.
  • The positions are Business Analyst, Junior Consultant, Senior Consultant, Manager, Country Manager, Region Manager, Partner, Senior Partner, C-level, and CEO.
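Salaries across position levels like these often grow faster than the level number, so a straight line may underfit. One option, assumed here rather than stated in the dataset description, is polynomial regression: still linear regression, but on the features 1, x, x², and so on, where x is the position level (assumed to run from 1 for Business Analyst to 10 for CEO). A self-contained sketch with hypothetical function names:

```python
def polynomial_features(level, degree):
    """Return [1, x, x**2, ..., x**degree] for one position level x."""
    return [float(level) ** d for d in range(degree + 1)]


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_polynomial(levels, salaries, degree=2):
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    rows = [polynomial_features(x, degree) for x in levels]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, salaries)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, level):
    """Evaluate the fitted polynomial at a position level."""
    return sum(c * f for c, f in zip(coefs, polynomial_features(level, len(coefs) - 1)))
```

With only ten rows, a high polynomial degree will fit the training points closely but generalize poorly, so low degrees are the safer starting point.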

(11) Salary Prediction Based On Employee Details

  • This dataset contains information about the salaries of employees at a company.
  • Each row represents a different employee, and the columns include information such as age, gender, education level, job title, years of experience, and salary.
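Gender, education level, and job title are text, so they need encoding before regression. The sketch below shows one-hot encoding with the first category (alphabetically) dropped as a baseline, which avoids perfectly collinear dummy columns when the model also has an intercept. The function name is hypothetical, and education level could instead be encoded ordinally.

```python
def one_hot(values, drop_first=True):
    """Return (kept_categories, encoded_rows) for one categorical column.

    With drop_first=True the alphabetically first category becomes the all-zeros baseline.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    encoded = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, encoded
```

Job title may have many distinct values, so one-hot encoding it can add a large number of sparse columns.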

(12) Insurance Premium Payment Prediction

  • This dataset has only two columns, Age and Premium. You can use it to practice simple linear regression and prediction in machine learning.
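Before fitting a simple linear regression of Premium on Age, it can help to check how linear the relationship is. Pearson’s correlation coefficient r does that: values near +1 or -1 indicate a strong linear relationship, and for a simple regression with an intercept, r² equals the model’s R². A minimal sketch with a hypothetical function name:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)
```

Correlation only measures linear association, so a scatter plot of Age against Premium is still worth checking for curvature.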

(13) Walmart Store Sales Prediction

  • Objective:
    • Understand the dataset and clean it up (if required).
    • Build regression models to predict sales using single and multiple features.
    • Evaluate the models and compare their respective scores, such as R2 and RMSE.
  • The dataset contains the following fields:
    • Store – the store number
    • Date – the week of sales
    • Weekly_Sales – sales for the given store
    • Holiday_Flag – whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
    • Temperature – temperature on the day of sale
    • Fuel_Price – cost of fuel in the region
    • CPI – prevailing consumer price index
    • Unemployment – prevailing unemployment rate
  • Holiday events:
    • Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
    • Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
    • Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
    • Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
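The objective asks for R2 and RMSE when comparing models. Both are short to compute by hand; the sketch below assumes `y_true` holds actual Weekly_Sales values and `y_pred` a model’s predictions for the same weeks. RMSE is in the same units as Weekly_Sales, while R2 is unitless. scikit-learn’s `sklearn.metrics.r2_score` and `mean_squared_error` compute the same quantities.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

Lower RMSE and higher R2 indicate a better fit; computing them on held-out weeks rather than training weeks gives a fairer comparison between models.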

(14) Uber Fare Price Prediction

  • Estimate the fare prices of the trips.
  • The dataset contains the following fields:
    • key – a unique identifier for each trip
    • fare_amount – the cost of each trip in USD
    • pickup_datetime – date and time when the meter was engaged
    • passenger_count – the number of passengers in the vehicle (driver entered value)
    • pickup_longitude – the longitude where the meter was engaged
    • pickup_latitude – the latitude where the meter was engaged
    • dropoff_longitude – the longitude where the meter was disengaged
    • dropoff_latitude – the latitude where the meter was disengaged
  • Objective:
    • Understand the dataset and clean it up (if required).
    • Build regression models to predict the fare price of an Uber ride.
    • Evaluate the models and compare their respective scores, such as R2 and RMSE.
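The raw coordinates are unlikely to relate linearly to the fare, but the trip distance probably does. A common feature-engineering step, assumed here rather than given in the objective, is the great-circle (haversine) distance between the pickup and dropoff points. Rows with missing or obviously invalid coordinates (for example 0, 0) would need cleaning first. The function name is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

For a trip row this would be called as `haversine_km(pickup_latitude, pickup_longitude, dropoff_latitude, dropoff_longitude)`, and the result used as a regression feature alongside passenger_count.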

(15) CO2 Emission By Vehicles

  • Determine or test the influence of different variables on the emission of CO2.
  • Which features influence CO2 emissions the most?
  • Is there any difference in the modeled CO2 emissions when city and highway fuel consumption are considered separately versus when their weighted variable interaction is considered?
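To compare the two setups, a single weighted fuel-consumption feature can be built from the city and highway values, and the models can then be compared with adjusted R², which penalizes the extra predictor in the separate-features model. The 55% city / 45% highway weighting below is an assumption borrowed from Canadian fuel-consumption ratings; check it against any combined column already in the data. Units are assumed to be L/100 km, and the function names are hypothetical.

```python
def combined_fuel_consumption(city, highway, city_weight=0.55):
    """Weighted combination of city and highway fuel consumption (L/100 km).

    The default 55/45 split is an assumption; verify it against the dataset.
    """
    return city_weight * city + (1.0 - city_weight) * highway


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples <= n_features + 1:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)
```

If the two-feature model (city and highway separately) does not beat the one-feature combined model on adjusted R², the separate inputs add little beyond their weighted combination.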

(16) Predict Student Marks Based On Study Hours

  • The dataset contains two columns: the number of hours each student studied and the marks they received.
  • We can apply simple linear regression to predict a student’s marks from their number of study hours.
