Overview

This project automates scraping vehicle listings from Cars.com with Python and analyzes the scraped data for insights. It focuses on Lexus GX 460 listings and includes scripts for scraping, cleaning, and analysis.

Files

  • carsDotComScrape.py: Python script for scraping data related to the Lexus GX 460 model from Cars.com.
  • carsFileCleaning.py: Python script for cleaning the scraped data, including removing irrelevant information and formatting data.
  • GX460_scrape.py: Script combining scraping and cleaning operations specific to Lexus GX 460.
  • carsVis.ipynb: Jupyter Notebook for data visualization and analysis using matplotlib, seaborn, and scikit-learn.
  • output_DATE.csv: Raw scraped data from Cars.com listings.
  • output_DATE_cleaned.csv: Cleaned and formatted data ready for analysis.
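
The cleaning step that turns output_DATE.csv into output_DATE_cleaned.csv might look roughly like the sketch below. It is not the code in carsFileCleaning.py: the column names (title, price, mileage), the sample rows, and the helper names clean_number and clean_rows are all assumptions made for illustration, and it assumes whole-dollar prices. It uses only the standard library.

```python
import csv
import io
import re


def clean_number(text):
    """Strip currency symbols, commas, and unit suffixes; return an int or None."""
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


def clean_rows(rows):
    """Yield cleaned copies of rows, skipping listings without a usable price."""
    for row in rows:
        price = clean_number(row.get("price"))
        if price is None:
            continue  # e.g. a listing marked "Not Priced"
        yield {
            "title": row.get("title", "").strip(),
            "price": price,
            "mileage": clean_number(row.get("mileage")),
        }


# Hypothetical raw data standing in for output_DATE.csv (assumed columns).
raw = io.StringIO(
    "title,price,mileage\n"
    ' 2019 Lexus GX 460 Premium ,"$38,995","45,210 mi."\n'
    '2016 Lexus GX 460,Not Priced,"88,000 mi."\n'
)

cleaned = list(clean_rows(csv.DictReader(raw)))
print(cleaned)
```

Dropping rows without a numeric price is one possible design choice; keeping them with an empty price field would work equally well if the analysis handles missing values.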

Usage

  1. Scraping and Cleaning Data:
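    • Edit GX460_scrape.py and set the search URL to include the desired filters for Lexus GX 460 listings.
    • Run GX460_scrape.py to scrape and clean the data. As described under Files, raw listings are stored in output_DATE.csv and the cleaned data in output_DATE_cleaned.csv.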
  2. Visualizing Data:
    • Open and run carsVis.ipynb in Jupyter Notebook to visualize cleaned data and perform analysis.
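
To make the DATE placeholder in the file names concrete, here is a minimal sketch of how a date-stamped pair of paths could be derived. The ISO date format (YYYY-MM-DD) and the helper name output_paths are assumptions; the actual scripts may format the date differently.

```python
from datetime import date
from pathlib import Path


def output_paths(run_date, directory="."):
    """Return (raw, cleaned) CSV paths for a given scrape date.

    Assumes DATE is an ISO date such as 2024-01-15; the real scripts
    may use a different format.
    """
    stamp = run_date.isoformat()
    raw = Path(directory) / f"output_{stamp}.csv"
    # Insert the "_cleaned" suffix before the .csv extension.
    cleaned = raw.with_name(f"{raw.stem}_cleaned.csv")
    return raw, cleaned


raw_path, cleaned_path = output_paths(date(2024, 1, 15))
print(raw_path.name)
print(cleaned_path.name)
```

In the notebook, the cleaned path would then be the file to load for plotting.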

Example Plots