Analyzing Sales Data and Building a Sales Dashboard

data codes through eyeglasses
Photo by Kevin Ku on Pexels.com

The objective of this project was to analyze the sales data to identify trends, patterns, and seasonality, and build a forecasting model to predict future sales.

Here are steps that were taken to analyze the data as well as results from the analysis

Step 1: Read the file

Step 2: Check the dataset’s structure using df.head(), df.info(), and df.describe() to understand the data types, missing values, and overall distribution.

Step 3: Identify and handle any missing values

In this data set there are 11 records that have missing values.

Step 4: Handle Missing Values

You need to identify columns with missing values and decide how to handle them:

  • Drop columns with a high percentage of missing values if they are not critical.
  • Fill missing values using appropriate methods (e.g., mean, median, mode, or forward/backward fill).

# Drop columns with more than 50% missing values (if any)

threshold = len(df) * 0.5

df = df.loc[:, df.isnull().sum() <= threshold]

# Fill missing values for numerical columns with the median

numerical_cols = df.select_dtypes(include=[‘float64’, ‘int64’]).columns

df[numerical_cols] = df[numerical_cols].fillna(df[numerical_cols].median())

# Fill missing values for categorical columns with the mode

categorical_cols = df.select_dtypes(include=[‘object’]).columns

df[categorical_cols] = df[categorical_cols].fillna(df[categorical_cols].mode().iloc[0])

# Verify if there are any remaining missing values

print(“Remaining missing values:\n”, df.isnull().sum())

Based on the data type the order date and the shipping date are not in the right format. Convert these into the date format.

Step 4: Remove Duplicate Values

Check for duplicates and remove. In this data set there are no duplicates.

Step 5. Check the dataset and save the cleaned dataset

Step 6:

Exploratory Data Analysis

Univariate Analysis: Analyze sales, profit, and other numeric columns using descriptive statistics and visualizations.

Bivariate Analysis: Explore relationships between features, such as Sales vs. Profit, using scatter plots or heatmaps.

Right-Skewed Distribution:

  • The distribution is heavily right-skewed, meaning most of the sales transactions are concentrated at lower values, with a few transactions having significantly higher sales amounts. This is common in sales data, where most transactions are of smaller value, and only a few are high-value sales.

Majority of Sales:

  • The majority of the sales amounts fall within a lower range (close to 0 to 1000). This suggests that smaller sales are much more frequent compared to high-value sales.

Outliers:

  • There are a few sales transactions with extremely high values (over 10,000), which could be outliers. You might want to investigate these further to understand if they are genuine transactions or if they need special handling (e.g., excluding from certain analyses).

Sales by Category

Highest Sales in Technology:

  • The ‘Technology’ category has the highest total sales, with over 250,000 in sales. This indicates that technology products are the most popular or have the highest value in sales for this dataset.

Moderate Sales in Furniture:

  • The ‘Furniture’ category comes second, with total sales slightly above 200,000. This suggests that while Furniture is a significant category, it doesn’t generate as much sales revenue as Technology.

Lowest Sales in Office Supplies:

  • The ‘Office Supplies’ category has the lowest total sales, with sales figures around 150,000. This might mean that the items in this category have lower prices, or they are less in demand compared to Technology and Furniture.

Insights & Recommendations

  • Focus on Technology Products: Since the Technology category has the highest sales, it may be beneficial to focus on marketing and inventory for technology products, as they have the greatest revenue-generating potential.
  • Evaluate Office Supplies Strategy: The lower sales in the Office Supplies category could be an opportunity to investigate pricing, promotion, or product range to increase sales.
  • Balanced Portfolio: Given that all three categories contribute a substantial amount to overall sales, the company should maintain a balanced product mix but prioritize high-performing categories like Technology.

Sales by Region

Highest Sales in the East and West Regions:

  • The ‘East’ and ‘West’ regions have the highest total sales, with sales values above 175,000. These two regions appear to be the most significant contributors to overall sales.

Moderate Sales in the Central Region:

  • The ‘Central’ region has slightly lower sales compared to the East and West but still contributes a significant amount (around 150,000). This suggests it is an important region but not as dominant as the top two.

Lowest Sales in the South Region:

  • The ‘South’ region has the lowest sales, with values significantly below the other regions. This might indicate lower demand, fewer customers, or less effective sales strategies in this area.

Insights & Recommendations

  • Focus on High-Performing Regions (East and West): The East and West regions are performing exceptionally well. It might be beneficial to investigate the factors contributing to their success and continue investing in marketing, sales, or product offerings in these regions.
  • Improve Sales Strategies in the South: Since the South region shows the lowest sales, it would be worth exploring the reasons behind this. Consider conducting further analysis to identify any barriers, such as fewer customers, lower demand, or ineffective sales strategies, and implement targeted marketing campaigns or promotions to boost sales.
  • Analyze the Central Region for Growth Potential: The Central region is performing moderately well, suggesting there may be potential for growth. Evaluate the sales trends, customer demographics, and preferences to identify opportunities for increasing sales.

Sales over time

Analysis

  1. Seasonal Peaks:
  • There are noticeable spikes in sales around the beginning of each year, particularly in early 2017 and early 2018, where sales reached the highest peak.
  • This suggests there might be a seasonal trend, possibly due to holiday sales, end-of-year spending, or other annual events that cause a significant increase in sales during this period.

Declines in Sales:

  • There are several noticeable dips in sales, particularly in mid-2015, early 2016, late 2017, and late 2018.
  • These declines could indicate seasonal lulls, market downturns, or other external factors affecting sales during these times.

Volatility:

  • The sales trend shows a high degree of volatility with sharp increases and decreases over short periods. This might indicate sensitivity to external factors like marketing campaigns, product launches, economic changes, or seasonal events.

Consistent Low Points:

  • Sales consistently hit low points in the middle of each year, particularly around July to August. This could suggest a seasonal dip in customer demand during the summer months.

Insights and Recommendations

  • Leverage Peak Sales Periods: Given the noticeable spikes at the start of each year, it would be advantageous to plan marketing campaigns, promotions, or inventory management strategies to capitalize on these high sales periods.
  • Investigate Low Sales Periods: Identify the factors contributing to the regular dips in sales, particularly during the mid-year period, and consider strategies to boost sales during these months (e.g., targeted promotions, discounts, or product launches).
  • Understand Seasonality: The apparent seasonality in sales suggests that factors such as holidays, weather, or fiscal periods influence purchasing behavior. A more detailed investigation could help fine-tune strategies to address these seasonal changes.

Forecasting sales

Initial Increase Followed by Gradual Decline:

  • The forecast starts with higher predicted sales values (e.g., 26,504 for the first month and 33,303 for the second month) before gradually declining in the later months (e.g., down to 5,666 in the last forecasted month).
  • This suggests that the model has captured an initial trend where sales might be expected to rise initially but then decline over the forecast period.

Seasonality:

  • Since you have specified seasonal=’add’ with seasonal_periods=12, the model is accounting for an annual (12-month) seasonality pattern. This means the model expects some months to have consistently higher sales than others based on the historical data.
  • The pattern of rising and then declining sales indicates that there could be seasonal effects causing fluctuations in sales over the year.
  1. Realistic Positive Values:
  • Unlike the ARIMA model, which produced negative forecasts, the Holt-Winters model has generated positive sales forecasts throughout. This indicates that the model is more appropriate and realistic for your sales data.

Insights

  • Seasonal Peaks and Dips: The forecast suggests that there might be periods of high sales followed by a decline, indicating seasonality in your data. For instance, high sales in the earlier months followed by lower sales later might correspond to specific seasonal demand, such as holiday or year-end purchases.
  • Data Trend: The model captured the trend component effectively, suggesting that your data has an upward trend initially but decreases toward the end of the forecast period.

Recommendations for Action

  • Seasonal Strategies: Given the forecast’s seasonality, focus on preparing for the peak sales periods (e.g., stock up inventory, launch marketing campaigns) and find ways to boost sales during the low-demand months.
  • Monitor Actual Sales vs Forecast: Track actual sales against these forecasts to see if the trend follows this pattern. It will help refine future models.

Dashboard

I used streamlit to build a dashboard