Sales Data Analysis Project: 10 visualizations crafting the roadmap to business growth

and two things I’ve learned from this project

Kim W.
12 min readFeb 22, 2024

“Sales Performance Meeting” — — it may sound soooo dull and dreary to some of you, but I hope you won’t feel the same way here. Analyzing data allows me to see what’s going on in the company without being in the boardroom. In this Superstore Sales Analysis project, I’ll delve into a 4-year sales history dataset and share the insights I uncover. I hope you find it enjoyable to read! 😆

Getting started on this data analysis project…

There are quite a few ways to play with this dataset. For now, I’m going to explore the data from a business perspective, such as sales, marketing, and merchandising. Imagine the perspective of a curious, newly onboarded team member in the business analytics team trying to get the first look of the big picture.

Good starting points will be:

  • 4-Year Overview: What are the key trends in sales & profit over the past four years? I will look at the performance across regional, customer segment, and product category dimensions.
  • Next-Year Forecast: What is the forecast for demand in the next year?

This is just an initial exploration that scratches the surface. It will lay the foundation for deeper analysis to come later after this project; other refined data products such as a sales report or a dashboard will not be discussed here.

A first look at the dataset:

My “Superstore Sales Analysis” project uses the dataset originates from Tableau. According to the official description, it “contains information about products, sales, and profits that you can use to identify key areas of improvement within this fictitious company.”

Let’s have a sneak peek into the dataset:

A glimpse of the Superstore dataset from Tableau, consisting of 3 tables: Orders, Returns, and People.
  • The spreadsheet consists of 3 tabs: Orders, Returns and People. “Orders” will be the main focus here as the other 2 tabs are not as comprehensive.
  • The “Orders” data comprises 21 columns and nearly 10,000 rows. It’s a perfect balance in size for a small project: large enough to derive meaningful insights, yet not so big to complicate the workflows.
  • The order dates span from January 3, 2014, to December 30, 2017, spanning 4 full years.
  • “Returns” data lacks specific product details or return costs. Therefore, I’d use it only for assessing customer satisfaction, rather than product performance.
  • “People” data seems to denote the regional sales managers across various regions. It would be valuable for analyzing the performance of each sales representative.

Let’s dive in!! Oh wait, Data Prep first.

The original data is pretty clean already. Just a little prep work needed before we dive into the analysis:

  • The column labeled “Sales” appears ambiguous to me. Given the presence of “Discount” column, I will interpret “Sales” as referring to “revenue” or “net sales”, which is the income generated. I’ll rename “Sales” to “Net Sales”.
  • After examining the data more closely, I calculated the following metrics using Python based on the available information, and added them into the .csv data source that will be used for the visualization.
    - Original Item Price
    - Discount Amount
    - Profit Margin
    - Lead Time
    - Customer’s nth Order
    , which will be used to understand Customer Order Frequency for customer analysis
    - Days Since Customer’s Last Order, which will be used to understand Customer Order Recency, also an important factor in customer analytics
  • One issue was spotted. I found 64 Product IDs (less than 4% of the entire catalog) are associated to multiple different Product Names. Similarly, 16 Product Names are associated with multiple Product IDs. This situation may indicate underlying issues in the data management process or other operational challenges. However, it’s possible that a Product ID may correspond to multiple variations. Moving forward in this project, I will prioritize using Product Name as the primary key for analyzing sales performance. Nevertheless, Product ID may still be relevant for inventory planning purposes.
A potential many-to-many relationship data issue spotted between Product ID and Product Name.

Okay let’s start!

First, let’s look at the overall trends in Net Sales and Profits over the last four years:

[1.01] 4-year overview of net sales and profit

At first glance, it appears to be a small-medium business, with annual revenues below $1 million dollars. The profit margin has been consistently above 10%, which is generally considered as a healthy metrics.

The sales had a slight drop in 2015 but have been increasing steadily then. As for profits, they have been rising all along, even when the sales went down. However, despite the significant surge in net sales in 2016 and 2017, the profit growth did not mirror the trend as closely, which indicates that the costs of operation rose as sales grew.

Understanding the factors behind the sales decline in 2015 and what drove the growth will help the business further develop the strategy in the upcoming years. A detailed P&L analysis is also crucial to understand if there is any room for more efficient cost management and process optimization, so that profits will rise along with the sales.

Now let’s drill down to see quarterly trends.

[1.02] 4-year overview quarterly trends

As expected, Sales, Discounts, Item Sales Quantity, and the Number of Orders all show similar patterns across the quarters, with consistent growth throughout the year and peak in Q4. This aligns with typical consumer behavior during the holiday seasons.

Such seasonality in demand underscores the importance of flexible staffing in the warehouse to ship the orders and flexible inventory management to navigate both the busiest and slowest seasons effectively. In addition, careful cash flow management in Q1 is expected.

While an increase in discounts tends to boost sales, the correlation between these two factors is not particularly strong. This would need further analysis to evaluate the effectiveness of promotional campaigns. Detailed analysis may include which products are discounted out of the whole catalog, purchase volumes of these discounted items, return on investment, among other factors.

After having an idea of the overall sales performance, let’s break it down by category:

[1.03] 4-year overview of product categories

All three categories have relatively close amounts each year, without any category significantly outpacing the others in terms of net sales volume.

However, Technology category is the most successful segment, with the highest sales growth and efficiency, making its strategic importance to the company. Having a risk mitigation strategy could also solidify its position. Despite such success in sales, Technology category performance in Item Sales Quantity remains relatively flat, suggesting limited potential market growth. A broader market reach or product innovation may be needed for the upcoming years.

Furniture category has the lowest profitability with minimal growth. This calls for better cost control and more efficient supply chain. The business strategy for this category needs to be re-evaluated.

Office Supplies has the highest volume of sales but has lower profitability. This suggest a volume-pricing strategy could be beneficial. Bulk purchases and thus lower average shipping cost per item could also be leveraged in terms of cost reduction.

Overall, each category has its own unique opportunities and challenges. The company needs to understand each category’s role to the overall success in order to develop a holistic strategy.

Now shifting our focus to geographical regions:

[1.04a animated gif] 4-year net sales trend map

Unsurprisingly, large states like California, Texas, and New York dominates sales volume. Notably, the states in the East has the most widespread sales presence, while the Mountain area has less activity. Washington state emerges as a key growth state in the Pacific region along with California.

[1.04b animated gif] 4-year profit trend map

However, profitability has a different picture. Despite leading in sales, Texas shows negative profit for the past four years. In contract, California and New York excelled both in sales and profitability.

This difference suggests potential cost factors specific to each region, such as shipping, returns, taxes, or different customer preferences and product choices. Further researches into these areas could lead to more insights.

Let’s now take a quick look at customer segmentation analysis:

[1.05] 4-year overview of customer segments

The Consumer segment emerges as the dominant segment, having the highest number of customers, orders, and profitability. This suggests that this segment is the most focused demographic for the company, followed by Corporate and Home Office segment.

Yet, the Average Order Value (AOV) tells a different story. The AOV for the Consumer segment has been continuously decreased, dropping 25% from $510 to $380 over the past four years. As this contrasts with the rising profit trend, it suggests that this outcome may have been anticipated and strategically planned. Supply chain efficiency for products selected by this segment likely is critical for the success.

On the other hand, Home Office segment has an upward trend in AOV, becoming the highest AOV among all segments, slightly above $500 in 2017. This inspires further investigation into the product preferences of each segment to identify optimize pricing and profitability.

Another interesting point is the Corporate segment’s profit decline by 14% in 2017, despite a positive trend in number of orders and sales (not shown). An in-depth cost analysis is essential to uncover the cause.

Now let’s see the distribution between new and returning customers over the years:

[1.06] 4-year overview of new/returning customers

While there is a noticeable annual increase in the number of orders as we’ve seen previously, the growth of new customers remains minimal. By 2017, nearly all orders are from returning customers. This trend could indicate market saturation, a shift in market demands, or the need to improve the acquisition strategies.

However, customer loyalty shows as a strong advantage in the business, as evidenced by the strong trends towards growing retention, highlighting the strong value of fostering long-term relationships. As each customer segment has the similar distribution, a deeper analysis is needed to ensure sustained growth in the customer base.

(New Customer definition: a “new customer” refers to a customer placing their 1st order with the company in his/her lifetime. Subsequent orders from the same customer are all categorized as “returning customer” orders, regardless of how recent the subsequent orders are from the 1st order.)

What about the time it took for returning customers to make another purchase?

[1.07] Median days since last order across customer segments; excluding 2014 data due to lack of purchase history in 2013

Across all customer segments over the years, the interval between purchases has been decreasing, dropping to about 4 months in 2017 from 5 months in 2015, showing an increase in frequency. Such information can be used for the marketing team to identify potential churn risks and to offer re-engagement campaigns in time.

Notably, throughout this period, the Home Office segment showed the longest purchase intervals. This suggests an opportunity for the marketing team to tailor specific incentive programs particularly for this segment.

Now, I’d like to look into the sales-profit distribution across the customer base:

The scatter chart shows that a significant majority of the customers generated less than $5000 in revenue and under $2000 in profits over the past four years. This aligns with the 80/20 rule, and the company may look at whether the resources were allocated accordingly and whether the efforts were maximized.

[1.08] profit, net sales, and discount amount of each customer

There are some outliers that may need the company’s attention. For instance, the customer Sean Miller has placed orders worth about $25,000 but the business incurred a loss of $2000, and this account has applied the most discounts. This raises questions about whether any exceptions were made for this customer due to his exceptional sales volume. A more sustainable relationship is needed moving forward.

Similarly, the customer Tamara Chand stands out as the most profitable customer, highlighting the importance for the sales representative of this account to maintain and strengthen this valuable relationship. The discount amount this customer applied is minimal, and the sales representative may consider offering customized discounts to show the company’s appreciation for this relationship.

Similarly, let’s look at the distribution of the subcategories:

[1.09] profit, net sales, and count of products of each subcategories

Previously we noticed that the Furniture was the least profitable category. A deeper look shows that Bookcases and Tables subcategories are the major contributors to the low profit. A more closer analysis of the cost structure at the products for these two subcategories is important to address this financial drag.

Turning into Technology category, the Machines subcategory is underperforming compared to other subcategories due to high sales volume but low profit margin. In comparison, some Office Supplies subcategories with much lower sales generate higher profits. Potential issues with pricing, cost, or product selection within Machines subcategories may exist.

On the other hand, Copiers, despite having the fewest products, shows moderately high sales and profits. This suggests potential competitive advantages due to high unit price, margins, or an effective sales strategy. Assuming the company can sustain those strengths in the competition, the company may consider an expansion of this product line and solidify the company’s market success.

Lastly, let’s make a sales forecast for 2018:

[1.10] next year sales forecast

Tableau significantly simplifies the sales forecasting process. However, these statistical models offer a foundational baselines, and it is crucial to integrate broader business considerations to enhance the precision of the forecast.

Here are some additional areas I’d consider for a better forecast:

  • Growth goals across various departments: it’s important to align revenue forecast with the company’s overall growth objectives, whether it’s about categories, regions, or customer segments. The forecast may be adjusted in response to market expansions, product launches, or other initiatives.
  • Communication with marketing and sales teams: knowledge about upcoming campaigns, customer acquisition plans, or other sales drivers is essential for refining the forecast. This would ensure that the prediction account for the planned and targeted demands.
  • Coordination with merchandising and product teams: sales forecast would need to consider inventory availability that may otherwise hinder order fulfillment, while excessive stock creates carrying costs that may burden the profitability. Effective sales forecasting and inventory planning are partners in progress.
  • Beyond its immediate use for sales team, a forecast also serves as a strategic guide for financial planning. It helps in resource allocation, budgeting, and operational planning for the upcoming year. Also being as a goal-setting reference point, the sales forecast must incorporate diverse inputs alongside statistical modeling.

By incorporating these considerations, the sales forecast can be improved and helps drive strategic decision-making across the company.

And with that, let’s wrap up:

This preliminary analysis of the datasets is completed through 10 visualizations. Here are the key takeaways from this dataset:

Positives:

  • 20% YoY revenue growth in 2017, showing healthy, solid business performance.
  • The Technology category is the star of all categories. Kudos to the category manager!

Areas for Improvement:

  • The Furniture category needs profit optimization.
  • Address the profit loss in Texas and specific customer accounts.
  • Customer loyalty is strong, but consider exploring more aggressive new customer acquisition strategies to solidify the company’s market position and to mitigate risks.

Two things I’ve learned from my first data project:

  1. Like any project, planning is crucial and often the most time-consuming phase. However, like adventures, data projects thrive on the freedom to explore. I have found that the dynamic balance between rigorous planning and agile adaptability makes the journey just as rewarding as the outcome. This reminds me of a quote:

“Writing a novel is like driving a car at night. You can only see as far as your headlights, but you can make the whole trip that way.” — E.L. Doctorow

2. AI (and many other people) is able to do the job, sure, but I still need to grow, personally and professionally, as one little human being. Anxiety is my enemy. Self-doubt is my enemy. Don’t let these things blunt my mind or numb my heart.

I hope you find this project interesting to read! 🥳

--

--