How to use Python for demand forecasting in consumer goods

published on 04 April 2024

In this guide, we'll explore how to leverage Python for effective demand forecasting in the consumer goods sector. Here's what you need to know:

  • Demand Forecasting: Understanding future customer demand to balance stock levels.

  • Why Python?: It's versatile for data handling, machine learning, and making data-driven decisions.

  • Data Preparation: Gathering and cleaning historical sales data.

  • Exploratory Data Analysis: Identifying sales trends and seasonal patterns.

  • Feature Engineering: Enhancing data with additional insights.

  • Model Selection: Choosing between SARIMAX, Prophet, and XGBoost for accurate predictions.

  • Model Training & Evaluation: Fine-tuning and comparing models to select the best one.

  • Implementing Forecasts: Automating updates and integrating predictions into business operations.

Whether you're a data scientist or a business analyst, this guide will help you understand the steps and tools necessary for forecasting demand in consumer goods using Python.

What is Demand Forecasting?

Demand forecasting means estimating how much of a product customers will want in the future. It looks at past sales and other signals, such as prices, promotions, and seasons, to work out what will sell later on.

Getting these estimates right matters a lot for consumer goods companies. If they expect us to buy less than we actually do, they run out of products. If they expect us to buy more than we really want, they end up with too much stock sitting around. Both situations cost money.

By guessing demand better, businesses can avoid having too much or too little stock. This helps them save money and make sure they have what customers want when they want it.

Why Use Python for Demand Forecasting?

Python is a programming language that's really good at handling data, which makes it a strong fit for demand forecasting. Here's why:

  • It has everything you need - Python has a large ecosystem of free libraries like Pandas and NumPy that make working with data a breeze.

  • It's great for learning from data - With Python, you can use machine learning to make really smart forecasts. Tools like Scikit-Learn and XGBoost help with this.

  • It can grow with your needs - Python works well whether you're looking at a little data or a lot, and it can easily fit into your existing systems.

  • It makes data look good - With tools like Matplotlib and Seaborn, you can make charts that help you understand your forecasts better.

In short, Python gives you the tools to make smart guesses about what customers will want, helping businesses plan better.

Preparing the Data

Data Collection

To get good at predicting what customers will want, you need lots of information about what they bought before. Here's what to gather:

  • Look for detailed sales info from the past 3-5 years. This helps see patterns and times when people buy more or less.

  • Grab data on prices and promotions like sales or ads. This shows how they affect what people buy.

  • Collect details on products like brand, size, or flavor. This helps group similar items to predict better.

  • If you're in retail, also track store traffic, weather, events, and holidays. These things can change how much people buy.

  • Use APIs and scripts to automatically pull in data from online stores, ERPs, CRMs to keep up with demand changes.
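As a rough sketch of how these sources might come together, the example below joins daily sales with a promotions calendar using pandas. The column names (date, sku, units, discount_pct) and the small inline tables are made up for illustration; real exports from a store system or ERP will look different.

```python
import pandas as pd

# Hypothetical daily sales records (column names are assumptions)
sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-02-13", "2023-02-14", "2023-02-15"]),
    "sku": ["A1", "A1", "A1"],
    "units": [120, 180, 125],
})

# Hypothetical promotions calendar
promos = pd.DataFrame({
    "date": pd.to_datetime(["2023-02-14"]),
    "sku": ["A1"],
    "discount_pct": [15.0],
})

# Left join keeps every sales row; days without a promotion get a 0% discount
merged = sales.merge(promos, on=["date", "sku"], how="left")
merged["discount_pct"] = merged["discount_pct"].fillna(0.0)
merged["on_promo"] = (merged["discount_pct"] > 0).astype(int)

print(merged)
```

A left join is used here so that no sales rows are lost when a day has no matching promotion.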

Data Cleaning

Once you have your data, it needs to be tidied up. Here's a simple way to do it in Python:



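import pandas as pd
from sklearn.impute import SimpleImputer

# Assumes df is a DataFrame of historical sales that has already been loaded

# Check for missing values
print(df.isnull().sum())

# Fill in missing numeric values with each column's median
num_cols = df.select_dtypes(include='number').columns
imputer = SimpleImputer(strategy='median')
df[num_cols] = imputer.fit_transform(df[num_cols])

# Look for unusual values
print(df[num_cols].describe())

# Cap extreme values at the 95th percentile (numeric columns only)
for col in num_cols:
    df[col] = df[col].clip(upper=df[col].quantile(.95))

# Turn categories into numbers
df = pd.get_dummies(df)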

Important steps:

  • Find and fix missing or strange data

  • Limit big outliers to avoid skewing results

  • Change categories into something the computer can work with

Cleaning up your data helps spot and fix any issues, making your predictions more reliable.

Exploratory Data Analysis

Let's start by looking at how sales have gone up or down over time. We can do this by drawing a simple line chart with Python. Here's how you can do it:

import pandas as pd
import matplotlib.pyplot as plt

# Load in sales data and parse the date column so the index is a DatetimeIndex
data = pd.read_csv('sales.csv', index_col='date', parse_dates=True)

# Create line plot
data.plot()
plt.xlabel('Date')
plt.ylabel('Units Sold')
plt.title('Daily Sales')
plt.show()

By doing this, we can spot:

  • Trends - Times when sales consistently go up or down.

  • Outliers - Sales that are much higher or lower than usual.

  • Level shifts - When sales suddenly change and stay that way.

It's key to notice these patterns because they help us make better guesses about future sales.

Analyzing Seasonal Patterns

Next, let's look at how sales change with the seasons or even different days of the week. Here's how you can see these patterns:



# Monthly seasonality 
data.groupby([data.index.month]).mean().plot.bar()

# Day of week variability
data.groupby([data.index.dayofweek]).mean().plot.bar()

This shows us how sales can change. For example, sales might jump 20-30% in December because of holiday shopping. Or, there could be a 10-15% increase in the middle of the week when stores get new stock.

When we build our forecast models, including these seasonal changes helps us predict more accurately. We might add special markers for different seasons or days, or directly include these seasonal changes in our models.
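To make the idea of "special markers" concrete, here is a small sketch that computes a day-of-week seasonal index and builds indicator columns a model could use. The data is synthetic (a flat 100 units on weekdays, 150 on weekends) and the column name units is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales with a weekend uplift (illustrative data only)
dates = pd.date_range("2023-01-02", periods=28, freq="D")  # starts on a Monday
units = np.where(dates.dayofweek >= 5, 150.0, 100.0)
data = pd.DataFrame({"units": units}, index=dates)

# Seasonal index: average for each weekday divided by the overall average
overall_mean = data["units"].mean()
dow_index = data.groupby(data.index.dayofweek)["units"].mean() / overall_mean

# Indicator ("dummy") columns that mark the weekday of each row
day_of_week = pd.Series(data.index.dayofweek, index=data.index)
dow_dummies = pd.get_dummies(day_of_week, prefix="dow")
features = data.join(dow_dummies)

print(dow_index.round(3))
```

An index above 1 means that weekday sells more than average; the dow_0 to dow_6 columns let a model learn that effect on its own.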

Feature Engineering

Feature engineering is like giving your machine learning model a set of glasses. It helps the model see the data more clearly by creating new features from the existing data. This is especially useful in demand forecasting for consumer goods, where understanding the timing and reasons behind purchases can make a big difference. Let's dive into two main types of features we can create: those based on time and those based on the calendar.

Time Series Features

Time series data is all about tracking how things change over time, like how many items are sold each day. We can make some special features to help our models understand these changes better:

  • Lag variables - These are just sales numbers from the past few days. They show the model what the recent trends are.
for i in range(1, 4):
    df[f'lag_{i}'] = df['sales'].shift(i)
  • Rolling averages - Think of this as a moving average. It smooths out the daily ups and downs to show a clearer trend.
df['roll_7'] = df['sales'].rolling(7).mean()
  • Growth rates - This shows how fast sales are going up or down in percentage. It's like a speedometer for sales.
df['growth'] = df['sales'].pct_change()*100  

Adding these time-focused features helps our model get a better grip on how sales are moving over time.
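One thing to watch: if the goal is to predict a given day's sales, a rolling average or growth rate that includes that same day hands the model information it would not have at forecast time. A hedged sketch of one way around this is to shift the series by one day before computing the features. The numbers below are made up.

```python
import pandas as pd

# Illustrative data: ten days of sales (values are made up)
df = pd.DataFrame(
    {"sales": [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]},
    index=pd.date_range("2023-01-01", periods=10, freq="D"),
)

# Shift by one day first so each feature only uses sales known before that day
past_sales = df["sales"].shift(1)
df["lag_1"] = past_sales
df["roll_3_prev"] = past_sales.rolling(3).mean()
df["growth_prev"] = past_sales.pct_change() * 100

# The first rows have no history yet, so drop them before training
model_df = df.dropna()
print(model_df.head())
```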

Calendar Features

Things happening on specific dates can also affect how much people buy. We can use calendar features to highlight these effects:

  • Seasons - Like winter or summer. This helps the model know about regular seasonal changes.
df['season'] = df['date'].apply(get_season)
  • Holidays - Special days can lead to spikes in sales. Think Christmas or the Fourth of July.
holidays = pd.to_datetime(['2022-12-25', '2023-07-04'])
df['holiday'] = pd.to_datetime(df['date']).isin(holidays).astype(int)
  • Promotions/events - Sales or marketing campaigns can temporarily boost demand.
promos = pd.to_datetime(['2023-02-14', '2023-05-19'])
df['promo'] = pd.to_datetime(df['date']).isin(promos).astype(int)

By including these date-related features, our model can better predict when sales might go up or down because of special events or seasonal patterns.
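The get_season function used in the seasons snippet is not defined there. One possible version, assuming Northern Hemisphere meteorological seasons (an assumption to adjust for your market), could look like this:

```python
import pandas as pd

def get_season(date):
    """Map a date to a meteorological season (Northern Hemisphere assumed)."""
    month = pd.Timestamp(date).month
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"

# Small illustrative frame; the dates are arbitrary
df = pd.DataFrame(
    {"date": pd.to_datetime(["2022-12-25", "2023-04-10", "2023-07-04", "2023-10-31"])}
)
df["season"] = df["date"].apply(get_season)

# Holiday flag: convert the holiday strings to timestamps before matching
holidays = pd.to_datetime(["2022-12-25", "2023-07-04"])
df["holiday"] = df["date"].isin(holidays).astype(int)

print(df)
```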

Model Selection

When we're figuring out how to predict demand for consumer goods, we have a few good options for models. Here's a quick look at three popular choices:

Model | Pros | Cons
SARIMAX | Good at spotting patterns and trends; explains its predictions well | Needs careful setting up; can get too complicated
Prophet | Super easy to use; great for adding special events like sales | Not as adjustable; best for daily or weekly info
XGBoost | Finds complex patterns well; works with different kinds of info | Can guess wrong if not set up right; hard to understand why it guessed something

SARIMAX is great if you want something that can be finely adjusted and explains itself well. It's especially good at noticing patterns in how things sell over time. But, you'll need to know a bit about how to set it up.

Prophet is the go-to for a hassle-free option. Give it your sales history and it handles trend and seasonality on its own; you can also pass it a list of holidays or sale dates so it accounts for those spikes. The downside is you can't tweak it as much as SARIMAX.

XGBoost is like a detective that uncovers hidden clues in your data, such as how a holiday or a big sale might boost demand. But it can overfit, latching onto noise in past sales, so it needs careful tuning to avoid making wrong predictions. Plus, its reasoning can be a bit of a puzzle.

Mixing different models, like Prophet and SARIMAX, often gives better predictions. Prophet can serve as a starting point, while SARIMAX can add precision with its detailed setup.
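A simple way to mix models is to average their forecasts, optionally with weights. The sketch below assumes the forecasts are already available as plain lists; the numbers are made up and stand in for the output of two fitted models. The helper name blend_forecasts is hypothetical.

```python
import numpy as np

def blend_forecasts(forecasts, weights=None):
    """Combine forecasts from several models into one weighted average."""
    stacked = np.vstack([np.asarray(f, dtype=float) for f in forecasts])
    if weights is None:
        weights = np.ones(len(forecasts))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    return weights @ stacked

# Made-up forecasts standing in for the output of two fitted models
prophet_forecast = [100.0, 110.0, 120.0]
sarimax_forecast = [90.0, 115.0, 130.0]

blended = blend_forecasts([prophet_forecast, sarimax_forecast])
weighted = blend_forecasts([prophet_forecast, sarimax_forecast], weights=[3, 1])

print(blended)
print(weighted)
```

Weights are usually chosen from how each model performed on held-out data, not picked by hand.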

Model Evaluation

Choosing the best model means testing them to see which one predicts the best. Here's how to do it:

  1. Split your past sales data by time into two parts: an earlier stretch for training and the most recent stretch for testing

  2. Train your models on the first part

  3. Test them on the second part

  4. Look at how close their predictions are to what actually happened

Models that get closer to the real numbers are doing a better job. Also, it's a good idea to check how they do in different 'what-if' scenarios to make sure they're reliable.
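The steps above can be sketched with a time-ordered split and two simple baselines that any real model should beat. The data here is synthetic (a repeating weekly pattern), and the 14-day test window is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales: a repeating weekly pattern (illustrative only)
dates = pd.date_range("2023-01-01", periods=56, freq="D")
weekly_pattern = np.array([100, 90, 95, 105, 120, 150, 140], dtype=float)
sales = pd.Series(np.tile(weekly_pattern, 8), index=dates)

# Chronological split: earlier data for training, the last 14 days for testing
test_days = 14
train, test = sales.iloc[:-test_days], sales.iloc[-test_days:]

# Baseline 1: repeat the last observed value
naive = np.repeat(train.iloc[-1], test_days)

# Baseline 2: repeat the last full week (seasonal naive)
seasonal_naive = np.tile(train.iloc[-7:].to_numpy(), test_days // 7)

def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

print("Naive MAE:", mae(test, naive))
print("Seasonal naive MAE:", mae(test, seasonal_naive))
```

Splitting by time instead of at random matters because a random split would let the model train on days that come after the ones it is tested on.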

Operationalizing Forecasts

To make these predictions useful for business, we need to:

  • Make the process of making predictions smooth and automatic

  • Keep an eye on important metrics and dashboards

  • Notice when predictions are way off from what really happens

  • Set up warnings for when things like stock levels need a quick look

  • Share this info with the teams that plan what to buy and sell

By following these steps, predictions can help manage store inventory, plan the supply chain, and make sure we have just the right amount of stock.
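As one possible way to notice when predictions are way off, the sketch below flags days where the forecast missed actual sales by more than a chosen percentage. The function name flag_forecast_misses, the 20% default threshold, and the sample numbers are all assumptions for illustration.

```python
import pandas as pd

def flag_forecast_misses(actual, forecast, threshold_pct=20.0):
    """Return rows where the forecast missed actual sales by more than threshold_pct."""
    report = pd.DataFrame({"actual": actual, "forecast": forecast})
    report["error_pct"] = (
        (report["forecast"] - report["actual"]).abs() / report["actual"].abs() * 100
    )
    return report[report["error_pct"] > threshold_pct]

idx = pd.date_range("2023-03-01", periods=4, freq="D")
actual = pd.Series([100, 120, 80, 200], index=idx)
forecast = pd.Series([105, 118, 110, 150], index=idx)

alerts = flag_forecast_misses(actual, forecast)
print(alerts)
```

Days with zero actual sales produce an infinite percentage error here and would be flagged, so a real monitor may want to handle those separately.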


Model Training & Evaluation

SARIMAX Model Configuration

To find the best setup for the SARIMAX model, we can use a method called grid search. This is like trying out a bunch of different settings to see which one gives us the best predictions. Here's a simple way to do it with some code:

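import itertools
import warnings
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Assumes train is a pandas Series of sales indexed by date.
# Hold out the last 28 days of train to score each candidate.
fit_part, val_part = train.iloc[:-28], train.iloc[-28:]

# Define parameter grid
p_params = [0, 1, 2]
d_params = [0, 1]
q_params = [0, 1, 2]

# Grid search: fit each (p, d, q) order and keep the lowest validation MSE
best_order, best_mse = None, float('inf')
for order in itertools.product(p_params, d_params, q_params):
    try:
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # hide convergence warnings
            # Weekly seasonality (1, 1, 1, 7) is an assumption; adjust to your data
            fitted = SARIMAX(fit_part, order=order, seasonal_order=(1, 1, 1, 7)).fit(disp=False)
        mse = mean_squared_error(val_part, fitted.forecast(steps=len(val_part)))
    except Exception:
        continue  # skip orders that fail to fit
    if mse < best_mse:
        best_order, best_mse = order, mse

# Best model
print(best_order)
print(best_mse)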

We can look at different combinations of settings (like how we look at past data and trends) to find the best one. The aim is to get the lowest error, meaning our predictions are as close as possible to what actually happens.

Model Comparison

To figure out which forecasting model is the best, we compare them by seeing how well they predict sales we already know about. We use measures like how far off the predictions are from the real numbers. Here are some ways to see which model does better:

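  • sMAPE: This tells us how far off our predictions are, as a percentage of the average of the actual and predicted values.

  • MASE: This compares our model's errors to those of a naive forecast, such as repeating the last observed value. A score below 1 means the model beats the naive forecast.

  • MSE: This is the average of the squared differences between predictions and actual values; squaring makes big errors stand out.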

Here's an example of how to do this comparison:

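import numpy as np
import pandas as pd
from scipy.stats import ttest_rel

# Make predictions for the test period.
# Assumes sarimax is a fitted statsmodels results object, prophet is a fitted
# Prophet model, and test is a Series of actual sales indexed by date.
y_true = test.to_numpy(dtype=float)
y_pred_sarimax = np.asarray(sarimax.forecast(steps=len(test)))
y_pred_prophet = prophet.predict(pd.DataFrame({'ds': test.index}))['yhat'].to_numpy()

# Evaluate sMAPE (undefined when actual and predicted are both zero)
def smape(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred) / ((np.abs(y_true) + np.abs(y_pred)) / 2)) * 100

print(f'SARIMAX sMAPE: {smape(y_true, y_pred_sarimax):.2f}')
print(f'Prophet sMAPE: {smape(y_true, y_pred_prophet):.2f}')

# Rough statistical check: paired t-test on the two models' absolute errors
err_sarimax = np.abs(y_true - y_pred_sarimax)
err_prophet = np.abs(y_true - y_pred_prophet)
p = ttest_rel(err_sarimax, err_prophet).pvalue
if p < 0.05:
    print('Significant difference in accuracy')
else:
    print('No significant difference')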

This helps us pick the model that's most likely to give us accurate predictions. Once we know which one is best, we can use all the data we have to get it ready for real-world use.
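MASE is listed among the metrics but not shown in code. A minimal sketch, assuming the usual definition (the model's mean absolute error divided by the mean absolute error of a naive forecast on the training data), might look like this. The season argument is an assumption that lets the naive forecast repeat the value from one season earlier; the sample numbers are made up.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error relative to a (seasonal) naive forecast on the training data."""
    y_true, y_pred, y_train = (
        np.asarray(a, dtype=float) for a in (y_true, y_pred, y_train)
    )
    # In-sample error of the naive forecast: each value predicted by the one `season` steps earlier
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)

y_train = [10, 12, 14, 16, 18]
y_true = [20, 22]
y_pred = [19, 24]

print(mase(y_true, y_pred, y_train))
```

A value below 1 means the model's errors are smaller than those of the naive forecast.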

Implementing the Forecasts

Automating Forecast Updates

To keep our demand forecasts fresh and useful, we can set up a system that automatically updates our predictions with the latest sales data. Here's a basic example of how to do this with Python code:

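import datetime
import joblib
import pandas as pd

# Load the existing history and the new sales data (file names are placeholders)
data = pd.read_csv('sales.csv', parse_dates=['date'])
new_data = pd.read_csv('sales_update.csv', parse_dates=['date'])

# Add it to the old data
data = pd.concat([data, new_data], ignore_index=True)

# Update the model with new data.
# Assumes a scikit-learn style model, a 'sales' target column,
# and that every other non-date column is a feature.
feature_cols = [c for c in data.columns if c not in ('date', 'sales')]
model = joblib.load('demand_model.pkl')
model.fit(data[feature_cols], data['sales'])

# Predict future demand; future_features.csv is a hypothetical file holding
# the same feature columns for the dates to forecast
future = pd.read_csv('future_features.csv', parse_dates=['date'])
predictions = pd.DataFrame({
    'date': future['date'],
    'forecast': model.predict(future[feature_cols]),
})

# Save the new predictions
predictions.to_csv('updated_forecasts.csv', index=False)

# Keep the model updated
joblib.dump(model, 'demand_model.pkl')

print(f'Updated forecasts on {datetime.date.today()}')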

We can make this script run by itself at set times, like every day or week, to make sure our demand forecasts are always based on the most recent data. You can use:

  • Cron jobs on Linux/Unix

  • Windows Task Scheduler

  • Cloud services like AWS Lambda

This keeps our predictions based on the most recent data, so they don't go stale.

Consuming Forecasts

To make the most of our demand forecasts, we need ways to share and use them. Here are some ideas:

Dashboards

  • Compare what we predicted to what actually happened

  • Show which products are expected to sell the most

  • Alert us if we're likely to sell less than we thought

Reports

  • Send emails with predictions to people who plan what to stock

  • Point out which products need more attention

  • Show how different areas are doing compared to each other

Alerts

  • Let us know if real sales are way off from what we thought

  • Warn when we might run out of stock

  • Keep an eye on whether we're hitting our sales targets for special times of the year

APIs

  • Let other computer systems use our forecasts

  • Connect with systems that help manage our stock

  • Automatically take action based on our forecasts

By doing this, everyone from the supply chain to marketing can make smarter decisions based on our forecasts. Keeping an eye on how accurate our predictions are also helps us catch any problems early.
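As an illustration of the "warn when we might run out of stock" idea, the sketch below compares cumulative forecast demand with the stock on hand and returns the first date on which stock would run short. The function name first_stockout_date and the numbers are hypothetical; a real check would also account for incoming deliveries.

```python
import pandas as pd

def first_stockout_date(on_hand, daily_forecast):
    """Return the first date on which cumulative forecast demand exceeds stock, or None."""
    cumulative = daily_forecast.cumsum()
    short = cumulative[cumulative > on_hand]
    return short.index[0] if not short.empty else None

# Made-up four-day demand forecast
forecast = pd.Series(
    [40, 45, 50, 60],
    index=pd.date_range("2023-06-01", periods=4, freq="D"),
)

print(first_stockout_date(120, forecast))
print(first_stockout_date(500, forecast))
```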

Conclusion

Making good estimates of future product sales matters a lot for consumer goods companies. We've gone through the steps to use Python to make these estimates more accurate:

Data preparation

  • First, we need to gather information about what people have bought before, along with how much things cost, if there were any sales, what the products are like, and even when and where they were bought.

  • Then, we clean up this information by fixing any missing or weird parts and making sure everything is in a format the computer can understand.

Exploratory data analysis

  • We look closely at the sales data to find patterns, like when people buy more or less stuff.

  • We also check how things like holidays or special events affect what people buy.

Feature engineering

  • We create new data points from the existing information to help the computer understand trends over time and the impact of specific dates better.

Model building and evaluation

  • We try out different methods (SARIMAX, Prophet, XGBoost) to see which one does the best job at predicting future sales.

  • We fine-tune our best method by testing different settings and seeing how well it does with test data.

Operationalization

  • We set up a system that keeps our predictions up-to-date by automatically adding new sales data.

  • We share these predictions in ways that are easy to understand, like charts and alerts, and make sure they're useful for planning what to stock.

The goal is to keep making our predictions better by using more data, creating useful new data points, and adjusting our methods. These predictions help with planning what products to have on hand, managing store inventory, and making smart decisions about the supply chain. Python is great for this because it's good at handling data, making charts, and even using machine learning for smarter predictions.

How to do demand forecasting in Python?

To forecast demand using Python, you can use several handy tools:

  • Pandas for organizing and looking at your data

  • NumPy for fast numerical calculations on arrays

  • Statsmodels for figuring out trends and making predictions

  • Scikit-learn for trying out different prediction methods

  • Prophet, a tool made by Facebook for forecasting

Here's a simple way to do it:

  • Load your past sales data into Python

  • Make charts to see how sales have changed over time

  • Add new info to your data, like averages over time or info about seasons

  • Pick a forecasting method (like SARIMAX, Prophet, or XGBoost) and teach it with your data

  • Check how good your method is by seeing if it can guess past sales right

  • Use the best method to guess future sales

It's important to include things like trends, seasonal changes, and special events in your predictions. Also, regularly updating your predictions with new sales data helps keep them accurate.

Can Python be used for forecasting?

Yes, Python is great for forecasting things like how much you'll sell. It has a bunch of tools for making accurate guesses about the future.

Some tools you might use include:

  • Statsmodels for trend-based predictions

  • Scikit-learn for using past data to predict future sales

  • Keras for more complex predictions with deep learning

  • Prophet for easy forecasting that considers trends and seasons

Python also helps you get your data ready, find the best forecasting method, and share your predictions in a way that's easy to understand.

How do you forecast consumer demand?

To guess consumer demand well, follow these steps:

  • Gather all the sales data you have, along with info on prices, special deals, and anything else that might affect sales.

  • Make sure your data is clean and ready to use.

  • Use charts to spot trends and see how things like holidays affect sales.

  • Add new info to your data, like past sales or average sales, to help make better predictions.

  • Try out different ways to predict sales, like using machine learning.

  • Pick the method that does the best job and use it for your predictions.

  • Make sure to keep your predictions up-to-date by regularly adding new sales data and checking how accurate they are.

  • Use your predictions to help plan inventory, alert you when stock is low, and make smart business decisions.

The trick is to use both stats and machine learning to handle the tricky parts of predicting demand. Trying out different methods and combining their predictions often works best.

Which algorithm is best for demand forecasting?

Here are some top choices for predicting demand:

  • ARIMA (or its seasonal version, SARIMA): Good for when you have clear trends and, with the seasonal version, seasonal changes.

  • Prophet: Easy to use and good for spotting trends and seasonal patterns.

  • LSTM: Great for finding complex patterns in your sales data.

  • Random Forest: Useful when you have lots of different things affecting demand.

  • Gradient Boosting (GBM): Good for understanding how different factors work together to affect sales.

Mixing different methods together usually gives you the best guess at future demand. Keeping an eye on how well your predictions are doing and updating them regularly is also key.
