By Josh Callaway

Dear Reader,

Many industries collect and maintain data. You might think of the shipping industry, where large SQL databases log product inventory and track the number of items shipped monthly or perhaps weekly. Large retailers such as Walmart and Best Buy, and the multitude of stores in large malls, also maintain relational databases of their product transactions. Perhaps you have a favorite restaurant down the street that tracks weekly customer volume.

**How might these businesses use this data to improve their overall system?**

Well, a restaurant can monitor customer behavior in order to associate volume patterns with times of the year. Be they daily, weekly, monthly, quarterly, or yearly behavioral patterns, the restaurant can model them and project volume for the next week, month, quarter, or year. In this way, management can utilize this insight for staffing purposes. In essence, they can save money by avoiding over-staffing and boost customer satisfaction by alleviating the under-staffing dilemma.

**Data Frames & Time Series**

Now, these businesses and large companies have large databases of multiple related tables. But how might the data look after being cleaned and prepared for analysis? Well, typically, we see 2 types of formats: a data frame or a time series. Let's take a look at a few examples of the 2. Take a look below at 4 different data sets (taken from the **fpp package**, developed by **Rob J Hyndman** using **R software**): anti-diabetic drug sales in Australia, beer production in Australia, international visitors to Australia, and air passengers on Ansett Airlines. Right away, as discerned from the first column containing "Date", we see that all 4 of these data sets are arranged chronologically in a time sequence; therefore, these are time series data. At a closer glance, notice the difference between all 4. Each contains a different time increment in the "Date" column, and we refer to this as periodicity (i.e. periodicity can be daily, weekday, weekly, monthly, quarterly, or yearly). Starting from the left, we see monthly drug sales, then quarterly beer production, yearly international visitors and, finally, weekly air passengers. Also notice that the first 3 contain only 1 measure, while the 4th time series contains 3 measures: First Class, Business Class, and Economy Class. Keep this in mind, as our forecasting solution can take one or multiple measures (which we call instruments) as input.

So, here below is another time series set (also from *fpp*) containing multiple instruments (this time 8; i.e. visitor nights for 8 regions in Australia) measured quarterly. **PendulumRock Forecasting Solution** will easily take this data with all 8 instruments as input for a forecast analysis.

**…and the Data Frame**

Ok, so we have seen enough examples of the composition of a time series data set. What about the other type, a data frame? With this format, data is stored for many different metrics (i.e. price, number of rooms, average neighborhood income, and number of stories for housing data) with less emphasis placed on time. So, the data does not necessarily have to be in a chronological sequence. With this type of data, we would not be curious to detect a behavioral pattern based on the time of year. Rather, we are looking for how the value of one metric (i.e. housing price) varies with the value of another metric (i.e. number of rooms). So, we might compare the price of houses with few rooms to the price of those with several rooms. Take a look at the credit data below (also taken from the *fpp* package).

We see what might be a customer ID in the far left corner, but no column for date. Hence, this is not time-ordered data, but a frame of different observations (customers) with several measures/variables per observation: score, savings, income, full-time employment, single status, time at current address, and time with current employer. Since we are not looking at patterns over time, we might build a model to predict credit score based on the values of the other variables. So, we use a machine learning technique such as regression.
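Under the hood, such a regression can be as simple as ordinary least squares. Here is a minimal pure-Python sketch with made-up numbers (one predictor instead of the full set of credit variables, purely for illustration; these are not the real *fpp* credit figures):

```python
# Toy customer data -- illustrative numbers, NOT the real fpp credit set.
incomes = [32.0, 41.0, 55.0, 73.0, 90.0]   # predictor (e.g. yearly income, $k)
scores = [40.0, 47.0, 61.0, 75.0, 88.0]    # response (credit score)

# Ordinary least squares by hand:
#   slope = cov(x, y) / var(x),  intercept = mean(y) - slope * mean(x)
n = len(incomes)
mean_x = sum(incomes) / n
mean_y = sum(scores) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(incomes, scores))
         / sum((x - mean_x) ** 2 for x in incomes))
intercept = mean_y - slope * mean_x

# Predicted credit score for a new customer earning 60:
predicted = intercept + slope * 60.0
```

A real analysis would use several predictors at once (multiple regression), but the idea is the same: relate one metric's value to the others, with no time axis involved.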

**Analytics**

You can see the depiction below of different analytical methods used for different data formats (namely data frames and time series here).

Remember, here at PendulumRock, we specialize in what you see on the far right, as our forecasting solution is built for demand, sales, inventory, vacancies, and any other numerical measure. In other words, our solution is not concerned with the effect of one measure on another. Our solution projects future values of one measure based solely on its own historical behavioral pattern. This pattern usually includes a mixture of the following components: trend, seasonality, cyclicality, and autocorrelation (or the effect of previous values on the current value).

**The Problem**

Now, many companies in industry today are already using forecasting. So, you ask, what is the actual problem? And why would anyone decide to go with the PendulumRock Forecasting Solution? Well, in this particular situation, most companies face 1 of 2 problems:

- They pay out thousands to millions in licensing costs for commercial software.
- They don't have the analytical ability to leverage this type of information to improve their business processes (i.e. they don't have the software or people available to use these methods).

**PendulumRock Forecasting Solution**

Our solution solves these issues in several ways:


- Our service is offered at a fraction of the cost of purchasing commercial software.
- We already have the system developed.
- The solution is offered on a pay-per-forecast basis (per instrument).

Therefore, the company has no need to purchase expensive software, thus saving time and money. Ok, so you like the sound of what we have, but you are thinking you can build your own model. Well, therein lies the problem: there are **many models** that fit data patterns with various behaviors. Depending on the behavioral nature of the data in question, one model will forecast points into the future more accurately than others. Getting an accurate model heavily depends on selecting one that correctly illuminates the components inherent in the data. Taking these components into account, forecasts may be based on:

- Average of all values
- Simply the previous value
  - With the option to stray up or down from the previous value
- Weighted average of all previous values & levels (no trend or seasonality; only level)
  - With more emphasis placed on recent time periods (days, weeks, months, etc.)
- Trend, cycle, & seasonality
  - Can be determined by smoothing techniques (moving averages, locally weighted regression)
  - Trends may be constant or may follow a growth/decay rate
  - Seasonality may follow constant fluctuations, changing fluctuations, or increase/decrease in magnitude with each new turn of season
- Differencing a series that is non-stationary
  - Subtracting each observation's previous value from it (first differencing)
- Weighted moving average of error components from recent observations
- Autoregressive behavior – the dependency of future values on the most recent observations
- Some combination of all of the above
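A few of these model families are simple enough to compute by hand. Here is a rough Python sketch with made-up numbers (these toy rules are generic textbook methods, not PendulumRock's actual implementations):

```python
# Toy series of, say, weekly sales (made-up numbers).
y = [10.0, 12.0, 11.0, 13.0, 14.0, 13.5]

# Average of all values ("mean" method).
mean_fcst = sum(y) / len(y)                       # -> 12.25

# Simply the previous value ("naive" method).
naive_fcst = y[-1]                                # -> 13.5

# Previous value plus the average historical step ("drift" -- strays up/down).
drift_fcst = y[-1] + (y[-1] - y[0]) / (len(y) - 1)  # -> 14.2

# Weighted average with more weight on recent periods
# (simple exponential smoothing with smoothing factor alpha).
alpha, level = 0.5, y[0]
for obs in y[1:]:
    level = alpha * obs + (1 - alpha) * level
ses_fcst = level                                  # -> 13.25

# First difference, used to make a non-stationary series stationary.
diffs = [b - a for a, b in zip(y, y[1:])]         # -> [2.0, -1.0, 2.0, 1.0, -0.5]
```

Autoregressive and moving-average error terms (the ARIMA family) build on the same ingredients but require fitted coefficients, which is where automated tooling earns its keep.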

It is very hard to determine the components inherent in a time series without graphical depictions, and these depictions take time to produce. On top of this, building an analysis for every model type on the same data and then comparing accuracies is extremely time-consuming and unproductive. That is why we have automated the model-building process, which expediently fits any given numerical time series to 17 of the most commonly used forecasting models in analytics. The tool then assesses the forecasting accuracy of every model (based on mean absolute percentage error, or **MAPE**), and chooses the model with the least error. Naturally, this will be the model that best exemplifies the **components** inherent in the data (i.e. trend, seasonality, cyclicality, autocorrelation). Ok, so you understand, are tired of the discussion, and are ready to see our solution in action! Let's walk through an example.

**Airline Passenger Data**

A very commonly cited example in time series analysis is the classic Box & Jenkins airline data. This includes the monthly totals of international airline passengers (in thousands) from 1949 to 1960. See the time series data set (taken from the *datasets* package in R software) below.

In the data, you can see the years listed in ascending order from top to bottom in the left-most column (1949-1960) and the months of the year listed from left to right in the row header. Each cross-section of a given year and month represents the number of airline passengers for that year and month. For example, there were 347 thousand passengers in October 1957. This information is nice, but let's see a visual depiction to better detect any time patterns in these monthly passenger numbers from 1949-1960.

At a glance, there is a clear observable pattern in the data. How can we use this pattern to predict airline passenger bookings for, say, 10, 15, or 20 months into the future? Well, first, we need to discern exactly how many and which patterns actually exist. Looking again at the chart above, there appear to be 3 distinct patterns:


- The number of air passengers trends upward over time (i.e. there is a level trending up).
- In January and February of every year, the number of passengers is low; then there are 1-2 short peaks in the spring, followed by a large peak in July & August. From there, the number of passengers drops back down into autumn-season lows (October & November). Hence, there is a seasonal pattern in this data, and we might expect this, as more people travel during the spring break & summer vacation months and work throughout the winter months.
- The magnitude of the seasonal pattern grows at an increasing rate every year.

Okay, so let’s break these patterns down individually so that we may better quantify them independent of each other. Taking a look at the figure below, we see 4 panels of plots from top to bottom:

- Original graph of number of passengers on the y-axis plotted against time on the x-axis
- Seasonal component on the y-axis plotted against time on the x-axis
- Trend component on the y-axis plotted against time on the x-axis
- Noise left over (after extracting seasonal & trend components) plotted against time
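The trend/seasonal/noise split in those panels comes from a classical decomposition. Here is a minimal pure-Python sketch of the additive version (toy series with period 4, not the airline data; real tools such as R's `decompose()` do the same job at scale):

```python
# Hand-rolled classical additive decomposition: observed = trend + seasonal + noise.
period = 4
y = [10, 20, 15, 5, 12, 22, 17, 7, 14, 24, 19, 9]  # level rises, pattern repeats

# Trend: centered moving average over one full period
# (for an even period, half-weight the two endpoints -- the classical 2x4 MA).
half = period // 2
trend = [None] * len(y)
for t in range(half, len(y) - half):
    window = y[t - half:t + half + 1]
    trend[t] = (window[0] / 2 + sum(window[1:-1]) + window[-1] / 2) / period

# Seasonal: average detrended value at each position within the cycle.
detrended = [y[t] - trend[t] for t in range(len(y)) if trend[t] is not None]
positions = [t % period for t in range(len(y)) if trend[t] is not None]
seasonal = {p: sum(d for d, q in zip(detrended, positions) if q == p)
               / positions.count(p)
            for p in range(period)}

# Remainder (noise): what's left after removing trend and seasonal parts.
remainder = [y[t] - trend[t] - seasonal[t % period]
             for t in range(len(y)) if trend[t] is not None]
```

On this toy series the remainder is essentially zero, because the data were built from a clean trend plus a repeating seasonal pattern; real data leaves genuine noise in the bottom panel.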

**PendulumRock Demo & Analysis on Airline Passenger Data**

Ok, so it is evident from the exploration above that patterns do exist in this data. Now, if you remember from our solution description earlier, our system tries 17 of the most widely implemented time series models and then chooses the one with the best accuracy. Before our solution takes the data as input, it is in the format to the right as a *.csv* file (*passengers.csv*).

The solution delivers 3 items for the client:

**1) Microsoft Excel Data Set (.csv format) of Forecasts**

allfcsts.csv

Notice the monthly forecasts for 10 months (the horizon) into the future from January 1961 to October 1961. Our solution can forecast data that is daily, weekday, weekly, monthly, quarterly, or yearly, and it can take any horizon (i.e. 5 days, 10 weeks, 7 months, 5 years, etc. into the future).
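As a small illustration of what a "horizon" means in practice, here is a plain-Python sketch that rolls a monthly date index 10 steps past the last observation (December 1960); the real solution's internals are not shown here:

```python
# Generate a 10-step monthly horizon after the last observation (Dec 1960).
last_year, last_month = 1960, 12
horizon = 10

future = []
year, month = last_year, last_month
for _ in range(horizon):
    month += 1
    if month > 12:              # roll over into the next year
        year, month = year + 1, 1
    future.append(f"{year}-{month:02d}")
# future runs from "1961-01" through "1961-10"
```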

**2) Microsoft Word Table of Forecasts**

allfcsts.doc

**3) Microsoft Word Analytical Report (discussed below)**

passengers.doc

The model chosen was the **Holt-Winters' Multiplicative Exponential Smoothing** (HW Multip ES) method, which includes 2 **components**:

- Additive trend
- Multiplicative seasonality

The forecasts for 10 months into the future are shown by the blue line. The shaded areas around the blue line represent 80% & 95% confidence intervals. So, you can see that the forecasts fit the pattern rather well. They trend upward additively and show seasonality with higher-magnitude fluctuations than in the previous seasons.
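Holt-Winters exponential smoothing maintains a level, a trend, and a set of seasonal factors, updating each as new observations arrive. Here is a minimal sketch with an additive trend and multiplicative seasonality (made-up series and fixed smoothing weights; a real fit would optimize alpha/beta/gamma rather than hard-code them):

```python
# Minimal Holt-Winters sketch: additive trend + multiplicative seasonality.
alpha, beta, gamma, m = 0.3, 0.1, 0.2, 4   # smoothing weights; m = season length

y = [12.0, 20.0, 16.0, 8.0, 15.0, 25.0, 20.0, 10.0]  # toy series, 2 seasons

# Crude initial state: level from the first season, trend from the change
# between seasons, seasonal factors as ratios against the first-season mean.
level = sum(y[:m]) / m
trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
season = [v / level for v in y[:m]]

for t in range(m, len(y)):
    s = season[t % m]
    last_level = level
    level = alpha * (y[t] / s) + (1 - alpha) * (level + trend)   # smoothed level
    trend = beta * (level - last_level) + (1 - beta) * trend     # additive trend
    season[t % m] = gamma * (y[t] / level) + (1 - gamma) * s     # multiplicative season

# h-step-ahead forecast: (level + h * trend) scaled by the matching seasonal factor.
h = 1
forecast = (level + h * trend) * season[(len(y) + h - 1) % m]
```

The multiplicative seasonal factors are what let the fluctuations grow with the level, matching the widening seasonal swings in the airline data.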

Remember that our solution chooses the model with the greatest **accuracy**. Well, that accuracy is measured by the Mean Absolute Percentage Error, better known as the MAPE. In a good model, we are looking for a MAPE that is very low. So, also in our analytical report are the figure and table you see below. Notice the 15 models listed on the y-axis and the MAPE range on the x-axis. You can see that the HW Multip ES model is listed at the bottom with the lowest MAPE. Hence, our solution fit all 15 models you see on the left and chose HW Multip ES because it had the greatest accuracy (lowest MAPE).

We take a final look at the time series plotted with the forecasts again, except this time with forecasts at all past values based on the selected model. It is quite evident that the chosen model fits the pattern of the data well. We can be confident in using it to predict future values.
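The selection logic itself is straightforward to sketch: score each candidate model on held-out data by MAPE and keep the winner. A toy Python illustration (the two candidate rules here are generic stand-ins, not the actual 17 models):

```python
# Score candidate forecast rules on a held-out window; keep the lowest MAPE.
train = [100.0, 110.0, 105.0, 115.0, 120.0]
test = [125.0, 130.0]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

candidates = {
    "naive": [train[-1]] * len(test),                     # repeat the last value
    "drift": [train[-1] + (i + 1) * (train[-1] - train[0]) / (len(train) - 1)
              for i in range(len(test))],                 # follow the average step
}
scores = {name: mape(test, fcst) for name, fcst in candidates.items()}
best = min(scores, key=scores.get)   # the model with the lowest MAPE wins
```

On this toy data the drift rule extrapolates the held-out values exactly, so it wins; with real data every candidate carries some error and the margins are much narrower.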

**The Wrap-Up**

This was just one instrument (monthly totals of airline passengers). However, PendulumRock Forecasting Solution can input a data set with multiple instruments and output the same 3 **deliverables**:

**1) Microsoft Excel Data Set (.csv format) of Forecasts (with all instruments)**
**2) Microsoft Word Table of Forecasts (with all instruments)**
**3) Microsoft Word Analytical Report (analyses for all instruments)**

But, we will discuss forecasting multiple instruments another day.

For now, just know that we are offering a **free trial** for any **start-up or larger business**. Just get in touch with us, and we will discuss what company data you would like to forecast. We respect the data integrity of all clients and will not share any data unless permitted (this can be settled in a contract). If the solution helps your company save money and/or capture profit, then we will discuss payment options depending on the number of instruments you want forecasted (pay per instrument). We will also need you to fill out a **questionnaire** to determine the nature of your data (file format, how frequently data is recorded, etc.), the type of forecasts you want (daily, weekly, monthly, etc.), and the horizon you want (5 weeks, 4 days, 10 quarters, etc. into the future).

So, **contact us** and let's see how we can improve your business!

Josh Callaway, editor

*PendulumRock Forecasting Solution*