Ideas on a Forecasting Problem
Our take
Predicting retail and e-commerce demand during short festive peaks is hard. The post below asks how to forecast synthetic demand, defined as actual sales plus sales lost to stockouts, so that inventory can be planned for those peaks. The data constraints are tight: the business-as-usual (BAU) and festive periods are only 7 days long each, so traditional forecasting methods that rely on historical averages are unlikely to capture how sharply consumer behavior shifts during these windows.
The central challenge is imputing lost demand when an item goes out of stock (OOS), because festive demand profiles vary significantly by hour and by day. The proposed strategy is to use search volume as a proxy for raw demand, which is practical because search sessions are still recorded while an item is OOS. By deriving a contextual search-to-sale conversion rate (CVR) from in-stock search sessions, the project aims to convert the searches observed during OOS hours into lost units. The approach depends on combining several datasets, chiefly the OOS records and the search data.
As we consider the implications of this approach, it's essential to recognize the importance of modeling techniques that can handle both categorical and temporal features effectively. The suggestion to explore methods like random forest proximity matrices or even LightGBM indicates a forward-thinking mindset that embraces machine learning's capabilities in tackling complex forecasting problems. The need to quantify relationships among heavily categorical and temporal combinations speaks to a broader trend in the industry toward more sophisticated analytics that can provide actionable insights. This complexity is not just a technical hurdle; it reflects the dynamic nature of consumer behavior and the necessity for businesses to remain agile in their response strategies.
Ultimately, the exploration of innovative forecasting methodologies during peak periods marks a significant step forward for retailers and e-commerce platforms. As businesses seek to minimize lost sales and enhance customer satisfaction during critical times, the focus must remain on integrating diverse data sources and employing advanced analytics. This not only empowers businesses to make informed decisions but also fosters a deeper understanding of customer needs and market trends. Looking ahead, one question worth pondering is how these advanced forecasting techniques will evolve as artificial intelligence and machine learning continue to mature. Will we see a shift towards even more real-time data processing capabilities that allow businesses to adapt instantaneously to consumer demand fluctuations? The exploration into this space will undoubtedly shape the future of retail and e-commerce forecasting.
Hi everyone,
I'm working on a retail/e-commerce forecasting project where we need to predict synthetic demand (actual sales + lost sales due to stockouts) during peak festival times.
We are trying to calculate the lost demand when an item goes Out of Stock (OOS), but the extreme volatility of the short festive window is making standard historical imputation impossible.
The Data We Have:
Periods: Last Year BAU, Last Year Festive, Current Year BAU.
Constraint: The BAU and Festive periods we are looking at are only 7 days long each.
Sales Data: Store + SKU level across all these periods.
OOS Records: Flagged at the Hour + Day + Store + SKU level.
Search Data: Search sessions at the day + hour + store level in which the specific SKU (or its parent L3 category) was present/impressed.
Features available: store, sku, day, hour, store_cluster, category, subcategory, l3_category, city.
The Core Problem:
Because the festive period is only 7 days, every single day and hour has a completely different demand profile. For example, the conversion rate for an item on "Festival Day minus 1 at 8 PM" is drastically different from "Festival Day at 8 PM" or even 2 PM on the same day. Because of this intra-day and day-to-day volatility, we can't just take a simple historical average of the previous day or week to impute demand when an item is OOS.
Our Current Idea:
Since we still capture search sessions when an item is OOS, we want to use search volume as our proxy for raw demand. To convert those searches into "lost units," we need to predict a highly contextual Search-to-Sale Conversion Rate (CVR).
When a Store-SKU is OOS at a specific day/hour, we want to find its "Nearest Neighbors" based on the categorical and temporal features mentioned above, and do a distance-weighted average of their In-Stock search-to-sale CVRs. We then multiply this imputed CVR by the actual search sessions observed during that OOS hour.
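To make the idea concrete, here is a minimal sketch in plain Python, not a tested implementation. Everything in it is an assumption for illustration: the toy rows in IN_STOCK, the day_offset field (days relative to the festival day), the Gower-style distance that averages per-feature dissimilarities, inverse-distance weights, and k=3. Only the overall recipe (find nearest in-stock neighbors, take a distance-weighted average of their CVRs, multiply by the searches seen during the OOS hour) comes from the description above.

```python
# Hypothetical in-stock observations: searches and units sold per context.
IN_STOCK = [
    {"store_cluster": "A", "l3_category": "sweets", "day_offset": -1, "hour": 20, "searches": 200, "units": 30},
    {"store_cluster": "A", "l3_category": "sweets", "day_offset": 0, "hour": 20, "searches": 300, "units": 90},
    {"store_cluster": "B", "l3_category": "sweets", "day_offset": 0, "hour": 14, "searches": 100, "units": 10},
    {"store_cluster": "A", "l3_category": "snacks", "day_offset": 0, "hour": 19, "searches": 150, "units": 15},
]

# Assumed subset of the categorical features; extend as needed.
CATEGORICAL = ("store_cluster", "l3_category")


def hour_gap(h1, h2):
    """Circular distance between two hours of the day, scaled to [0, 1]."""
    diff = abs(h1 - h2) % 24
    return min(diff, 24 - diff) / 12.0


def distance(a, b, max_day_gap=6):
    """Gower-style distance: mean of per-feature dissimilarities in [0, 1]."""
    parts = [0.0 if a[f] == b[f] else 1.0 for f in CATEGORICAL]
    day_gap = min(abs(a["day_offset"] - b["day_offset"]), max_day_gap)
    parts.append(day_gap / max_day_gap)
    parts.append(hour_gap(a["hour"], b["hour"]))
    return sum(parts) / len(parts)


def impute_cvr(target, rows, k=3, eps=1e-6):
    """Inverse-distance-weighted average CVR of the k nearest in-stock rows."""
    scored = sorted(
        ((distance(target, r), r) for r in rows if r["searches"] > 0),
        key=lambda pair: pair[0],
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in scored]
    cvrs = [r["units"] / r["searches"] for _, r in scored]
    return sum(w * c for w, c in zip(weights, cvrs)) / sum(weights)


def lost_units(target, rows, k=3):
    """Imputed CVR times the search sessions observed during the OOS hour."""
    return impute_cvr(target, rows, k) * target["searches"]


# Hypothetical OOS cell: festival day, 9 PM, 120 search sessions recorded.
oos_cell = {"store_cluster": "A", "l3_category": "sweets", "day_offset": 0, "hour": 21, "searches": 120}
estimate = lost_units(oos_cell, IN_STOCK)
```

The equal per-feature weighting inside distance is the weakest assumption here; it treats a store_cluster mismatch as exactly as important as a 12-hour gap, which is the very thing the first question below is asking how to choose.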
My Questions for the Experts:
What is the best metric to quantify the relationship/distance between these heavily categorical and temporal combinations? (e.g., Target encoding + Euclidean distance? Random Forest proximity matrix?)
How would you handle the cyclical/temporal features (day, hour) alongside the search session volume so the model understands the specific urgency of a festive timeline without suffering from massive data sparsity?
Is there a completely different architecture (like LightGBM directly predicting lost sales using search volume as a feature) you would recommend over this KNN/distance-based CVR imputation?
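For context on the second question, one common way to represent the hour is a sine/cosine pair, so that 11 PM and midnight end up close together. The sketch below is an illustration under assumptions, not a recommendation: day_offset (days relative to the festival day) and the log scaling of search sessions are hypothetical choices, and the resulting vector could feed either a distance-based lookup or a tree model.

```python
import math


def encode_hour(hour):
    """Map hour 0-23 onto the unit circle so 23:00 and 00:00 are adjacent."""
    angle = 2 * math.pi * (hour % 24) / 24
    return math.sin(angle), math.cos(angle)


def encode_row(day_offset, hour, searches):
    """Feature vector: festival-relative day, cyclical hour, log-scaled searches."""
    hour_sin, hour_cos = encode_hour(hour)
    return [float(day_offset), hour_sin, hour_cos, math.log1p(searches)]


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hours 23 and 0 are neighbors on the circle; hours 0 and 12 are opposite.
near = euclidean(encode_hour(23), encode_hour(0))
far = euclidean(encode_hour(0), encode_hour(12))
```

Keeping day_offset as a plain number rather than a cyclical feature is deliberate in this sketch: "Festival Day minus 1" and "Festival Day" are positions on a countdown, not points on a repeating cycle.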
Would love to hear how you've tackled similar short-term, high-volatility lost sales problems.