Bayesian Changepoint Detection on Price Histograms

Author

Pavel Logačev

Published

Invalid Date

Scope

[This is work in progress.]

All model implementations in this repo are designed to detect change points in pricing time series data. All models are based on a set of assumptions outlined below: In brief, we will assume that the relative frequency of specific prices for a product on a given day depends on the underlying price regime, which is associated with a specific distribution of prices, and changes rather infrequently. The models in this repo are all designed to detect structural changes in pricing regimes based on daily price histograms. Changepoints occur when the pricing of a product undergoes a meaningful change — such as list price changes, promotions, or changes in the availability of discounts.

This document outlines the general model assumptions and logic. While none of the actual implementations in this repo actually implement all of them, all models work with some of the assumptions outlined here.

Theoretical Generative Model

We model the daily distribution of purchase prices for a product as dependent on the current latent price regime. Within each regime, there is assumed to be one list price, and one or multiple discounts, at which a product can be purchased. Discounts may have different amounts of availability: for instance, only 2% of customers may be eligible for a 10% discount, while only 1% of customers may be eligible for a 20% discount. Price regimes change occasionally. This can be due to changes of the list price, promotions, changes in the magnitude or availability of discounts, or multiple factors.

We assume that each day belongs to a specific pricing regime. At end of the day $t$, the regime either changes (with probability $\rho_t$) or remains the same (with probability $1 - \rho_t$). Each regime is characterized by a distribution over prices. We assume that it follows a multinomial distribution, such that the distribution of prices in pricing regime $k$ follows $\text{Multinomial}(\tau_k)$.

We encode a segmentation of a price time series of length $T$ using a changepoint indicator vector $z$ of length $T-1$, where $z_t=1$ indicates a changepoint between time $t$ and $t+1$, and $z_t=0$ indicates the absence of a changepoint. As a result, each indicator vector defines a set of $K$ segments, where $t_1^{(k)}$ and $t_2^{(k)}$ are the start and end points of segment $k$: \[ [t_1^{(1)},\, t_2^{(1)}], [t_1^{(2)},\, t_2^{(2)}],\quad \dots,\quad [t_1^{(K)},\, t_2^{(K)}] \] When the pricing regime changes, it needs to change significantly. Thus, we specify a prior $\theta$ over the difference between two adjacent price regime distributions, where the actual difference may be specified in any number ways, including a difference between means, the Wasserstein distance, etc.

The complete posterior distribution for the changepoint detection model combines the likelihood of the observed data with the prior distributions over the changepoint configuration and the regime differences. The posterior is given by:

\[ p(z, \theta, \tau \mid y_{1:T}) \propto p(y_{1:T} \mid z, \theta, \tau) \cdot p(z, \theta, \tau)\text{, where } \]

\[ p(z, \theta, \tau) = p(z) \cdot p(\theta) \cdot p(\tau)\text{, and where } \]

$p(y_{1:T} \mid z, \theta, \tau)$ is the likelihood of the observed data given the changepoint configuration $z$, the changepoint prior $\theta$, and the regime parameters $\tau$.
$p(z)$ is the prior probability of the changepoint configuration, modeled as a Bernoulli process with changepoint probability $\rho$.
$p(\theta)$ is the prior distribution over the magnitude of the price regime change, specified based on domain knowledge.
$p(\tau)$ is the prior distribution over the regime parameters $\tau$, which incorporates the belief about the differences between successive regimes.

It follows from the structure of the model that likelihood of $y_{1:T}$ is as below.

\[ p(y_{1:T} \mid z, \theta, \tau) = \prod_{k=1}^{K} \prod_{t=t_1^{(k)}}^{t_2^{(k)}} \text{Multinomial}(y_t \mid \tau_k) \]

The prior probability $p(z)$ of a changepoint configuration $z$ can be modeled using a Bernoulli process with a changepoint probability $\rho$. Assuming each $z_t$ is an independent Bernoulli random variable, the prior can be expressed as below, where $\rho$ is the probability of a changepoint occurring between any two consecutive time points.

\[ z \sim Bernoulli(\rho) \]

We further incorporate a prior for the differences between successive $\tau_k$ to capture the belief that changes in pricing regimes are typically abrupt rather than gradual, and that two adjacent pricing regimes need to be substantially different in order to justify positing a regime switch. In other words, it penalizes small differences between any two adjacent pricing regimes. One example specification is as below, where $\delta$ is any kind of distance function calculating absolute differences between two distributions.

\[ \delta(\tau_k, \tau_{k+1}) \sim Normal( \theta_{\mu}, \theta_{\sigma} ) \]

--- title: "Bayesian Changepoint Detection on Price Histograms" author: "Pavel Logačev" date: "r Sys.Date()" format: html: toc: true code-fold: true code-tools: true df-print: paged engine: knitr editor: visual --- ## Scope *[This is work in progress.]* All model implementations in this repo are designed to detect change points in pricing time series data. All models are based on a set of assumptions outlined below: In brief, we will assume that the relative frequency of specific prices for a product on a given day depends on the underlying price regime, which is associated with a specific distribution of prices, and changes rather infrequently. The models in this repo are all designed to detect structural changes in pricing regimes based on daily price histograms. Changepoints occur when the pricing of a product undergoes a meaningful change — such as list price changes, promotions, or changes in the availability of discounts. This document outlines the general model assumptions and logic. While none of the actual implementations in this repo actually implement all of them, all models work with some of the assumptions outlined here. ## Theoretical Generative Model We model the daily distribution of purchase prices for a product as dependent on the current latent price regime. Within each regime, there is assumed to be one list price, and one or multiple discounts, at which a product can be purchased. Discounts may have different amounts of availability: for instance, only 2% of customers may be eligible for a 10% discount, while only 1% of customers may be eligible for a 20% discount. Price regimes change occasionally. This can be due to changes of the list price, promotions, changes in the magnitude or availability of discounts, or multiple factors. We assume that each day belongs to a specific pricing regime. At end of the day $t$, the regime either changes (with probability $\rho_t$) or remains the same (with probability $1 - \rho_t$). Each regime is characterized by a distribution over prices. We assume that it follows a multinomial distribution, such that the distribution of prices in pricing regime $k$ follows $\text{Multinomial}(\tau_k)$. We encode a segmentation of a price time series of length $T$ using a changepoint indicator vector $z$ of length $T-1$, where $z_t=1$ indicates a changepoint between time $t$ and $t+1$, and $z_t=0$ indicates the absence of a changepoint. As a result, each indicator vector defines a set of $K$ segments, where $t_1^{(k)}$ and $t_2^{(k)}$ are the start and end points of segment $k$: $$ [t_1^{(1)},\, t_2^{(1)}], [t_1^{(2)},\, t_2^{(2)}],\quad \dots,\quad [t_1^{(K)},\, t_2^{(K)}] $$ When the pricing regime changes, it needs to change *significantly*. Thus, we specify a prior $\theta$ over the difference between two adjacent price regime distributions, where the actual difference may be specified in any number ways, including a difference between means, the Wasserstein distance, etc. The complete posterior distribution for the changepoint detection model combines the likelihood of the observed data with the prior distributions over the changepoint configuration and the regime differences. The posterior is given by: $$ p(z, \theta, \tau \mid y_{1:T}) \propto p(y_{1:T} \mid z, \theta, \tau) \cdot p(z, \theta, \tau)\text{, where } $$ $$ p(z, \theta, \tau) = p(z) \cdot p(\theta) \cdot p(\tau)\text{, and where } $$ - $p(y_{1:T} \mid z, \theta, \tau)$ is the likelihood of the observed data given the changepoint configuration $z$, the changepoint prior $\theta$, and the regime parameters $\tau$. - $p(z)$ is the prior probability of the changepoint configuration, modeled as a Bernoulli process with changepoint probability $\rho$. - $p(\theta)$ is the prior distribution over the magnitude of the price regime change, specified based on domain knowledge. - $p(\tau)$ is the prior distribution over the regime parameters $\tau$, which incorporates the belief about the differences between successive regimes. It follows from the structure of the model that likelihood of $y_{1:T}$ is as below. $$ p(y_{1:T} \mid z, \theta, \tau) = \prod_{k=1}^{K} \prod_{t=t_1^{(k)}}^{t_2^{(k)}} \text{Multinomial}(y_t \mid \tau_k) $$ The prior probability $p(z)$ of a changepoint configuration $z$ can be modeled using a Bernoulli process with a changepoint probability $\rho$. Assuming each $z_t$ is an independent Bernoulli random variable, the prior can be expressed as below, where $\rho$ is the probability of a changepoint occurring between any two consecutive time points.  $$ z \sim Bernoulli(\rho) $$ We further incorporate a prior for the differences between successive $\tau_k$ to capture the belief that changes in pricing regimes are typically abrupt rather than gradual, and that two adjacent pricing regimes need to be substantially different in order to justify positing a regime switch. In other words, it penalizes small differences between any two adjacent pricing regimes. One example specification is as below, where $\delta$ is any kind of distance function calculating *absolute* differences between two distributions. $$ \delta(\tau_k, \tau_{k+1}) \sim Normal( \theta_{\mu}, \theta_{\sigma} ) $$