Advanced Customer LTV prediction in DataVault
Why we build a prediction use case
Prediction use cases help Growth FullStack customers get more out of their ETL and BI stacks through predictive analytics, answering questions such as “What’s next?”, “How soon?” and “How profitable?”
Our Boosted Tree Regression Models are a part of the prediction use cases. They help us predict Day 7 to Day 90 lifetime value (LTV) with high accuracy and low execution time, so that we may use them to make intelligent decisions for our business.
Model 2.1: Boosted Tree Regression
The Boosted Tree Regression is an advanced version of our basic LTV prediction model (or Model 1.1). While Model 1.1 offers a simple framework that uses Day 0 to Day 3 LTV to make predictions about Day 7 to Day 90 LTV, Model 2.1 offers a more advanced decision tree framework to make the same predictions.
How does the model work?
The Boosted Tree Regression Model works by building many decision trees, each on a different subset of the features listed below. The combinations can look like the following:
- Combination 1: spend, site_id, install, country
- Combination 2: country, days_since_install, d0_ltv, d1_ltv
- …and so on
After multiple combinations have been built on subsets of the data and features, they are boosted: their values are aggregated and their predictions averaged to produce a final value. The model can predict a single LTV value or several, as needed. It can also utilise other features, such as country, platform and site_id, to improve the accuracy of the prediction.
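The idea of combining many small trees built on feature subsets can be sketched in a few lines of Python. This is only an illustrative sketch, not DataVault's implementation: it uses textbook gradient boosting, in which each new single-split tree (a "stump") is fitted to the remaining errors of the ensemble so far and the scaled tree outputs are summed. The feature names d0_ltv, d1_ltv and spend come from the combinations above; the synthetic data, the target standing in for a Day 7 LTV, and the parameters n_trees, learning_rate and n_sub are assumptions made for the example.

```python
import random

# Feature names taken from the text; the data below is synthetic (an assumption).
FEATURES = ["d0_ltv", "d1_ltv", "spend"]


def fit_stump(rows, residuals, features):
    """Return the single-split tree that best fits the residuals (least squares)."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [e for r, e in zip(rows, residuals) if r[f] <= t]
            right = [e for r, e in zip(rows, residuals) if r[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((e - lm) ** 2 for e in left) + sum((e - rm) ** 2 for e in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no usable split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return features[0], 0.0, mean, mean
    return best[1:]


def predict_stump(stump, row):
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value


def fit_boosted(rows, targets, n_trees=30, learning_rate=0.3, n_sub=2, seed=0):
    """Fit stumps one after another, each on a random subset of the features."""
    rng = random.Random(seed)
    base = sum(targets) / len(targets)  # start from the mean target
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(targets, preds)]
        subset = rng.sample(FEATURES, n_sub)  # one "combination" of features
        stump = fit_stump(rows, residuals, subset)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, r) for p, r in zip(preds, rows)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(preds, targets):
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


# Synthetic campaigns: the target is a made-up stand-in for a Day 7 LTV.
data_rng = random.Random(42)
rows, targets = [], []
for _ in range(80):
    d0 = data_rng.uniform(0, 2)
    d1 = d0 + data_rng.uniform(0, 1)
    rows.append({"d0_ltv": d0, "d1_ltv": d1, "spend": data_rng.uniform(10, 100)})
    targets.append(3 * d1 + 0.5 * d0 + data_rng.gauss(0, 0.1))

model = fit_boosted(rows, targets)
baseline_mse = mse([sum(targets) / len(targets)] * len(targets), targets)
model_mse = mse([predict(model, r) for r in rows], targets)
print("mean-only baseline MSE:", round(baseline_mse, 4))
print("boosted stumps MSE:    ", round(model_mse, 4))
```

Comparing the two printed errors shows how much the ensemble improves on simply predicting the average LTV for every campaign. In practice a library implementation would normally be used instead of hand-written stumps.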
The graphic below explains the prediction process.
Which subset of features is required for the prediction?
What am I able to predict with this model?
What are the benefits of using this model as opposed to Model 1.1?
- Model 2.1 is more statistically advanced and more accurate, since it combines multiple small models and averages their predictions instead of relying on a single model that uses all features together
- With this model, we don’t need to wait for a campaign to accumulate 3 days of data (as Model 1.1 requires) and can start predicting as early as the next day
- This model requires less computing power and execution time
- This model uses all available features and is also available at a site_id level
- No feature is mandatory; the model adapts to missing features
- This model can help you understand whether your campaign was successful in one country but not in another. Because it uses multiple features/signals, its predictions are much more accurate and can be used for longer.
- Options for incrementally improving this model’s accuracy can also be explored
How can you get started with this model?
If you would like to get started with this model, reach out to us and we will get back to you within 12 hours.