Creating Custom Conversion Prediction Models with GA4
Learn how to develop a tailored propensity model using GA4 data to predict user behavior and conversion likelihood for any key event in your analytics setup.
Introduction
While GA4’s built-in predictive capabilities focus primarily on purchase and churn predictions, many organisations need to forecast different types of conversions. This guide demonstrates how to construct a custom propensity model in BigQuery using your GA4 data, allowing you to predict any conversion event that matters to your business.
What You’ll Need
- GA4 property configured and collecting data
- BigQuery project set up and linked to GA4
- Defined key conversion event(s) in your analytics
- Familiarity with your GA4 event structure
Step-by-Step Implementation
1. Create Label Table
First, we’ll identify users who have completed our target conversion action:
SELECT user_pseudo_id, MAX(CASE WHEN event_name = 'target_conversion_event' THEN 1 ELSE 0 END) AS conversion_flagFROM `your-project.analytics_XXXXX.events_*`WHERE _TABLE_SUFFIX BETWEEN 'DATE-START' AND 'DATE-END'GROUP BY user_pseudo_idThis query creates a binary flag for each user, marking whether they’ve completed the conversion event (1) or not (0).
2. Create Demographics Table
WITH first_values AS ( SELECT user_pseudo_id, geo.city as city, device.operating_system as operating_system, device.browser as browser, ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp DESC) AS row_num FROM `your-project.analytics_XXXXX.events_*` WHERE event_name = "user_engagement" AND _TABLE_SUFFIX BETWEEN 'DATE-START' AND 'DATE-END')SELECT * EXCEPT (row_num)FROM first_valuesWHERE row_num = 13. Create Behavioral Features Table
SELECT user_pseudo_id, SUM(IF(event_name = 'event_type_1', 1, 0)) AS cnt_event_1, SUM(IF(event_name = 'event_type_2', 1, 0)) AS cnt_event_2, SUM(IF(event_name = 'event_type_3', 1, 0)) AS cnt_event_3FROM `your-project.analytics_XXXXX.events_*`WHERE _TABLE_SUFFIX BETWEEN 'DATE-START' AND 'DATE-END'GROUP BY user_pseudo_id4. Combine Tables into Training View
CREATE OR REPLACE VIEW `your-project.your_dataset.training_view` AS (WITH conversion_data AS ( -- Label table query from step 1 ), demographics AS ( -- Demographics query from step 2 ), behavioral AS ( -- Behavioral query from step 3 )SELECT dem.* EXCEPT (row_num), IFNULL(beh.cnt_event_1, 0) AS cnt_event_1, IFNULL(beh.cnt_event_2, 0) AS cnt_event_2, IFNULL(beh.cnt_event_3, 0) AS cnt_event_3, c.conversion_flagFROM conversion_data cLEFT OUTER JOIN demographics dem ON c.user_pseudo_id = dem.user_pseudo_idLEFT OUTER JOIN behavioral beh ON c.user_pseudo_id = beh.user_pseudo_idWHERE row_num = 1)5. Train the Model
Logistic regression is well-suited for propensity modeling — it provides interpretable results, handles numerical and categorical variables, outputs probabilities between 0 and 1, and is less prone to overfitting than more complex models.
CREATE OR REPLACE MODEL `your-project.your_dataset.propensity_model`OPTIONS( MODEL_TYPE='LOGISTIC_REG', INPUT_LABEL_COLS=['conversion_flag'], L1_REG=0.01, L2_REG=0.01, MAX_ITERATIONS=50, LEARN_RATE=0.1) ASSELECT *FROM `your-project.your_dataset.training_view`6. Evaluate Model Performance
SELECT *FROM ML.EVALUATE(MODEL `your-project.your_dataset.propensity_model`)Key metrics to understand:
- Accuracy — proportion of correct predictions (0–1, higher is better)
- Precision — of users predicted to convert, how many actually did
- ROC AUC — 0.7–0.8 is acceptable, 0.8–0.9 is excellent, >0.9 is outstanding
- Log Loss — <0.1 is excellent, 0.1–0.3 is good, >0.3 needs improvement
7. Generate Predictions
SELECT user_pseudo_id, conversion_flag, predicted_conversion_flag, predicted_conversion_flag_probs[OFFSET(0)].prob as conversion_probabilityFROM ML.PREDICT(MODEL `your-project.your_dataset.propensity_model`, (SELECT * FROM `your-project.your_dataset.training_view`))Important Considerations
- Date Ranges: Use
_TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD'for appropriate time periods - Event Selection: Customise behavioral features based on your specific business events
- Data Quality: Ensure GA4 is tracking all relevant events consistently
- Testing: Split data into training and testing sets for robust evaluation
Next Steps
Once your model is deployed, you can:
- Set up regular model retraining to maintain accuracy
- Create segments based on propensity scores
- Export high-propensity audiences to GA4 via Data Import
- Export predictions to your marketing platforms
- Monitor model performance over time