
Last year, a client came to me with a familiar problem. They were spending $40,000 a month across Google Ads, Facebook, email campaigns, and organic content. When I asked how they decided where to allocate that budget, the answer was: "We look at what Google Analytics says drove the most conversions."

That sounds reasonable on the surface. But what Google Analytics was telling them — by default — was based on last-click attribution. And last-click attribution was lying to them.

Their Google Ads spend looked like a hero because it captured the final click before purchase. Meanwhile, the blog content and email nurture sequences that actually warmed those leads up over weeks were getting zero credit. They were about to cut their content budget in half. That would have been a costly mistake.

So I built them a marketing attribution model from scratch — one that actually reflected how their customers moved through the funnel. Here is exactly how I did it, step by step.

The Problem with Last-Click Attribution

Last-click attribution is the default in most analytics platforms. It gives 100% of the credit for a conversion to the very last touchpoint before the customer converted. It is simple, easy to understand, and dangerously misleading.

Think about how people actually buy things. A typical B2B customer journey might look like this:

  1. Sees a LinkedIn ad and clicks through to a blog post (awareness)
  2. Reads two more blog posts over the next week (consideration)
  3. Signs up for the email newsletter (engagement)
  4. Receives a case study via email and reads it (trust-building)
  5. Googles the brand name, clicks a Google Ad, and buys (conversion)

Under last-click attribution, Google Ads gets 100% of the credit. The LinkedIn ad that started the entire journey? Zero credit. The blog content? Zero. The email sequence that built enough trust to convert? Zero.

Last-click attribution is like giving the goalie all the credit for winning a soccer match. Sure, they were there at the end — but they did not score the goals.

This is not just an academic problem. It leads to real budget misallocation. Teams cut "underperforming" channels that are actually doing the heavy lifting at the top and middle of the funnel. Revenue drops, and nobody understands why.
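To make the distortion concrete, here is a small sketch (illustrative, not production code) of how last-click assigns credit to the example journey above:

```python
# Hypothetical sketch: how last-click credit collapses a
# multi-step journey onto its final touchpoint.
journey = [
    "linkedin / paid",     # awareness
    "blog / organic",      # consideration
    "email / newsletter",  # engagement
    "email / nurture",     # trust-building
    "google / cpc",        # conversion
]

def last_click_credit(touchpoints):
    """Give 100% of the credit to the final touchpoint."""
    return {tp: (1.0 if i == len(touchpoints) - 1 else 0.0)
            for i, tp in enumerate(touchpoints)}

credit = last_click_credit(journey)
print(credit["google / cpc"])     # 1.0
print(credit["linkedin / paid"])  # 0.0
```

Four of the five channels that participated in the sale report zero revenue under this scheme.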

The Common Attribution Models

Before building a custom model, it helps to understand what is out there. Here are the five most common attribution models:

  • First-Touch: 100% credit to the first interaction. Great for understanding what drives awareness, but ignores everything that happens afterward.
  • Last-Touch: 100% credit to the final interaction before conversion. The default in most tools. Overvalues bottom-of-funnel channels.
  • Linear: Equal credit to every touchpoint in the journey. Fair, but treats a random blog visit the same as the demo request that sealed the deal.
  • Time-Decay: More credit to touchpoints closer to the conversion. Better than linear, but still undervalues the initial discovery.
  • Position-Based (U-Shaped): 40% credit to the first touch, 40% to the last touch, and the remaining 20% split evenly among everything in between. This is the model I recommend for most businesses.

Each model tells a different story about your marketing. The right choice depends on your sales cycle length, number of channels, and what decisions you are trying to make.
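The differences are easiest to see as weight vectors. This sketch (my own illustration, not any platform's implementation) returns the per-touchpoint credit each model assigns; the time-decay half-life of 7 days is an assumed parameter:

```python
# Credit weights per touchpoint under each common model.
def model_weights(n, model, days_before=None, half_life=7.0):
    """Return a list of n credit weights that sum to 1.0."""
    if model == "first_touch":
        return [1.0] + [0.0] * (n - 1)
    if model == "last_touch":
        return [0.0] * (n - 1) + [1.0]
    if model == "linear":
        return [1.0 / n] * n
    if model == "time_decay":
        # Halve the credit every `half_life` days, then normalize
        raw = [2 ** (-d / half_life) for d in days_before]
        total = sum(raw)
        return [w / total for w in raw]
    if model == "position_based":
        if n == 1:
            return [1.0]
        if n == 2:
            return [0.5, 0.5]
        middle = 0.2 / (n - 2)
        return [0.4] + [middle] * (n - 2) + [0.4]
    raise ValueError(f"unknown model: {model}")

print(model_weights(5, "position_based"))
# first and last touch get 40% each; the three middle touches split 20%
```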

Why Most Businesses Are Doing Attribution Wrong

Beyond just using the wrong model, I see three recurring mistakes when businesses attempt attribution:

1. Incomplete data collection. If you are not tracking every touchpoint — including organic visits, email opens, and offline events — your model is working with partial information. Garbage in, garbage out.

2. No unified customer identity. A visitor on your website, a lead in your CRM, and a subscriber on your email list might all be the same person. Without stitching those identities together (usually via email or a user ID), you cannot build a complete journey.

3. Treating attribution as a one-time project. Customer behavior changes. Channels evolve. An attribution model you built six months ago might not reflect your current funnel. You need a system, not a spreadsheet.

Attribution is not about finding the "one true answer." It is about getting a more honest picture of how your channels work together, so you can make smarter budget decisions.

The Data You Need

Before writing any code, you need to get your data foundation right. Here is what a proper attribution dataset looks like:

  • Touchpoint logs: Every interaction a user has with your brand — page views, ad clicks, email opens, form submissions, chat conversations.
  • Conversion events: The moment a lead becomes a customer (or hits whatever goal you are measuring) — purchases, sign-ups, demo requests.
  • Timestamps: Exact date and time of each touchpoint and conversion. You need this to sequence the journey and apply time-based models.
  • Channel and source info: Where did the touchpoint come from? This is where UTM parameters become essential.
  • User identity: A consistent identifier (email, user ID, or cookie ID) that ties all touchpoints to the same person.
  • Revenue data: The monetary value of each conversion, pulled from your CRM or payment system.
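Put together, the minimum viable record looks something like this. The field names are illustrative, not a required schema:

```python
# A minimal sketch of the records an attribution dataset needs.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Touchpoint:
    user_id: str         # unified identity (email, CRM id, or cookie id)
    timestamp: datetime  # exact time, for sequencing the journey
    source: str          # e.g. "google", "linkedin", "newsletter"
    medium: str          # e.g. "cpc", "email", "organic"
    campaign: str        # UTM campaign name
    event: str           # "page_view", "email_click", "ad_click", ...

@dataclass
class Conversion:
    user_id: str
    timestamp: datetime
    revenue: float       # deal amount from the CRM or payment system

tp = Touchpoint("lead-123", datetime(2024, 3, 1, 9, 30),
                "linkedin", "paid", "spring_launch", "ad_click")
```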

Data Collection Setup

For this project, I used three data sources:

UTM parameters on every link we controlled. Every ad, every email link, every social post got tagged with utm_source, utm_medium, utm_campaign, and utm_content. I created a UTM naming convention document so the marketing team stayed consistent.
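A convention document only works if it is enforced. One option is a small link-building helper that rejects values outside the agreed vocabulary; the allowed sets below are illustrative, not the client's actual list:

```python
# Hypothetical helper enforcing a UTM naming convention:
# lowercase everything and allow only known sources/mediums.
from urllib.parse import urlencode

ALLOWED_SOURCES = {"google", "facebook", "linkedin", "newsletter"}
ALLOWED_MEDIUMS = {"cpc", "email", "social", "organic"}

def build_tagged_url(base_url, source, medium, campaign, content):
    source, medium = source.lower(), medium.lower()
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown utm_source: {source}")
    if medium not in ALLOWED_MEDIUMS:
        raise ValueError(f"unknown utm_medium: {medium}")
    params = urlencode({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign.lower(),
        "utm_content": content.lower(),
    })
    return f"{base_url}?{params}"

url = build_tagged_url("https://example.com/blog",
                       "LinkedIn", "Social", "Q2_Launch", "carousel_v1")
# -> https://example.com/blog?utm_source=linkedin&utm_medium=social&...
```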

Google Analytics 4 event data exported to BigQuery via the native GA4-BigQuery integration. This gave us raw event-level data with timestamps, session info, and traffic source details.

CRM data from HubSpot — specifically the contact timeline and deal records. This told us which leads converted and for how much revenue. We exported this to BigQuery on a nightly schedule using a Python ETL script.

The key step was identity resolution. We matched GA4 user pseudo IDs to HubSpot contact records using the email address captured at form submission. This let us stitch pre-conversion anonymous browsing sessions to known contacts.

-- Identity mapping: link GA4 users to CRM contacts.
-- event_params is a repeated field in the GA4 export, so it
-- must be unnested before its key/value pairs can be filtered.
CREATE TABLE attribution.identity_map AS
SELECT
    ga.user_pseudo_id,
    crm.contact_id,
    crm.email,
    MIN(ga.event_timestamp) AS first_identified
FROM `project.analytics.events_*` ga,
    UNNEST(ga.event_params) AS ep
JOIN `project.crm.contacts` crm
    ON LOWER(ep.value.string_value) = LOWER(crm.email)
WHERE ga.event_name = 'form_submit'
    AND ep.key = 'email'
GROUP BY ga.user_pseudo_id, crm.contact_id, crm.email;

Building the Position-Based (U-Shaped) Attribution Model

With clean, unified data in BigQuery, I built the attribution model in three stages: touchpoint sequencing, credit assignment, and revenue allocation.

Step 1: Build the Touchpoint Journey

First, I constructed a complete journey for every converted user — every touchpoint from first visit to conversion, ordered by timestamp.

-- Build ordered touchpoint journeys for converted users
CREATE TABLE attribution.touchpoint_journeys AS
WITH conversions AS (
    SELECT
        im.contact_id,
        im.user_pseudo_id,
        d.deal_id,
        d.deal_amount,
        d.close_date AS conversion_date
    FROM attribution.identity_map im
    JOIN `project.crm.deals` d
        ON im.contact_id = d.contact_id
    WHERE d.deal_stage = 'closed_won'
),
touchpoints AS (
    SELECT
        c.contact_id,
        c.deal_id,
        c.deal_amount,
        c.conversion_date,
        ga.event_timestamp,
        ga.traffic_source.source AS source,
        ga.traffic_source.medium AS medium,
        ga.traffic_source.name AS campaign,
        ROW_NUMBER() OVER (
            PARTITION BY c.deal_id
            ORDER BY ga.event_timestamp ASC
        ) AS touch_position,
        COUNT(*) OVER (
            PARTITION BY c.deal_id
        ) AS total_touches
    FROM conversions c
    JOIN `project.analytics.events_*` ga
        ON c.user_pseudo_id = ga.user_pseudo_id
    -- GA4 stores event_timestamp as microseconds since epoch,
    -- so convert it before comparing to the CRM close date
    WHERE TIMESTAMP_MICROS(ga.event_timestamp) < c.conversion_date
        AND ga.event_name IN (
            'page_view', 'session_start',
            'form_submit', 'email_click',
            'ad_click'
        )
)
SELECT * FROM touchpoints;

Step 2: Apply Position-Based Credit Weights

This is where the attribution logic lives. In a position-based model, the first and last touchpoints each receive 40% of the credit. Everything in between splits the remaining 20% equally.

-- Assign position-based (U-shaped) attribution weights.
-- The weight is computed once, then multiplied by deal_amount,
-- so the CASE logic is not duplicated.
CREATE TABLE attribution.weighted_touchpoints AS
SELECT
    *,
    deal_amount * attribution_weight AS attributed_revenue
FROM (
    SELECT
        contact_id,
        deal_id,
        deal_amount,
        source,
        medium,
        campaign,
        touch_position,
        total_touches,
        CASE
            -- Single-touch journey: 100% credit
            WHEN total_touches = 1 THEN 1.0
            -- Two-touch journey: 50/50 split
            WHEN total_touches = 2 THEN 0.5
            -- Multi-touch: U-shaped weights
            WHEN touch_position = 1 THEN 0.4
            WHEN touch_position = total_touches THEN 0.4
            ELSE 0.2 / (total_touches - 2)
        END AS attribution_weight
    FROM attribution.touchpoint_journeys
);

Notice the edge cases: single-touch journeys get 100% credit to that one touchpoint, and two-touch journeys split 50/50. These edge cases matter more than you think — in most datasets, 15-25% of conversions come from single-touch journeys.
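The edge-case handling is easy to verify: whatever the journey length, each deal's weights must sum to 1.0, or the model creates or destroys revenue. A quick check, mirroring the CASE logic above in Python:

```python
def position_weight(pos, total):
    """Python mirror of the SQL CASE expression for U-shaped weights."""
    if total == 1:
        return 1.0
    if total == 2:
        return 0.5
    if pos == 1 or pos == total:
        return 0.4
    return 0.2 / (total - 2)

# Weights must sum to 1.0 for every journey length
for n in range(1, 11):
    total_weight = sum(position_weight(p, n) for p in range(1, n + 1))
    assert abs(total_weight - 1.0) < 1e-9, f"leak at journey length {n}"
```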

Step 3: Aggregate Attributed Revenue by Channel

Now we roll everything up to see how much revenue each channel actually drove.

-- Final attribution report: revenue per channel
SELECT
    CONCAT(source, ' / ', medium) AS channel,
    COUNT(DISTINCT deal_id) AS conversions_touched,
    ROUND(SUM(attribution_weight), 1) AS weighted_conversions,
    ROUND(SUM(attributed_revenue), 2) AS attributed_revenue,
    ROUND(SUM(attributed_revenue) /
        NULLIF(COUNT(DISTINCT deal_id), 0), 2) AS revenue_per_conversion
FROM attribution.weighted_touchpoints
GROUP BY channel
ORDER BY attributed_revenue DESC;

This single query gives you the full picture: which channels are driving the most attributed revenue, how many conversion journeys they participate in, and what the average value per conversion looks like.

Validating with Python

SQL is great for building the model inside your data warehouse, but I also built a Python version for validation and experimentation. This made it easy to test different model weights and compare results side by side.

import pandas as pd

def apply_position_based_attribution(journeys_df):
    """
    Apply U-shaped (position-based) attribution to a
    DataFrame of touchpoint journeys.

    Expects columns: deal_id, deal_amount, source,
    medium, touch_position, total_touches
    """
    def calculate_weight(row):
        total = row['total_touches']
        pos = row['touch_position']

        if total == 1:
            return 1.0
        elif total == 2:
            return 0.5
        elif pos == 1:
            return 0.4
        elif pos == total:
            return 0.4
        else:
            return 0.2 / (total - 2)

    df = journeys_df.copy()
    df['weight'] = df.apply(calculate_weight, axis=1)
    df['attributed_revenue'] = df['deal_amount'] * df['weight']

    # Aggregate by channel
    channel_attribution = (
        df.groupby(['source', 'medium'])
        .agg(
            conversions_touched=('deal_id', 'nunique'),
            weighted_conversions=('weight', 'sum'),
            attributed_revenue=('attributed_revenue', 'sum')
        )
        .sort_values('attributed_revenue', ascending=False)
        .reset_index()
    )

    channel_attribution['channel'] = (
        channel_attribution['source']
        + ' / '
        + channel_attribution['medium']
    )

    return channel_attribution

# Compare multiple models side by side
def compare_models(journeys_df):
    """Run four attribution models and compare the results.

    Time-decay is omitted here because it needs touchpoint
    timestamps, not just positions.
    """
    results = {}
    df = journeys_df.copy()

    # Last-touch: full credit to the final touchpoint
    last_touch = df[df['touch_position'] == df['total_touches']].copy()
    last_touch['attributed_revenue'] = last_touch['deal_amount']
    results['Last Touch'] = (
        last_touch.groupby('source')['attributed_revenue'].sum()
    )

    # First-touch: full credit to the first touchpoint
    first_touch = df[df['touch_position'] == 1].copy()
    first_touch['attributed_revenue'] = first_touch['deal_amount']
    results['First Touch'] = (
        first_touch.groupby('source')['attributed_revenue'].sum()
    )

    # Linear: equal credit to every touchpoint
    df_linear = df.copy()
    df_linear['attributed_revenue'] = (
        df_linear['deal_amount'] / df_linear['total_touches']
    )
    results['Linear'] = (
        df_linear.groupby('source')['attributed_revenue'].sum()
    )

    # Position-based: re-aggregate by source, since the helper
    # returns one row per source/medium pair
    position_based = apply_position_based_attribution(df)
    results['Position-Based'] = (
        position_based.groupby('source')['attributed_revenue'].sum()
    )

    return pd.DataFrame(results).fillna(0)

Running this comparison function was the "aha moment" for my client. They could see, in one table, how dramatically different the story looked depending on the model.

Building the Attribution Dashboard

Numbers in a SQL query are useful for analysts. But to get buy-in from the marketing director and the CFO, you need a visual dashboard. I built two versions: one in Looker Studio (for the team that lives in Google Workspace) and one in Power BI (for the executive stakeholders who preferred it).

Dashboard Components

The dashboard included four main views:

  1. Channel Attribution Summary: A horizontal bar chart showing attributed revenue per channel under the position-based model, with a toggle to switch between models for comparison.
  2. Model Comparison Table: A side-by-side table showing how each channel's attributed revenue changes across all five models. This was the most important view for decision-making — it showed where the models agreed (high confidence) and where they diverged (needs investigation).
  3. Journey Length Distribution: A histogram of how many touchpoints each conversion journey contains. This validates whether a multi-touch model is even necessary. If most journeys are single-touch, the model choice matters less.
  4. Attribution Over Time: A line chart showing attributed revenue by channel per month. This surfaces seasonal trends and lets the team see how channel performance evolves over time.

The most powerful feature of the dashboard was the model comparison toggle. When the marketing director could see that organic content was worth $12,000/month under position-based attribution versus $800/month under last-click, the budget conversation changed immediately.

Real Results: How the Model Changed Budget Allocation

Here is what happened after we deployed the attribution model and let it run for three months.

Under the old last-click model, the budget split looked like this:

  • Google Ads: 55% ($22,000/mo)
  • Facebook Ads: 25% ($10,000/mo)
  • Email Marketing: 10% ($4,000/mo)
  • Content/SEO: 10% ($4,000/mo)

The position-based model told a very different story. Here is the attributed revenue breakdown it revealed:

  • Google Ads: 30% of attributed revenue (was getting 55% of budget)
  • Facebook Ads: 18% of attributed revenue (was getting 25% of budget)
  • Email Marketing: 22% of attributed revenue (was getting 10% of budget)
  • Content/SEO: 25% of attributed revenue (was getting 10% of budget)
  • LinkedIn Organic: 5% of attributed revenue (was getting 0% of dedicated budget)

Content and email were massively underfunded relative to their contribution. Google Ads was overfunded — it was capturing demand that other channels had created, and getting all the credit for it.

The client reallocated budget over the next quarter:

  • Google Ads: reduced to 35% ($14,000/mo)
  • Facebook Ads: reduced to 18% ($7,200/mo)
  • Email Marketing: increased to 20% ($8,000/mo)
  • Content/SEO: increased to 22% ($8,800/mo)
  • LinkedIn: new allocation of 5% ($2,000/mo)

The result after one quarter: total conversions increased by 18% while overall spend stayed flat. The blended cost per acquisition dropped from $320 to $264. The model paid for itself many times over.

The biggest wins came from doubling the email marketing budget. Those nurture sequences were converting warm leads that Google Ads would have had to re-acquire at 3x the cost. The attribution model made that invisible value visible.

Key Takeaways and Best Practices

After building attribution models for multiple clients, here are the lessons I keep coming back to:

1. Start with position-based attribution. It is the best default for most businesses. It properly values both awareness (first touch) and conversion (last touch) while still giving credit to the middle of the funnel. You can always move to a data-driven model later once you have enough volume.

2. Get your UTM hygiene right first. No attribution model can save you if your tracking is inconsistent. Create a UTM naming convention, enforce it across the team, and audit your parameters monthly. One person tagging a source as "facebook" and another as "Facebook" or "fb" will fragment your data.
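A normalization pass can also rescue historical data that was tagged inconsistently. This sketch collapses common aliases onto canonical source names; the alias table is illustrative and would need to match your own team's habits:

```python
import pandas as pd

# Hypothetical alias table mapping messy tags to canonical sources
SOURCE_ALIASES = {
    'fb': 'facebook',
    'face book': 'facebook',
    'ig': 'instagram',
    'li': 'linkedin',
}

def normalize_source(series):
    """Lowercase, strip whitespace, and collapse known aliases."""
    cleaned = series.str.strip().str.lower()
    return cleaned.replace(SOURCE_ALIASES)

raw = pd.Series(['Facebook', 'fb', ' FACEBOOK ', 'google'])
print(normalize_source(raw).tolist())
# ['facebook', 'facebook', 'facebook', 'google']
```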

3. Do not ignore the middle of the funnel. The touchpoints between first click and final conversion are where trust is built. Email nurture sequences, blog content, case studies, and webinars rarely get credit under last-click models, but they are often the reason someone converts.

4. Run models in parallel before making decisions. Do not rip out your existing reporting overnight. Run the new attribution model alongside your old one for at least one quarter. Compare the stories they tell. The areas where they differ most are where the biggest insights live.

5. Update the model regularly. Customer behavior shifts. New channels emerge. Review your attribution model quarterly. Check if the average journey length is changing, if new touchpoint types need to be added, and if the weights still make sense.

6. Use attribution for direction, not precision. No attribution model is perfectly "right." The goal is not to calculate the exact dollar value of every touchpoint. The goal is to get a directionally accurate picture that is significantly better than last-click. Even an imperfect multi-touch model is a massive upgrade.

Want an Attribution Model for Your Business?

If you are still making budget decisions based on last-click attribution, you are almost certainly misallocating spend. The good news is that building a proper attribution model does not have to take months. With clean data and the right approach, I can typically have a working model and dashboard deployed in two to three weeks.

I have built attribution systems for e-commerce brands, B2B SaaS companies, and lead-gen businesses — each with their own quirks and data challenges. Whether you need a BigQuery-based model like the one described here, or something simpler built in Google Sheets and Looker Studio, I can help.

Get in touch for a free consultation, or hire me on Upwork to get started right away.

