This is a design and working prototype, not a deployed client engagement. Every dbt model, macro, and validation test below is real. The metrics that survive are pipeline-level numbers from the build itself, not projected business outcomes.

What this system is

A modern data stack that unifies revenue data across marketing CRM, sales CRM, billing, and product analytics into a single warehouse, transforms it through a versioned modeling layer, and operationalizes the resulting account-health and channel-LTV signals via Reverse ETL back into the CRM where the GTM team actually works.

It’s infrastructure — the foundation both classes of AI agents depend on. Internal agents (research, summarization, routing) and go-to-market agents (sending, publishing, personalization) all need a deduplicated, schema-stable account record to operate against. This is that record.

What motivated the build

A Series C SaaS company had marketing in one CRM, sales in another, billing in a payments platform, and product usage in a third analytics tool. None of the systems talked to each other reliably. The result: nobody could answer the question that mattered — which acquisition channel produces customers that actually stick around and use the product? — because ad-spend data and product-usage data lived in disconnected silos.

Worse, the brittle point-to-point integrations (Zapier-style spaghetti) constantly drifted out of sync. Marketing’s “active customer” count diverged from sales’s; sales’s diverged from finance’s; finance’s diverged from the product team’s. The board got different revenue numbers depending on which dashboard they opened.

The design goal: a deduplicated, versioned, observable single source of truth that the AI agents on top can trust.

Technical architecture

graph TD
    %% Phase 1: Ingestion
    subgraph Phase 1: EL Tools (Extract/Load)
        A[Salesforce] -->|Fivetran| E[(Google BigQuery DWH)]
        B[HubSpot] -->|Fivetran| E
        C[Stripe] -->|Fivetran| E
        D[Mixpanel] -->|Fivetran| E
    end

    %% Phase 2: Transformation
    subgraph Phase 2: dbt Transformation Layer
        E --> F[Staging Tables:<br/>Standardize IDs]
        F --> G[Core Join:<br/>Unified Account Model]
        G --> H[Fact Tables:<br/>Churn Risk & MRR]
    end

    %% Phase 3: Activation
    subgraph Phase 3: Reverse ETL & BI
        H --> I[Looker Studio:<br/>Exec Dashboard]
        H --> J[Hightouch:<br/>Reverse ETL]
        J -->|Churn Alert Flag| A
    end

    style I fill:#3b82f6,color:#fff
    style J fill:#22c55e,color:#fff

dbt model DAG

graph LR
    subgraph Staging
        S1[stg_salesforce_accounts]
        S2[stg_hubspot_contacts]
        S3[stg_stripe_subscriptions]
        S4[stg_mixpanel_events]
    end

    subgraph Core
        S1 --> C1[core_unified_accounts]
        S2 --> C1
        S3 --> C1
        S4 --> C1
    end

    subgraph Facts
        C1 --> F1[fact_account_unified]
        C1 --> F2[fact_channel_ltv]
        F1 --> F3[fact_churn_alerts]
    end

    subgraph Reverse ETL
        F3 -->|Hightouch| RT1[Salesforce: Health Score Field]
        F2 -->|Looker| RT2[Exec Dashboard]
    end

Architecture-spec table (real numbers from the build)

SpecValueNotes
Ingest cadenceEvery 15 minutesFivetran-managed
Number of upstream sources4Salesforce, HubSpot, Stripe, Mixpanel
dbt model count~25Staging + core + facts + utility macros
Universal-domain-resolution coverage~98%After prefix-strip + email-domain extraction
Schema-test failure thresholdPipeline halts on error severityMRR not_null, accepted_range
Reverse-ETL sync cadenceEvery 15 minutesHightouch-managed
Health-score categories3HIGH_RISK_CHURN, MODERATE_RISK, HEALTHY
Data-freshness alert thresholdTable not refreshed >2hSlack alert; CSM workflow paused

Universal domain-resolution macro

-- macros/resolve_domain.sql
{% macro resolve_domain(raw_domain_column) %}
  REGEXP_REPLACE(
    REGEXP_REPLACE(
      LOWER({{ raw_domain_column }}),
      r'^(www\.|mail\.|app\.|uk\.|us\.|de\.)', ''
    ),
    r'^[^@]*@', ''
  )
{% endmacro %}

This macro is called from every staging model that joins on domain — Salesforce, HubSpot, Stripe (via billing_email), Mixpanel (via user_domain). Without it, [email protected] doesn’t match company.com and the join silently misses 15-25% of records.

Core unified account model

-- models/core/fact_account_unified.sql
WITH sales_crm AS (
  SELECT account_id, {{ resolve_domain('domain') }} AS domain, arr, csm_owner
  FROM `analytics.stg_salesforce_accounts`
),
finance AS (
  SELECT {{ resolve_domain('billing_email') }} AS domain, mrr,
         DATE_DIFF(CURRENT_DATE(), last_payment_date, DAY) as days_since_payment
  FROM `analytics.stg_stripe_subscriptions`
),
product_usage AS (
  SELECT {{ resolve_domain('user_domain') }} AS domain,
         COUNT(login_event) as logins_last_30_days,
         MAX(last_active_date) as last_active
  FROM `analytics.stg_mixpanel_events`
  WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY 1
)

SELECT
  s.account_id,
  s.domain,
  f.mrr,
  p.logins_last_30_days,
  CASE
    WHEN p.logins_last_30_days < 2 AND f.mrr > 0 THEN 'HIGH_RISK_CHURN'
    WHEN p.logins_last_30_days BETWEEN 2 AND 5 AND f.mrr > 0 THEN 'MODERATE_RISK'
    ELSE 'HEALTHY'
  END AS account_health_score
FROM sales_crm s
LEFT JOIN finance f ON s.domain = f.domain
LEFT JOIN product_usage p ON s.domain = p.domain;

Schema-test assertions (data quality gate)

# models/core/schema.yml
version: 2
models:
  - name: fact_account_unified
    columns:
      - name: account_id
        tests:
          - unique
          - not_null
      - name: mrr
        tests:
          - not_null:
              severity: error  # Pipeline STOPS if Stripe sync fails
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 500000
      - name: account_health_score
        tests:
          - accepted_values:
              values: ["HIGH_RISK_CHURN", "MODERATE_RISK", "HEALTHY"]

These tests fire on every dbt run. An error-severity failure halts the pipeline before the downstream dashboards see broken data; a warn-severity failure fires a Slack alert but lets the build continue.

Reverse-ETL sync configuration

{
  "sync_name": "churn_risk_to_salesforce",
  "source": {
    "type": "bigquery",
    "table": "analytics.fact_account_unified",
    "filter": "account_health_score IN ('HIGH_RISK_CHURN', 'MODERATE_RISK')"
  },
  "destination": {
    "type": "salesforce",
    "object": "Account",
    "match_key": "account_id",
    "field_mappings": {
      "account_health_score": "Health_Score__c",
      "logins_last_30_days": "Product_Logins_30d__c",
      "mrr": "Current_MRR__c"
    }
  },
  "schedule": "every_15_minutes",
  "on_change": {
    "trigger_flow": "CSM_Intervention_Task",
    "condition": "Health_Score__c = 'HIGH_RISK_CHURN'"
  }
}

The Reverse ETL closes the loop: the warehouse calculates account health; the CRM gets a field updated; the CSM gets a task created. The team works where they work; the warehouse stays the source of truth.

Failure modes (named)

  1. Messy domain matching. Different systems represent the same account differently (uk.company.com, [email protected], company.com). Without the universal resolve_domain macro, 15-25% of joins silently miss. The macro strips prefixes and email syntax to a canonical root domain across every source.
  2. Pipeline fragility on upstream sync failures. If the Stripe ingest fails for 6 hours, the dashboard would silently report MRR drops. not_null + accepted_range schema tests on mrr halt the pipeline so the dashboard reports “data unavailable” instead of “MRR dropped 80%.”
  3. Timezone drift. Mixpanel timestamps in UTC; Salesforce in PST; Stripe in Unix epoch. Joining raw timestamps falsely flags healthy accounts as inactive. All timestamps standardize to UTC in the staging layer before any downstream join logic runs.
  4. Silent sync failures. Fivetran connectors fail quietly more often than vendors admit. BigQuery metadata alerts in a dedicated Slack channel fire when any table hasn’t received fresh data in >2 hours, before stale data reaches the dashboards.
  5. Cardinality explosions in joins. A bad join condition produces 10x the expected row count. dbt’s uniqueness test on account_id catches this before the explosion propagates downstream.

Where this fits in the broader system

This is the infrastructure that both classes of AI agents depend on. The lifecycle email agents, the outbound research agents, the intent-data audience agents — they all need a clean, deduplicated account record with up-to-date product-usage and CRM-stage state. This system is that record.

The AI agents encyclopedia goes deep on the supervision models. The AI automation encyclopedia covers the orchestration layer that sits on top of this data layer.

Stack

Fivetran (ingest) · Google BigQuery (warehouse) · dbt (transformation + tests) · Hightouch (Reverse ETL) · Salesforce (CRM destination) · HubSpot (marketing CRM source) · Stripe (billing source) · Mixpanel (product analytics source) · Looker Studio (exec dashboard).

Want to talk about something like this?

If you’re standing up a single source of truth — or fixing one that drifted — hit me up.

Author

Fenil Parekh is a GTM engineer based in San Francisco Bay Area. He builds internal and go-to-market AI agents — programmatic inbound at scale, signal-driven outbound, intent-targeted paid, lifecycle email — for AI-native B2B SaaS. M.S. Computer Science, ITU San Jose. Currently Lead GTM Engineer (consulting) at Marketing Boutique. Built and broken in the open.

Interlinking

JSON-LD

TechArticle + BreadcrumbList

{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "@id": "https://fenil.ai/library/revops-single-source-truth#article",
  "url": "https://fenil.ai/library/revops-single-source-truth",
  "headline": "RevOps Single Source of Truth — Case Study",
  "description": "How I designed a CRM-as-system-of-record refactor — schema reconciliation, dedup logic, and the validation passes that survive production scale.",
  "image": "https://fenil.ai/assets/og/revops-single-source-truth.png",
  "author": { "@id": "https://fenil.ai/#person" },
  "publisher": { "@id": "https://fenil.ai/#person" },
  "datePublished": "2026-05-10",
  "dateModified": "2026-05-10",
  "articleSection": "Case study",
  "keywords": "RevOps single source of truth, CRM as system of record, dbt revenue model, schema reconciliation",
  "inLanguage": "en"
}