HRTech Job Req Targeting Case Study

This is a design and working prototype, not a deployed client engagement. Every architectural choice, function node, and validator below is real. The metrics reported are runtime metrics from the build itself, not projected client outcomes.

What this system is

A signal-driven outbound pipeline that watches public job-requisition feeds for HRTech ICPs, extracts buying-signal context from each posting with an LLM, validates the extraction against a hallucination guard, and either routes high-value enterprise accounts to an SDR queue with a context one-pager or enrolls mid-market accounts into a sequence with personalized merge fields.

It’s an internal agent: the user is a sales-development rep or account executive on the HRTech GTM team. The agent does the research; the human does the conversation. Supervision is implicit — the SDR reviews the brief before reaching out; the AE evaluates whether the surfaced account warrants a personal touch.

What motivated the build

A Series B HRTech platform was burning paid budget targeting “HR Managers” broadly — including companies on hiring freezes that had no near-term need for an applicant-tracking product. The buying signal sat 24-48 hours upstream: the moment a company posts a hard-to-fill job requisition is the moment they are experiencing the recruiting pain the product solves.

The design goal: build a system that detects new job requisitions in near-real-time, extracts the implied operational pain from the description, validates the extraction, and routes the qualified records into the right outbound motion.

Technical architecture

Hallucination guard (LLM validation)

Architecture-spec table (real numbers from the build)

Spec	Value	Notes
Scrape cadence	every 6 hours	Apify-rotated residential proxies
Records per scrape pass	200-400 jobs	filtered by target role + posted < 24h
Average LLM extraction runtime	~1.2s per job	OpenAI GPT-4o, batched
LLM token cost per record	~$0.005	input + output tokens
Hallucination-guard rejection rate	~7%	classified LOW_SIGNAL, routed to manual review
Deduplication window	30 days	MD5(domain + normalized_title) hash check
Enterprise vs mid-market split	~25% / 75%	enterprise = company size > 500 employees
Pipeline uptime alert threshold	<5 jobs per run	Slack alert fires; manual resume
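
The deduplication row can be sketched as follows. This is a minimal illustration, not the build's actual node: the normalization rules in normalizeTitle and the in-memory Map used as the seen-store are assumptions, since the source only specifies MD5(domain + normalized_title) and a 30-day window.

```javascript
const crypto = require("crypto");

// 30-day window, taken from the spec table.
const DEDUP_WINDOW_MS = 30 * 24 * 60 * 60 * 1000;

// Assumed normalization: lowercase, punctuation to spaces, whitespace collapsed.
function normalizeTitle(title) {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// MD5(domain + normalized_title), hex-encoded.
function dedupKey(domain, title) {
  return crypto
    .createHash("md5")
    .update(domain.toLowerCase() + normalizeTitle(title))
    .digest("hex");
}

// `seen` is a hypothetical Map of key -> first-seen epoch milliseconds.
// A real pipeline would more likely back this with a database or key-value store.
function isDuplicate(seen, domain, title, nowMs = Date.now()) {
  const key = dedupKey(domain, title);
  const firstSeen = seen.get(key);
  if (firstSeen !== undefined && nowMs - firstSeen < DEDUP_WINDOW_MS) {
    return true; // re-encounter inside the window: drop silently
  }
  seen.set(key, nowMs); // new, or outside the window: record and let it through
  return false;
}
```

In this sketch the timestamp is not refreshed on a duplicate hit, so a listing reposted weekly becomes eligible again 30 days after it was first seen. Refreshing on every hit would suppress it for as long as the reposts continue; which behavior is preferable depends on how aggressively the team wants to re-contact.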

Apify input configuration

{
  "queries": [
    "Software Engineer location:Remote",
    "Account Executive B2B SaaS"
  ],
  "max_posts_per_query": 200,
  "published_at": "past-24h",
  "proxy_configuration": { "useApifyProxy": true },
  "filters": {
    "company_size": "50-1000",
    "job_type": "full_time"
  }
}

LLM extraction system prompt

const gpt_prompt = `
You are a technical recruiting analyst. Read the following job description.
Extract exactly three highly specific technical or domain skills required.
Identify the primary implied challenge of the role (e.g. 'scaling infrastructure' or 'managing a mid-market cycle').

OUTPUT FORMAT MUST BE VALID JSON:
{
  "extracted_skills": "Python, AWS, and Distributed Systems",
  "implied_challenge": "migrating from monolith to microservices",
  "ps_line_in_email": "P.S. Finding remote engineers strong in Python, AWS AND Distributed Systems is notoriously brutal—assuming you need them to lead the microservices migration?"
}
`;
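
The prompt says "the following job description" but the snippet does not show how the posting text is attached. One way to assemble the request is sketched below. The helper name buildExtractionRequest, the temperature of 0, and the 6000-character cap are assumptions for illustration; the response_format setting asks the Chat Completions API for a JSON object, which pairs with the "VALID JSON" instruction in the prompt.

```javascript
// Hypothetical helper: builds a Chat Completions request body for one posting.
// It only constructs the payload; sending it is left to the HTTP or OpenAI node.
function buildExtractionRequest(systemPrompt, jobDescription, maxChars = 6000) {
  // Assumed cap so an unusually long posting does not inflate token cost.
  const trimmed = jobDescription.trim().slice(0, maxChars);
  return {
    model: "gpt-4o",
    temperature: 0, // assumption: deterministic output suits extraction work
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: systemPrompt.trim() },
      { role: "user", content: `JOB DESCRIPTION:\n${trimmed}` },
    ],
  };
}
```

Keeping the posting in the user message and the instructions in the system message also makes it harder for text inside a job description to be read as instructions.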

LLM output validator (n8n Function node)

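function validateGptOutput(gptResponse) {
  // Malformed JSON cannot be scored at all, so it goes to a human.
  let parsed;
  try {
    parsed = JSON.parse(gptResponse);
  } catch (e) {
    return { status: "REJECTED", reason: "INVALID_JSON", route: "manual_review" };
  }

  // Valid JSON can still be missing a field; without this check the
  // .split() call below would throw and crash the node.
  if (!parsed || typeof parsed.extracted_skills !== "string") {
    return { status: "REJECTED", reason: "MISSING_FIELDS", route: "manual_review" };
  }

  // "Python, AWS, and Distributed Systems" -> ["Python", "AWS", "Distributed Systems"]
  const skills = parsed.extracted_skills
    .split(",")
    .map((s) => s.trim().replace(/^and\s+/i, ""))
    .filter(Boolean);
  const genericTerms = ["communication", "teamwork", "leadership", "detail-oriented"];
  const realSkills = skills.filter((s) => !genericTerms.includes(s.toLowerCase()));

  // LOW_SIGNAL records go to manual review, matching the spec table.
  if (realSkills.length < 2) {
    return { status: "LOW_SIGNAL", reason: "INSUFFICIENT_SKILLS", route: "manual_review" };
  }

  const challenge = parsed.implied_challenge;
  if (
    typeof challenge !== "string" ||
    challenge.trim().length < 10 ||
    challenge.toLowerCase().includes("various")
  ) {
    return { status: "LOW_SIGNAL", reason: "VAGUE_CHALLENGE", route: "manual_review" };
  }

  return { status: "APPROVED", data: parsed, route: "smartlead_sequence" };
}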
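
The validator above checks the shape and specificity of the output, but not whether the skills actually appear in the posting. A grounding check along the following lines could complement it. This is a sketch of an additional step, not part of the original build, and findUngroundedSkills is a hypothetical name.

```javascript
// Returns the extracted skills that never appear in the source posting.
// An empty array means every skill is grounded in the text.
function findUngroundedSkills(extractedSkills, jobDescription) {
  const haystack = jobDescription.toLowerCase();
  return extractedSkills
    .split(",")
    .map((s) => s.trim().replace(/^and\s+/i, "")) // drop the "and" before the last item
    .filter((s) => s.length > 0)
    .filter((s) => !haystack.includes(s.toLowerCase()));
}

// Example: "Kubernetes" is not in the posting, so it is flagged.
const flagged = findUngroundedSkills(
  "Python, AWS, and Kubernetes",
  "We need Python and AWS experience."
);
console.log(flagged); // [ 'Kubernetes' ]
```

Substring matching is deliberately crude: a short skill such as "Go" would match inside "good", and a paraphrased skill would be flagged even when the posting implies it. It is best treated as a signal for manual review rather than a hard reject.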

Activation push (Smartlead API)

const smartleadPayload = {
  method: "POST",
  url: "https://server.smartlead.ai/api/v1/leads",
  headers: { "Api-Key": "{{$credentials.smartlead_api_key}}" },
  body: {
    email: "{{$json.enriched_email}}",
    first_name: "{{$json.first_name}}",
    last_name: "{{$json.last_name}}",
    company_name: "{{$json.company_name}}",
    custom_fields: {
      ps_line: "{{$json.gpt_output.ps_line_in_email}}",
      extracted_skills: "{{$json.gpt_output.extracted_skills}}",
      job_title_scraped: "{{$json.job_title}}",
      scrape_timestamp: "{{$json.scraped_at}}"
    },
    campaign_id: "{{$json.route === 'enterprise' ? ENTERPRISE_CAMPAIGN_ID : MIDMARKET_CAMPAIGN_ID}}"
  }
};
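
The campaign_id expression above refers to ENTERPRISE_CAMPAIGN_ID and MIDMARKET_CAMPAIGN_ID without showing where they are defined. One option is to resolve the route and campaign in an upstream Function node, so the HTTP node only reads finished fields. The sketch below assumes the 500-employee cutoff from the spec table; the ID strings are placeholders, and resolveRoute is a hypothetical name.

```javascript
// From the spec table: enterprise means company size > 500.
const ENTERPRISE_MIN_EMPLOYEES = 500;

// Placeholder values; real campaign IDs would come from credentials or config.
const CAMPAIGN_IDS = {
  enterprise: "ENTERPRISE_CAMPAIGN_ID",
  midmarket: "MIDMARKET_CAMPAIGN_ID",
};

// Returns the route label and the campaign to enroll into.
// Assumption: a missing or non-numeric size falls through to mid-market.
function resolveRoute(companySize) {
  const route = companySize > ENTERPRISE_MIN_EMPLOYEES ? "enterprise" : "midmarket";
  return { route, campaign_id: CAMPAIGN_IDS[route] };
}
```

With this in place, the payload could read a precomputed campaign_id field from the item instead of evaluating the ternary inline.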

Sample outbound payload

Subject: Your new Software Engineer req at {{companyName}}

Hi {{firstName}},

Noticed the engineering req that went live on LinkedIn a few hours ago.

Our ATS overlay handles technical vetting for complex roles automatically before they hit your desk, reducing engineering interview hours by 40%. Given the urgency, are you open to seeing how we could plug into your existing Greenhouse setup?

{{customInfo_ps_line}}

Failure modes (named)

LinkedIn rate-limiting. Scraping LinkedIn directly from a server IP gets the IP banned within hours. The system uses Apify’s rotating residential proxies; if the proxy pool degrades, the scrape pass returns fewer results and the observability alert fires.
Hallucination on vague job descriptions. Two-sentence postings cause the LLM to invent skills. The hallucination guard rejects ~7% of extractions as LOW_SIGNAL and routes them to manual review. Without the guard, invented skills would flow straight into the P.S. line and the other merge fields of the sequence.
Job-reposting duplication. Companies repost identical listings weekly to keep them fresh. Without dedup, the same VP HR gets the same email repeatedly. The system hashes MD5(domain + normalized_title) and silently drops re-encounters within 30 days.
Stale signal windows. A job req posted more than 30 days ago is no longer a fresh buying signal, and the pipeline is far stricter than that: the scrape filter excludes anything older than 24 hours at ingest.
Vendor outage on Apify or OpenAI. Both vendors have had multi-hour outages. The system has retry budgets per node, exponential backoff on rate-limit errors, and a Slack alert when a run returns fewer than 5 records — which is the canary that something upstream is wrong.
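
The retry and canary behavior described in the last item can be sketched as follows. The base delay, the cap, the retry count, and the use of an HTTP 429 status on the error object to detect rate limiting are all assumptions; only the threshold of 5 records comes from the spec table.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Default sleep; injectable so callers can swap in a no-op timer.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `fn`, retrying only rate-limit errors, up to `maxRetries` extra attempts.
// Assumption: a rate-limit error exposes `status === 429`.
async function withRetry(fn, maxRetries = 4, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const rateLimited = Boolean(err) && err.status === 429;
      if (!rateLimited || attempt >= maxRetries) throw err; // budget spent or not retryable
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// Canary threshold from the spec table: alert when a run returns fewer than 5 jobs.
const CANARY_MIN_RECORDS = 5;

function shouldAlert(recordCount) {
  return recordCount < CANARY_MIN_RECORDS;
}
```

Adding random jitter to the delay is a common refinement when several nodes retry against the same vendor at once; it is left out here to keep the sketch short.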

Where this fits in the broader system

This is an internal agent on the outbound surface. The architectural patterns — schema-validated outputs, hallucination guards, dedup logic, observability alerts — are the same patterns I use for go-to-market outbound agents. The class distinction matters because go-to-market agents on the same surface need additional supervision (HITL gates, brand-voice validators) before any output ships to the inbox.

The /ai/agents encyclopedia goes deep on the two classes and how each one shows up across the four surfaces.

Stack

Apify (scraping with residential proxy rotation) · n8n (workflow orchestration + Function-node validators) · OpenAI GPT-4o (LLM extraction step) · Smartlead (deliverability-managed sender) · Salesforce (CRM routing for enterprise tier).

Want to talk about something like this?

If you’re building something in the same shape — signal-driven outbound, internal research agents, validator-gated LLM pipelines — hit me up.