LeadTrail: B2B Lead Enrichment Platform

Client: Confidential · Duration: 5 months

Tech Stack

Django, PostgreSQL, Redis, Celery, Companies House API, ZenSERP, Hunter.io, Snov.io, Heroku

Challenge

B2B sales teams spend 2-3 hours per company researching prospects by hand: gathering official records from Companies House, finding company websites, extracting contact details, identifying decision-makers on LinkedIn, and sourcing verified email addresses. This manual process does not scale to large prospect lists (100+ companies), produces inconsistent data quality, and scatters information across multiple platforms. The client needed an automated solution that could process entire prospect lists in parallel, verify data against authoritative sources, and deliver outreach-ready lead profiles at scale.

Solution

Built an intelligent B2B lead enrichment platform that automates the entire research workflow. Starting with just a UK company registration number, LeadTrail processes each lead through an 8-stage automated pipeline, transforming 2-3 hours of manual work into 45 seconds.

8-Stage Enrichment Pipeline

Company Discovery & Verification

  • Companies House API integration for official business records, directors, incorporation dates
  • VAT registration discovery and tax compliance validation
  • Business legitimacy verification through government databases
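
The Companies House lookup that starts the pipeline can be sketched with the standard library alone. This assumes the public Companies House REST API (company profile resource at /company/{company_number}, HTTP Basic auth with the API key as the username and an empty password); the helper names are hypothetical and this is not LeadTrail's actual code.

```python
import base64
import urllib.request

# Base URL of the public Companies House REST API.
API_BASE = "https://api.company-information.service.gov.uk"


def normalise_company_number(raw: str) -> str:
    """Normalise user input to the 8-character form Companies House uses.

    Purely numeric numbers are zero-padded ("1234567" -> "01234567");
    prefixed numbers such as "SC123456" are just cleaned and upper-cased.
    """
    cleaned = raw.strip().replace(" ", "").upper()
    return cleaned.zfill(8) if cleaned.isdigit() else cleaned


def build_profile_request(company_number: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a company profile."""
    number = normalise_company_number(company_number)
    # Basic auth: API key as username, empty password -> base64("key:").
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        f"{API_BASE}/company/{number}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
```

Sending it is then `urllib.request.urlopen(req, timeout=10)` followed by `json.load` on the response; directors come from the separate officers resource at /company/{company_number}/officers.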

Website & Contact Discovery

  • ZenSERP-powered intelligent web search with a domain-ranking algorithm
  • Automated website crawling that extracts up to 15 emails and phone numbers per company
  • LinkedIn, Facebook, and Instagram business profile discovery
  • Human approval workflow for quality control before extraction
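
The contact-extraction step could look roughly like the sketch below, which pulls email addresses and UK-style phone numbers out of page text that has already been fetched. The regular expressions, the image-suffix filter, and the reading of "up to 15" as a separate cap on each list are assumptions; a production crawler would also need HTML parsing, mailto: links, and handling of obfuscated addresses.

```python
import re

MAX_CONTACTS = 15  # per-company cap from the write-up (assumed to apply to each list)

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose UK pattern (assumption): "+44" or a leading 0, then 9-10 digits with
# optional single spaces, dots or dashes between them.
PHONE_RE = re.compile(r"(?:\+44\s?|0)(?:\d[\s.-]?){9,10}")
# Retina asset names like "logo@2x.png" look like emails; drop them.
_ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def _unique(items: list[str], limit: int) -> list[str]:
    """De-duplicate while preserving first-seen order, then apply the cap."""
    return list(dict.fromkeys(items))[:limit]


def extract_contacts(page_text: str) -> dict[str, list[str]]:
    emails = [
        e.lower()
        for e in EMAIL_RE.findall(page_text)
        if not e.lower().endswith(_ASSET_SUFFIXES)
    ]
    phones = []
    for match in PHONE_RE.findall(page_text):
        digits = re.sub(r"[^\d+]", "", match)  # strip separators
        if digits.startswith("+44"):
            digits = "0" + digits[3:]  # store one canonical national form
        phones.append(digits)
    return {
        "emails": _unique(emails, MAX_CONTACTS),
        "phones": _unique(phones, MAX_CONTACTS),
    }
```

Normalising case and the +44 prefix before de-duplication means the same address or number found on several pages counts once against the cap.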

Decision-Maker Identification

  • LinkedIn company page discovery and employee profile search
  • Cross-referencing Companies House directors with LinkedIn profiles
  • C-suite and VP-level targeting with manual selection for cost control
  • Job title identification for personalized outreach
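
Cross-referencing directors with LinkedIn results is essentially fuzzy name matching. The sketch below assumes Companies House officer names arrive as "SURNAME, Forenames" (the format used in officer listings) and that LinkedIn search yields plain display names; it reduces both to "first surname" and compares them with difflib's SequenceMatcher. The 0.85 threshold and the honorific list are hypothetical tuning choices, not values from the project.

```python
import re
import unicodedata
from difflib import SequenceMatcher

HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "sir"}  # hypothetical list


def _normalise(name: str) -> list[str]:
    """Lower-case, strip accents and punctuation, drop honorifics."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.sub(r"[^a-z ]", "", " ".join(ascii_name.lower().split())).split()
    return [t for t in tokens if t not in HONORIFICS]


def name_key(name: str) -> str:
    """Reduce a display name to 'first last'."""
    tokens = _normalise(name)
    return f"{tokens[0]} {tokens[-1]}" if len(tokens) >= 2 else " ".join(tokens)


def director_key(ch_name: str) -> str:
    """'SMITH, John David' -> 'john smith'."""
    if "," in ch_name:
        surname, forenames = ch_name.split(",", 1)
        first = forenames.split()[0] if forenames.split() else ""
        ch_name = f"{first} {surname}"
    return name_key(ch_name)


def match_directors(directors, profile_names, threshold=0.85):
    """Return (director, best profile name, score) for matches above threshold."""
    matches = []
    for director in directors:
        d_key = director_key(director)
        scored = [(SequenceMatcher(None, d_key, name_key(p)).ratio(), p) for p in profile_names]
        if not scored:
            continue
        score, best = max(scored, key=lambda s: s[0])
        if score >= threshold:
            matches.append((director, best, round(score, 2)))
    return matches
```

Matches below the threshold are simply left out, which fits the manual employee-selection step: a person can still pick an unmatched profile.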

Email Enrichment

  • Dual-source strategy: Hunter.io (domain-based) + Snov.io (LinkedIn profiles)
  • Multi-source email verification with confidence scoring
  • Deliverability checking and professional email validation
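
One plausible way to fold the two email sources into a single confidence score is a noisy-OR: treat each source's confidence as an independent probability that the address is correct and compute 1 - Π(1 - p). This is a swapped-in technique for illustration, not necessarily how LeadTrail scores emails; the `Candidate` type, the source labels, and the 0.7 cut-off are hypothetical, and each adapter is assumed to have already mapped its provider's score onto 0-100.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    email: str
    source: str      # hypothetical labels, e.g. "hunter" or "snov"
    confidence: int  # 0-100, as mapped by the source adapter


def merge_candidates(candidates, min_score=0.7):
    """Combine per-source confidences with a noisy-OR: 1 - prod(1 - p_i)."""
    # Keep only the best score per (email, source) so one source
    # reporting the same address twice cannot inflate its score.
    best = defaultdict(dict)
    for c in candidates:
        key = c.email.strip().lower()
        best[key][c.source] = max(best[key].get(c.source, 0), c.confidence)

    merged = []
    for email, by_source in best.items():
        miss = 1.0  # probability that every source is wrong
        for conf in by_source.values():
            miss *= 1 - conf / 100
        score = round(1 - miss, 3)
        if score >= min_score:
            merged.append({"email": email, "score": score, "sources": sorted(by_source)})
    return sorted(merged, key=lambda m: m["score"], reverse=True)
```

An address found by both sources at 80 and 50 scores 0.9, above either source alone, which is the behaviour a dual-source strategy is meant to reward.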

Technical Architecture

Backend & Queue Processing

  • Django REST API: 8-stage pipeline orchestration, custom management commands for batch processing, atomic transactions to prevent data corruption
  • Celery: 7 independent background workers, scheduled execution (2-5 min intervals), exponential backoff retry logic, real-time progress tracking
  • Redis: API response caching (70% cost reduction), rate limiting for quota management, session storage for long-running tasks
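
Quota-aware rate limiting can be as simple as a fixed-window counter per API. The sketch keeps counters in process memory for clarity; with several Celery workers the counter has to be shared, and the usual Redis version is an INCR on a per-window key followed by an EXPIRE. The class name and limits are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each API name."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._counts = {}   # (api_name, window_index) -> calls so far

    def allow(self, api_name):
        window_index = int(self.clock() // self.window)
        key = (api_name, window_index)
        # Discard this API's counters from earlier windows to bound memory.
        for old in [k for k in self._counts if k[0] == api_name and k[1] != window_index]:
            del self._counts[old]
        count = self._counts.get(key, 0)
        if count >= self.limit:
            return False  # caller should defer the task instead of spending quota
        self._counts[key] = count + 1
        return True
```

A worker that gets `False` can reschedule its task for the next window rather than failing, which keeps paid quotas from being exhausted mid-batch.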

Data & Integration

  • PostgreSQL: Optimized schema modeling companies/contacts/enrichment stages, strategic indexing on company numbers and domains, historical progress tracking
  • API Abstraction Layer: Unified integration for 6+ sources (including Companies House, ZenSERP, Hunter.io, Snov.io, and VAT services), circuit breaker patterns, quota tracking to prevent failures
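
A circuit breaker stops the pipeline from hammering a provider that is already failing. Below is a minimal single-process sketch: after `failure_threshold` consecutive errors the breaker opens and rejects calls for `reset_timeout` seconds, then lets a trial call through (the half-open state) and closes again if it succeeds. Thresholds and names are illustrative assumptions, and a multi-worker deployment would need to share breaker state.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a provider while its circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("provider temporarily disabled")
            # Timeout elapsed: half-open, so this call is a trial.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```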

Deployment & Monitoring

  • Heroku: Production hosting with auto-scaling, Heroku Scheduler for background jobs, database backups
  • Sentry: Real-time error tracking and performance monitoring

Results

Impact Metrics

  • 97% time savings: Reduced lead research from 2-3 hours to 45 seconds per company
  • 100+ companies processed concurrently per batch
  • 85% success rate for website discovery across diverse industries
  • 70% enrichment rate with valid email contacts
  • 90% accuracy on company information validation
  • 80% email deliverability on discovered contacts
  • 95% accuracy on company legitimacy verification
  • 70% API cost reduction through intelligent caching
  • 99.5% uptime over a 6-month production period
  • 10,000+ companies processed without data corruption
  • Zero duplicate entries via deduplication algorithms
  • 8-12 data points per comprehensive lead profile

Key Learnings

API Integration Architecture: Managing 6+ third-party APIs with different rate limits, error responses, and data formats required a unified abstraction layer. Consistent error handling and retry logic made adding new sources straightforward. The lesson: abstract early, not after integrating multiple APIs.
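
The unified abstraction layer can be pictured as a base class that every provider adapter extends: each adapter implements only the provider-specific request and payload mapping, while the base class turns HTTP status codes into a small shared error hierarchy that retry logic can act on. The class names, the (status, body) contract, and the status-code policy below are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class SourceError(Exception):
    """Base class for errors surfaced by any enrichment source."""


class RetryableError(SourceError):
    """Transient failure (rate limit or server error): safe to retry later."""


class PermanentError(SourceError):
    """Bad input or auth problem: retrying will not help."""


@dataclass
class SourceResult:
    source: str
    data: dict = field(default_factory=dict)


class EnrichmentSource(ABC):
    name = "base"

    @abstractmethod
    def _call(self, query: str) -> tuple[int, dict]:
        """Provider-specific request; returns (HTTP status, parsed JSON body)."""

    def normalise(self, body: dict) -> dict:
        """Map the provider payload onto shared field names (override per source)."""
        return body

    def fetch(self, query: str) -> SourceResult:
        status, body = self._call(query)
        # One policy for every provider: 429 and 5xx are worth retrying.
        if status == 429 or status >= 500:
            raise RetryableError(f"{self.name}: HTTP {status}")
        if status >= 400:
            raise PermanentError(f"{self.name}: HTTP {status}")
        return SourceResult(source=self.name, data=self.normalise(body))
```

Adding a source then means writing one subclass with `_call` and `normalise`; retries, quotas, and circuit breaking can wrap `fetch` uniformly.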

Caching for Cost Control: Redis caching was essential for both performance and cost; without it, repeated API calls would have been prohibitively expensive. Time-based cache expiry (24 hours for company data, 7 days for website data) balanced freshness against cost savings. This caching strategy achieved a 70% reduction in API costs.
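
The caching strategy described above amounts to a cache-aside pattern with per-type TTLs. The sketch assumes a client exposing redis-py's `get(name)` and `setex(name, time, value)` methods, JSON-serialisable API responses, and a hypothetical `leadtrail:{kind}:{key}` key scheme; only the two TTL values come from the write-up.

```python
import json

# TTLs from the write-up: company data 24 hours, website data 7 days.
CACHE_TTLS = {
    "company": 24 * 60 * 60,
    "website": 7 * 24 * 60 * 60,
}


def cached_fetch(cache, kind, key, fetch):
    """Cache-aside: return the cached value if present, else call fetch() and store it.

    `cache` is any object with redis-py style get(name) and setex(name, time, value).
    """
    cache_key = f"leadtrail:{kind}:{key}"  # hypothetical key naming
    hit = cache.get(cache_key)
    if hit is not None:
        return json.loads(hit)  # accepts str or bytes (redis-py returns bytes)
    value = fetch()  # the paid API call happens only on a miss
    cache.setex(cache_key, CACHE_TTLS[kind], json.dumps(value))
    return value
```

Letting keys expire on their own avoids explicit invalidation logic; the TTL per data type is the only knob trading freshness against API spend.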

Reliability Over Speed: Early versions prioritized speed but suffered cascading failures. Implementing exponential backoff, circuit breakers, and graceful degradation improved success rates dramatically. Taking 60 seconds with a 90% success rate beats taking 30 seconds with a 50% success rate; in production, reliability wins.
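
Exponential backoff with jitter, one of the reliability measures above, fits in a few lines. The sketch doubles the maximum wait after each transient failure, caps it, and picks a random delay below that cap ("full jitter") so that many workers retrying at once do not synchronise. `TransientError` and the default parameters are hypothetical. In a Celery codebase the same behaviour is also available declaratively through the task options `autoretry_for`, `retry_backoff`, and `retry_jitter`.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as HTTP 429/503 or a timeout."""


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on TransientError wait up to base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller degrade gracefully
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait in [0, delay] spreads out retries.
            sleep(random.uniform(0, delay))
```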

Strategic Human Review Gates: We initially attempted full automation, but found that website matching and employee selection benefit from targeted human oversight. Two review gates (website approval and employee selection) improved quality significantly with minimal manual effort, and users appreciated having control over targeting decisions and cost management.
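
The review gates can be modelled as pipeline stages that automation cannot leave until a user approves them. The stage names below are hypothetical (they are not LeadTrail's actual eight stages); the sketch only shows `advance` pausing at each gate until `approve` has been called for it.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the two *_review stages are the human gates.
STAGES = [
    "company_lookup",
    "website_search",
    "website_review",      # gate: a user approves the matched website
    "contact_extraction",
    "employee_search",
    "employee_review",     # gate: a user picks which employees to enrich
    "email_enrichment",
    "complete",
]
REVIEW_GATES = {"website_review", "employee_review"}


@dataclass
class Lead:
    company_number: str
    stage: str = STAGES[0]
    approvals: set = field(default_factory=set)


def advance(lead: Lead) -> bool:
    """Move the lead one stage forward; return False if finished or waiting at a gate."""
    if lead.stage == "complete":
        return False
    if lead.stage in REVIEW_GATES and lead.stage not in lead.approvals:
        return False  # automation pauses until a human approves this gate
    lead.stage = STAGES[STAGES.index(lead.stage) + 1]
    return True


def approve(lead: Lead, gate: str) -> None:
    if gate not in REVIEW_GATES:
        raise ValueError(f"{gate} is not a review gate")
    lead.approvals.add(gate)
```

A scheduled worker can call `advance` repeatedly on every lead; leads parked at a gate simply stay put, so human review never blocks the rest of the batch.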

Asynchronous Processing Is Essential: Synchronous implementations created a poor user experience, with multi-minute page loads. Moving to Celery background tasks with real-time progress tracking transformed the experience: users upload a list, close the browser, and return to completed profiles. For long-running workflows, asynchronous processing is non-negotiable.
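
The upload-and-come-back flow depends on the background job publishing progress somewhere the web app can poll. The sketch uses a thread pool and an in-memory dict so it runs with the standard library alone; in the Celery setup described here, the rough equivalent would be a bound task calling `self.update_state(state="PROGRESS", meta={...})` and the view reading `AsyncResult(task_id).info`. All names below are hypothetical.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

_progress = {}  # job_id -> {"done": int, "total": int}
_lock = threading.Lock()
_executor = ThreadPoolExecutor(max_workers=4)


def _process_batch(job_id, company_numbers, enrich_one):
    for number in company_numbers:
        enrich_one(number)  # the slow, API-bound work
        with _lock:
            _progress[job_id]["done"] += 1


def start_batch(company_numbers, enrich_one):
    """Queue a batch and return at once; the UI polls get_progress(job_id)."""
    job_id = uuid.uuid4().hex
    with _lock:
        _progress[job_id] = {"done": 0, "total": len(company_numbers)}
    # The future is returned only for callers (e.g. tests) that want to wait.
    future = _executor.submit(_process_batch, job_id, list(company_numbers), enrich_one)
    return job_id, future


def get_progress(job_id):
    with _lock:
        return dict(_progress[job_id])
```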

Concurrent Worker Safety: Running 7 concurrent workers required careful database transaction handling. Django's atomic transactions and lock management prevented race conditions and data corruption under high load. Design for concurrency from day one; retrofitting it is painful.


Interested in similar work?

Looking to build something like this? Let's discuss how I can help bring your project to life.
