Venture Capital

ML-Powered Investment Intelligence Platform for a Growth-Stage VC Firm

A respected Silicon Valley venture capital firm managing over $3 billion, focused on growth-stage investments in AI and SaaS companies. A pioneer of the data-driven VC movement.

Duration: Multi-year, ongoing
Team: Data engineers, ML engineers
Services: Data Platform, Data Engineering, Machine Learning
Tech Stack: Snowflake, Airflow on AWS MWAA, SQLMesh, Polytomic, Python, Salesforce

The Challenge

The firm had invested heavily in data providers and built a basic integration layer, but its initial scoring model was too simplistic. It rated each investment opportunity on a 0–1 scale, but the underlying formula frequently produced misleading scores, which limited adoption among investors. Leadership knew that the firm's unique blend of external and internal data could create a genuine competitive advantage, but realizing it required a proper data platform and ML capability.

The Solution

The project was structured in three blocks: integrating all data into a central hub, cleaning and processing diverse datasets, and building machine learning models on top.

Cloud-native data platform

A mix of custom connectors and SaaS integrations syncs data into Snowflake following an ELT approach, with the pipelines orchestrated by Airflow on AWS MWAA. Transformations run in SQLMesh using Snowflake SQL with Python UDFs and include temporal change tracking, so historical data can be reprocessed.
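
Temporal change tracking is what makes historical reprocessing possible. Instead of overwriting a record when a source changes, the platform keeps each version together with the interval during which it was valid. The plain-Python sketch below illustrates that idea with an SCD Type 2 style layout. It is not the project's SQLMesh code. The names `Version`, `apply_snapshot` and `state_as_of`, and the assumption that each sync delivers a full snapshot, are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Version:
    key: str                          # entity identifier, e.g. a company ID (hypothetical)
    attrs: dict                       # attribute values for this version
    valid_from: date                  # first day this version was observed
    valid_to: Optional[date] = None   # None marks the current version


def apply_snapshot(history: list, snapshot: dict, as_of: date) -> list:
    """Merge a full snapshot {key: attrs} observed on `as_of` into the history.

    Changed entities get their current version closed and a new one opened;
    entities missing from the snapshot are closed (assumes full snapshots).
    """
    current = {v.key: v for v in history if v.valid_to is None}
    for key, attrs in snapshot.items():
        cur = current.get(key)
        if cur is None:
            history.append(Version(key, dict(attrs), as_of))
        elif cur.attrs != attrs:
            cur.valid_to = as_of
            history.append(Version(key, dict(attrs), as_of))
    for key, cur in current.items():
        if key not in snapshot:
            cur.valid_to = as_of
    return history


def state_as_of(history: list, when: date) -> dict:
    """Reconstruct every entity's attributes as they were known on `when`."""
    return {
        v.key: v.attrs
        for v in history
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to)
    }


history = []
apply_snapshot(history, {"acme": {"stage": "Series A", "headcount": 40}}, date(2023, 1, 1))
apply_snapshot(history, {"acme": {"stage": "Series B", "headcount": 120}}, date(2024, 1, 1))
print(state_as_of(history, date(2023, 6, 1)))  # {'acme': {'stage': 'Series A', 'headcount': 40}}
```

Querying `state_as_of` for a past date returns what was known at that time. This is the property that lets features or scores be recomputed historically without leaking later information into them.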

ML model deployment and integration

Two models developed by the firm's data science team were deployed to production on the data platform we built. The first, a Company Lookalike model, uses NLP to match target companies against the firm's portfolio based on their natural-language descriptions. The second, a Deal Scoring model, replaced the original simplistic formula. It combines four key data sources into 300+ features and is paired with a dedicated API that provides LLM-based explanations of each score. We connected both models to the underlying data infrastructure and integrated them directly into Salesforce CRM, so investors could use them without leaving the tool they already work in.
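
The case study does not say which NLP technique the Company Lookalike model uses. The sketch below is a minimal, stdlib-only illustration of description-based matching that substitutes TF-IDF vectors and cosine similarity for whatever method was actually used. A production model may well rely on learned text embeddings instead. The function names, company names and descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list) -> list:
    """Build a sparse TF-IDF vector (dict: term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: terms found in every document still keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: cnt * idf[t] for t, cnt in Counter(toks).items()} for toks in tokenized]


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookalikes(target: str, portfolio: dict, top_k: int = 3) -> list:
    """Rank portfolio companies by textual similarity to a target description."""
    names = list(portfolio)
    vectors = tfidf_vectors([target] + [portfolio[name] for name in names])
    target_vec, company_vecs = vectors[0], vectors[1:]
    scored = [(name, cosine(target_vec, vec)) for name, vec in zip(names, company_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


portfolio = {
    "LedgerLoop": "AI bookkeeping software for small businesses",
    "ShipStack": "logistics platform for freight forwarding",
    "InsightHQ": "SaaS analytics dashboards for sales teams",
}
print(lookalikes("AI-powered accounting software for small businesses", portfolio, top_k=1)[0][0])
# LedgerLoop
```

Cosine similarity ignores description length, so a short one-line target can still be compared fairly against longer portfolio descriptions.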

Results

Modern cloud-based data platform

Integrates diverse external and internal sources, with automated orchestration and monitoring.

Two ML models deployed and integrated

Company Lookalike supports pattern-based sourcing and Deal Scoring supports investment prioritization. Both run in production on the data platform and are embedded in the CRM. Deal Scoring is now core to the firm's investment decision-making process.

Self-serve analysis environment

Data scientists can run ad-hoc queries and lay the groundwork for further model development.

Why This Matters

Every firm has access to the same market data. The edge comes from what you do with it. This platform turns commodity data into proprietary investment signal — a Deal Scoring model with 300+ features that replaces gut feel with quantifiable conviction. Because both models are embedded directly in the CRM, adoption is frictionless. Intelligence surfaces where investors already work, not in a separate tool they have to remember to check.
