How to Mine Open Source Data for Enterprise Sales
How Sales and RevOps teams can analyse their OSS repositories to power their enterprise GTM motion
Introduction
Open source is not a business model; it is a distribution advantage. It gives you communities of millions of developers trying out your product, giving feedback, and building along with you. Converting this motion into revenue, however, takes significant effort, and it often starts with understanding which developers in which organizations are using, contributing to, or simply playing around with your open source product with business needs or evaluations in mind.
This intel can uncover a lot of value for a commercial open source business, especially for its enterprise sales team, which can use the mined usage data to decide which accounts to target and how to approach them. It comes as no surprise that companies such as HashiCorp and Imply, along with many others, have mined their open source usage data to build intelligent pipelines for their enterprise sales teams (source).
Imply’s enterprise sales team closely monitors the company’s open source Druid base to inform them on which organizations might be ready for an enterprise proof of concept. PQLs that have previously used Imply’s open source offering (either in testing, development, or production) are coded as “pre-buyers,” while leads with no prior exposure to Druid are coded as “pre-believers.”
Source: BVP Atlas
Mining data for enterprise sales is tricky for three reasons. First, you need to understand which of the accounts playing with your open source product really fit your customer profile and are worth going after, based on their purchasing power and fit with your growth strategy. Second, you want to track their activity over time to understand usage patterns and judge the right moment to get into the account and make a hard push. If you go in too early, the buyers might not have fully discovered the problem and will not be ready to talk to business teams; if you go in too late, they might have already customized the open source product in the way that fits them best. Third, once you identify an account that meets your enterprise sales criteria, you want to know about every developer activity happening in that account, because even small signals (such as a developer asking a question in an open forum or your support community) can mean a lot commercially.
Understanding Developer Intent Sources
The key in any enterprise sales activity is to filter revenue signals out of a large volume of data. An important first step is to know the sources where you can find developer intent and how to decode it to learn which organization or team is evaluating your product. Once you have that, you can triangulate the data to create an account-based view of developer activity.
This has two parts: a) understanding which intent sources are important for you, and b) identifying the developers who are showing signs of activity in those sources and the organizations they belong to.
Intent Sources
There is a whole host of places where you can find developer intent. Since we are discussing mining OSS data, we limit ourselves to a few sources that tend to matter most for an OSS-focused company:
GitHub engagement data: This includes any usage or engagement data you can get from GitHub, such as stargazers, watchers, forks, issue creators, PRs, and more.
Package Manager Downloads: Here too, you can measure intent across multiple dimensions: frequency of download, that is, how often developers from an organization download the package (someone who downloaded once 6 months ago and never came back for a version update has lower intent than someone who downloads repeatedly); the number of unique developers from an organization downloading the package; and how actively the open source code is being used within the organization (for this, you will need some light telemetry or a call-home function).
Docs, Guides and Technical Content: Docs and usage guides are a very strong signal of developer intent. In fact, docs usage is such a strong signal that we will dedicate a separate article to it. For the scope of this article, you can measure the strength of intent by the type of page (for example, a developer reading API references shows higher intent than one reading the Welcome page) and the time spent on pages.
Product Trials: Intent can be measured from the product usage metrics of a single developer (whatever you track, for example number of API calls or data usage) or from the number of developers from the same organization trying out the product.
Since not all intent signals are equal, you may want to create a weightage chart and use it to compute scores that help you triangulate intent and discover which accounts are important to you.
Image 1. Determining intent signals and weightage. Please note that the intent signals and weightage given above are representative and will vary from org to org.
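As a rough illustration of how such a weightage chart can turn into scores, the sketch below sums weighted signals per account. The signal names, the weights, and the account names are all assumptions made up for this example, not recommended values.

```python
from collections import defaultdict

# Hypothetical weights per intent signal; tune these for your own business.
SIGNAL_WEIGHTS = {
    "github_star": 1,
    "github_issue": 3,
    "package_download": 2,
    "docs_api_reference_view": 4,
    "product_trial_signup": 5,
}


def score_accounts(events):
    """Sum weighted intent signals per account.

    `events` is an iterable of (account, signal) pairs. Signals without a
    configured weight contribute nothing to the score.
    """
    scores = defaultdict(int)
    for account, signal in events:
        scores[account] += SIGNAL_WEIGHTS.get(signal, 0)
    return dict(scores)


# Made-up events for two hypothetical accounts.
events = [
    ("acme.com", "github_star"),
    ("acme.com", "docs_api_reference_view"),
    ("acme.com", "product_trial_signup"),
    ("globex.com", "github_star"),
    ("globex.com", "unknown_signal"),
]

scores = score_accounts(events)
# Print accounts from highest to lowest score.
for account, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(account, score)
```

Keeping the weights in one table makes it easy for Sales and RevOps to revisit them as they learn which signals actually precede deals.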
Decoding Developer and Account Identity from Intent Sources
This is the tricky bit, and the one that takes the most manual effort. Nevertheless, here are a few things that work:
GitHub:
Developer’s Public profile data: Many developers disclose their identity publicly, with links to their public profiles, their organization, and their email IDs. There are also more advanced methods of finding a developer’s email ID from GitHub.
Public Data Sources: GitHub engagement activities give you at least a GitHub ID, name, and display picture. This tactic combines some clever manipulation of that data with search skills to reach a place where you can identify the developer’s organization, such as their LinkedIn profile or company ID.
Data Enrichment tools: There are multiple data enrichment tools on the market (e.g. Clearbit) that can enrich user data. You generally need a starting handle (such as an email ID, Twitter handle, LinkedIn handle, or GitHub ID), and once you have one, the tool can usually return the others.
Your CRM: A developer who is actively engaging with your open source product has quite likely also signed up for your trial, webinar, community, or another form of developer engagement. If you ask developers for their GitHub ID and company ID as part of the sign-up process, your CRM can link the two.
A bit of everything: Combine all of the above with strong computing, search models, and data access. That is what we do at Reo.
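One small, mechanical step in this decoding can be sketched in code: once you have a developer’s public email, map it to a company domain and set aside personal mailboxes. The list of free-mail providers below is illustrative and incomplete, and the GitHub IDs and emails are made up.

```python
# Illustrative, incomplete list of personal email providers (an assumption).
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "hotmail.com", "proton.me"}


def company_domain(email):
    """Return the lowercased domain of a work email, or None.

    Returns None for malformed addresses and for personal email providers,
    since neither identifies an organization.
    """
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or "." not in domain:
        return None
    if domain in FREE_MAIL_DOMAINS:
        return None
    return domain


def group_by_account(profiles):
    """Group GitHub IDs by company domain, skipping unidentifiable emails.

    `profiles` is an iterable of (github_id, email) pairs.
    """
    accounts = {}
    for github_id, email in profiles:
        domain = company_domain(email)
        if domain:
            accounts.setdefault(domain, []).append(github_id)
    return accounts


# Made-up profiles for illustration.
profiles = [
    ("jane-dev", "Jane@Acme.io"),
    ("bob42", "bob@gmail.com"),
    ("sam-ops", "sam@acme.io"),
    ("ghost", "not-an-email"),
]
accounts = group_by_account(profiles)
print(accounts)
```

A domain is only a proxy for an organization; contractors, subsidiaries, and personal addresses will still need the manual or enrichment-based methods described above.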
Package Manager Installs
Creating Gateways: If you want to track which organizations the developers downloading your package belong to, one way is to create a custom install command that serves the code through a gateway that records the downloader’s IP address. You can then use reverse IP lookup tools such as Leadfeeder or MaxMind to learn which organization the developer works for.
Call Home Function / Telemetry: Many OSS projects today embed opt-in telemetry to understand developer usage and gather product feedback. Data commonly gathered includes IP addresses, whether the instance is active, the version of the software in use, timestamps, and more. If you already have a hook on the developer (from the methods above), linking it to the telemetry data can help business teams learn which accounts are actively using the instance.
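To make this concrete, here is a minimal sketch of how gateway download records could be rolled up into per-organization metrics such as download frequency and recency. The IP-to-organization table stands in for the result of a reverse IP lookup and is purely hypothetical; no real lookup service is called, and the log entries are made up. Distinct IPs are used as a rough proxy for distinct developers, which is an assumption that will not always hold.

```python
from datetime import date

# Hypothetical result of a reverse IP lookup; in practice this mapping would
# come from an external service, which is not called here.
IP_TO_ORG = {
    "203.0.113.10": "acme.com",
    "203.0.113.11": "acme.com",
    "198.51.100.7": "globex.com",
}


def summarize_downloads(log, today):
    """Aggregate gateway download records into per-organization metrics.

    `log` is an iterable of (ip, download_date) pairs. IPs that cannot be
    resolved to an organization are skipped.
    """
    summary = {}
    for ip, day in log:
        org = IP_TO_ORG.get(ip)
        if org is None:
            continue
        entry = summary.setdefault(org, {"downloads": 0, "ips": set(), "last": day})
        entry["downloads"] += 1
        entry["ips"].add(ip)
        entry["last"] = max(entry["last"], day)
    return {
        org: {
            "downloads": e["downloads"],
            "unique_ips": len(e["ips"]),
            "days_since_last": (today - e["last"]).days,
        }
        for org, e in summary.items()
    }


# Made-up download log for illustration.
log = [
    ("203.0.113.10", date(2024, 1, 5)),
    ("203.0.113.11", date(2024, 3, 1)),
    ("203.0.113.10", date(2024, 3, 10)),
    ("198.51.100.7", date(2023, 9, 1)),
    ("192.0.2.99", date(2024, 3, 9)),  # unresolved IP, skipped
]
report = summarize_downloads(log, today=date(2024, 3, 15))
for org, metrics in report.items():
    print(org, metrics)
```

In this made-up log, the first organization downloads repeatedly and recently, while the second downloaded once about six months ago and never came back, which is the lower-intent pattern described earlier.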
Docs
This is relatively simple if you are already tracking first-party data on your page views: you will know which user has gone through which pages of your documentation. Tools like HubSpot and many other marketing CRMs make this easier, as they track a user’s first-party cookies across their website activity and link them to an identity and organization if that user has submitted such information in a form. The challenge is to separate developers who are actively spending time on “high purchase intent” pages or sections of the docs and technical content from those on “low purchase intent” pages such as blogs, articles, and exploratory website visits.
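One possible way to express this filter is to weight each page view by the kind of page and the time spent on it. The sketch below does that with URL path prefixes; the prefixes, the weights, and the time cap are assumptions for illustration and would need to match your own site structure.

```python
# Hypothetical mapping from URL path prefix to an intent weight. Order matters:
# more specific prefixes come first.
PAGE_WEIGHTS = [
    ("/docs/api/", 3.0),
    ("/docs/deploy/", 2.5),
    ("/docs/", 1.0),
    ("/blog/", 0.25),
]
MAX_SECONDS = 600  # cap so one idle tab does not dominate (an assumption)


def page_weight(path):
    """Return the weight of the first matching prefix, or 0.0 if none match."""
    for prefix, weight in PAGE_WEIGHTS:
        if path.startswith(prefix):
            return weight
    return 0.0


def docs_intent(page_views):
    """Score (path, seconds_on_page) views as weighted minutes of reading."""
    return sum(
        page_weight(path) * min(seconds, MAX_SECONDS) / 60
        for path, seconds in page_views
    )


# Made-up page views for one developer.
views = [
    ("/docs/api/auth", 120),
    ("/blog/launch", 300),
    ("/docs/welcome", 60),
    ("/docs/api/query", 1200),  # capped at MAX_SECONDS
]
print(docs_intent(views))
```

The cap on time per page is a design choice: without it, a single tab left open overnight would outweigh a focused session across many API reference pages.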
Product Trials
As above, this information can be derived from multiple product analytics tools such as Amplitude or Mixpanel that link a user’s first-party data with their product usage patterns.
Deriving Account Intelligence from OSS Data
By now, you should have a good idea of which developers from relevant accounts are showing medium-to-strong intent signals of OSS usage. The path ahead is about knowing whether an account is at the ‘nurturing stage’, where you need to focus on education and evangelism to help developers learn about the product or evaluate it better, or whether you have reached the ‘right time’ to activate the GTM motion: reach out to decision-makers, get more aggressive, and use your developer champions to make the sale.
Once you have the data, you can use it to build various business strategies to reach such decisions:
Building Account Scores Based on Developer Activities: Account scoring can help you understand which accounts are cold, warm, or hot based on developer activity. A lot has been written on lead and account scoring (I recommend two interesting posts, from HubSpot and TechTarget), but in essence you assign a numerical value to each intent activity and a weightage to each intent signal to see which accounts are showing a surge (or a slowdown) in activity.
Mapping Account Scores with Sales Funnel Data: The GTM motion in developer tool companies is often a hybrid of bottom-up and top-down GTM (we wrote a detailed article some time back on building developer-focused sales funnels). One way we recommend analyzing such data is to plot it as a matrix, with the top-down funnel (Prospect → Lead → Opportunity → Customer) on one axis and the bottom-up funnel (Low Account Score → High Account Score) on the other, and place each account in the resulting quadrants. This gives you a true view of, and action items for, each account.
Creating Account-Based Views with Activity Timelines: This strategy goes deep into a single account and maps its developer activities across a timeline. There are two views you can create to analyze the data: a) the ‘Velocity View’, which graphs the velocity of the account by plotting the account score over time. This view helps you see in real time when an account is hot and when things are cooling down. b) the ‘Activity View’, which maps the developer activity over time. Here you can create further qualification criteria (e.g., 1 GitHub activity + 1 package manager install + 5 key pages on the docs read = Qualified) to identify the accounts where you want to focus.
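The qualification rule and the velocity idea above can be sketched in a few lines. The thresholds mirror the example rule in the text; the activity names and the weekly scores are hypothetical, and velocity is taken here simply as the week-over-week change in account score, which is one of several reasonable definitions.

```python
# Thresholds mirror the example rule in the text; adjust them to your funnel.
QUALIFICATION_RULE = {
    "github_activity": 1,
    "package_install": 1,
    "key_docs_page": 5,
}


def is_qualified(activity_counts, rule=QUALIFICATION_RULE):
    """True when every activity type meets its minimum count."""
    return all(
        activity_counts.get(kind, 0) >= minimum
        for kind, minimum in rule.items()
    )


def velocity(weekly_scores):
    """Week-over-week change in account score (positive means heating up)."""
    return [
        later - earlier
        for earlier, later in zip(weekly_scores, weekly_scores[1:])
    ]


# Made-up activity counts and weekly scores for one hypothetical account.
activity = {"github_activity": 2, "package_install": 1, "key_docs_page": 5}
weekly_scores = [4, 6, 15, 12]

print(is_qualified(activity))
print(velocity(weekly_scores))
```

A run of positive velocity values followed by a negative one is the kind of cooling-down pattern the Velocity View is meant to surface before the account goes quiet.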
Mining open source data for enterprise sales can be a highly effective strategy for commercial open source businesses. By understanding which organizations are using their products, and how they are using them, these businesses can better target their sales efforts and generate more revenue. However, the process of mining data for enterprise sales is not without its challenges. It requires careful consideration of the right intent sources, decoding developer and account identities from those sources, and deriving account intelligence from open source data. By following these steps, businesses can build more effective sales strategies and drive growth.
We are building Reo.Dev to help open source companies leverage their community momentum into revenue and we would love to jam with folks who are solving the same problem for their organizations.