Having worked with many funds now, we continue to be surprised by how infrequently fund managers use web scraped data to generate alpha. As more and more companies move their activities online, the web is becoming a goldmine of information to those who know how to harness it. We are often able to answer key questions on companies simply by scouring the web for data.
What’s even more interesting is that the web holds far more information than a normal user can see on the front-end of a website. Oftentimes, there are remarkable insights in a website’s network requests and code that funds are not aware of. We’ve been able to generate incredible returns and save funds from major drawdowns through this “back-end” data.
Let’s dive into a couple of examples across industries that can drive alpha for your fund:
Perhaps the best industry in which to find web data is retail. The most common way we see fund managers leverage web-scraped data is by using one of the well-known providers to track pricing. There are two main problems with this. The first is that you’re likely not generating alpha, since all the other funds see the data at the same time. The second is that you’re missing out on many other metrics that could provide signal on the company.
To solve the first problem, you need to build, or partner with someone who has, the infrastructure to scrape the website and create alerts faster than the major providers release their data. This should not be difficult given that the major providers usually release retail data monthly. Even by just setting up weekly scrapes, your analysts and PMs will know when a company is pushing promotions and eating into its gross margins weeks before other funds. As an example, we helped one of our clients identify that LULU was reducing promotion depth and intensity weeks before other funds caught on, helping them generate incredible returns.
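To make "promotion depth and intensity" concrete, here is a minimal Python sketch of how two weekly scrapes might be compared. It assumes each scrape has already been reduced to a list price and a sale price per SKU. The `PriceObservation` class, the `promotion_summary` function, and all prices are hypothetical illustrations, not a description of any provider's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceObservation:
    """One scraped product row (hypothetical schema)."""
    sku: str
    list_price: float
    sale_price: float


def promotion_summary(observations):
    """Return (breadth, depth) for one scrape.

    breadth: share of SKUs selling below list price.
    depth:   average discount among the discounted SKUs.
    """
    if not observations:
        return 0.0, 0.0
    discounts = [
        1.0 - obs.sale_price / obs.list_price
        for obs in observations
        if obs.list_price > 0 and obs.sale_price < obs.list_price
    ]
    breadth = len(discounts) / len(observations)
    depth = sum(discounts) / len(discounts) if discounts else 0.0
    return breadth, depth


# Made-up sample data for two consecutive weekly scrapes.
last_week = [
    PriceObservation("A", 100.0, 80.0),
    PriceObservation("B", 50.0, 50.0),
    PriceObservation("C", 40.0, 30.0),
    PriceObservation("D", 20.0, 20.0),
]
this_week = [
    PriceObservation("A", 100.0, 90.0),
    PriceObservation("B", 50.0, 50.0),
    PriceObservation("C", 40.0, 40.0),
    PriceObservation("D", 20.0, 20.0),
]

prev_breadth, prev_depth = promotion_summary(last_week)
curr_breadth, curr_depth = promotion_summary(this_week)

# Fewer SKUs on sale and shallower discounts: promotions are easing.
promotions_easing = curr_breadth < prev_breadth and curr_depth < prev_depth
```

In practice you would likely weight SKUs by their importance to revenue rather than treating every product equally, but even the unweighted version shows the direction of travel from one week to the next.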
The second problem, missing metrics, is largely a result of other providers only scraping “front-end” data. On many retail websites, we’ve been able to find exact inventory counts for each product publicly available in the network requests of the site. This gives us insight into both how much volume the retailer is selling and whether it is having inventory issues. For instance, we helped a fund avoid a catastrophe with a stock because we were able to identify the inventory issues from the back-end data before earnings. Some other metrics we’ve found to be helpful include 1) the number of reviews across products as a directional view of sales, 2) the star rating of products as a proxy for NPS, and 3) special tags for products on the back-end. For example, Home Depot labels certain products as super SKUs on the back-end, which we can use to track how well industrial companies’ products are selling through the HD channel.
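As a rough illustration of how inventory counts translate into a volume estimate, the sketch below turns a chronological series of counts for a single SKU into estimated units sold and restocked. It assumes every decrease between snapshots is a sale and every increase is a restock. The function name and the counts are hypothetical.

```python
def estimate_units_sold(counts):
    """Estimate sales and restocks from chronological inventory counts.

    Assumption: a drop between two snapshots is a sale and a rise is a
    restock. Sales that happen between a restock and the next snapshot
    are netted out, so this is a lower bound on units sold.
    """
    sold = 0
    restocked = 0
    for previous, current in zip(counts, counts[1:]):
        change = current - previous
        if change < 0:
            sold += -change
        else:
            restocked += change
    return sold, restocked


# Made-up daily inventory counts for one product.
daily_counts = [120, 112, 105, 105, 160, 148]
units_sold, units_restocked = estimate_units_sold(daily_counts)
```

The more frequently you snapshot, the less a restock can mask sales in the same interval. Customer returns would also show up as restocks here, so treat the output as directional.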
Enterprise software is where we’ve seen the largest data gap for funds. None of the traditional datasets, such as credit card, foot traffic, or app downloads, can tell you how Snowflake is performing. Yet many of these companies have a strong online presence that can reveal valuable insights. Below are the three main ways we’ve been able to track enterprise software companies.
Forum questions - Nearly all enterprise software companies have either a public community forum or a Stack Overflow tag where developers can ask questions. We track the full history of questions, responses, upvotes, views etc. to gauge inflection points in developer adoption and alert funds before the alpha erodes. This use case is especially helpful for VC funds that want to reach out to founders before their inboxes get flooded.
Package downloads - Developers usually need to download the appropriate software package to use the software in the coding language of their choice. For example, Python has the snowflake-connector-python package to connect to Snowflake from Python. We track the number of downloads for these packages to again gauge developer adoption and growth.
GitHub activity - Many enterprise software companies like MongoDB make their repositories publicly available on GitHub. Developers can suggest changes (aka create pull requests), create issues, ask questions, fork the repository, give stars to show support etc. All of this is publicly available data that reveals customer growth and NPS and can be a major source of alpha if you have the right alerts set up.
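All three of these signals reduce to a time series of counts (questions, downloads, stars), so the same alerting logic can be applied to each. Below is a minimal Python sketch of one possible alert rule: flag any week whose count exceeds a multiple of the trailing average. The function name, the four-week window, the 1.5x threshold, and the download numbers are all illustrative assumptions.

```python
from statistics import mean


def find_inflections(weekly_counts, window=4, threshold=1.5):
    """Return indices of weeks that look like adoption inflections.

    A week is flagged when its count is more than `threshold` times the
    mean of the preceding `window` weeks. Both parameters are arbitrary
    defaults and would need tuning per dataset.
    """
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = mean(weekly_counts[i - window:i])
        if baseline > 0 and weekly_counts[i] > threshold * baseline:
            flagged.append(i)
    return flagged


# Made-up weekly download counts for a hypothetical package.
weekly_downloads = [1000, 1050, 980, 1020, 1010, 1900, 2100]
alerts = find_inflections(weekly_downloads)
```

A trailing-average rule is deliberately simple. Package download counts in particular can be inflated by automated builds and mirrors, so an alert is a prompt for an analyst to look closer rather than a signal on its own.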
Any website that contains information on unique inventory can provide incredible insights. Examples of this include car inventory (Carvana), home inventory (Pulte Homes), and room inventory (WeWork). Each of these has uniquely identifiable units that can be tracked when they appear on the website and tagged as “sold” when they disappear from it. As long as you track the website frequently enough, you can predict revenue extremely accurately.
Car inventory - Each car can be uniquely identified by its VIN. Furthermore, you can track average selling price, the number of days an average car is held in inventory, makes and models that are selling faster, and pricing power across marketplaces. For example, we were able to prove a key thesis for a fund around new vs used vehicle sales on Camping World’s platform using this methodology.
Home inventory - Many homebuilders post sitemaps of available and sold homes across all their communities. Simply tracking these sitemap images and comparing them from one day to the next will give you a real-time view into how many houses they’re selling per day, at what price, and in what locations.
Room inventory - You can track which rooms come online as available and later disappear to track room sales. For example, WeWork lists all of the available office rooms for all its locations on the website. Each of these rooms has a unique ID on the back-end along with the price and the discounts that agents can offer. This is a must-have dataset if you’re tracking WeWork.
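The appear-then-disappear methodology behind all three examples can be sketched in a few lines of Python. The sketch assumes each scrape yields the set of unit IDs (VINs, home lots, room IDs) visible on a given date, and it treats any unit that drops off the site as sold. The function name, the IDs, and the dates are hypothetical sample data.

```python
from datetime import date


def track_units(snapshots):
    """Map each unit that disappeared to its number of days on the site.

    snapshots: dict of scrape date -> iterable of unit IDs visible that day.
    Assumption: a unit that vanishes between two scrapes was sold. A
    delisted or relisted unit would be miscounted under this rule.
    """
    first_seen = {}
    sold = {}
    previous_ids = set()
    for day in sorted(snapshots):
        current_ids = set(snapshots[day])
        # Units that are new in this scrape: record when they first appeared.
        for unit_id in current_ids - previous_ids:
            first_seen.setdefault(unit_id, day)
        # Units missing from this scrape: tag as sold on this date.
        for unit_id in previous_ids - current_ids:
            sold[unit_id] = (day - first_seen[unit_id]).days
        previous_ids = current_ids
    return sold


# Made-up scrape results keyed by scrape date.
snapshots = {
    date(2024, 1, 1): {"VIN1", "VIN2", "VIN3"},
    date(2024, 1, 2): {"VIN1", "VIN3", "VIN4"},
    date(2024, 1, 4): {"VIN3", "VIN4"},
}
days_to_sell = track_units(snapshots)
```

Note that the sale date is only known to the resolution of the scrape schedule: `VIN1` is recorded as sold on January 4 even though it may have sold on January 3, which is why scrape frequency matters so much for accuracy. Joining the sold IDs to the last observed price gives the revenue estimate.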
These are just a few of the dozens of use cases we’ve come across to generate alpha using web data. We’re always discovering more alongside our clients. If any of these sound interesting or if you’d simply like to bounce around ideas, we’re always open to having a conversation. Feel free to reach out to rohit [at] durablealpha.com or use the contact us page.
We believe so strongly that our team can drive ROI for your fund that we are willing to run an 8-week, fully refundable trial with your team. Try us and see the alpha!