If you haven’t used BuiltWith before, it’s a SaaS that will show you all of the different technologies being used on a given website. For example, I can search BuiltWith for techcrunch.com and can see that they’re hosted at WordPress.com, are using all sorts of cool ad technologies and services, and can even see that they’re using the Jetpack plugin and Facebook commenting. All sorts of really interesting data all in one place and it couldn’t be easier to access. Just drop in the site URL and you’re good to go.
According to this Quora thread, BuiltWith now crawls over 300 million unique domains using their own crawler. They don’t rely on data from anywhere else. Wowsers.
The trends section is another really interesting tool. It shows how much certain apps and services are being used across their index, and whether that usage is rising or falling. You can get really interesting data like eCommerce software usage and the like.
Something You May Not Know about BuiltWith
By now you’re probably thinking “why am I still reading this post? I could have looked at their service for 30 seconds and learned all of this on my own.”
Don’t click away quite yet!
Last month, when we got hit by a negative SEO attack, we started looking really closely at how much of our data was readily available online, and what we could do to protect our privacy and limit the number of threats against our domains.
While I had that on my mind, I was looking at our own site on BuiltWith.
— Ryan Sullivan (@ryandonsullivan) January 31, 2015
I couldn’t believe how accurate the details were. Their crawlers are really good at detecting all sorts of different software, and I really don’t think they missed a single bit of code on wpsitecare.com.
That was a little unsettling since I was already paranoid about the previous attacks.
Could any of this data be used against us?
What can we do to make some of these details more private?
Wait, why is BuiltWith giving all of this data away for free? What’s in it for them?
This may not be news to you, but it turns out that BuiltWith is a Lead Generation and Market Analysis company. They’re selling your data to their customers.
From their website:
Start creating your lead list by choosing the technology you’re interested in. BuiltWith will come back with a list of all the sites on the internet using your chosen technology.
Get actual names, titles and emails of people at companies. With 7+ years of historical data we’ve built a comprehensive contact and email list providing you with the ability to see qualified emails for businesses where we’ve found titles and have a source of known email addresses for a company.
Tagline: Want to spam blast everyone running WooCommerce to market your cool new extension? BuiltWith’s got you covered.
Now, is BuiltWith a horrible company with horrible intentions? Not likely. They’re selling data that they collect, which actually isn’t all that different from what Google does. The main difference is that Google is selling to advertisers and veiling a lot of the sensitive information. BuiltWith gives you all of the information you want, without discretion. It’s likely the source for a lot of fun emails like this one.
How to Remove Yourself from BuiltWith
Whether or not being indexed by BuiltWith is a security concern is hard to say. If someone wanted to find out the software you’re running in order to harm your site, and actually had the skills to pull it off, they probably wouldn’t be using a tool like BuiltWith to begin with. That’s not anywhere near L337 enough.
That said, if you’re not a fan of spam, and don’t want to show off your goods without making folks work for it even a little bit, you can actually hide yourself from the BuiltWith database.
I read through the site FAQ and looked all over the web to try and find information about blocking BuiltWith or being removed from their results, and I found close to nothing. Their data is the lifeblood of their business, so I wouldn’t be surprised if they go to great lengths to make sure being hidden from their service isn’t something that can be done easily.
Since Google turned up close to nothing, I decided to try and block the BuiltWith user agent. Since I knew that they collected all their own data, I knew they had to have a proprietary web crawler of some kind. The more I searched Google for their user agent string so I could block it, the less I found. Again, obscurity FTW!
We did some digging and testing and it looks like the current agent is
Mozilla/5.0 (compatible; BuiltWith/0.1; +http://builtwith.com/bot.html)
The main issue with blocking the user agent string is that they can easily change that string at any time, and if you don’t stay on top of it, your results will keep being updated. The same goes for blocking their IP addresses. Those are dynamic, and unless you’re checking in every single day and updating the blocks, you’re going to be out in the open again.
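For the curious, here’s roughly what a user agent block could look like. This is only a minimal sketch, and it assumes a Python WSGI app rather than the WordPress setup we actually run. The names is_blocked_agent, BlockCrawlerMiddleware, and BLOCKED_AGENT_MARKERS are made up for illustration, and the "builtwith" marker comes from the agent string we observed above, so it stops working the moment that string changes.

```python
# Hypothetical sketch: refuse requests whose User-Agent contains a blocked marker.
# The marker is based on the agent string observed in this post and may go stale.
BLOCKED_AGENT_MARKERS = ("builtwith",)


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header contains any blocked marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)


class BlockCrawlerMiddleware:
    """WSGI middleware that answers 403 Forbidden to blocked user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked_agent(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else passes through to the wrapped application untouched.
        return self.app(environ, start_response)
```

You would wrap your existing app with something like app = BlockCrawlerMiddleware(app). The match is a case-insensitive substring check on purpose, so a version bump in the agent string doesn’t slip past it, but a complete rename still would.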
That didn’t sound like something I wanted to spend very much time on, so I did the next thing that came to mind. I emailed BuiltWith and asked them to remove my domain from their results. Mind-blowing, right?
They sent me a link (noindexed of course) and you can request that your domain be removed here.
Obviously whether you choose to opt out or not is totally up to you, but for now, I’ve made up my mind…