At my day job, I write a lot of demos for videos and webinars, and generally to help customers and developers. And the number one source of data used for these demos? Northwind.
Why?
- Great set of various data to bind to different things like grids, reports, and charts.
- It is used by many Microsoft samples.
- Available for SQL Server and Microsoft Access.
- Easily distributable.
So what’s the problem? It’s a bit boring to always use the same set of data. In fact, Scott Hanselman even tried to stir the community with a call to action to come up with sources other than Northwind. That was back in 2008 and, unfortunately, not many other sources were offered.
So, I’ve scoured the internet and below are several resources that I’ve found. Warning: You may need to “clean up” this data, and you may also need to import it into your database of choice.
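To give a rough idea of what that clean-up and import step can look like, here is a minimal Python sketch. It is only an illustration under a few assumptions: SQLite stands in for your database of choice, the sample rows are made up to mimic a messy download, and the names `clean_row`, `import_csv` and `projects` are hypothetical.

```python
import csv
import io
import sqlite3


def clean_row(row):
    """Trim whitespace and turn empty strings into None (stored as NULL)."""
    return [value.strip() or None for value in row]


def import_csv(conn, table, csv_file):
    """Load an open CSV file into a new SQLite table; return the row count.

    Assumption: `table` is a trusted name, since it is placed into the
    SQL text directly. Every column is stored as TEXT for simplicity.
    """
    reader = csv.reader(csv_file)
    # Normalise header names: "Agency Name" becomes "agency_name".
    header = [name.strip().replace(" ", "_").lower() for name in next(reader)]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Skip malformed rows whose length does not match the header.
    rows = (clean_row(r) for r in reader if len(r) == len(header))
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Made-up sample data standing in for a downloaded file.
SAMPLE = io.StringIO(
    "Agency Name, Project ,Cost\n"
    " Dept A ,Portal, 12.5\n"
    "Dept B,,3\n"
    "broken,row\n"
)

conn = sqlite3.connect(":memory:")
count = import_csv(conn, "projects", SAMPLE)
```

With a real download you would pass `open("some_file.csv", newline="")` instead of the in-memory sample, and probably add type conversion for numeric and date columns.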
Fresh Data For Free
Good news: these days it’s easy to find interesting sources of data, and for free. If you’re willing to dig around and clean up some of the data, it’s right there for the taking.
Here are a few:
FreeBase.com
Freebase is a large collaborative knowledge base consisting of metadata composed mainly by its community members.
Freebase data is available for free/libre for commercial and non-commercial use under a Creative Commons Attribution License, and an open API, RDF endpoint, and database dump are provided for programmers. –Wikipedia
Download the latest dumps directly from here:
http://download.freebase.com/datadumps/ – Browse the latest dumps
Government Data
The US government has made a lot of its data available. Not all of it is interesting or even ‘clean’, but there is plenty to choose from:
- http://www.itdashboard.gov/data_feeds
- http://gbk.eads.usaidallnet.gov/data/detailed.html
- http://explore.data.gov/Foreign-Commerce-and-Aid/U-S-Overseas-Loans-and-Grants-Greenbook-/5gah-bvex
- http://www.usa.gov/
- http://www.usaspending.gov/data
- http://gsociology.icaap.org/data.htm
Other sources
For reference, see these Stack Overflow questions:
- Free Large datasets to experiment with Hadoop
- Datasets for Running Statistical Analysis on
- Freely available example datasets of hierarchical information, and realistic names
Amazon Web Services offers some public data sets as well, though you will need an Amazon EC2 account.
Tim Berners-Lee on the next web
Here’s an interesting talk about data and the next web from the inventor of the World Wide Web, Tim Berners-Lee:
Do you have other sources of data? Drop me a note below. Thanks!
Mehul, thanks for the great resources. I also like fakenamegenerator.com, especially for generating identity data. You can generate up to 50,000 records in csv and all popular database formats, and it’s free!
Mike,
That looks like a great resource. Bookmarked, thanks!