100 million rows in Excel? PowerPivot.. a first look from the SharePoint Conference 2009
"Project Gemini" has been batted around for a while now but it was unveiled at the conference that it is now known as SQL PowerPivot for Excel 2010 and SQL PowerPivot for SharePoint 2010.
What does it do?
In short, PowerPivot allows you to pull data into an Excel workbook from almost any data source. This can be SQL databases, Analysis Services Cubes, or any ODBC data source.
This is all handled via the import wizard, which contains a nice interface to setup which tables and filters you want to apply (the wizard then generates the necessary query).
You then have access to a whole raft of Excel Formulas (and a bunch of new aggregation and time intelligence formulas) that you can use to add new columns to the data. You can even bring in your own Excel worksheets as tables of data that can be linked up to the other data sources (say to provide foreign key tables where the lookups are stored and managed in Excel!)
Ok … So what’s so special about this?
Well, the main thing that is impressive is that they demonstrated an example system running with over 100,000,000 rows of data! Now remember that this is running from Microsoft Excel!
You could then add your own extension columns (using simple Excel style formulas) and the whole data set refreshes in seconds.
So the performance is good huh?
The performance is quite simply jaw-dropping.
One of the demo sessions the presenter imported over 3.5 million rows of data from a SQL Analysis Services cube and it imported in just under 2 minutes.
He then created a pivot table of the total sales data, split into rows by country.
He then added "slices" so that you can flick between sales figures for different years or product categories.
With all of these calculations the pivot table was refreshing it’s data in under 2 seconds!
Not even SQL Reporting Services can execute that fast, and this is in EXCEL so the user has full control over the pivots and can filter / query / change the results as much as they like.
How does it actually work then?
The main thing that PowerPivot does is that the database columns are separated out and compressed individually. Foreign key values can then be separately indexed and this makes the compression levels fantastic.
Take an example of a foreign currency field for Europe. Regardless of how many rows of data you have that column is only ever going to contain a small number of different values (£, €, etc). You could have one thousand rows or one billion rows and it would still have the same variation in the values. This makes it extremely compressible so you can get extremely large data sets down to a very small footprint.
When you then query the data set it loads those columns into memory for execution, so you end up with a column based querying model running directly from memory (which is the reason it is so incredibly extremely fast).
Now before you start wondering if this will only work on beefy 64-bit workstations with RAM in double figures I have been assured by the presenter that this works fine on a 2GB netbook! Although he was running the demo on a quad core laptop (presumably with about 8GB of RAM).
What about SharePoint 2010 then?
Well, SharePoint 2010 has support for Excel Services, and with SQL PowerPivot for SharePoint 2010 you can publish Excel Workbooks containing PowerPivot data sets directly to SharePoint!
This allows you the flexibility to share and present your workbooks with colleagues and other users of the SharePoint platform directly from the browser!
Even better than this, if you save an Excel Workbook containing PowerPivot data to a document library, then you can import that into another PowerPivot workbook!
This means that your PowerPivot workbook has actually become a data source in it’s own right, paving the way for true BI applications being built with this technology!