100 million rows in Excel? PowerPivot.. a first look from the SharePoint Conference 2009

 

"Project Gemini" has been batted around for a while now but it was unveiled at the conference that it is now known as SQL PowerPivot for Excel 2010 and SQL PowerPivot for SharePoint 2010.

 

What does it do?

In short, PowerPivot allows you to pull data into an Excel workbook from almost any data source. This can be SQL databases, Analysis Services Cubes, or any ODBC data source.

 

This is all handled via the import wizard, which contains a nice interface to setup which tables and filters you want to apply (the wizard then generates the necessary query).

 

You then have access to a whole raft of Excel Formulas (and a bunch of new aggregation and time intelligence formulas) that you can use to add new columns to the data. You can even bring in your own Excel worksheets as tables of data that can be linked up to the other data sources (say to provide foreign key tables where the lookups are stored and managed in Excel!)

 

Ok … So what’s so special about this?

Well, the main thing that is impressive is that they demonstrated an example system running with over 100,000,000 rows of data! Now remember that this is running from Microsoft Excel!

 

You could then add your own extension columns (using simple Excel style formulas) and the whole  data set refreshes in seconds.

 

So the performance is good huh?

The performance is quite simply jaw-dropping.

 

One of the demo sessions the presenter imported over 3.5 million rows of data from a SQL Analysis Services cube and it imported in just under 2 minutes.

 

He then created a pivot table of the total sales data, split into rows by country.

He then added "slices" so that you can flick between sales figures for different years or product categories.

 

With all of these calculations the pivot table was refreshing it’s data in under 2 seconds!

 

Not even SQL Reporting Services can execute that fast, and this is in EXCEL so the user has full control over the pivots and can filter / query / change the results as much as they like.

 

How does it actually work then?

The main thing that PowerPivot does is that the database columns are separated out and compressed individually. Foreign key values can then be separately indexed and this makes the compression levels fantastic.

 

Take an example of a foreign currency field for Europe. Regardless of how many rows of data you have that column is only ever going to contain a small number of different values (£, €, etc). You could have one thousand rows or one billion rows and it would still have the same variation in the values. This makes it extremely compressible so you can get extremely large data sets down to a very small footprint.

 

When you then query the data set it loads those columns into memory for execution, so you end up with a column based querying model running directly from memory (which is the reason it is so incredibly extremely fast).

 

Now before you start wondering if this will only work on beefy 64-bit workstations with RAM in double figures I have been assured by the presenter that this works fine on a 2GB netbook! Although he was running the demo on a quad core laptop (presumably with about 8GB of RAM).

 

What about SharePoint 2010 then?

Well, SharePoint 2010 has support for Excel Services, and with SQL PowerPivot for SharePoint 2010 you can publish Excel Workbooks containing PowerPivot data sets directly to SharePoint!

 

This allows you the flexibility to share and present your workbooks with colleagues and other users of the SharePoint platform directly from the browser!

 

Even better than this, if you save an Excel Workbook containing PowerPivot data to a document library, then you can import that into another PowerPivot workbook!

This means that your PowerPivot workbook has actually become a data source in it’s own right, paving the way for true BI applications being built with this technology!

  • Rob

    No, 32-bit Excel is fine – there is a 32-bit version of the PowerPivot client bits. Just have to make sure you install the 32-bit addin if you're using 32-bit Excel. 64-bit does allow you to work with more data, however, given the extra RAM capacity.

    More powerpivot info can be found at my site – http://powerpivotpro.com

    (I'm a member of the PowerPivot engineering team)

    -rob

  • Martin Hatch

    Really? So you need 64-bit Excel 2010 then?

    Obviously the server would be 64-bit, but you wouldn't be running that on a netbook! ;)

  • Arvindra Sehmi

    64-Bit on client and server is required. So, your netbook would have to be 64-Bit.
    - Arvindra Sehmi