Tag Archives: Infrastructure

100 million rows in Excel? PowerPivot.. a first look from the SharePoint Conference 2009

 

"Project Gemini" has been batted around for a while now but it was unveiled at the conference that it is now known as SQL PowerPivot for Excel 2010 and SQL PowerPivot for SharePoint 2010.

 

What does it do?

In short, PowerPivot allows you to pull data into an Excel workbook from almost any data source. This can be SQL databases, Analysis Services Cubes, or any ODBC data source.

 

This is all handled via the import wizard, which contains a nice interface to setup which tables and filters you want to apply (the wizard then generates the necessary query).

 

You then have access to a whole raft of Excel Formulas (and a bunch of new aggregation and time intelligence formulas) that you can use to add new columns to the data. You can even bring in your own Excel worksheets as tables of data that can be linked up to the other data sources (say to provide foreign key tables where the lookups are stored and managed in Excel!)

 

Ok … So what’s so special about this?

Well, the main thing that is impressive is that they demonstrated an example system running with over 100,000,000 rows of data! Now remember that this is running from Microsoft Excel!

 

You could then add your own extension columns (using simple Excel style formulas) and the whole  data set refreshes in seconds.

 

So the performance is good huh?

The performance is quite simply jaw-dropping.

 

One of the demo sessions the presenter imported over 3.5 million rows of data from a SQL Analysis Services cube and it imported in just under 2 minutes.

 

He then created a pivot table of the total sales data, split into rows by country.

He then added "slices" so that you can flick between sales figures for different years or product categories.

 

With all of these calculations the pivot table was refreshing it’s data in under 2 seconds!

 

Not even SQL Reporting Services can execute that fast, and this is in EXCEL so the user has full control over the pivots and can filter / query / change the results as much as they like.

 

How does it actually work then?

The main thing that PowerPivot does is that the database columns are separated out and compressed individually. Foreign key values can then be separately indexed and this makes the compression levels fantastic.

 

Take an example of a foreign currency field for Europe. Regardless of how many rows of data you have that column is only ever going to contain a small number of different values (£, €, etc). You could have one thousand rows or one billion rows and it would still have the same variation in the values. This makes it extremely compressible so you can get extremely large data sets down to a very small footprint.

 

When you then query the data set it loads those columns into memory for execution, so you end up with a column based querying model running directly from memory (which is the reason it is so incredibly extremely fast).

 

Now before you start wondering if this will only work on beefy 64-bit workstations with RAM in double figures I have been assured by the presenter that this works fine on a 2GB netbook! Although he was running the demo on a quad core laptop (presumably with about 8GB of RAM).

 

What about SharePoint 2010 then?

Well, SharePoint 2010 has support for Excel Services, and with SQL PowerPivot for SharePoint 2010 you can publish Excel Workbooks containing PowerPivot data sets directly to SharePoint!

 

This allows you the flexibility to share and present your workbooks with colleagues and other users of the SharePoint platform directly from the browser!

 

Even better than this, if you save an Excel Workbook containing PowerPivot data to a document library, then you can import that into another PowerPivot workbook!

This means that your PowerPivot workbook has actually become a data source in it’s own right, paving the way for true BI applications being built with this technology!

Securing SharePoint 2010 Web Servers

 

This was one of the best topics I’ve seen so far at the conference. The amount of concrete information was impressive (and to be honest a bit too much to post here) but there was some great information on how to harden your Web Servers.

 

SharePoint 2010 Security Features

There are a whole load of new features and changes to the SharePoint 2010 product for security.

 

  • ASPX Pages are gone for contributors. You can no longer upload ASPX pages into document libraries unless you have "Designer" permissions! The main reason this becomes possible is because the new Wiki Pages are so much more extensible than they were.
  • Anonymous Users Lockdown feature  now works for Web Services and WSS (SharePoint Foundation 2010)!
  • PowerShell Access – you can now delegate remote scripting rights through PowerShell, so you no longer need the Setup account to perform PowerShell commands. This can be delegated to farm administrators!
  • XSS (Cross Site Scripting) protection is now in place through the headers (although you can turn it off). This can be even be locked down to individual web part properties (through development)!
  • Application Page settings can now be controlled more granularly, so that you can set the master pages used and even swap out individual pages (such as the Error Page). This makes lock downs and branding of these far easier, without breaking the supported state of your environment, and without extensive development!

 

There was then whole load of recommendations for hardening your environments. It’s a bit of a list so apologies for that, but a lot of information to get through:

 

Hardening your Web Application

  • Place your web application directories on a non-system volume. If you have any issues with logging or file access then the I/O operations (or even disk space requirements) could damage the Operating  System!
  • Change the IIS header. By default this will include the SharePoint version number (which means any attacker knows which service packs and critical patches you have installed!). Removing this reduces your public footprint

 

Hardening your Web Servers

Windows Server 2008 takes care of most of the previous recommendations for hardening automatically, but there are still some things that you should do:

 

  • Restrict remote administration of the Registry (no-brainer, but a lot of people forget to do this)
  • Rename Administrator account
  • Delete / Disable unused accounts (again, make sure your dev and test accounts don’t hang around on the web front ends)
  • Use the IUSR instead of IUSR_<serverName>

The IUSR account is a "built in" account so therefore it doesn’t have a password and no-one can login using that account. This makes it much more secure than the Server specific IUSR account that gets created!

 

Hardening SQL 

There’s a whole load about this on the internet. The only one to mention here is change the port number! There are a lot of viruses and malware that will specifically target this port.

 

Hardening your Network

Again, none of this is SharePoint specific, but goes a long way to making sure that your network in general is secure (which is of course best practice for SharePoint systems).

 

Routers:

  • Block unused protocols and ports (see ports required, below)
  • Screen Traffic (e.g. ICMP)
  • Intrusion Detection should be in place

 

Firewall

  • Use packet filtering policies
  • Log your permitted / denied traffic, and make sure those logs are checked (using alerts)
  • Make sure perimeter networks are firewall secured, effectively providing end to end firewall security.

 

Switches

  • Disable any unused services in the switch
  • Do not overly trust VLANS. Just because your traffic is isolated to a VLAN doesn’t mean you shouldn’t still block off the relevant ports and protocols.

 

Ports Required for Web Servers

Note – When SharePoint is installed the communication ports are automatically opened on the Windows Firewall!

 

External:

  • Http 80 / TCP
  • HTTPS 443 / TCP
  • SMTP 25 / TCP

 

Internal*:

  • HTTP 32843/TCP
  • HTTPS 32844 / TCP
  • TCP 32845 / TCP
  • SMB 445 /TCP|UDP

 

* note that "internal" means Web Application –> Service consumtion over WCF. It does not include SQL or inter "server" communications.

Topology Changes for SharePoint 2010 Logical Architecture

The SharePoint 2010 topology has been massively updated, allowing for greater flexibility and scalability than ever before.

 

The "Shared Service Provider" is dead, it doesn’t exist in SharePoint 2010 and instead is replaced with new "Shared Service Applications". This allows core services to have their own security settings, run in their own applications and on their own databases.

 

There is even support for "cross farm" Service Applications (such as Search, User Profiles and the Managed Metadata Service) to allow distributed farm architecture like never before. Now in SharePoint 2010 you can scale up into multiple farm environments, allowing you to take advantage of more geo-distribution flexibility, and greater performance and availability from having dedicated farm hardware for important applications.

 

For the larger enterprise environments you have the benefit that different farms provide the opportunity to service different SLA requirements, and the Many – Many relationship for Web Applications to Shared Service Applications means that core enterprise level services can be shared globally, but smaller core specific services can be hosted multiple times, closer to the client environments, to service  those farms that need them.

 

If you need greater security boundaries and better utilisation of resources you can spin up department specific farms for business critical organisational boundaries (such as HR and Finance) each with their own independent services or shared services (such as an HR specific BCS, or Finance and HR sharing their own  set of Managed Metadata for payroll and accounting data, a service that is not provided to the more generalised collaboration and publishing environments).

 

All of this comes together with other administrative changes (such as the SQL failover awareness and Managed Accounts) to make SharePoint 2010 a truly industry leading platform for web applications and technology. I cannot think of any other product on the market that offers this level of flexibility across so many different technology streams.

Cross Site Scripting (XSS) protection for SharePoint 2010 Web Parts

 

Some of the new features in SharePoint 2010 offer some great new opportunities for malicious scripts to be manipulated in your system. The new SharePoint 2010 Client Object Model is a great case in point.

 

Let’s take the example where a contributor adds some Client Object Model scripts through exposed web Part properties to change list data that they don’t have access to. As soon as someone with admin privileges visits the page that Client OM kicks off and you’ve got yourself malicious script executing!

 

Well, step in the new XSS protection. The WebPartPages class now includes a new attribute that you can add to your Web Part Properties called "RequiresDesignerPermissionAttribute". There is also a new SafeControl attribute called "SafeAgainstScript".

 

These allow you to protect your assemblies and properties against contributors. The main problem is that none of your MOSS 2007 web part properties will be accessible to contributors without these added!

 

This obviously creates quite an overhead in terms of code use, but it really is required to make sure that your web parts are running in an appropriately secure state.

Improving SharePoint 2010 Administration

 

This is a big area for SharePoint 2010. Far too often in the MOSS 2007 interface was administration settings a little bit neglected (and lets face it, the public facing sections of our systems always get more attention) but in SharePoint 2010 there are a number of massive improvements.

 

Logging & Alerts

One of the key areas for reporting is centralising the reporting and alerting interface. In this the main logging and event data sources (ULS logs, Windows Events, Performance counters for SQL , .Net Framework and  hardware resources) are going to get pulled together into a single SQL Database. The best bit is that this database will have a published open schema!

 

This allows the database to be queried and reported upon, and is expected to include full SCOM integration!

 

Managed Accounts

This is a HUGE feature for managing service accounts in SharePoint 2010. You can create specified accounts that can be used by farm administrators when setting up SharePoint services (such as Search and Timers) as well as creating new applications (such as Web Applications and the new Managed Service Applications which replace the SSP).

 

The upshot of this is that you don’t have to hand out domain accounts just so that someone can provision a new web application.

 

But that’s not it. Managed accounts can reset the password (presumably into some rediculously long strong password) and manage that password through SharePoint 2010. If you have an AD security policy for password expiry, then Managed Accounts can automatically reset the password for you, so you never have to worry about password expiry hosing some key services in your SharePoint 2010 farm!

 

Health and Monitoring

Central Admin is set to gain a whole raft of performance monitoring reports.

 

On such example is the "slowest pages" report, literally showing you the average times for the slowest pages to render.

 

You can then use the new features of the "Developer Dashboard" to show you key performance information about that page

  • Which webserver was used
  • What SQL queries were run
  • What web parts loaded (and how long they took)
  • The call-stack of code method calls
  • SPRequest allocations

 

This allows detailed analysis and rapid debugging of problems that otherwise were seen by many as a black art!

 

Failover

There is now an option for a Failover Database Server when creating Web Applications and Managed Service Applications. This effectively allows SharePoint 2010 will automatically be aware of database mirroring. If the primary SQL server dies, SharePoint will automatically re-connect to the failover database.

 

Of course, you still have to setup mirroring manually, but this is a huge boon to creating high availability systems on SharePoint 2010.

 

Restoring Data

You can now recover a list from an unattached content database. From SharePoint 2010 Central Administration you can connect directly to any content database (regardless of whether it is attached or not), and you can browse the content structure from Central Admin.

 

You can navigate the structure and export any Site or List and it will download the package straight to your computer. This can also pull in versioning and security settings!

 

There is tonnes more content to cover, and loads more sessions .. But hopefully this gives you an indication of some of the improvements that are being made in SharePoint 2010!

Why I love the performance in SharePoint 2010

 

There really is no doubt about it, SharePoint 2010 is QUICK… We are talking several factors faster than SharePoint 2007. Now this is mainly from looking in the product demos and session examples, but if they are a true reflection of a typical SharePoint implementation then we’re really in for a treat.

 

I watched a demonstration in the opening "Overview of SharePoint 2010 for IT Professionals" an example of a  document library with 1,000,000+ documents which was filtering, sorting and refreshing in under 3 seconds.

 

Having filtered the view down to just 4 records it was then demonstrated that images could be opened from that library in under 1 second! And this is native, out of the box … and with over 1 million documents in a single library … running on pre-beta code!

 

This is nothing short of astonishing, and with some of the other features such as automatic column indexing, filtering setup and hierarchical navigation we are looking at an amazing set of features to make even the largest of SharePoint 2010 deployments absolutely fly!

 

Every time I look at SharePoint 2010 I get more excited about the feature sets, and it seems that Microsoft have followed the same vein with SharePoint 2010 that they did with Windows 7, in that performance and usability are two of the biggest "killer apps".

Performance Optimisation in SharePoint

Performance has always been a bit of a black art in SharePoint, with so many core files being dropped onto landing pages it’s not uncommon to have a 500kb payload taking over 10 seconds to load.. of course for most corporate websites this is simply not good enough.
 
When building the RNIB website we spent a lot of time getting the page payload lower, and the caching model tweaked so that the pages were loading more responsively. We were mostly successful but there were plenty of challenges.
 
First off was the ScripResource files (used by Ajax). They were only really used for editing (the rich text editor we used from Telerik). Putting those references into a SharePoint "EditPanel" seemed to sort that one out (so the files were only ever being loaded if the page was in Edit Mode  .. which was never the case for anonymous users.
 
The caching profiles were also a big boon, enabling BLOB caching for the script files and static images, and also tuning the SharePoint site collection cache profiles so that they worked best for anonymous users.
 
We also used a bunch of other 3rd party tools, including the awesome Fiddler2 and YSlow for FireFox.
 
There has also been a very good article recently from the Microsoft SharePoint Team Blog describing how they went about optimising the sharepoint.microsoft.com website. Its a good read and brings up some lesser known optimisation techniques.
 
If you have any tips on performance optimisation I’d love to hear it.

The call to SearchServiceInstance.Provision failed … resolved!

This was a very annoying one. I had previously configured my development virtual machine to disable Search, Indexing, Document Conversion .. all those pesky services that sap RAM and CPU which aren’t always needed for development work.

The problem came when I wanted to turn them back on again. I was creating a new SSP and got the rather spurious error "no indexers".
This was a little odd, but I quickly realised that it meant the Index service was disabled. (annoying, because I only wanted an SSP for user profile development, but c’est la vie).

So I went to start the Windows SharePoint Services Search and *wham*… "Error"
I had a bit of a poke around the log files and found a reference to:

The call to SearchServiceInstance.Provision ("<name of server>") failed.

Well .. it took me a while to work this out but I did finally crack it. You see, I was doing this on a local virtual machine, so I rarely if ever use the full domain name. I found that I could only start the Search Service using the fully qualified "Domain\UserName" designation.
If you use just the "Username" then it didn’t work and you got the odd error above!

Very strange, another one for the archives I guess.

How to setup Internet Connection Sharing and a VPN connection at the same time!

This all came about because I am a guinea pig for the new content and code development environment.
 
The good news is that I have a spiffy laptop with Dual Core, 4GB DDR2 and (this is the good bit) a 320GB 7200rpm hard drive (not bad for a laptop!).
 
This all allows me to kick it into gear with Server 2008 x64 and Hyper-V.
 
Now for the bad news … Hyper-V doesn’t support "Shared Networking (NAT)" even though the technically inferior Virtual PC 2007 DOES!
 
It’s not too much of a heartache though, you simply have to be creative and setup yourself Internet Connection Sharing. This effectively allows your virtual machine to stay completely hidden from the big wide world, safely behind your an internally facing virtual network and your laptop’s firewall.
 
This is great for development environments, as we can have stand-alone virtual machines which can be copied, rolled-back and swapped around in minutes without any of that messy sys-prep nonsense.
 
However .. things went awry when I switched on our VPN … I was at home and didn’t have the creature comforts of a dedicated piece of copper between myself and the server.
 
I turned on my VPN and it did exactly what it was supposed to .. it created a secure tunnel and promptly shut-down all my other network adapters … including my virtual network…
 
so … a few hours of troublshooting and chomping of the table edge and I nailed it.
 
You need to:
  1. Connect to the VPN
  2. Share the VPN connection

This effectively gives your virtual machine it’s own private access to the corporate network .. great for those TFS check-ins and server builds 🙂

*sigh* another one for midnight oil.

The many reasons for “Data Is Invalid” in MOSS 2007

Well .. this is something I have been grappling with for about 2 weeks … one of our test farms started spewing out this error and for the life of me I couldn’t work out why.

We were getting it in the following scenarios:

  1. Creating a new Web Application
  2. Installing SharePoint, and running the Products & Technologies Wizard
  3. Adding a new web front end server to an existing Farm

(Note – 2 – the error was when provisioning the Central Admin web application, so similar to point 1)

Now, "Data Invalid" surely points to SQL Server, and to a certain extent that was correct, but not as you might have thought!

It seems that we were getting this error while logging in using an account which does not have both local admin rights and appropriate SQL server permissions!

You would have thought that having Farm Administrator permissions would be enough to create a new web application or add a new web front end, but it appears that the SQL Server permissions break down and it doesn’t work!

« Older Entries Recent Entries »