Directing Traffic
Scalable shared data base enables the merging of a scale up computer science problem (the loading and reporting of data) with a scale out solution. The scale out solution is made possible because the data is synchronized across nodes as a read only copy. This same scale out approach has been used for years in large web sites.
Each web server in a farm contains the same set of read only data. A technology called load balancing allows the web site to display a single name, say EMC.COM, and direct user connections to one of many web servers hosting the web site. Advanced load balancers can direct new users based on the number of existing connections maintained by each web server. Web servers with low numbers of users receive incoming users until each server in the site is balanced. To ensure a great user experience, the load balancer will return existing users new requests to the same server they have already been using.
Load balancing technology used for web sites is based on network connections. A web browser opens a connection to the load balancing device or service. The load balancer forwards this request to a web server based on appropriate parameters. Simple load balancers rely on a round robin approach:
- Request 1 is directed to server A
- Request 2 is directed to server B
- Request 3 is directed to server A
- Request 4 is directed to server B
These network load balancers can be either a software service, such as a DNS server that sends requests to different addresses, or a hardware appliance. Both software services and hardware appliances can offer advanced features.
- Least Busy – This technology tracks parameters on the server and directs incoming connections based on usage. The most common parameter is the number of network connections already open on a given server.
- Drain stop – Drain stop allows an administrator to gracefully remove servers from the site. When the drain stop is initiated, new connections are not sent to the draining server. Any existing connections are allowed to be closed by the user. As soon as all connections are closed the server can then be maintained without affecting the overall web site.
Wikipedia has an interesting article detailing network load balancers that covers the topic in great detail: http://en.wikipedia.org/wiki/Load_balancing_%28computing%29
The reader should note that I have described load balancing and Web sites, but the article is about integrating SQL server and SQL Server Analysis Services with load balancing. Most hardware and software load balancers are designed to run large web sites. These systems work extremely well! Unfortunately they are not designed with long running queries in mind.
As a security measure, many load balancers limit the time that a given request can process on a given host. An average SQL or Analysis Services query can exceed these timeout values. For this reason I have often used load balancer service from Microsoft called Network Load Balancer (NLB). NLB is easily adaptable to SQL server and Analysis Services. This is not to say that another load balancer won't work, however, configuration specifics for any load balancer other than NLB are beyond the scope of this post. Check with the manufacture of your load balancer for recommended SQL configuration settings.
In the white paper; http://www.emc.com/collateral/hardware/white-papers/h8129-performance-scaling-sql-wp.pdf I described an Analysis services system that uses NLB to load balance three servers that share a single read only reporting volume.
Once these first servers are load balanced then we can really start to scale the solution. Because we are using the storage to replicate changes in the data we can create more than one shared volume. If we attach half of the servers to one shared read only volume (update pair 1) and the second set to a second read only volume (update pair 2) we can update half of the cluster at a time. This will allow the creation of a solution with absolute minimum downtime for data updates.
The solution implementation would look like:
- Perform a data refresh on the factory node.
-
Remove update pair 1 from service
- All clients will now connect only to update pair 2
- Detach the reporting volume from update pair 1 and sync the changes.
- Attach reporting volume 1 to update pair 1 and put update pair 1 into service.
-
Now take update pair 2 out of service
- All clients now connect only to update pair 1 – hitting the latest version of data.
- Detach the reporting volume from update pair 2 and sync the changes.
- Attach reporting volume 2 to update pair 2 and put update pair 2 into service.
- Now all the data has been refreshed. As clients connect – they are directed to constantly refreshed data with minimal or no downtime.
In addition to network based load balancers, a developer can create what I will call an Application Metadata Redirector (AMR). Application based load balancers can take into account the application state of a given server. Why is this important? If a single user creates a large workload on a given server, this single user is only using a single network connection. If a load balancer only accounts for the number of network connections, then a load balancer can inadvertently direct traffic to a server that is already extremely busy!
An AMR can take into account the load that is being placed on the processors and memory of a server, re-directing clients accordingly. Remember that one of the great features of Analysis Services is the cache. Make sure returning clients get re-directed back to servers they have already executed queries against (unless you have executed a data refresh – then it does not matter as the cache will be reset).
Unfortunately I am not aware of a commercially available product that will perform redirection for either SQL server or Analysis Services. If anyone knows of such a product please post in the comments.
If you decide to take on the task of creating a customer Application metadata redirector then the following blog post will be a great starting point (thanks Denny!): http://bloggingabout.net/blogs/mglaser/archive/2008/08/15/configuring-http-access-to-sql-server-2008-analysis-services-on-microsoft-windows-server-2008.aspx
It is far easier to build a redirector using HTTP as the access method, as opposed to direct connections from EXCEL.
For more information on setting up NLB:
Windows Server 2008 NLB Deployment guide: http://technet.microsoft.com/en-us/library/cc754833(WS.10).aspx
Recent Comments