Scalable Shared Data Base Part 2:
Setting up the Scalable Shared Data Base
Start setting up Scalable Shared Data Base (SSDB) by answering a question; Is my database compatible with Scalable Shared Data Base? The computer science problem that a Scalable shared data base solves is to distribute data reporting systems. If you have an Online Transaction Processing system (OLTP) that is the backend for a web sales site that just can't keep up with the number of orders you are trying to process, scaling out your DB across reporting nodes is not going to help you. If, on the other hand, you are having problems processing orders and generating business reporting; then gentle reader you have hit the mother lode.
Note: Scalable shared data base supports both SQL Server (2005, 2008, and 2008R2) and Analysis Services. The major difference between the platforms is that SQL Server has native support for Microsoft VSS and VDI. VSS (Volume Shadow Copy Service) and VDI (Virtual Backup Device Interface) are Microsoft technologies that coordinate the process of making consistent point in time data copies (more on this in Part 3). Analysis Services relies on NTFS implementations of VSS. This makes the data copy process more challenging on Analysis Services than SQL server.
From msdn; requirements for Scalable Shared Data Base:
Scalable shared databases have the following limitations:
- The database must be on a read-only volume.
- The data files are accessible over a SAN.
- The databases are supported by Windows Storage running only on Windows Server 2003 SP1 or later.
- We recommend that you limit your scalable shared database configurations to eight server instances per shared database.
- Scalable shared databases do not support database snapshots.
Ensuring a Correct Environment for a Scalable Shared Database: http://msdn.microsoft.com/en-us/library/ms345523.aspx
My good friend over at Microsoft SQL CAT, Denny Lee, collaborated on a white paper that is required reading for anyone contemplating setting up a Scalable Shared Data Base:
Planning the Shared Reporting install will require the following:
- Network Load balancer: This can be hardware or software (NLB). More on this in Post 4
One Factory Server: This server will handle write operations. Scalable Shared Data base offers an excellent scale out solution for reporting. The Factory server is the exception. It remains a scale up solution. Make sure the Factory has enough memory and processing power to complete data processing with time to spare. It is a wise idea to only occupy less than half of allotted processing time with actual processing, lest errors put the system in a state where it can never catch up. Have a plan that enables re-processing of data, lest errors be detected. Not being able to catch up after an error is a very bad thing!
- One other word about servers. Processing data is all about moving bits. The speed and bandwidth of memory and system busses can be much more important that the number of processors or the processor speeds. If your Factory can't move data as fast as the processors can consume it, then you have a bunch of expensive processors spinning while waiting for data. Do your homework!
- At least two identical reporting nodes: You could use one node, but that would defeat the purpose of having a scale out solution.
- We are going to cover data consistency in part 3. Windows caches data about file systems in memory. In order to tell windows that underlying data has changed on a read only volume (think about this – a read only volume does not change) the volume needs to be taken offline and brought back online again.
- A reporting Data Base, or Data Cube, that lends itself to point in time processing. The process of taking a volume offline, updating data, and bringing the volume online again does not happen instantly. If the system needs to be constantly processing data and reporting in real time, or near-real time, Scalable Shared Data Base may not be optimal.
- Analysis Services makes a really great candidate for Read only Scale out. The Analysis Services Factory can spend its time loading data and processing aggregations. The reporting servers will quickly answer aggregated questions, while being able to process queries that are not covered by the aggregation.
- SAN storage: Here is my shameless plug. I work for EMC Symmetrix Engineering. As you will see later in post 5, the VMAX array really shines at random I/O (shared db's are all about random I/O).
- Data Replication: My second shameless plug. Time Finder Clones for VMAX are ideal for replicating lots of data in minimal time with little or no server impact. See Post 3.
Hardware P.O. firmly in hand, we will now vault on to the minutia of moving data; On to Part 3!