Thanks to everyone who attended the presentation.
Protect Microsoft SQL Server
The Microsoft SQL Server and SQL Server Customer Advisory Teams are hard at work on SQL Server codename Denali.
Stay tuned to http://www.microsoft.com/sqlserver/en/us/product-info/future-editions.aspx and http://sqlcat.com/Default.aspx for more information.
Scale Microsoft SQL Server
This blog has more information about Scalable Shared Database in a five-part series:
http://sqlvelocity.typepad.com/blog/2010/09/scalable-shared-data-base-part-1.html
SQL Server Tech Book
The definitive guide to running Microsoft SQL Server on EMC Symmetrix arrays is located at:
http://www.emc.com/collateral/software/solution-overview/h2203-ms-sql-svr-symm-ldv.pdf
Slides
In case you missed the presentation, or simply can't get enough, here is the presentation along with my speaker's notes. Some of the slides were animated – the notes reflect the animation steps (it turns out PowerPoint does not export animated slides).
Slide 1
Slide 2
Slide 3
Slide 4
RPO – how much data are you going to lose?
Example: SQL Servers were backed up to tape every 24 hours, so the Recovery Point Objective was 24 hours.
Recovery Time Objective: as soon as a spare server could be begged, borrowed, or stolen.
If the building fell over? The tapes were moved offsite once a week, so RPO = 1 week.
RTO was the time it took to acquire a new building, order gear, etc…
Slide 5
SRDF provides trusted failover; it is used by banks and brokerages worldwide.
SRDF allows EMC Symmetrix arrays to replicate data between storage arrays.
SRDF has many operational modes:
- SRDF/Synchronous – used locally
- SRDF/A (Asynchronous) – some chance of data loss – can replicate worldwide
- SRDF/Star – Advanced Multi Site
SRDF/S – Recovery Point Objective of zero or no data loss. SRDF/S is used within a metropolitan area network or single data center.
Companies often run SRDF/S and SRDF/A – this hybrid solution mitigates risk.
Banks will host data and fail over across the Hudson River. They will have a bunker site at a "stable" region of the country or world.
Let's explore how SRDF/S works (a simplified model follows the list):
- We have our source server and our backup server.
- The source server writes data to cache.
- The data is mirrored by SRDF/S to the target Symmetrix.
- The arrays back the cache with batteries; in the event of a failure the cached data is preserved.
- The remote array acknowledges the write and responds to the source array with a success code.
- The source array responds to the server with a success code.
- Both arrays then commit the data to disk.
- As you can see, this configuration is extremely sensitive to distance: the further apart the arrays are located, the longer write operations take.
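To make that ordering concrete, here is a minimal conceptual model of the synchronous write path in C#. This is my own illustration, not EMC code; the latency value is an arbitrary assumption.

```csharp
// A simplified conceptual model of the SRDF/S write path - my own
// illustration, not EMC's implementation. The host write is acknowledged
// only after the remote array acknowledges the mirrored write, which is
// why latency grows with distance.
using System;
using System.Diagnostics;
using System.Threading;

class SrdfSyncModel
{
    // Simulated one-way link latency between the two arrays (arbitrary).
    static readonly TimeSpan LinkLatency = TimeSpan.FromMilliseconds(5);

    static void MirrorToRemoteArray(byte[] data)
    {
        Thread.Sleep(LinkLatency);  // write travels to the remote array
        // ... remote array stores the write in battery-backed cache ...
        Thread.Sleep(LinkLatency);  // success code travels back
    }

    static void HostWrite(byte[] data)
    {
        Stopwatch sw = Stopwatch.StartNew();
        // 1. Source array stores the write in battery-backed cache.
        // 2. SRDF/S mirrors the write to the target Symmetrix and waits.
        MirrorToRemoteArray(data);
        // 3. Only now does the source array return success to the host.
        sw.Stop();
        Console.WriteLine("Write acknowledged after " + sw.ElapsedMilliseconds + " ms");
        // 4. Both arrays destage cache to disk later, off the host's I/O path.
    }

    static void Main()
    {
        HostWrite(new byte[8192]);
    }
}
```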
Slide 6
SRDF/A, or Asynchronous replication, allows replication across the world.
SRDF/A does not impair the source server's write latency.
This lack of performance impact comes at a price: it cannot offer a Recovery Point Objective of 0.
The amount of unrecoverable data is affected by:
- The rate of data change
- Distance between sites
- Network quality and bandwidth
When SRDF/A absorbs a failure, any data that has not yet been replicated is lost.
In the case of Microsoft SQL Server, the remote database will simply enter recovery mode, recover the database, and resume normal operation.
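As a minimal sketch of what that looks like from the application side, the C# snippet below polls sys.databases until the failed-over database finishes recovery and comes ONLINE. The server and database names are hypothetical placeholders.

```csharp
// Watching the remote database come back after a failover: SQL Server
// reports recovery progress through sys.databases. Names are hypothetical.
using System;
using System.Data.SqlClient;
using System.Threading;

class RecoveryWatcher
{
    static void Main()
    {
        const string connStr = "Server=REMOTE-SQL01;Database=master;Integrated Security=true";
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT state_desc FROM sys.databases WHERE name = 'SalesDB'", conn);
            string state;
            // States move through RECOVERY_PENDING/RECOVERING to ONLINE.
            while ((state = (string)cmd.ExecuteScalar()) != "ONLINE")
            {
                Console.WriteLine("Database state: " + state);
                Thread.Sleep(5000);
            }
            Console.WriteLine("Database is ONLINE - normal operation resumed.");
        }
    }
}
```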
In the event of a failure, SRDF/Synchronous and SRDF/Asynchronous allow businesses to limit or eliminate data loss.
Slide 7
SRDF/Cluster Enabler is the technology that ties Windows, SQL, and EMC SRDF together to reduce Recovery Time.
In partner engineering I get to work closely with the Microsoft SQL Customer Advisory Team – the world's foremost SQL experts! The CAT team maintains a customer lab where customers prototype custom SQL Server-based solutions. These solutions often push the boundaries of computer science.
The environment depicted mirrors a system deployed by EMC Partner engineering for the Microsoft SQL CAT team.
VMAX arrays in the Customer Advisory Team lab were linked through Gigabit Ethernet networks and a latency generator to an array in the SQL Performance lab.
The VMAX arrays replicated Data and Log volumes.
To set up this environment we:
- Configured the storage
- Set up SRDF
- Installed Microsoft Failover clustering
- Installed Microsoft SQL Server
- Finally we installed SRDF/CE.
- SRDF/Cluster Enabler is a cluster resource DLL that becomes a dependency of the Physical Disk resources.
- Failover Clustering checks with SRDF/CE when failing over. SRDF/CE in turn checks the state of replication on the arrays.
- SRDF/CE will manage the SRDF replication operations to ensure a smooth failover and failback.
Today SQL Server, including SQL Server 2008 R2, does not support multi-subnet failover. Windows Server 2008 added multi-subnet failover to the Failover Clustering feature, but for SQL Server all nodes must still be on the same subnet.
Announced last November at the SQL PASS conference in Seattle, SQL Server codename "Denali" will change all of that. It adds support for SQL Server running on multiple subnets. This will greatly simplify the deployment of geo-clusters.
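On the client side, this surfaces as a new connection-string keyword in the Denali-era SQL Server client stack. A minimal sketch follows; the server and database names are hypothetical placeholders, and the feature is still pre-release as of this writing.

```csharp
// A sketch of a client taking advantage of a multi-subnet cluster: the
// MultiSubnetFailover keyword tells the provider to try the IP addresses
// behind the cluster network name in parallel, so failover to the other
// subnet is detected quickly. Names are hypothetical placeholders.
using System.Data.SqlClient;

class MultiSubnetClient
{
    static void Main()
    {
        string connStr = "Server=SQLCLUSTER;Database=SalesDB;" +
                         "Integrated Security=true;MultiSubnetFailover=True";
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();  // connects to whichever subnet owns the resource
            // ... run queries as usual ...
        }
    }
}
```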
Slide 8
This slide represents a local server failure event.
Slide 9
This slide investigates a remote array failover:
- Site failure.
- SRDF/CE detects that the storage that Failover Clustering wants to access is actually the remote side of the SRDF link. SRDF/CE will stop the replication and set the storage to read/write enabled.
- Failover Clustering mounts the physical disk resources.
- If the original source site is online SRDF/CE will reverse the replication.
Caveat – in the event of a site or communication failure SRDF/CE will not automatically fail over. User interaction is required to bring up the cluster after a site failure. SRDF/CE does have an automatic failover mode, but it is a feature that must be explicitly enabled by the user.
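To summarize the decision logic, here is a rough conceptual model in C#. Every type and method in it is a hypothetical illustration of the behavior described above, not EMC's actual API.

```csharp
// A rough conceptual model of the SRDF/CE failover check described above.
// All types and methods here are hypothetical illustrations, not EMC's API.
enum SrdfRole { Source, Target }

class SrdfCeResourceModel
{
    // Called by Failover Clustering before it onlines a physical disk.
    public bool AllowOnline(string diskResource, bool automaticFailoverEnabled)
    {
        SrdfRole role = QueryArrayRole(diskResource);
        if (role == SrdfRole.Source)
            return true;   // local side is already read/write

        // The disk is the remote side of the SRDF link: only proceed
        // automatically if the administrator enabled that mode.
        if (!automaticFailoverEnabled)
            return false;  // wait for operator confirmation

        StopReplication(diskResource);      // split the SRDF pair
        SetReadWriteEnabled(diskResource);  // promote the remote copy
        return true;
    }

    // Stubs standing in for calls to the array.
    SrdfRole QueryArrayRole(string disk) { return SrdfRole.Target; }
    void StopReplication(string disk) { }
    void SetReadWriteEnabled(string disk) { }
}
```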
Slide 10
SRDF/Cluster Enabler makes Windows Failover Clustering extremely smart.
Slide 11
Most businesses I work with are highly compartmentalized
- Teams interact through tickets.
- The storage team handles storage
- Network team handles networking
- Server team handles servers
- Windows team handles Server OS and Failover Clustering
- Database team handles DBs
- Application team owns the overall system, but depends on all other teams for support
In the event of a failover all teams must engage. Automation is usually broken out by support team.
If the customer deploys SRDF then the RPO is a few minutes, but the RTO can be extremely long.
Slide 12
Slide 13
Traditionally SQL Server has been a scale-up solution.
Only data warehousing systems have been built to scale out.
Scale out means a distributed system – think of Bing and Google.
OLTP workloads, where data calculations must be performed holistically, are a scale-up computer science problem.
The following Scalable Shared Database example covers SQL Server Analysis Services; Microsoft SQL Server can also be used in a Scalable Shared Database solution.
Slide 14
Microsoft has created a technology, Scalable Shared Database, that ties scale-up processing operations to scale-out reporting.
Scalable Shared Database allows SQL Server to present a read-only, point-in-time copy of the data to multiple reporting nodes.
Scalable Shared Database is supported by the listed platforms.
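From the SQL Server side, each reporting node simply attaches the same database from the shared read-only volume. A minimal sketch of the attach step; the paths, server, and database names are hypothetical placeholders.

```csharp
// Attaching the shared read-only database on a reporting node. The
// volume (here R:) is the read-only clone presented to every node;
// the database files already live on it. Names are hypothetical.
using System.Data.SqlClient;

class AttachSharedDatabase
{
    static void Main()
    {
        string connStr = "Server=REPORT01;Database=master;Integrated Security=true";
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                @"CREATE DATABASE ReportDB
                  ON (FILENAME = 'R:\Data\ReportDB.mdf'),
                     (FILENAME = 'R:\Log\ReportDB_log.ldf')
                  FOR ATTACH", conn);
            cmd.ExecuteNonQuery();
        }
    }
}
```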
Slide 15
EMC TimeFinder enables Scalable Shared Database to be used in massive data processing operations.
Array-based software
TimeFinder has been used for years to provide backups that are easy to create and easy to restore.
We are extending its use as a highly efficient data movement solution.
The storage array does all the work; no extra load is placed on the host or the network.
Slide 16
Windows and SQL Server TimeFinder Integration
SQL Server first writes data to the SQL log file. This data is then written into the database. Remember that the database resides both in server memory and on disk: SQL Server first writes data to server memory, then "flushes" dirty, or unwritten, data pages to disk.
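For instance, the T-SQL CHECKPOINT command forces that flush on demand; a minimal sketch (server and database names are hypothetical):

```csharp
// Forcing SQL Server to flush dirty pages to disk with the T-SQL
// CHECKPOINT command. Server and database names are hypothetical.
using System.Data.SqlClient;

class ForceCheckpoint
{
    static void Main()
    {
        string connStr = "Server=SQL01;Database=SalesDB;Integrated Security=true";
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            // CHECKPOINT writes all dirty pages for the current database to
            // disk; the log was already hardened by write-ahead logging.
            new SqlCommand("CHECKPOINT", conn).ExecuteNonQuery();
        }
    }
}
```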
TimeFinder replicates storage data; we call this block data.
Block data is simply data that does not have a file structure applied. The host maintains a mapping of file names to block locations; this is known as file data.
The host has no knowledge of block replication. If we simply create a point-in-time copy – without notifying the host – we run the risk of capturing inconsistent data.
To provide a consistent point-in-time copy, Microsoft SQL Server utilizes:
- Microsoft Volume Shadow Copy Service (VSS)
- SQL Server Virtual Backup Device Interface (VDI)
VSS and VDI use a Storage Provider:
- The storage provider acts as a service broker, coordinating the freeze and thaw operations with storage device copy and split operations.
EMC offers two Storage Providers for the Symmetrix array.
Microsoft SQL Server Analysis Services, by contrast, does not integrate with VSS or VDI:
- Analysis Services writes XML files directly to the NTFS Windows file system.
- To ensure data integrity, I recommend detaching Analysis Services cubes prior to creating clone devices or snapshots (a sketch follows below).
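Assuming the detach/attach support that AMO (Microsoft.AnalysisServices) gained with SQL Server 2008 Analysis Services, the detach step can be scripted roughly like this. The server name, database name, and storage path are hypothetical placeholders.

```csharp
// A sketch of detaching an Analysis Services database before a clone or
// snapshot, then re-attaching it, using AMO (Microsoft.AnalysisServices).
// All names and paths are hypothetical placeholders.
using Microsoft.AnalysisServices;

class CubeDetachAttach
{
    static void Main()
    {
        Server server = new Server();
        server.Connect("Data Source=FACTORY-AS01");

        // Detach closes the cube's files on disk so the array copy
        // captures a consistent image.
        Database db = server.Databases.FindByName("AdReportingCube");
        db.Detach();

        // ... create the TimeFinder clone or snapshot here ...

        // Re-attach the cube folder once the copy is complete
        // (hypothetical path to the detached database folder).
        server.Attach(@"E:\OLAP\Data\AdReportingCube.0.db");
        server.Disconnect();
    }
}
```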
Slide 17
Microsoft adCenter is the advertising system that powers the Microsoft Bing search engine.
When you complete a search on Bing, advertising renders at the top and side of the search results.
These keyword ads are powered by adCenter.
adCenter provides really rich business intelligence and analytics to its customers: the advertisers.
All of the adCenter systems have really strict SLAs wrapped around system performance. All data from ads must be rendered by the BI system within two hours. In addition, all web reports must continue to refresh rapidly, all while the system is processing customer-driven custom queries.
Internal business reporting put excessive load on the customer-facing systems. The answer was SQL Server Analysis Services.
The Analysis services solution became so successful that it was soon overloaded. adCenter first upgraded the Analysis Services system to SQL Server 2005, and again to SQL Server 2008. Both upgrades resulted in measured performance gains. Even with these software and server upgrades the system was again tasked beyond its limits.
In cooperation with the SQL Customer Advisory Team and EMC Partner Engineering, the adCenter analytics and business reporting team extended Analysis Services using Scalable Shared Database and Symmetrix TimeFinder.
The following is an overview of the solution currently in use by Microsoft adCenter to provide business reporting information.
- The data is fed by SQL Server data warehouses.
- Data is loaded into the SQL Server Analysis Services factory server. The factory presently runs a 24-hour data load and processing cycle.
- The data is stored on a Symmetrix factory volume.
- The Symmetrix VMAX array, utilizing TimeFinder Clone, makes a full point-in-time copy every 24 hours.
- The reporting servers all point to the same read-only reporting volume.
- The reporting servers are fronted by the Microsoft Windows Network Load Balancing service.
- Clients – SharePoint dashboards made using Silverlight, Microsoft Excel, and Microsoft PowerPivot – traverse the NLB system to retrieve data from the reporting servers.
Slide 18
Data is collected from:
- Billing
- Customer DBs
- Business Intelligence data
- Traffic analysis services
- Web Servers
Data is loaded into a staging data warehouse.
The data is processed by ETL (Extract, Transform, and Load):
- Extract data from other computer systems.
- Transform it into a format usable by the data warehousing system (basically this means counting and adding – think of a giant spreadsheet).
- Load the data into the data warehouse (or, in this case, Analysis Services).
In this case the ETL tool is SSIS (SQL Server Integration Services). SSIS replaced Microsoft DTS. It is used to extract, manipulate, and load data within the Microsoft SQL Server environment; a minimal sketch of launching a package from code follows below.
The data is loaded from the staging data warehouse into the OLAP (Online Analytical Processing) cube server.
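Here is a minimal sketch of kicking off an SSIS package from C# using the SSIS runtime API (Microsoft.SqlServer.Dts.Runtime); the package path is a hypothetical placeholder.

```csharp
// Running an SSIS package from C# via the SSIS runtime API.
// The package path is a hypothetical placeholder.
using System;
using Microsoft.SqlServer.Dts.Runtime;

class RunEtlPackage
{
    static void Main()
    {
        Application app = new Application();
        // Load the package from the file system (it could also be
        // loaded from MSDB or the SSIS package store).
        Package pkg = app.LoadPackage(@"C:\ETL\LoadStagingDW.dtsx", null);
        DTSExecResult result = pkg.Execute();
        Console.WriteLine("ETL package result: " + result);
    }
}
```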
Slide 19
Data is processed by the Factory server.
This really demonstrates the power of SQL Server Analysis Services:
The hypercubes store a smaller subset of the entire data warehouse.
Developers can customize the entire BI system using the Microsoft SQL Server Business Intelligence Development Studio, or BIDS (it looks just like Visual Studio – but is dedicated to BI). This system provides easy-to-use wizards for building integration, reporting, and analytic solutions.
The cube can pre-aggregate the data.
If a BI developer guesses wrong about the processed data, the cubes will process data on the fly and store the answers in memory.
Think of an Analysis Services cube as a business report.
This entire suite – SQL Server data warehouse, Analysis Services, SSIS, Excel, PowerPivot, SharePoint, and Silverlight / .NET – is known as the SQL Server 2008 managed self-service business intelligence system.
The adCenter Self Service BI system takes the processed data and loads it into the Reporting servers.
The Silverlight and Excel Clients are able to perform data queries. These queries are usually constructed using data connected pivot tables.
This system is extremely powerful!
Slide 20
Setup – The scalable shared database environment is configured and data is initially replicated.
Data consistency is guaranteed by mounting and unmounting the cube.
Slide 21
The adCenter environment has been completely automated by a C# application that:
- Starts and stops Analysis Services processing
- Mounts and unmounts cubes
- Makes calls to the Symmetrix API
- Mounts and unmounts LUNs
- Controls cloning
The application uses a part of the Symmetrix API called Symmetrix NT Control (symntctl.exe); a sketch of calling it follows below.
Automation can also be completed using EMC Replication Manager.
*Note: RM does not support mounting clones or snaps to more than one host. If automating with Replication Manager, first make a clone volume, and then create a snapshot for each reporting server.
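As a rough sketch, such an automation step can shell out to symntctl.exe and check the exit code. I am deliberately not reproducing the tool's real command-line syntax here – the arguments below are hypothetical placeholders; consult the Solutions Enabler documentation for the actual flags.

```csharp
// Shelling out to symntctl.exe from an automation step. The command-line
// arguments shown are hypothetical placeholders, not the tool's real syntax.
using System;
using System.Diagnostics;

class SymmetrixStep
{
    static int RunSymntctl(string args)
    {
        ProcessStartInfo psi = new ProcessStartInfo("symntctl.exe", args)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true
        };
        using (Process p = Process.Start(psi))
        {
            Console.WriteLine(p.StandardOutput.ReadToEnd());
            p.WaitForExit();
            return p.ExitCode;  // non-zero usually means the step failed
        }
    }

    static void Main()
    {
        // Hypothetical example: unmount a volume before refreshing its clone.
        if (RunSymntctl("umount -path R:") != 0)
            throw new InvalidOperationException("Unmount failed - aborting refresh.");
    }
}
```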
Slide 22
Daily Data Refresh
The adCenter system repeats this cycle every 24 hours.
- Data Load – The data is loaded into the factory cube via SSIS from the staging data warehouse.
- Process Data – The Factory Server Processes data.
- Drain Stop Clients – NLB is used to drain stop clients. Any new connections are turned away. Existing connections are allowed to complete existing queries.
- Detach Reporting Cubes – Reporting cubes and any open files are closed. This step is extremely important. If any files are left open when the volume is un-mounted the clone volume will not successfully remount without restarting the server. This is due to the NTFS lazy reader (the lazy writer does not come into play as the volume is set read only in NTFS).
- Detach Shared Reporting Clone – The clone is detached using Symntctl.exe – this step is necessary to ensure that NTFS is updated with any changes. If the system is not un-mounted and re-mounted, the file system may not reflect changes.
- Detach Factory Cube – This step is taken because Analysis Services does not integrate with VSS or VDI. To ensure all data is updated and consistent the cube is detached, thus forcing all dirty memory pages to be committed to disk. If SQL Server is used then this step can be performed using the EMC VSS and/or VDI providers.
- Flush NTFS – NTFS is flushed using symntctl.exe. This step is only necessary for Analysis Services. SQL Server can be made consistent using VSS and/or VDI.
- Refresh Clone – The array is told to update the clone. The array tracks all changes using a bitmap. The array chooses to either update only changed blocks or perform a full refresh. The refresh method is chosen based on efficiency. For the duration of the refresh the Clone target is marked not ready (this means it cannot be accessed by any hosts).
- Split Clone – After all changes have been replicated to the reporting clone, it is split from the factory LUN. The array sets the clone back to read/write enabled. It can now be accessed by the hosts.
- Mount Factory Volume – the Factory volume is again accessed by the host (this is really a logical step as access was controlled from an application perspective)
- Attach Factory Cube – the factory cube is attached and the factory is now ready to load data.
- Mount Shared Read-only Clone – this is a multi-step process. The clone is mounted by a single server. This server uses symntctl.exe to set the NTFS file system to read only. The other nodes can then mount the volume and note its read-only file system state. The array can set the LUN to read only, but Windows has no way to see that the volume is read only, and attempting to attach an Analysis Services cube or SQL Server database would fail: SQL Server and Analysis Services specifically check the NTFS volume for the read-only flag.
- Attach Reporting Cubes – attach the reporting cubes.
- Connect Clients – Use NLB to allow the clients to again access the reporting servers.
This system is really powerful. We can set up a series of parallel clones or snapshots that allow only some of the reporting servers to be updated at a time. This allows a system that has virtually zero downtime. One caveat: Excel and most applications are connection-oriented. This means that when they connect to a cube or data warehouse, they will fail the connection if that specific server is removed. This can be solved using web services for SQL.
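Pulling the cycle together, here is a skeleton of the refresh loop in C#. Every helper is a stub standing in for the steps above – hypothetical names, not adCenter's actual code.

```csharp
// A skeleton of the 24-hour refresh cycle described above, with each
// step as a stub. All helper names are hypothetical illustrations of
// the ordering, not adCenter's actual implementation.
class DailyRefreshCycle
{
    public void Run()
    {
        LoadFactoryData();        // SSIS load from the staging warehouse
        ProcessFactoryCubes();    // factory server processes the data
        DrainStopClients();       // NLB: let running queries finish
        DetachReportingCubes();   // close all open files on the clone
        UnmountReportingClone();  // symntctl: release the shared volume
        DetachFactoryCube();      // force AS to flush everything to disk
        FlushNtfs();              // symntctl: flush the factory volume
        RefreshAndSplitClone();   // array copies changed blocks, then splits
        AttachFactoryCube();      // factory is ready for the next load
        MountReportingClone();    // one node sets NTFS read-only, rest mount
        AttachReportingCubes();
        ReconnectClients();       // NLB: admit client connections again
    }

    // Stubs standing in for the real automation described in the text.
    void LoadFactoryData() { }
    void ProcessFactoryCubes() { }
    void DrainStopClients() { }
    void DetachReportingCubes() { }
    void UnmountReportingClone() { }
    void DetachFactoryCube() { }
    void FlushNtfs() { }
    void RefreshAndSplitClone() { }
    void AttachFactoryCube() { }
    void MountReportingClone() { }
    void AttachReportingCubes() { }
    void ReconnectClients() { }
}
```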
Slide 23
Performance Impacts
Network
Network Load Balancing is an installable option for Windows Server. It works by spoofing a network MAC address. All nodes always have this MAC address active, so every node receives all network traffic. This has the effect of limiting the achievable network bandwidth: if the system interconnect uses a gigabit Ethernet network, only a gigabit of bandwidth is available to the entire cluster, because every node receives all traffic. Normally a network switch would only forward traffic specific to a single node. If extra bandwidth is needed, the entire network should be upgraded to a higher speed.
Hardware Load Balancers – Many manufacturers provide hardware load balancing devices. Please note that these devices often do not support database systems. The hardware load balancer is designed to handle web traffic, TCP ports 80 and 443. Web traffic is characterized by short connection times; a SQL decision support query may take several minutes to complete. As a protection against denial of service attacks, most hardware load balancers have an extremely short connection timeout value set. Please contact the device manufacturer for information on supporting SQL Server or Analysis Services.
Processing
The Factory server is a single server. This server, by necessity, must be scaled up. If more processing power is needed, more powerful processors and more memory must be added. The reporting servers can scale out. The only limiting factors are the Ethernet bandwidth and storage configuration.
SQL Server uses an extremely elegant I/O engine. SQL Server will dynamically scale up I/O size when sequential I/O is detected. The SQL I/O will vary from sequential to random depending on the data and query. Analysis Services uses many files residing on NTFS. AS creates many XML files – Fact, Dimension, and Map files (Map files store a mapping of frequently used data; think of this as simple de-duplication). The Analysis Services workload is extremely random: the application will need to look up data in the map, fact, and possibly dimension files.
Rules of thumb for the storage:
Isolate the factory server with dedicated storage. This ensures that the factory has dedicated I/O that is not affected by reporting activities.
The reporting servers all put load on a single volume. The volume must have enough performance to support all reporting servers accessing it simultaneously: if one server needs 10,000 random IOPS, three servers need a single LUN that supports 30,000 random IOPS.
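A back-of-the-envelope sketch of that sizing rule, using the roughly 220 random IOPS per 15K drive figure discussed on slide 25 (the counts here are illustrative assumptions):

```csharp
// Sizing the shared reporting LUN: it must sustain the sum of all
// nodes' IOPS. The per-drive figure comes from the random-I/O
// discussion on slide 25; all inputs are illustrative assumptions.
using System;

class LunSizing
{
    static void Main()
    {
        int reportingServers = 3;
        int iopsPerServer = 10000;
        int iopsPer15kDrive = 220;  // random I/O, per slide 25

        int requiredIops = reportingServers * iopsPerServer;  // 30,000
        int drivesNeeded = (int)Math.Ceiling((double)requiredIops / iopsPer15kDrive);
        Console.WriteLine("LUN must sustain " + requiredIops + " random IOPS (~"
                          + drivesNeeded + " x 15K drives, before RAID overhead).");
    }
}
```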
Slide 24
I executed a simple I/O test using the IOMETER I/O test tool. I used my laptop hard disk with a moderately sized data file.
The first test executed two 64K random reads at the same time. This test produced about 230 random I/Os at a 9 millisecond response time.
The second test executed two 64K sequential reads at the same time. Notice that the performance was about half – down to 120 IOPS at double the response time.
Slide 25
This multi-sequential access pattern is called large block random I/O.
This rendering shows a typical hard disk. The blue and green circles represent two tracks. The orange and silver object is a read write head.
The drive has three performance modes:
- Sequential – this is the drive's best performance. A typical 15K hard disk can perform about 1,400 IOPS.
- Random – the same drive will only support about 220 IOPS when the I/O is random. This is defined by drive manufacturers as average seek time.
- Large block random – this also causes excessive head seeking. This performance metric is tracked by drive manufacturers as maximum seek time. As the previous slide showed, this can be up to two times slower than the associated random I/O.
Slide 26
Smart and Powerful – the adCenter solution deployed Enterprise Flash Drives.
Note the graph.
Migrating to just 9 EFDs not only increases performance, it drastically reduces TCO – especially power usage and physical footprint in the data center.
Slide 27
To summarize:
- Implementing a solution based on SRDF and SRDF/CE can reduce data loss to at or near zero.
- The recovery process can be automated and reduced from days or hours to minutes.
- These solutions utilize the Microsoft solutions already in use at most businesses.
- Adding SQL Server codename "Denali" can further simplify the deployment of geo clusters.
- Deploying a solution based on Symmetrix TimeFinder and SQL Server Scalable Shared Database can drastically increase reporting system performance for data warehouse and business intelligence systems.
Slide 28
Slide 29
Slide 30
Slide 31
Slide 32