VSM Cover Story
Test-Drive SQL Server Data Services
Microsoft readies its new "cloud database" by hosting beta versions of REST and SOAP protocols for performing CRUD operations on clustered, customized SQL Server 2008 instances. These instances are organized as massively scalable, super-reliable, highly available Web services.
TECHNOLOGY TOOLBOX: VB.NET, SQL Server, Other: XML (See Additional Resources)
Provisioning highly scalable databases as Web services with REST and SOAP data access and update protocols is one of 2008's hottest Web infrastructure topics. Web-Oriented Architecture (WOA) envisions that on-premises services will morph to Web-facing resources accessible to enterprise workers anywhere and anytime.
Numerous bloggers and industry pundits, such as Microsoft's Jon Udell, are issuing the call to "overcome data friction" by moving from screen-scraping pages to Web-based data services as resources for data-intensive mashups. Forrester Research Inc. estimates that global "Enterprise 2.0" spending on WOA will reach $4.6 billion in five years (see Additional Resources). Accommodating rapid user growth and fluctuating traffic patterns ordinarily requires drastically overbuilding Web infrastructure to reduce latency when traffic spikes occur. Renting scalable resources in existing data centers with a monthly payment based on gigabytes of data stored and up or downloaded is far preferable to making large, up-front capital investments, especially for startups.
To meet the anticipated demand for off-premises data services, three major players in the Internet arena have released beta versions of highly scalable hosted database resources: Microsoft's SQL Server Data Services (SSDS), Amazon Web Services' SimpleDB, and Google App Engine's Datastore (GAE). SSDS appears to be the most enterprise-ready of the three services at this time. I'll give you a walkthrough of SSDS's important features and current limitations, describe how to provision your own SSDS service, and explain the .NET 3.5 code required to perform create, retrieve, update, and delete (CRUD) operations in the cloud with REST and SOAP protocols. If you have an SSDS beta account, you can run the SSDSNwindEntitiesCS.sln C# sample project to upload and manipulate data from the Northwind sample database, as well as compare the performance of REST and SOAP protocols with differing payloads. I'll also attempt to forecast the future of SSDS and its closely related on-premises counterpart, ADO.NET Data Services (usually called by its code name "Astoria"), which was the subject of my "Retrieve and Update Data in the Cloud with Astoria" article in VSM's May 2008 issue.
The three hosted databases are schema-less and implement the entity-attribute-value (EAV) data model instead of the traditional relational model. Replacing the tabular organization and rigid metadata of relational databases with EAV tuples simplifies creating new entity sets, which correspond approximately to tables, and increase flexibility by enabling ad hoc changes to entity structure. The EAV data model has been most commonly used for entity sets that might require a large number of properties to describe all their entities, but whose individual entities require only a few property values. Clinical patient examination data is the classic example of information that's best suited to storage in the EAV model; physicians might enter just a few observations per patient visit from literally thousands of potential candidates. EAV databases are best suited to read-mostly applications and aren't often used for online transaction processing (OLTP).
SQL Server 2008 handles the EAV model's sparse matrix of entity values with sparse columns and filtered indexes. These two new features enable increasing the maximum number of columns in a table from SQL Server 2005's 1,024 to 30,000 and indexes from 250 to 1,000 per table with SQL Server 2008. NULL values don't occupy space in columns that have the new SPARSE keyword applied. Filtered indexes on sparse columns don't include rows with NULL values, which reduces index size and improves query performance. However, each non-NULL value requires an additional four bytes. For example, the bit datatype's size grows from 1 to 33 bits. The maximum number of bytes in a row (without overflow to another page) remains 8,060 bytes.
The Layered Look
SSDS introduces a three-layered architecture consisting of authorities, containers, and entities to which the SQL Server team applies the ACE acronym (see Figure 1). Authorities are SSDS's basic billing unit and REST resource/SOAP endpoint, which Microsoft provisions on the SQL Server Data Services Platform in a single data center. The platform consists of racks of low-cost commodity servers with Serial ATA (SATA) disk drives organized into a "distributed data fabric" with automated "lights-out" management. SSDS calls an individual data center a geo-location and offers data replication between geo-locations for disaster recovery.
An authority corresponds approximately to a shared, clustered SQL Server 2008 instance that stores multiple data copies for reliability. Paying customers can have multiple authorities. Each REST authority has its own DNS name, https://authority.data.beta.mssds.com/v1. This DNS name is called the Service URI, and it resolves to rest-data.beta.mssds.com; authority is your choice of a unique subdomain name (AuthorityId). Secure Sockets Layer (SSL, HTTPS) is mandatory for preserving user ID, password, and data security for the REST protocol. The beta element will be present in the name while in beta, during which time the service will be free, and v1 indicates version 1 of the service. The SSDS team expects to update the beta service version with new features every eight weeks. To apply for the beta program, click on the Register for Free Beta button on the main SSDS page and complete the Microsoft Connect registration form. Monitor your e-mail or return to Connect periodically to learn when you've been admitted, at which time you'll be provisioned with a unique user ID, password, and authority automatically.
Containers are the basic unit of data consistency, as well as query scope. SSDS team members often analogize containers to databases. There's no technical limit on the number of containers per authority but their maximum size is 2GB. Therefore, you should consider more granular storage, such as related entity sets, to make container contents more visible to casual users and allow for data growth. This empty GET query string from the browser's address textbox lists an authority's containers in a Plain Old XML (POX) document (see Figure 2):
https://authority.data.beta.mssds.com/v1/?q=
Unfortunately, there's no corresponding query to list the entity types, which SSDS calls Kinds, in a V1 container. (The EntitySet namespace contains sitka, which is SSDS's beta code name.)
Query strings are case-sensitive. This URL returns an Employees container's XML template:
https://authority.data.beta.mssds.com/v1/Employees
Failure to capitalize the "E" in "Employees" returns an HTTP 404 Not Found error.
ACE use POX templates for creating or updating collection members with the REST protocol. Here's the template for an Employees container returned by the preceding query:
<s:Container
xmlns:s=
http://schemas.microsoft.com/sitka/2008/03/
/
/
xmlns:xsi=
http://www.w3.org/2001/XMLSchema-instance
xmlns:x="http://www.w3.org/2001/XMLSchema">
<s:Id>Employees</s:Id>
<s:Version>1</s:Version>
</s:Container>
To create a new container, POST the preceding template without the <s:Version> element, the value of which is read-only.
Entities are property bags that have three predefined properties (called metadata): Id (string, mandatory), Version (decimal, read-only), and Kind (string, optional). All other properties are optional and can be of XML Schema's string, decimal, dateTime, boolean, or base64Binary datatypes. Because of the free-form structure of entities and their properties, SSDS calls these two elements Flexible Entities and Flexible Properties.
Adding /?q= to the preceding container query returns an entity set of all Employee-Kind entities in the Employees container. Replacing ?q= with a valid Id value returns the template for an entity instance (see Figure 3). The template uses the Kind value to name the entity's root element, which is <Employee> in this example.
POSTing the first entity template of a new Kind creates a table; adds the conventional Id and Kind columns and indexes; autogenerates the flexible property columns and filtered indexes; and then adds its values to the new table. Autogenerating containers and entities from a pair of XML templates simplifies populating entire authorities from existing data sources programmatically. Otherwise, the entity creation process would require abandoning REST compliance to execute out-of-band T-SQL DDL queries. For REST create and update operations, the SSDSNwindEntitiesCS sample app uses LINQ to XML's functional construction feature to generate POX POST and PUT templates from table metadata returned by the SqlDataReader (see Listing 1). Updates must send all property values -- not just changed values -- in the PUT template. Deletions use DELETE templates with the entity's Id element only. SSDS doesn't perform transactions for updates, but its Version property emulates a timestamp for optimistic concurrency management.
REST GET operations use a LINQ expression syntax subset consisting of the from, in, where, and select keywords, as well as the <, <=, ==, >=, and > comparison operators or their Visual Basic equivalents. The full query string consists of the ServiceURI, ContainerURI, ?q=' and the URL-encoded LINQ query string followed by a closing apostrophe. Metadata properties use conventional "dot" syntax in where clauses, such as where e.Id == "Ord_1000055" or where e.Type == "Order"; flexible properties require bracketed quoted identifiers, such as where ["ShipCountry"] == "Brazil". SSDS doesn't support entity associations, projections, joins, grouping, or ordering (other than ascending by Id value); aggregate functions (MIN(), MAX(), and so on) are also missing. Combining the result of queries against multiple containers requires LINQ to Objects union operations on the client.
This snippet from the sample project's btnGetOrders_Click( ) event handler URL-encodes the entire query string with the HttpUtility.UrlEncode( ) method from the System.Web namespace and sends it as a Create( ) request to SSDS, which returns Order entities for Brazilian orders:
// Create a query string to retrieve Brazilian orders
string containerUri = String.
Format("https://{0}.data.beta.mssds.com/v1/{1}",
authorityId, containerId);
string brazilQuery = @"from e in entities where e " +
"[""ShipCountry""] == ""Brazil"" &&
e.Kind == ""Order""select e";
string queryUri = containerUri + "?q='" +
HttpUtility.UrlEncode(brazilQuery) + "'";
// Retrieve Brazilian orders
WebRequest request =
HttpWebRequest.Create(queryUri);
request.Method = "GET";
request.ContentLength = 0;
request.ContentType = "application/xml";
request.Credentials = new
NetworkCredential(userName, password);
string data = null;
using (HttpWebResponse response =
(HttpWebResponse)request.GetResponse())
{
// ReadResponse() function code here for clarity
string responseBody = "";
using (Stream rspStm =
response.GetResponseStream())
{
using (StreamReader reader =
new StreamReader(rspStm))
{
responseBody = reader.ReadToEnd();
}
}
data = responseBody;
if (response.StatusCode != HttpStatusCode.OK)
{
MessageBox.Show(string.
Format("Unexpected status code " +
"returned (HttpWebResponse): {0}",
response.StatusCode), title);
return;
}
}
SSDS limits response streams to 2MB length or 500 complete entity instances, whichever occurs first. The size limit also sets the maximum size of blobs encoded as base64Binary to approximately 1.46MB due to encoding overhead. Simple looping code handles queries that return more than 500 entities.
SOAP Simplifies CRUD
SSDS's SOAP protocol greatly reduces the amount of code required for basic CRUD operations. To use the SOAP binding, you must add a Windows Communication Foundation (WCF) Service Reference for SSDS to your project. Right-click on Solution Explorer's References node, and choose Add Service Reference to open the eponymous dialog. Next, type the generic Service URI without the AuthorityId prefix, but with a /soap?wsdl suffix (http://data.beta.mssds.com/soap/v1?wsdl). Click on "Go" to add a SitkaSoapService node to the Services list, and type SitkaClient as the service Namespace (see Figure 4). Next, click on "Advanced" to open the Service Reference Settings dialog, change the Collection Type from System.Array to System.GenericList, and click on OK twice to close both dialogs.
Flexible Entities and Flexible Properties enable using a single Entity class for all entities, regardless of the contents of their property bags. This feature greatly simplifies the process of writing clients. The Entity class defines Id, Kind, and Version metadata properties. It also adds a generic Dictionary<string, object> Properties collection of KeyValuePairs for flexible properties. A Scope struct with a DataContract that has AuthorityID, ContainerID, and EntityID as DataMembers specifies the container level at which operations occur. You'll find the implementation of these two classes in the Service Reference's Reference.cs file. That same file also includes implementations of Error and SitkaSoapServiceClient proxy classes, as well as ISitkaSoapClient and ISitkaSoapServiceClient interfaces.
This snippet from the CreateEntity( ) method's SOAP section adds an entity set of any type to the container specified by the authorityID and containerID values:
// Add new entities with SOAP protocol
SitkaSoapServiceClient proxy =
new SitkaSoapServiceClient())
// Supply security credentials to proxy
proxy.ClientCredentials.UserName.UserName =
userName;
proxy.ClientCredentials.UserName.Password =
password;
// Set scope to the container for the entity
Scope containerScope = new Scope();
containerScope.AuthorityId = authorityId;
containerScope.ContainerId = containerId;
SqlConnection connNwind =
new SqlConnection(strNwind))
SqlCommand cmdNwind =
new SqlCommand("SELECT * FROM" +
[" + tableName + "]", connNwind))
int entities = 1;
connNwind.Open();
SqlDataReader sdrNwind =
cmdNwind.ExecuteReader();
Entity ent = null;
while (sdrNwind.Read())
{
// Create a new Entity with metadata properties
ent = new Entity();
ent.Id = idPrefix + "_" +
(entities + 1000000).ToString();
ent.Kind = entityType;
// Add and populate a property bag dictionary
ent.Properties = new Dictionary<string, object>();
for (int i = 0; i < sdrNwind.FieldCount; i++)
{
string key = sdrNwind.GetName(i);
object value = sdrNwind.GetValue(i);
string type = value.GetType().ToString();
object objValue = null;
if sdrNwind.GetFieldType(i).ToString() !=
"System.Byte[]"))
{
// Numeric type fixup to decimal
if (type.Contains("Int") ||
type.Contains("Single") ||
type.Contains("Double") ||
type.Contains("Byte"))
{
decimal decValue = Decimal.
Parse(value.ToString());
objValue = (object)decValue;
type = objValue.GetType().
ToString();
}
else
objValue = value;
// Add the flex property
ent.Properties.Add(key, objValue);
}
}
// Add the flex entity to the container
proxy.Create(containerScope, ent);
entities += 1;
}
sdrNwind.Close;
connNwind.Close;
The code adds Id and Kind metadata properties, followed by a flexible property for each source table column except, for this example, columns of the image datatype. The service translates the object's underlying CLR type to an SSDS datatype so numeric datatypes other than decimal must be fixed in V1. (Note that three levels of using( ) statements and try ... catch error handling have been removed from the preceding code for brevity and conformance to publishing limitations on column width.)
The SitkaSoapServiceClient proxy has Query( ), Update( ), and Delete( ) methods that correspond to REST's HTTP GET, PUT, and DELETE methods. Thiscode queries and returns the same collection of Brazilian Order entities that the earlier REST code example returned as a POX document:
proxy.ClientCredentials.UserName.UserName =
userName;
proxy.ClientCredentials.UserName.Password =
password;
// Set scope to the entity's container
Scope containerScope = new Scope();
containerScope.AuthorityId = authorityId;
containerScope.ContainerId = containerId;
string brazilQuery = @"from e in entities where " +
"e [""ShipCountry""] == ""Brazil"" && " +
"e.Kind == ""Order"" select e";
List<Entity> brazilOrders = new List<Entity>();
// Add the entities to the list
brazilOrders =
proxy.Query(containerScope, brazilQuery);
Scope and Entity objects appear in VS 2008's Data Sources pane. However, databinding Scope objects doesn't work and binding a generic List<Entity> to a DataGridView control doesn't deliver a usable UI for the flexible property bag items. It doesn't take a major effort to populate a list of custom read-only objects from an Entity collection, but two-way, read/write databinding would require writing a substantial amount of code. It behooves the SSDS team to simplify client development with a Wizard to codegen conventional business objects or, better yet, an ADO.NET SsdsBindingSource component and ASP.NET SsdsDataSource control to support two-way databinding.
SSDS defaults the SOAP protocol to plain-text HTTP. To enable Secure Sockets Layer (SSL, HTTPS) for your SOAP client endpoint, change these two lines in the security and endpoint sections of your project's App.config or Web.config file:
//original:
<security mode="TransportCredentialOnly">
<endpoint address="http://data.beta.mssds.com/soap/v1"
binding="basicHttpBinding" … >
//change to:
<security mode="Transport">
<endpoint address="https://data.beta.mssds.com/soap/v1"
binding="basicHttpBinding" … >
WCF defaults to 16,384 byte message and array sizes. To enable messages with SSDS's 2MB size limit, make the following changes to attribute values in the binding section:
maxBufferSize="2097152"
maxBufferPoolSize="0"
maxReceivedMessageSize="2097152"
Compare REST and SOAP Performance
The sample application collects and saves CRUD execution times in an Excel-compatible .CSV-formatted file to help you compare the performance of SSDS's REST and SOAP protocols. Enabling base64Binary encoding measures the performance hit caused by including moderately large image data when persisting new entities. Performance depends on the upload and download speeds of the user's Internet connection and the operation performed. I conducted my tests during SSDS's invitation-only early beta period, so absolute performance numbers shouldn't be considered representative of results to be expected when SSDS goes into production status in mid-2009. However, differences between REST and SOAP performance might be an innate characteristic of the service.
I created the tests with Order entities, which have an average 1,180-byte message payload, and I conducted these tests over a commercial AT&T DSL connection with a measured upload speed of 430kbps. The results indicate that SOAP Create( ) inserts are almost twice as fast as their REST POST counterparts (6.25 vs. 3.40 entities/sec.). Similarly, SOAP Update( ) operations are about two times faster than REST PUT executions (4.67 vs. 2.40 updates/sec.). Deletions are about 1.5 times faster than updates. On the other hand, REST GET retrievals with a DSL download speed of 2,580kbps outperform SOAP Query( ) operations by 25 percent (97.0 vs. 72.6 entities/sec.).
George Moore, Microsoft's general manager of Live Platform Services announced in a March 12, 2008, blog post that the unified REST wire format will be Atom (RFC 4287) and the unified protocol will be AtomPub (RFC 5023) for on-premises SQL Server and hosted SSDS, as well as other Live storage services (see Additional Resources). This policy means that SSDS will gain Atom/AtomPub in addition to POX as supported wire formats. (SSDS probably will also add JavaScript Object Notation [JSON] to support the AJAX community.) Atom has the capability to add references for establishing many:1 and 1:many associations between entities, which SSDS v1 doesn't support.
The SQL Server Data Programmability team developed SSDS and Astoria concurrently but adopted the Flexible Entities architecture instead of the Entity Framework's Entity Data Model (EDM) for SSDS. (My "Model Domain Objects with the Entity Framework" article in the March 2008 issue explains the EDM) SSDS and Astoria REST query strings also have significant format differences. However, the SSDS and Astoria data model are destined to converge by the time SSDS releases to production status. Pablo Castro, Astoria's data architect, said in an April 14, 2008, blog post, "We're working on aligning aspects of SSDS and Astoria and this alignment will come over a series of updates to both Astoria and SSDS. For example, we're ... already working on extensions to EDM to incorporate the open content model of SSDS. We'll be working to extend Astoria as needed to ensure it provides a great development experience over the SSDS service."
The Microsoft Synchronization Framework gained more than 15 minutes of fame when Ray Ozzie's Windows Live Mesh went into public preview mode on April 22, 2008, with file and folder synchronization as a primary initial feature. The Sync Framework will become a primary data transport for later SSDS versions, which could then integrate with Mesh-connected applications. Nigel Ellis demonstrated synchronizing contact data from an Access database to SSDS to a folder of vCard text files in his "Introducing SQL Server Data Services" presentation at MIX08. Advertising-supported mail, search, Mesh, and other consumer-oriented Windows Live Web services, as well as newer, hosted Online "software and services" offerings for businesses provide the incentive for Microsoft to invest billions of dollars in a worldwide data center network.
SSDS v1 lacks the role-based security features required to restrict individual users and groups from accessing or updating specific authorities, containers, or both. Developers must adapt to missing relational features and handle joins, grouping, sorting, and emulating aggregate functions with LINQ to Objects or LINQ to XML on the client. However, the SSDS team's grand plan, according to Ryan Dunn, an SSDS technical evangelist, is to "start simple and expose richer and richer functionality as we prove out the scale and developer's need." Some features that might be added in early upgrades are full-text search, larger BLOBs, and richer datatypes. Assuming that the team sticks to its eight-week development cycle commitment, the first round of enhancements should be completed by July 2008, when public access to the beta is scheduled to begin. That's when you can obtain an SSDS account and connect the SSDSNwindEntitiesCS sample project to Microsoft's "database in the cloud."