Vineet Gupta
October 29

Azure - Microsoft's Cloud Platform

Microsoft's application platform has now got a shade of the sky - Azure was formally announced at the PDC yesterday by Ray Ozzie in his opening keynote address.

I have said this several times in the past - computing going forward is going to be all about parallelization. On the client, parallelization of code is necessitated by the advent of many-core CPUs. On the server, parallelization is about scaling out - partitioning your code, app and data so that they can run on several servers in parallel, providing high scale and redundancy. Of course, it is rather difficult to design, build and operate the infrastructure required for an application that can potentially run across hundreds of servers.

And this is where the Azure platform comes in - it is a platform to run .NET applications on Microsoft's cloud fabric, hosted in its datacenters across the globe, offering virtually limitless scalability and availability. This is done by providing a set of abstractions that let you focus on the core business logic of the application instead of worrying about the underlying physical infrastructure. You can get an overview of these abstractions by reading David Chappell's excellent whitepaper that introduces the three key sets of abstractions announced at the PDC. In this post, I am going to focus on the lowest layer - Windows Azure.
Cloud Operating System

Windows Azure is the OS for the Cloud. Marketing speak? Not really. What does an OS do? It abstracts away the hardware. Before the OS came along, people used to code specifically for each machine architecture. You had to know the instruction set of a machine in order to be able to program for it. This changed with the notion of an OS. You did not have to worry about the specific machine architecture - you wrote to the OS and it took care of loading your code into memory, setting up the stack and registers, moving the instruction pointer, moving data from disk to RAM, RAM to cache, cache to register, etc.

Today, when we think about creating really large scale applications, we need to know our deployment environment fairly well. How many web servers are there? How should I partition my data - which node do I send the writes to, and which nodes can I read from? Servicing this entire thing is difficult as well - how do I upgrade my code across the farm? How do I apply patches? And then there is the problem of recovery - a node goes down, so how do I route traffic now? What happened to the in-flight data on that node? What about the database node that just went down? Recovery gets very hard in a truly large scale app that is deployed across several physical boxes.

It is not that it cannot be done - every single successful dotcom / web2.0 site has gone down this route. But it is a hard route to follow: it takes a ton of engineering talent and a lot of upfront money, and you still need to be lucky to get it right.

Enter the Cloud OS - an abstraction for a scaled-out infrastructure that you can start using without bothering about the underlying details. The idea is this - you design your app to scale (yes, you still have to do the design - no abstraction can help scale an app that has sticky session state or an unpartitionable data design), and stop bothering about everything else underneath it.
You state your intent by defining your deployment architecture declaratively: you tell the OS the number of front-end servers you need, the number of back-end processing machines you need, the response time you expect on a certain operation, etc., and the system figures out the rest - it is an OS for the Cloud!
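To make the idea of "stating your intent" a little more concrete, here is a minimal sketch in Python. It is not the Azure metadata format - the class names, field names and the numbers are all made up for illustration. It only shows the shape of a declarative description: you say how many front-end and back-end instances you want and what response time you expect, and leave the "how" to the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleIntent:
    """Hypothetical description of one role (not an Azure schema)."""
    name: str
    instances: int


@dataclass(frozen=True)
class DeploymentIntent:
    """Hypothetical declarative description of a whole deployment."""
    front_end: RoleIntent
    back_end: RoleIntent
    target_response_ms: int  # assumed unit: milliseconds

    def validate(self) -> None:
        # Reject descriptions that no platform could satisfy.
        for role in (self.front_end, self.back_end):
            if role.instances < 1:
                raise ValueError(f"{role.name}: need at least one instance")
        if self.target_response_ms <= 0:
            raise ValueError("target response time must be positive")

    def total_instances(self) -> int:
        return self.front_end.instances + self.back_end.instances


# State what you want; nothing here says which machines to use.
intent = DeploymentIntent(
    front_end=RoleIntent("web", instances=3),
    back_end=RoleIntent("worker", instances=2),
    target_response_ms=200,
)
intent.validate()
print(intent.total_instances())  # prints 5
```

Note that the description contains no machine names, IP addresses or rack locations - that is the whole point of the abstraction.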
The Programming Model

At a high level, the programming model is really straightforward. You write your code just the way you typically do (hopefully designed for scale-out) and alongside, you provide metadata about what your app is and how it should be deployed. Then you submit your code and the metadata to the Cloud OS.

In the OS, a component called the Fabric Controller examines your metadata, finds available resources that match the physical needs you describe, and pushes your code to these resources - typically virtual machines. On each of these resources another component called an Agent is running, which is in constant communication with the Controller. When the Controller pushes your code to a box, the Agent configures it on that box as you defined, and keeps a check on its activities to make sure that the code is able to deliver the characteristics you desire in terms of response time, footprint, etc. Similarly, if your metadata describes other resources to be used - routers, switches, load balancers, etc. - the Controller also gets these resources provisioned and configured according to your definition.

Once your app is up and running, the Agent on a box maintains a heartbeat with your code. If it sees anything moving out of the expected range, it may take action such as moving your app to a different box or provisioning another box. Of course, the box itself may also die, taking the Agent with it. This is when the Controller comes into action. The Controller maintains a heartbeat with the various Agents, and if an Agent goes down, it can try to restart the machine, etc.
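The Controller / Agent relationship described above can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the `Node` and `Controller` classes are hypothetical, a missed heartbeat is modelled as a simple `alive` flag, and the recovery action is "move the instances to the least loaded healthy node", which is just one of the possible actions mentioned above. It says nothing about how the real Fabric Controller is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A box in the fabric; `alive` stands in for the Agent's heartbeat."""
    name: str
    alive: bool = True
    instances: list = field(default_factory=list)


class Controller:
    """Toy stand-in for the Fabric Controller (illustration only)."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def _least_loaded(self):
        healthy = [n for n in self.nodes.values() if n.alive]
        if not healthy:
            raise RuntimeError("no healthy nodes left")
        # Tie-break on the name so placement is deterministic.
        return min(healthy, key=lambda n: (len(n.instances), n.name))

    def deploy(self, role, count):
        """Place `count` instances of a role, spreading them across nodes."""
        for i in range(count):
            self._least_loaded().instances.append(f"{role}-{i}")

    def check_heartbeats(self):
        """Move instances off every node whose Agent stopped responding."""
        moved = []
        for node in self.nodes.values():
            if not node.alive and node.instances:
                orphans, node.instances = node.instances, []
                for inst in orphans:
                    self._least_loaded().instances.append(inst)
                    moved.append(inst)
        return moved


controller = Controller([Node("n1"), Node("n2"), Node("n3")])
controller.deploy("web", 3)            # one instance lands on each node
controller.nodes["n2"].alive = False   # the box dies, taking its Agent with it
moved = controller.check_heartbeats()  # the Controller notices and reacts
print(moved)
```

The application code never appears in this loop: placement and recovery are driven entirely by the metadata (here, the role name and instance count) and by the health of the nodes.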
Service Architecture

The architecture of a Service targeting the Azure CTP release is given at http://msdn.microsoft.com/library/dd179341.aspx. A couple of key points:

1) The service can consist of up to one app in a web role and one app in a worker role. However, there can be any number of instances of these apps.

2) The metadata is split into two parts: the Service Definition and the Service Configuration (the two files created in the steps further below).
3) Storage: There are two options for storing data.

The first option is to use SQL Server Services - this is the database-in-the-sky option. More on this at http://msdn.microsoft.com/library/dd200927.aspx.

The second option is to use Windows Azure native storage, which is a simpler storage option. There are three services here:
Creating an Azure hosted Service

The Azure SDK provides an emulator called the Development Fabric that simplifies development tasks a lot. Here are the key steps:

1) Write your code.
2) For storing data, a Development Storage is provided. You need to initialize this storage through a tool called DevelopmentStorage.
3) Create the Service Definition and Service Configuration files.
4) Change the storage URIs in the configuration file as per http://msdn.microsoft.com/library/dd179425.aspx.
5) Package the code along with the Service Definition file using a tool called CSPack.
6) Publish the package using a tool called CSRun.
Of course, before you can carry out step 6) you need to get your Azure account. To do that, visit https://connect.microsoft.com/site/sitehome.aspx?SiteID=681.
But Remember ...

... that though the OS makes it simpler to run a scalable application by abstracting away all the infrastructure underneath and by providing multiple services, the key is still to design the app to be scalable. I will write more on writing scalable applications and also on using Azure in subsequent posts.