個人檔案Vineet Gupta部落格清單訪客留言更多 工具 說明

Vineet Gupta

Make it Simple, as Simple as Possible, but no Simpler

Gupta Vineet

View Vineet Gupta's profile on LinkedIn
Thanks for visiting!
請稍候...
很抱歉,您輸入的回應過長。請縮短您的回應。
您尚未輸入內容,請再試一次。
很抱歉,目前無法新增您的回應,請稍後再試。
若要新增回應,您的父母必須先給您權限。要求權限
您的家長已關閉回應功能。
很抱歉,目前無法刪除您的回應,請稍後再試。
您已超過每日回應上限次數,請於 24 小時後再試一次。
由於系統顯示您可能傳送垃圾郵件給其他使用者,因此您帳號中的回應功能已遭停用。 如果您認為自己帳號遭錯誤停用,請連絡 Windows Live 支援
請完成下列安全檢查,以完成回應。
您輸入的安全檢查字元必須與圖片或音訊中的字元相符。
RaparlaAni​l撰寫:
Hi Vineet,
 
I am working in Bangalore. I saw u r sharepoint web cast. It is very useful for my carrier. I have small dought in sharepoint server. How can we create host name? or we can create site port 80?. Can i get download u r Sharepoint webcasts.
5 月 13 日
P BPhani Kumar撰寫:
Hi Vineeth

This is Phani Kumar here from Hyd working as a TL, I came to attend the Tech Mela and Attended u r session it was very much useful to my carrier. Thanks a Lot for that, I send a friend request if u dont mind plz accept me .

Regards
Phani Kumar PB


9 月 24 日
JayarajNS撰寫:
Hi Vinneth
This is Jayaraj here from Chennai working as a Developer, I came to attend the Tech Mela and Attended u r session it was very much useful to my carrier. Thanks a Lot for that, I send a friend request if u dont mind plz accept me .

Regards
Jayaraj
9 月 19 日
11 June

Notes on XMPP Part I - Fundamentals

We recently augmented the team working on our desktop product. At the core of the product is XMPP – the protocol that drives several instant messaging servers and clients, sites like Chesspark and now Google Wave. Since XMPP is not known by many people, let alone be understood well enough, every time we on-board someone new, they have to go thru a steep learning curve. This post is an attempt to make it easier to understand the protocol .

 

What is XMPP

XMPP generally refers to a collection of specifications that define protocols for real time interactions over the public internet.  The core set of specifications has been standardized by IETF. While the first application (and the origin) of the protocols was to address instant messaging, the extensible nature of XMPP has led it to be used for a wide range of applications which need real time communication.

 

Architecture

XMPP has a decentralized client-server architecture (like the WWW) – there are several hundreds of XMPP deployments, each running anywhere from one to hundreds of servers to which millions of clients connect. Key aspects:

 

Addressing:

Each interacting entity on XMPP needs to have a unique address. This address is called a Jabber Id (JID). A JID consists of three parts:

user-identifier: Unicode string representing the interacting entity. For example: user1

domain-identifier: Unicode string representing the domain of the interacting entity. For example: directi.com

resource-identifier: Unicode string representing a resource used by the entity. For example: pwdesktop

A full JID looks as follows: <user-identifier>@<domain-identifier>/<resource-identifier>. For example: user1@directi.com/pwdesktop. An identifier without the resource is  called a bare JID: user1@directi.com. Another way of addressing is to use XMPP URIs: xmpp:user1@directi.com

 

Communication:

The original purpose of XMPP was to do instant messaging. This requires the server to be able to push messages out to the client on a real time basis. Over HTTP, this requires the client to poll the server, or a technique like Comet where the server caches HTTP requests and then sends responses on those requests. XMPP however uses a long lived TCP connection. This gives the server an always on channel to push info to the client. The client also need not wait for the server to respond to its messages, but can instead send an indefinite amount of messages to the server without blocking, which the server can then respond to, enabling an asynchronous kind of communication. This is further helped by the fact that XMPP does not require every packet being sent over the wire to be acknowledged. An entity assumes a packet to be delivered unless it receives an error.

 

Protocol:

XMPP uses streaming XML over a long lived TCP connection (though HTTP is also possible thru BOSH – see later) for communication. There is one stream from client to server and another stream from server to client. To start communicating, a client would send an opening XML tag:

<?xml version="1.0"?>

This is followed by a stream element – this marks the beginning or root of the document:

<stream:stream>

This in turn is followed by various messages sent across as XML elements. These elements are called stanzas. These stanzas can continue to get exchanged between the server and the client endlessly till a closing stream tag is sent which marks the end of the communication. There are three kind of stanzas:

1) Message: The <message/> stanza denotes the basic push method for sending stuff from one entity to the other. These need not be acknowledged and provide a quick fire and forget mechanism to send info from one point to the other. A Message has a “to” and a “from” attribute denoting the receiver and the sender, and can include one or more payloads. In IM conversations, the payload is often HTML markup for richly formatted text (defined by XHTML-IM)

2) Presence: The <presence/> stanza denotes a broadcast sent out by an entity to advertise its availability to other entities who have subscribed to receive these updates from the advertising entity. Presence messages can also include a payload. Common uses are to include an availability state like “away,” “busy,” etc. and personal status messages like “Working on blog post 1 of 6.”

3) Information Query: The <iq/> stanza provides a Request-Response mechanism like HTTP verbs GET, PUT and POST. The payload defines the request of the sender which needs to be processed by the receiver. This is the only stanza type where the sender expects a reply – a result or an error. This makes IQ more reliable than a Message, allowing the two entities to carry out a structured interaction. Common examples of IQ usage in IM applications are to fetch the the Roster, and add / remove entries in the Roster.

A closing stream tag indicates end of conversation.

 

Decentralized client-server architecture

XMPP servers talk to each other directly. So if user1@domain1.com needs to interact with user2@domain2.com, the servers at domain1.com would interact with domain2.com servers directly. This is different from email where communication between servers on different domains happens thru hops (which can lead to address spoofing and other issues) or HTTP where servers do not interact with each other at all.

 

 

Creating a XMPP Session

user1@domain1.com wants to talk to user2@domain2.com. To accomplish this:

 

1) Client creates a TCP session with the server:

The client used by user1 needs to find out the box that hosts the XMPP service for domain2. This is accomplished by doing a DNS service lookup by checking the DNS SRV record which maps the service to the machine name and the port of the service (5222 by default) becomes known, a AAAA lookup gives the IP of the machine. With the IP and the port we can now open a TCP connection.

 

2) Client and Server start streams in opposite directions:

a) The client sends across the opening XML text declaration tag (optional) followed by an initial stream header:

<?xml version="1.0"?>
<stream:stream to="domain2.com"
    version="1.0"
    xmlns="jabber:client"
    xmlns:stream="http://etherx.jabber.org/streams">

b) The server sends back a response stream header with a unique stream id:

<?xml version="1.0"?>
<stream:stream from="domain2.com"
    id="0123456789" version="1.0"
    xmlns="jabber:client"
    xmlns:stream="http://etherx.jabber.org/streams">

 

3) Client and Server negotiate Stream Features:

Right after sending the response stream header, the server send across a <stream:features> message on the features it supports. These features are typically about:

a) Whether server supports TLS or not (recommended)

b) Authentication mechanism supported (see my earlier blog post on authentication mechanisms) – typically SASL plaintext and digest-md5 are supported. Ideally one should not use plain text without TLS since in that case the password is sent in clear on the wire.

c) Stream compression (optional)

At this point, the client uses <iq/> stanzas to negotiate which features it wants to use, and if TLS is used, enter into a TLS negotiation, or if SASL is used, authenticate via the appropriate SASL mechanism.

 

4) Post Authentication Stream Negotiation

After authentication, the server resets the session by sending a new stream header with a new stream id. This is done for security purposes. This new stream does not publish any authentication features (since that is already done), but now publishes new features. These typically include:

a) Compression support

b) Resource binding

c) Formally starting an XMPP session

At this point, the client again uses <iq/> stanzas to negotiate which features it wants to use, and once that part is over, the actual task of application specific stanza exchange can start between the client and the server.

 

BOSH

I earlier mentioned that XMPP can work over HTTP as well. This seems counter-intuitive: XMPP requires push, and HTPP is pull based (client sends a request and server responds). However, it turns out that one can do push over HTTP as well – the technique for using XMPP over HTTP is called Bidirectional-streams Over Synchronous HTTP (BOSH):

 

1) There is a server in front of the XMPP server which handles HTTP clients. This is called a BOSH connection manager (CM).

 

2) Client sends DNS query for TXT records, and discovers that there is an entry for BOSH connection which points to the BOSH server mentioned above.

 

3) Session Creation Request: Client now sends a HTTP POST with an empty <body/> tag with some attributes set. The important ones are:

a) hold – the number of HTTP requests the BOSH server can queue. This is typically set to 1

b) wait – the timeout in seconds before which the server must respond to a pending request

c) rid – a large random number that acts as the initial request id

 

4) Session Creation Response: BOSH server opens a regular XMPP stream with the XMPP server over a TCP connection, receives the server’s XMPP response, wraps it up in a <body/> tag and returns this to the client over HTTP. The body tag contains the following attributes:

a) hold – same as earlier

b) requests – max number of HTTP requests that the client can open with the BOSH server at any time. This is typically set to hold + 1. Since hold is typically 1, requests is typically 2.

c) sid – a large random number that acts as the session id. This is diff from the stream id sent by the XMPP server. The client must now include the sid in every subsequent request.

 

5) Hereafter the client and the server negotiate stream features and authenticate pretty much in the same way as with a TCP connection, with the BOSH server sitting in between and wrapping / unwrapping the <stream> and <iq/> stanzas in <body/> tags. The XMPP application is now ready to exchange its specific stanzas.

 

6) The question at this stage is, how does the server do a push. Recall that hold=1 is the max number of HTTP requests the BOSH server can queue, and requests=2 is the max number of HTTP requests that the client can open with the BOSH server at any time. Assume that the last request from the client was sent to the BOSH server just around 60 seconds back (let’s say a <presence/> packet) . The server had nothing to respond because the XMPP server had no stanzas. Now since the 60 seconds timeout is about to be over, this is what happens:

a) BOSH server returns a HTTP 200 ok response to the last request with an empty <body/> tag. If the client too has nothing to say, it also sends across a HTTP 200 ok with an empty <body/> tag, to which the server can again respond at the 60 second timeout. This can go on ad-infiniteum as a keep-alive mechanism.

b) Now assume that the client has something to say. One request is already in the play and max two are allowed. So the client can now send the new stanza in a new HTTP request. The BOSH server immediately responds to the earlier request (which was kept on hold) with a HTTP 200 ok with empty <body/> tag. It now again has one request outstanding which it can use to either send back the keep-alive, or send back a response from the XMPP server.

c) Assume that the client and the server have been playing the keep alive game. Now the XMPP server sends a stanza (say an authorization request). The BOSH server needs to push this to the client. This is easy since it has a cached HTTP request. Hence push is accomplished over HTTP, without doing constant polling.

 

The advantage of using BOSH is that it can work even in flaky networks where a TCP connection would break, forcing the client to once again establish an XMPP session. Also, this makes it possible to use XMPP in web clients where one cannot open a TCP connection, for example, Facebook’s chat feature uses XMPP over BOSH.

 

Jingle

XMPP uses a client-server model for all communication and is optimized for small snippets of info. So if the amount of data to be exchanged is very large, for example in applications like file transfer, audio-video calls and screen-sharing, an XEP called Jingle. Jingle is large and complex enough to deserve its own series of posts. I will summarize basic facts here:

1) The basic idea behind Jingle (and other multimedia protocols like SIP) is to use two channels:

a) Signaling Channel to set up, manage and tear down application defined sessions

b) Media Channel to transfer the payload either peer to peer or relayed thru a mediator over a application defined transport

 

2) In a Jingle negotiation (<jingle/> element inside a <iq/> element),  the initiator makes an offer to start a session by declaring one or more app type (say voice video, etc.) and a transport method (ICE, UDP, etc.). The responder and the initiator then negotiate a set of parameters (for example codecs to be used), and if the negotiation works data is exchanged. Some parameters can be modified even while the data is being exchanged.

 

3) Jingle supports two transport types:

a) Datagram transports like UDP – can tolerate packet loss – meant for apps like media streaming

b) Streaming transports like TCP – no packet loss tolerated – for example file transfer

 

4) The real power of Jingle comes from using Jingle over ICE. ICE provides a mechanism for two entities to communicate and negotiate all possible ways of connecting between each other – direct or mediated. ICE in turn can use a STUN server to find out the IP address and port of an endpoint from outside the firewall, and a TURN server to relay data in case a direct peer to peer connection is not available.

24 May

Lessons from Twitter @Replies tweaks

When the Twitter reply feature tweak story started breaking, my first reaction was – this could be us in future. Watching Twitter struggle with the change and the backlash it generated was a big public lesson in product design, tech implementation and communications, and I thought I should document it here, lest I ever forget it.

For the uninitiated, Twitter recently removed a setting where you could see responses by people who you follow to other people who you don’t follow. This was primarily done for tech reasons, but the blog announcement did not explain that very well, leading people to think of it as an arbitrary move, resulting in some strong feedback. Twitter however responded quickly with the real story: it was done for technical reasons. Here’s what I learnt:

 

1) Product Design:

According to Alex Payne – Twitter’s API lead, only 3% of the users had ever turned on that feature, however, these 3% users are apparently power users, which in Twitter’s world also means that they are also strongly vocal with a large influence base, and they were using the feature to discover new friends. Hence the strong backlash. The lessons are:

a) If you change a feature, figure out the audience profile it impacts most, and the consequence of that impact. Numbers-wise 3% is low, but what if they are some of your most important and vocal users?

b) When you do change a feature, figure out if the impact can be lessened by coming up with an alternative. It seems that the Twitter team is now working on this – they could have started the work before going ahead with the decision to remove the setting.

 

2) Tech Implementation:

The feature was removed for tech scalability reasons. Let us see what the situation is:

  • User U1 has n1 followers represented by the set S1. One of the followers in S1 is U2 who has in turn n2 followers represented by the set S2.
  • User U1 makes a twitter post – all n1 followers in S1 see it
  • User U2 makes a reply to this post. Now twitter needs to deliver it to people from both S1 and S2:
    • First deliver it to the intersection of S1 and S2 since that is the set which is following both U1 and U2.
    • Now find out (S1 + S2) - (S1 x S2) – this is the remaining set which is not following both U1 and U2
    • In this set find out the 3% users who have opted for receiving replies even when they are not following the user.

How hard is it to do this? Well you are looking at at least two select queries, where the second one involves a complex join. This is where the stress on the system comes from. Can this not be addressed by caching? Here’s what would need to be cached: the follower list for each user, and the 3% of the total users who have got this feature turned on. It would be ineffective if the follower numbers n1, n2, n3, … for users U1, U2, U3, … were very large and not easily cacheable. For a typical social network like Facebook it would not be very difficult, but Twitter is different: it is asymmetric – you can follow someone without requiring that person to follow you. This results in massive values of n1, n2, n3, … for popular users. How big? According to http://wefollow.com/top, each of the top 10 twitter users has more than a million followers, with Oprah Winfrey at 1.02M and Ashton Kutcher at 1.85M. Clearly not an easy number to cache. What is scarier is that Twitter is growing at a very fast pace – faster than anyone else according to http://blog.nielsen.com/nielsenwire/online_mobile/twitters-tweet-smell-of-success/, and the number of celebrities and businesses using Twitter is growing at a fast pace. According to http://twitterholic.com/, 5 of the top 10 users have joined in the last one year.

Lesson: when designing features, design with scalability in mind. If it is not scalable, avoid putting it in – you may need to kill it.

 

3) Communications:

This has been covered ad nauseam all over the web, so I won’t repeat it here. The moral of the story is that if you do have to kill a feature, the key thing is to be honest about the reasons and be clear in the message leaving no ambiguities. First Facebook learnt the lesson and now Twitter has. Hopefully, we would be able to avoid these mistakes.

1 May

Activity Streams 101

One of the cornerstones of the .PW platform is the Wall - the real time aggregate of activities being done by an entity and its network. So while I am personally rather inactive on the various social networks (way too distracting), the recent announcement by FB on opening up their feed via activity streams led to some analysis and thought – here’s a summary.

 

The Premise

  • A user carries out several activities across multiple systems – posting items, joining forums, connecting to people, etc.
  • Each of these systems have their own way of capturing, storing and publishing this information.
  • Each system wants to puts various degrees of control on the usage of this information. Twitter makes it freely available, FB puts a wall-garden around it, others fall in between.
  • The user wants to be able to control who can see what information, and exercise his copyright on that information in terms of how it is consumed, used, persisted and further shared.

 

Challenges

  1. How does one capture the semantic richness of the various types of activities in a simple, precise way.
  2. How does one publish the stream in near real time without going into expensive polling. Why is this imp? Because on Jul 21, 2008 FriendFeed crawled Flickr ~2.7M times for a grand total of 6700 updates. HTTP was not made for Push, and Pull is a resource hog.
  3. How does a user continue to exercise his copyright on his content even as his feed becomes available in a machine readable way and published.

 

Solutions

  • Challenge #1 is being addressed thru the emerging activity streams standard. More on this in a minute.
  • For #2, activity streams uses Atom. So this can theoretically be layered over XMPP. (Any implementations, anyone?)
  • For #3, there are no straightforward solutions. Twitter makes everything public – as a user you can opt out. FB comes from the other side of the fence – the user opts in for sharing. Copyright enforcement in both cases is contractually enforced.

 

Activity Streams Standard

Take a look at the formats of the following public feeds:

1) http://api.flickr.com/services/feeds/photos_public.gne

<entry>
<
title>death valley 07a</title>
<
link rel="alternate" type="text/html" href="http://www.flickr.com/photos/willburn25/3488468568/"/>
<
id>tag:flickr.com,2005:/photo/3488468568</id>
<
published>2009-04-30T08:40:47Z</published>
<
updated>2009-04-30T08:40:47Z</updated>
<
dc:date.Taken>2009-04-30T02:40:47-08:00</dc:date.Taken>
<
content type="html">
&lt;p&gt;&lt;a href=&quot;http://www.flickr.com/people/willburn25/&quot;&gt;Shackleton, Jules&lt;/a&gt; posted a photo:&lt;/p&gt;
&lt;
p&gt;&lt;a href=&quot;http://www.flickr.com/photos/willburn25/3488468568/&quot; title=&quot;death valley 07a&quot;&gt;&lt;img src=&quot;http://farm4.static.flickr.com/3391/3488468568_b3eea508d4_m.jpg&quot; width=&quot;240&quot; height=&quot;92&quot; alt=&quot;death valley 07a&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
</content>
<
author>
<
name>Shackleton, Jules</name>
<
uri>http://www.flickr.com/people/willburn25/</uri>
</
author>
<
link rel="enclosure" type="image/jpeg" href="http://farm4.static.flickr.com/3391/3488468568_b3eea508d4_m.jpg" />

</
entry>

 

2) http://picasaweb.google.com/data/feed/api/all 

<entry>
  <
id>http://picasaweb.google.com/data/entry/api/user/isabellechedin/albumid/5284084403719516961/photoid/5330400059699013378</id>
  <
published>2009-04-30T08:34:36.000Z</published>
  <
updated>2009-04-30T08:34:36.000Z</updated>
  <
categoryscheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/photos/2007#photo'/>
  <
titletype='text'>DSC03419.JPG</title>
  <
summarytype='text'/>
  <
contenttype='image/jpeg' src='http://lh6.ggpht.com/_2pkj6pXKPwY/SflinNUUpwI/AAAAAAAABxs/gXHsFzjrylU/DSC03419.JPG'/>
  <
linkrel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://picasaweb.google.com/data/feed/api/user/isabellechedin/albumid/5284084403719516961/photoid/5330400059699013378'/>
  <
linkrel='alternate' type='text/html' href='http://picasaweb.google.com/isabellechedin/ExpatriationHongKong#5330400059699013378'/>
  <
linkrel='http://schemas.google.com/photos/2007#canonical' type='text/html' href='http://picasaweb.google.com/lh/photo/FV74oIogEEchKANsA9dfLg'/>
  <
linkrel='self' type='application/atom+xml' href='http://picasaweb.google.com/data/entry/api/user/isabellechedin/albumid/5284084403719516961/photoid/5330400059699013378'/>
  <
linkrel='http://schemas.google.com/photos/2007#report' type='text/html' href='http://picasaweb.google.com/lh/reportAbuse?uname=isabellechedin&amp;aid=5284084403719516961&amp;iid=5330400059699013378'/>
  <
author>
    <
name>isa</name>
    <
uri>http://picasaweb.google.com/isabellechedin</uri>
    <
email>isabellechedin</email>
    <
gphoto:user>isabellechedin</gphoto:user>
    <
gphoto:nickname>isa</gphoto:nickname>
    <
gphoto:thumbnail>http://lh5.ggpht.com/_2pkj6pXKPwY/AAAArGA7Css/AAAAAAAAAAA/lspX8vyux5o/s48-c/isabellechedin.jpg</gphoto:thumbnail>
  </
author>
  <
gphoto:id>5330400059699013378</gphoto:id>
  <
gphoto:albumid>5284084403719516961</gphoto:albumid>
  <
gphoto:access>public</gphoto:access>
  <
gphoto:width>1600</gphoto:width>
  <
gphoto:height>1200</gphoto:height>
  <
gphoto:timestamp>1240920712000</gphoto:timestamp>
  <
exif:tags>
    <
exif:fstop>3.5</exif:fstop>
    <
exif:make>SONY</exif:make>
    <
exif:model>DSC-T100</exif:model>
    <
exif:exposure>0.04</exif:exposure>
    <
exif:flash>false</exif:flash>
    <
exif:focallength>5.8</exif:focallength>
    <
exif:iso>400</exif:iso>
    <
exif:time>1240920712000</exif:time>
  </
exif:tags>
  <
media:group>
    <
media:contenturl='http://lh6.ggpht.com/_2pkj6pXKPwY/SflinNUUpwI/AAAAAAAABxs/gXHsFzjrylU/DSC03419.JPG' height='1200' width='1600' type='image/jpeg' medium='image'/>
    <
media:credit>isa</media:credit>
    <
media:descriptiontype='plain'/>
    <
media:thumbnailurl='http://lh6.ggpht.com/_2pkj6pXKPwY/SflinNUUpwI/AAAAAAAABxs/gXHsFzjrylU/s72/DSC03419.JPG' height='54' width='72'/>
    <
media:thumbnailurl='http://lh6.ggpht.com/_2pkj6pXKPwY/SflinNUUpwI/AAAAAAAABxs/gXHsFzjrylU/s144/DSC03419.JPG' height='108' width='144'/>
    <
media:thumbnailurl='http://lh6.ggpht.com/_2pkj6pXKPwY/SflinNUUpwI/AAAAAAAABxs/gXHsFzjrylU/s288/DSC03419.JPG' height='216' width='288'/>
    <
media:titletype='plain'>DSC03419.JPG</media:title>
  </
media:group>
  <
gphoto:albumtitle>expatriation à Hong-Kong</gphoto:albumtitle>
  <
gphoto:albumctitle>ExpatriationHongKong</gphoto:albumctitle>
  <
gphoto:albumdesc/>
  <
gphoto:location/>
  <
gphoto:snippet/>
  <
gphoto:snippettype>PHOTO_DESCRIPTION</gphoto:snippettype>
  <
gphoto:truncated>0</gphoto:truncated>
</
entry>

While both of them are about public photos, the formats are vastly different since they come from two different providers. So a potential system that wants to aggregate photo updates across multiple providers needs to understand each vendor’s format separately. Efforts like Yahoo’s Media RSS extensions seek to address this.  (The picasa feed above uses the Yahoo extensions but the Flickr feed does not, which is strange considering it is a Yahoo property). Another example is Apple’s iTunes RSS extensions. However, these are RSS extensions, and what we really need in this space is an Atom extension. (why? see http://blog.unto.net/work/on-rss-and-atom/). An example is http://martin.atkins.me.uk/specs/atommedia.

However, this is only about photos which is just one of the activities. When we start considering the possible list of activities, the problem scope becomes extremely large. This is where the Activity Streams effort comes in. Consider the following activity: “Vineet posted new photographs to Agastya on Facebook 6 hours ago.” This can be broken down as:

  • Vineet = actor
  • posted = verb
  • new photographs = item / article
  • Agastya = social object (album here)
  • Facebook = site
  • 6 hours ago = timestamp

Other possible fields here can be:

  • User Agent / tool (example Thwirl)
  • Location
  • Mood
  • Title
  • Summary
  • Detail
  • Link
  • Verb collection
  • Actor collection
  • Object collection
  • Comments (points to another object)
  • The producer of the content may define the default viewing mechanism which can be used by a consumer

To capture the above, the following draft specs are currently in place.

  1. Atom Activity Extensions defines Actor, Verbs, Objects, Title, Summary, Detail, Link, etc.
  2. Atom Activity Base Schema defines the semantics of various Verbs and Objects, Location and Mood
  3. Atom Media Extensions defines how typical media – videos, photographs, etc. should be described. So in the above example, if we wanted to enclose the actual photographs of Agastya, we would need to describe the actual link, the preview thumbnail, image type, height, width, etc.

Note that this is work in progress and if you read the drafts, the gaps are fairly obvious. The big ones (I think) are:

  • As activities get republished, a downstream consumer may end up dealing with duplicates. This can be addressed by a combination of a URI + per item identifier
  • Some activities may be in response to other activities (classic case being video responses on YouTube) – the relationship would need to be captured
  • Instead of a single actor-verb-object, we may have collections of actors, verbs and objects. Examples being multiple actors on a single object (wiki editing) or single actor, multiple objects (uploading multiple pictures) or combinations.

 

Activity Streams Implementation

1. MySpace: http://wiki.developer.myspace.com/index.php?title=Standards_for_Activity_Streams is compliant to the current draft specs. Also, their documentation is simple to read, understand and use.

2. Facebook: http://wiki.developers.facebook.com/index.php/Using_Activity_Streams. For one, the stream only consists of user generated content and not app generated content. Second, they have a model where you need to prompt the user for permission to access the stream. This has two repercussions:

  • This requires the user to be logged on to Facebook
  • You can only show the user his own data. So if I import the FB feed on Friendfeed, the subscribers to my Friendfeed feed would not get to see the FB feed. This could have been possible had apps like FF could have persisted FB data, but FB TOS prohibit caching data for more than 24 hours.

Note that the Activity Stream standard is just one way of accessing the FB stream. They also provide a XML/HTTP and a FQL API. See Using the Open Stream API for details. The restrictions however stay in place.


So in summary, while the Activity Stream standard effort is quite exciting, the current FB implementation is not. All one can use it for is building clients which show the user her own data. The real value would be in letting apps re-publish the data without violating the privacy needs of the user. As of now, the wall garden is very much in place.

 

Further Reading


21 April

Authentication Mechanisms

After publishing my previous post, I had thought that I would not be coming back to Crypto for a while. However, today evening Sebastiaan posted on SCRAM on one of the Directi mailing lists, and I got compelled to write down this one.

Authentication in Cryptography has two aspects: data authentication and entity authentication. Data authentication is addressed using HMACs and Digital Signatures (discussed in my previous post). What we are talking about here is entity authentication: how does one entity get to know that the other entity is actually what is claims to be. This in turn has two aspects:

  1. Alice and Bob are communicating in “real time” – the typical way of authentication here is by Alice issuing a challenge to which only Bob can respond.
  2. Alice and Bob are sending messages to each other which are delivered after an appreciable delay since a message may be stored and forwarded thru multiple devices. Here since Alice can’t wait for Bob to respond to the challenge, the interest is only in validity of origin of data.

Here we are talking about the first kind of authentication. The goals of an entity authentication / identification protocol are as follows:

  1. Bob should be able to successfully authenticate himself to Alice.
  2. A third party should not be able to impersonate Bob to authenticate itself as Bob to Alice
  3. Alice should not be able to use the data used in authenticating Bob to impersonate Bob to a third party.

 

Passwords

A password is a shared secret between two entities. By sending across the shared secret, one entity can prove to the other that it is indeed the possessor of the secret and hence the valid entity. However, sending the shared secret across as a mechanism for authentication is fairly weak. Here are some inherent weaknesses in the scheme:

  1. Passwords are easy to guess (or brute force) since they are unlikely to be very long or use a random sequence or be chosen from a large alphabet set.
  2. Passwords do not change frequently so brute forcing them becomes easier since there is a huge time window available.
  3. A password is a shared secret between the two entities. Any entity that is able to intercept the password being sent by Bob over the wire can now use the password to authenticate itself to Alice.
  4. In systems that store password in clear text, this is even worse, since privileged insiders or administrators now have access to the shared secret.

 

Some of these issues can be addressed to some extent thru the following:

  1. Enforce a password policy that requires a long length, and inclusion of numbers or special symbols. This makes a brute force / dictionary / exhaustive search attack harder, but not impossible.
  2. Make the user change the password frequently – while good in principle, in practice this does not help in a huge way since most password changing policies provide a lot of time (30-45 days) for a dedicated hacker.
  3. Encrypt the password, preferably using a one-way function so that even if the algorithm, key and the Ciphertext are all available to the privileged user, he cannot get to know the shared secret.
  4. Salting the one way function by using a randomly generated number as a key also makes a dictionary attack harder.

However, the biggest drawback of the fixed password approach is that it is open to a replay attack. So it cannot be used over an open network, but must require a trusted connection.

 

Challenge Response Identification

The idea of a Challenge  Response scheme is simple: Both Alice and Bob have a shared secret. To authenticate Bob, Alice makes him a challenge. To respond to the challenge, the shared secret is required which only Bob has. Thus by sending the response, Bob proves that he is the possessor of the shared secret without ever sending across the shared secret. There are multiple ways of doing challenge response:

 

Symmetric key Challenge Response using Encryption

Here the assumption is that Alice and Bob exchange the shared secret a priori. Let’s  suppose the shared key is K. Then:

  1. Alice generates a random number n1, and sends it to Bob.
  2. Bob applies the encryption algorithm to generate n2 = EA(n1, K) and sends it to Alice
  3. Alice apples the decryption algorithm to compute n3 = DA(n2, K)
  4. If n1 = n3, Bob indeed owns the shared secret and is who he claims to be.

The EA is typically a function like modular exponentiation which is easy to compute, but hard to reverse. Also, data integrity must be ensured in all transmission by using a HMAC.

 

Symmetric key Challenge Response using One-Way functions

In case the usage of an Encryption / Decryption routine is infeasible (because of computing cost maybe), it is possible to achieve the same using one-way functions as well:

  1. Alice generates a random number n1, and sends it to Bob.
  2. Bob applies the one-way function H to generate n2 = H(n1, K) and sends it to Alice
  3. Alice also applies the one-way function H to generate n3 = H(n1, K)
  4. If n1 = n3, Bob indeed owns the shared secret and is who he claims to be.

 

Asymmetric Challenge Response

  1. Alice generates a random number n1 and sends it to Bob
  2. Bob encrypts the random number n1 using his private key to generate n2 and sends it over to Alice
  3. Alice decrypts n2 using Bob’s public key and obtains n3
  4. If n1 = n3, Bob indeed owns the public key under his name

 

Zero Knowledge Protocols

Since in all the cases above, Bob responds with some response based on the secret key, there is a theoretical risk of information leakage. Zero knowledge protocols seek to address this by enabling a claimant to demonstrate knowledge of the secret without revealing any information whatsoever.  The basic idea is that the claimant must answer two questions: one involving demonstrating the knowledge of the secret, and the second is a simple question to prevent cheating. The series of steps is repeated several times (typically 40) to reduce the probability of cheating, and only if all answers are correct is the entity authenticated. The math involved in doing this is detailed but straight forward, and it is also easy to understand how the protocol works without revealing any information. You can read about it any standard cryptography textbook.

 

SASL Mechanisms

With that backgrounder, let’s come back to the issue of SCRAM-MD5 where we started. Basically SASL is a framework for entity and data authentication supporting multiple mechanisms including anonymous, plain-text, one-time-password, CRAM, Digest and now SCRAM. Let’s quickly examine these (please do not take the descriptions below as accurate – they are greatly simplified):

  1. Anonymous: As the name suggests, the purpose of the mechanism here is for an entity to be able to use the service without proving its identity. The claimant basically sends a single string indicating that it wants to access the service anonymously. The server if it has that capability (which typically is checked by the client before sending the anonymous auth request) allows the entity to use the service anonymously.
  2. Plain-text: The plain-text mechanism is expected to be used over a secure pipe provided by a lower layer which takes care of confidentiality and integrity. Basically, the client sends over the username and password in plain text over TLS. The problem here is that in the absence of a secure lower layer, the entity can be easily compromised. Also, there is no mutual authentication.
  3. OTP: One time passwords are a simple concept:
    • The user provides the password.
    • The client receives a non-secret seed from the server and appends the password with the seed
    • The result is passed thru a hash function (typically MD5, SHA1 and MD4 also allowed) and then reduced to 64-bits. This result is called S
    • This secret S is now run thru the hash function N-times, then N-1 times, N-2 times, and so on to produce multiple one time passwords.
    • Every time authentication needs to be done, the OTP along with the sequence number and the hashing function used are send over to the server
    • The adversary cannot invert the hash function and hence cannot recover the original password
    • The server applies the same hash function the same number of times and if it gets the same data, authenticates the user.
    • Good: makes replay attack difficult
    • Bad: Server stores password, no mutual auth. Open to dictionary attack. Hard to maintain sequence in the event of communication failure. Open to a pre-play attack.
  4. CRAM:
    • Server sends a challenge string
    • Client responds with username followed by a message digest created by hashing the challenge string using the password as a key
    • Server performs the same computation on its side, and if it obtains the same message digest, authenticates the user
    • Good: Not open to replay. No disclosure of password
    • Bad: No mutual auth. Open to a dictionary attack if a CRAM exchange is captured. Server needs to store passwords (hopefully encrypted).
  5. Digest:
    • Server stores one way hash of username and password H1 = Hash(username, password)
    • On request of a resource, server calculates hash of resource and the method required to access it H2 = Hash(resourceURI, method)
    • Server sends a nonce to client
    • Client calculates H1 = Hash(username, password), H2 = (resource URI, method)
    • Client generates Response = Hash(H1, nonce, H2) and sends it back to the server
    • Server does the same calculation at its end and if both are identical, the user is authenticated.
    • Good: Server does not store password, but a one-way hash
    • Bad: Usage of MD5 as a hashing function. Many security settings are optional. Open to a man in the middle attack.
  6. SCRAM: It is tedious to describe the protocol in brief since it involves multiple repeated computations. Best to read the draft. The key improvements over Digest and CRAM, etc. are:
    • Server stores salted one way hash of password
    • Mutual authentication
    • Channel bindings
    • Can be used as a GSS API mechanism as well. GSS API is like SASL – a generic security framework.
    • Server cannot impersonate the client to other servers
    • Permits the use of a server-authorized proxy without requiring the proxy to have super-user rights with the backend server

I am mentioning the last two points in good faith. Have not studied the protocol deeply enough to arrive at how these two conclusions are valid.

20 April

Crypto 101

Most developers whom I have come across, lack a solid grasp of the fundamentals of cryptography.  When a developer who does not understand crypto needs to use crypto, several things can go wrong:

•    Not understanding the implications of using some crypto technology in the code
•    Not realizing where to use crypto
•    Not implementing crypto correctly and hoping that the implementation is correct
•    Not implementing crypto correctly, but feeling secure because “we have used cryptography”
•    Not using crypto at all

The unfortunate part to this whole situation is that Cryptography is not hard to understand and most of the perceived complexity is in areas where an application developer would typically never venture: designing an algorithm or a protocol, cryptanalysis, etc.  In most cases, one can use one of the modern crypto libraries and go by with a minimal level of understanding of the principles involved. The aim of this post is to provide that minimum background required to start using crypto effectively in day to day work.

 

1. Introduction
While Cryptography can serve in multiple ways, its most common usage – which also illustrates most of its principles – is secure communication. To understand the notion of secure communication, let’s bring in our cast:

This story is about two nice, genuine people - Alice and Bob - who are in separate locations and want to have a conversation. The third character in our story is Eve who does not like Alice and Bob and is their adversary. She wants to intercept this conversation, find out what Alice and Bob are talking about, and if possible, disrupt the conversation by modifying the messages between them or by impersonating Alice or Bob.

Now Alice and Bob are aware of Eve and to prevent her from accomplishing her purposes, they set up a pipe which goes from Alice to Bob. Whatever message Alice drops in the pipe at her end would be received by Bob at the other end and vice versa. The pipe is opaque and indestructible so Eve cannot look at the messages nor modify them.

Such a pipe or channel then has the following properties:

•    Confidentiality: Contents of messages passing between sender and receiver cannot be seen by the adversary since the pipe is opaque.
•    Integrity: Contents of messages cannot be modified since the pipe cannot be penetrated
•    Authentication: The pipe, being indestructible, does not allow anyone else except Alice and Bob to send messages. So Eve cannot masquerade as Bob.
•    Non-Repudiation: Since only Alice and Bob can send messages on the pipe, when Bob receives a message on the pipe, Alice cannot deny that she had not sent that message.

The purpose of cryptography is to establish this kind of a secure channel over a public network like the Internet.

 

2. Confidentiality using a Shared Secret
Traditionally the problem of making communication confidential over an insecure transport has been solved by Private-Key Encryption [also called Symmetric Encryption]. The idea is simple: Before exchanging messages, Alice and Bob meet in person and decide on the following:

  • An Encryption Algorithm (EA)
  • A Decryption Algorithm (DA)
  • A Shared Secret – a key (K)

[Note: Alice and Bob do not care if Eve gets to know EA or DA, as long as K remains a secret. This is a very important point in modern-day cryptography: we assume that the Adversary has the knowledge of how the algorithm works. This is called Kerckhoffs’ Principle. Mechanisms that are based on keeping the working of an algorithm secret are obscure, not secure. Such mechanisms are of historical interest only and are inadequate for today’s world.

Some people compare this with government agencies (like NSA in the US) not publishing their mechanisms. While this is true, such non-disclosure does not imply that the algorithms used by agencies like NSA are designed on the basis of algorithm obscurity. Security agencies in governments (or large IT companies creating security technology) hire top-notch cryptographers who are well aware of the Kirchhoff's Principle and design with the assumption in mind. Obscurity then is either an extra defense (however weak that maybe), or a way of protecting intellectual property rights on the mechanism. Bottom-line: modern-cryptography assumes that the attacker has knowledge of the working of the algorithm and only the key is secret.]

Now to send messages securely, Alice takes the message (called Plaintext in this context) and generates an encrypted message out of it (called Ciphertext) by using the Encryption Algorithm EA and the Secret Key K. Thus:

Ciphertext = EA(Plaintext, K)

This Ciphertext is now sent to Bob over the public network. On receiving the Ciphertext, Bob uses the Decryption Algorithm (DA) and the Secret Key (K) to generate the Plaintext back. Thus:

Plaintext = DA(Ciphertext, K)

Eve does not have the Secret Key (K) and is therefore not able to obtain the Plaintext. We have achieved confidentiality by sharing a secret.

 

2.1 Substitution Ciphers
The simplest way of understanding how private-key encryption works in practice is to consider an elementary algorithm - The Substitution Cipher. A Substitution Cipher works as follows: There is a message M that needs to be sent. To encrypt the message, characters in blocks of n characters are taken, and substituted by a different set of characters picked up from the Alphabet, differing from the original set by a function of K.

The idea is not new. It’s most famous usage was by none other than Julius Caesar who replaced each character in the message by a character three characters ahead (shifting by 3) in the Latin Alphabet. This is called the Caesar-Cipher. Given that most of his enemies were not even literate, let alone thinking of breaking ciphers, it would have been fairly secure. Even as late as 1915, the Russian army was using the Caesar cipher since their troops could not learn to use more complicated ones. A trivial modern day implementation is called ROT13 (Rotation 13 – rotation by 13 positions). Since the English alphabet is 26 characters, ROT13 is its own inverse, i.e., the decryption and encryption algorithms are the same rule because of the choice of the key (13). ROT13 is used by people asking questions / puzzles, etc, on the Usenet. The standard Usenet client has ROT13 built-in.

What can Eve – the adversary – do about breaking a substitution cipher being used by Alice and Bob? Cryptanalysis - analysis of a Cryptographic Scheme with the goal of breaking it – of a substitution Cipher is rather straightforward, and by analyzing a few instances of Ciphertext produced by a substitution cipher, a cryptanalyst can easily break the cipher.

One of the simplest techniques that Eve can use for this purpose is Frequency Analysis. This is based on the fact that in any language some letters are used more than others. (In English the approximate sequence is ETAOIN SHRDLU for the top 12 characters). So by analyzing the frequency of the alphabets appearing in the Ciphertext, the Plaintext can be figured out. The earliest use of this technique was by the Arabs in the 9th century who realized that certain characters appear more frequently in the Koran than others. Several of the ciphers used in World War II were also broken using frequency analysis.

 

2.2 Transposition Ciphers
We know that substitution ciphers are not strong enough. So what are our other choices? There is another technique called the Transposition Cipher, where the characters are permutated, i.e., positions of characters are interchanged. Transposition Ciphers have also been known for a very long time and one can find a transposition cipher every day in a newspaper. We know it by the name “Anagram.” An anagram is a very simple form of a transposition cipher, and has more recreational value than cryptographic. Nonetheless, anagrams have been used by linguists and intellectuals throughout history for amusement or for hiding messages, and provided a powerful plot device in Dan Brown’s The Da Vinci Code.

Anagrams aside, the earliest example of a Transposition Cipher being used for a security purpose is a Scytale – Greek for a baton – a device used by the ancient Greeks to communicate messages during a military campaign. The working was really simple: a strip of paper was wound around a cylinder of a known diameter and a message written on the paper. The paper would then be peeled off and sent thru a messenger. The recipient would have a similar cylinder on which it would wind the paper again to read the message. Unless the recipient’s cylinder was of the same diameter, the letters would not come together properly and the message would not make sense. However, manual analysis of the letters is straightforward – letters after a specific distance need to be put together to make sense out of the message, and the strip of paper is a big giveaway.

 

2.3 Product Ciphers
Transposition Ciphers or Substitution Ciphers are both open to fairly simple Cryptanalysis, but a combination of a Transposition operation with a Substitution operation leads to a much stronger cipher. For example, a Product Cipher, alternates transposition and substitution, and carries out this core function over several iterations. The ciphers used in World War II (ENIGMA by the Germans, PURPLE by the Japanese) were Product Ciphers. Modern-day Product Ciphers include DES as well as AES (a.k.a. Rijndael). DES uses a 64-bit key and alternates 16 substitutions with 15 transpositions. AES uses a 128-bit key and alternates 10 substitutions with 10 transpositions. AES is stronger.

 

2.4 Modes of Operation
Depending on how a Symmetric encryption algorithm deals with data, we can define two distinct categories for symmetric algorithms: Block Ciphers and Stream Ciphers. In a Block Cipher, data is taken in blocks (typically 64-bit or 128-bit block) and the algorithm encrypts / decrypts the block as a whole. A Stream Cipher operates on a stream of data and encrypts / decrypts the data one-bit at a time.

 

2.4.1 Stream Ciphers
A Stream Cipher uses the bits in the secret key to perform some operation with each bit in a stream to generate Ciphertext. For example, a simple operation would be to XOR each bit in a key with each bit in the Plaintext stream. However, it is highly unlikely that the key would be as long as the Plaintext (see “One Time Pad” in section 3.1 for a discussion), so what typically happens is that the Secret Key is used to generate a pseudo-random bit-stream called keystream, and the keystream is then used to perform some operation with the Plaintext stream.

The keystream is generated using some internal state in the algorithm. For higher security, this internal state needs to be updated continuously. Based on how this state change takes place, Stream Ciphers can be classified as:

1.    Synchronous Stream Ciphers: The internal state of the cipher changes independent of Plaintext or Ciphertext.

2.    Self-Synchronizing Stream Ciphers: The internal state changes based on the previous Ciphertext digits.

Since the keystream itself is not really random, the bit sequence in the stream would repeat itself after a period. The longer the period, the more secure the algorithm.

 

2.4.2 Block Ciphers
A block cipher operates on a pre-defined block-size. Typically the block-size is 64-bits or 128-bits. So for a block cipher, the biggest question is how to deal with messages longer than the block-size. There are various modes of operation which a Block Cipher can employ. The main ones are:

1.    Electronic Codebook (ECB): This is the simplest mode. The message is split into parts and each block is encrypted / decrypted separately. This is insecure and not recommended since it is subject to two simple attacks:
•    Stereotyped Beginnings and Endings: Messages have typical headers and footers and that knowledge can compromise the key.
•    Replay Attack: The adversary does not know the Plaintext or the key, but she simply intercepts the Ciphertext and forwards it to the receiver. An example where this can hurt is authentication.

2.    Cipher Block Chaining (CBC):  In this mode, each block is XORed with the previous block before encryption. It is very commonly used. A variation on CBC is called Propagating Cipher Block Chaining (PCBC). This mode is used in Kerberos.

3.    Cipher Feedback (CFB), Output Feedback (OFB), Counter (CTR): All of these modes convert a block-cipher into a stream-cipher by taking the output from the previous block. CFB converts a CBC into a self-synchronizing stream cipher. OFB and CTR makes a block cipher a synchronous stream cipher.

All modes except ECB use output from the previous block to operate on the next block. This leads us to the question: How is the first block processed? To solve this problem, a dummy block, called an Initialization Vector (IV) is used. The IV need not be secret, but the same IV should never be used with the same key.

 

3. The Chimera of Perfect Secrecy
I mentioned that Product Ciphers are stronger as compared to Transposition and Substitution Ciphers. But on what basis does one make a statement like that? Is there a way of figuring out how secure an encryption is, or how difficult it is to break it? Till recently, most cryptography was ad-hoc without any solid theory behind it. This changed in 1948 when Claude Shannon - the Father of Information Theory and Modern-Day Cryptography - published A Mathematical Theory of Communication (I tried reading it and lost my way by the 3rd page, so this really is the interpretation of other people – unlikely to be wrong since a lot of very learned people say the same thing). What he showed was that a secure encryption system could exist only if the size of the Secret Key was as large as the total size of the messages that would ever be exchanged using that secret key.

3.1 One Time Pad
An example that illustrates Shannon’s principle is the One Time Pad (OTP). The idea is this: suppose Alice has Plaintext of length N characters. She now generates a key randomly, also of length N and uses it to generate the Ciphertext. The key is used only once, and for the next message, a new key is generated, again equal to the length of the message. This is mathematically proven to be unbreakable. (Since the key can be used only once, it is called “One Time”, and “Pad” because the key used to be on a slip of paper torn from a Pad on which random text was printed).

The problem with OTPs is key distribution – it is logistically tough and prone to security breaches. So while OTPs in themselves are perfectly secure, their implementations are vulnerable because of the insecurity involved in key distribution. OTPs are typically never used, except in highly critical applications where the parties involved are willing to undertake the key exchange challenge.

The reason I brought up OTPs is to point out that perfect security – that can defeat unlimited computing power - is not always practical. Modern-day Cryptography does not chase this chimera. Its goal is not to provide security in the face of infinite computing capacity. Instead, it assumes that the adversary has (reasonably) limited resources. What makes modern-day Crypto work is the gap between the computational ability it takes the legitimate user to encrypt vs. the computational infeasibility of decryption for the adversary who does not have the secret.

3.2 Computational Complexity Theory
Whether or not a computation is feasible can be found out if we know the cost involved in making the computation. The discipline of computer science which deals with cost of computation is Computational Complexity Theory. The theory measures the cost of making a computation in terms of the space (quantity of information storage) and time it takes to make the computation. Here it is important to note that when we think of computation, we are not thinking about a specific computation model like the Von Neumann Architecture on which modern PCs are based, or a Turing Machine, but any arbitrary model.

3.2.1 Algorithm Efficiency
Before we get deeper into the theory, a quick word about algorithm efficiency: An algorithm’s efficiency is defined as the amount of the time it takes the algorithm to compute the problem, as a function of the problem size. (The problem size measures the size of the input to the program. For different problems, problem sizes are differently defined, and getting it right takes some familiarity with complexity theory.) So for example, the time taken could be:

T(n) = n3 + n2 + 1

Where n = problem size.

Now as n gets arbitrarily large, the value of (n2 + 1) becomes insignificant as compared to n3. So T(n) = n3 for very large n. Such an algorithm then has complexity of the “order of n cube”, and the algorithm is called a O(n3) algorithm.

An algorithm is considered to run in polynomial time if T(n) = O(nk) where k is a constant. Example include

•    Constant time: O(1) [k = 0].
•    Linear time: O(n) [k = 1]
•    Quadratic time: O(n2) [k = 2]
•    Cubic time: O(n3) [k = 3]

An algorithm running in polynomial time would be more efficient than an algorithm running in exponential time: T(n) = O(kn)

 

3.2.2 Complexity Classes
In general, complexity theory looks at the universal set of problems in terms of what are called Complexity Classes. A complexity class is a set of problems that are of related complexity, as in, the algorithm efficiency for solving that set of problems can be expressed by T(n) = O(f(n)).  As can be expected, there are a lot of complexity classes, but the ones of biggest interest to us are called P and NP classes.

 

3.2.3 Class P Complexity
A problem that can be solved in polynomial time is said to have Class P Complexity. A number of problems fall under Class P complexity:

  • Finding the greatest common divisor
  • Determining if a number is Prime
  • Linear Programming
  • Shortest Path
  • Sorting
  • Modular Exponentiation MOD(x, y, z) = xy  MOD z

From the perspective of Complexity theory any algorithm that runs in polynomial time is considered as being efficient (or feasible), and any algorithm that does not run in polynomial time as being intractable or infeasible. So any computation in the class P-Complexity is considered efficient. However, this is only in theory, it may not be so in real life. For example if the value of k is 10000, T(n) = O(n10000) is certainly a very inefficient algorithm.

 

3.2.4 Class NP Complexity
Put informally, problems of class NP complexity can be defined as ones having a verifier function that can be solved in polynomial time. To understand this better, consider the problem of finding out whether a number is composite or not. Doing this for large numbers is really hard, with T(n) = O(kLog n) for the most efficient algorithms. However, once we have a candidate factor of n (say x), finding if x is actually a factor or not is trivial:

bool IsFactor(int n, int x)
{
    if (n % x != 0 || x == 1 || x == n) return false; else return true;
}

Since the verifier function (IsFactor) is a class P algorithm, the problem of whether n is composite or not is a NP problem. All NP problems have a verifier function which accepts an input and a verification value, which can be computed in polynomial time. A number of problems are Class NP Complex, the most common being the Factoring problem and the traveling salesman problem.

The hardest problems in Class NP complexity are said to be NP Complete. An example is the problem of multi-processor scheduling: Find the minimum time required to schedule n jobs (of varying lengths) on m processors such that no two jobs overlap.

 

3.2.5 Is P = NP?
Consider the case where a P class problem also has a P class verifier. Would this problem be a class NP problem or a class P problem? The simple answer is P is a subset of NP. So every P problem is a NP problem, but the reverse may not be true. However, some researchers believe that any problem whose verifier can be solved in polynomial time, can itself also be solved in polynomial time. In short, P = NP.

Whether P is equal to NP is one of the most important unsolved problems in modern mathematics and if it can be proven that all class NP problems are actually class P problems, the implications would be stunning, with a huge impact on our society (for starters, most of modern-day Cryptography would be rendered useless)

 

3.3 Information Leakage
While the computation of the Plaintext from the Ciphertext may be infeasible (NP Complete), what if it is feasible (P Complex) to compute part of the Plaintext? For example, what if the length of the Plaintext can be known? Does that compromise the security of the message? Assume a situation where Eve expects Bob to ask Alice whether she would marry him. The answer which Alice would return is a simple yes / no, represented by a single bit. Now if the Eve is looking at the Ciphertext going along the wire, and discovers a single unit moving across, she may deduce that Alice has replied to Bob’s question.  What now is the probability of Eve finding out the correct answer? 50%. Much higher than what a secure communication may require.

So from a Cryptography standpoint, when we look at computational feasibility, we need to look at it not just from the perspective of the entire data becoming available, but also how feasible it would be to recover part of the information.

 

4. Cryptographic Primitives
Modern-day cryptography - based on computational complexity – is built using certain Cryptographic Primitives. What is a primitive? A primitive is an atomic operation (from the perspective of the layer above it), a building block, which does a certain defined task. Cryptography requires primitives that have certain computational hardness. All computational complexity in Cryptography comes from these underlying primitives. Understanding the primitives is the key to understanding Cryptography. While you would not have to typically deal with how Protocols are constructed, a basic understanding is important to get the fundamentals right.

 

4.1 One Way Functions
The simplest of the primitives is a One-Way Function. A one-way function is any function that is easy to compute but hard to invert. A mathematical function that is hard to invert is multiplication: Given integers P and Q, N = P x Q can be easily calculated, but given a random N, how would one factor it into P and Q? Sure, we can produce all the factors. But getting P and Q specifically is probabilistic. The higher the value of N, the lower is the probability of getting P and Q back quickly and consistently. This is called the factoring problem, and it is a good example of a function that is efficient to compute, but infeasible to invert.

Today we have no rigorous mathematical proof that what we consider one-way functions, are indeed one-way. What that means, is that today we are not sure whether there exist inversion methods that have lower computational complexity than the ones we are aware of. Just because a lower-complexity inversion has not yet been discovered, does not mean that they cannot exist. So cryptography is built on assumptions. So far the assumptions have been valid.

 

4.2 Hash Functions
One of the most important one-way functions is a Hash Function. A Hash function takes an arbitrary number of bits as input and produces a smaller number of bits based on the input. The algorithm used is deterministic (given the same input, it would always pass thru the same steps and produce the same output), so for the same input bits, it would always generate the same output bits. In context of hash functions, the input is often called a message and the output is called a message digest or a digital fingerprint. So:

Digital Fingerprint / Message Digest = HashFunction(Message)

A hash function is obviously one-way. That’s because a hash function is lossy – the number of output bits is less than the number of input bits so there is not enough information in the output to calculate the input. However, this alone would not make hash functions useful. What makes hash functions useful is that they are collision resistant and create an Avalanche effect.

A collision is a situation where the hash function generates the same output for two different inputs. A good hash function is collision resistant. Once again, if you consider the fact that the number of output bits is less than the number of input bits, it is obvious that there is no way of avoiding a collision. A simple way of approaching this is to think about the input as a 64-bit number and the output as a 16-bit number. Now there are 18,000 trillion 64-bit numbers (264 = 18,446,744,073,709,551,616) 64-bit numbers and 65,536 16-bit numbers. As we keep reducing each number in the 18,000 trillion-range to 65,536 numbers, there are bound to be collisions.

So a hash function cannot be collision resistant. However, we can reduce the probability of collision if we choose a large enough output range and a good algorithm. For cryptographic purposes, we define collision resistance as: It should be computationally hard to find two inputs m1 and m2 so that their output h = HashingFunction(m1) = HashingFunction (m2). The other important property for a cryptographically secure hashing algorithm is that it should be Pre-Image Resistant: Given an output h, it should be computationally hard to find an input m so that h = HashingFunction(m).

The other desirable property of a good hash function is what is called the Avalanche Effect. The idea is that if the input changes even by a small amount, the output changes dramatically. The Strict Avalanche Criterion states that if one bit is flipped, each of the output bits should change with a probability of 0.5, in other words, half of the output bits should flip.

 

4.3 Random Numbers
The other important family of Cryptographic primitives is Random numbers and their generators. Recall Kerckhoffs’ Principle – the only secret is the key, not the algorithm. Now, we may keep the key a secret, but what if someone guesses the key? So we want to make the key harder to guess. A “guess” can even be an automated attack where the program tries all possible permutations. (This is called a brute-force attack) To reduce the possibility of such an attack from being successful, there are two things that are needed:

1. The key should be long. The longer the key, the stronger the encryption
2. The key should be unpredictable. The more random a key, lesser are the chances of it being predictable.

Point one is easy to understand, but point two is a little ambiguous. What is “more random?” Without getting into a philosophical discussion on randomness, I would just like to say that a random number is one which is generated in such a way that its probability of being generated is as high / low as any other number, and there is no way of determining when the number would be generated.

Generation of random numbers depends on the initial input on which a mathematical calculation would be performed. But unless that initial condition has randomness in itself, the output would also not be in random. This initial randomness is called an Entropy Source. Based on the entropy source, Random Number Generators (RNGs) can be classified as:

1. RNGs based on Deterministic Random-bit Generators (DRBGs): DRBG means that the source of entropy is deterministic. Modern computers are all deterministic (as compared to quantum computers which are non-deterministic). If the entropy source is deterministic, the numbers generated cannot really be random. This means that a software-based generator does not generate truly random numbers. Hence we call the numbers generated by such a generator Pseudo-Random Numbers, and the generator itself is called a Pseudo-Random Number Generator (PRNG). All software based RNGs are PRNGs.

2. RNGs based on Non-Deterministic Random-bit Generators (NDRBGs): NDRBG means that the source of entropy is non-deterministic. Such RNGs are typically hardware based and use a physical phenomenon as a source of entropy. Examples of physical phenomena used by NDRBGs include photoelectric effect, thermal noise, or some other quantum phenomena. Quantum phenomena in theory are unpredictable and therefore a reliable source of entropy. We call the numbers generated by such a generator Truly-Random Numbers, and the generator itself is called a True-Random Number Generator (TRNG). TRNGs are rather expensive and are needed only for very high security scenarios.

How does one find out if a PRNG is good enough to be used in Cryptography? A random number generator would typically generate numbers in a given range. This leads to two important properties:

• The distribution of values of the numbers generated in the range. The more even the distribution, the better.
• Ease of prediction of numbers based on a given set of numbers from the generator. The more difficult the prediction, the better is the PRNG.

In order to have these properties, a PRNG needs two things:

1. A random bit stream from a source of entropy. There is some entropy even in a deterministic computer. Examples include Clock Tick count, and values of System Handles (like Thread handle, Process handle, Network Card Id, etc.)
2. A mathematical calculation that removes the bias from a bit-stream and makes it more randomized. Examples include hashing algorithms like MD4, SHA, etc.

For the second point, there is a standard by a body called FIPS (Federal Information Processing Standard) which can be found in FIPS 186-2.

A function catering to the above two requirements is called a Cryptographically Secure Pseudo Random Generator (CRNG). Typical random number generators like the C++ rand() are not CRNG and they should not be used in Cryptography.

 

5. Confidentiality without using a Shared Secret
Coming back to the problem of message exchange between Alice and Bob, the situation so far is as follows:

  • Prior to exchanging messages, Alice and Bob agree upon a large random key. The longer and more random the key, the more secure the encryption. The key is private to Alice and Bob – it is a shared secret between them and not disclosed to a third party.
  • The sender then uses a sophisticated encryption scheme (like a Product Cipher) to encrypt the message. The strength of the algorithm being determined by how efficient the algorithm is for the sender / receiver to execute, but computationally hard for Eve (the Adversary) to invert. The bigger this difference, the better the algorithm.
  • To decrypt the message, the recipient uses the secret key to reverse the computation carried out to encrypt the message. The shared Secret makes the reversal of the computation efficient for the recipient and gives him the computational advantage over the Adversary.

The only problem in this situation is the initial condition – the sharing of the secret key. Since the security of the scheme derives from the key (recall Kerckhoffs’ Principle), its sharing in a secure way is a pre-requisite. The question is how does one share the secret in a secure way without meeting in person? There has to be some way of transferring the key in a secure way too. But wouldn’t secure transmission of the key also involve some sort of encryption? And in that case, wouldn’t we need a key once more? The problem seems to be recursive. But we do know that the problem has been solved, and the key to solving the problem (pun unintended), lies in the symmetry of the situation.

 

5.1 Diffie-Hellman Key Exchange
The first time a solution was published for solving the problem of key sharing was in 1976 when Whitfield Diffie and Martin Hellman proposed The Diffie-Hellman Key Exchange:

  • Alice and Bob agree to use a prime number p (say 11) and a ‘generator’ g (say 5). Now these numbers are well established and everyone (including Eve, the Adversary) is aware of it.
  • Hereafter, the following actions take place on Alice’s and Bob’s sides (MOD is the modulus / remainder operation):
Alice Bob
Alice chooses an integer 'a' at random. Say 2. Bob chooses an integer ‘b’ at random. Say 3.
Alice computes (ga MOD p) and sends it over to Bob. (52 MOD 11 = 3) is sent over. Bob computes (gb MOD p) and sends it over to Alice. (53 MOD 11 = 4) is sent over.
Alice now computes (gb MOD p) a MOD p. So Alice computes 42 MOD 11 = 5. Bob now computes (ga MOD p) b MOD p. So Bob computes 33 MOD 11 = 5.
Alice obtains 5 Bob obtains 5
Shared Secret = 5 Shared Secret = 5

Hence both Alice and Bob are able to arrive at a shared secret. (Of course in a real implementation, a, b and p will be very large numbers)

What makes D-H work is the fact that gba = gab. This leads to:

(ga MOD p) b MOD p = (gb MOD p) a MOD p

But the big question is what prevents the Adversary Eve from also carrying out the same computations given that she also knows p and g? We need to give Alice and Bob some advantage over Eve for this shared secret to be limited to just Alice and Bob. The advantage Alice and Bob have is that they have direct knowledge of a (or b) while Eve is looking at ga (or gb)

Eve needs to calculate gab given g, ga, and gb where g belongs to a group defined as (Integers MODULO prime). This is called the Diffie-Hellman Problem (DHP). A fast means of solving this problem would break Diffie-Hellman Key Exchange (and a number of related schemes). The fastest way of solving it today is to find ‘a’ given ga. This is called the Discrete Logarithm Problem (DLP). There is no known efficient algorithm to solve DLP as of today. Why is that so? We have no rigorous mathematical proof; just that no efficient algorithms have so far been discovered. But to understand the nature of the problem, let’s try and break this down further.

Let’s start by thinking about what is known to Eve. When Alice sends across her first computation to Bob, Eve knows:

1.    p
2.    z
3.    N = (ga MOD p).

Now given this information, Eve needs to calculate ‘a’. Note that MOD is not a symmetric operation. Meaning that given (x = y MOD z), and given x and z, you cannot carry out a mathematical operation that yields y.

The values of y can be = (z * n) + x, where n is a natural number. Hence there is an infinite set of values that can be taken by y.

Applied to our case, when Eve looks at (25 MOD 11), she is looking at 3. And she knows ‘p’ is 11. So ga can be = (11 * n) + 3 where n is a Natural number. Given that g is known to be 5, our equation becomes:

5a can be = (11 * n) + 3.

Now Eve has the unenviable job of calculating a. How easy is this computation? A trivial approach would require Eve to run thru all powers of 5 and compare each with (11 * n) + 3 for several iterations of n. Are there multiple solutions? That depends on values for g, p. Let’s assume that there is a single solution only. But even if there is a single solution, if we choose p, a, and b to be very large integers (g can be small. It is usually 2 or 5), the calculation is a very tough one for Eve, and takes a lot more computational power than it would take Alice or Bob to calculate the Shared Secret. This asymmetry is what makes Diffie-Hellman work.

Note: These days, it is recommended that p be at least 512-bits long. For more security, use 1024-bits.

 

5.2 RSA
Another approach to solving the problem of symmetric encryption was published in 1977 by Ron Rivest, Adi Shamir and Len Adleman at MIT, and they called it RSA after their initials. The approach taken in RSA is that the secret is not required to be shared at all. In RSA, the encryption and decryption operations are done using different keys. The mechanism is:

  1. Alice generates a pair of keys. Let’s call them K1 and K2.
  2. The algorithm is such that the message is encrypted using K1 and decrypted using K2. Message encrypted using K1 cannot be decrypted using K1; you need K2 to decrypt the message. This is called asymmetric encryption.
  3. Alice makes K1 public: Bob, Eve and the entire world now has K1. We call K1 Alice’s Public Key. Bob (or anyone else) can now encrypt a message using Alice’s public key and send it to her.
  4. K2 stays with Alice as a secret. This is Alice’s Private Key, used to decrypt the message. Only Alice, and no one else, can read the messages that are encrypted using her public key.

Bob also does a similar thing. He also creates a private-public key pair and publishes the public key. Alice can now send messages to Bob by encrypting the messages with his public key. Only Bob would be able to decrypt the messages (using his private key). We have solved the problem of the secret key by removing the symmetry of the model.

The interesting part is the algorithm itself which requires one key for encryption and another key for decryption. Again, the math involved is straightforward, just that the calculations involve some huge numbers which means that the computing power required is rather high.

As usual, a cryptographic scheme works if the legitimate users can compute efficiently while the adversary’s computations are complex. So the questions we should ask for RSA are:
1. How complex is to compute the private key using the public key.
2. How efficient is to encrypt / decrypt messages using asymmetric algorithms.

We will not get into the details here, but take it for granted for now that:

1. Computation of the private key from the public key is computationally hard. No simple algorithms exist as of today and the inversion is infeasible. Of course this is an assumption based on knowledge that we have today. So far the assumption has been valid, and that is why RSA works!

2. It is not very efficient to encrypt / decrypt messages using asymmetric algorithms. For one, RSA calculations involve really long numbers. So while a typical symmetric algorithm would be secure with a 128-bit key, the minimum key-size required to achieve security in RSA would be 1024-bits. Obviously the computations on such large numbers would be really slow. Hence it is rather inefficient to encrypt / decrypt large amounts of data using RSA. So the way RSA based cryptography really works is that the only data that is encrypted using asymmetric encryption is the key for a symmetric encryption. Thus:

  • Bob generates a key for symmetric encryption and now needs to share this key with Alice
  • Bob has Alice’s public key which she has published for the whole world to send secrets to her
  • Bob encrypts the shared key using Alice’s public key and sends it over to her
  • Alice decrypts the symmetric key sent by Bob using her private key
  • Now Alice and Bob both have the secret key and they can use symmetric encryption
  • The actual data is encrypted / decrypted using symmetric encryption

 

6. Integrity
So far we have assumed that Eve is a passive spectator, but what if she is more malicious, more powerful? Consider the scenario where when Bob asks Alice if she would marry him or not, Eve is able to send a response (No). If Bob does not know that the message had been tampered with and that the message he received was not by Alice, but by Eve, we have a classic Hindi film situation. To avoid this, Bob should be able to establish that the message was not modified on the way, that it originated from Alice, that the message is authentic.

This is called Message Integrity. In fact, integrity is often a more critical requirement than confidentiality. We are ok if an adversary discovers our contents of our message, what we certainly do not want is the adversary modifying the contents.

6.1 Encryption does not give Integrity
Prima Facie, it seems that encryption automatically gives integrity. The argument goes as follows:

  • Alice prepares Ciphertext out of Plaintext and sends it over to Bob
  • In the regular scheme of things, Bob would receive the Ciphertext and get Plaintext out of it.
  • In the attack scenario, Eve intercepts the message and modifies it. Now Bob gets the modified Ciphertext.
  • When Bob decrypts the modified Ciphertext, he gets garbage. This means that the message is no longer authentic.

Now though it seems correct, this argument is fallacious. For example, if Eve knows that Alice is responding to Bob’s message of whether she would marry him, Eve may simply flip the bits on the response being sent by Alice. Since the response was supposed to be a yes / no, flipping the bits of the Ciphertext may also flip the response. This does depend on the strength of the encryption algorithm, but the possibility cannot be discounted. So in principle we cannot assume that encryption provides integrity.

Proving the fallacy of the “encryption provides integrity” argument in general requires that we look at some math. We will avoid that right now. Suffice to say that if Eve has some information about the nature of the data being sent, and the algorithm being used, she can modify parts of the Ciphertext in such a way so that the decryption of the Ciphertext actually yields wrong Plaintext which is meaningful to Bob.

Since encryption does not provide Integrity, we need some additional mechanisms. There are two popular solutions: Machine Authentication Codes and Digital Signatures.

6.2 Machine Authentication Code
The simplest way of solving the data integrity problem is called a Machine Authentication Code (MAC). The idea behind MACs is that we send a tag along with the message. The tag is a function of the key and the message. If the message gets modified, a verification check will reveal that the tag can no longer be computed using the message and the key. This means that the message has been tampered with. MACs work as follows:

  • Alice needs to send a message (M) to Bob. Confidentiality may or may not be a goal, but integrity certainly is.
  • For achieving integrity, Alice and Bob have a shared secret key K
  • Alice uses a “Tagging Algorithm” to generate a Tag T using the Message M and the Key K: T = TaggingAlgorithm(M, K)
  • The message M is then sent (maybe encrypted, maybe as Plaintext) along with the tag T to Bob
  • Eve intercepts the message, modifies it and forwards it Bob
  • Bob uses a “Verification Algorithm” to check if the Tag can be calculated using the Message M. If yes, the message is authentic, otherwise it is not.

The Verification Algorithm may look like: bool VA(M, K, T)
{
  T2 = TaggingAlgorithm (M, K) ;
  if (T2 == T) return true; else return false;
}

While the mechanism is similar to that in symmetric encryption, the goals here are totally different. In symmetric encryption, the message being sent – the Ciphertext – contained the entire information of the Plaintext in an encrypted form. The way the Tag is different from the Ciphertext is that the Tag does not represent the complete information of the Plaintext and is a much smaller value, a number, which is computed using the Plaintext and the secret key K, and is sent along with the message (in Plaintext or Ciphertext).

Note that symmetric encryption and MACing are not mutually exclusive. For the same Plaintext, an Encryption Algorithm, along with an Encryption Key K1 can generate Ciphertext, while a Tagging Algorithm along with a Tagging Key K2 can generate a Tag. So we can achieve both confidentiality and integrity. It is just that in this scenario, Alice and Bob have to share two secret keys.

Now this does seem secure, but there are a couple of interesting things that can still happen. The simplest one is what is called a Replay Attack. In a replay attack, Eve watches a message M along with a tag T being sent by Alice to Bob. She picks a copy of the message from the wire and keeps it with her. Sometime later, she sends the same copy over to Bob. How can this become an attack? What if the message Alice sent was: “I borrowed Rs. 1000 from Eve. Please credit Rs. 1000 in Eve’s account.” Every time Bob receives this message, he transfers Rs. 1000 to Eve’s account. Alice wanted to send this message only once, but Eve carries out a replay attack and gets richer. Another example is that of a user sending over a username + password combination to a web-server and the server sending over a cookie to denote a successful login. If a hacker intercepts the packet being sent over to the server and later does a replay, the hacker will also get the cookie, without knowing the username and the password.

 

6.3 Digital Signature
In MAC, we solved the Integrity problem by giving both Alice and Bob a secret key using which they could generate a Tag. But let us not forget that it is not easy to share the secret key in the first place. How then do we solve the integrity problem without sharing the key? The answer lies in what is called a Digital Signature.

Recall from our discussion on RSA that we addressed the confidentiality requirement by having the receiver publish its public key. The sender would use this public key to encrypt the data and send it to the receiver who would decrypt it using its private key.

Digital Signatures address the integrity requirement by having the sender publish its public key. To send a message, the sender generates a Signature using his private key and the message:

Signature = SigningAlgorithm(Message, SendersPrivateKey)

He then sends the Signature along with the message. The receiver uses the sender’s public key to verify whether the Signature is valid or not:

Yes / No = VerificationAlgorithm(Message, Signature, SendersPublicKey)

As you notice, Digital Signatures are similar in principle to MACs, except that the setting here is asymmetric. Once again, as with MACs, the confidentiality and integrity requirements are not mutually exclusive. We can achieve both.

An important point of difference between MACs and Digital Signatures concerns non-Repudiation. When Alice sends a message to Bob, we don’t want Alice to be able to deny the fact that she actually sent the message and not someone else. MACs are not able to create Non-Repudiation. If Bob can verify a tagged message, he can also produce a tagged message (because the key is the same). So Bob cannot drag Alice to a court of law, produce a tagged message and claim it was sent by Alice. Digital Signatures on the other hand do provide Non-Repudiation. Since a Digital Signature can only be produced by the Private Key of the sender, the sender cannot claim that the receiver concocted the evidence.

 

Summary

I have tried to cover most of what I think are essential concepts towards understanding cryptography. With this, one can now get started putting publically available crypto libraries to use. There are quite a few things I wanted to cover but have not been able to since this text grew too long and I basically got tired of working on this. These are about how typical algorithms work - Feistel networks, key schedules, S-box, P-box, etc., survey of the various standard algorithms – DES, 3DES, AES, MD5, SHA0, 1, 2, and choosing block sizes, IVs, key sizes, modes of operations, etc., and finally material on Authentication, Digital certificates, PKI, and Elliptic Curve Cryptography. Of course, even with all this info included, I would still have only scratched the surface since there is a ton more to be learned. The reading list below is therefore highly recommended:

  1. Applied Cryptography: Protocols, Algorithms and Source Code in C by Bruce Schneier: Comprehensive, but dated book. Written in a simple way which is very easy to follow.
  2. Handbook of Applied Cryptography by Menezes et al: Provides the mathematics and the fundamentals skipped in the Schneier book. Don’t let the math scare you – it is easy enough to follow if you pay attention and are willing to surf the net for bits you do not have background on. More suited if you are thinking of designing your own algorithms, etc.
  3. Cryptography and Network Security by William Stallings: Great info on hashing, choice of algorithms and design of secure networks. More suited for the application programmer.

 
counter