個人檔案Vineet Gupta部落格清單訪客留言更多 ![]() | 說明 |
Vineet GuptaMake it Simple, as Simple as Possible, but no Simpler |
||||||||||||||||
|
Thanks for visiting!
RaparlaAnil撰寫:
Hi Vineet,
I am working in Bangalore. I saw u r sharepoint web cast. It is very useful for my carrier. I have small dought in sharepoint server. How can we create host name? or we can create site port 80?. Can i get download u r Sharepoint webcasts.
5 月 13 日
Hi Vineeth
This is Phani Kumar here from Hyd working as a TL, I came to attend the Tech Mela and Attended u r session it was very much useful to my carrier. Thanks a Lot for that, I send a friend request if u dont mind plz accept me . Regards Phani Kumar PB
9 月 24 日
JayarajNS撰寫:
Hi Vinneth
This is Jayaraj here from Chennai working as a Developer, I came to attend the Tech Mela and Attended u r session it was very much useful to my carrier. Thanks a Lot for that, I send a friend request if u dont mind plz accept me . Regards Jayaraj
9 月 19 日
|
11 June Notes on XMPP Part I - FundamentalsWe recently augmented the team working on our desktop product. At the core of the product is XMPP – the protocol that drives several instant messaging servers and clients, sites like Chesspark and now Google Wave. Since XMPP is not known by many people, let alone be understood well enough, every time we on-board someone new, they have to go thru a steep learning curve. This post is an attempt to make it easier to understand the protocol .
What is XMPP XMPP generally refers to a collection of specifications that define protocols for real time interactions over the public internet. The core set of specifications has been standardized by IETF. While the first application (and the origin) of the protocols was to address instant messaging, the extensible nature of XMPP has led it to be used for a wide range of applications which need real time communication.
Architecture XMPP has a decentralized client-server architecture (like the WWW) – there are several hundreds of XMPP deployments, each running anywhere from one to hundreds of servers to which millions of clients connect. Key aspects:
Addressing: Each interacting entity on XMPP needs to have a unique address. This address is called a Jabber Id (JID). A JID consists of three parts: user-identifier: Unicode string representing the interacting entity. For example: user1 domain-identifier: Unicode string representing the domain of the interacting entity. For example: directi.com resource-identifier: Unicode string representing a resource used by the entity. For example: pwdesktop A full JID looks as follows: <user-identifier>@<domain-identifier>/<resource-identifier>. For example: user1@directi.com/pwdesktop. An identifier without the resource is called a bare JID: user1@directi.com. Another way of addressing is to use XMPP URIs: xmpp:user1@directi.com
Communication: The original purpose of XMPP was to do instant messaging. This requires the server to be able to push messages out to the client on a real time basis. Over HTTP, this requires the client to poll the server, or a technique like Comet where the server caches HTTP requests and then sends responses on those requests. XMPP however uses a long lived TCP connection. This gives the server an always on channel to push info to the client. The client also need not wait for the server to respond to its messages, but can instead send an indefinite amount of messages to the server without blocking, which the server can then respond to, enabling an asynchronous kind of communication. This is further helped by the fact that XMPP does not require every packet being sent over the wire to be acknowledged. An entity assumes a packet to be delivered unless it receives an error.
Protocol: XMPP uses streaming XML over a long lived TCP connection (though HTTP is also possible thru BOSH – see later) for communication. There is one stream from client to server and another stream from server to client. To start communicating, a client would send an opening XML tag: <?xml version="1.0"?> This is followed by a stream element – this marks the beginning or root of the document: <stream:stream> This in turn is followed by various messages sent across as XML elements. These elements are called stanzas. These stanzas can continue to get exchanged between the server and the client endlessly till a closing stream tag is sent which marks the end of the communication. There are three kind of stanzas: 1) Message: The <message/> stanza denotes the basic push method for sending stuff from one entity to the other. These need not be acknowledged and provide a quick fire and forget mechanism to send info from one point to the other. A Message has a “to” and a “from” attribute denoting the receiver and the sender, and can include one or more payloads. In IM conversations, the payload is often HTML markup for richly formatted text (defined by XHTML-IM) 2) Presence: The <presence/> stanza denotes a broadcast sent out by an entity to advertise its availability to other entities who have subscribed to receive these updates from the advertising entity. Presence messages can also include a payload. Common uses are to include an availability state like “away,” “busy,” etc. and personal status messages like “Working on blog post 1 of 6.” 3) Information Query: The <iq/> stanza provides a Request-Response mechanism like HTTP verbs GET, PUT and POST. The payload defines the request of the sender which needs to be processed by the receiver. This is the only stanza type where the sender expects a reply – a result or an error. This makes IQ more reliable than a Message, allowing the two entities to carry out a structured interaction. Common examples of IQ usage in IM applications are to fetch the the Roster, and add / remove entries in the Roster. A closing stream tag indicates end of conversation.
Decentralized client-server architecture XMPP servers talk to each other directly. So if user1@domain1.com needs to interact with user2@domain2.com, the servers at domain1.com would interact with domain2.com servers directly. This is different from email where communication between servers on different domains happens thru hops (which can lead to address spoofing and other issues) or HTTP where servers do not interact with each other at all.
Creating a XMPP Session user1@domain1.com wants to talk to user2@domain2.com. To accomplish this:
1) Client creates a TCP session with the server: The client used by user1 needs to find out the box that hosts the XMPP service for domain2. This is accomplished by doing a DNS service lookup by checking the DNS SRV record which maps the service to the machine name and the port of the service (5222 by default) becomes known, a AAAA lookup gives the IP of the machine. With the IP and the port we can now open a TCP connection.
2) Client and Server start streams in opposite directions: a) The client sends across the opening XML text declaration tag (optional) followed by an initial stream header: <?xml version="1.0"?> b) The server sends back a response stream header with a unique stream id: <?xml version="1.0"?>
3) Client and Server negotiate Stream Features: Right after sending the response stream header, the server send across a <stream:features> message on the features it supports. These features are typically about: a) Whether server supports TLS or not (recommended) b) Authentication mechanism supported (see my earlier blog post on authentication mechanisms) – typically SASL plaintext and digest-md5 are supported. Ideally one should not use plain text without TLS since in that case the password is sent in clear on the wire. c) Stream compression (optional) At this point, the client uses <iq/> stanzas to negotiate which features it wants to use, and if TLS is used, enter into a TLS negotiation, or if SASL is used, authenticate via the appropriate SASL mechanism.
4) Post Authentication Stream Negotiation After authentication, the server resets the session by sending a new stream header with a new stream id. This is done for security purposes. This new stream does not publish any authentication features (since that is already done), but now publishes new features. These typically include: a) Compression support b) Resource binding c) Formally starting an XMPP session At this point, the client again uses <iq/> stanzas to negotiate which features it wants to use, and once that part is over, the actual task of application specific stanza exchange can start between the client and the server.
BOSH I earlier mentioned that XMPP can work over HTTP as well. This seems counter-intuitive: XMPP requires push, and HTPP is pull based (client sends a request and server responds). However, it turns out that one can do push over HTTP as well – the technique for using XMPP over HTTP is called Bidirectional-streams Over Synchronous HTTP (BOSH):
1) There is a server in front of the XMPP server which handles HTTP clients. This is called a BOSH connection manager (CM).
2) Client sends DNS query for TXT records, and discovers that there is an entry for BOSH connection which points to the BOSH server mentioned above.
3) Session Creation Request: Client now sends a HTTP POST with an empty <body/> tag with some attributes set. The important ones are: a) hold – the number of HTTP requests the BOSH server can queue. This is typically set to 1 b) wait – the timeout in seconds before which the server must respond to a pending request c) rid – a large random number that acts as the initial request id
4) Session Creation Response: BOSH server opens a regular XMPP stream with the XMPP server over a TCP connection, receives the server’s XMPP response, wraps it up in a <body/> tag and returns this to the client over HTTP. The body tag contains the following attributes: a) hold – same as earlier b) requests – max number of HTTP requests that the client can open with the BOSH server at any time. This is typically set to hold + 1. Since hold is typically 1, requests is typically 2. c) sid – a large random number that acts as the session id. This is diff from the stream id sent by the XMPP server. The client must now include the sid in every subsequent request.
5) Hereafter the client and the server negotiate stream features and authenticate pretty much in the same way as with a TCP connection, with the BOSH server sitting in between and wrapping / unwrapping the <stream> and <iq/> stanzas in <body/> tags. The XMPP application is now ready to exchange its specific stanzas.
6) The question at this stage is, how does the server do a push. Recall that hold=1 is the max number of HTTP requests the BOSH server can queue, and requests=2 is the max number of HTTP requests that the client can open with the BOSH server at any time. Assume that the last request from the client was sent to the BOSH server just around 60 seconds back (let’s say a <presence/> packet) . The server had nothing to respond because the XMPP server had no stanzas. Now since the 60 seconds timeout is about to be over, this is what happens: a) BOSH server returns a HTTP 200 ok response to the last request with an empty <body/> tag. If the client too has nothing to say, it also sends across a HTTP 200 ok with an empty <body/> tag, to which the server can again respond at the 60 second timeout. This can go on ad-infiniteum as a keep-alive mechanism. b) Now assume that the client has something to say. One request is already in the play and max two are allowed. So the client can now send the new stanza in a new HTTP request. The BOSH server immediately responds to the earlier request (which was kept on hold) with a HTTP 200 ok with empty <body/> tag. It now again has one request outstanding which it can use to either send back the keep-alive, or send back a response from the XMPP server. c) Assume that the client and the server have been playing the keep alive game. Now the XMPP server sends a stanza (say an authorization request). The BOSH server needs to push this to the client. This is easy since it has a cached HTTP request. Hence push is accomplished over HTTP, without doing constant polling.
The advantage of using BOSH is that it can work even in flaky networks where a TCP connection would break, forcing the client to once again establish an XMPP session. Also, this makes it possible to use XMPP in web clients where one cannot open a TCP connection, for example, Facebook’s chat feature uses XMPP over BOSH.
Jingle XMPP uses a client-server model for all communication and is optimized for small snippets of info. So if the amount of data to be exchanged is very large, for example in applications like file transfer, audio-video calls and screen-sharing, an XEP called Jingle. Jingle is large and complex enough to deserve its own series of posts. I will summarize basic facts here: 1) The basic idea behind Jingle (and other multimedia protocols like SIP) is to use two channels: a) Signaling Channel to set up, manage and tear down application defined sessions b) Media Channel to transfer the payload either peer to peer or relayed thru a mediator over a application defined transport
2) In a Jingle negotiation (<jingle/> element inside a <iq/> element), the initiator makes an offer to start a session by declaring one or more app type (say voice video, etc.) and a transport method (ICE, UDP, etc.). The responder and the initiator then negotiate a set of parameters (for example codecs to be used), and if the negotiation works data is exchanged. Some parameters can be modified even while the data is being exchanged.
3) Jingle supports two transport types: a) Datagram transports like UDP – can tolerate packet loss – meant for apps like media streaming b) Streaming transports like TCP – no packet loss tolerated – for example file transfer
4) The real power of Jingle comes from using Jingle over ICE. ICE provides a mechanism for two entities to communicate and negotiate all possible ways of connecting between each other – direct or mediated. ICE in turn can use a STUN server to find out the IP address and port of an endpoint from outside the firewall, and a TURN server to relay data in case a direct peer to peer connection is not available. 24 May Lessons from Twitter @Replies tweaksWhen the Twitter reply feature tweak story started breaking, my first reaction was – this could be us in future. Watching Twitter struggle with the change and the backlash it generated was a big public lesson in product design, tech implementation and communications, and I thought I should document it here, lest I ever forget it. For the uninitiated, Twitter recently removed a setting where you could see responses by people who you follow to other people who you don’t follow. This was primarily done for tech reasons, but the blog announcement did not explain that very well, leading people to think of it as an arbitrary move, resulting in some strong feedback. Twitter however responded quickly with the real story: it was done for technical reasons. Here’s what I learnt:
1) Product Design: According to Alex Payne – Twitter’s API lead, only 3% of the users had ever turned on that feature, however, these 3% users are apparently power users, which in Twitter’s world also means that they are also strongly vocal with a large influence base, and they were using the feature to discover new friends. Hence the strong backlash. The lessons are: a) If you change a feature, figure out the audience profile it impacts most, and the consequence of that impact. Numbers-wise 3% is low, but what if they are some of your most important and vocal users? b) When you do change a feature, figure out if the impact can be lessened by coming up with an alternative. It seems that the Twitter team is now working on this – they could have started the work before going ahead with the decision to remove the setting.
2) Tech Implementation: The feature was removed for tech scalability reasons. Let us see what the situation is:
How hard is it to do this? Well you are looking at at least two select queries, where the second one involves a complex join. This is where the stress on the system comes from. Can this not be addressed by caching? Here’s what would need to be cached: the follower list for each user, and the 3% of the total users who have got this feature turned on. It would be ineffective if the follower numbers n1, n2, n3, … for users U1, U2, U3, … were very large and not easily cacheable. For a typical social network like Facebook it would not be very difficult, but Twitter is different: it is asymmetric – you can follow someone without requiring that person to follow you. This results in massive values of n1, n2, n3, … for popular users. How big? According to http://wefollow.com/top, each of the top 10 twitter users has more than a million followers, with Oprah Winfrey at 1.02M and Ashton Kutcher at 1.85M. Clearly not an easy number to cache. What is scarier is that Twitter is growing at a very fast pace – faster than anyone else according to http://blog.nielsen.com/nielsenwire/online_mobile/twitters-tweet-smell-of-success/, and the number of celebrities and businesses using Twitter is growing at a fast pace. According to http://twitterholic.com/, 5 of the top 10 users have joined in the last one year. Lesson: when designing features, design with scalability in mind. If it is not scalable, avoid putting it in – you may need to kill it.
3) Communications: This has been covered ad nauseam all over the web, so I won’t repeat it here. The moral of the story is that if you do have to kill a feature, the key thing is to be honest about the reasons and be clear in the message leaving no ambiguities. First Facebook learnt the lesson and now Twitter has. Hopefully, we would be able to avoid these mistakes. 1 May Activity Streams 101One of the cornerstones of the .PW platform is the Wall - the real time aggregate of activities being done by an entity and its network. So while I am personally rather inactive on the various social networks (way too distracting), the recent announcement by FB on opening up their feed via activity streams led to some analysis and thought – here’s a summary.
The Premise
Challenges
Solutions
Activity Streams Standard Take a look at the formats of the following public feeds: 1) http://api.flickr.com/services/feeds/photos_public.gne <entry>
2) http://picasaweb.google.com/data/feed/api/all
While both of them are about public photos, the formats are vastly different since they come from two different providers. So a potential system that wants to aggregate photo updates across multiple providers needs to understand each vendor’s format separately. Efforts like Yahoo’s Media RSS extensions seek to address this. (The picasa feed above uses the Yahoo extensions but the Flickr feed does not, which is strange considering it is a Yahoo property). Another example is Apple’s iTunes RSS extensions. However, these are RSS extensions, and what we really need in this space is an Atom extension. (why? see http://blog.unto.net/work/on-rss-and-atom/). An example is http://martin.atkins.me.uk/specs/atommedia. However, this is only about photos which is just one of the activities. When we start considering the possible list of activities, the problem scope becomes extremely large. This is where the Activity Streams effort comes in. Consider the following activity: “Vineet posted new photographs to Agastya on Facebook 6 hours ago.” This can be broken down as:
Other possible fields here can be:
To capture the above, the following draft specs are currently in place.
Note that this is work in progress and if you read the drafts, the gaps are fairly obvious. The big ones (I think) are:
Activity Streams Implementation 1. MySpace: http://wiki.developer.myspace.com/index.php?title=Standards_for_Activity_Streams is compliant to the current draft specs. Also, their documentation is simple to read, understand and use. 2. Facebook: http://wiki.developers.facebook.com/index.php/Using_Activity_Streams. For one, the stream only consists of user generated content and not app generated content. Second, they have a model where you need to prompt the user for permission to access the stream. This has two repercussions:
Note that the Activity Stream standard is just one way of accessing the FB stream. They also provide a XML/HTTP and a FQL API. See Using the Open Stream API for details. The restrictions however stay in place.
Further Reading
21 April Authentication MechanismsAfter publishing my previous post, I had thought that I would not be coming back to Crypto for a while. However, today evening Sebastiaan posted on SCRAM on one of the Directi mailing lists, and I got compelled to write down this one. Authentication in Cryptography has two aspects: data authentication and entity authentication. Data authentication is addressed using HMACs and Digital Signatures (discussed in my previous post). What we are talking about here is entity authentication: how does one entity get to know that the other entity is actually what is claims to be. This in turn has two aspects:
Here we are talking about the first kind of authentication. The goals of an entity authentication / identification protocol are as follows:
Passwords A password is a shared secret between two entities. By sending across the shared secret, one entity can prove to the other that it is indeed the possessor of the secret and hence the valid entity. However, sending the shared secret across as a mechanism for authentication is fairly weak. Here are some inherent weaknesses in the scheme:
Some of these issues can be addressed to some extent thru the following:
However, the biggest drawback of the fixed password approach is that it is open to a replay attack. So it cannot be used over an open network, but must require a trusted connection.
Challenge Response Identification The idea of a Challenge Response scheme is simple: Both Alice and Bob have a shared secret. To authenticate Bob, Alice makes him a challenge. To respond to the challenge, the shared secret is required which only Bob has. Thus by sending the response, Bob proves that he is the possessor of the shared secret without ever sending across the shared secret. There are multiple ways of doing challenge response:
Symmetric key Challenge Response using Encryption Here the assumption is that Alice and Bob exchange the shared secret a priori. Let’s suppose the shared key is K. Then:
The EA is typically a function like modular exponentiation which is easy to compute, but hard to reverse. Also, data integrity must be ensured in all transmission by using a HMAC.
Symmetric key Challenge Response using One-Way functions In case the usage of an Encryption / Decryption routine is infeasible (because of computing cost maybe), it is possible to achieve the same using one-way functions as well:
Asymmetric Challenge Response
Zero Knowledge Protocols Since in all the cases above, Bob responds with some response based on the secret key, there is a theoretical risk of information leakage. Zero knowledge protocols seek to address this by enabling a claimant to demonstrate knowledge of the secret without revealing any information whatsoever. The basic idea is that the claimant must answer two questions: one involving demonstrating the knowledge of the secret, and the second is a simple question to prevent cheating. The series of steps is repeated several times (typically 40) to reduce the probability of cheating, and only if all answers are correct is the entity authenticated. The math involved in doing this is detailed but straight forward, and it is also easy to understand how the protocol works without revealing any information. You can read about it any standard cryptography textbook.
SASL Mechanisms With that backgrounder, let’s come back to the issue of SCRAM-MD5 where we started. Basically SASL is a framework for entity and data authentication supporting multiple mechanisms including anonymous, plain-text, one-time-password, CRAM, Digest and now SCRAM. Let’s quickly examine these (please do not take the descriptions below as accurate – they are greatly simplified):
20 April Crypto 101Most developers whom I have come across, lack a solid grasp of the fundamentals of cryptography. When a developer who does not understand crypto needs to use crypto, several things can go wrong: • Not understanding the implications of using some crypto technology in the code The unfortunate part to this whole situation is that Cryptography is not hard to understand and most of the perceived complexity is in areas where an application developer would typically never venture: designing an algorithm or a protocol, cryptanalysis, etc. In most cases, one can use one of the modern crypto libraries and go by with a minimal level of understanding of the principles involved. The aim of this post is to provide that minimum background required to start using crypto effectively in day to day work.
1. Introduction This story is about two nice, genuine people - Alice and Bob - who are in separate locations and want to have a conversation. The third character in our story is Eve who does not like Alice and Bob and is their adversary. She wants to intercept this conversation, find out what Alice and Bob are talking about, and if possible, disrupt the conversation by modifying the messages between them or by impersonating Alice or Bob. Now Alice and Bob are aware of Eve and to prevent her from accomplishing her purposes, they set up a pipe which goes from Alice to Bob. Whatever message Alice drops in the pipe at her end would be received by Bob at the other end and vice versa. The pipe is opaque and indestructible so Eve cannot look at the messages nor modify them. Such a pipe or channel then has the following properties: • Confidentiality: Contents of messages passing between sender and receiver cannot be seen by the adversary since the pipe is opaque. The purpose of cryptography is to establish this kind of a secure channel over a public network like the Internet.
2. Confidentiality using a Shared Secret
[Note: Alice and Bob do not care if Eve gets to know EA or DA, as long as K remains a secret. This is a very important point in modern-day cryptography: we assume that the Adversary has the knowledge of how the algorithm works. This is called Kerckhoffs’ Principle. Mechanisms that are based on keeping the working of an algorithm secret are obscure, not secure. Such mechanisms are of historical interest only and are inadequate for today’s world. Some people compare this with government agencies (like NSA in the US) not publishing their mechanisms. While this is true, such non-disclosure does not imply that the algorithms used by agencies like NSA are designed on the basis of algorithm obscurity. Security agencies in governments (or large IT companies creating security technology) hire top-notch cryptographers who are well aware of the Kirchhoff's Principle and design with the assumption in mind. Obscurity then is either an extra defense (however weak that maybe), or a way of protecting intellectual property rights on the mechanism. Bottom-line: modern-cryptography assumes that the attacker has knowledge of the working of the algorithm and only the key is secret.] Now to send messages securely, Alice takes the message (called Plaintext in this context) and generates an encrypted message out of it (called Ciphertext) by using the Encryption Algorithm EA and the Secret Key K. Thus: Ciphertext = EA(Plaintext, K) This Ciphertext is now sent to Bob over the public network. On receiving the Ciphertext, Bob uses the Decryption Algorithm (DA) and the Secret Key (K) to generate the Plaintext back. Thus: Plaintext = DA(Ciphertext, K) Eve does not have the Secret Key (K) and is therefore not able to obtain the Plaintext. We have achieved confidentiality by sharing a secret.
2.1 Substitution Ciphers The idea is not new. It’s most famous usage was by none other than Julius Caesar who replaced each character in the message by a character three characters ahead (shifting by 3) in the Latin Alphabet. This is called the Caesar-Cipher. Given that most of his enemies were not even literate, let alone thinking of breaking ciphers, it would have been fairly secure. Even as late as 1915, the Russian army was using the Caesar cipher since their troops could not learn to use more complicated ones. A trivial modern day implementation is called ROT13 (Rotation 13 – rotation by 13 positions). Since the English alphabet is 26 characters, ROT13 is its own inverse, i.e., the decryption and encryption algorithms are the same rule because of the choice of the key (13). ROT13 is used by people asking questions / puzzles, etc, on the Usenet. The standard Usenet client has ROT13 built-in. What can Eve – the adversary – do about breaking a substitution cipher being used by Alice and Bob? Cryptanalysis - analysis of a Cryptographic Scheme with the goal of breaking it – of a substitution Cipher is rather straightforward, and by analyzing a few instances of Ciphertext produced by a substitution cipher, a cryptanalyst can easily break the cipher. One of the simplest techniques that Eve can use for this purpose is Frequency Analysis. This is based on the fact that in any language some letters are used more than others. (In English the approximate sequence is ETAOIN SHRDLU for the top 12 characters). So by analyzing the frequency of the alphabets appearing in the Ciphertext, the Plaintext can be figured out. The earliest use of this technique was by the Arabs in the 9th century who realized that certain characters appear more frequently in the Koran than others. Several of the ciphers used in World War II were also broken using frequency analysis.
2.2 Transposition Ciphers Anagrams aside, the earliest example of a Transposition Cipher being used for a security purpose is a Scytale – Greek for a baton – a device used by the ancient Greeks to communicate messages during a military campaign. The working was really simple: a strip of paper was wound around a cylinder of a known diameter and a message written on the paper. The paper would then be peeled off and sent thru a messenger. The recipient would have a similar cylinder on which it would wind the paper again to read the message. Unless the recipient’s cylinder was of the same diameter, the letters would not come together properly and the message would not make sense. However, manual analysis of the letters is straightforward – letters after a specific distance need to be put together to make sense out of the message, and the strip of paper is a big giveaway.
2.3 Product Ciphers
2.4 Modes of Operation
2.4.1 Stream Ciphers The keystream is generated using some internal state in the algorithm. For higher security, this internal state needs to be updated continuously. Based on how this state change takes place, Stream Ciphers can be classified as: 1. Synchronous Stream Ciphers: The internal state of the cipher changes independent of Plaintext or Ciphertext. 2. Self-Synchronizing Stream Ciphers: The internal state changes based on the previous Ciphertext digits. Since the keystream itself is not really random, the bit sequence in the stream would repeat itself after a period. The longer the period, the more secure the algorithm.
2.4.2 Block Ciphers 1. Electronic Codebook (ECB): This is the simplest mode. The message is split into parts and each block is encrypted / decrypted separately. This is insecure and not recommended since it is subject to two simple attacks: 2. Cipher Block Chaining (CBC): In this mode, each block is XORed with the previous block before encryption. It is very commonly used. A variation on CBC is called Propagating Cipher Block Chaining (PCBC). This mode is used in Kerberos. 3. Cipher Feedback (CFB), Output Feedback (OFB), Counter (CTR): All of these modes convert a block-cipher into a stream-cipher by taking the output from the previous block. CFB converts a CBC into a self-synchronizing stream cipher. OFB and CTR makes a block cipher a synchronous stream cipher. All modes except ECB use output from the previous block to operate on the next block. This leads us to the question: How is the first block processed? To solve this problem, a dummy block, called an Initialization Vector (IV) is used. The IV need not be secret, but the same IV should never be used with the same key.
3. The Chimera of Perfect Secrecy 3.1 One Time Pad The problem with OTPs is key distribution – it is logistically tough and prone to security breaches. So while OTPs in themselves are perfectly secure, their implementations are vulnerable because of the insecurity involved in key distribution. OTPs are typically never used, except in highly critical applications where the parties involved are willing to undertake the key exchange challenge. The reason I brought up OTPs is to point out that perfect security – that can defeat unlimited computing power - is not always practical. Modern-day Cryptography does not chase this chimera. Its goal is not to provide security in the face of infinite computing capacity. Instead, it assumes that the adversary has (reasonably) limited resources. What makes modern-day Crypto work is the gap between the computational ability it takes the legitimate user to encrypt vs. the computational infeasibility of decryption for the adversary who does not have the secret. 3.2 Computational Complexity Theory 3.2.1 Algorithm Efficiency T(n) = n3 + n2 + 1 Where n = problem size. Now as n gets arbitrarily large, the value of (n2 + 1) becomes insignificant as compared to n3. So T(n) = n3 for very large n. Such an algorithm then has complexity of the “order of n cube”, and the algorithm is called a O(n3) algorithm. An algorithm is considered to run in polynomial time if T(n) = O(nk) where k is a constant. Example include • Constant time: O(1) [k = 0]. An algorithm running in polynomial time would be more efficient than an algorithm running in exponential time: T(n) = O(kn)
3.2.2 Complexity Classes
3.2.3 Class P Complexity
From the perspective of Complexity theory any algorithm that runs in polynomial time is considered as being efficient (or feasible), and any algorithm that does not run in polynomial time as being intractable or infeasible. So any computation in the class P-Complexity is considered efficient. However, this is only in theory, it may not be so in real life. For example if the value of k is 10000, T(n) = O(n10000) is certainly a very inefficient algorithm.
3.2.4 Class NP Complexity
Since the verifier function (IsFactor) is a class P algorithm, the problem of whether n is composite or not is a NP problem. All NP problems have a verifier function which accepts an input and a verification value, which can be computed in polynomial time. A number of problems are Class NP Complex, the most common being the Factoring problem and the traveling salesman problem. The hardest problems in Class NP complexity are said to be NP Complete. An example is the problem of multi-processor scheduling: Find the minimum time required to schedule n jobs (of varying lengths) on m processors such that no two jobs overlap.
3.2.5 Is P = NP? Whether P is equal to NP is one of the most important unsolved problems in modern mathematics and if it can be proven that all class NP problems are actually class P problems, the implications would be stunning, with a huge impact on our society (for starters, most of modern-day Cryptography would be rendered useless)
3.3 Information Leakage So from a Cryptography standpoint, when we look at computational feasibility, we need to look at it not just from the perspective of the entire data becoming available, but also how feasible it would be to recover part of the information.
4. Cryptographic Primitives
4.1 One Way Functions Today we have no rigorous mathematical proof that what we consider one-way functions, are indeed one-way. What that means, is that today we are not sure whether there exist inversion methods that have lower computational complexity than the ones we are aware of. Just because a lower-complexity inversion has not yet been discovered, does not mean that they cannot exist. So cryptography is built on assumptions. So far the assumptions have been valid.
4.2 Hash Functions Digital Fingerprint / Message Digest = HashFunction(Message) A hash function is obviously one-way. That’s because a hash function is lossy – the number of output bits is less than the number of input bits so there is not enough information in the output to calculate the input. However, this alone would not make hash functions useful. What makes hash functions useful is that they are collision resistant and create an Avalanche effect. A collision is a situation where the hash function generates the same output for two different inputs. A good hash function is collision resistant. Once again, if you consider the fact that the number of output bits is less than the number of input bits, it is obvious that there is no way of avoiding a collision. A simple way of approaching this is to think about the input as a 64-bit number and the output as a 16-bit number. Now there are 18,000 trillion 64-bit numbers (264 = 18,446,744,073,709,551,616) 64-bit numbers and 65,536 16-bit numbers. As we keep reducing each number in the 18,000 trillion-range to 65,536 numbers, there are bound to be collisions. So a hash function cannot be collision resistant. However, we can reduce the probability of collision if we choose a large enough output range and a good algorithm. For cryptographic purposes, we define collision resistance as: It should be computationally hard to find two inputs m1 and m2 so that their output h = HashingFunction(m1) = HashingFunction (m2). The other important property for a cryptographically secure hashing algorithm is that it should be Pre-Image Resistant: Given an output h, it should be computationally hard to find an input m so that h = HashingFunction(m). The other desirable property of a good hash function is what is called the Avalanche Effect. The idea is that if the input changes even by a small amount, the output changes dramatically. The Strict Avalanche Criterion states that if one bit is flipped, each of the output bits should change with a probability of 0.5, in other words, half of the output bits should flip.
4.3 Random Numbers 1. The key should be long. The longer the key, the stronger the encryption Point one is easy to understand, but point two is a little ambiguous. What is “more random?” Without getting into a philosophical discussion on randomness, I would just like to say that a random number is one which is generated in such a way that its probability of being generated is as high / low as any other number, and there is no way of determining when the number would be generated. Generation of random numbers depends on the initial input on which a mathematical calculation would be performed. But unless that initial condition has randomness in itself, the output would also not be in random. This initial randomness is called an Entropy Source. Based on the entropy source, Random Number Generators (RNGs) can be classified as: 1. RNGs based on Deterministic Random-bit Generators (DRBGs): DRBG means that the source of entropy is deterministic. Modern computers are all deterministic (as compared to quantum computers which are non-deterministic). If the entropy source is deterministic, the numbers generated cannot really be random. This means that a software-based generator does not generate truly random numbers. Hence we call the numbers generated by such a generator Pseudo-Random Numbers, and the generator itself is called a Pseudo-Random Number Generator (PRNG). All software based RNGs are PRNGs. 2. RNGs based on Non-Deterministic Random-bit Generators (NDRBGs): NDRBG means that the source of entropy is non-deterministic. Such RNGs are typically hardware based and use a physical phenomenon as a source of entropy. Examples of physical phenomena used by NDRBGs include photoelectric effect, thermal noise, or some other quantum phenomena. Quantum phenomena in theory are unpredictable and therefore a reliable source of entropy. We call the numbers generated by such a generator Truly-Random Numbers, and the generator itself is called a True-Random Number Generator (TRNG). TRNGs are rather expensive and are needed only for very high security scenarios. How does one find out if a PRNG is good enough to be used in Cryptography? A random number generator would typically generate numbers in a given range. This leads to two important properties: • The distribution of values of the numbers generated in the range. The more even the distribution, the better. In order to have these properties, a PRNG needs two things: 1. A random bit stream from a source of entropy. There is some entropy even in a deterministic computer. Examples include Clock Tick count, and values of System Handles (like Thread handle, Process handle, Network Card Id, etc.) For the second point, there is a standard by a body called FIPS (Federal Information Processing Standard) which can be found in FIPS 186-2. A function catering to the above two requirements is called a Cryptographically Secure Pseudo Random Generator (CRNG). Typical random number generators like the C++ rand() are not CRNG and they should not be used in Cryptography.
5. Confidentiality without using a Shared Secret
The only problem in this situation is the initial condition – the sharing of the secret key. Since the security of the scheme derives from the key (recall Kerckhoffs’ Principle), its sharing in a secure way is a pre-requisite. The question is how does one share the secret in a secure way without meeting in person? There has to be some way of transferring the key in a secure way too. But wouldn’t secure transmission of the key also involve some sort of encryption? And in that case, wouldn’t we need a key once more? The problem seems to be recursive. But we do know that the problem has been solved, and the key to solving the problem (pun unintended), lies in the symmetry of the situation.
5.1 Diffie-Hellman Key Exchange
Hence both Alice and Bob are able to arrive at a shared secret. (Of course in a real implementation, a, b and p will be very large numbers) What makes D-H work is the fact that gba = gab. This leads to: (ga MOD p) b MOD p = (gb MOD p) a MOD p But the big question is what prevents the Adversary Eve from also carrying out the same computations given that she also knows p and g? We need to give Alice and Bob some advantage over Eve for this shared secret to be limited to just Alice and Bob. The advantage Alice and Bob have is that they have direct knowledge of a (or b) while Eve is looking at ga (or gb) Eve needs to calculate gab given g, ga, and gb where g belongs to a group defined as (Integers MODULO prime). This is called the Diffie-Hellman Problem (DHP). A fast means of solving this problem would break Diffie-Hellman Key Exchange (and a number of related schemes). The fastest way of solving it today is to find ‘a’ given ga. This is called the Discrete Logarithm Problem (DLP). There is no known efficient algorithm to solve DLP as of today. Why is that so? We have no rigorous mathematical proof; just that no efficient algorithms have so far been discovered. But to understand the nature of the problem, let’s try and break this down further. Let’s start by thinking about what is known to Eve. When Alice sends across her first computation to Bob, Eve knows: 1. p Now given this information, Eve needs to calculate ‘a’. Note that MOD is not a symmetric operation. Meaning that given (x = y MOD z), and given x and z, you cannot carry out a mathematical operation that yields y. The values of y can be = (z * n) + x, where n is a natural number. Hence there is an infinite set of values that can be taken by y. Applied to our case, when Eve looks at (25 MOD 11), she is looking at 3. And she knows ‘p’ is 11. So ga can be = (11 * n) + 3 where n is a Natural number. Given that g is known to be 5, our equation becomes: 5a can be = (11 * n) + 3. Now Eve has the unenviable job of calculating a. How easy is this computation? A trivial approach would require Eve to run thru all powers of 5 and compare each with (11 * n) + 3 for several iterations of n. Are there multiple solutions? That depends on values for g, p. Let’s assume that there is a single solution only. But even if there is a single solution, if we choose p, a, and b to be very large integers (g can be small. It is usually 2 or 5), the calculation is a very tough one for Eve, and takes a lot more computational power than it would take Alice or Bob to calculate the Shared Secret. This asymmetry is what makes Diffie-Hellman work. Note: These days, it is recommended that p be at least 512-bits long. For more security, use 1024-bits.
5.2 RSA
Bob also does a similar thing. He also creates a private-public key pair and publishes the public key. Alice can now send messages to Bob by encrypting the messages with his public key. Only Bob would be able to decrypt the messages (using his private key). We have solved the problem of the secret key by removing the symmetry of the model. The interesting part is the algorithm itself which requires one key for encryption and another key for decryption. Again, the math involved is straightforward, just that the calculations involve some huge numbers which means that the computing power required is rather high. As usual, a cryptographic scheme works if the legitimate users can compute efficiently while the adversary’s computations are complex. So the questions we should ask for RSA are: We will not get into the details here, but take it for granted for now that: 1. Computation of the private key from the public key is computationally hard. No simple algorithms exist as of today and the inversion is infeasible. Of course this is an assumption based on knowledge that we have today. So far the assumption has been valid, and that is why RSA works! 2. It is not very efficient to encrypt / decrypt messages using asymmetric algorithms. For one, RSA calculations involve really long numbers. So while a typical symmetric algorithm would be secure with a 128-bit key, the minimum key-size required to achieve security in RSA would be 1024-bits. Obviously the computations on such large numbers would be really slow. Hence it is rather inefficient to encrypt / decrypt large amounts of data using RSA. So the way RSA based cryptography really works is that the only data that is encrypted using asymmetric encryption is the key for a symmetric encryption. Thus:
6. Integrity This is called Message Integrity. In fact, integrity is often a more critical requirement than confidentiality. We are ok if an adversary discovers our contents of our message, what we certainly do not want is the adversary modifying the contents. 6.1 Encryption does not give Integrity
Now though it seems correct, this argument is fallacious. For example, if Eve knows that Alice is responding to Bob’s message of whether she would marry him, Eve may simply flip the bits on the response being sent by Alice. Since the response was supposed to be a yes / no, flipping the bits of the Ciphertext may also flip the response. This does depend on the strength of the encryption algorithm, but the possibility cannot be discounted. So in principle we cannot assume that encryption provides integrity. Proving the fallacy of the “encryption provides integrity” argument in general requires that we look at some math. We will avoid that right now. Suffice to say that if Eve has some information about the nature of the data being sent, and the algorithm being used, she can modify parts of the Ciphertext in such a way so that the decryption of the Ciphertext actually yields wrong Plaintext which is meaningful to Bob. Since encryption does not provide Integrity, we need some additional mechanisms. There are two popular solutions: Machine Authentication Codes and Digital Signatures. 6.2 Machine Authentication Code
The Verification Algorithm may look like: While the mechanism is similar to that in symmetric encryption, the goals here are totally different. In symmetric encryption, the message being sent – the Ciphertext – contained the entire information of the Plaintext in an encrypted form. The way the Tag is different from the Ciphertext is that the Tag does not represent the complete information of the Plaintext and is a much smaller value, a number, which is computed using the Plaintext and the secret key K, and is sent along with the message (in Plaintext or Ciphertext). Note that symmetric encryption and MACing are not mutually exclusive. For the same Plaintext, an Encryption Algorithm, along with an Encryption Key K1 can generate Ciphertext, while a Tagging Algorithm along with a Tagging Key K2 can generate a Tag. So we can achieve both confidentiality and integrity. It is just that in this scenario, Alice and Bob have to share two secret keys. Now this does seem secure, but there are a couple of interesting things that can still happen. The simplest one is what is called a Replay Attack. In a replay attack, Eve watches a message M along with a tag T being sent by Alice to Bob. She picks a copy of the message from the wire and keeps it with her. Sometime later, she sends the same copy over to Bob. How can this become an attack? What if the message Alice sent was: “I borrowed Rs. 1000 from Eve. Please credit Rs. 1000 in Eve’s account.” Every time Bob receives this message, he transfers Rs. 1000 to Eve’s account. Alice wanted to send this message only once, but Eve carries out a replay attack and gets richer. Another example is that of a user sending over a username + password combination to a web-server and the server sending over a cookie to denote a successful login. If a hacker intercepts the packet being sent over to the server and later does a replay, the hacker will also get the cookie, without knowing the username and the password.
6.3 Digital Signature Recall from our discussion on RSA that we addressed the confidentiality requirement by having the receiver publish its public key. The sender would use this public key to encrypt the data and send it to the receiver who would decrypt it using its private key. Digital Signatures address the integrity requirement by having the sender publish its public key. To send a message, the sender generates a Signature using his private key and the message: Signature = SigningAlgorithm(Message, SendersPrivateKey) He then sends the Signature along with the message. The receiver uses the sender’s public key to verify whether the Signature is valid or not: Yes / No = VerificationAlgorithm(Message, Signature, SendersPublicKey) As you notice, Digital Signatures are similar in principle to MACs, except that the setting here is asymmetric. Once again, as with MACs, the confidentiality and integrity requirements are not mutually exclusive. We can achieve both. An important point of difference between MACs and Digital Signatures concerns non-Repudiation. When Alice sends a message to Bob, we don’t want Alice to be able to deny the fact that she actually sent the message and not someone else. MACs are not able to create Non-Repudiation. If Bob can verify a tagged message, he can also produce a tagged message (because the key is the same). So Bob cannot drag Alice to a court of law, produce a tagged message and claim it was sent by Alice. Digital Signatures on the other hand do provide Non-Repudiation. Since a Digital Signature can only be produced by the Private Key of the sender, the sender cannot claim that the receiver concocted the evidence.
Summary I have tried to cover most of what I think are essential concepts towards understanding cryptography. With this, one can now get started putting publically available crypto libraries to use. There are quite a few things I wanted to cover but have not been able to since this text grew too long and I basically got tired of working on this. These are about how typical algorithms work - Feistel networks, key schedules, S-box, P-box, etc., survey of the various standard algorithms – DES, 3DES, AES, MD5, SHA0, 1, 2, and choosing block sizes, IVs, key sizes, modes of operations, etc., and finally material on Authentication, Digital certificates, PKI, and Elliptic Curve Cryptography. Of course, even with all this info included, I would still have only scratched the surface since there is a ton more to be learned. The reading list below is therefore highly recommended:
|
|
||||||||||||||
|
|