Wednesday, February 20, 2008

3GPP Multimedia Telephony & OMA CPM

In this post, I will address two standardization initiatives that try to define the future of IMS-based communication services, and more specifically their support for Multimedia Communication.

As you will see, I have a soft spot for one of them (OMA Converged IP Messaging) and a strong bias against the other (3GPP Multimedia Telephony).

3GPP Multimedia Telephony

3GPP Multimedia Telephony (MMTel for short) appeared in 3GPP Release 7 and will evolve in the following releases. The initiative originates from ETSI TISPAN requirements for fixed networks.

The name always made me mad, but it is actually very telling about what MMTel is today.

The service is "multimedia" in the sense that it permits the combination and dynamic renegotiation of different media components within an IMS session. Sample components include full duplex voice, real-time video, text communication and file transfer (e.g. video clips, audio clips, pictures). Note that there is no explicit mention of applications, and the 3GPP requirements state that a "typical usage" is speech (voice), on its own or combined with other media components.

On the other hand, this is a "telephony" service, which seems to anchor its definition to the well-known, decades-old, voice-centric, circuit-switched telephony service. Basically, you could infer that Multimedia Telephony is an incremental evolution of Voice Telephony with the addition of new media components.

This raises several questions when specifying Multimedia Telephony:
- Is the service only related to person-to-person communication or can it apply to person-to-service interactions? I already showed an example of how a multimedia SIP session can be used in a person-to-service relationship, and you will see below that CPM supports it.
- Are current classical telephony supplementary services applicable in a new world where a session can be established with a service; where SIP URIs will eventually replace telephone numbers; where the protocol and the core network make it possible to reach the user on multiple devices; where enablers like group management and presence can change both users' communication behavior (e.g. is the user available for voice communication now? What are the alternatives?) and the way network-based call handling services can be implemented (using presence to make an informed decision); where users can negotiate the content of a session when establishing it; and where services can easily redirect users to web pages to let them decide how a call should be handled?

These questions are not purely theoretical when you consider the current state of MMTel standardization: it consists solely of specifications describing how classical telephony services like Originating Identification Presentation, Call Forwarding Unconditional or Communication Waiting can be implemented in IMS. The requirements specification clearly states that the behavior of each service, "as perceived by the user, should be consistent with the behavior perceived when using the equivalent services on PSTN/ISDN and CS Mobile networks." Isn't this crystal clear?

As such, at the moment MMTel looks like a vehicle to standardize voice-centric telephony services to be supported by a Telephony Application Server (TAS), which may eventually become a Multimedia Telephony Server when voice is not the only media component supported by a call.

The MMTel initiative can be associated with another 3GPP initiative called IMS Centralized Services (ICS), which aims to provide an IMS-based TAS for both CS telephony and IMS telephony. This approach would eventually deprecate existing IN servers in legacy networks and replace them with a TAS. Put together, both initiatives draw the following potential roadmap:
1) A TAS for IMS voice telephony
2) A TAS for both CS and IMS based voice telephony
3) A TAS to support IMS-based Multimedia Telephony
Basically, a TAS at the center of the future telecommunications network.

I personally do not believe at all in this incremental approach to Multimedia Communication, as SIP and IMS will drastically change the way people communicate, leaving plain old telephony as a separate service whose usage will gradually decline as people adopt a totally new way of communicating. A TAS should therefore be seen as an important box in an IMS network, but a box that will remain focused and essentially unchanged until it is decommissioned.

True Multimedia Communication is to be supported by something else, and this is...OMA CPM.

OMA Converged IP Messaging (CPM)

OMA Converged IP Messaging is currently being specified within OMA (the Open Mobile Alliance). In my opinion, it is by far the most significant step to date towards the optimal exploitation of IMS capabilities that I have described in the past.

The name of this enabler may be misleading, as it seems to imply that CPM is a pure messaging service, while it is not.

Actually, CPM can be defined as a composite specification addressing two concerns:
- Full IMS messaging (OMA SIMPLE IM), including page mode and session-based approaches, as well as transparent interworking with legacy mobile messaging services (SMS, MMS, OMA IMPS). This could certainly be extended to other messaging services like email or Jabber/XMPP in either future versions of the specification or in smart implementations.
- Multimedia communication as I had the opportunity to describe it in the past.

Let's start with the messaging features.

Page mode messaging, which is based on the SIP MESSAGE method, can be seen as the SIP/IMS equivalent of SMS. A user can send a page mode short message to another user or to a group of users. The message is either delivered instantly over the IMS core network or stored for deferred delivery if the recipient is not available.
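To give an idea of what actually travels over the IMS core in page mode, here is a minimal sketch that assembles a SIP MESSAGE request along the lines of RFC 3428. The addresses, the host name and the helper name `build_page_mode_message` are hypothetical, and a real IMS client would add further headers (routing, asserted identity, etc.) that are left out here.

```python
import uuid


def build_page_mode_message(sender: str, recipient: str, text: str,
                            via_host: str = "client.example.com") -> str:
    """Build a minimal SIP MESSAGE request carrying one short text.

    Sketch only: IMS-specific headers and routing through the P-CSCF are omitted.
    """
    body = text.encode("utf-8")
    headers = [
        f"MESSAGE {recipient} SIP/2.0",
        # RFC 3261 branch parameters start with the magic cookie z9hG4bK
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK{uuid.uuid4().hex[:10]}",
        "Max-Forwards: 70",
        f"From: <{sender}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{recipient}>",
        f"Call-ID: {uuid.uuid4().hex}@{via_host}",
        "CSeq: 1 MESSAGE",
        "Content-Type: text/plain;charset=UTF-8",
        # Content-Length counts bytes, not characters
        f"Content-Length: {len(body)}",
    ]
    # SIP uses CRLF line endings and an empty line between headers and body
    return "\r\n".join(headers) + "\r\n\r\n" + text


if __name__ == "__main__":
    print(build_page_mode_message("sip:alice@example.com",
                                  "sip:bob@example.com", "See you at 8?"))
```

The whole message sits in one SIP request, which is exactly what makes page mode simple, and also what limits it to short texts.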

Session-based messaging is based on a SIP session, with messages delivered through a protocol called MSRP. Session-based messaging essentially serves two requirements that page mode messaging cannot fulfill: the support of chats, in which messages are exchanged between two or more parties within a dialog context, and the possibility to send large files like music or video clips. In addition, session-based messaging has two important benefits: it confines SIP control traffic to session management and uses a dedicated protocol for the messages themselves, allowing appropriate support to be implemented in the network; and, by reusing the concept of SIP session, it allows messaging to be one component among others in a multimedia session.
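To make the "large files" point more concrete, the following sketch frames a payload as a series of MSRP SEND requests following the RFC 4975 wire format: each chunk carries a Byte-Range header, and the end-line flag is `+` while more chunks follow and `$` on the last one. The MSRP paths, the chunk size and the function name are illustrative assumptions; a real implementation would also handle responses, REPORT requests and the underlying TCP/TLS connection.

```python
import secrets


def msrp_send_chunks(to_path: str, from_path: str, payload: bytes,
                     content_type: str, chunk_size: int = 2048) -> list:
    """Split one message into MSRP SEND requests (RFC 4975 framing sketch)."""
    message_id = secrets.token_hex(6)  # same Message-ID for every chunk
    total = len(payload)
    requests = []
    for start in range(0, total, chunk_size):
        chunk = payload[start:start + chunk_size]
        tid = secrets.token_hex(4)  # new transaction identifier per chunk
        end = start + len(chunk)
        flag = "$" if end == total else "+"  # '$' = complete, '+' = more to come
        head = (
            f"MSRP {tid} SEND\r\n"
            f"To-Path: {to_path}\r\n"
            f"From-Path: {from_path}\r\n"
            f"Message-ID: {message_id}\r\n"
            f"Byte-Range: {start + 1}-{end}/{total}\r\n"  # 1-based, inclusive
            f"Content-Type: {content_type}\r\n\r\n"
        )
        # A real sender must also check that the chunk never contains the end-line.
        requests.append(head.encode() + chunk + f"\r\n-------{tid}{flag}\r\n".encode())
    return requests


if __name__ == "__main__":
    reqs = msrp_send_chunks("msrp://bob.example.com:7777/s2;tcp",
                            "msrp://alice.example.com:7654/s1;tcp",
                            b"x" * 5000, "video/mp4")
    print(len(reqs))
```

Because the chunks travel on the MSRP connection negotiated in the session, none of this data crosses the SIP proxies of the IMS core.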

Interworking with legacy messaging services like SMS and MMS makes CPM attractive from the start, as it does not rely only on the initially limited IMS community to deliver its services.

The scope of multimedia communication supported by CPM is very broad.

CPM will support a wide range of discrete (e.g. messaging, files, applications) and continuous (e.g. full duplex voice, video) components. It supports the negotiation of initial components in a session and the dynamic renegotiation of components during the session, and it does not require any specific component to be part of the session (for instance, a messaging component is not mandatory despite the name of the service). The session can be either person-to-person (2-party or multiparty) or person-to-service (and obviously person-to-person&service).
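At the SDP level, component negotiation could look roughly like the sketch below: the initial offer carries voice plus an MSRP messaging component, and a later re-offer adds video and disables messaging by setting its port to 0 (under the SDP offer/answer rules of RFC 3264, media lines are disabled this way rather than deleted, and new ones are appended). The addresses, ports, codecs and helper names are illustrative assumptions, not values taken from the CPM specifications.

```python
from dataclasses import dataclass, field


@dataclass
class Media:
    kind: str      # "audio", "video" or "message"
    port: int      # port 0 disables the component in a re-offer
    proto: str     # e.g. "RTP/AVP" or "TCP/MSRP"
    fmts: str      # payload formats ("*" for MSRP)
    attrs: list = field(default_factory=list)


def build_sdp(ip: str, version: int, media: list) -> str:
    """Serialize a minimal SDP body; the o= version must grow on each re-offer."""
    lines = ["v=0", f"o=- 2890844526 {version} IN IP4 {ip}", "s=-",
             f"c=IN IP4 {ip}", "t=0 0"]
    for m in media:
        lines.append(f"m={m.kind} {m.port} {m.proto} {m.fmts}")
        lines.extend(f"a={a}" for a in m.attrs)
    return "\r\n".join(lines) + "\r\n"


IP = "192.0.2.10"
voice = Media("audio", 49170, "RTP/AVP", "0", ["rtpmap:0 PCMU/8000"])
chat = Media("message", 7654, "TCP/MSRP", "*",
             ["accept-types:text/plain", f"path:msrp://{IP}:7654/s111;tcp"])
# Initial session: voice + messaging
offer = build_sdp(IP, 1, [voice, chat])

# Renegotiation: keep voice, disable messaging (port 0), append video
video = Media("video", 51372, "RTP/AVP", "96", ["rtpmap:96 H264/90000"])
chat_off = Media("message", 0, "TCP/MSRP", "*")
reoffer = build_sdp(IP, 2, [voice, chat_off, video])
```

Nothing in the SDP forces a particular component to be present, which matches the CPM stance that even messaging is optional in a session.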

CPM supports a multi-device approach, making use of SIP convergence capabilities. It allows a user to share an identity between several devices and to use several identities per device. A user may have a CPM session shared between several of his or her devices on a per-media basis (e.g. video sharing on TV and voice on mobile). Session mobility between devices is also supported (e.g. an ongoing session transferred from TV to mobile). The user can also define preferences on how CPM should handle communication according to the user's devices (e.g. IMs should be sent only to the mobile).
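As a toy illustration of per-media device preferences, the sketch below decides which of a user's devices should be offered each component of an incoming session: devices able to handle the media, narrowed by the user's preference for that media type. The device names, capability sets, preference format and function name are all hypothetical; the CPM specifications do not prescribe this model.

```python
# Hypothetical model: device names, capabilities and the preference format
# are illustrative assumptions, not taken from the CPM specifications.
DEVICES = {
    "mobile": {"audio", "message"},
    "tv": {"audio", "video"},
    "pc": {"audio", "video", "message"},
}

# "IMs only to the mobile", "video only to the TV"
PREFS = {"message": {"mobile"}, "video": {"tv"}}


def route_components(components, devices, prefs):
    """Map each media component to the devices that should be offered it."""
    routing = {}
    for comp in components:
        capable = {name for name, caps in devices.items() if comp in caps}
        allowed = prefs.get(comp)  # None means no preference for this media type
        chosen = (capable & allowed) if allowed is not None else capable
        routing[comp] = sorted(chosen)
    return routing


if __name__ == "__main__":
    print(route_components(["audio", "video", "message"], DEVICES, PREFS))
```

With these example preferences, voice may ring on every capable device while video lands only on the TV and IMs only on the mobile, i.e. one session split across devices on a per-media basis.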

In summary, CPM supports the axes I defined for an optimal exploitation of IMS capabilities: multimedia communication and user-oriented convergence.

CPM supports conferencing through a variety of means supported by SIP and already used in services like PoC or IMS Messaging: sessions to ad-hoc groups (a user sends an INVITE to a group explicitly defined in the INVITE), to pre-defined groups (the INVITE is sent to a PSI identifying a group whose definition is stored in the network), and by adding participants to a session on the fly.
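For the ad-hoc case, one standard way to carry the participant list inside the INVITE is the request-contained list mechanism of RFC 5366: a body part of type `application/resource-lists+xml` (the RFC 4826 format) marked with `Content-Disposition: recipient-list`. The sketch below only builds that body part. The participant URIs and the function name are hypothetical, and whether a given CPM release uses exactly this encoding is an assumption.

```python
import xml.etree.ElementTree as ET

RL_NS = "urn:ietf:params:xml:ns:resource-lists"


def recipient_list_part(participants):
    """Return (headers, xml) for the recipient-list body part of an ad-hoc group INVITE."""
    ET.register_namespace("", RL_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{RL_NS}}}resource-lists")
    lst = ET.SubElement(root, f"{{{RL_NS}}}list")
    for uri in participants:
        ET.SubElement(lst, f"{{{RL_NS}}}entry", uri=uri)
    headers = [
        "Content-Type: application/resource-lists+xml",
        "Content-Disposition: recipient-list",
    ]
    return headers, ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    hdrs, xml_body = recipient_list_part(["sip:bob@example.com",
                                          "sip:carol@example.com"])
    print("\n".join(hdrs))
    print(xml_body)
```

For a pre-defined group, no such list is needed: the INVITE's Request-URI is simply the PSI of the group, and the network expands it from the stored group definition.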

Interworking with non-CPM communication means is not limited to messaging. It also includes interworking with non-IMS voice, such as circuit-switched voice. This will be done by reusing current IMS/CS interworking components.

In addition, CPM supports a converged address book, converged in the sense that it is aimed at being shared by all the devices owned by the user. It also has an archiving functionality, able to store such things as messages, media objects and session histories. It interacts with presence in order to publish or access presence information for a user. CPM will also support web services towards applications willing to use its services. An interesting feature is the possibility for users not to divulge their identity when using CPM, for instance by using a nickname.

The CPM architecture is quite straightforward:
- A CPM Client in the device supports CPM from a user perspective.
- A Converged Address Book component supports the address book for multiple devices.
- A Message and Media Storage component archives everything that needs to be archived.
- An Interworking Function supports interworking with non-CPM messaging solutions (voice interworking is supported through the IMS core network).
- A CPM User Prefs component interfaces with the user for CPM customization.
- The CPM Conversation Server supports multimedia sessions, and in implementations it should look very similar to what I described here. Like PoC and IMS Messaging, it will be subdivided into a Participating Function dedicated to a specific user in the session (called Personal Multimedia Controller in my post on the subject) and a Controlling Function supporting features shared by multiple users, like conferencing (I called it Multimedia Focus in the same post). These SIP-centric components will control media intermediaries for individual components of the multimedia session (I called them Media Mixers). Note that the specification does not mandate network intermediaries for every component, as it allows peer-to-peer media flows between devices when the operator's policies permit it.

Building block standardization approaches

Both specifications can claim (and actually do) a building block approach to standardization. However, there are significant differences between them.

CPM's approach is IMS-centric, as it intends to reuse IMS enablers like interworking with the circuit-switched network for voice, presence and group management, and to generalize and extend the architecture patterns that were introduced with the specification of Push to Talk over Cellular and later reused for IMS Messaging (OMA SIMPLE IM).

On the other hand, Multimedia Telephony's approach is pre-IMS centric. The goal is to create within IMS building blocks which mimic pre-IMS voice-centric telephony, with the hope (illusion?) that a multimedia communication experience can be created through the reuse of these basic building blocks that totally ignore the capabilities of IMS.

Readers can form their own opinion on the advantages and drawbacks of both approaches.

For my part, if I were an operator, I would be very suspicious of suppliers coming with the story of a Multimedia Telephony server defined as an extension of a short-term Telephony Application Server, and would ask them how this product is positioned with regard to CPM.

On the other hand, I would require potential suppliers of IMS Messaging products to provide a CPM-ready solution with a clear roadmap. Considering that IMS Messaging is a brand new IMS specification, it may be a big mistake to purchase, in 2008 or 2009, a solution that has no clear path towards the next big thing: CPM.

Links to publicly available specifications:
OMA CPM: none yet, as this is work in progress
Multimedia Telephony Requirements: TS 22.173
Multimedia Telephony Architecture: section 4.16 in TS 23.228 (the AS for Multimedia Telephony is unambiguously called TAS)
Multimedia Telephony Protocol Details: TS 24.173
Architecture for IMS Centralized Services: TS 23.292


Tuesday, February 12, 2008

A Building Block Approach to Standardization

For decades, the telecommunications industry has standardized solutions from A to Z, with little if any reuse of existing specifications when creating new ones. The progressive migration from circuit switched to IP based services did not initially change this much: MMS or OMA IMPS (which I use as an example in this post) are typical examples of telco-specific standards based on a loose reuse of IETF ones (SMTP for MMS, HTTP for OMA IMPS).

This has changed with IMS, and more specifically with its SIP component. 3GPP and the IETF collaborate with each other, and the extensions to the SIP protocol required by IMS are under the control of the IETF.

By importing IETF specifications into telecom standards, 3GPP implicitly accepted the building block approach to specifications that is commonplace in the Internet domain. In this post I will try to describe this approach and its benefits.

Building block standardization of SIP

SIP is a textbook example of a building block approach to standardization. The people and groups in charge of specifying SIP constantly try to apply the following rules:
- Do not reinvent the wheel. Reuse and adapt existing specifications if they fulfill your requirements. Only create when needed.

- Make everything as generic as possible. Even if your requirements are very precise, try to make your solution generic enough to be reused for other requirements.

Here follow some examples of how this was applied to SIP standardization:

- SIP sessions make use of the Session Description Protocol (SDP), which was specified prior to SIP. In effect, it is possible to use SDP without SIP.

- SIP SUBSCRIBE and NOTIFY methods were initially created to support a very specific requirement, actually related to the telecom domain (the support of the telephony Automatic Call Back service with SIP). However, it was decided to make the concept a generic and extensible means to distribute event notifications in a SIP network through event packages (see the first draft for SUBSCRIBE/NOTIFY here). When a part of the IETF community decided to support presence through SIP, they simply had to reuse the event package specification and create two presence-specific event packages. While the requirement was initially very specific, it gave birth to a concept that is fundamental for SIP and constantly evolving through the creation of new event packages. It is actually remarkable that a telephony-related requirement led to a SIP concept that opens the door to a large variety of non-telephony applications of the protocol.
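The genericity of event packages can be sketched with a toy notifier: the SUBSCRIBE/NOTIFY machinery is written once, and each event package only contributes a name and the format of its notification bodies. The class and method names are hypothetical and the "NOTIFY" is reduced to a dictionary; the package names and content types are those of the presence (RFC 3856) and message-summary (RFC 3842) packages, and 489 is SIP's "Bad Event" response.

```python
from collections import defaultdict


class EventNotifier:
    """Toy notifier: the subscription machinery is shared, and an event
    package only contributes its name and the format of its NOTIFY bodies."""

    def __init__(self):
        self.packages = {}                # package name -> body content type
        self.watchers = defaultdict(set)  # (package, resource) -> watcher URIs

    def register_package(self, name, content_type):
        self.packages[name] = content_type

    def subscribe(self, watcher, resource, event):
        """Handle a SUBSCRIBE; returns the SIP response code."""
        if event not in self.packages:
            return 489  # Bad Event: package not supported
        self.watchers[(event, resource)].add(watcher)
        return 200

    def notify(self, resource, event, body):
        """Build one simplified NOTIFY per watcher of (event, resource)."""
        ctype = self.packages[event]
        return [{"to": w, "Event": event, "Content-Type": ctype, "body": body}
                for w in sorted(self.watchers[(event, resource)])]


notifier = EventNotifier()
# Adding a new use of SUBSCRIBE/NOTIFY is just registering another package.
notifier.register_package("presence", "application/pidf+xml")
notifier.register_package("message-summary", "application/simple-message-summary")
```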

- In the Instant Messaging (IM) area, presence was initially no more than a single state, describing if a recipient could accept an IM. The IETF decision to support presence through the inclusion of an XML document in the body of SIP methods, and allowing extensions to the basic schema, permitted the definition of presence to be gradually extended to become a large set of information about users (or services), their communication means, terminals and applications.
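As an illustration of this extensibility, the sketch below builds a minimal PIDF document (RFC 3863): the basic `open`/`closed` status, plus an extension element from another XML namespace placed inside the tuple. Watchers that do not understand the extension can simply ignore it. The extension namespace `urn:example:params:xml:ns:device-info`, its `device` element and the function name are hypothetical; real deployments use standardized extensions such as RPID.

```python
import xml.etree.ElementTree as ET

PIDF = "urn:ietf:params:xml:ns:pidf"
EXT = "urn:example:params:xml:ns:device-info"  # hypothetical extension namespace


def pidf_document(entity, basic, contact, device=None):
    """Build a minimal PIDF presence document, optionally with an extension element."""
    ET.register_namespace("", PIDF)
    ET.register_namespace("ext", EXT)
    root = ET.Element(f"{{{PIDF}}}presence", entity=entity)
    tup = ET.SubElement(root, f"{{{PIDF}}}tuple", id="t1")
    status = ET.SubElement(tup, f"{{{PIDF}}}status")
    ET.SubElement(status, f"{{{PIDF}}}basic").text = basic  # "open" or "closed"
    if device is not None:
        # Extension element from another namespace, placed after <status>
        ET.SubElement(tup, f"{{{EXT}}}device").text = device
    ET.SubElement(tup, f"{{{PIDF}}}contact").text = contact
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(pidf_document("pres:bob@example.com", "open",
                        "sip:bob@example.com", device="mobile"))
```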

- SIP PUBLISH was initially created specifically for a client to remotely update presence information. The first versions of the draft were tightly linked to the presence event package and made it impossible to reuse PUBLISH in different contexts (see the very first draft here). However, the IETF community rapidly ensured that PUBLISH could be reused for all existing and future event packages. PUBLISH therefore contributed to the enrichment of SIP-based presence, but at the same time a requirement initially scoped to presence contributed to the enrichment of the whole SIP protocol.
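The package-agnostic shape of PUBLISH, as it ended up in RFC 3903, can be sketched as follows: the Event header selects the package, the body carries that package's state, and later refreshes reference the publication through its entity-tag in SIP-If-Match. The request is deliberately abridged (no Via, Call-ID, Max-Forwards or From tag), and the function name and entity-tag value are hypothetical.

```python
def build_publish(aor, event, expires, body=None, content_type=None,
                  etag=None, cseq=1):
    """Build an abridged PUBLISH request for any event package.

    Via, Call-ID, Max-Forwards and the From tag are omitted for brevity.
    """
    lines = [
        f"PUBLISH {aor} SIP/2.0",
        f"From: <{aor}>",
        f"To: <{aor}>",
        f"CSeq: {cseq} PUBLISH",
        f"Event: {event}",  # selects the package; the method itself is generic
        f"Expires: {expires}",
    ]
    if etag is not None:
        # Refresh or modify an existing publication identified by its entity-tag
        lines.append(f"SIP-If-Match: {etag}")
    payload = body or ""
    if payload:
        lines.append(f"Content-Type: {content_type}")
    lines.append(f"Content-Length: {len(payload.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + payload


if __name__ == "__main__":
    # Initial presence publication, then a body-less refresh
    print(build_publish("sip:bob@example.com", "presence", 3600,
                        "<presence/>", "application/pidf+xml"))
    print(build_publish("sip:bob@example.com", "presence", 3600,
                        etag="dx200xyz", cseq=2))
```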

- Instant Messaging through SIP was initially supported only through the creation of a new SIP method: MESSAGE. However, it rapidly emerged that this approach was far from optimal to support all potential requirements associated with instant messaging: the concept of chat, which embeds IMs in a specific dialog context; the need to potentially exchange large documents via IM (e.g. a video file), while SIP is a control protocol and not a transport one like HTTP; and the need to support potentially high IM traffic, while a SIP infrastructure might not have been implemented with this purpose in mind. It took time and several attempts for the IETF community to address these requirements, and the final decision was to reuse the concept of SIP session as well as another protocol to transport IMs within the session. As a protocol like HTTP was not optimal to support the requirements for this IM transport protocol, it was decided to specify a new one called MSRP. This decision makes comparisons between Jabber/XMPP and SIP for IM support very biased. Maybe Jabber/XMPP is a better protocol than SIP for IM. However, Jabber/XMPP was initially specified and optimized for IM, making its extension to, say, VoIP far from straightforward. On the other hand, in a SIP context, IM can be perceived as one communication component among others in a multimedia session.
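The practical consequence of this split can be sketched as a simple decision rule on the client side: short, standalone IMs go as a single MESSAGE, while chats and large content go through a SIP session with MSRP. The 1300-byte threshold reflects RFC 3428, which limits MESSAGE requests sent outside a media session to 1300 bytes unless the sender knows the path is congestion-safe; the rule itself and the function name are a hypothetical simplification, not a normative algorithm.

```python
# RFC 3428 size limit for MESSAGE requests outside a media session
# (unless the path is known to be congestion-safe).
PAGE_MODE_LIMIT = 1300  # bytes, applies to the whole SIP request


def choose_im_mode(request_size: int, in_chat: bool) -> str:
    """Pick how to deliver one IM under the simplified rule described above."""
    if in_chat or request_size > PAGE_MODE_LIMIT:
        return "session"  # SIP INVITE establishing an MSRP session
    return "page"         # single SIP MESSAGE request


if __name__ == "__main__":
    print(choose_im_mode(400, in_chat=False))
    print(choose_im_mode(2_000_000, in_chat=False))
```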

OMA IMPS vs. IETF Presence and IM

The vertical standardization mindset that still prevailed a few years ago in the telecom community can be illustrated with OMA IMPS (initially called Wireless Village), a mobile specification to support instant messaging, chat rooms and presence.

Instead of reusing IM and presence related protocols available in the Internet, the Wireless Village group decided to specify a client to server protocol and a server to server protocol that would be specific to the mobile telecom domain, just reusing HTTP as a semantic-less transport protocol for OMA IMPS commands.

The group also decided to define IM, chat rooms and presence as tightly coupled together from a protocol and an architecture perspective, and to tightly link presence information to the mobile context.

In order to support its requirements, the Wireless Village group had to define various kinds of user lists (or groups) serving different purposes. Instead of creating a generic user group concept, they decided that each group fulfilling a specific purpose was a distinct object. Consequently, each group object led to a set of specific commands in the protocol, for creating/deleting the group, adding/removing elements to it, etc. With such an approach, if you define, say 4 types of user groups and 6 management commands, you end up with 24 distinct commands in the protocol.
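The arithmetic behind this explosion is easy to make concrete. The sketch below contrasts a vertical design, where every (group type, operation) pair becomes a distinct protocol command, with a generic design, where the group type is just a parameter. The group type and operation names are illustrative placeholders, not the actual OMA IMPS commands.

```python
from itertools import product

# Illustrative names only, not the actual OMA IMPS group types or commands.
GROUP_TYPES = ["contact_list", "block_list", "grant_list", "chat_group"]
OPERATIONS = ["create", "delete", "add_member", "remove_member", "get", "update"]


def vertical_commands(group_types, operations):
    # One distinct protocol command per (group type, operation) pair
    return [f"{op}_{g}" for g, op in product(group_types, operations)]


def generic_commands(operations):
    # One command per operation; the group type is just data in the request
    return [f"{op}(group_type, ...)" for op in operations]


if __name__ == "__main__":
    print(len(vertical_commands(GROUP_TYPES, OPERATIONS)))  # 24
    print(len(generic_commands(OPERATIONS)))                # 6
```

Adding a fifth group type brings the vertical design to 30 commands, while the generic one stays at 6.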

In comparison, to address similar objectives, the IETF decided to decouple various concerns.
While presence is a concept that originated in an IM context, the IETF decoupled the two, allowing each to evolve independently and thus allowing presence to apply to a much broader scope than IM alone.

By reusing the SIP session concept for session-based IM, the IETF permitted both the implementation of IM-specific systems, and multimedia systems using IM as one component among others in a SIP session.

The approach to user groups and their management, specified in the RFCs related to XCAP, followed the same principle:
- A user group is a user group, no matter what it is used for. The same user group can serve different purposes, and the set of applications for user groups is not arbitrarily bounded.
- A user group is user data, and there might be other user data that require similar access and management. No need to specialize access and management methods to user groups.
Consequently, XCAP is an HTTP-based protocol defining a few data management methods. The data itself is specified in XML, and there are specifications for data representing user groups. As one of the requirements associated with data management was to be able to notify a user about changes made to data, the IETF decided to use a SIP event package. In effect, the IETF specifications for user data management rely on the joint usage of XCAP and SIP.
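To show how generic XCAP is, the sketch below builds XCAP URIs following the RFC 4825 layout: the XCAP root, the application usage (AUID), the user's tree and document name, then an optional node selector after `~~` that addresses a single element inside the XML document. Plain HTTP GET, PUT and DELETE on such URIs read, create or replace, and remove the data, whatever that data represents; change notifications are carried by the SIP `xcap-diff` event package (RFC 5875). The host name, user and list name are hypothetical.

```python
from urllib.parse import quote


def xcap_uri(xcap_root, auid, xui, document, node_selector=None):
    """Build an XCAP URI: <root>/<auid>/users/<xui>/<document>[/~~/<node selector>]."""
    uri = f"{xcap_root}/{auid}/users/{quote(xui, safe=':@')}/{document}"
    if node_selector:
        # Characters such as [ ] and " must be percent-encoded in the node selector
        uri += "/~~/" + quote(node_selector, safe="/:@=")
    return uri


ROOT = "http://xcap.example.com/xcap-root"
# The whole resource-lists document of a user (GET to read, PUT to replace)
doc_uri = xcap_uri(ROOT, "resource-lists", "sip:joe@example.com", "index")
# A single <list> element inside that document
list_uri = xcap_uri(ROOT, "resource-lists", "sip:joe@example.com", "index",
                    'resource-lists/list[@name="friends"]')
```

Only the AUID and the XML schema change from one kind of user data to another; the access methods stay the same.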

Building block standardization approach in IMS

The building block mindset has spread to both IMS and non-IMS standardization within 3GPP.

For instance, despite a terminology which is heavily related to SIP sessions (e.g. CSCF - Call Session Control Function), the IMS core network can be seen as a SIP connectivity network able to route SIP signaling, whether it is session-related or not, within an IMS domain, across IMS domains, and between IMS and non-IMS SIP domains.

In this context, IMS Presence, Messaging and Chat Rooms are implemented as independent applications that sit on top of the IMS core network and make use of it. Once again, the comparison with OMA IMPS is quite interesting:
- OMA IMPS specifications lead to an implementation based on a network of IMPS servers over the mobile IP network. An IMS implementation relies on deploying application servers on top of an IMS core network. The IMS core network directly supports some of the requirements that are supported vertically in the OMA IMPS specifications (and implementations), like user authentication or routing and interfacing between various operators' OMA IMPS networks.
- OMA IMPS specifications tightly link the concepts of IM, presence and user groups. On the other hand, IMS specifications treat each of them as independent enablers which can be used together or in different contexts.
- OMA IMPS specifications were totally under the control of the Wireless Village group, and then OMA. On the other hand, by reusing IETF specifications, IMS specifications directly benefit from the evolutions performed in the IETF community, including some originating from people and companies which do not belong in the telecom or IMS domains.

A quite similar comparison can be applied to MMS and the equivalent support through IMS messaging.

Another interesting example is the 3GPP Generic User Profile (GUP) specification, which provides centralized and homogeneous access to user data actually residing in various locations (e.g. HLR, HSS, AuC, application servers) and normally accessed through a variety of protocols (e.g. MAP, Diameter, LDAP). At the beginning of the erratic standardization process for GUP, 3GPP intended to standardize a specific GUP protocol as well as a specific GUP schema to describe user data. Later on, it was decided to align with the Liberty Alliance specifications, which define web services allowing 3rd party service providers to access user data owned by the network operator. As a consequence, GUP can be used directly as the means to access user data in network databases in support of the Liberty Alliance web services exposed to 3rd parties.

The GUP specifications were also made generic enough to clearly distinguish between the methods used to access and manage user data and the data itself, which needs to be specified by instantiating and extending the generic GUP schema. On the other hand, the GUP specifications also include a SOAP-based user data modification notification mechanism, which duplicates what SIP event packages can and do support for XCAP. However, one can argue that the usage scope of GUP is broader than IMS and cannot rely on a protocol that 3GPP only uses in the context of IMS.

Some advantages associated to building block standardization

Reusing existing specifications instead of defining new ones from scratch speeds up the standardization process.

A protocol component or an application performing a generic task can be implemented once and reused several times, leading to faster development and validation.

In some cases, building blocks can be rearranged with others to create new solutions. I gave the example of session-based messaging which, by applying the concept of SIP session to instant messaging, makes it possible to integrate IM as one component among others in a multimedia SIP session.