Tuesday, May 15, 2007
Enabling Multimedia Communication
In my first post about multimedia communication, I mentioned three challenges to be addressed by operators. Today, I will write about the very first one: making multimedia communication possible.
As I wrote in the last post, the architectural problem is in the application layer. Monolithic application servers, which can only accept or support one or two media in a SIP session, add more obstacles than value to the fulfillment of a powerful IMS application layer.
While one option is to bypass the IMS application layer and allow peer-to-peer multimedia communication between endpoints, it does not permit the operator to add much value compared to a pure Internet alternative (besides core network aspects such as QoS). Nor does it give the operator much control over the content of end-to-end sessions. It is therefore necessary to design an IMS application layer that can support multimedia sessions.
Separation of SIP and Media Control
The approach is straightforward: it is necessary to extract SIP session control logic from monolithic application servers (e.g. push to talk, session-based messaging) and to establish a one-to-many relationship between this SIP-centric application and the various media manipulation entities. This relationship should be dynamic and depend on the media components that are part of a session at a given time.
The SIP-centric application is implemented on a SIP Application Server. Its functionality differs from that of an S-CSCF because it supports added-value and/or control features that are not part of a standard S-CSCF. This functionality implies control of media manipulation entities, user-specific features (e.g. preferences, added-value features, controls), and possibly HTTP-based interactions with the user(s) (e.g. web pages, XCAP).
In the 3GPP architecture, a SIP Application Server controlling multimedia sessions can be likened to a combined AS/MRFC (Media Resource Function Controller), and the media entities to a set of specialized MRFPs (Media Resource Function Processors). This one-to-many relationship does not exist in the MRFC/MRFP architecture, as the MRF was initially introduced as an entity dedicated to a single media: voice.
The required architecture actually exists in RFC 4353, which describes the framework for SIP conferencing (i.e. multiparty sessions). In the RFC, the SIP entity is called the focus and the media manipulation entities are called mixers.
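To make the dynamic one-to-many relationship more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class name SessionController, the method update_media and the entity names are hypothetical and do not come from any specification. The idea is simply that the SIP-centric application keeps, for each session, the set of media entities matching the media components currently negotiated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionController:
    """Hypothetical SIP-centric application state for one session."""

    # Media entity available in the network for each media type (assumed names).
    available_entities: dict[str, str]
    # Media entities currently attached to the session, keyed by media type.
    attached: dict[str, str] = field(default_factory=dict)

    def update_media(self, negotiated: set[str]) -> None:
        """Attach or detach media entities to match the negotiated media."""
        # Detach entities for media that are no longer part of the session.
        for media in list(self.attached):
            if media not in negotiated:
                del self.attached[media]
        # Attach an entity for each negotiated media that the network handles.
        for media in negotiated:
            if media in self.available_entities and media not in self.attached:
                self.attached[media] = self.available_entities[media]
```

Calling update_media after each INVITE or re-INVITE would keep the relationship in line with the session: a media with no entity in available_entities is simply left to the endpoints.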
Personal and Shared Multimedia Communication Entities
Both the 3GPP MRF and RFC 4353 architectures support multiparty sessions, and in particular the need to mix media from different sources when more than two parties are part of a session. This is what I called a shared service in a previous post, as the conference server provides support to all the participants in the conference. In the remainder of this post, I will use the terms multimedia focus and media mixer for entities related to shared multimedia communication.
There can also be personalized multimedia communication support for each participant of a multimedia session, whether this is a 2-party or a multiparty session. This provides, if needed, individualized control and added-value features for each user. The SIP control entity for a user, which I will call the personal multimedia controller, is inserted in the SIP session through the service profile (initial filter criteria) associated with the user identity.
As for the shared multimedia communication entities, there is a dynamic one-to-many relationship between the personal multimedia controller and the entities manipulating the media components, which I will call media intermediaries (as the media passes through the entity, possibly with modification / enrichment). As of now, I have not seen such a personal multimedia support architecture described in a specification.
As a consequence, a multiparty session between John, Mary and Paul may involve one multimedia focus associated with a set of media mixers to support the conference, as well as personal multimedia controllers and media intermediaries for each of John, Mary and Paul.
For a 2-party session between John and Mary, the multimedia focus and media mixers are not necessary.
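The two cases can be summarized in a few lines of Python. This is only a sketch of the rule as I describe it in this post: the function name and the returned labels are hypothetical, and it assumes that every participant has a personal multimedia controller, which in practice is only inserted if needed.

```python
from __future__ import annotations


def entities_for_session(participants: list[str]) -> dict[str, list[str]]:
    """Return the application-layer entities that may be involved in a session."""
    entities: dict[str, list[str]] = {
        # One personal controller per participant (assumed to be always present).
        "personal": [f"personal multimedia controller ({p})" for p in participants],
        "shared": [],
    }
    # Shared entities are only needed when media from several sources is mixed,
    # i.e. when more than two parties are part of the session.
    if len(participants) > 2:
        entities["shared"] = ["multimedia focus", "media mixers"]
    return entities
```

For John, Mary and Paul the "shared" list contains the multimedia focus and its media mixers. For John and Mary alone it stays empty.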
Tight Media Component Control
A typical pattern that is likely to be used for the most important communication media components (e.g. voice, video, messaging) in a multimedia session is the following.
The multimedia focus or the personal multimedia controller is inserted in the SIP signaling path at the beginning of the session. Then, depending on the media negotiated at the start of the session and re-negotiated during the session, the SIP application inserts / removes media mixers or media intermediaries by placing the corresponding IP address / port information in the SDP (Session Description Protocol) part of the SIP INVITE or re-INVITE. The media mixer or media intermediary therefore appears as a media peer of the user's client in the session.
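As an illustration of the SDP manipulation, here is a simplified Python sketch. It is not how a real application server is implemented, and the function name insert_media_entity is hypothetical. It assumes IPv4 addresses, a plain numeric port in the m= line, and no i= line in the targeted media section. It rewrites one media component so that it points to a media mixer or media intermediary, and leaves the other components untouched.

```python
def insert_media_entity(sdp: str, media_type: str, address: str, port: int) -> str:
    """Point one media component of an SDP body at a network media entity."""
    out = []
    in_target = False  # True while we are inside the targeted m= section
    for line in sdp.splitlines():
        if line.startswith("m="):
            fields = line[2:].split(" ")
            in_target = fields[0] == media_type
            if in_target:
                # Replace the port and add a media-level connection line,
                # which takes precedence over the session-level c= line.
                fields[1] = str(port)
                out.append("m=" + " ".join(fields))
                out.append(f"c=IN IP4 {address}")
                continue
        elif in_target and line.startswith("c="):
            # Drop the endpoint's own media-level connection line.
            continue
        out.append(line)
    # SDP lines are terminated by CRLF.
    return "\r\n".join(out) + "\r\n"
```

Applied to the audio component of an offer, this would make the client send its voice to the address and port of the media entity, while a messaging or application component in the same SDP would still be exchanged end-to-end.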
A control interface permits the SIP application to control the media entity. 3GPP selected H.248 for the interface between the MRFC and the MRFP, but other protocols could be used for other types of media.
This approach has the advantage of being transparent to the client. It also permits very tight control of the media by the network. In short, the user does not see and cannot prevent application control of the media component in the end-to-end session.
However, this approach is not realistic for all media components, especially for application-related ones (e.g. application sharing, whiteboard, gaming).
A first variant is that, for a specific media component, there is no media mixer and/or media intermediary inserted in the media path. Media exchanges are performed end-to-end between endpoints. Potential media mixing can be performed by each client which receives media from all the other endpoints. In this approach, the operator does not control the media and does not add any non-core network value to its delivery. This is certainly applicable to a lot of application-specific interactions.
In the case where a media mixer or media intermediary is inserted in the media path, clients may take the responsibility for it, and not the SIP application in the network. The information required for the insertion (e.g. address of the media intermediary or media mixer) may be provided by the SIP application server, or acquired by any other means by the client. If there is a need for session participants to exchange information about media mixers (e.g. a whiteboard server), this can be done either under the control of the application server or not.
The control of the media mixer or media intermediary by the multimedia focus or personal multimedia controller may be tight, relatively loose, or may not exist at all (i.e. the control is purely performed by endpoints).
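These levels of control could be captured in a simple per-media policy. The Python sketch below is purely hypothetical: the names ControlLevel, MEDIA_POLICY and network_controls are mine, and the assignment of a level to each media is an arbitrary example, not a recommendation.

```python
from enum import Enum


class ControlLevel(Enum):
    """How much the SIP application controls the media entity."""

    TIGHT = "tight"  # entity inserted and controlled by the SIP application
    LOOSE = "loose"  # entity known to the SIP application, inserted by clients
    NONE = "none"    # control is purely performed by endpoints


# Hypothetical operator policy (example values only).
MEDIA_POLICY = {
    "audio": ControlLevel.TIGHT,
    "whiteboard": ControlLevel.LOOSE,
    "gaming": ControlLevel.NONE,
}


def network_controls(media: str) -> bool:
    """Return True if the network application layer has any control on a media."""
    # Unknown media default to end-to-end handling by the endpoints.
    return MEDIA_POLICY.get(media, ControlLevel.NONE) is not ControlLevel.NONE
```

Changing one entry of the policy is then all it takes, on the application side, to move a media from endpoint-only handling to network control.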
There might be multiple approaches, as I believe solutions for handling different types of media should be a compromise between control, the possibility to add value, flexibility, and time to market. You may imagine that a given media component is initially supported without any media intermediary/mixer in the network, and that these entities are added later, when the operator has found ways to add value to the delivery of this particular media or wants greater control over it.
Support in Clients
Multimedia communication also requires relevant support in clients. Besides the ability of client APIs to support generic sessions filled with any type of media, there is also the need for applications residing on the device to be seamlessly integrated into sessions. User interface aspects should not be underestimated either, as it is important to support multimedia communication as intuitively as possible for the end user.
PS: My apologies for writing "media" everywhere in this post, but I found out that not everybody knows that "media" is the plural of "medium". It hurts me though, as I studied Latin for 5 years. In the same vein, the "s" at the end of "Initial Filter Criterias" is not correct, as "criteria" is already the plural of "criterion". But this is the way it is written in 3GPP specs.