On Human-Centered Multimedia

May 25, 2005

 

Alejandro Jaimes

FXPAL Japan, Fuji Xerox Co., Ltd.

 

While I was thinking about the topics I wanted to cover in this article, my daily encounters with multimedia in the “real” world came to mind. Using some of these as examples, I will briefly discuss what I consider three key factors in the development of future multimedia systems: (1) the role of culture; (2) integration of sensors; (3) access outside the desktop by a wide range of users.

 

I will argue that developing multimedia systems requires a human-centered approach. By human-centered I mean an approach in which the user is the starting point and in which social and cultural factors are quantified at multiple levels and incorporated into computational frameworks.

 

Culture, Deployment, and Access

 

I live in Japan. When one enters any establishment in Japan, there is an immediate “irasshaimase” greeting (welcome!) by store employees. But it is also often automatic: when I enter an elevator or approach an ATM or a metro ticket vending machine, a sensor activates, and I am greeted by multimedia cartoon characters that speak to me (welcome! going down! all information will be displayed in English!). The characters do not speak like computers; they speak like Japanese sales clerks (high-pitched voices of very specific characteristics). They even bow. The ATM welcomes me before I touch it, the elevator greets me as I enter, and the toilet seat goes up when I open the door of the restroom in a restaurant (another welcome sign).

 

It is interesting to consider these systems while thinking about multimedia. Although the interfaces are very primitive and some of them are not really multimedia computing systems, there are several important characteristics to consider: (1) they act according to the cultural context in which they are deployed; (2) they integrate different types of sensors for input and communicate through a combination of media; (3) they are deployed outside the desktop and they are meant to be accessed by a diversity of individuals.

 

Let me expand on these points.

 

  • Cultural factors: culture plays an important role in human-human communication because the way we interpret signals and symbols depends entirely on our cultural background. Multimedia systems should therefore be able to use cultural cues during interaction (e.g., an ATM cartoon character bowing), as well as during analysis (e.g., algorithms to automatically analyze news broadcasts from different countries, meeting videos, or any other type of content). Without a doubt, the differences in semantic content span every level of the multimedia content pyramid, from low-level features (colors have strong cultural interpretations) to high-level semantics (consider differences in communication style between Japanese and American businessmen). In spite of this, the majority of work in multimedia assumes a one-size-fits-all model in which the only difference between systems deployed in different parts of the world (or using different input data) is language. Multimedia systems should include culture-specific models at every level of content and interaction in multimedia communication.

 

  • Integration of sensors and multiple media: most of the systems I describe employ simple motion sensors that have been available for years (e.g., in washrooms and for automatic doors). New ATMs in Japan use biometric technology (palm and index readers) to verify identity, and some tour buses use GPS technology to automatically project tour guide videos as the bus passes tourist attractions. What is interesting about these applications is how, even at primitive levels, information from networks and sensors is integrated with other types of inputs and outputs. In spite of great efforts in the multimedia research community, the integration of multiple media (in analysis and interaction) is still in its infancy. Our ability to communicate and interpret meanings depends entirely on how multiple media are combined (body pose, gestures, tone of voice, choice of words, etc.), but most research on multimedia focuses on a single-medium model. We need new mathematical models that truly integrate multiple sources and media.

 

  • Access: on the one hand are mobile systems such as 3G mobile phones used to create and access multimedia content. On the other hand are non-mobile systems (e.g., ATMs, ticket vending machines, etc.). An interesting characteristic of this second group of devices is that, since they are deployed in public spaces, they are designed to be used by anyone (no need to read extensive manuals). Computing is migrating from the desktop at the same time as the span of users is expanding dramatically to include people who would not normally access computers. This is important because although in industrialized nations almost everyone has a computer, a very small percentage of the world’s population owns a multimedia device (millions still do not have phones). The future of multimedia, therefore, lies outside the desktop, and multimedia will become the main access mechanism to information and services across the globe. Multimedia systems with which people can naturally interact (considering cultural context) are the key to allowing everyone to access a wide range of resources critical to economic and social development.

 

A Human-Centered Approach

 

Human-centered multimedia systems should be multimodal (inputs and outputs in more than one modality or communication channel). They must also be proactive (understand cultural and social contexts and respond accordingly), and be easily accessible outside the desktop to a wide range of users.

 

A human-centered approach to multimedia starts from user models that consider how humans understand and interpret multimedia signals (feature, cognitive, and affective levels), and how humans interact naturally (cultural and social context as well as personal factors such as emotion, mood, attitude, and attention).

 

Inevitably, this means considering some of the work in fields such as psychology, communications research, HCI, and others, and incorporating what is known in those fields into mathematical models that can be used to construct algorithms and computational frameworks that integrate different media. Machine learning integrated with domain knowledge, automatic analysis of social networks, data mining, sensor fusion research, and multi-modal interaction will play a special role. More research into quantifying human-related knowledge is necessary, which means developing new theories (and mathematical models) of multimedia integration at multiple levels.

 

The Future is Bright

 

Multimedia computing offers, for the first time in history, real possibilities of human-like interaction with machines. This is very significant because technology has traditionally played a crucial role in development, and multimedia can make the difference in the democratization of technology (access for all). That is crucial because computational technology is becoming the gateway to all basic human resources.

 

We are still far from achieving human-like interactions with machines, and most of the world’s population does not have access to technology. A human-centered approach, however, contributes to making interaction more natural and will ultimately make technology more accessible to everyone.

 

Many technical challenges lie ahead and in some areas progress has been slow. With the cost of hardware continuing to drop and the increase in computational power, however, there have been many recent efforts to use multimedia technology in entirely new ways. One particular area of interest is “new media art.” Many universities around the world are creating new joint art-computer-science programs in which technical researchers/artists create artworks that combine new technical approaches or novel uses of existing technology with artistic concepts. What is interesting about some of these works is that technical novelty is introduced while many of the issues described above are considered: cultural and social context, integration of sensors, migration outside the desktop, and access.

 

Technical researchers need not venture into the arts to develop human-centered multimedia systems. In fact, in recent years many user-centered multimedia applications have been developed (e.g., smart homes and offices). However, more effort is needed, along with the realization that multimedia research, except in very specific applications, is meaningless if the user is not the starting point.

 

Figures

Toilet:

The toilet seat goes up automatically as the customer opens the door to the restroom. It closes when the customer closes the door. Many of these toilets have a water jet spray used to wash and massage the buttocks; they also warm the toilet seat and play a range of melodies while in use: chirping birds, rushing water, tinkling wind chimes, or traditional Japanese harp, among others. An article I read claimed that more than half of Japanese homes have such electric toilets, a rate higher than that of personal computers.

 

Store:

It is customary to welcome customers with a loud “irasshaimase” (welcome). This can be tiring for the store clerks, so in some places sensors have been installed so that customers are automatically welcomed by a recording as they enter (and thanked as they leave).

 

Restaurant:

Some restaurants have wireless touch screens so customers can order. The screens can be passed around the table just like a menu. The waiters only show up when the food is ready or when they are called by pressing the waiter icon.

 

Street:

Maybe nowhere in the world more than in Shibuya, a crowded, young area of Tokyo, are pedestrians bombarded with videos, images, and sounds. Everyone has a cell phone.

 

ATM:

The ATM activates when it is approached. Unlike with Western ATMs, the customer first chooses the desired option and then inserts the card.

 

Elevator:

Almost all elevators in Japan speak to indicate if they’re going up or down and which floor they’re on.

 

Navigator:

Car GPS navigation systems have cartoon characters that bow and speak to the driver.

 

Photo booth:

Japanese photo sticker booths are very popular. Users receive instructions from a speaking cartoon character and can customize and modify their photos prior to printing.

 

Train vending machine:

The screen turns on and the cartoon character bows as the machine is approached.

 

Train:

Monitors on trains show the map and are also used for advertising (without sound, though).

 

All photos, text, and videos © 2005 Alejandro Jaimes. All rights reserved.

 

Last modified 2006-03-27 09:17 AM