Donglok Kim, James E. Cabral Jr., and Yongmin Kim
Image Computing Systems Laboratory
Department of Electrical Engineering, Box 352500
University of Washington, Seattle, WA 98195-2500
Programmable multimedia imaging workstations over a high bandwidth network (ATM) were used to explore the role of multimedia workstations in the telemedicine application. The multimedia workstation is based on the MediaStation 5000, which uses a TMS320C80 Multimedia Video Processor to handle the multimedia bitstreams and image display/processing functions. Although the telemedicine workstation exhibited high performance and programmability, we found from the experiment that a tighter integration of the multimedia capability with the networking component is one of the most desired improvements for the telemedicine system to become applicable in routine clinical environments.
Keywords: telemedicine, ATM, MediaStation 5000, MPEG, multimedia, medical image processing
Telemedicine is generally defined as the use of telecommunications and computer technologies with medical expertise to facilitate remote health care delivery. Although there are a variety of possible applications, the purpose of telemedicine is to enable health care providers to exercise their expertise at the location of the patient or other collaborating care providers using a combination of video, audio, and externally-acquired images through the networking environment among hospitals, clinics, and remote locations. Some example applications of telemedicine are:
Some of the potential benefits of telemedicine can be summarized as 1:
Telemedicine began in the 1950s as closed circuit systems presented at national medical society meetings, with conferences or presentations of major surgical procedures in auditoriums or conference rooms. A strong acceptance was shown in military applications such as Operation Restore Hope in Somalia and Croatia. Despite its enormous potential and benefits, however, telemedicine has found minimal use up to this time, other than in the military, totaling about 100 telemedicine units in North America3. The cost of telecommunications and equipment and the lack of infrastructure, standards, evidence in cost effectiveness and cultural acceptance were among those factors that discouraged its adoption. Although there have been attempts to reduce the cost by utilizing the computer communication network, they were technically limited largely due to slow network speed and the lack of real-time audio/video compression technology. With the ongoing technical advances in such areas as telecommunications, imaging, multimedia, computers, and information systems, interactive telemedicine is becoming increasingly possible. In this paper, we will focus on the technical issues of telemedicine that are related to the interactive audiovisual communication over a high-speed communication network.
The telemedicine system should support various telemedicine and consultation functions to collaboratively transfer, manipulate, and view radiological images, image sequences, audio, and video. In order to gain hands-on experience with telemedicine and its requirements, we have developed a prototype telemedicine workstation which consists of a MediaStation 5000 (MS 5000)4,5 with a Texas Instruments TMS320C80 MVP (Multimedia Video Processor), a Fore Systems ATM (Asynchronous Transfer Mode) network adapter card, and a graphical user interface. Our experiment6 was performed on two telemedicine workstations over the Western Washington Local Access Transport Area (LATA) Integrated Optical Network (LION) Sonet Ring using ATM. The experiment was conducted as the first phase of the two-phase project called "Seahawk". Project Seahawk is a regional telemedicine program in the Pacific Northwest with Madigan Army Medical Center (MAMC) as the hub connecting various military and other federal hospitals and clinics utilizing the state-of-the-art technologies. Although the expected network performance of ATM seems quite promising for the telemedicine application, we have found through our experiments that several system issues should be addressed for a telemedicine system to be clinically useful. When the systems are connected over the network, the performance is usually scaled down due to slower-than-the-peak performance of the workstation's interface, the latency from the operating system's interrupt response time, and the overhead of the user's application program. We have been focusing our efforts on identifying the potential bottlenecks and issues in telemedicine and establishing its requirements. In this paper, we present our experience with the prototype telemedicine workstation we developed as well as the unique requirements for a successful telemedicine system.
A telemedicine system has a unique set of requirements which distinguishes it from a normal teleconferencing system in many respects. In the case of real-time consultation of obstetrical ultrasound of potentially high-risk pregnancies, the ultrasound study consisting of images, color flow, and Doppler spectral and auditory information of good quality needs to be transmitted in real time. For dermatology applications, a high-resolution camera with either a low frame rate or still image capture capability rather than standard video at 30 frames/sec might be required. Many image processing and graphics functions are often necessary when analyzing medical images to make a primary diagnosis or plan a treatment. These range from window and level adjustment, magnification and minification, digital magnifying glass, image mensuration, adaptive histogram equalization, unsharp masking, and convolution to 3-dimensional visualization, texture measurements, volume measurements, spatial registration, lung nodule screening, microcalcification detection, and stereotactic surgical planning. With regard to distance, instead of having the physicians or patients travel between the facilities, a telemedicine system could be used to transport the necessary information about the patient via a telecommunications channel. However, when the medical facilities are located only an hour or two apart, physicians must perceive the system as an acceptable alternative to physically referring the patient to another hospital. To meet the clinical requirements of a regional telemedicine system, it must be fast, reliable, easy to use, and provide excellent image quality. Otherwise, a physician will choose to have the patient travel to the other medical facility, bypassing the telemedicine system.
From the above-mentioned and other scenarios, the requirements can be divided into three categories: telemedicine workstation, communication network, and the human perception of media.
The telemedicine workstation prototype we developed is a medical imaging workstation with added multimedia capabilities for teleconferencing. Each workstation consisted of a dual-monitor, 80486-based host with at least 16 MB of RAM, a 300 MB hard drive, an MS 5000 multimedia card and a Fore Systems ATM network adapter card (Figure 1). The host PC used Windows NT for its graphical user interface on one monitor while the second monitor was used for image and video presentation and was controlled directly by the MS 5000. Two such workstations were developed and were installed in the Image Computing Systems Laboratory at the University of Washington and in the Department of Radiology at Madigan Army Medical Center, approximately 50 miles apart. Each system was connected via fiber-optic cable to a Fore Systems ATM switch in each location. Each switch was then connected via coaxial cable to a US West multiplexer/converter at each site, which was connected via fiber-optic cable to the LION Sonet ring and the other site.
Figure 1. Telemedicine workstation
The MS 5000 was chosen to provide the necessary compression and image, video and audio processing. It is a single board multimedia system capable of digitizing audio and video, displaying up to 1280 x 1024 pixels, and performing 2 billion operations per second using Texas Instruments' TMS320C80 MVP (Multimedia Video Processor)9. The MS 5000 can generate the MPEG-1 bitstream in real time for the SIF-sized (352 x 240) video and audio (44.1 kHz, 16-bit stereo) inputs. Since the MVP is programmable, it can also perform other image processing tasks as needed. Figure 2 is a block diagram of the MS 5000 system. A 64-bit DRAM memory bank, consisting of two 32-bit wide single in-line memory modules (SIMMs), is connected directly to the MVP's data lines to minimize buffer delays. Using the DRAM's page mode allows us to achieve one memory access (64 bits) every 2 clock cycles. Since most of the memory accesses are targeted to DRAM, the system is able to maintain high bus throughput. Video input decoding is performed by a single-chip video decoder. This chip accepts NTSC, PAL, or S-Video as input, digitizes the incoming video data, and outputs the pixels in CCIR 601 4:2:2 (Y-Cr-Cb) format. This 16-bit video data is written alternately to two 64 x 16-bit FIFOs, expanding the data to 32 bits to match the width of the video bus. The serial outputs from the video buffer and frame buffer are combined in the RAMDAC, which uses a chroma key and a programmable window to control the size and location of the video window. The MS 5000 system uses an audio codec for audio input and output. The codec is a single chip that contains 16-bit stereo A/D and D/A converters and supports sampling frequencies up to 48 kHz. Data can be manipulated in mono or stereo, 8 or 16 bits per sample, in linear or companded (µ-law or A-law) formats. Audio data is buffered bidirectionally through two 4k x 8-bit FIFOs. The dual-port SRAM is the buffer for the host interface to the host's VESA local bus.
For future expansions of the peripherals, there is a 50-pin external expansion connector that provides a generic bus interface.
ATM is inherently connection-oriented in that both the delay and the bandwidth are guaranteed for each virtual circuit. These guarantees make ATM an ideal framework for combining voice, video and data services concurrently and efficiently on a single network. We selected ATM as the networking choice for our telemedicine system based on its high bandwidth capabilities and the provisions for guaranteed bandwidth and maximum delays. Thus, each telemedicine workstation included a Fore Systems ESA-200PC ATM adapter card. These cards are designed for the EISA bus and are therefore limited at the host interface by the 33 MHz bus speed. The network interface is a Transparent Asynchronous Transmitter/Receiver Interface (TAXI) with a maximum 100 Mbps transfer rate. Each ATM adapter was connected via fiber to a Fore Systems ASX-200 ATM switch. These switches are each roughly the size of a PC and are capable of networking up to 16 devices, supporting up to 12 full OC-3 (155 Mbps) circuits simultaneously. Each switch included a 4-port LAN module and a 2-port DS-3 (45 Mbps) WAN module.
The Local Exchange Carriers (LECs) serving western Washington State have cooperatively developed an advanced network known as the LATA Integrated Optical Network (LION). It is an optical fiber ring designed to provide high bandwidth, reliability, and survivability. Most sites included in Project Seahawk can be served easily by the LION ring. Those not reachable via the LION network require alternative networking strategies and are likely limited to reduced bitrates such as T1 (1.544 Mbps) or ISDN (128 kbps). Interfaces from the LION network to dedicated T1 connections or ISDN sites are available through the local carriers. Although the connections to the LION fiber ring were limited to DS-3 speeds (45 Mbps) for the demonstration, the ring is capable of supporting OC-3 speeds (155 Mbps) and higher.
There are a number of network protocol stacks to choose from when designing a telemedicine system (Figure 3).
The MPEG-1 and MPEG-2 standards define syntax for digital bitstreams containing audio and video. MPEG-1 supports compression of VHS quality video and CD quality audio into a 1.5 Mbps bitstream or higher quality at proportionally higher bitrates10. MPEG-2 supports compression of wide ranging quality video and audio beyond that of MPEG-1 and approaching that of HDTV11.
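To give a feel for the amount of compression involved, the following sketch compares the raw data rate of the SIF-sized video and CD-quality audio inputs described for the MS 5000 with a nominal 1.5 Mbps MPEG-1 bitstream. The frame rate of 30 frames/sec is an assumption (nominal NTSC), as is the use of the digitized 16 bits/pixel 4:2:2 format as the starting point; the result is an order-of-magnitude estimate, not a measurement from the experiment.

```python
# Raw input data rates for the MS 5000 versus a nominal MPEG-1 bitstream.
SIF_WIDTH, SIF_HEIGHT = 352, 240   # SIF frame size quoted for the MS 5000
VIDEO_BITS_PER_PIXEL = 16          # CCIR 601 4:2:2 as digitized
FRAMES_PER_SEC = 30                # assumption: nominal NTSC frame rate
AUDIO_SAMPLE_RATE = 44_100         # Hz
AUDIO_BITS_PER_SAMPLE = 16
AUDIO_CHANNELS = 2                 # stereo
MPEG1_BITRATE = 1_500_000          # nominal 1.5 Mbps MPEG-1 bitstream

raw_video_bps = SIF_WIDTH * SIF_HEIGHT * VIDEO_BITS_PER_PIXEL * FRAMES_PER_SEC
raw_audio_bps = AUDIO_SAMPLE_RATE * AUDIO_BITS_PER_SAMPLE * AUDIO_CHANNELS
compression_ratio = (raw_video_bps + raw_audio_bps) / MPEG1_BITRATE

print(f"raw video: {raw_video_bps / 1e6:.2f} Mbps")
print(f"raw audio: {raw_audio_bps / 1e6:.2f} Mbps")
print(f"overall compression needed: {compression_ratio:.1f}:1")
```

Under these assumptions the raw video is about 40.6 Mbps and the raw audio about 1.4 Mbps, so the combined stream must be reduced by roughly 28:1 to fit in 1.5 Mbps.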
The H.320 family of international teleconferencing standards provides for simultaneous audio (G.700), video (H.261) and data transfer (T.120) using communication bitrates from 56 kbps to 1.92 Mbps12. H.320 is designed to work with the range of bitrates available using ISDN. Automatic negotiation between connected sites through H.221 and H.242 allows dynamic assignment of bits to individual audio and video channels based on the multimedia capabilities at each site and the available bandwidth. Additional connections can be established as more bits are required and both audio and video compression rates can be adjusted up and down to match limited bitrates. Compatibility with H.320 ensures interoperability with the widest range of third-party teleconferencing systems.
The ATM Forum has established a family of ATM Adaptation Layers (AALs) for the various types of data that will be carried over ATM networks. AAL1 and AAL2 include end-to-end timing information, with AAL1 supporting constant bitrate (CBR) traffic and AAL2 supporting variable bitrate (VBR) traffic. Because end-to-end timing adds overhead and is not clearly required by telemedicine applications, we have only considered the remaining adaptation layers, which do not include timing information. AAL3/4 supports variable bitrate and is optimized for compressed, continuous data streams such as video or audio. AAL5 is a simplified adaptation layer designed for maximum efficiency and compatibility with other LAN protocols such as TCP/IP.
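The efficiency of AAL5 can be illustrated with a short calculation. The sketch below uses the standard ATM cell layout (53-byte cells with a 48-byte payload) and the 8-byte AAL5 trailer, with the payload padded to a whole number of cells. The 1500-byte packet size is only an illustrative choice, and any additional encapsulation header between IP and AAL5 is ignored.

```python
CELL_BYTES = 53           # ATM cell: 5-byte header + 48-byte payload
CELL_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8    # AAL5 trailer carried in the last cell


def aal5_cells(pdu_bytes: int) -> int:
    """Number of ATM cells needed to carry one AAL5 PDU of pdu_bytes."""
    total = pdu_bytes + AAL5_TRAILER_BYTES
    return -(-total // CELL_PAYLOAD_BYTES)   # integer ceiling division


def aal5_efficiency(pdu_bytes: int) -> float:
    """Fraction of transmitted bytes that are user payload."""
    return pdu_bytes / (aal5_cells(pdu_bytes) * CELL_BYTES)


# Illustrative packet size (hypothetical, not taken from the experiment).
packet = 1500
print(aal5_cells(packet), f"{aal5_efficiency(packet):.1%}")
```

A packet of 40 bytes fits exactly in one cell, while 41 bytes already needs two, which is why small messages are carried much less efficiently than large image transfers.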
The Internet Protocol (IP) family of protocols forms the common language for the global packet-switched network known as the Internet. IP is inherently connectionless. Transmission Control Protocol (TCP) supports error correction, packet ordering and acknowledgments. User Datagram Protocol (UDP) provides none of these benefits of TCP, but its minimal overhead results in higher efficiency. Point-to-Point Protocol (PPP) supports the layering of IP and other packet-switched protocols on connection-oriented bitstreams including typical DS-0 serial lines and ISDN.
For our experiment, MPEG-1 was used for video compression in order to maximize video quality for diagnostic video (e.g., ultrasound) within the available processing and bandwidth limitations. Audio was sampled in stereo at 44.1 kHz using 16 bits/sample in both directions to provide CD-quality two-way audio for teleconferencing. Data, including static medical images, and system layer messaging were transferred uncompressed. Each of these data streams (video, audio, data, system layer) was assigned to an IP socket and transferred using WinSock TCP/IP layered over AAL5. A different logical channel is created for each type of data, including video, audio, images, and system messages. Data transfers are coordinated through the system message channel. Whenever one workstation needs to start sending video, audio, or image data, a system message is sent. Information is also sent on the system channel that ensures the two workstations are in sync, including cursor location and image manipulation commands.
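The channel arrangement described above can be sketched as follows. This is a minimal in-memory model, not the actual implementation: each queue stands in for one TCP socket, and the class names, function names, and JSON message format are all hypothetical. It shows only the coordination rule, namely that a system message announces each transfer before the data arrives on its own channel, and that cursor updates travel on the system channel.

```python
import json
from collections import deque
from enum import Enum


class Channel(Enum):
    SYSTEM = "system"
    VIDEO = "video"
    AUDIO = "audio"
    IMAGE = "image"


class Workstation:
    """One endpoint; each deque stands in for one TCP socket."""

    def __init__(self):
        self.inbox = {channel: deque() for channel in Channel}
        self.remote_cursor = None

    def process(self):
        """Handle pending system messages and collect the data they announce."""
        delivered = []
        system = self.inbox[Channel.SYSTEM]
        while system:
            message = json.loads(system.popleft())
            if message["type"] == "cursor":
                self.remote_cursor = (message["x"], message["y"])
            elif message["type"] == "start":
                channel = Channel(message["channel"])
                payload = self.inbox[channel].popleft()
                if len(payload) != message["bytes"]:
                    raise ValueError("payload size does not match announcement")
                delivered.append((channel, payload))
        return delivered


def send_system(peer, **fields):
    """Queue one system message (hypothetical JSON encoding) for the peer."""
    peer.inbox[Channel.SYSTEM].append(json.dumps(fields).encode())


def send_data(peer, channel, payload):
    """Announce the transfer on the system channel, then send the data."""
    send_system(peer, type="start", channel=channel.value, bytes=len(payload))
    peer.inbox[channel].append(payload)


def send_cursor(peer, x, y):
    """Share the local cursor position with the peer."""
    send_system(peer, type="cursor", x=x, y=y)


remote = Workstation()
send_cursor(remote, 120, 80)
send_data(remote, Channel.IMAGE, bytes(16))
print(remote.process(), remote.remote_cursor)
```

Keeping control traffic on its own channel means that small, latency-sensitive messages such as cursor moves are not queued behind a large image transfer.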
The graphical user interface was implemented using Microsoft Windows NT 3.5 and the object-oriented Microsoft Foundation Classes of Visual C++. The menus and toolbars are available to both users simultaneously and are synchronized so that the available functions and images are presented identically on the telemedicine workstations located at both ends. The toolbars provide quick access to the common image processing functions including zoom, shrink, pan, window/level, cine, synchronization of multimodal images, horizontal and vertical flip and 90 degree rotations. As an image set is opened at either site, it is immediately transferred to the remote site and the two image sets are synchronized. Two separate cursors, white and gray, are controlled by the local and remote users respectively. Real-time image processing functions such as window/level and panning can then either be controlled locally or remotely. Local control provides real-time feedback to the user and updates the remote display at the completion of each operation (e.g., release of a mouse button in window/level). Multiple image sets can be open at once and real-time video can be transferred simultaneously with image manipulation.
In order to support basic medical image manipulation in addition to the real-time MPEG encoding/decoding, the following image processing functions were implemented on the MS 500013:
All of these operations are implemented in software. For example, the window/level operation calculates a new gray level for each pixel in the image through multiplication, addition, and clipping. On a 512 x 512 image, the window/level operation is performed in 8 ms, zoom in 9 ms, shrink in 2 ms, 90 degree rotate in 5 ms, and flip in 4 ms. Initially, zoom and shrink operate only by factors of two. In other words, the image size can only be doubled or halved, although this can be done multiple times.
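The operations named above can be sketched in a few lines. The code below is a plain-Python illustration on nested lists, not the MVP implementation: the fixed-point scaling in `window_level`, the pixel replication in `zoom2`, and the decimation in `shrink2` are assumptions about how such functions might be realized, and all function names are hypothetical.

```python
def window_level(image, window, level, out_max=255):
    """Map [level - window/2, level + window/2] to [0, out_max].

    Each pixel is multiplied by a fixed-point gain, an offset is added,
    and the result is clipped to the display range.
    """
    low = level - window // 2
    gain = (out_max << 16) // window        # 16.16 fixed-point gain
    offset = -low * gain
    return [[min(out_max, max(0, (p * gain + offset) >> 16)) for p in row]
            for row in image]


def zoom2(image):
    """Double the image size by replicating each pixel 2 x 2."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def shrink2(image):
    """Halve the image size by keeping every second pixel and row."""
    return [row[::2] for row in image[::2]]


def flip_horizontal(image):
    return [row[::-1] for row in image]


def flip_vertical(image):
    return [list(row) for row in image[::-1]]


def rotate90_cw(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


ct_row = [[0, 1024, 2048, 3072, 4095]]      # 12-bit sample values
print(window_level(ct_row, window=2048, level=2048))
```

With a window of 2048 centered at level 2048, values at or below 1024 map to black, values at or above 3072 map to white, and the values in between are scaled linearly.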
In January 1995, a telemedicine demonstration was conducted between the University of Washington and Madigan Army Medical Center using the telemedicine system prototype. Many physicians from the University of Washington, Madigan Army Medical Center and the Seattle Veterans Administration were present at both ends as chest X-ray, CT and MR images and ultrasound video were exchanged, manipulated and discussed between the two sites. Response from the physicians observing the demonstration was positive, with considerable anticipation for using an improved version of the system clinically. The areas that can be improved are discussed later in this section. All physicians questioned felt that the video quality was satisfactory for ultrasound consultation and that the quality of the medical images was excellent. However, audio delays averaging 0.5 seconds and reaching up to one second were long for casual consultation, and image transfer times of 0.33 seconds per 512 x 512 x 12-bit CT image and 6 seconds per 2K x 2K x 16-bit CR image could still be improved. By compressing these images, these transmission times can be reduced by a factor of 5. The image display and processing performance of the MS 5000 was judged excellent. The real-time 16-bit window/level, zooming and panning and cine operations were each executed quickly enough to maintain a high level of interaction for the local user.
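The reported transfer times can be checked against the image sizes with simple arithmetic. The sketch below assumes that 12-bit CT pixels are stored in 16-bit words, that "2K" means 2048, and that the link sustains roughly the 13 Mbps measured for TCP/IP in the network tests; these are assumptions for illustration, not details confirmed by the experiment.

```python
def image_bits(width, height, stored_bits_per_pixel):
    """Size of one uncompressed image in bits."""
    return width * height * stored_bits_per_pixel


def transfer_seconds(bits, throughput_bps, compression=1.0):
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return bits / compression / throughput_bps


TCP_THROUGHPUT_BPS = 13e6                   # rough measured TCP/IP maximum

ct_bits = image_bits(512, 512, 16)          # assumption: 12-bit data in 16-bit words
cr_bits = image_bits(2048, 2048, 16)        # assumption: 2K = 2048

ct_time = transfer_seconds(ct_bits, TCP_THROUGHPUT_BPS)
cr_time = transfer_seconds(cr_bits, TCP_THROUGHPUT_BPS)
cr_time_compressed = transfer_seconds(cr_bits, TCP_THROUGHPUT_BPS, compression=5)

# Throughput implied by the transfer times reported in the demonstration.
ct_implied_bps = ct_bits / 0.33
cr_implied_bps = cr_bits / 6.0

print(f"CT: {ct_time:.2f} s, CR: {cr_time:.2f} s, "
      f"CR at 5:1 compression: {cr_time_compressed:.2f} s")
print(f"implied throughput: CT {ct_implied_bps / 1e6:.1f} Mbps, "
      f"CR {cr_implied_bps / 1e6:.1f} Mbps")
```

Under these assumptions the reported times correspond to an effective throughput of roughly 11 to 13 Mbps, and a 5:1 compression would bring the CR image transfer down to about one second.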
In tests of network performance, maximum throughputs using TCP/IP and UDP/IP were measured at roughly 13 and 15 Mbps, respectively. These throughputs varied considerably over time, apparently as a function of the availability of the host processor. This shows that less than 33% of the peak performance could be achieved. To provide high data throughput, the ideal solution would be to integrate the MS 5000 and ATM adapter together via a daughter card implementing a dedicated, high-speed communication channel through the external interface in the MS 5000. Two anticipated improvements to the current telemedicine system include upgrades from the current VESA local bus to a PCI bus and from the current 80486 host CPU to a Pentium. PCI is capable of supporting and sustaining higher transfer rates than VESA and will be supported in more architectures. Pentium processors offer up to 100% improvement in host processing power over 80486s, which will substantially increase the overall network performance by removing a bottleneck from the host.
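The utilization figure can be reproduced by dividing the measured throughputs by the nominal link rates quoted earlier. Which rate counts as "the peak" is an assumption here: the sketch compares against the DS-3 connection used for the demonstration as well as the TAXI adapter interface and an OC-3 circuit.

```python
MEASURED_MBPS = {"TCP/IP": 13.0, "UDP/IP": 15.0}      # rough measured maxima
PEAK_MBPS = {"DS-3": 45.0, "TAXI": 100.0, "OC-3": 155.0}


def utilization_percent(measured_mbps, peak_mbps):
    """Measured throughput as a percentage of a nominal peak rate."""
    return 100.0 * measured_mbps / peak_mbps


for protocol, measured in MEASURED_MBPS.items():
    for link, peak in PEAK_MBPS.items():
        share = utilization_percent(measured, peak)
        print(f"{protocol} over {link}: {share:.0f}% of peak")
```

Measured against the 45 Mbps DS-3 connection, the two protocols reach about 29% and 33% of the peak rate; against the faster interfaces the share is lower still.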
Other necessary improvements for the revised telemedicine system include the use of a new TMS320C80 processor with a higher clock frequency to speed up compression and image processing, reduce delay and increase interactivity in the system. It will also improve support for multiple simultaneous media streams. Combined with an improved video digitizer, it will yield higher-quality video. In addition, video/audio conferencing at low bitrates as well as interoperability with other teleconferencing systems from different manufacturers can be provided through support of the H.320 codec standard.
By using the programmable nature and high processing performance of the MS 5000, we have integrated a prototypical workstation and evaluated it as a regional telemedicine system, which requires high performance, flexible and upgradable workstations with several key telemedicine functions tightly integrated. The workstation provides functions for fast medical image display and manipulation as well as more telemedicine-specific features such as collaborative viewing, multimedia data (image, video, audio, and text) transmission and cursor sharing. With its graphical user interface and high performance, the system is both easy to use and interactive. We found, however, that for a successful telemedicine system, various media (images, video, audio, graphics and text) need to be seamlessly integrated and supported in a single workstation. While the smooth integration of multiple real-time media streams is still a technical challenge and the required telecommunications infrastructure is still maturing, especially in rural areas, future research should encompass a variety of related issues including user interfaces, compression, medical equipment interfaces, and adaptation for specific clinical applications.
6. Y. Kim, J. E. Cabral Jr., D. M. Parsons, G. L. Lipski, R. H. Kirchdoerfer, A. Sado, G. N. Bender, and F. Goeringer, "Seahawk: A Telemedicine Project in the Pacific Northwest", SPIE Proceedings, Medical Imaging, Vol. 2435, 1995.
13. D. M. Parsons, J. E. Cabral Jr., Y. Kim, G. L. Lipski, and M. S. Frank, "MediaStation 5000: A Multimedia Workstation for Telemedicine", SPIE Proceedings, Medical Imaging, Vol. 2431, 1995, pp. 382-387.