Chapter 28

Content Handlers


In this chapter you'll learn how to write content handlers to support the retrieval of objects by Web browsers. You'll also learn about the Multipurpose Internet Mail Extensions (MIME) and how they are used to identify the types of objects that are provided by Web servers. You'll then develop a simple content handler and integrate it with a Web client program. This chapter builds on the material presented in Chapter 17, "Network Programming with the java.net Package."

Using Content Handlers

If you have used a Web browser extensively, you have probably encountered external viewers or plug-ins that supplement the capabilities of your browser. These external viewers display and process files that browsers do not normally support.

Java supports additional internal or external viewers through the content handler mechanism. Content handlers are used to retrieve objects via an URLConnection object.

Content handlers are implemented as subclasses of the ContentHandler class. A content handler is required to implement only one method: getContent(), which implements the abstract method declared by the ContentHandler class. This method takes an URLConnection object as a parameter and returns an object that represents content of a specific MIME type. You'll learn about MIME types in the following section of this chapter.

The purpose of a content handler is to extract an object of a given MIME type from an URLConnection object's input stream. Content handlers are not directly instantiated or accessed. The getContent() methods of the URL and URLConnection classes cause content handlers to be created and invoked to perform their processing.

A content handler is associated with a specific MIME type through the use of the ContentHandlerFactory interface. A class that implements the ContentHandlerFactory interface must implement the createContentHandler() method. This method returns a ContentHandler object to be used for a specific MIME type. A ContentHandlerFactory object is installed using the static setContentHandlerFactory() method of the URLConnection class.

Multipurpose Internet Mail Extensions (MIME)

Content handlers are associated with specific MIME types. Many Internet programs, including e-mail clients, Web browsers, and Web servers, use the multipurpose Internet mail extensions to associate an object type with a file. These object types include text, multimedia files, and application-specific files. MIME types consist of a type and a subtype. Examples are text/html, text/plain, image/gif, and image/jpeg, where text and image are the types and html, plain, gif, and jpeg are the subtypes. The URL classes provided by Java support the processing of each of these types; however, the number of MIME type/subtype combinations is large and growing. Content handlers are used to support MIME type processing.

Web servers map MIME types to the files they serve using the files' extensions. For example, files with the .htm and .html extensions are mapped to the text/html MIME type/subtype. Files with the .gif and .jpg extensions are mapped to image/gif and image/jpeg. The MIME type of a file is sent to Web browsers by Web servers when the server sends the designated files to the browsers in response to browser requests.
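The extension-to-type mapping described above can be sketched as a small lookup table. The MimeLookup class and its method names are hypothetical, and the table holds only the four mappings named in the text; a real server's MIME type file is far larger.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: a tiny extension-to-MIME-type table of the kind a Web server consults.
class MimeLookup {
    // Only the mappings named in the text.
    private static final Map<String, String> TYPES = Map.of(
        "htm", "text/html",
        "html", "text/html",
        "gif", "image/gif",
        "jpg", "image/jpeg");

    // Returns the MIME type for a file name, or null if the extension is missing or unknown.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.get(extension);
    }

    // Splits "type/subtype" into a two-element array: { type, subtype }.
    static String[] split(String mimeType) {
        return mimeType.split("/", 2);
    }

    public static void main(String[] args) {
        String mime = mimeTypeFor("index.html");
        String[] parts = split(mime);
        System.out.println(mime + ": type=" + parts[0] + ", subtype=" + parts[1]);
    }
}
```

Returning null for an unknown extension leaves the decision about a default type to the caller.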

Developing a Content Handler

The first step to implementing a content handler is to define the class of the object to be extracted by the content handler. The content handler is then defined as a subclass of the ContentHandler class. The getContent() method of the content handler performs the extraction of objects of a specific MIME type from the input stream associated with an URLConnection object.

A content handler is associated with a specific MIME type through the use of a ContentHandlerFactory object. The createContentHandler() method of the ContentHandlerFactory interface is used to return a content handler for a specific MIME type.

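Finally, the static setContentHandlerFactory() method of the URLConnection class is used to install a ContentHandlerFactory as the application's content handler factory, which is then consulted for all MIME types. This method can be called at most once in a given program; a second call results in an error.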
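The steps above can be sketched end to end without a Web server. Everything named here is an assumption made for illustration: the Note class, the text/x-note MIME type, and the InMemoryConnection class, which stands in for a real connection by serving a fixed byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ContentHandler;
import java.net.ContentHandlerFactory;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class HandlerSteps {
    // The class of object to be extracted (hypothetical).
    static class Note {
        final String text;
        Note(String text) { this.text = text; }
    }

    // A ContentHandler subclass whose getContent() builds a Note
    // from the connection's input stream.
    static class NoteHandler extends ContentHandler {
        @Override
        public Object getContent(URLConnection conn) throws IOException {
            try (InputStream in = conn.getInputStream()) {
                return new Note(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
            }
        }
    }

    // A factory that returns the handler for the made-up MIME type only.
    static class NoteFactory implements ContentHandlerFactory {
        @Override
        public ContentHandler createContentHandler(String mimeType) {
            return "text/x-note".equals(mimeType) ? new NoteHandler() : null;
        }
    }

    // Stand-in for a real connection: serves fixed bytes and reports text/x-note.
    static class InMemoryConnection extends URLConnection {
        private final byte[] body;
        InMemoryConnection(URL url, byte[] body) { super(url); this.body = body; }
        @Override public void connect() { connected = true; }
        @Override public String getContentType() { return "text/x-note"; }
        @Override public InputStream getInputStream() { return new ByteArrayInputStream(body); }
    }

    private static boolean installed;

    // Installs the factory. The flag guards against a second call,
    // because the factory may be set only once per JVM.
    static synchronized void installOnce() {
        if (!installed) {
            URLConnection.setContentHandlerFactory(new NoteFactory());
            installed = true;
        }
    }

    // Fetches a Note through URLConnection.getContent(), which consults the factory.
    static String fetchNote(String body) {
        installOnce();
        try {
            URL url = URI.create("http://example.invalid/hello.note").toURL();
            URLConnection conn =
                new InMemoryConnection(url, body.getBytes(StandardCharsets.US_ASCII));
            return ((Note) conn.getContent()).text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchNote("hello")); // prints "hello"
    }
}
```

Note that the calling code never touches NoteHandler directly; the getContent() call finds it through the installed factory.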

A Simple Content Handler

This section presents an example of implementing a simple content handler. A bogus MIME type, text/cg, is created to identify objects of the character grid type. A character grid is a two-dimensional grid drawn with a single character. An example follows:

O   O
 O O
  O
 O O
O   O

This example is a character grid object that is five character positions wide and five character positions high. It uses the O character to draw the grid. The grid is specified by a boolean array that identifies whether the drawing character is to be displayed.

This particular character grid is represented using the following text string:

55O1000101010001000101010001

The first character (5) represents the grid's height. The second character (also 5) represents the grid's width. The third character is the grid's drawing character. The remaining characters specify whether the draw character should be displayed at a particular grid position. A 1 signifies that the draw character should be displayed, and a 0 signifies that it should not be displayed. The array is arranged in row order beginning with the top of the grid. Because the height and width are each encoded as a single digit, this format is limited to grids of at most nine rows and nine columns.
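The encoding just described can be checked with a short decoder. This is a minimal sketch; the GridText class and its render() method are hypothetical and are not part of the chapter's listings. Like the handler developed later in the chapter, it treats any cell other than 0 as a position to draw.

```java
// Hypothetical helper that decodes the text form of a character grid.
class GridText {
    // Decodes: height digit, width digit, draw character, then one 0/1 cell
    // per position in row order. Returns the grid as newline-terminated rows.
    static String render(String encoded) {
        int height = encoded.charAt(0) - '0';
        int width = encoded.charAt(1) - '0';
        char ch = encoded.charAt(2);
        if (encoded.length() != 3 + height * width) {
            throw new IllegalArgumentException("expected " + (height * width) + " cells");
        }
        StringBuilder out = new StringBuilder();
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                char cell = encoded.charAt(3 + row * width + col);
                out.append(cell != '0' ? ch : ' ');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the five-by-five X pattern described in the text.
        System.out.print(render("55O1000101010001000101010001"));
    }
}
```

The index expression 3 + row * width + col skips the three header characters and then steps through the cells in row order.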

The definition of the CharGrid class is shown in Listing 28.1.


Listing 28.1. The source code for the CharGrid class.

public class CharGrid {
 public int height;
 public int width;
 public char ch;
 public boolean values[][];
 public CharGrid(int h,int w,char c,boolean vals[][]) {
  height = h;
  width = w;
  ch = c;
  values = vals;
 }
}


The GridContentHandler Class

The GridContentHandler class is used to extract CharGrid objects from an URLConnection. Its source code is shown in Listing 28.2.


Listing 28.2. The source code for the GridContentHandler class.

import java.net.*;
import java.io.*;

public class GridContentHandler extends ContentHandler {
 // Extracts a CharGrid object from the connection's input stream.
 public Object getContent(URLConnection urlc) throws IOException {
  DataInputStream in = new DataInputStream(urlc.getInputStream());
  try {
   // The height and width are single ASCII digits; subtract '0' to get their values.
   int height = in.readByte() - '0';
   int width = in.readByte() - '0';
   // The third byte is the drawing character.
   char ch = (char) in.readByte();
   // The remaining bytes are in row order: '0' means blank, anything else means draw.
   boolean values[][] = new boolean[height][width];
   for(int i=0;i<height;++i) {
    for(int j=0;j<width;++j) {
     values[i][j] = in.readByte() != '0';
    }
   }
   return new CharGrid(height,width,ch,values);
  } finally {
   // Close the stream even if the data is truncated.
   in.close();
  }
 }
}


The GridContentHandler class extends the ContentHandler class and provides a single method. The getContent() method takes an URLConnection object as a parameter and returns an object of the Object class. It also throws the IOException exception.

The getContent() method creates an object of class DataInputStream and assigns it to the in variable. It uses the getInputStream() method of the URLConnection class to access the input stream associated with an URL connection.

The height, width, and draw character of the CharGrid object are read one byte at a time from the input stream. The values[][] array is then read and converted to a boolean representation. The CharGrid object is then created from the extracted values and returned.
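A handler of this kind can be exercised without a Web server by passing it a connection that serves bytes from memory. The sketch below makes several assumptions: Grid and GridHandler are condensed stand-ins for CharGrid and GridContentHandler so that the example is self-contained, and BytesConnection is a hypothetical class written only for this test.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ContentHandler;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class GridHandlerCheck {
    // Condensed stand-in for the CharGrid class.
    static class Grid {
        final int height;
        final int width;
        final char ch;
        final boolean[][] values;
        Grid(int height, int width, char ch, boolean[][] values) {
            this.height = height;
            this.width = width;
            this.ch = ch;
            this.values = values;
        }
    }

    // Condensed stand-in for the GridContentHandler class.
    static class GridHandler extends ContentHandler {
        @Override
        public Object getContent(URLConnection conn) throws IOException {
            try (DataInputStream in = new DataInputStream(conn.getInputStream())) {
                int height = in.readByte() - '0';
                int width = in.readByte() - '0';
                char ch = (char) in.readByte();
                boolean[][] values = new boolean[height][width];
                for (int i = 0; i < height; i++) {
                    for (int j = 0; j < width; j++) {
                        values[i][j] = in.readByte() != '0';
                    }
                }
                return new Grid(height, width, ch, values);
            }
        }
    }

    // Hypothetical connection that serves a fixed byte array instead of contacting a server.
    static class BytesConnection extends URLConnection {
        private final byte[] body;
        BytesConnection(URL url, byte[] body) { super(url); this.body = body; }
        @Override public void connect() { connected = true; }
        @Override public InputStream getInputStream() { return new ByteArrayInputStream(body); }
    }

    // Runs the handler directly against the given text, bypassing any factory.
    static Grid decode(String text) {
        try {
            URL url = URI.create("http://example.invalid/chargrid.cg").toURL();
            URLConnection conn =
                new BytesConnection(url, text.getBytes(StandardCharsets.US_ASCII));
            return (Grid) new GridHandler().getContent(conn);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Grid g = decode("23#101010");
        System.out.println(g.height + " x " + g.width + " grid of " + g.ch);
    }
}
```

If the input ends before all height-by-width cells have been read, readByte() throws an EOFException, which decode() reports as an UncheckedIOException.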

The GetGridApp Program

The GetGridApp program illustrates the use of content handlers. It retrieves an object of the CharGrid type from my Web server. I use the NCSA HTTPD server on a Linux system. I've set up the server's MIME type file to recognize files with the .cg extension as text/cg.

The source code of the GetGridApp program is shown in Listing 28.3.


Listing 28.3. The source code for the GetGridApp program.

import java.net.*;
import java.io.*;

public class GetGridApp {
 public static void main(String args[]){
  // Check the command line before doing any work.
  if(args.length!=1) error("Usage: java GetGridApp URL");
  try{
   // Install the factory so that text/cg objects are handled by GridContentHandler.
   GridFactory gridFactory = new GridFactory();
   URLConnection.setContentHandlerFactory(gridFactory);
   System.out.println("Fetching URL: "+args[0]);
   URL url = new URL(args[0]);
   // getContent() consults the factory for the MIME type reported by the server.
   Object content = url.getContent();
   if(!(content instanceof CharGrid)) error("The URL did not return a text/cg object.");
   CharGrid cg = (CharGrid) content;
   System.out.println("height: "+cg.height);
   System.out.println("width: "+cg.width);
   System.out.println("char: "+cg.ch);
   // Draw the grid one row at a time.
   for(int i=0;i<cg.height;++i) {
    for(int j=0;j<cg.width;++j) {
     if(cg.values[i][j]) System.out.print(cg.ch);
     else System.out.print(" ");
    }
    System.out.println();
   }
  }catch (MalformedURLException ex){
   error("Bad URL");
  }catch (IOException ex){
   error("IOException occurred.");
  }
 }
 // Displays an error message and terminates the program.
 public static void error(String s){
  System.out.println(s);
  System.exit(1);
 }
}
class GridFactory implements ContentHandlerFactory {
 public GridFactory() {
 }
 // Returns a GridContentHandler for text/cg and null for every other MIME type.
 public ContentHandler createContentHandler(String mimeType) {
  if(mimeType.equals("text/cg")) {
   System.out.println("Requested mime type: "+mimeType);
   return new GridContentHandler();
  }
  return null;
 }
}


When you invoke the GetGridApp program, provide it with the URL http://www.jaworski.com/java/chargrid.cg as a parameter.

The GetGridApp program's output is as follows:

C:\java\jdg\ch28>java GetGridApp http://www.jaworski.com/java/chargrid.cg
Fetching URL: http://www.jaworski.com/java/chargrid.cg
Requested mime type: text/cg
height: 5
width: 5
char: j
jjjjj
  j
  j
j j
 jj

C:\java\jdg\ch28>

The program connects to my Web server, retrieves the chargrid.cg file, extracts the CharGrid object contained in the file, and displays it in the console window. The character grid object displays a grid of j characters.

The main() method creates an object of the GridFactory class, which implements the ContentHandlerFactory interface. It then installs the object as the default content handler factory. An URL object is created using the URL string passed as the program's parameter. The getContent() method of the URL class is then used to extract the CharGrid object from the URL. The getContent() method causes the GridFactory object assigned to the gridFactory variable to be invoked to retrieve an appropriate content handler. An object of class GridContentHandler is returned, and its getContent() method is invoked to extract the CharGrid object. All of this is performed behind the scenes as the result of invoking the URL class's getContent() method. The CharGrid object is then displayed.

The GetGridApp program defines the GridFactory class as a ContentHandlerFactory. It implements the createContentHandler() method and checks to see if the MIME type passed to it is text/cg. If it is not, the null value is returned to signal that the Java-supplied content handler should be used. If the MIME type is text/cg, the requested MIME type is displayed, and a GridContentHandler object is returned.
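The same null-return convention extends naturally to a factory that serves several MIME types. The following is one possible sketch, not the chapter's approach: the TableFactory class, its register() method, and the PlaceholderHandler class are all hypothetical names introduced for illustration.

```java
import java.net.ContentHandler;
import java.net.ContentHandlerFactory;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical factory that looks up handlers in a table keyed by MIME type.
class TableFactory implements ContentHandlerFactory {
    private final Map<String, Supplier<ContentHandler>> table = new HashMap<>();

    // Associates a MIME type with a way of creating its handler.
    TableFactory register(String mimeType, Supplier<ContentHandler> maker) {
        table.put(mimeType, maker);
        return this;
    }

    @Override
    public ContentHandler createContentHandler(String mimeType) {
        Supplier<ContentHandler> maker = table.get(mimeType);
        // Returning null signals that the Java-supplied content handler should be used.
        return maker == null ? null : maker.get();
    }

    // Placeholder handler used only to demonstrate the lookup;
    // it returns the connection's content type instead of parsing the stream.
    static class PlaceholderHandler extends ContentHandler {
        @Override
        public Object getContent(URLConnection conn) {
            return conn.getContentType();
        }
    }

    public static void main(String[] args) {
        TableFactory factory = new TableFactory().register("text/cg", PlaceholderHandler::new);
        System.out.println(factory.createContentHandler("text/cg") != null);   // prints "true"
        System.out.println(factory.createContentHandler("text/html") == null); // prints "true"
    }
}
```

With a table, supporting another MIME type means adding one register() call rather than another if statement in createContentHandler().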

Summary

In this chapter you have learned how to write content handlers to support the retrieval of objects by Web browsers. You have learned about the multipurpose Internet mail extensions and how they are used to identify the type of objects that are provided by Web servers. You have developed the GridContentHandler class and integrated it with the GetGridApp program. Chapter 29, "Protocol Handlers," shows you how to integrate custom protocol handlers into your Web-based applications.