| Networking |
|
Q . What's the difference between a URL instance and a URLConnection instance?
|
Ans :
A URL
instance represents the location of a resource, and a URLConnection
instance represents a link for accessing or communicating with the
resource at that location.
The
URL class provides an abstraction of a Uniform
Resource Locator (URL), the World Wide Web's basic type of
pointer. A URL specifies where and how (by which protocol) to
reach a resource; it does not specify the contents at that
location.
The URLConnection
represents a connection to the resource specified by a URL. It
provides general connection support both for the well-known
protocols such as http and for custom protocols that you might
create. You can use a URLConnection instance to
inspect and set properties of the connection (e.g., whether the
connection can be used for output in addition to input), to get
information from the URL (e.g., content length and header fields),
and to get input and output streams for moving data through the
connection.
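As a rough illustration of the first half of that distinction, the sketch below builds a URL and prints the parts of the location it names; no connection is opened at any point. The class name UrlParts and the port number in the sample URL are made up for this example.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Sketch: a URL object only describes a location; building one does no network I/O.
final class UrlParts {
    // Returns the protocol, host, port and path named by the URL string.
    static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return url.getProtocol() + " | " + url.getHost() + " | "
                    + url.getPort() + " | " + url.getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // The port 8080 is an invented detail for illustration.
        // Prints: http | java.sun.com | 8080 | /docs/index.html
        System.out.println(describe("http://java.sun.com:8080/docs/index.html"));
    }
}
```

When the URL string carries no explicit port, getPort() reports -1 rather than the protocol's default port.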
|
|
Q . How do I make a connection to a URL?
|
Ans :
You obtain
a URL
instance and then invoke openConnection
on it.
URLConnection
is an abstract class, which means you cannot directly create
instances of it using a constructor. Nor would you want to,
because the type of connection you need depends on the protocol
specified in the URL. The URL class's openConnection method
manages these details for you. When you invoke openConnection
on a URL instance, you automatically get the right kind of
connection (subclass of URLConnection) for your URL.
When you create a URL
instance or invoke openConnection, remember to handle
the exceptions that might be thrown (if not, the compiler will
remind you):
URL url;
URLConnection connection;
try {
url = new URL (...);
connection = url.openConnection();
} catch(MalformedURLException e) {
// ... handle exception from URL constructor
} catch(IOException e) {
// ... handle exception from URL.openConnection
}
Obtaining
a URLConnection instance is merely the first step. To
inspect or communicate with the resource at the other end, you
need to set up input or output streams for the connection.
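The point about openConnection choosing the connection subclass can be checked with a small sketch like the one below. It assumes a standard JDK, where an http URL yields an instance of java.net.HttpURLConnection; the class name ConnectionKind and the file path are made up. openConnection only creates the connection object, so nothing is sent over the network here.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Sketch: openConnection() picks the URLConnection subclass from the URL's protocol.
final class ConnectionKind {
    // Creates (but does not use) a connection and reports whether it is HTTP-specific.
    static boolean isHttpConnection(String spec) throws IOException {
        URLConnection connection = new URL(spec).openConnection();
        return connection instanceof HttpURLConnection;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isHttpConnection("http://java.sun.com/"));      // http URL
        System.out.println(isHttpConnection("file:///tmp/some-file.txt")); // hypothetical path
    }
}
```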
|
|
Q . How do I read from a remote file if I have its URL?
|
Ans :
Get a URL
connection from the URL, get an input stream from the connection,
and then read from that stream according to the type of data you
expect.
A
URLConnection instance manages the connection between
your program and a URL, but it delegates much of the actual work
to other objects. For example, you do not directly send data to or
receive data from a URLConnection instance. Instead,
you ask the connection for an input stream or output stream and
then transfer data through that stream. To obtain finer control
over the data flow, you can wrap a stream filter (an instance of a
subclass of FilterInputStream or FilterOutputStream)
around the basic input or output stream.
Below are the typical steps for
reading from a file:
- Create a URL
instance that points to the file you want to read.
- Invoke
openConnection
on that URL instance.
- Invoke
getInputStream()
to get an InputStream object from the connection.
- Wrap an instance
of an appropriate
FilterInputStream subclass
around the basic input stream and read from it.
- Close the
InputStream.
For convenience, URL's
openStream method combines steps 2 and 3. The code
fragment below exemplifies the process:
/* using JDK 1.0.2: */
URL url = null;
URLConnection connection;
String urlString = "http://java.sun.com/";
String currentLine;
DataInputStream inStream;
try {
url = new URL (urlString);
} catch(MalformedURLException e) { /* ... */ }
try {
connection = url.openConnection();
inStream = new DataInputStream (connection.getInputStream());
while (null != (currentLine = inStream.readLine())) {
System.out.println(currentLine);
}
inStream.close();
} catch (IOException e) { /* ... */ }
To
write equivalent code using the JDK 1.1, perform the standard
conversion from byte-oriented input streams to character-oriented
readers.
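One possible shape for that reader-based version is sketched below. It assumes Java 7 or later (for try-with-resources) and assumes the remote text is UTF-8; neither assumption comes from the original fragment, and the class name UrlLineReader is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: read a URL's content line by line through a character-oriented reader.
final class UrlLineReader {
    // Returns every line of text behind the URL; UTF-8 is an assumed encoding.
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        // openStream() is shorthand for openConnection().getInputStream().
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } // the reader, and the stream beneath it, are closed here
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String spec = args.length > 0 ? args[0] : "http://java.sun.com/";
        for (String line : readLines(new URL(spec))) {
            System.out.println(line);
        }
    }
}
```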
Note:
In an applet, similar code could read data only from the host that
originally delivered the applet code. In general, applets loaded
over the net can make network connections only back to the host
they were loaded from.
|
|
Q . Why do I get a null result when I use the getHeader... methods in the URLConnection class?
|
Ans :
There are
two main sources of null
results from getHeader... methods:
you requested a specific header field that doesn't exist for the
present connection, or you used a getHeader...
method that is not fully implemented in the JDK 1.0.2 (in other
words, a bug).
The
URLConnection class provides two different ways to
request header information: by specific header type and by
position in the overall header list. The general way to request
the value of a specific header field is to ask for it by name,
using the getHeaderField(String) method:
public String getHeaderField(String name)
If the named header
field exists, this method returns a String instance
representing the field's value; otherwise the method returns null.
For convenience, the URLConnection class also
provides methods to access standard header fields and return
digested numerical values where appropriate:
public String getContentEncoding()
public int getContentLength()
public String getContentType()
public long getDate()
public long getExpiration()
The
second approach is to retrieve the header fields by position
rather than by name. For this you use the pair of methods that
index the header fields starting with one (not zero):
public String getHeaderFieldKey(int index)
public String getHeaderField(int index)
For example, the
following method iterates through all the header fields for the
given URLConnection instance:
public void printHeaders(URLConnection connection) {
for (int i = 1; true; ++i) {
String headerKey = connection.getHeaderFieldKey(i);
if (headerKey == null) {
break;
}
System.out.println(" Header " + i + ": "
+ headerKey + ": "
+ connection.getHeaderField(i));
}
}
In the JDK 1.0.2, the
two methods for retrieving headers by index always return null—this
is a bug, not a feature. The JDK 1.1 has fixed this bug, as you
can verify by running the GetHeaderExample sample
code within the JDK 1.1 virtual machine.
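Because getHeaderField(String) returns null for a header that is not present, callers usually need a null check. A minimal sketch of one way to handle that follows; the class name HeaderLookup, the helper headerOrDefault, and the fallback text are all made up for illustration.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Sketch: treat a null header value as "field not present" and substitute a fallback.
final class HeaderLookup {
    // Returns the named header's value, or the fallback when the field is absent.
    static String headerOrDefault(URLConnection connection, String name, String fallback) {
        String value = connection.getHeaderField(name);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) throws IOException {
        String spec = args.length > 0 ? args[0] : "http://java.sun.com/";
        URLConnection connection = new URL(spec).openConnection();
        System.out.println(headerOrDefault(connection, "Content-Type", "(no content type sent)"));
    }
}
```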
|
|
Q . What is URLConnection's getOutputStream method intended to work with on the server side?
|
Ans :
The output
stream you get from a URL connection usually hooks up with an
http-related process on the server side, such as a CGI script.
Intermachine
(network) communication requires cooperating processes at the two
ends. With a URL connection, one end is a Java Virtual Machine
running your application or applet, and the other end is some
server process—using http, ftp, or some other protocol specified
in your URL. Reading from a URL connection's input stream is the
common and simple case. A general-purpose server process, such as
a web or ftp server, can locate and send back a copy of the
resource you request.
Writing to a URL connection's output
stream, however, is more restricted, because of the actions
required on the server side. Http servers, for example, can't
simply create or write to arbitrary files specified in a URL. They
must use special processes that have been configured explicitly to
accept input across the net. When you send data through a URL
connection, you will most likely be sending it to a CGI (Common
Gateway Interface) process. (A telltale "cgi-bin" in a
URL usually gives away that you are communicating with a CGI
process.)
Note: The default behavior
for URL connections is to disallow output streams. The current JDK
implementations (1.0.2 and 1.1) define getOutputStream
in the URLConnection class to throw an UnknownServiceException:
/* In URLConnection.java (JDK 1.0.2 and 1.1) */
public OutputStream getOutputStream() throws IOException {
throw new UnknownServiceException(
"protocol doesn't support output");
}
To enable output, a URLConnection
subclass must override getOutputStream, as Sun's JDK
implementation does in its HttpURLConnection class in
order to allow posting to http servers.
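The default behavior can be observed with a subclass that does not override getOutputStream. The sketch below is only an illustration: ReadOnlyConnection is a hypothetical class, and its connect method does nothing, so no network access takes place.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.UnknownServiceException;

// Hypothetical subclass that inherits URLConnection's getOutputStream() unchanged.
final class ReadOnlyConnection extends URLConnection {
    ReadOnlyConnection(URL url) {
        super(url);
    }

    @Override
    public void connect() {
        connected = true; // nothing to open in this sketch
    }

    // Returns true when the connection refuses output with UnknownServiceException.
    static boolean outputRefused(URLConnection connection) {
        try {
            connection.getOutputStream();
            return false;
        } catch (UnknownServiceException e) {
            return true;
        } catch (IOException e) {
            return false; // some other failure, not the default refusal
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        URLConnection connection = new ReadOnlyConnection(new URL("http://java.sun.com/"));
        connection.setDoOutput(true); // the flag alone does not enable output
        System.out.println(outputRefused(connection)); // prints "true"
    }
}
```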
|
|
Q . How do I send data from my Java program to a CGI program?
|
Ans :
You can use
an http GET request by packing your data into the query string of
the URL, or you can use an http POST request by sending your data
through an output stream obtained from the URL connection.
The
way you send data to a particular CGI (Common Gateway Interface)
program depends on what that program has been written to handle.
CGI programs generally expect to receive their data either as an
http GET request or as an http POST request. In Java, URLConnection
instances support sending data through both mechanisms.
GET requests are the basic http
mechanism for fetching material from the World Wide Web. On the
server side, a GET request typically causes the server to locate
the resource indicated by the URL and send it back to the client.
CGI programs can piggyback on the GET mechanism by looking for a
query string at the tail of the URL:
http://<machine-name>/<path-to-cgi-program>?<query-string>
The information in the
query string is made accessible to the program as the value of an
environment variable called QUERY_STRING. The CGI program then
generates its output and sends that to the client.
To use a GET request from the client
side, a Java program must create an appropriate URLConnection
instance, obtain an input stream from the connection, and then
read from that stream. The following code fragment, for example,
simply reads and prints the CGI program's output:
/* http GET + query string, using JDK 1.0.2: */
DataInputStream inStream = null;
String urlString = "http://hoohoo.ncsa.uiuc.edu/cgi-bin/test-cgi";
String queryString = "arg1=val1"
        + "&arg2=val2"
        + "&arg3=val3"
        + "&arg4=val4"
        + "&arg5=val5";
urlString += ("?" + queryString);
try {
    String currentLine;
    URL url = new URL(urlString);
    URLConnection connection = url.openConnection();
    inStream = new DataInputStream(connection.getInputStream());
    /* Read the server's response and close up. */
    while (null != (currentLine = inStream.readLine())) {
        System.out.println(currentLine);
    }
    inStream.close();
    inStream = null;
} catch (Exception e) {
    // ... handle exception
} finally {
    /* Close the stream if an exception interrupted the read loop. */
    if (inStream != null) {
        try {
            inStream.close();
        } catch (IOException e) {
            // ... nothing more can be done here
        }
    }
}
GET requests are
appropriate when the CGI program needs relatively little
information per interaction.
Alternatively, a client program can
make an http POST request. A POST lets you send an arbitrary
amount of information to the CGI program, separate from the URL
used to initiate the connection. The URL thus has no query string;
it specifies only the path to the CGI program:
http://<machine-name>/<path-to-cgi-program>
After creating a URLConnection
instance, your program needs to prepare the connection for a POST
(setDoOutput(true)), obtain an output stream from the
connection, write data to it, and close it. After that, reading
the output from the CGI program works the same as for the GET
request just described. The following code fragment illustrates
these steps:
/* http POST, using JDK 1.0.2: */
DataInputStream inStream = null;
PrintStream outStream = null;
String urlString = "http://hoohoo.ncsa.uiuc.edu/cgi-bin/test-cgi";
String dataString = "arg1=val1"
        + "&arg2=val2"
        + "&arg3=val3"
        + "&arg4=val4"
        + "&arg5=val5";
try {
    String currentLine;
    URL url = new URL(urlString);
    URLConnection connection = url.openConnection();
    /* Output must be enabled before asking for the output stream. */
    connection.setDoOutput(true);
    outStream = new PrintStream(connection.getOutputStream());
    outStream.println(dataString);
    outStream.println();
    outStream.close();
    outStream = null;
    /* Read the server's response and close up. */
    inStream = new DataInputStream(connection.getInputStream());
    while (null != (currentLine = inStream.readLine())) {
        System.out.println(currentLine);
    }
    inStream.close();
    inStream = null;
} catch (Exception e) {
    // ... handle exception
} finally {
    /* Close whatever is still open if an exception interrupted the exchange. */
    if (outStream != null) {
        outStream.close();
    }
    if (inStream != null) {
        try {
            inStream.close();
        } catch (IOException e) {
            // ... nothing more can be done here
        }
    }
}
|
|
Q . Can I write (from my applet) to an external file on a URL?
|
Ans :
No; the
standard URLConnection
classes allow you only to send data to a CGI program on the server
(from which your applet was loaded), but not to write directly to
a file URL.
The
JDK URLConnection classes (the abstract URLConnection
class and its various implementation subclasses) support writing
output to a URL only in the form of http POST requests. Thus,
there is no direct way to write to a file specified by a URL; you
need to have (or write) a CGI program at the receiving end that
can take the data in your POST request and save it to a file.
|
|
Q . How can my Java stand-alone application fetch documents in the same fashion as (partially simulating) a browser?
|
Ans :
Fetching is
easy enough, with the assistance of java.net
classes, but you would also need to parse, format, and display
what you fetch, which requires substantial extra work on your
part.
|
|
Q . Why do I get a security exception when I try to connect to an external URL from an applet? (If I run equivalent code as an application, it works fine.)
|
Ans :
As part of
the Java security model, applets are quite restricted in the
network connections they can make: applets can typically make
network connections only back to the URL host they were fetched
from, whereas stand-alone applications do not have this
restriction.
For
more details, including the latest status of all known
security-related bugs, see JavaSoft's security
FAQ web page.
|
|
Q . How do I get a URLConnection to work through proxy firewalls? I.e. How do you get your Java application to do its web accesses through a proxy?
|
Ans :
This is typically needed for any net access to another domain. Tell
the run-time system what you are trying to do by using these command-line arguments when you start the program:
java -DproxySet=true -DproxyHost=SOMEHOST -DproxyPort=SOMENUM code.java
Note that proxyPort is optional and defaults to 80. Without these settings, you will see an exception like java.net.UnknownHostException or
java.net.NoRouteToHostException.
The proxy settings work for both java.net.URLConnection and
java.net.Socket.
Netscape's and IE's JVMs (at least in versions 4.x+) take the proxy settings for applets from the browser's proxy configuration. You can
also do URL proxies in applications (not applets) with the following
code:
// set up to use proxy
System.getProperties().put("proxySet", "true");
System.getProperties().put("proxyHost", "myproxy.server.name");
System.getProperties().put("proxyPort", "80");
But how do I know the name of the proxy server?
This code just tells you how you can get a URL connection to the outside. Since it is your proxy server, you are expected to know the
name of it. There isn't any code that you can write that will allow
arbitrary URL connections to be initiated from outside the firewall. Think about it! If there were, the firewall would not be doing its job.
Also note there are corresponding socksProxyPort and socksProxyHost for when socks is used instead of proxy. The default socks port is 1080.
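For the SOCKS case, the same properties-based approach might look like the sketch below. The property names socksProxyHost and socksProxyPort are the ones mentioned above; the class name ProxySettings, the helper useSocksProxy, and the host name socks.example.com are placeholders invented for this example.

```java
import java.util.Properties;

// Sketch: record SOCKS proxy settings in a property set such as the system properties.
final class ProxySettings {
    // Stores the SOCKS proxy host and port under the property names named in the text.
    static void useSocksProxy(Properties props, String host, int port) {
        props.setProperty("socksProxyHost", host);
        props.setProperty("socksProxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        // "socks.example.com" is a placeholder; 1080 is the default SOCKS port.
        useSocksProxy(System.getProperties(), "socks.example.com", 1080);
        System.out.println(System.getProperty("socksProxyHost") + ":"
                + System.getProperty("socksProxyPort")); // prints "socks.example.com:1080"
    }
}
```

Passing the Properties object in, rather than always writing to the system properties, keeps the helper easy to try out without changing the settings of a running program.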
|
|
Q . How do I do a HTTP GET in Java?
|
Ans :
This one is easy. Just build a URL with the query string on the end in the usual way, e.g. http://www.site.com/perl-script.pl?val1=Put+stuff+here. java.net.URLEncoder.encode() is a static method that will properly encode a string for you, as well. Optionally, you could call java.applet.AppletContext.showDocument(), which also submits the information and uses the browser to display the output.
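A small sketch of building such a URL with java.net.URLEncoder follows. It uses the two-argument encode method and assumes UTF-8 as the character encoding; the class name QueryEncoding and the helper encodeValue are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: encode a query-string value before appending it to a URL.
final class QueryEncoding {
    // Encodes one value for use in a query string; UTF-8 is an assumed encoding.
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://www.site.com/perl-script.pl?val1=" + encodeValue("Put stuff here");
        // Prints: http://www.site.com/perl-script.pl?val1=Put+stuff+here
        System.out.println(url);
    }
}
```

Only the individual names and values are encoded; the "?", "=", and "&" separators are added afterwards so that they keep their special meaning.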
|
|
Q . What are content handlers and why should I care?
|
Ans :
Apparently, there are no content handlers defined in the Java spec. The JDK from Sun provides content handlers as Java's way to add functionality to the URL class. By adding content handlers, the URL class can be made to return various MIME types as objects with the getContent method. With the proper content handlers in place, downloading GIFs, JPEGs, MPEGs, and PostScript documents, to name a few, is just a single method call.
|
|
Q . What are protocol handlers?
|
Ans :
Protocol handlers are another way to extend the URL class. Again, there are no standard protocol handlers defined in the Java spec. [Note: Again, I find that I have to get Gosling's book to check this.] In a URL, the protocol is the first part of the URL string, the part that precedes the colon. With the proper protocol handlers in place, URLs can handle ftp, mail, gopher, and even finger. Check out http://java.sun.com/people/brown/ for an implementation of a finger protocol handler.
|
|
Q . Can I use non-http: URLs?
|
Ans :
Netscape apparently supports the ftp:, mailto:, gopher:, and even the telnet: protocol handlers (I couldn't find anywhere this was documented). As near as I can tell, no one else does at this time. As a result, you may wish to use these sparingly. It is very easy to test at runtime for protocol handler support, however. Simply create a URL that uses the protocol.
URL testurl;
boolean goodmailto;
try {
    testurl = new URL("mailto:maus@io.com");
    goodmailto = true;
}
catch (MalformedURLException e) {
    goodmailto = false;
}
In this example, goodmailto will be true if the protocol handler exists, and false if it does not. [Note: Include link to sample applet at my site which does this.]
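The same test can be wrapped in a reusable helper, sketched below under a few assumptions. It uses the three-argument URL constructor so that only the handler lookup decides the result, and the class name ProtocolCheck and the scheme "nosuchscheme" are made up. Which schemes come back as supported depends on the runtime.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Sketch: ask the runtime whether it has a protocol handler for a given scheme.
final class ProtocolCheck {
    // Returns true when a URL with this protocol can be constructed.
    static boolean isSupported(String scheme) {
        try {
            // The host and file are dummies; the constructor fails on an unknown protocol.
            new URL(scheme, "localhost", "/");
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] schemes = {"http", "file", "mailto", "nosuchscheme"};
        for (String scheme : schemes) {
            System.out.println(scheme + ": " + isSupported(scheme));
        }
    }
}
```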
|
|
Q . Should I write my CGI programs in Java?
|
Ans :
The best answer to that is - "It depends". If you intend to write client-side CGI handling in Java, the mechanisms for it are there and easy to use (see CGI-POST and CGI-GET questions above). If you would like to write CGI server-side programs, be aware that Java has no defined way to grab the environment variables that you will need from the HTTP server. JavaSoft has introduced the Servlet API as a standard way to do this, but adoption of this standard by the major server players is still somewhat distant.
If you do not have access to a server with a Java API for CGI, you will have to write a shell wrapper for your Java programs, since Java has no way of obtaining environment variables. Examples exist on the web, if you need guidance.
|
|
Q . How can a Java program talk to a CGI program?
|
Ans :
Web browsers display forms, read user input, encode that input into a standard format called a "query string", and send that data to CGI programs that live on the web server. When you write an applet that talks to a CGI program, you have to do all this yourself.
The first thing to know is that there are two ways a CGI program can accept data from a web browser, GET and POST. CGIs that use GET take their arguments from the URL. Programs that use POST read their arguments from standard input.
The second thing to know is that when you submit data to a form through a web browser, the web browser encodes the data for you. In an applet, however, you need to encode the data yourself. The data is encoded like this: Each form entry is a name-value pair. Names and values are separated from each other by equals signs (=). Pairs are separated from each other by ampersands (&). For example, consider this form:
<Form method=GET action="http://metalab.unc.edu/javafaq/cgi-bin/getform.pl">
Email: <Input NAME="email" size=40>
Name: <Input NAME="realname" size=40>
<Input TYPE="submit" VALUE="Subscribe">
</Form>
You see that this uses the GET method to communicate with a cgi-bin program at http://metalab.unc.edu/javafaq/cgi-bin/getform.pl. It sends two fields to the CGI program, email and realname. Let's say you want to send the string "elharo@metalab.unc.edu" for the email address, and the string "Elliotte Harold" for the real name. Then the query string would look like this:
String qs = "email=elharo%40metalab.unc.edu&realname=Elliotte%20Harold";
The spaces in "Elliotte Harold" and the @ in "elharo@metalab.unc.edu" have been converted into percent escapes. Non-alphanumeric characters in the values, apart from a few safe ones such as the period, must be replaced with a % followed by their ASCII value written as two hexadecimal digits. Thus a space becomes %20 and the @ becomes %40.
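The escaping rule can be sketched by hand as follows. This is only an illustration: the set of punctuation left unescaped ("-", "_", ".", "~") is an assumption borrowed from the usual URI conventions, the bytes are taken as UTF-8, and the class name PercentEscape and its helpers are made up.

```java
import java.nio.charset.StandardCharsets;

// Sketch: percent-escape form values and join them into name=value pairs.
final class PercentEscape {
    private static final String SAFE_PUNCTUATION = "-_.~"; // assumed set of unescaped marks

    // Replaces every byte outside letters, digits and the safe marks with %XX.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric || SAFE_PUNCTUATION.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // Builds one name=value pair with both sides escaped.
    static String pair(String name, String value) {
        return escape(name) + "=" + escape(value);
    }

    public static void main(String[] args) {
        String qs = pair("email", "elharo@metalab.unc.edu")
                + "&" + pair("realname", "Elliotte Harold");
        // Prints: email=elharo%40metalab.unc.edu&realname=Elliotte%20Harold
        System.out.println(qs);
    }
}
```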
To send this data to the server, append a question mark (?) and the query string to the URL of the CGI program, and request that URL from the server. Thus the URL you want is:
http://metalab.unc.edu/javafaq/cgi-bin/getform.pl?email=elharo%40metalab.unc.edu&realname=Elliotte%20Harold
In Java terms this requires constructing a URL object from this string, and opening that URL's InputStream to read the response. The following code fragment demonstrates:
try {
String thisLine;
String qs = "email=elharo%40metalab.unc.edu&realname=Elliotte%20Harold";
URL u = new URL("http://metalab.unc.edu/javafaq/cgi-bin/getform.pl?" + qs);
DataInputStream theHTML = new DataInputStream(u.openStream());
while ((thisLine = theHTML.readLine()) != null) {
System.out.println(thisLine);
}
}
catch (Exception e) {
System.err.println(e);
}
Communicating with CGI programs that use POST is somewhat more complex, and it doesn't work very well in Java 1.0.2. It may be improved in Java 1.1. When POSTing to a CGI, you encode the query string exactly as you do for GET requests. However, instead of merely requesting a URL's InputStream, you open a URLConnection to the CGI program.
Do not append the query string to the URL as you did with GET. Instead, set the URLConnection's doOutput and doInput fields to true and set allowUserInteraction to false. Chain the URLConnection's OutputStream to a DataOutputStream and use the DataOutputStream's writeBytes() method to send the query string to the server.
If you want to read the response, then chain the URLConnection's InputStream to a DataInputStream, and use the DataInputStream's readLine() method to read the response in a while loop. The following code fragment demonstrates:
String query = "email=elharo%40metalab.unc.edu&realname=Elliotte%20Harold";
try {
// open the connection and prepare it to POST
URL u = new URL("http://metalab.unc.edu/javafaq/cgi-bin/postform.pl");
URLConnection uc = u.openConnection();
uc.setDoOutput(true);
uc.setDoInput(true);
uc.setAllowUserInteraction(false);
DataOutputStream dos = new DataOutputStream(uc.getOutputStream());
// Send the data
dos.writeBytes(query);
dos.close();
// Read the response
DataInputStream dis = new DataInputStream(uc.getInputStream());
String nextline;
while((nextline = dis.readLine()) != null) {
System.out.println(nextline);
}
dis.close();
}
catch (Exception e) {
System.err.println(e);
}
As you see, posting forms is considerably more complex than using the GET method. However, on some platforms, GET has an annoying habit of failing once the query string grows past 200 characters. The exact point where GET fails varies depending on the operating system and the web server.
|
|