Janus Web Server application coding considerations

From m204wiki

A review of web processing with Janus Web Server

Web servers are generally stateless: the server responds to a single request and then breaks the connection, and the client has to establish another connection to make another request. Because each client connection exercises the server rules and re-navigates any application hierarchy, it is worth writing your server with the simplest, most general rules you can, and implementing a simple, flat application hierarchy.

Requests for static HTML, plain text, images and other binaries are best handled by storing them in procedures and letting the server rules serve them up in response to URL requests. Control is typically routed to a SOUL application in response to a form submission.

The Janus Web Server $functions allow an application to query the contents of the request headers and of a submitted form. Headers contain information automatically sent by a browser. Headers always contain the parameters METHOD and URL, and they may contain other browser-specific information that an application could use to tailor its replies to specific browsers. Forms contain whatever parameters (fields) the client chooses to send.

Most forms-based web applications will know the field names they need to process based on the URL (which is always retrievable from the header). For generic applications, Janus has web functions that count the incoming fields in the header and form and determine the names of the fields based on their relative position. Typically, applications just pick up data in predetermined form fields and use the functions to build and send the content.

Applications that process forms are not difficult to write, although they may be difficult to debug. As with all Janus applications, and client-server applications in general, the user has no terminal. In the context of Model 204, that means the only source of debugging information is the Model 204 journal, so a journal scanning tool like SirScan is critical to efficient application development.

The sequence of events in a Janus Web connection is:

  1. The client connects to the server with a URL.
  2. If the URL matches a REDIRECT rule (JANUS WEB REDIRECT), processing is redirected to the server and/or URL specified in the rule. The connection is closed if a URL is redirected, and the following steps are not performed.
  3. Security is evaluated (ALLOW and DISALLOW rules). The user may be prompted for a logon ID and password, or may be forbidden access.
  4. The default mime type is set from the matching TYPE rule.
  5. The OPEN and CMD commands on the JANUS DEFINE for the web port are executed. It is possible that this processing could send a response and close the connection, preventing any of the following steps from being performed.
  6. OPEN is executed from the matching ON rule.
  7. The CMD, SEND, or RECV specified on the matching ON rule is performed.
  8. The connection is closed.
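The ordering above can be summarized as a small Python model. This is a sketch only and is not how Janus Web Server is implemented or configured. Several simplifying assumptions are made: the Port fields are hypothetical, wildcard matching uses fnmatchcase, and ALLOW/DISALLOW is reduced to a single list of refused URL patterns. The function returns a trace of the steps applied to one request, so you can see where processing stops early.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple


@dataclass
class Port:
    # All rule shapes are hypothetical simplifications of JANUS WEB rules.
    redirects: Dict[str, str] = field(default_factory=dict)   # URL pattern -> target
    disallow: List[str] = field(default_factory=list)         # URL patterns refused
    types: Dict[str, str] = field(default_factory=dict)       # URL pattern -> MIME type
    port_cmd_responds: bool = False      # port OPEN/CMD sends a reply and closes
    on_rules: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # pattern -> (OPEN, action)


def first_match(url: str, rules) -> Optional[str]:
    """Return the first pattern that matches the URL, if any."""
    for pattern in rules:
        if fnmatchcase(url, pattern):
            return pattern
    return None


def handle(port: Port, url: str) -> List[str]:
    """Return a trace of the steps applied to one request, in order."""
    trace: List[str] = []
    target = first_match(url, port.redirects)
    if target is not None:                           # step 2: REDIRECT ends the request
        trace += [f"redirect to {port.redirects[target]}", "close"]
        return trace
    if first_match(url, port.disallow) is not None:  # step 3: security
        trace += ["forbidden", "close"]
        return trace
    mime = first_match(url, port.types)              # step 4: TYPE rule
    trace.append(f"type {port.types.get(mime, 'unset')}")
    trace.append("port OPEN/CMD")                    # step 5: port definition commands
    if port.port_cmd_responds:
        trace.append("close")
        return trace
    on = first_match(url, port.on_rules)
    if on is not None:
        open_clause, action = port.on_rules[on]
        trace.append(f"OPEN {open_clause}")          # step 6
        trace.append(action)                         # step 7: CMD, SEND, or RECV
    trace.append("close")                            # step 8
    return trace
```

For example, with an ON rule for /ALEX/* that opens FILE ALEXPROC and sends, a request for /ALEX/SMILEY.HTML yields the trace type, port OPEN/CMD, OPEN, SEND, close. A request for a redirected URL stops after step 2.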

When evaluating an ON rule that indicates SEND or RECV, the connection is closed as soon as the indicated procedure is sent or received. When evaluating an ON rule that indicates CMD, the connection is broken when a LOGOFF is executed, when processing returns to the Model 204 command line, when a $Web_Done is executed, or when a $Web_Proc_Send is executed without the MORE parameter.

Determining which output is sent to the client

When executing a command specified with the CMD parameter, on either the port definition or an ON rule, any data that would go to the application "terminal" may instead be sent to the browser as part of the request output. Terminal output (Model 204 error messages, output from PRINT and WRITE TERMINAL statements, and Model 204 command output) may also be mixed with data sent from procedures via the $Web_Proc_Send function.

When processing a command specified with CMD on the port definition, terminal output is sent by default to the Model 204 audit trail. However, when processing a CMD command specified in an ON rule, terminal output is sent by default to the client browser. These defaults can be overridden by the following $functions:

  • $Web_On, which indicates that terminal output is to be captured and sent back to the browser as part of the request response.
  • $Web_Off, which indicates that terminal output is to be captured and sent to the audit trail, not to the browser.
  • $Web_Flush, which indicates that all current response data (captured terminal output and data sent with $Web_Proc_Send) is to be discarded.

In order to be able to send the response length to the browser and to make $Web_Flush possible, captured response data is saved in the Model 204 address space until the request is done. The response length makes it possible for a browser to present a progress bar to the end-user. The captured response data is saved in CCATEMP, and it is not allowed to exceed the number of CCATEMP pages specified by the MAXTEMP parameter on the port definition.

The contents of the response buffer are sent to the client, and the connection is closed, when any of the following occur:

  • A LOGOFF command is executed.
  • Processing returns to the Model 204 command line.
  • A $Web_Done is executed.
  • A $Web_Proc_Send is executed without the MORE parameter.
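The capture rules above can be modeled in a few lines of Python. This is a hypothetical sketch, not a Janus API. Terminal output either goes into the buffer or to the audit trail, $Web_Flush discards what has been captured, and the response is delivered only when the request ends. Because all data is held until then, the total length is known before anything is sent. The byte limit stands in for the MAXTEMP page limit, and the Content-Length line illustrates only the length the buffering makes available.

```python
class ResponseBuffer:
    """Hypothetical model of Janus Web response capture (not a Janus API)."""

    def __init__(self, max_bytes: int, capture_terminal: bool):
        # capture_terminal reflects the default: True for CMD on an ON rule,
        # False for CMD on the port definition.
        self.max_bytes = max_bytes          # stands in for the MAXTEMP page limit
        self.capture_terminal = capture_terminal
        self.chunks = []                    # captured response data
        self.audit = []                     # lines routed to the audit trail
        self.sent = None                    # bytes delivered when the request ends

    def _append(self, data: bytes) -> None:
        if sum(len(c) for c in self.chunks) + len(data) > self.max_bytes:
            raise OverflowError("captured response exceeds the buffer limit")
        self.chunks.append(data)

    def web_on(self) -> None:               # analogous to $Web_On
        self.capture_terminal = True

    def web_off(self) -> None:              # analogous to $Web_Off
        self.capture_terminal = False

    def flush(self) -> None:                # analogous to $Web_Flush
        self.chunks.clear()

    def terminal(self, line: str) -> None:
        """PRINT output, error messages, or command output."""
        if self.capture_terminal:
            self._append(line.encode("ascii") + b"\r\n")
        else:
            self.audit.append(line)

    def proc_send(self, data: bytes, more: bool = False) -> None:
        """Analogous to $Web_Proc_Send; without MORE, the request is done."""
        self._append(data)
        if not more:
            self.done()

    def done(self) -> bytes:
        # Everything was buffered, so the total length is known up front.
        body = b"".join(self.chunks)
        self.sent = b"Content-Length: %d\r\n\r\n" % len(body) + body
        return self.sent
```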

Note: If your response buffer data contains very long lines with no CRLFs (line-wrapping breaks), the client browser may not render them correctly. Janus Web Server does not wrap or truncate web output lines if their length exceeds the Model 204 LOBUFF parameter setting. Since "very long" in this context varies among browsers, however, no explicit length guideline can be given, and testing for your site's expected conditions and client browsers is advised.

Understanding browser caching

From a strictly functional perspective, web processing is quite simple. The end-user, working with a browser, requests a URL in either of these ways:

  • Explicitly, by typing in the URL
  • Implicitly, by clicking on a hypertext link, or by loading an HTML page that has embedded binary data, such as graphical images or audio data

Some browsers will also re-request a URL when the end-user goes back in the browser's history of visited URLs. This simple paradigm provides a flexible and powerful data retrieval mechanism that is easy for the end-user to understand and with which it is relatively easy to build applications, because the application can (indeed, must) be built as independent pieces, each associated with a URL or perhaps a set of URLs.

Unfortunately, generating and transferring data associated with some URLs can be expensive and time consuming. The generation of the data could consume large amounts of server resources, and transferring the data could consume large amounts of network bandwidth. To reduce these overheads, browsers "cache" the response from URLs on the end-user's workstation, so in certain cases, the response from the URL can be simply loaded from local memory rather than being provided by the web server associated with the URL.

However, not all URLs are static: there is no way to guarantee that the response to a URL request will be the same from one request to the next. For example, if a URL provides an account balance, a request for the same URL a few seconds later may well produce a different response, that is, a different account balance. How, then, can a browser know that it is OK to use its local copy of a URL instead of retrieving it again from the server?

The answer is that HTTP, the web communications protocol, provides several mechanisms by which a server can inform a browser when it is OK to use the browser's locally cached copy of a URL. Perhaps the simplest to understand is a mechanism based on a URL's modification time.

Last-modified processing

When the server sends the response to a URL request, it can indicate the date and time that the data associated with a URL was last modified. If a URL simply returns data from a static source such as a Model 204 procedure, the last-modified time would be the time the static source was last updated (for example, the last updated time for the procedure as indicated in the procedure dictionary). If the data source for a URL is a database file, the records in the file can be "stamped" to indicate the date and time that a set of data was last modified. These stamps could then be used to return a last-modified time for a URL request to a browser. A reasonable last-modified time in such a case would be the most recent modification time for the records participating in the request.

Note: Care must be taken if records can be deleted from the file. A deletion is certainly a modification of the data and should be reflected as such to the browser, so the time stamp for the deletion must be kept somewhere.

In any case, the modification time for a URL is returned by a server using the "Last-Modified" response header parameter.

When a browser receives a "Last-Modified" response header parameter in the response to a URL, it saves the last-modified time along with the response in its local cache. This last-modified time can be examined in most browsers. When the same URL is requested again, the browser sends the URL request to the server, but this time with an "If-Modified-Since" request header parameter. The value for this parameter is, of course, the value received in the "Last-Modified" response header parameter the last time the URL was requested.
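Both Last-Modified and If-Modified-Since carry a time in the HTTP-date format, which has a resolution of one full second. The following Python sketch illustrates only that HTTP side and is not Janus or SOUL code; the helper names are hypothetical. It formats a modification time as a Last-Modified value and parses the If-Modified-Since value a browser echoes back.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP-date, e.g. for Last-Modified.

    HTTP dates carry whole seconds only, so any fraction is dropped.
    """
    utc = moment.astimezone(timezone.utc).replace(microsecond=0)
    return format_datetime(utc, usegmt=True)


def parse_http_date(value: str) -> datetime:
    """Parse the value a browser echoes back in If-Modified-Since."""
    return parsedate_to_datetime(value)
```

For example, a record stamped at 14:05:09.75 UTC on 1 March 2024 is reported as "Fri, 01 Mar 2024 14:05:09 GMT". Parsing that value back yields the same time without its fractional second.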

At this point, the server can determine whether the URL has been updated since the last time the browser requested it. If it has, the server simply carries on as it would if the URL were being requested for the first time, perhaps sending back a new last-modified time. If the URL hasn't been updated, the server can simply send a special response back to the browser indicating "Not Modified," at which point the browser presents the copy of the URL in its local cache to the end-user, eliminating both the need for further processing on the server and for transferring contents of the URL over the network. Both of these could potentially provide great response time improvements for the end-user.

There are two mechanisms available to indicate the last-modified time for a URL to a browser. The simplest is used when a URL response is sent as a result of a JANUS WEB ON rule that indicates SEND (rather than CMD). In this case, Janus Web Server sets the "Last-Modified" parameter to the modification time for the procedure in the procedure dictionary. If a browser requested such a URL with an "If-Modified-Since" request parameter, and the procedure dictionary indicates that the procedure has not been updated since the indicated time, Janus Web Server returns a "Not Modified" to the browser. All this happens "under the covers" and requires no programming.

For URL response data generated as a result of CMD processing, no last-modified time is ordinarily set, since the modification time of the returned data can only be determined on the application level. This is true even when a static procedure is sent back to the browser with a $Web_Proc_Send, since there is no guarantee that the URL will always return the same procedure to the browser. For CMD processing, $Web_Last_Modified is provided to enable applications to set the last-modified time for a URL. The $Web_Last_Modified function automatically compares the last-modified time passed as its input parameter with the "If-Modified-Since" time, if sent by the browser. If the indicated last-modified time is less than or equal to the "If-Modified-Since" time, Janus Web Server sends a "Not Modified" to the browser and closes the connection. If the return code from $Web_Last_Modified indicates that a "Not Modified" has been sent, the application can exit quickly without doing any more processing.
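$Web_Last_Modified makes this comparison automatically. The following Python sketch shows the same decision in isolation, purely as an illustration of the HTTP rule; the function name is hypothetical. A "Not Modified" (HTTP status 304) is chosen when the last-modified time, at whole-second resolution, is less than or equal to the If-Modified-Since time. A missing or unparseable header leads to a full response (200).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the browser's cached copy is still current, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200                      # no cached copy: send the full response
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                      # unparseable header: ignore it
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates carry no fractions of a second, so compare whole seconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

As with the return code from $Web_Last_Modified, an application that gets 304 here can stop immediately without building the page.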

There are a few important considerations when designing or re-designing a database application as a web application:

  • It is probably a good idea to base last-modified times on time stamps in records at a relatively high level in the data hierarchy. For example, if a database has records, each of which is associated with a specific customer, where each customer might have anywhere from one to thousands of records associated with it, it might make sense to have a last-modified time stamp on some primary customer record. This accomplishes two things:
    • It makes it possible to determine if a URL has been modified by examining a single record — a big efficiency win.
    • It simplifies dealing with deletes, since the deletion of a record associated with a customer could update the primary customer last-modified time stamp. The cost of a "low granularity" time stamp strategy is that occasionally, a URL will be resent even though the records being examined haven't been modified — some other customer records might have been modified. For most applications, this will happen infrequently enough that it's not worth the extra programming and overhead of higher granularity time stamps.
  • Web time stamps only have a resolution of a full second. It is quite possible that a record or set of records might be updated more than once in a single second. While it is possible to keep internal timestamps with finer resolution, this doesn't really solve the problem, as these higher resolution time stamps will have to be rounded for last-modified processing. The real solution is to code the application so that the time stamp always advances by at least one full second. This can be done by keeping timestamps with one-second resolution and, whenever the stored timestamp is greater than or equal to the current time, setting the new timestamp to the stored value plus one. While this might produce last-modified times in the future (by a few seconds), browsers won't care: they make no effort to synchronize their clocks with a server's clock, so they are not even aware that a last-modified time is in the future as far as the server is concerned. With a reasonable level of granularity in the timestamps, they will rarely be off by more than a couple of seconds from the actual time the record or records were updated.

    If it is absolutely essential to have an accurate timestamp on a record or set of records, two timestamps could be maintained: one for last-modified processing, and one for other purposes where accuracy is essential.

    The following code segment illustrates how a last-modified time stamp can be guaranteed to increase:

    For Each Record In CONTROL_RECORD
       %timestamp = $Web_DateNS
       If TIMESTAMP Ge %timestamp Then
          %timestamp = TIMESTAMP + 1
       End If
       Change TIMESTAMP To %timestamp
    End For

  • Last-modified processing is used by most browsers even when a manual reload is requested by the browser's end-user. That is, if the end-user wants the contents associated with a URL to be refreshed by the server, the browser will still send an "If-Modified-Since" parameter to the server. This can be an annoyance while debugging a procedure that sets a last-modified time, if the procedure, and hence its output, is changing even though the underlying data from which the last-modified time is set is not. Changes might be made to the procedure without being reflected on the browser, no matter what the tester does.

    There are a couple of ways around this problem:

    • Clear the browser cache after each change. This can adversely affect overall browser performance, but only on the tester's workstation.
    • Temporarily modify the procedure to bypass the $Web_Last_Modified call, or set the modification time to the current time ($Web_DateNS).

    If a procedure is rolled into production that changes the contents of a URL, even though the underlying data from which the last-modified time is set has not, it might be a good idea to set a minimum last-modified time to the time the procedure was last updated. This ensures that users who revisit the URL after the procedure changes have been made will get a new copy of the URL contents.
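The timestamp rules in the list above can be restated in Python. This is an illustration only: times are assumed to be integer seconds, and both helper names are hypothetical. The first function is the same "add one if the stored stamp is not behind the clock" logic as the SOUL fragment above. The second applies the suggested floor, so a page is never reported as older than the procedure that produces it.

```python
def next_timestamp(stored: int, now: int) -> int:
    """Return a new one-second-resolution stamp strictly greater than `stored`.

    Use the current time, unless the stored stamp is already at or past it,
    in which case advance the stored stamp by one second.
    """
    return stored + 1 if stored >= now else now


def effective_last_modified(data_stamp: int, proc_updated: int) -> int:
    """Never report a last-modified time earlier than the procedure update."""
    return max(data_stamp, proc_updated)
```

Three updates within the same second (the clock reading 200 each time, starting from a stored stamp of 100) produce 200, 201, and 202. Each browser that cached the earlier version therefore sees a newer last-modified time.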

Expiration-time processing

While last-modified processing can produce big savings in server and network resources, it can still seem wasteful in the case of a URL where the data is very static: every time the URL is requested, there is a network interaction between the browser and the server, and there is some server processing to determine that the URL has indeed not changed since the last time it was retrieved. To avoid this overhead, it is possible for a server to send an expiration time with a URL (via the HTTP Expires header parameter). With Janus Web Server, this can be done with the $Web_Expire function. When an expiration time is set for a URL, and that expiration time has not yet been reached, a browser will retrieve the contents of the URL from its cache without any interaction with the server. A manual reload function will, however, reload the URL contents from the server.

Setting an expiration date, then, provides a way of reducing the overhead associated with a URL to a bare minimum. This type of processing is ideal for static URLs such as images and applets that are changed infrequently. The one big problem with using expiration time is that the very lack of interaction with a server for URLs with expiration times makes it impossible to "take back" an expiration time.

For example, suppose a site confidently sets the expiration for a static HTML page to six months in the future. If three weeks later that page must be changed, there is no way to programmatically force browsers with the old version of this URL in their cache to reload the new version. Because of this, it is recommended that expiration times be set conservatively: that is, not more than a few days into the future. This gets most of the network/server overhead gains from expiration times without the risks inherent in extremely long expiration periods.
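An expiration time ultimately travels as an HTTP Expires header containing an HTTP-date. The following Python sketch computes such a value and caps the lifetime conservatively, as recommended above. The three-day cap and the function name are assumptions chosen for illustration. The sketch does not show how $Web_Expire itself is called.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MAX_EXPIRY = timedelta(days=3)   # assumption: "a few days", per the advice above


def expires_header(now: datetime, lifetime: timedelta) -> str:
    """Return an Expires value, never more than MAX_EXPIRY after `now`."""
    lifetime = min(lifetime, MAX_EXPIRY)
    expires = (now + lifetime).astimezone(timezone.utc).replace(microsecond=0)
    return format_datetime(expires, usegmt=True)
```

A request for six months from noon UTC on 1 March 2024 is cut back to "Mon, 04 Mar 2024 12:00:00 GMT", which limits how long a stale copy can survive in browser caches.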

Because few early servers and many current servers do not bother to set either last-modified or expiration times, and because many graphical images are large and can occur in several different web pages, most browsers cache these images at least for the duration of the browser session anyway. This effectively acts as an automatic expiration-time mechanism for images, where the expiration time is the end of the browser session.

It might be tempting to combine last-modified and expiration-time processing: set a last-modified time for a URL, as well as an expiration time of, say, eight hours hence. For eight hours, the URL would be loaded from cache, and at the end of the eight hours, the browser would send a request with an "If-Modified-Since" parameter. If the URL contents have not been modified, the server could just send a "Not Modified" with a new expiration time eight hours later.

Unfortunately, because of an idiosyncrasy (or perhaps a bug), most browsers ignore an expiration date sent with a "Not Modified" response. As a result, once the initial expiration period has passed, the browser requests the URL from the server (with an "If-Modified-Since" parameter) every time the URL is requested. It is therefore probably more efficient not to mix expiration-time and last-modified processing, and simply to use plain expiration-time processing where applicable.

Understanding cookies

"Cookies" is the name given a feature of HTTP 1.0 that allows HTTP servers to store information on the client's browser which will be returned to the server on subsequent browser requests. HTTP servers control the setting of cookies exclusively.

In simple terms, a cookie is a name/value pair of data that the server offers to the client's browser to be stored for later use by that server. That cookie can only be viewed by that server. In other words, a server can only see cookie data on a client's browser that the server itself has asked the browser to store there. It cannot see cookie data stored on the client's browser by other servers, and no server can see data about the client other than what that server has stored there previously.

Cookies are useful for storing information about a client on the client's browser rather than having to keep track of such information in a server database. This not only relieves the storage burden on the server, but since most clients are not uniquely identifiable to the server, it may present the only way to identify an individual user. Such identification is often used to recognize a user who has visited the URL before, perhaps to offer a custom view of data available or to indicate new data since their last visit.

Processing of cookies involves two steps, performed in the execution of two separate programs or two executions of the same program:

  • Set the cookie.
  • Process the cookie.

The second step cannot take place in the same execution of the program as the one that set the cookie, for reasons described below.

Setting a cookie

Setting a cookie is quite straightforward: using Janus Web Server, an application wishing to store a cookie in the client's browser uses the $Web_Set_Cookie function to indicate a name and value for the cookie. Many browsers enforce limits on the size and number of cookies accepted from a server; for many browsers these limits are a maximum of 20 cookies from a server and a maximum length of 4,000 bytes per cookie. How these limits are enforced varies from browser to browser, but exceeding them is likely to cause application errors.

$Web_Set_Cookie_Lstr can be used to set cookies of an arbitrary length, though again, most browsers enforce a limit of 4,000 bytes for cookies. This cookie will be stored in the client's browser (actually, on the client's workstation in a file called Cookies.txt).
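At the HTTP level, a stored cookie arrives at the browser as a Set-Cookie response header. The following Python sketch builds one with the standard library's http.cookies module to show the name/value pair and the optional path, domain, and secure attributes. It also rejects cookies over the commonly cited 4,000-byte limit. This is not how Janus builds the header; the function name and the limit check are illustrative assumptions.

```python
from http.cookies import SimpleCookie
from typing import Optional

MAX_COOKIE_BYTES = 4000   # the common browser limit cited above


def set_cookie_header(name: str, value: str, path: Optional[str] = None,
                      domain: Optional[str] = None, secure: bool = False) -> str:
    """Build a Set-Cookie header line for one name/value pair."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    if path:
        morsel["path"] = path            # which URLs the browser returns it for
    if domain:
        morsel["domain"] = domain
    if secure:
        morsel["secure"] = True          # only sent over encrypted connections
    text = morsel.OutputString()
    if len(text.encode()) > MAX_COOKIE_BYTES:
        raise ValueError("cookie exceeds the common 4,000-byte browser limit")
    return "Set-Cookie: " + text
```

For example, set_cookie_header("visitor", "abc123", path="/shop", secure=True) produces a header starting with "Set-Cookie: visitor=abc123" that carries Path=/shop and the Secure flag.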

Processing a cookie

Processing a cookie is also straightforward, but it deserves a bit more discussion:

  • The cookie will be presented by the browser to the server the next time the browser requests the URL of the program that set the cookie. (If the cookie will be set in one program and processed in another, the call to $Web_Set_Cookie can specify a particular path and/or domain name indicating which URL should cause the browser to present the cookie to the server.)
  • The browser decides whether to send a cookie after searching its stored cookies for a match of the path and domain of the URL requested. All matching cookies are sent, thus it is possible to receive multiple cookies with the same name. Applications may generate more than one cookie, but only one instance of a particular cookie/path combination is sent to the browser.
  • The cookie will be presented to the server in a manner similar to header information, so the server can process the information or simply ignore it.
  • The application wishing to use cookies could use any of the following functions:
  • Servers can control the expiration date of a cookie, and they can control whether a cookie is secure, that is, whether it must be sent over an encrypted connection.
  • Expired cookies are discarded by the browser; server applications never need to determine if a cookie is "stale."
  • Since only the name and value of each cookie are returned to the server, it is not always possible to tell which URL generated a cookie. If that is important, consider using unique cookie names that can be mapped to a specific URL.
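On the request side, the browser returns cookies as a single Cookie header of name=value pairs. The following Python sketch (illustrative only; both function names are hypothetical) splits that header into an ordered list rather than a dictionary. A list is used because, as noted above, several cookies with the same name can arrive when more than one path or domain matches, and only names and values are available to tell them apart.

```python
from typing import List, Tuple


def parse_cookie_header(header: str) -> List[Tuple[str, str]]:
    """Split a Cookie request header into (name, value) pairs, in order."""
    pairs = []
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        if sep:                          # ignore fragments with no "="
            pairs.append((name.strip(), value.strip()))
    return pairs


def cookie_values(header: str, name: str) -> List[str]:
    """Every value sent for `name`; may be empty or hold duplicates."""
    return [v for n, v in parse_cookie_header(header) if n == name]
```

For the header "visitor=abc; theme=dark; visitor=xyz", cookie_values returns both "abc" and "xyz" for visitor. The application has to decide which one it means.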

Server side includes

"Server Side Include", or SSI, is a technique for filling actively generated content into an HTML page by inserting tags that initiate server activity. SSI tags are either "execute" or "include" directives hidden in HTML comment format so that browsers won't display them.

It is fairly CPU-intensive to scan every line of every HTML page on a web site for SSI tags, so by default no SSI scanning is performed unless it is explicitly turned on. Scanning for SSI tags is turned on by specifying the SSI subparameter on the JANUS WEB ON command or on the $Web_Proc_Send function.

A case where SSI is usefully applied is a site designed and built using a site generator like NetObjects Fusion™ or Microsoft FrontPage™. With these products, the majority of the site is static and often graphically complex, and the site designer doesn't usually want the database programmers altering the display format. Using these packages, all the designer has to do to specify active content is enter the correct SSI tags. The entire site can be published to the Janus Web Server in binary format, so server-side programmers can't really alter the HTML. The database programmers don't have to understand the overall structure of the web site, and they can code the content portion of any web page without worrying about HTML structures outside the data area (like all the header graphics or standard company information).

Processing SSI tags

To process an SSI tag, for pages where SSI scanning is enabled:

  1. Output to the web buffer is paused when an SSI tag is encountered.
  2. The output of the specified activity is inserted into the web buffer.
  3. The SEND rule continues with the rest of the original HTML document.

For instance, the following page could be stored on a Janus Web Server:

<HTML>
<head>
<title>Sample SSI Page</title>
</head>
<body>
<img src="header.gif">
<xmp>
<!--#exec cmd="MONITOR DISKBUFF"-->
</xmp>
<img src="trailer.gif">
</body>
</HTML>

The #exec tag is the SSI directive. If this page is accessed via a web rule that has the SSI subparameter specified, the web port will scan the HTML as it is sending it, and it will pause at the #exec directive in order to execute the specified command — in this case, filling the body of the page with the output of the MONITOR DISKBUFF command. When the command completes, the port finishes sending the remainder of the original HTML procedure.

If the SSI subparameter is not specified on the SEND rule for this page (or on the $Web_Proc_Send function if that was used), the page will be sent without executing the SSI command, and the browser will see the SSI tag as a comment and will ignore it.

So, for SSI to work you have to:

  1. Specify SSI on the JANUS WEB ON rule or $Web_Proc_Send function.
  2. Place the tags in the HTML where you want the server to fill in the content.
  3. Make sure the tags specify actual commands or content.

The SSI tag can cause the web server to send a Model 204 procedure to fill in the page, or it can specify a program to execute, the output from which will fill in the page. The following are the supported SSI tags:

<!--#exec cmd="m204_command"-->
<!--#exec cgi="m204_command"-->
<!--#include file="m204_proc_id"-->
<!--#include virtual="m204_proc_id"-->

The cmd and cgi forms of the SSI tag are functionally identical: the specified Model 204 command or APSY is executed as shown. These two tags perform the same operation:

<!--#exec cmd="CALENDAR 7"-->
<!--#exec cgi="CALENDAR 7"-->

They execute an APSY called CALENDAR, passing the parameter "7" on the APSY command line. The output of the CALENDAR subsystem is added to the web page.

The "file" and "virtual" forms of the SSI tag are functionally identical: they specify a procedure name, or a file name and procedure name, which is copied into the place of the tag. The following tags perform exactly the same operation:

<!--#include file="GREETINGS"-->
<!--#include virtual="GREETINGS"-->

The file and virtual tags can also specify the file from which the procedure is to be sent:

<!--#include file="TPROCS GREETINGS"-->

In the preceding example, the procedure GREETINGS will be inserted into the source web page from Model 204 procedure file TPROCS.

Note: The SSI tag command will not OPEN a file, so the file specified must have been opened either on the JANUS DEFINE command or the JANUS WEB ON command or as part of the processing that went on before the SSI tag was encountered.

In the following example, two documents are stored in Model 204 procedure file TPROCS:

PROCEDURE SIMPLE.HTML
<HTML>
<head>
<title>Sample SSI Page</title>
</head>
<body>
<img src="header.gif">
<xmp>
<!--#include file="GOODMORNING.HTML"-->
</xmp>
<img src="trailer.gif">
</body>
</HTML>
END PROCEDURE SIMPLE.HTML

PROCEDURE GOODMORNING.HTML
Good Morning!
END PROCEDURE GOODMORNING.HTML

The user would see the following HTML in their browser:

<HTML>
<head>
<title>Sample SSI Page</title>
</head>
<body>
<img src="header.gif">
<xmp>
Good Morning!
</xmp>
<img src="trailer.gif">
</body>
</HTML>

The SSI tag is not sent to the browser because it is processed by Janus Web Server.

The server-side include tags do not need to be on a line by themselves. They can be on a line with other code, including other SSI tags:

<!--#exec cmd="V"--><hr><!--#exec cmd="LOGWHO"-->

The preceding example specifies that the V command is to be executed, then a horizontal rule (the <hr> tag) is to be inserted, and then the output from a LOGWHO command is to be inserted into the output page.
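The pause, insert, and continue steps described earlier amount to a search-and-replace over the page, as the following Python sketch illustrates. It assumes SSI scanning is already enabled for the page, and it is not the Janus implementation. The callback run_command stands in for executing a Model 204 command or APSY, and the procedures dictionary stands in for a procedure file; both are hypothetical. Because text between tags is copied through unchanged, several tags on one line work as in the V/LOGWHO example. As a simplification, this sketch drops any file name or TEXT/BINARY prefix from an #include argument.

```python
import re
from typing import Callable, Dict

# The four supported tag forms, e.g. <!--#exec cmd="V"--> or
# <!--#include virtual="TPROCS GREETINGS"-->.
SSI_TAG = re.compile(r'<!--#(exec|include)\s+(cmd|cgi|file|virtual)="([^"]*)"\s*-->')


def expand_ssi(page: str,
               run_command: Callable[[str], str],
               procedures: Dict[str, str]) -> str:
    """Replace every SSI tag in `page` with the output it produces."""
    def replace(match):
        directive, kind, arg = match.groups()
        if directive == "exec" and kind in ("cmd", "cgi"):
            return run_command(arg)          # cmd and cgi behave identically
        if directive == "include" and kind in ("file", "virtual"):
            proc_name = arg.split()[-1]      # simplification: drop any prefix
            return procedures[proc_name]
        return match.group(0)                # other combinations: left untouched
    return SSI_TAG.sub(replace, page)
```

Applied to the SIMPLE.HTML body with a procedures entry mapping GOODMORNING.HTML to "Good Morning!", the #include tag is replaced by that text and no SSI comment reaches the browser.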

The file or group name can be preceded by the word TEXT or BINARY to indicate that the procedure to be sent is in either TEXT or BINARY format. If not specified, it defaults to the type of the procedure containing the #include. These are all valid #include syntax:

<!--#include file="GREETINGS"-->
<!--#include file="TPROCS GREETINGS"-->
<!--#include file="TEXT TPROCS GREETINGS"-->
<!--#include virtual="BOOKSBYBOB"-->
<!--#include virtual="BOOKLIST BOOKSBYBOB"-->
<!--#include virtual="BINARY BOOKLIST KIDLIST"-->
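The argument of an #include tag therefore has up to three words: an optional TEXT or BINARY keyword, an optional file or group name, and the procedure name. The following Python sketch splits it that way, using the valid forms above as cases. The names are hypothetical. The sketch also assumes that TEXT or BINARY is recognized only as the first word, and that a missing mode or file is simply reported as None, leaving the defaults to the caller.

```python
from typing import NamedTuple, Optional


class IncludeTarget(NamedTuple):
    mode: Optional[str]   # "TEXT", "BINARY", or None when not given
    file: Optional[str]   # procedure file or group, or None when not given
    proc: str             # procedure name


def parse_include(arg: str) -> IncludeTarget:
    """Split the quoted argument of an #include file/virtual tag."""
    words = arg.split()
    mode = None
    if words and words[0].upper() in ("TEXT", "BINARY"):
        mode = words.pop(0).upper()
    if len(words) == 1:
        return IncludeTarget(mode, None, words[0])
    if len(words) == 2:
        return IncludeTarget(mode, words[0], words[1])
    raise ValueError(f"unexpected #include argument: {arg!r}")
```

For example, "BINARY BOOKLIST KIDLIST" yields mode BINARY, file BOOKLIST, and procedure KIDLIST.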

Specifying the SSI parameter

Scanning for server side includes is indicated by the SSI parameter in the JANUS WEB ON ... SEND rule for a URL, or by the SSI parameter on a $Web_Proc_Send. For example:

JANUS WEB WEB400 ON /ALEX/* OPEN FILE ALEXPROC SEND * SSI

or:

%x = $Web_Proc_Send('ALEXPROC', 'SMILEY.HTML', 'SSI')

As with most web facilities, SSI can be used in a number of different ways. The same page can be accessed via a rule that scans for SSI tags and by one that doesn't, allowing the dynamically generated content to be seen by some users and not by others. In other words, you could embed a tag in a page called MEETINGS/SEPT99.HTML, then provide different rules to access the same page, like this:

PROCEDURE MEETINGS/SEPT99.HTML
...
<!--#exec cmd="MEETINGS 0999"-->
...
END PROCEDURE MEETINGS/SEPT99.HTML

JANUS WEB ON MEETINGS/* SEND MEETINGS/*
JANUS WEB ON INTERNAL/MEETINGS/* SEND MEETINGS/* SSI

Anybody accessing the page with the INTERNAL path in the URL would see the content of the SSI tag translated, while other viewers would just see the static content.

URL parameters or isindex data

"Isindex data" is a somewhat obsolete term that has come into disfavor and has been replaced by the term "URL parameters". Unfortunately, the word "isindex" is heavily used in the Janus Web Server $functions and documentation. To move away from the term "isindex" and toward the term "URL parameters", new $functions were introduced in Sirius Mods version 6.7, such as $Web_Url_Parm, to replace their isindex equivalents.

"URL parameters", or "Isindex data," is data that appears as part of a URL in a hypertext link or in the URL window in a browser. The URL parameters come after the main part of the URL and are separated from the rest of a URL by a question mark (?). For example, in the following URL, "NUKES=PRODLIST&JANUS=7707" is URL parameter data:

http://sirius/maint/cust3?NUKES=PRODLIST&JANUS=7707

URL parameters can either be an explicit part of a URL as typed in or cut and pasted by an end-user, an explicit part of a URL on a hypertext link, or they can be generated by a browser when "method=GET" is specified on the <form> tag in an HTML page.

In any case, the URL parameter part of a URL is not used by Janus Web Server for JANUS WEB rules processing, but it is available to the application processing the request. A Janus Web Server application can retrieve URL parameter data with either of the following:

Using GET-request format for parameters

The Janus Web functions listed above operate correctly only on URL parameters that have the "proper" structure, namely the structure used by browsers on a "method=GET" form submission.

URL parameter data on "method=GET" requests is a collection of field name and value pairs separated by ampersands (&), with each field name separated from its value by an equal sign (=). In both field names and values, spaces must be converted to plus signs (+), and special characters, such as question marks (?), plus signs (+), ampersands (&), equal signs (=), and non-displayable characters, must be hex-encoded. Hex-encoding consists of a percent sign (%) followed by two hexadecimal digits that give the value of the character's ASCII representation.

This conversion of blanks to plus signs, and special characters to hex, is called "url-encoding." For example, the url-encoding of "What's all this ?" is:

What%27s+all+this+%3F
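A minimal Python sketch of url-encoding as described above. Assumption: letters, digits, and the characters - _ . ~ are left unencoded, which matches Python's urllib.parse.quote_plus; the exact set of characters Janus leaves alone may differ. The sketch accepts only ASCII input, so any other character raises an error.

```python
def url_encode(text):
    """Url-encode text: blanks become '+', other special characters %XX.

    Assumption: letters, digits and '-_.~' pass through unchanged.
    """
    out = []
    for byte in text.encode("ascii"):  # hex value of the ASCII representation
        ch = chr(byte)
        if ch == " ":
            out.append("+")
        elif ch.isalnum() or ch in "-_.~":
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(url_encode("What's all this ?"))  # What%27s+all+this+%3F
```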

When a browser generates URL parameters for a "method=GET" form request, it automatically url-encodes the field names and values, separates each name from its value with an equal sign, and separates the name and value pairs with ampersands. An example of a typical browser-generated URL is:

http://www.opera.org/form.html?lname=Lescaut&fname=Manon
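The reverse direction, splitting GET-format parameter data into name and value pairs and undoing the url-encoding, can be sketched with Python's urllib.parse.parse_qsl. This only illustrates the format; in a Janus application the $Web functions do this work.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.opera.org/form.html?lname=Lescaut&fname=Manon"
fields = parse_qsl(urlsplit(url).query)
print(fields)  # [('lname', 'Lescaut'), ('fname', 'Manon')]

# parse_qsl also reverses url-encoding: '+' becomes a blank, %XX a character
decoded = parse_qsl("q=What%27s+all+this+%3F")
print(decoded)  # [('q', "What's all this ?")]
```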

URLs generated by an application might or might not have this standard format; however, they should never contain blanks or special characters unless those characters are url-encoded. The $Web_URL_Encode function is provided to simplify url-encoding data for URL parameter requests.

For example, an application might dynamically generate a URL whose only URL parameter data is a customer ID:

Print '<a href="update?' With %custid With '">'

This might result in a hypertext link to a URL such as http://home/cust/update?6784593. Because this URL does not have the standard format of "method=GET" URL parameter data, the customer ID cannot be retrieved with the $Web_IsIndex functions; it must instead be retrieved with $Web_Hdr_Parm('ISINDEX'). If %custid might contain special characters that need to be url-encoded, the Print statement could be changed to this:

Print '<a href="update?' With -
      $Web_Url_Encode(%custid) With '">'

To give this URL the standard "method=GET" format, the Print statement could be coded like this:

Print '<a href="update?custid=' With -
      $Web_Url_Encode(%custid) With '">'

The application that processes the URL generated by this code could retrieve the customer ID as follows:

%custid = $Web_IsIndex_Parm('CUSTID')
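The difference between the two link formats can be seen by parsing each query string the way a GET-format parser would. In this Python sketch, parse_qs stands in, by analogy only, for a name-based $Web_IsIndex lookup, and the raw query string corresponds to what $Web_Hdr_Parm('ISINDEX') returns. The update_link helper is hypothetical.

```python
from urllib.parse import parse_qs, quote_plus


def update_link(custid, standard):
    """Hypothetical helper mirroring the Print statements above."""
    prefix = "custid=" if standard else ""
    return '<a href="update?' + prefix + quote_plus(custid) + '">'


# Nonstandard format: the query is a bare value with no name=value pair,
# so a name-based lookup finds nothing.
print(parse_qs("6784593"))         # {}
# Standard format: the value can be looked up by field name.
print(parse_qs("custid=6784593"))  # {'custid': ['6784593']}
print(update_link("A&B 1", standard=True))  # <a href="update?custid=A%26B+1">
```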

URL parameters are very useful when an application needs a link to a URL that contains variable data. The browser treats each such URL as a distinct entity for caching purposes, while the server can still use the same rules and application for all of them. For example, the browser understands http://home/cust?123456 and http://home/cust?765432 to be two separate URLs, but on the server both are handled by the rules and application that apply to the path /cust.
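The distinction between what the browser keys its cache on and what the server's rules see can be shown in a few lines of Python. This is a simplification that assumes the cache is keyed by the full URL:

```python
from urllib.parse import urlsplit

urls = ["http://home/cust?123456", "http://home/cust?765432"]

# The browser treats the full URLs as two separate entries...
cache_keys = set(urls)
# ...while server rules match on the path only, so one rule/application applies.
server_paths = {urlsplit(u).path for u in urls}

print(len(cache_keys))  # 2
print(server_paths)     # {'/cust'}
```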

GET requests versus POST requests

It might be less apparent when to use "method=GET" on a <form> tag rather than "method=POST". A good rule of thumb is that "method=GET" is preferable whenever the form specifies retrieval information rather than update information. For this type of request, the generated URL works more "naturally" with a browser, especially with the browser's caching algorithms, because the browser understands that the URL returns the same thing no matter how many times it is requested. This is not the case for a true POST, which typically changes data on the web server.

Another advantage of a "method=GET" type of retrieval request is that users can cut and paste the generated URL into an e-mail or other applications, as in the following:

Tom,
You might be interested in looking at http://violenttv.com/?char=Itchy&char=Scratchy. The large hammer technology might be of special interest to you.
Sincerely,
Jerry
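A browser builds the parameter data for such a request from the form's fields, in order, including repeated field names. Python's urllib.parse.urlencode produces the same format, as this sketch shows; the two "char" fields are taken from the example URL.

```python
from urllib.parse import urlencode

# Two form fields that share the name "char", as in the example URL
fields = [("char", "Itchy"), ("char", "Scratchy")]
query = urlencode(fields)
print(query)  # char=Itchy&char=Scratchy
```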

Generally, it is a fairly simple matter to change a Janus Web Server application from using "method=POST" to "method=GET", or vice versa. In addition to changing the <form> tag, you must change the processing program so that all $Web_IsIndex functions become $Web_Form functions, or vice versa. With some forethought, the change can be made even simpler: just use the following functions instead of the $Web_IsIndex or $Web_Form functions:

These functions will return either form fields or URL parameter fields, whichever are available.

Note that it is possible to have both URL parameters and form fields, as in the following:

<form method=POST action="update?id=774321&time=20000301122301">

The $Web functions that are not specific to form fields or URL parameters search first for URL parameters, then for form fields. They add little overhead compared with the specific functions, except in the rare case of a request with a large number of both kinds of fields. Because of this, they can also be used simply to reduce the amount of typing required to write an application.
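The lookup order described above, URL parameters first and then form fields, can be sketched in Python as follows. get_parm is a hypothetical helper, not a Janus function; the sketch assumes the POST body uses the default application/x-www-form-urlencoded format (which has the same structure as GET parameter data) and compares field names case-sensitively.

```python
from urllib.parse import parse_qsl


def get_parm(name, query_string, form_body):
    """Hypothetical helper: return the first value of field `name`, searching
    URL parameters first and then form fields; None if it is in neither."""
    for source in (query_string, form_body):
        for key, value in parse_qsl(source):
            if key == name:
                return value
    return None


query = "id=774321&time=20000301122301"  # from the action URL
body = "id=999&name=Smith"               # hypothetical POSTed form fields
print(get_parm("id", query, body))    # 774321 (URL parameter found first)
print(get_parm("name", query, body))  # Smith (found only in the form)
```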

For a <form> tag such as the following, most browsers seem to discard the URL parameter part of the URL specified on the "action" attribute:

<form method=GET action="update?id=774321&time=20000301122301">

This is rarely a big problem, because the fixed data intended for the URL parameters can be put into hidden form fields instead, as in the following:

<form method=GET action="update">
<input type=hidden name="id" value="774321">
<input type=hidden name="time" value="20000301122301">

The "invisible" data would be placed into the URL parameter data by the browser when the form is submitted.
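The browser behavior described here can be modeled in Python: the query part of the action URL is dropped and replaced by the url-encoded form fields, hidden ones included. This is a simplified sketch; get_submission_url is a hypothetical name.

```python
from urllib.parse import urlencode


def get_submission_url(action, fields):
    """Model a method=GET submission: discard any query on the action URL
    and append the url-encoded form fields instead."""
    base = action.split("?", 1)[0]
    return base + "?" + urlencode(fields)


# Data placed only on the action URL is lost...
print(get_submission_url("update?id=774321", [("name", "x")]))  # update?name=x
# ...but the same data carried in hidden fields survives.
hidden = [("id", "774321"), ("time", "20000301122301")]
print(get_submission_url("update", hidden))  # update?id=774321&time=20000301122301
```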

Using URL parameters on redirects

URL parameters are also useful on URLs that are used on redirects, because redirects are always interpreted as GETs by browsers, even when they are received as a response to a POST. For example, if an application sometimes wishes to redirect a form submission to a different server, a "method=GET" should be specified on the <form> tag, and the redirect (when needed) should be performed as follows:

%rc = $Web_Redirect('http://oracledb.toxico.com/update?' With $Web_Hdr_Parm('ISINDEX'))

Care must be taken in such an application, however, because there is currently no way to redirect to a URL longer than 255 characters, and URLs with URL parameters can often be considerably longer.
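One way to respect the limit is to check the length of the assembled URL before redirecting. The Python sketch below assumes only the 255-character limit stated above. redirect_target is a hypothetical helper, and what to do when it returns None (for example, processing the request locally) is left to the application.

```python
MAX_REDIRECT_URL = 255  # redirect-URL length limit stated in the text


def redirect_target(base, raw_query):
    """Return base + '?' + raw_query, or None if that URL would exceed the
    limit. What to do in the None case is up to the (hypothetical) caller."""
    url = base + "?" + raw_query
    return url if len(url) <= MAX_REDIRECT_URL else None


base = "http://oracledb.toxico.com/update"
print(redirect_target(base, "id=774321"))          # full URL: short enough
print(redirect_target(base, "note=" + "x" * 300))  # None: too long
```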

Error processing

There are many categories of errors that can occur in a web application. Some are specific to the application and so are handled by your own User Language error-processing code; for these, consider using HTTP status codes, which you can set with $Web_Done. Others fall into common error categories, such as logon errors, which are discussed in Exception rules.

In addition, some errors, such as those caused by User Language programming errors, immediately terminate the User Language request. For these errors, Janus Web Server prepares a simple, standard web page containing the available error information. Usually there is a single error to report, and the page is as simple as:

Internal server error - request cancelled
Unable to process browser request
Model 204 error : M204.0553: SUBSCRIPT RANGE ERROR FOR %A

In that case there was a single error. Sometimes there are multiple errors to report, as in the following example; multiple messages are shown on the web page in the same order they would appear in the Model 204 audit trail:

Internal server error - request cancelled
Unable to process browser request
First error, this command : M204.1030: INVALID MODEL 204 COMMAND
First error after ERCNT last zero : MSIR.0668: XML doc parse error: invalid first byte of UTF-8 encoding
Last Model 204 error : MSIR.0561: $XML_COUNT (argument 2): relative XPath expression based on XmlDoc

In the example above, the first error, labeled "First error, this command :", is the message string that would be obtained by issuing the SETGRC command.

The second error, labeled "First error after ERCNT last zero :", is the message string that would be obtained by the $Fsterr function.

The third and last error, labeled "Last Model 204 error :", is the message string that would be obtained by the $Errmsg function.

This automatic error page contains as few error messages as possible: if any two consecutive messages in the list above are the same, only one of them is shown.
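The collapsing rule can be stated precisely with a short Python sketch. It models only which messages survive; the labels Janus attaches (for instance "Model 204 error :" when a single message remains) are not modeled.

```python
from itertools import groupby


def messages_to_show(messages):
    """Keep messages in order, collapsing each run of identical consecutive ones."""
    return [msg for msg, _run in groupby(messages)]


single = ["M204.0553: SUBSCRIPT RANGE ERROR FOR %A"] * 3
print(messages_to_show(single))  # one message instead of three
```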