Document revision date: 30 March 2001

Availability Manager Version 1.4

Release Notes

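The following notes address late-breaking information and known problems for the Availability Manager Version 1.4. These notes are organized into five categories: corrections, known problems, and new and changed features (Section 1); configuration, setup, and installation (Section 2); startup and shutdown (Section 3); operation (Section 4); and display (Section 5).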

1 Corrections, Known Problems, and New and Changed Features in Version 1.4

The following sections discuss key problems that have been corrected and that remain in Version 1.4. Changes and new features in this release are also described.

1.1 Problems Corrected Since Version 1.3

This section discusses key problems that have been corrected in Version 1.4.

1.1.1 Patch Kits No Longer Needed on OpenVMS

The patch kit required in Version 1.3 of the Availability Manager is not required in Version 1.4.

In Version 1.3, a POLYCENTER Software Installation Utility patch kit is required for versions of OpenVMS prior to Version 7.1, for Alpha and VAX systems:


    DEC-AXPVMS-VMS62TO71_PCSI-V0100--4.PCSI 
 
    DEC-VAXVMS-VMS62TO71_PCSI-V0100--4.PCSI 

These kits are available at the following FTP locations, for Alpha and VAX computers, respectively:


    ftp://ftp.service.digital.com/public/vms/axp/v6.2/ 
 
    ftp://ftp.service.digital.com/public/vms/vax/v6.2/ 

1.1.2 System Failures on OpenVMS Nodes That Are Being Monitored

Earlier versions of the OpenVMS Data Collector (RMDRIVER) could infrequently cause system failures when they received certain types of truncated network packets. Transient network problems and software bugs sometimes truncated these packets, and the safeguards in RMDRIVER to detect this problem proved to be inadequate. The Availability Manager Version 1.4 OpenVMS Data Collector contains more safeguards to prevent the system from failing under these circumstances.
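
The kind of safeguard described above can be pictured with a minimal Python sketch. The packet layout used here (a 2-byte big-endian payload length followed by a 2-byte type code) is purely hypothetical and is not the RMDRIVER wire format. The sketch only illustrates the general idea that a receiver compares the declared length with the number of bytes that actually arrived before it reads the payload.

```python
import struct

# Hypothetical header for illustration only: payload length, then type code.
HEADER = struct.Struct(">HH")


class TruncatedPacket(ValueError):
    """Raised when fewer bytes arrived than the packet claims to carry."""


def parse_packet(buf: bytes):
    """Return (type_code, payload), rejecting truncated input up front."""
    if len(buf) < HEADER.size:
        raise TruncatedPacket("header incomplete")
    length, type_code = HEADER.unpack_from(buf)
    if len(buf) - HEADER.size < length:
        raise TruncatedPacket("payload shorter than declared length")
    return type_code, buf[HEADER.size:HEADER.size + length]
```

A caller would treat TruncatedPacket as a reason to drop the packet rather than continue processing it.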

1.1.3 Long Delays in Discovering Data Collector Nodes on Small LANs

A software bug in the Version 1.3 Data Analyzer often resulted in delays of 10-20 minutes for nodes to be discovered on small LANs. This problem has been corrected in the Availability Manager Version 1.4 Data Analyzer, which now detects and displays Data Collector nodes noticeably faster.

1.1.4 User with Inadequate Page File Quota Cannot Run OpenVMS Data Analyzer

If a user with inadequate page file quota (PGFLQUOTA) tries to run the Availability Manager Data Analyzer on OpenVMS, an error message is displayed and the application stops. Inadequate PGFLQUOTA causes unusual behavior in the OpenVMS Java Virtual Machine, preventing the Availability Manager from starting and running normally. Please refer to the OpenVMS Installation Instructions for the appropriate PGFLQUOTA settings.

1.2 Problems Remaining in Version 1.4

This section discusses known problems in Version 1.4.

1.2.1 Page and Swap File Names in Event List Display

If page and swap file events are signaled before the Data Analyzer has resolved their file names from the file ID (FID), events such as LOPGSP display the FID instead of the file name information. You can determine the file name for the FID by checking the File Name field in the I/O Page Swap Files page. The FID for the file name is displayed after the file name.

1.2.2 Events Sometimes Displayed After Background Collection Stops

On both OpenVMS and Windows systems, the Data Analyzer sometimes displays events after users customize their systems to stop collecting a particular kind of data. This is most likely to occur when the Data Analyzer is monitoring many nodes. Under these conditions, a data handler sometimes clears events before all pending packets have been processed. The events based on the data in these packets are displayed even though users have requested that this data not be collected.

1.2.3 Data Analyzer Might Not Recognize Impromptu Operating System Upgrades

If the Availability Manager Data Analyzer is monitoring an OpenVMS node that is shut down and then restarted with a different version of the operating system, the Data Analyzer does not recognize the change. Displays for this node continue to show the previous operating system version, and data collection for this node might also be affected.

1.3 Changes and New Features in Version 1.4

This section discusses the changes and new features in Version 1.4.

1.3.1 New Process States Added

The following table lists the new process states that have been added to the Availability Manager Version 1.4. These process states are shown on the CPU Process Summary page and on the Process Information Page of the single process displays. (All of these process states were previously included in MWAIT.)

    Process State     Definition
    BYTLM Wait (1)    Process waiting for buffered I/O byte count quota.
    JIB Wait (1)      Process in either BYTLM Wait or TQELM Wait state.
    TQELM Wait (1)    Process waiting for timer queue entry quota.
    EXH               Kernel thread in exit handler.
    INNER_MODE        Kernel thread waiting to acquire inner-mode semaphore.
    PSXFR             Process waiting during a POSIX fork operation.

    (1) Previously included in MUTEX value.

1.3.2 New Process States Reflected in Wait States Page

On the Wait States page for a single process display, the states reflected in the "Control" value before Version 1.4 were CEF, MWAIT, LEF, LEFO, RWAST, RWMBX, RWSCS, RWCLU, RWCSV, RWUNK, and LEF waiting for an ENQ.

The following states, introduced in Version 1.4, are now also reflected in the "Control" value: BYTLM Wait, INNER_MODE, JIB Wait, PSXFR, and TQELM Wait.

1.3.3 Definition of PRCMWT Event Changed

Additions have been made to the investigation hint for the "Process waiting in MWAIT" (PRCMWT) event, which now reads as follows:

"Various resource wait states are part of the collective wait state called MWAIT. See Appendix A in the Availability Manager User's Guide for a list of these states. The state the process is in is displayed on the CPU Process page and the Single Process page. Check the Single Process pages to determine which resource the process is waiting for and whether the resource is still available for the process."

1.3.4 New Events Signaled

A number of new events are signaled in Version 1.4. The following table lists the type of data collection that can produce the event, the abbreviation of the event, and a short description of the event:

    Type of Data Collection    Event     Description
    Single process             KTHIMD    Kernel thread waiting for inner-mode semaphore.
    Single process             PRCPSX    Process waiting in PSXFR wait state.
    Fix-generated              FXUERR    Unknown error code for fix.
    Node-level                 PKTFER    Packet format error.

These events are explained further in Appendix B of the Availability Manager User's Guide.

1.4 Additional Information

This section contains a note with additional information.

1.4.1 Recognizing a System Failure Forced by the Availability Manager

Because a user with suitable privileges can force a node to fail from the Data Analyzer by using the "Crash Node" fix, system managers have requested a method for recognizing these particular failure footprints so that they can distinguish them from other failures. These failures all have identical footprints: they are operator-induced system failures in kernel mode at IPL 8. The top of the kernel stack is similar to the following display:


                SP => Quadword system address 
                      Quadword data 
                      1BE0DEAD.00000000 
                      00000000.00000000 
                      Quadword data            TRAP$CRASH 
                      Quadword data            SYS$RMDRIVER + offset 
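
If the kernel stack has been saved as text, a check for this footprint could be sketched in Python as follows. This is only an illustration: it assumes the stack lines are laid out as in the display above, and the function name and input format are hypothetical rather than part of any Compaq tool.

```python
def looks_like_am_forced_crash(stack_lines):
    """Return True if the stack text shows the footprint described above."""
    rows = [line.strip() for line in stack_lines if line.strip()]
    for i, row in enumerate(rows):
        # The marker quadword is followed by a zero quadword, then by entries
        # that resolve to TRAP$CRASH and to an offset within SYS$RMDRIVER.
        if row.upper().startswith("1BE0DEAD.00000000"):
            following = rows[i + 1:i + 4]
            has_zero = bool(following) and following[0].startswith("00000000.00000000")
            text = " ".join(following).upper()
            return has_zero and "TRAP$CRASH" in text and "SYS$RMDRIVER" in text
    return False
```

A match suggests only that the failure is consistent with the "Crash Node" fix; it does not replace a full analysis of the dump.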

2 Configuration, Setup, and Installation Notes

The following notes pertain to configuring, setting up, and installing the Availability Manager.

2.1 Recommended Hardware Configurations

There are no minimum hardware requirements for the Data Collector. Compaq recommends using, at a minimum, one of the following hardware configurations on systems running the Data Analyzer:

    System                     Hardware
    Windows NT/Windows 2000    300 MHz Intel Pentium processor with 96 MB of memory
    Windows NT                 500 MHz Alpha processor with 128 MB of memory
    OpenVMS                    500 MHz Alpha processor with 128 MB of memory

2.2 Notes on Installing the Data Analyzer on OpenVMS Systems

The following notes pertain to the installation of the Availability Manager Data Analyzer on OpenVMS systems.

2.2.1 Enabling and Disabling Kernel Multithreading

On multiple-CPU OpenVMS systems, the logical name AMDS$AM_MULTITHREADING controls whether or not the Availability Manager runs on multiple CPUs (that is, whether it uses kernel multithreading). This logical name is defined in the SYS$MANAGER:AMDS$LOGICALS.COM file.

Setting AMDS$AM_MULTITHREADING to TRUE can improve application performance, but at the cost of application stability. See Section 4.2.3 for an example of one stability problem.

Setting the logical name to FALSE (the default) forces the application to run on a single CPU. For the current set of patches available on OpenVMS, this approach offers the greatest stability.

Enabling and Disabling Commands

To enable kernel multithreading, set the logical to TRUE:

    $ AMDS$DEF AMDS$AM_MULTITHREADING TRUE

To disable kernel multithreading, set the logical to FALSE:

    $ AMDS$DEF AMDS$AM_MULTITHREADING FALSE

2.2.2 PCSI Installation Messages

If you install DECamds Version 7.3 after installing Availability Manager Version 1.4, you might see any of the following PCSI messages:


 
%PCSI-I-RETAIN, file [SYS$LDR]SYS$RMDRIVER.EXE was not replaced because 
file from kit does not have higher generation number 
 
%PCSI-I-RETAIN, file [SYS$LDR]SYS$RMDRIVER.STB was not replaced because 
file from kit does not have higher generation number 
 
%PCSI-I-RETAIN, file [SYS$STARTUP]AMDS$STARTUP.COM was not replaced because 
file from kit does not have higher generation number 
 
%PCSI-I-RETAIN, file [SYS$STARTUP]AMDS$STARTUP.TEMPLATE was not replaced 
because file from kit does not have higher generation number 
 
%PCSI-I-RETAIN, file [SYSEXE]AMDS$RMCP.EXE was not replaced because file 
from kit does not have higher generation number 
 
%PCSI-I-RETAIN, module AVAIL was not replaced because module from kit does 
not have higher generation number 
 
%PCSI-I-RETAIN, file [SYSMGR]AMDS$DRIVER_ACCESS.DAT was not replaced 
because file from kit does not have higher generation number 
 
%PCSI-I-RETAIN, file [SYSMGR]AMDS$DRIVER_ACCESS.TEMPLATE was not replaced 
because file from kit does not have higher generation number 
 
%PCSI-I-RETAIN, file [SYSMGR]AMDS$LOGICALS.COM was not replaced because 
file from kit does not have higher generation number 
 
%PCSI-I-RETAIN, file [SYSMGR]AMDS$LOGICALS.TEMPLATE was not replaced 
because file from kit does not have higher generation number 

These messages are to be expected because DECamds and the Availability Manager share all the files cited.
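
When reviewing a saved installation log, it can be convenient to list the items named in these messages and compare them with the shared files shown above. The following Python sketch assumes the log has been captured as plain text; the function name is hypothetical, and the pattern is based only on the message wording shown in this section.

```python
import re

# Matches the start of a %PCSI-I-RETAIN message and captures the item name.
RETAIN = re.compile(r"%PCSI-I-RETAIN, (file|module) (\S+) was not replaced")


def retained_items(log_text):
    """Return the files and modules named in %PCSI-I-RETAIN messages."""
    # Messages wrap across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [name for _kind, name in RETAIN.findall(flat)]
```

Any name returned that is not in the list of shared files above would deserve a closer look.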

2.2.3 Postinstallation Task: Editing Command File for Online Help

The Netscape browser program on your system might not be in the directory specified in the AMDS$AM_SYSTEM:AMDS$AM_LAUNCH_BROWSER.COM command file, which is part of the installation of the Availability Manager. If this is the case, you must edit this file to display online help.

To define the correct location of Netscape, edit the following line in the command file to reflect the location and name of the Netscape browser program on your system:


   $ Netscape:= $SYS$COMMON:[NETSCAPE.ALPHA]NETSCAPE-JAVA.EXE 

2.3 Notes on Installing the Data Analyzer on Windows Systems

The following notes pertain to the installation of the Availability Manager Data Analyzer on Windows NT and Windows 2000 systems.

2.3.1 Upgrading to Windows 2000

If you upgrade to Windows 2000 after installing the Availability Manager Version 1.4 on Windows NT 4.0, you must reinstall the Version 1.4 kit. When you reinstall, select the "Modify" option on the Windows Installation Welcome box. The reinstallation installs Windows 2000-compatible network drivers.

2.3.2 Running the Self-Extracting .EXE Multiple Times

The Availability Manager software for Windows systems is packaged in a self-extracting executable (.EXE). On Alpha systems, if you run multiple installations of Availability Manager Version 1.4, the .EXE unpacks the installation in the same temporary folder. As a result of a duplicate installation, the system displays a message box entitled Overwrite Protection, which contains a message that "the following file is already installed on your system... Do you wish to overwrite the file?" You can ignore these messages. Click Yes to All.

2.3.3 Registry Subkey Message

In some situations during an installation, the system displays the message "Registry Service Subkey already exists." You can ignore this message.

2.3.4 Self-Extracting Executable Does Not Exit

In some situations, the self-extracting executable extracts the installation package but does not exit and start the installation. When this occurs, the system displays the "Unpacking" progress bar, and then nothing happens. Windows Task Manager shows the self-extracting executable as an active process, but it appears to be stalled. To activate the Availability Manager installation, press Ctrl + Alt + Delete, and then choose Cancel. The InstallShield progress bar then appears, and the installation continues normally.

2.3.5 Problem with the Reboot Dialog Window on Intel Platforms if Another Window Is Open

If you have any other window open (such as the Windows Explorer) during an installation, this window will be in front of the reboot dialog box at the end of the installation. Look for InstallShield Wizard in the task bar, and single-click it to bring the reboot window to the front. Note that you will also see this problem at the end of an uninstall operation.

2.3.6 Problem with the Shared Files Dialog Window on Intel Platforms During an Uninstall Operation

If you have any other window open (such as the Windows Explorer) during an uninstall operation, the status box is moved to the back when the uninstall operation encounters a shared file to be removed. Look for InstallShield Wizard in the task bar, and single-click it to bring the message box about the shared file to the front. You can then click Yes to remove a shared file or No to keep the file.

3 Startup and Shutdown Notes

The following notes pertain to starting up and shutting down the Availability Manager.

3.1 Avoid Using Multiple Data Analyzers on the Same System

If the Availability Manager is shut down improperly or abruptly on a Windows system, the AM_SESSION.LOCK file might not be deleted, thereby preventing subsequent sessions from starting.

In this situation, when you try to start the Data Analyzer, you will see the following warning: Could not establish session lock! Another AM session may be running.
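
The way a leftover lock file blocks a new session can be illustrated with a generic lock-file pattern in Python. This is a sketch under stated assumptions, not the Availability Manager's actual locking code: the function names are hypothetical, and only the file name AM_SESSION.LOCK comes from this note.

```python
import os


def acquire_session_lock(path):
    """Create the lock file exclusively; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_session_lock(path):
    """Remove the lock file on a clean shutdown."""
    os.remove(path)
```

If a session ends abruptly before the release step runs, the file remains, and the next call to acquire the lock fails even though no session is running.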

Either one of the following situations might exist:

3.2 Restarting After an Uninstall Operation on a Windows System

To uninstall the Availability Manager on a Windows system using Add/Remove Programs on the Windows Control Panel, follow these steps:

  1. Uninstall the software.
  2. Restart the system.
    This step completes the removal of the network bindings.
  3. Optionally, reinstall the software.

If you omit step 2, starting the Availability Manager could cause the system to fail. To recover from this situation, restart the system and then reinstall the Availability Manager (uninstalling the software again is not necessary). Finally, restart your system at the end of the installation. The Availability Manager should run properly.

4 Operation Notes

Availability Manager operation notes fall into two categories: general information (Section 4.1) and known problems (Section 4.2).

4.1 General Information

The notes in this section contain information about the general operation of the Availability Manager.

4.1.1 Some DECamds Features Not Yet Implemented

The Availability Manager is, in most respects, a Java implementation of the DECamds availability management software product. With each release, more features of DECamds are being added to the Availability Manager. However, not all features have yet been implemented in the Availability Manager. These features are planned to be added in future releases.

4.1.2 Data Collection and Events on OpenVMS Nodes

Node summary data is the only data that is collected by default. The Availability Manager looks for events only in data that is being collected.

You can collect additional data in either of the following ways:

4.1.3 Limit Your Background Collection of Detailed Data

By default, the only data collected on OpenVMS nodes is node summary data. You can collect this data on many nodes without incurring performance problems. If you do not have a high-performance workstation, and you have many nodes configured, be careful about enabling more data collection on the customization Data Collection page. This is especially true when you run the Data Analyzer on OpenVMS systems.

A new feature since Version 1.3 might help satisfy your data collection needs: when you open a node-specific data page, all types of data are automatically collected for that node.

4.1.4 Size of Event Log

If you are collecting data on many nodes, running the Availability Manager for a long period of time can result in a large event log. For example, in a run that monitors more than 50 nodes with most of the background data collection enabled, the event log can grow by up to 30 MB per day. At this rate, systems with small disks might fill up the disk where the event log resides.

Closing the Availability Manager application will enable you to access the event log for tasks such as archiving. Starting the Availability Manager starts a new event log.
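
As a rough planning aid, the growth figure quoted in this section can be turned into an estimate of how long the event log can grow before the disk fills. The Python sketch below uses the 30 MB per day upper figure from the example above as its default; the function name and the free-space value in the usage note are illustrative assumptions.

```python
def days_until_disk_full(free_mb, growth_mb_per_day=30.0):
    """Estimate how many days of logging fit in the remaining disk space."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return free_mb / growth_mb_per_day
```

For example, with 600 MB free, days_until_disk_full(600) returns 20.0, so the log would need to be archived roughly every three weeks at that rate.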

4.2 Known Problems

The notes in this section discuss known problems with Version 1.4 of the Availability Manager.

4.2.1 Windows NT Data Collector Does Not Recognize New Disk Configurations

If you change the logical disk configuration on a running Windows NT node, the Data Collector does not recognize the modified disk configuration and continues to report the previous configuration to the Data Analyzer. For the Data Collector to recognize the new disk configuration, you must stop and restart the Data Collector (PerfServ).

4.2.2 Problem with Daylight Saving Time Changes

For some time zones, especially European ones, the time-zone logic in the Java software libraries that the Data Analyzer uses might disagree with the Windows operating system about when the shift to daylight saving time occurs. For a two-week period in early April and late October, you might see a one-hour discrepancy between the time shown in the Data Analyzer and the time of day shown by the system and the Date-Time Control panel.

Also, Sun's Java classes disagree with Windows about whether daylight saving time even exists for Asian time zones. The Windows Date-Time Control panel usually indicates that daylight saving time is not possible for these zones, whereas time strings generated from the calendar classes in Java appear to recognize a daylight saving time shift. Therefore, for all time zones from eastern Europe east to Alaska, a one-hour discrepancy is likely from April through October, that is, for months at a time.

For OpenVMS systems, make sure that the time zone differential logical name SYS$TIMEZONE_DIFFERENTIAL is defined correctly.
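
One way to sanity-check the logical name is to compute the value you expect and compare it with the output of SHOW LOGICAL SYS$TIMEZONE_DIFFERENTIAL. The Python sketch below assumes that the logical holds the offset from UTC in seconds, as it does on typical OpenVMS systems; the function name and the sample offset are illustrative.

```python
def expected_differential_seconds(utc_offset_hours):
    """Convert a UTC offset in hours to the equivalent number of seconds."""
    return round(utc_offset_hours * 3600)
```

For example, an offset of -5 hours (U.S. Eastern Standard Time) corresponds to -18000.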

4.2.3 Occasional Application Failure for Data Analyzer on OpenVMS Systems

If you are running the Data Analyzer on a multiprocessor OpenVMS system, you might encounter a "SIGBUS 10" application error. In this application error, your output window displays several hundred lines of low-level thread state. Compaq has seen this only when kernel multithreading was enabled for the process. Section 2.2.1 contains instructions for disabling kernel multithreading.

Future patch kits for the kernel-threads subsystem on OpenVMS might solve this problem. Note that disabling kernel multithreading for the Data Analyzer does not disable application-level multithreading within the Java Virtual Machine or affect kernel multithreading for other applications on the OpenVMS system.

4.2.4 Event Reporting Problems

The following list contains known event reporting problems that have been reported in Version 1.4:

4.2.5 Out-of-Memory Problems on Long Runs

If a session runs for many days, and the Data Analyzer is collecting data on many nodes, the Data Analyzer might run out of virtual memory (object heap). (See the installation instructions for Windows or OpenVMS for details on modifying the heap size.) On Windows systems, the Data Analyzer does not report the problem. On OpenVMS systems, the Data Analyzer displays an "OutOfMemoryException" error in the window in which the Data Analyzer was started. On either system, one or more parts of the display might stop updating. The only workaround is to restart the Data Analyzer.

5 Display Notes

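The following notes pertain to the display of data on Availability Manager pages and are organized under two headings: problems using the Data Analyzer on all platforms (Section 5.1) and problems using the Data Analyzer on OpenVMS (Section 5.2).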

5.1 Problems Using the Data Analyzer on All Platforms

The problems discussed in this section apply to running the Data Analyzer on all platforms.

5.1.1 Hardware Model Sometimes Not Displayed on Node Summary Page

For some long hardware model names, the Node Summary page hides most of the model name. On OpenVMS nodes, you can force the page to reveal the name by clicking the portion of the name that is visible and scrolling right. This problem will be resolved in the next release.

5.1.2 Problem Displaying Help in Some Browsers on Windows

The following problems have been observed when using Version 4.7 of Netscape and some versions of Internet Explorer:

These problems have not been seen with Netscape Version 4.5; Compaq has not tested other versions of Netscape.

5.1.3 Incomplete Repainting of Windows

If you obscure part of an Availability Manager window with another window, the obscured portion of the Availability Manager window might not repaint completely when you move the top window. This appears to be a Java Swing problem that is currently under investigation.

5.2 Problems Using the Data Analyzer on OpenVMS

The problems discussed in this section apply to running the Data Analyzer on OpenVMS systems.

5.2.1 Problem Exiting Field on OpenVMS Data Collection Customization Page

In customizing the OpenVMS Data Collection page on OpenVMS, if you change a data collection interval and press Enter to exit the field, the value is not entered as expected. You must use the mouse to move the cursor out of the field.

5.2.2 Long Runs Exhaust XLIB Resource ID

The version of Motif currently shipping with OpenVMS is based on X11R5. That release of X11 uses a resource ID allocation scheme that works poorly with the Motif support in Java for OpenVMS. As a result, most long-running Availability Manager sessions will stop updating the display at a time that depends on the speed of the OpenVMS machine. For example, a session running on a dual-processor 275 MHz system reported the following after 14 hours:


     Xlib:  resource ID allocation space exhausted! 

On faster machines, this message was reported after only 8 hours. This problem is under investigation.
