Australia - Updated: 24-SEP-2003

IMPORTANT NOTICE

The online distribution of OpenVMS and related product patches is being migrated to the HP ITRC (Information Technology Resource Center) patch distribution site. The new ITRC patch server will allow OpenVMS customers to take advantage of many enhanced features for patch searching and distribution.

Beginning August 1, 2003, publicly available patches for OpenVMS and related layered products will be available from the HP ITRC web site at

http://itrc.hp.com/service/patch/mainPage.do

The same patches will still be available from the existing patch server in Colorado Springs (http://www.support.compaq.com/patches/) through the end of October 2003, to give customers sufficient time to update their bookmarks and make the transition to the HP ITRC web site.

ECO kits will also be available by raw FTP from ftp://ftp.itrc.hp.com/.

PLEASE UPDATE YOUR BOOKMARKS AND REGISTER ON THE NEW SITE NOW

Note: If you have trouble connecting to the ITRC site, delete any cookies for "itrc.hp.com" from your browser and try again. Report any difficulties or suggestions to MrVMS.

Sydney Customer Support Centre OpenVMS ECO information

VMS721H1_MEM_CHAN-V0100 Alpha V7.2-1H1 Memory Channel ECO Summary

To obtain this kit please call the Customer Support Centre or use the FTP site


    
    
    [OpenVMS] VMS721H1_MEM_CHAN-V0100 Alpha V7.2-1H1 Memory Channel ECO Summary
    
    New Kit Date:       01-NOV-2002
    Modification Date:  Not Applicable
    Modification Type:  NEW KIT
    
    Copyright (c) Compaq Computer Corporation 2002.  All rights reserved.
    
    OP/SYS:     OpenVMS Alpha V7.2-1H1
    
    COMPONENT:  Memory Channel
    
    SOURCE:     Compaq Computer Corporation
    
    ECO INFORMATION:
    
         ECO Kit Name:  VMS721H1_MEM_CHAN-V0100
                        DEC-AXPVMS-VMS721H1_MEM_CHAN-V0100--4.PCSI
         ECO Kits Superseded by This ECO Kit: None
         ECO Kit Approximate Size: 704 Blocks
         Kit Applies To:  OpenVMS Alpha V7.2-1H1
         System/Cluster Reboot Necessary: Yes
         Rolling Re-boot Supported:  Yes
         Installation Rating:  INSTALL_2
                                 2 : To be installed by all customers using
                                     the following feature(s):

                                     Memory Channel.
    
         Kit Dependencies:
    
           The following remedial kit(s), or later, must be installed BEFORE
           installation of this, or any required kit:
    
             VMS721H1_UPDATE-V0500
    
           In order to receive all the corrections listed in this
           kit, the following remedial kits should also be installed:
    
             None
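
    The "or later" rule for VMS721H1_UPDATE-V0500 can be illustrated with
    a small Python sketch.  It assumes remedial kit names of the form
    NAME-Vmmnn, as they appear in this summary.  The function name
    satisfies() and the example inputs other than VMS721H1_UPDATE-V0500
    are hypothetical and are not part of the kit.

```python
import re

# Assumed kit-name shape, taken from the names in this summary: NAME-Vmmnn
_KIT = re.compile(r"^(?P<name>[A-Z0-9_$]+)-V(?P<level>\d{4})$")


def satisfies(installed, required="VMS721H1_UPDATE-V0500"):
    """Return True if `installed` is the required kit or a later version of it."""
    have, need = _KIT.match(installed), _KIT.match(required)
    if have is None or need is None:
        raise ValueError("unrecognized kit name")
    # A different kit never satisfies the dependency, whatever its level.
    same_kit = have["name"] == need["name"]
    return same_kit and int(have["level"]) >= int(need["level"])


if __name__ == "__main__":
    for kit in ("VMS721H1_UPDATE-V0400", "VMS721H1_UPDATE-V0500"):
        print(kit, "->", "ok" if satisfies(kit) else "too old")
```

    The comparison is purely numeric on the four-digit level, so it says
    nothing about whether a later kit actually supersedes an earlier one.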
    
    
    ECO KIT SUMMARY:
    
    An ECO kit exists for Memory Channel on OpenVMS Alpha V7.2-1H1.
    This kit addresses the following problems:
    
    
    PROBLEMS ADDRESSED IN VMS721H1_MEM_CHAN-V0100 KIT
    
    
          o  1.  A Memory Channel virtual hub (VHUB) will fail to come
                 "ONLINE" and form an SCS virtual-circuit link if the
                 Memory Channel VHUB VH0/Master node is not booted before
                 the VHUB VH1/Slave Memory Channel node.
    
             2.  If a VH0/Master Memory Channel node crashes and/or reboots
                 while the VH1/Slave Memory Channel node remains running,
                 the Memory Channel link will fail and MCA0 (and MCB0, if
                 applicable) on both VHUB Memory Channel nodes will remain
                 "OFFLINE".
    
             This MCx0 "OFFLINE" problem may also occur during MCA0/MCB0
             adapter/link error-handling/recovery.
    
             The following symptoms are manifestations of this MC VHUB BOOT
             "OFFLINE" problem:
    
             OPA0: console errors:
             --------------------
    
             %MCA0 CPU00:  19-SEP-2000 04:17:50  Slave but adapter_ok
                           off, retrying.
             %MCA0 CPU00:  19-SEP-2000 04:17:50 MC re-init 5 second timer.
             %MCA0 CPU00:  19-SEP-2000 04:17:55 Slave but adapter_ok
                           off, retrying.
             %MCA0 CPU00:  19-SEP-2000 04:17:55 MC re-init 5 second timer.
                                  .
                                  .
             ....... after 20 retries ...............
                                  .
                                  .
             %MCA0 CPU00:  19-SEP-2000 04:18:00 Slave but adapter_ok
                           off, retrying.
             %MCA0 CPU00:  19-SEP-2000 04:18:00 MC re-init 10 minute timer.
             %MCA0 CPU00:  19-SEP-2000 04:28:00 Slave but adapter_ok off,
                           retrying.
             %MCA0 CPU00:  19-SEP-2000 04:28:00 MC re-init 10 minute timer.
                                  .
                                  .
                                  .
             ON REMOTE NODE ATTEMPTING MC SW INIT .........
             MCA0 CPU00:  19-SEP-2000 04:27:50 node state retries exceeded
    
    
             DCL SHOW DEVICE command output:
             -------------------------------
             The DCL SHOW DEVICE command reports MCA0: and PMA0: (and
             MCB0:/PMB0:, if present) as OFFLINE:
    
             $ SHOW DEVICE MC
              Device                  Device           Error
               Name                   Status           Count
               MCA0:                   Offline              2
               MCB0:                   Offline             16
    
             $ SHOW DEVICE PM
              Device                  Device           Error
               Name                   Status           Count
               PMA0:                   Offline              0
               PMB0:                   Offline              0
    
             Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
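
    For sites that capture SHOW DEVICE output in a log, the following
    Python sketch picks out devices reported as Offline.  It assumes the
    three-column layout shown above (device name, status, error count).
    The function name offline_devices() is hypothetical.

```python
def offline_devices(show_device_text):
    """Return (device, error_count) pairs for rows whose status is Offline."""
    offline = []
    for line in show_device_text.splitlines():
        fields = line.split()
        # Data rows look like "MCA0:  Offline  2"; skip prompts and headers.
        if len(fields) < 3 or not fields[0].endswith(":") or not fields[-1].isdigit():
            continue
        status = " ".join(fields[1:-1])
        if status.lower() == "offline":
            offline.append((fields[0], int(fields[-1])))
    return offline


SAMPLE = """\
$ SHOW DEVICE MC
 Device                  Device           Error
  Name                   Status           Count
  MCA0:                   Offline              2
  MCB0:                   Offline             16
"""

if __name__ == "__main__":
    for device, errors in offline_devices(SAMPLE):
        print(f"{device} is offline (error count {errors})")
```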
    
    
    
          o  An MC_INCONSTATE (SYS$MCDRIVER) bugcheck may occur during
             local/remote Memory Channel node reboot or during Memory
             Channel adapter/link error recovery.  This bugcheck can
             occur regardless of the Memory Channel hub configuration:
             VHUB or real hub.  The MC_INCONSTATE bugcheck typically
             occurs when a "nested error" (MCDRIVER-internal or
             MC-adapter HW error) is encountered while recovering from a
             Memory Channel link error or a local/remote Memory Channel
             node crash/reboot.
    
             The "MC_INCONSTATE" bugcheck is easy to recognize, and is
             nearly always caused by this "nested error-handling" bug.  A
             typical MCx0: error-log event sequence and SDA> crash summary
             are shown below:
    
             MCx0: ERROR-LOG SUMMARY: Unsuccessful events:
             ---------------------------------------------
             MCB0 - Hardware error, reinitializing.
             MCB0 -
                     Node 0:     State:  Uninitialized
                     Node 1:     State:  Uninitialized
             MCB0 - Memory channel link online failure 2
             MCB0 - We shouldn't be here.
                     CRASH - MC_INCONSTATE
    
             Crashdump Summary Information:
             ------------------------------
             Bugcheck Type:     MC_INCONSTATE, Fatal error
                                detected by Memory Channel
             Failing PC:        FFFFFFFF.E2983A44  SYS$MCDRIVER+0BA44
             Failing PS:        30000000.00000804
             Module:            SYS$MCDRIVER (Link Date/Time:
                                 29-DEC-1999 04:09:37.99)
             Offset:            0000BA44
    
    
             Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
    
    
          o  Memory Channel Receive channel (RX_MESS_CHAN) message
             processing may hang after processing 512 RX_MESS_CHAN messages
             during a single fork-thread ([MEM_CHAN]MC$HANDLE_MESS_CHAN_INT
             routine).  This could occur with heavy Memory Channel
             SCS-traffic and high IPL-8 fork-thread scheduling latency.  A
             Memory Channel RX_MESS_CHAN message-handling hang will lead to
             CNXMGR/LOCK_MGR stalls (and potential cluster hangs) as well as
             SCS "virtual-circuit timeouts".
    
             OPA0: CONSOLE PM/MC ERROR MESSAGES:
             -----------------------------------
             %PMA0 CPU00:  ... MC$_CHAN_QUE_EMPTY
                               channel = 541C8  ppd = 83DD4CC0
             %PMA0 CPU00:  ... stall state CLEAR
                               channel = 541C8  ppd = 83DD4CC0
             %MCA0 CPU00:  ... Timeslice exceeded while in workque
                               for node RM763A
             %MCA0 CPU00:  ... Timeslice exceeded while in workque
                               for node RM763A
             %MCA0 CPU00:  ... Timeslice exceeded while in workque
                               for node RM763A
             %PMA0, Virtual Circuit Timeout - REMOTE PORT  xxxx
    
    
             SCS VC-TIMEOUT ERRLOG ENTRY:
             ----------------------------
                                      .
                                      .
                                      .
             Error Type/SubType     x4009    Signaled via Packet, Virtual
                                             Circuit Timeout.
    
             The "...  Timeslice exceeded" error may continue to occur after
             this fix is applied.  However, MC RX_MESS_CHAN processing will
             no longer hang after this event.
    
             Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
    
    
    
    
          o  Following a boot-time Memory Channel C unit-init/self-test
             "LOOPBACK WRITE TEST" failure, which indicates a Memory Channel
             adapter PCI-DMA error, the MCDRIVER will enter an infinite
             HW/SW initialization error-retry loop.  The following
             OPA0:/console errors will be issued at 5 second intervals,
             changing to 10 minute intervals after 20 retries:
    
             %MCA0 CPU00:  ... MC loopback write interrupt test failed.
             %MCA0 CPU00:  ... Couldn't get mgmt lock.
             %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .
             %MCA0 CPU00:  ... Couldn't get mgmt lock.
             %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .
             %MCA0 CPU00:  ... Couldn't get mgmt lock.
             %MCA0 CPU00:  ... ERR - ucb offline and adapter not crashing .
    
    
             Note:  The first error message occurs on the first pass only.
    
             Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
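
    The retry timing described above (5 second intervals, changing to 10
    minute intervals after 20 retries) can be worked through with a short
    Python sketch.  The function name retry_offsets() is hypothetical, and
    treating the 20th retry as the last 5-second one is an assumption
    about how the driver counts.

```python
def retry_offsets(count, fast_retries=20, fast_interval=5, slow_interval=600):
    """Seconds from the first failure to each of the first `count` retries."""
    offsets, elapsed = [], 0
    for attempt in range(1, count + 1):
        # Assumption: retries 1..fast_retries use the short interval.
        elapsed += fast_interval if attempt <= fast_retries else slow_interval
        offsets.append(elapsed)
    return offsets


if __name__ == "__main__":
    # Show where the schedule switches from the 5-second to the 10-minute timer.
    print(retry_offsets(22)[18:])
```

    Under these assumptions the 5-second phase lasts 100 seconds, after
    which one pair of console messages is issued every 10 minutes.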
    
    
    
    
    
          o  CPUSPINWAIT bugchecks may occur on any GSxxx AlphaServer
             platform (GS140, GS80/160/320) with a Memory Channel adapter.
             The bugchecks occur due to an error in the SYS$MCDRIVER
             "MC$ALLOCATE_MESSAGE" routine performing Memory Channel message
             free-queue-header "loopback WRITE", and an incorrect timer
             implementation.  The CPUSPINWAIT bugcheck will always involve
             an SMP$TIMEOUT acquiring the SCS-spinlock while another SMP-CPU
             is holding the SCS-spinlock within the SYS$MCDRIVER /
             [MEM_CHAN]MCCHANNELS.C MC$ALLOCATE_MESSAGE routine.
    
             Crashdump Summary Information:
             ------------------------------
             Bugcheck Type:     CPUSPINWAIT, CPU spinwait timer expired
             Failing PC:        FFFFFFFF.8007A384    SMP$TIMEOUT_C+00064
             Failing PS:        28000000.00000804
             Module:            SYSTEM_SYNCHRONIZATION_MIN
             Offset:            00000384
    
    
             NOTE:  The "MC loopback write interrupt test failed" error is
             typically due to a leftover/stale Memory Channel adapter
             PCI-logic error-state that will only clear with a CONSOLE >>>
             INIT operation (to perform PCI-bus RESET).  Users who
             frequently reboot without using the CONSOLE >>> BOOT_RESET = ON
             switch (Environment Variable) or without performing a CONSOLE
             >>> INIT command are susceptible to this "MC loopback write
             test" error.
    
             Images Affected:[SYS$LDR]SYS$MCDRIVER.EXE
    
    
    
    
          o  Any SCS data transfer of "0-length" using the
             Memory Channel (MC) SCS port will result in an "INVPTEFMT,
             Invalid page table entry format" bugcheck.  The bugcheck
             occurs within IOC_STD$PTETOPFN, as a result of a call to
             IOC_STD$FILSPT from PMDRIVER.C/SETUP_COPY.
    
             Crashdump Summary Information:
             ------------------------------
             Bugcheck Type:     INVPTEFMT, Invalid page table
                                entry format
             Current Process:   NULL
             Current Image:     <not available>
             Failing PC:        FFFFFFFF.800B88FC
                                IOC_STD$PTETOPFN_C+0008C
             Failing PS:        38000000.00000804
             Module:            IO_ROUTINES (Link Date/Time:
                                13-DEC-2000 00:39:37.49)
             Offset:            000048FC
    
    
             Images Affected:[SYS$LDR]SYS$PMDRIVER.EXE
    
    
    
    
          o  SCS "SEND MESSAGE" (typically LOCK_MGR and MSCP disk commands)
             and SCS data transfer commands, issued over a PM/MC SCS virtual
             circuit (VC), can stall or hang following exhaustion of Memory
             Channel "channel-free-queue" entries.  The duration of this
             stall or hang is entirely dependent on SCS-sysap traffic and
             flow-control (SCS "credit") patterns and will persist until one
             of the following occurs:
    
              o  SCS VC timeout error closes the VC
    
              o  SCS-sysap sends a message that breaks the stalemate
    
              o  SCS VC timeout mechanism sends a message that breaks the
                 stalemate
    
              o  PMx0:  SCS-port timeout occurs, crashing the MC port
    
    
             This SYS$PMDRIVER MC-SCS-command processing hang/stall can
             occur under the following two conditions:
    
              -  HANG:  Under heavy and primarily unidirectional loads;
    
              -  STALL:  Under more bi-directional loads, stalls will create
                 low performance over the Memory Channel VC, drastically
                 reducing Memory Channel performance under load.
    
    
             Because this hang/stall blocks internode SCS-sysap cluster
             communications, symptoms can be obscure and numerous.  They
             may include:
    
              o  Performance degradation over Memory Channel based SCS VCs
    
              o  An SCS VC timeout
    
              o  A LOCK_MGR stall/hang or performance loss
    
              o  MSCP served disk command timeouts or disk I/O slowdowns
    
              o  Customer LOCK_MGR-dependent application stalls, hangs, or
                 slowdowns
    
    
             Images Affected:[SYS$LDR]SYS$PMDRIVER.EXE
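
    As a triage aid, the message text quoted in the problem descriptions
    above can be searched for in a captured console or error log.  The
    Python sketch below is a minimal illustration: the signature strings
    are copied from this summary, while the labels and the function name
    match_signatures() are hypothetical.  A match is only a hint, since
    bugchecks such as INVPTEFMT can have unrelated causes.

```python
import re

# Signature text is copied from this summary; the labels are informal.
SIGNATURES = {
    "Slave but adapter_ok off, retrying": "VHUB boot-order OFFLINE problem (SYS$MCDRIVER)",
    "MC_INCONSTATE": "nested error during MC error recovery (SYS$MCDRIVER)",
    "Timeslice exceeded while in workque": "RX_MESS_CHAN processing hang (SYS$MCDRIVER)",
    "MC loopback write interrupt test failed": "initialization retry loop (SYS$MCDRIVER)",
    "INVPTEFMT": "0-length SCS data transfer (SYS$PMDRIVER)",
}


def match_signatures(console_text):
    """Return the labels of all known signatures found in the text."""
    # Console messages wrap across lines, so collapse whitespace first.
    flat = re.sub(r"\s+", " ", console_text)
    return [label for signature, label in SIGNATURES.items() if signature in flat]


if __name__ == "__main__":
    log = ("%MCA0 CPU00:  19-SEP-2000 04:17:50  Slave but adapter_ok\n"
           "              off, retrying.\n")
    print(match_signatures(log))
```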
    
    
    
    RELATED ARTICLES:
    
    Detailed articles describing the problems listed above may exist in
    the OPENVMS database(s).  To view these articles,
    open the appropriate product database and perform a query using either
    of the following search strings: 'VMS721H1_MEM_CHAN-V0100' or
    'VMS721H1_MEM_CHAN'.
    
    
    ECO KIT ORDERING INSTRUCTIONS:
    
    If after an evaluation you wish to obtain this kit, request it
    electronically using the appropriate Advanced Electronic Services
    (AES) Service Tool.  If you are not familiar with how to request
    kits electronically, open the DIA, WIS or DSNLINK database and
    review the article entitled:
    
         [AES] How To Electronically Request ECO Kits Using Service Tools
    
    
    INSTALLATION NOTES:
    
    This kit requires a system reboot.  Compaq strongly recommends that
    a reboot be performed immediately after kit installation to avoid
    system instability.
    
    If you have other nodes in your OpenVMS cluster, they must also be
    rebooted in order to make use of the new image(s).  If it is not
    possible or convenient to reboot the entire cluster at this time, a
    rolling re-boot may be performed.
    
    INSTALLATION INSTRUCTIONS:
    
    Install this kit with the POLYCENTER Software Installation utility
    by logging in to the SYSTEM account and typing the following at the
    DCL prompt:
    
    PRODUCT INSTALL VMS721H1_MEM_CHAN /SOURCE=[location of Kit]
    
    The kit location may be a tape drive, CD, or a disk directory that
    contains the kit.
    
    Additional help on installing PCSI kits can be found by typing
    HELP PRODUCT INSTALL at the system prompt.
    
    Special Installation Instructions:
    
         o  Scripting of Answers to Installation Questions
    
            During installation, this kit will ask and require user
            response to several questions.  If you wish to automate the
            installation of this kit and avoid having to provide responses
            to these questions, you must create a DCL command procedure
            that includes the following definitions and commands:
    
               -  $ DEFINE/SYS NO_ASK$BACKUP TRUE
    
               -  $ DEFINE/SYS NO_ASK$REBOOT TRUE
    
               -  Add the following qualifiers to the PRODUCT INSTALL
                  command and add that command to the DCL procedure.
    
                    /PROD=DEC/BASE=AXPVMS/VER=V1.0
    
    
               -  Deassign the logical names defined above.
    
            For example, a sample command file to install the
            VMS721H1_MEM_CHAN-V0100 kit would be:
    
              $
              $ DEFINE/SYS NO_ASK$BACKUP TRUE
              $ DEFINE/SYS NO_ASK$REBOOT TRUE
              $!
              $ PROD INSTALL VMS721H1_MEM_CHAN/PROD=DEC/BASE=AXPVMS/VER=V1.0
              $!
              $ DEASSIGN/SYS NO_ASK$BACKUP
              $ DEASSIGN/SYS NO_ASK$REBOOT
              $!
              $ exit
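
    The /PROD, /BASE and /VER qualifiers in the sample above correspond
    to fields of the kit file name
    DEC-AXPVMS-VMS721H1_MEM_CHAN-V0100--4.PCSI.  The Python sketch below
    derives the PRODUCT INSTALL line from such a name.  It assumes the
    usual PCSI file-name layout, producer-base-product-tmmnn-ue-k.PCSI;
    the function name install_command() is hypothetical.

```python
import re

# Assumed PCSI file-name layout: producer-base-product-tmmnn-ue-k.PCSI
_PCSI = re.compile(
    r"^(?P<producer>[^-]+)-(?P<base>[^-]+)-(?P<product>[^-]+)-"
    r"(?P<vtype>[A-Z])(?P<major>\d\d)(?P<minor>\d\d)-(?P<update>[^-]*)-"
    r"(?P<kit_type>\d)\.PCSI$"
)


def install_command(kit_file):
    """Build the PROD INSTALL line, with qualifiers, for a PCSI kit file name."""
    m = _PCSI.match(kit_file)
    if m is None:
        raise ValueError(f"not a recognized PCSI kit name: {kit_file}")
    # V0100 in the file name becomes V1.0 in the /VER qualifier.
    version = f"{m['vtype']}{int(m['major'])}.{int(m['minor'])}"
    return (f"$ PROD INSTALL {m['product']}"
            f"/PROD={m['producer']}/BASE={m['base']}/VER={version}")


if __name__ == "__main__":
    print(install_command("DEC-AXPVMS-VMS721H1_MEM_CHAN-V0100--4.PCSI"))
```

    The sketch ignores the update/edit-level field, which is empty for
    this kit.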
    
    
    
    
    
    All trademarks are the property of their respective owners.
      
      ==========================================================================
      |                     Table of Kit Image Information                     |
      +----------------------------+----------+-----------------+--------------+
      |                            | Overall  | Image File      | Image Link   |
      | Image Name                 | Checksum | Identification  | Date/Time    |
      +----------------------------+----------+-----------------+--------------+
      | SYS$MCDRIVER.EXE           | 551C7780 | X-59            | 14-MAY-2002  |
      |                            |          |                 | 16:48:09.79  |
      +----------------------------+----------+-----------------+--------------+
      | SYS$PMDRIVER.EXE           | 70A17906 | X-25A1          | 30-JUL-2002  |
      |                            |          |                 | 14:27:13.79  |
      +----------------------------+----------+-----------------+--------------+
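
    The link dates in the table above give a simple way to tell whether
    a running image predates this kit: a driver linked earlier than the
    date listed here cannot be the kit's version.  The Python sketch
    below illustrates the comparison.  The function names are
    hypothetical, and obtaining the link time of the installed image (for
    example, from a crash dump summary) is left to the reader.

```python
from datetime import datetime

_MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Link dates copied from the Table of Kit Image Information.
KIT_LINK_TIMES = {
    "SYS$MCDRIVER.EXE": "14-MAY-2002 16:48:09.79",
    "SYS$PMDRIVER.EXE": "30-JUL-2002 14:27:13.79",
}


def parse_vms_time(text):
    """Parse an absolute time such as '14-MAY-2002 16:48:09.79'."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hms, hundredths = time_part.split(".")
    hour, minute, second = (int(x) for x in hms.split(":"))
    return datetime(int(year), _MONTHS[month.upper()], int(day),
                    hour, minute, second, int(hundredths) * 10_000)


def predates_kit(image, link_time):
    """Return True if `link_time` is earlier than the kit's link time for `image`."""
    return parse_vms_time(link_time) < parse_vms_time(KIT_LINK_TIMES[image])


if __name__ == "__main__":
    # Link time taken from the MC_INCONSTATE crash dump summary in this document.
    print(predates_kit("SYS$MCDRIVER.EXE", "29-DEC-1999 04:09:37.99"))
```

    A link time equal to or later than the table entry does not prove
    that this exact kit is installed, since later kits may ship newer
    images.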
    
    
    
    