

High Availability Cluster Multi-Processing for AIX Troubleshooting Guide Version 4.5 SC23-4280-04


Edition: June 2002

Before using the information in this book, read the general information in Notices for HACMP Troubleshooting Guide.

This edition applies to HACMP for AIX, version 4.5, and to all subsequent releases of this product until otherwise indicated in new editions.

Copyright International Business Machines Corporation 1998, 2002. All rights reserved.
Note to U.S. Government Users Restricted Rights--Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

About This Guide
Chapter 1: Diagnosing the Problem
  Troubleshooting an HACMP Cluster
  Becoming Aware of the Problem
    Application Services Are Not Available
    Messages Displayed on System Console
  Determining a Problem Source
    Examining Messages and Log Files
    Investigating System Components
    Tracing System Activity
    Using the cldiag Utility to Perform Diagnostic Tasks
    Using the Cluster Snapshot Utility to Check Cluster Configuration
  Using SMIT Cluster Recovery Aids
    Correcting a Script Failure
  Verifying Expected Behavior
Chapter 2: Examining Cluster Log Files
  HACMP Messages and Cluster Log Files
    Types of Cluster Messages
    Cluster Message Log Files
  Understanding the File
    Format of Messages in the File
    Viewing the File
  Understanding the Log File
    Format of Messages in the Log File
    Viewing the Log File
    Changing the Location of the Log File
    Resource Group Processing Messages in the File
  Understanding the System Error Log
    Format of Messages in the System Error Log
    Viewing Cluster Messages in the System Error Log
  Understanding the Cluster History Log File
    Format of Messages in the Cluster History Log File
    Viewing the Cluster History Log File
  Understanding the / File
    Viewing the / File
  Understanding the File
    Format of Messages in the File
    Viewing the File
  Understanding the / File
    Format of Messages in the / File
    Viewing the / File
Chapter 3: Investigating System Components
  Overview
  Checking Highly Available Applications
  Checking the HACMP Layer
    Checking HACMP Components
    Checking for Cluster Configuration Problems
    Checking a Cluster Snapshot File
  Checking the Logical Volume Manager
    Checking Volume Group Definitions
    Checking Physical Volumes
    Checking Logical Volumes
    Checking Filesystems
  Checking the TCP/IP Subsystem
    Checking Point-to-Point Connectivity
    Checking the IP Address and Netmask
    Checking ATM Classic IP Hardware Addresses
  Checking the AIX Operating System
  Checking Physical Networks
  Checking Disks and Disk Adapters
    Recovering from PCI Hot Plug Network Adapter Failure
  Checking System Hardware
Chapter 4: Solving Common Problems
  HACMP Installation Issues
    Cannot Find Filesystem at Boot Time
    cl_convert Does Not Run Due to Failed Installation
    Configuration Files Could Not Be Merged During Installation
    System ID Licensing Issues
  HACMP Startup Issues
    ODMPATH Environment Variable Not Set Correctly
    Cluster Manager Starts but then Hangs
    clinfo Daemon Exits After Starting
    Node Powers Down; Cluster Manager Will Not Start
    configchk Command Returns an Unknown Host Message
    Cluster Manager Hangs During Reconfiguration
    clsmuxpd Does Not Start or Exits After Starting
    Pre- or Post-Event Does Not Exist on a Node After Upgrade
    Node Fails During Configuration with 869 LED Display
    Node Cannot Rejoin the Cluster After Being Dynamically Removed
  Disk and File System Issues
    AIX Volume Group Commands Cause System Error Reports
    varyonvg Command Fails on Volume Group
    cl_nfskill Command Fails
    cl_scdiskreset Command Fails
    fsck Command Fails at Boot Time
    System Cannot Mount Specified File Systems
    Cluster Disk Replacement Process Fails
  Network and Switch Issues
    Unexpected Adapter Failure in Switched Networks
    Cluster Nodes Cannot Communicate
    Distributed SMIT Causes Unpredictable Results
    Cluster Managers in an FDDI Dual Ring Fail to Communicate
    Token-Ring Network Thrashes
    System Crashes Reconnecting MAU Cables After a Network Failure
    TMSCSI Will Not Properly Reintegrate when Reconnecting Bus
    Lock Manager Communication on FDDI or SOCC Networks Is Slow
    SOCC Network Not Configured after System Reboot
    Unusual Cluster Events Occur in Non-Switched Environments
    Cannot Communicate on ATM Classic IP Network
    Cannot Communicate on ATM LAN Emulation Network
  HACMP Takeover Issues
    varyonvg Command Fails during Takeover
    Highly Available Applications Fail
    Node Failure Detection Takes Too Long
    Cluster Manager Sends a DGSP Message
    cfgmgr Command Causes Unwanted Behavior in Cluster
    Deadman Switch Causes a Node Failure
    Releasing Large Amounts of TCP Traffic Causes DMS Timeout
    A "device busy" Message Appears After node_up_local Fails
    Adapter Swap Fails Due to an rmdev "device busy" Error
    MAC Address Is Not Communicated to the Ethernet Switch
  Client Issues
    Adapter Swap Causes Client Connectivity Problem
    Clients Cannot Find Clusters
    Clients Cannot Access Applications
    Clinfo Does Not Appear to Be Running
    Clinfo Does Not Report that a Node Is Down
  Miscellaneous Issues
    Limited Output when Running the tail -f Command on /
    CDE Hangs After IPAT on HACMP Startup
    cl_verify Utility Gives Unnecessary Message
    config_too_long Message Appears
    Console Displays clsmuxpd Messages
    Device LEDs Flash 888 (System Panic)
    Resource Group Down though Highest Priority Node Up
    Unplanned System Reboots Cause Fallover Attempt to Fail
    Deleted or Extraneous Objects Appear in NetView Map

    F1 Doesn't Display Help in SMIT Screens
    /usr/es/sbin/ File (Event Summaries Display) Grows Too Large
    Display Event Summaries Does Not Display Resource Group Information as Expected
Appendix A: HACMP Messages
Appendix B: HACMP Tracing
Index

About This Guide

Managing an HACMP system involves several distinct tasks. Installation and configuration prepare the system for use, while administration involves making planned changes to the system. In contrast, troubleshooting deals with the unexpected; it is an important part of maintaining a stable, reliable HACMP environment.

This Guide presents a comprehensive strategy for identifying and resolving problems that may affect an HACMP cluster. It presents the evaluation criteria, procedures, and tools that help you determine the source of a problem. Although the symptoms and causes of common problems are examined in detail, the Guide's overall focus is on developing a general methodology for solving problems at your site.

Who Should Read This Guide

This Guide is intended for the system administrator responsible for maintaining an HACMP environment. It helps you identify and solve problems that may occur while using the HACMP software. Even if your site is not experiencing problems with the software, it is still useful to develop the diagnostic skills described in this Guide.

If you are running HACMP/ES, see the Enhanced Scalability Installation and Administration Guide for a discussion of troubleshooting in general and of the RSCT Services.

Before You Begin

As a prerequisite, you need a basic understanding of the components that make up an HACMP cluster in order to solve problems in the cluster. This Guide assumes that you understand:
  HACMP software and concepts
  Communications, including the TCP/IP subsystem
  The AIX operating system, including the Logical Volume Manager subsystem
  The hardware and software installed at your site

You should also read the following HACMP documentation:
  Concepts and Facilities Guide
  Planning Guide
  Installation Guide
  Administration Guide
  Enhanced Scalability Installation and Administration Guide (if you are running HACMP/ES)

Highlighting

The following highlighting conventions are used in this book:
  Italic     Identifies variables in command syntax, new terms and concepts, or indicates emphasis.
  Bold       Identifies routines, commands, keywords, files, directories, menu items, and other items whose actual names are predefined by the system.
  Monospace  Identifies examples of specific data values, examples of text similar to what you might see displayed, examples of program code similar to what you might write as a programmer, messages from the system, or information that you should actually type.

ISO 9000

ISO 9000 registered quality systems were used in the development and manufacturing of this product.

Related Publications

The following publications provide additional information about the HACMP software:
  Release Notes in /usr/lpp/cluster/doc/release_notes contain hardware and software requirements and last-minute information about the current release.
  Concepts and Facilities Guide - SC23-4276
  Planning Guide - SC23-4277
  Installation Guide - SC23-4278
  Administration Guide - SC23-4279
  Programming Locking Applications - SC23-4281
  Programming Client Applications - SC23-4282
  Enhanced Scalability Installation and Administration Guide - SC23-4284
  Master Glossary - SC23-4285
  IBM International Program License Agreement

Manuals accompanying machine and disk hardware also provide relevant information.

On the World Wide Web, enter the following URL to access an online library of documentation covering AIX, RS/6000, and related products:

Trademarks

The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both.

