Manage and troubleshoot your Integration Agent
You can use a few basic tasks and log files to check the status of your Integration Agent and specific integrations, and to troubleshoot issues you encounter.
Stopping the Integration Agent
You might need to stop the Integration Agent following integration script or Integration Agent configuration changes, or to clear the inbound queues.
Follow the instructions for your operating system to stop the Integration Agent.
To stop the Integration Agent running as a Windows Service, do one of the following:
- Open Windows Administrative Tools > Component Services, right-click xMatters Integration Agent, and then click Stop.
- Run stop_service.bat (located at: <IAHOME>\bin) from a command line.
- Run shutdown.bat (located at: <IAHOME>\bin) from a command line.
- This command suspends the agent, waits 30 seconds if there are pending requests, terminates any requests, and then stops the service.
To stop the Integration Agent running as a Windows program:
- Press Ctrl+C in the console window running the agent.
To stop the Integration Agent daemon, do one of the following:
- From a command line, run ./<IAHOME>/bin/stop_daemon.sh
- From a command line, run ./<IAHOME>/bin/shutdown.sh
- This command suspends the agent, waits 30 seconds if there are pending requests, terminates any requests, and then stops the service.
To stop the Integration Agent running as a Linux application:
- Press Ctrl+C in the console window running the agent.
Integration service runtime states
Each integration service running within the Integration Agent has a runtime state. These states are specific to the integration services running within a single Integration Agent (if you have multiple Integration Agents configured).
The states displayed in the xMatters web user interface may be different because they represent the states of an integration service across all Integration Agents providing that service.
The IAdmin tool's suspend and resume commands change the service's runtime state between SUSPENDED and ACTIVE.
State | Description |
ACTIVE | Indicates the integration service is able to access and process requests. |
SUSPENDED | Indicates the integration service is properly configured, but has been manually set to deny requests (for example, when a management system is undergoing maintenance; this allows you to work on one service and reload it without impacting other system components). When suspended: |
ERROR | Indicates the integration service is improperly configured, in most cases due to an issue in the configuration file or a script syntax error. Once the cause of the error is identified and corrected, you can reload the integration service and make it active by using the IAdmin tool or by restarting the Integration Agent. |
IAdmin tool
The Integration Agent includes a command line tool, named IAdmin, that you can use to issue commands to and get the status of the Integration Agent after it has started.
IAdmin is located at:
- Windows: <IAHOME>\bin\iadmin.bat
- Linux: <IAHOME>/bin/iadmin.sh
There are various commands you can use with the iadmin command line tool to manage the Integration Agent and get information on its status.
Command syntax | Description |
get-status | Use the get-status command to perform troubleshooting or view status information about the Integration Agent. The get-status command displays the following information: |
display-settings | Displays the settings that are currently in use for the Integration Agent. |
suspend <domain> <service> | Suspends the specified integration service. Incoming requests to the service are refused, but pending requests are maintained. |
suspend all | Suspends all active integration services. Incoming requests to all services are refused, but pending requests are maintained. |
suspend-now <domain> <service> | Suspends the specified integration service. Incoming requests to the service are refused, and pending requests are immediately terminated. |
suspend-now all | Suspends all active integration services. Incoming requests to all services are refused, and pending requests are immediately terminated. |
resume <domain> <service> | Resumes the specified integration service. Only SUSPENDED or ACTIVE services can be resumed. |
resume all | Resumes all suspended integration services. |
reload <domain> <service> | Reloads the configuration file for the specified integration service: |
reload all | Reloads the Integration Agent’s configuration file. This effectively removes any integration services that are no longer included in IAConfig.xml, creates and loads any new integration services, and reloads any existing integration services. Additionally, all of the Integration Agent configuration settings are updated, except for the admin-gateway, heartbeat-interval, and id elements. |
purge <domain> <service> | Removes all inbound and outbound APXML messages from the specified integration service. APXML messages that are being processed are maintained. Unlike other IAdmin commands, purge can be executed even if the Integration Agent is not running. |
purge all | Removes all inbound and outbound APXML messages from all integration services. APXML messages that are being processed are maintained. Unlike other IAdmin commands, purge can be executed even if the Integration Agent is not running. |
Connection status | Description |
UNKNOWN | No connection attempt has been made, or the attempt has not been completed. |
FAILED | No connection can be made between the Integration Agent and the primary xMatters web server. The Integration Agent continues sending heartbeats, and a Health Monitor notification is sent when the heartbeat recovers. |
PRIMARY_CONNECTED | There is a connection between the Integration Agent and a primary xMatters web server, but the heartbeat generates an error. The Integration Agent continues to send heartbeats to the primary servers, but functionality may be limited until the heartbeat is fully accepted. A Health Monitor notification is sent when the heartbeat is fully accepted (see the Integration Agent log for details). |
PRIMARY_ACCEPTED | The Integration Agent has successfully sent a fully accepted heartbeat to a primary xMatters web server, and the Integration Agent is fully functional. |
IAdmin logging
The results of administrative commands in the IAdmin tool are displayed in the console and captured in the log files.
Additionally, exit codes are captured in the log files. If errors occur during the execution of a command, a brief description of the error is displayed on the console, and additional details are captured in the log files.
The IAdmin command line tool returns an exit code number (meant to be used in automation scripts) indicating either success or the type of error encountered.
IAdmin tool exit codes
Exit code | Description |
0 | Success (no pending requests) |
30 | Success, at least one pending request |
35 | Invalid arguments; check logs for details |
40 | IA Config error; check logs for details |
45 | Command failed; check logs for details |
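Because the exit codes are meant for automation scripts, a script can branch on them. The following Python sketch shows one possible way to do that; it is not an official xMatters tool. The names describe_iadmin_exit, iadmin_succeeded, and run_iadmin are hypothetical, the caller must supply the real path to iadmin.sh or iadmin.bat, and only the codes listed in the table above are mapped.

```python
import subprocess

# Exit codes documented in the IAdmin exit code table.
IADMIN_EXIT_CODES = {
    0: "Success (no pending requests)",
    30: "Success, at least one pending request",
    35: "Invalid arguments; check logs for details",
    40: "IA Config error; check logs for details",
    45: "Command failed; check logs for details",
}


def describe_iadmin_exit(code):
    """Return the documented meaning of an IAdmin exit code."""
    return IADMIN_EXIT_CODES.get(code, "Undocumented exit code %d" % code)


def iadmin_succeeded(code):
    """Codes 0 and 30 both mean success; 30 means requests are still pending."""
    return code in (0, 30)


def run_iadmin(iadmin_path, *args):
    """Run IAdmin (hypothetical helper) and return (exit code, meaning).

    iadmin_path is the caller-supplied location of iadmin.sh or iadmin.bat,
    that is, <IAHOME>/bin/iadmin.sh with <IAHOME> expanded.
    """
    result = subprocess.run([iadmin_path, *args])
    return result.returncode, describe_iadmin_exit(result.returncode)
```

For example, a deployment script could call run_iadmin with the get-status argument and stop if iadmin_succeeded returns False for the returned code.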
Integration Agent log
The log files for the Integration Agent and the IAdmin tool are the first place to check if you notice issues with your installation.
Integration Agent log: Contains all Integration Agent log entries at the warning level or higher (by default).
- Windows: <IAHOME>\log\IntegrationAgent.txt
- Linux: <IAHOME>/log/IntegrationAgent.txt
IAdmin log: Contains the results of administrative commands in the IAdmin tool.
- Windows: <IAHOME>\log\IntegrationAgentIAdmin.txt
- Linux: <IAHOME>/log/IntegrationAgentIAdmin.txt
The Integration Agent uses log4j version 2.17.1 for logging. The information here covers items that are specific to the Integration Agent. See the log4j documentation for full details about its format and its settings.
Default log entry format
Log entries are in the following format:
<date> <time> <thread> <log_level> <log_message>
The following represents a log entry in the default format:
2018/04/25 15:52:01.537 -0700 PDT [applications|sample-plan-1] INFO - Calling JavaScript method apia_http
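When you need to filter a large log file by thread or level, splitting each entry into fields can help. The following Python sketch is an illustration only: the regular expression is inferred from the single sample entry above (including its time zone offset and abbreviation), and the names LOG_ENTRY and parse_log_entry are hypothetical. Entries in other layouts, such as multi-line stack traces, will not match and return None.

```python
import re

# Pattern inferred from the one sample entry in this section; it is an
# assumption, not a specification of the log layout.
LOG_ENTRY = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<offset>[+-]\d{4}) (?P<zone>\S+) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<level>[A-Z]+)\s+- "
    r"(?P<message>.*)$"
)


def parse_log_entry(line):
    """Return a dict of the entry's fields, or None if the line does not match."""
    match = LOG_ENTRY.match(line.rstrip("\n"))
    return match.groupdict() if match else None


sample = ("2018/04/25 15:52:01.537 -0700 PDT [applications|sample-plan-1] "
          "INFO - Calling JavaScript method apia_http")
entry = parse_log_entry(sample)
print(entry["thread"], entry["level"], entry["message"])
```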
Logging configuration file
The logging configuration file is named log4j2.xml and is located at:
- Windows: <IAHOME>\conf
- Linux: <IAHOME>/conf
To apply logging configuration changes, restart the Integration Agent after saving the configuration file.
Logging categories
Most Integration Agent log messages appear under the com.alarmpoint.integrationagent category hierarchy. Each major component or activity (for example, Health Monitor, heartbeats, and integration service requests) logs to a dedicated subcategory. The advantage of this approach is that logging can be focused on a specific aspect of the Integration Agent’s activities.
The following table summarizes selected log4j categories:
Category | What it logs |
com.alarmpoint.integrationagent | All Integration Agent activity. In the entries that follow, <root> represents com.alarmpoint.integrationagent |
<root>.admin | IAdmin tool requests |
<root>.apclient | APClient.bin submissions |
<root>.health | Health Monitor activity |
<root>.health.mail | Health Monitor mailer activity |
<root>.heartbeat | Heartbeat activity |
<root>.messaging | apxml-exchange activity |
<root>.services | All integration services activity |
<root>.services.test_service_1 | Example for a specific integration service. This would log the test_service_1 integration service. |
To enable a specific logging category, open the log4j2.xml configuration file and uncomment one or more logging categories.
By default, the Heartbeat logging category is inactive because it is commented out in the log4j2.xml file:
<!--
<logger name="com.alarmpoint.integrationagent.heartbeat">
<level value="DEBUG"/>
</logger>
-->
To enable Heartbeat logging, uncomment the entry and save the file:
<logger name="com.alarmpoint.integrationagent.heartbeat">
<level value="DEBUG"/>
</logger>
Troubleshooting
If you encounter problems using the Integration Agent, check the following areas to help resolve them.
Startup issues
On startup, the Integration Agent validates its configuration, including the:
- Presence of the logl4j-bridge.xml configuration file.
- Presence and well-formedness of the IAConfig.xml file.
- Availability of the Admin and Integration Agent ports.
Not all incorrect configurations prevent startup. For example, problems with the xMatters web server URLs or the integration service configuration files are logged as errors, but do not prevent the Integration Agent from starting. You can inspect the startup exit codes to investigate startup issues.
- Windows: The service state is “Starting” during this initial validation, and does not change to “Started” until the validation is complete. If the agent cannot start, the reason for the failure is written to the Windows system log and an exit code is set on the xMatters Integration Agent service (if the agent was started as a console application, the agent’s exit code is set as the batch file’s exit code).
- To see the Windows service exit code, run the following from a command line:
sc query apia
The code appears in the SERVICE_EXIT_CODE field.
- Alternatively, if the shutdown.bat script is used to start the Integration Agent, the batch file returns the service exit code.
- Linux: The initial validation is part of the daemon process. Any problems preventing startup cause the daemon to terminate. Details regarding startup problems are sent to the syslog daemon through the user facility at the fatal level and with the identity apia. The syslog daemon must be configured by an administrator to perform logging.
Additionally, on both Windows and Linux, all startup logging is written to the console and a special log file named apia.txt located in the log directory. After startup is complete, logging reverts to the standard Integration Agent log files.
Startup exit code | Description |
0 | Success |
30 | Service already started |
35 | Service not installed |
40 | Service not stopped |
45 | Service failed to start, but did not have a SERVICE_EXIT_CODE |
50 | get-status failure (check logs) |
60 | Missing configuration file |
61 | Unreadable configuration file (i.e., file is locked) |
65 | Malformed configuration file |
70 | Missing or unreadable logl4j-bridge properties file |
80 | Unable to bind to Admin Gateway port |
85 | Unable to bind to Web Services Gateway port |
86 | Malformed Mule configuration file |
87 | Mule startup error |
88 | Mule timeout error |
90 | Nonspecific startup error (check logs) |
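On Windows, the startup exit code can be read from the SERVICE_EXIT_CODE field of the sc query output and matched against the table above. The following Python sketch illustrates the lookup under some assumptions: the names service_exit_code and describe_startup_exit are hypothetical, and the sample line is an illustrative fragment rather than captured output from a real agent.

```python
import re

# Startup exit codes documented in the table above.
STARTUP_EXIT_CODES = {
    0: "Success",
    30: "Service already started",
    35: "Service not installed",
    40: "Service not stopped",
    45: "Service failed to start, but did not have a SERVICE_EXIT_CODE",
    50: "get-status failure (check logs)",
    60: "Missing configuration file",
    61: "Unreadable configuration file",
    65: "Malformed configuration file",
    70: "Missing or unreadable logl4j-bridge properties file",
    80: "Unable to bind to Admin Gateway port",
    85: "Unable to bind to Web Services Gateway port",
    86: "Malformed Mule configuration file",
    87: "Mule startup error",
    88: "Mule timeout error",
    90: "Nonspecific startup error (check logs)",
}


def service_exit_code(sc_query_output):
    """Extract the SERVICE_EXIT_CODE value from sc query output, or None."""
    match = re.search(r"SERVICE_EXIT_CODE\s*:\s*(\d+)", sc_query_output)
    return int(match.group(1)) if match else None


def describe_startup_exit(code):
    """Return the documented meaning of a startup exit code."""
    return STARTUP_EXIT_CODES.get(code, "Undocumented startup exit code %d" % code)


# Illustrative fragment only; real sc query output contains more fields.
sample_output = "        SERVICE_EXIT_CODE  : 65  (0x41)\n"
code = service_exit_code(sample_output)
print(code, describe_startup_exit(code))
```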
Unix error code 45
Typically, the Integration Agent is started by a non-privileged user. If the Integration Agent daemon is started using a root account and the hosting machine is later restarted for any reason, the service might not start after boot and might return error code 45. This happens because the root account owns some of the required folders and denies access to the non-privileged user.
To resolve this issue:
- Log in as the root account and navigate to the <IAHOME> folder.
- Run the following commands:
chown -Rf xm:xm .mule
chown -Rf xm:xm .activemq-data
chown -Rf xm:xm log/IntegrationAgent.txt
- Log out of the root account.
- Restart the Integration Agent daemon as a non-privileged user.
SSL certificates error
When starting up the Integration Agent, you might encounter an error if the SSL certificates were not imported properly.
The error message is similar to the following:
[Heartbeat-1] ERROR - The xMatters Web Server
https://<HOST>.xmatters.com/api/services/AlarmPointWebService is unavailable or completely rejected the heartbeat.
org.apache.axis2.AxisFault: sun.security.validator.ValidatorException:
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable
to find valid certification path to requested target
To resolve this error:
- Stop the Integration Agent.
- Ensure that the required SSL certificates have been imported into <IAHOME>\jre\lib\security\cacerts.
- For additional instructions or troubleshooting tips, contact Support.
- Restart the Integration Agent.
For more information about SSL troubleshooting for xMatters integrations, visit our support site.
Linux process issues
In some cases, a user may be unable to start the Integration Agent due to an “AlarmPoint_Integration_Agent is already running” exception when the Integration Agent is not actually running. This is because the Integration Agent may not have been stopped properly (e.g., due to a power failure).
To verify that the Integration Agent is not running, search for a process with the argument containing the keyword “wrapper” or “java” that refers to the Integration Agent’s install directory.
On some Linux systems, you can search for this process by reviewing the output of the following commands:
ps -ef | grep wrapper
ps -ef | grep java
If such processes exist, then the Integration Agent is running and must be stopped before it is restarted.
If no such process exists, then deleting the file located at <IAHOME>/lib/mule-1.4.3/bin/.apia.pid should allow the Integration Agent to start.
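The check described above can be scripted. The following Python sketch is a rough illustration, not a supported tool: it filters ps -ef output for lines that mention wrapper or java together with the install directory. The function names and the sample ps lines are made up for this example, and the plain substring match on the install directory can produce false positives when another path starts with the same text.

```python
def agent_processes(ps_output, ia_home):
    """Return ps lines that mention wrapper or java and the install directory."""
    matches = []
    for line in ps_output.splitlines():
        if ia_home in line and ("wrapper" in line or "java" in line):
            matches.append(line)
    return matches


def pid_file_is_stale(ps_output, ia_home):
    """True when no matching process exists, so the .apia.pid file is stale."""
    return not agent_processes(ps_output, ia_home)


# Made-up ps -ef lines for illustration; /opt/ia is a hypothetical <IAHOME>.
not_running = "root 1 0 0 09:00 ? 00:00:01 /sbin/init"
running = "xm 2201 1 0 10:02 ? 00:00:41 /opt/ia/jre/bin/java -cp /opt/ia/lib"

print(pid_file_is_stale(not_running, "/opt/ia"))
print(pid_file_is_stale(running, "/opt/ia"))
```

Delete the PID file only when the check reports that no matching process exists; otherwise stop the running agent first.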
Integration service request issues
There are two main types of integration service issues that can occur: an integration service request receives no SOAP response, or an integration service request receives a SOAP error response.
Integration service request receives no SOAP response
Usually this type of issue indicates one of the following:
- Improperly installed integration service: For example, caused by a problem with the Integration Agent or integration service configuration
- Integration service addressing problem: Integration Agent is reachable, but the URL used to specify the Web Service Gateway is invalid (e.g., incorrectly named integration service)
- Network failure: Integration Agent unreachable from the client
To troubleshoot this issue:
- Make sure the Integration Agent has started, and then issue a get-status command using IAdmin.
- Verify that the integration service appears in the list of services and is ACTIVE.
- If the service does not appear in the list or has an ERROR_ACTIVE or ERROR_INACTIVE status, check the Integration Agent log to determine the nature of the configuration problem.
- Attempt to reach an integration service by opening a web browser and typing the URL of the service followed by ?wsdl in the address bar.
For example, if an integration service has the following configuration:
- Name: sample-plan
- Event Domain: applications
- IAConfig.xml: <service-gateway ssl="true" host="www.company.com" port="8081"/>
To test this service, type the following into a web browser’s address bar (preferably from a computer in the same location as the client):
https://www.company.com:8081/applications_sample-plan?wsdl
If the integration service is properly installed and accessible, then a response similar to the following is returned:
<?xml version="1.0" encoding="UTF-8" ?> <wsdl:definitions target...<snip>
If this response is received, the problem is likely related to an incorrectly configured Integration Agent client. If this response is not received, the problem is likely related to a connectivity issue (e.g., connection prevented by a firewall) between the Integration Agent client and the Integration Agent.
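The test URL in the example above follows a visible pattern: the scheme and address come from the service-gateway element, followed by the event domain and service name joined by an underscore. The following Python sketch reproduces that pattern. The function name wsdl_url is hypothetical, and using http when ssl is false is an assumption, because the source shows only the ssl="true" case.

```python
def wsdl_url(ssl, host, port, event_domain, service_name):
    """Build the WSDL test URL for an integration service."""
    # Assumption: a gateway configured with ssl="false" is served over http.
    scheme = "https" if ssl else "http"
    return "%s://%s:%d/%s_%s?wsdl" % (
        scheme, host, port, event_domain, service_name)


# Values taken from the example configuration above.
print(wsdl_url(True, "www.company.com", 8081, "applications", "sample-plan"))
```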
Integration service receives SOAP error response
This indicates that the integration service is able to receive requests, but not process them.
To troubleshoot this issue:
- Make sure the Integration Agent has started, and then issue a get-status command using IAdmin.
- Verify that the integration service appears in the list of services and is ACTIVE.
- If the service is in any other state, then the expected behavior is for the service to deny the request and respond with a SOAP error indicating that the service is not able to process the request.
- In the Integration Agent’s log configuration file, set (or add) a DEBUG category for the specific integration service. For example, for an integration service named sample-plan, the logl4j-bridge.xml entry would be similar to the following:
<logger name="com.alarmpoint.integrationagent.services.applications.sample-plan">
<level value="DEBUG"/>
</logger>
- Save the log configuration file and wait at least 10 seconds to allow the change to be detected.
- Submit a new integration service request, and then review the log file to determine the nature of the problem.
Heartbeat issues
Usually this type of issue indicates one of the following:
- Incorrect Integration Agent configuration: For example, the xMatters web server URL is improperly specified.
- Connectivity issue between xMatters web server and the Integration Agent.
To troubleshoot this issue:
- Make sure the Integration Agent has started, and then issue a display-settings command using IAdmin.
- Verify that the xMatters web server URL appears in the list of server URLs.
- If the URL does not appear in the list, then this indicates a problem in the IAConfig.xml file (e.g., the xMatters web server’s URL is malformed or not specified).
- In the Integration Agent’s log configuration file, set the heartbeat category to DEBUG; for example:
<logger name="com.alarmpoint.integrationagent.heartbeat">
<level value="DEBUG"/>
</logger>
- Restart the Integration Agent and wait for an attempt to send a heartbeat to the xMatters web server that is rejecting the heartbeats.
- Consult the Integration Agent’s log file and locate the ERROR entry associated with the heartbeat attempt. The ERROR message either indicates a connectivity failure or provides one of the following reasons for the heartbeat’s rejection:
- UNKNOWN_DOMAIN: indicates that the integration service's event domain has not been configured on the xMatters web server
- UNKNOWN_SERVICE: indicates that one of the integration service names has not been configured on the xMatters web server
- REGISTRATION_ACL_FAILED: indicates that this Integration Agent’s ID has not been configured on the xMatters web server
- UNKNOWN_APPLICATION_ERROR: indicates that an unexpected error occurred
- SERVICE_DENIED: indicates that the web services user account in xMatters lacks either the "Receive APXML" and "Send APXML" permissions (the error message contains a reference to "ReceiveAPXML") or the "Register Integration Agent" permission (the error message references a rejected heartbeat/registration)
Integration service configuration issues
An integration service with a runtime state of ERROR has a problem with its configuration file. The most likely cause is a syntax error in the integration service's JavaScript.
- Review the Integration Agent log file.
- Locate entries pertaining to parsing the Integration Agent configuration.
- Within these entries, locate log entries that refer to parsing the integration service configuration file causing the issue. One of these entries contains an error and stack trace identifying the problem.
Example
The following log excerpt shows the context in which integration service configuration errors appear within the Integration Agent log file (the excerpt has been edited for brevity).
The first three log entries show that the integration service configuration is being parsed. The final entry shows that the integration service’s JavaScript contains a syntax error.
2007-11-06 17:28:59,567 [WrapperSimpleAppMain] INFO - Starting to initialize Integration Agent using IA Config file
C:\sandbox\INTEGR~1\distr\INSTAL~1\conf\IAConfig.xml.
2007-11-06 17:28:59,598 [WrapperSimpleAppMain] INFO - Starting to parse the Integration Service Config files in directory ../integrationservices.
2007-11-06 17:28:59,598 [WrapperSimpleAppMain] INFO - Parsing Integration Service Config file netcool/netcool-admin.xml.
2007-11-06 17:28:59,614 [WrapperSimpleAppMain] ERROR - The script for Integration Service (default,test_service_1) could not be created due to an exception.
ScriptCreationException: The script for Integration Service (default,test_service_1) could not be created due to an exception.
Caused by: org.mozilla.javascript.EvaluatorException: missing } after function body
(C:\sandbox\integrationagent\distr\installation\integrationservices\netcool\netcool-admin.js#71)
Notification delay and backlog issues
There's a delay between notification requests being sent to the Integration Agent and delivery of notifications from xMatters. A secondary symptom is a backlog of notification requests in the Integration Agent's inbound queues.
The iadmin get-status command reports a significant number of requests under the integration service's "Normal Priority inbound APXML queue" and/or the "High Priority inbound APXML queue" headings, and these numbers do not decrease substantially when the command is repeated. New notification requests will not reach xMatters until this backlog has been cleared.
- Increase the number of messages logged by the Integration Agent's queue manager and by the integration scripts, so that the cause of the delay can be identified:
- In <IAHOME>/conf/logl4j-bridge.xml, uncomment the "com.alarmpoint.integrationagent.jms" logger and ensure it is set to DEBUG.
- Ensure that the "com.alarmpoint.integrationagent.services" logger is uncommented and set to DEBUG.
- In the "txtAppender" section, ensure that the "maxBackupIndex" value is 10 and the "MaxFileSize" value is 10000KB.
- Save the file.
- Purge the Integration Agent queue, to allow new notification requests to be processed without delay:
- Stop the Integration Agent (<IAHOME>\bin\shutdown).
- Delete the .mule and .activemq-data folders from <IAHOME>.
- Start the Integration Agent.
- Inspect the messages in the Integration Agent log file (<IAHOME>\log\IntegrationAgent.txt) to verify that new notification requests are processed by the Integration Agent without timeouts or other errors. Use the xMatters web user interface to check that xMatters alerts are created without undue delay.
- Use the iadmin get-status command to monitor the queue sizes.
If delays are still observed or queue sizes increase and do not drop again, use the support-zip tool to create an archive and contact Customer Support:
- Open a command prompt or terminal window.
- Navigate to <IAHOME>/tools.
- Type or paste the command: support-zip. The tool will tell you the name and location of the archive that it creates.
- Open a support ticket and attach the support-zip archive.
Password utility issues
If you attempt to run the IAPassword utility and receive a "The system cannot find the specified path" error, install the AdoptOpenJDK archive on your system.