VMs / EC2 Service Unavailable

A service running on the VM/EC2 instance is not responding.

Understanding Prometheus and Its Purpose

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It collects metrics by scraping targets over HTTP (a pull model), stores them in a time-series database, and supports flexible queries and real-time alerting. It is widely used to monitor cloud environments, including VMs and EC2 instances.

Symptom: Service Unavailable

The Prometheus alert 'Service Unavailable' indicates that a service running on your VM or EC2 instance is not responding as expected. Treat it as high priority, because an unresponsive service directly affects the availability and performance of your applications.

Details About the Alert

When Prometheus triggers a 'Service Unavailable' alert, the service is either down or not reachable. Possible reasons include the service crashing, network issues, or resource exhaustion on the VM/EC2 instance. The alert is typically generated when Prometheus fails to receive a response from the service within the configured timeout period.
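If the alert is driven by Prometheus's built-in up metric (an assumption; the rule behind 'Service Unavailable' is not shown here), the affected targets can be listed through the Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at http://localhost:9090 and using a hypothetical helper name, down_targets:

```shell
# Assumption: Prometheus listens here; override PROM_URL for your setup.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: list targets whose last scrape failed (up == 0)
# using the instant-query endpoint of the Prometheus HTTP API.
down_targets() {
    curl -sG --max-time 10 \
        --data-urlencode 'query=up == 0' \
        "$PROM_URL/api/v1/query"
}
```

Calling down_targets prints a JSON document; each entry under data.result carries the instance and job labels of a target that Prometheus could not scrape.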

Common Causes

  • Service crash or failure
  • Network connectivity issues
  • Resource exhaustion (CPU, memory, disk)

Impact of the Alert

The outage behind this alert can mean downtime for users relying on the service, potentially affecting business operations and user satisfaction. Address the alert promptly to restore service availability.

Steps to Fix the Alert

1. Check Service Status

First, verify the status of the service on your VM/EC2 instance. You can use the following command to check if the service is running:

systemctl status <service-name>

If the service is not running, attempt to start it:

sudo systemctl start <service-name>
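The check and the start can be combined so that the service is only started when it is actually inactive. The following is a sketch, not a definitive procedure; ensure_running is a hypothetical helper name, and it assumes the service is a systemd unit and that you may use sudo:

```shell
# Hypothetical helper: start a systemd unit only if it is not already active.
# Usage: ensure_running <service-name>
ensure_running() {
    unit="$1"
    # is-active --quiet exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is already active"
        return 0
    fi
    echo "$unit is not active; starting it"
    sudo systemctl start "$unit" || return 1
    # Re-check, because a start request can succeed and the unit still exit.
    if systemctl is-active --quiet "$unit"; then
        echo "$unit started"
    else
        echo "$unit failed to stay active" >&2
        return 1
    fi
}
```

If the function reports that the unit failed to stay active, the service is likely crashing on startup, and the logs in the next step are the place to look.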

2. Review Service Logs

Examine the service logs to identify any errors or issues that may have caused the service to become unavailable. Logs are typically located in /var/log/<service-name>/ or can be accessed using:

journalctl -u <service-name>
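On a busy service the full journal can be long. A narrower query that keeps only error-level entries from a recent window is often quicker to read. This sketch uses a hypothetical helper name, recent_errors, and a default window of one hour chosen only for illustration:

```shell
# Hypothetical helper: show error-level (and more severe) journal entries
# for a unit within a recent time window.
# Usage: recent_errors <service-name> [since]
#   e.g. recent_errors <service-name> "30 min ago"
recent_errors() {
    unit="$1"
    since="${2:-1 hour ago}"   # assumed default window
    journalctl -u "$unit" --since "$since" -p err --no-pager
}
```

Widening the window or dropping -p err brings back the surrounding context once the first error has been located.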

3. Check Resource Utilization

Ensure that your VM/EC2 instance has sufficient resources. Use the following commands to check CPU and memory usage interactively; disk usage, which neither tool reports, can be checked with df -h:

top

or

htop

If resources are exhausted, consider scaling your instance or optimizing your service.
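For a non-interactive check, for example over SSH or in a script, disk and memory headroom can be read directly. The sketch below assumes a Linux host; the helper names and the 90 percent threshold are illustrative choices, not values from this guide:

```shell
# Hypothetical helper: print mount points at or above a usage threshold.
# Reads `df -P` output on stdin. Mount points containing spaces are not handled.
# Usage: df -P | full_filesystems 90
full_filesystems() {
    threshold="${1:-90}"   # assumed default threshold, in percent
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= t + 0) print $6, $5
    }'
}

# Hypothetical helper: print available memory as a whole-number percentage
# of total memory, read from /proc/meminfo (or a file given as $1).
mem_available_pct() {
    awk '/^MemTotal:/ {total = $2} /^MemAvailable:/ {avail = $2}
         END {if (total > 0) printf "%d\n", avail * 100 / total}' \
        "${1:-/proc/meminfo}"
}
```

Running df -P | full_filesystems 90 prints nothing when every filesystem is below the threshold, which makes it easy to use in a scripted health check.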

4. Network Connectivity

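Verify network connectivity to ensure there are no issues with the network interface or firewall rules. Use ping to test that the host is reachable and curl to test that the service port answers. On EC2, security groups block inbound ICMP unless a rule allows it, so a failed ping alone is not conclusive:

ping <service-host>
curl http://<service-host>:<port>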
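To tell "no response at all" apart from "the service answered with an error", it helps to look only at the HTTP status code. A minimal sketch, assuming the service speaks HTTP; http_status and diagnose_status are hypothetical helper names, and the 5-second timeout is an arbitrary choice:

```shell
# Hypothetical helper: print only the HTTP status code for a URL.
# curl prints 000 when no HTTP response was received.
# Usage: http_status http://<service-host>:<port>
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Hypothetical helper: map a status code to a rough diagnosis.
diagnose_status() {
    case "$1" in
        000) echo "no response: service down, port closed, or blocked by a firewall" ;;
        2??|3??) echo "service reachable and responding" ;;
        5??) echo "service reachable but failing: check its logs" ;;
        *) echo "service reachable, returned HTTP $1" ;;
    esac
}
```

A typical call is diagnose_status "$(http_status http://<service-host>:<port>)". A 000 result points back to the service status and firewall checks, while a 5xx result points to the service logs.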

Additional Resources

For more information on managing services on Linux, visit the systemd documentation. To learn more about Prometheus alerts, check the Prometheus Alerting documentation.

Deep Sea Tech Inc. — Made with ❤️ in Bangalore & San Francisco 🏢

Doctor Droid