VMs / EC2 Service Unavailable

A service running on the VM/EC2 instance is not responding.

Understanding Prometheus and Its Purpose

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It stores metrics as time series, collects them by periodically scraping HTTP endpoints on the monitored targets (a pull model), and supports flexible queries and real-time alerting. It is widely used to monitor cloud environments, including VMs and EC2 instances, so that problems with the services running on them are detected quickly.
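To see what a single pull looks like from the outside, the following bash sketch fetches a target's metrics endpoint with a time limit, roughly what Prometheus does on each scrape. The URL is a placeholder you must supply, /metrics is only Prometheus's default metrics path, and the 10-second limit mirrors Prometheus's default scrape timeout. The function name manual_scrape is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate one Prometheus scrape of a target.
# Prints 1 if the endpoint answered with a successful HTTP status within the
# time limit (roughly like up == 1), and 0 otherwise (roughly like up == 0).
manual_scrape() {
  local url="$1"           # e.g. http://<service-host>:<port>/metrics (placeholder)
  local limit="${2:-10}"   # seconds; 10s is Prometheus's default scrape_timeout
  # -f: treat HTTP 4xx/5xx as failures; -sS: quiet, but keep error messages
  if curl -fsS --max-time "$limit" -o /dev/null "$url" 2>/dev/null; then
    echo 1
  else
    echo 0
  fi
}
```

For example, if manual_scrape http://<service-host>:<port>/metrics prints 0 when run from the Prometheus host, Prometheus is probably failing to scrape the target for the same reason.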

Symptom: Service Unavailable

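A 'Service Unavailable' alert indicates that a service running on your VM or EC2 instance is not responding as expected. Prometheus has no built-in alerts, so this one comes from an alerting rule in your own configuration. Treat it as urgent: an unresponsive service directly affects the availability and performance of the applications that depend on it.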

Details About the Alert

When Prometheus fires a 'Service Unavailable' alert, the service is either down or unreachable. Typical reasons are a crashed service, network problems, or resource exhaustion on the VM/EC2 instance. The alert is usually driven by failed scrapes: when Prometheus gets no successful response from the service's endpoint within the scrape timeout, it sets that target's up metric to 0, and an alerting rule watching that metric fires once the condition has lasted for the rule's configured duration, if one is set.
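Because alerting rules are defined by you, the exact expression behind this alert depends on your configuration. As an assumption, a rule of this kind often fires on up == 0. The bash sketch below writes such a rule to a file and, if promtool (shipped with Prometheus) is on the PATH, validates it. The alert name ServiceUnavailable, the job label my-service, the 1m duration, and the function name are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an example "service unavailable" alerting rule
# to the given file, then validate it with promtool when it is available.
write_service_alert_rule() {
  local rule_file="$1"
  # Quoted 'EOF' stops the shell from expanding {{ $labels.instance }}
  cat > "$rule_file" <<'EOF'
groups:
  - name: service-availability
    rules:
      - alert: ServiceUnavailable
        # up is 0 when Prometheus could not scrape the target
        expr: up{job="my-service"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} did not respond to Prometheus scrapes"
EOF
  # promtool ships with Prometheus; skip validation if it is not installed
  if command -v promtool >/dev/null 2>&1; then
    promtool check rules "$rule_file"
  fi
}
```

For example, write_service_alert_rule service-alerts.yml, then list that file under rule_files in prometheus.yml and adjust the job label to match your scrape configuration.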

Common Causes

  • Service crash or failure
  • Network connectivity issues
  • Resource exhaustion (CPU, memory, disk)
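A service that simply disappears may have been killed by the Linux kernel's out-of-memory (OOM) killer, which links the crash and resource-exhaustion causes. As a sketch, the bash filter below extracts killed process names and PIDs from kernel log lines. It assumes the kernel messages contain "Killed process <pid> (<name>)", and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads kernel log lines on stdin and prints
# "<name> (pid <pid>)" for each process the OOM killer terminated.
oom_killed_processes() {
  # -o keeps only the matched part, e.g. "Killed process 4242 (java)"
  grep -oE 'Killed process [0-9]+ \([^)]+\)' |
    sed -E 's/Killed process ([0-9]+) \(([^)]+)\)/\2 (pid \1)/'
}
```

Usage: sudo journalctl -k --since "1 hour ago" --no-pager | oom_killed_processes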

Impact of the Alert

This alert can lead to downtime for users relying on the service, potentially affecting business operations and user satisfaction. It is important to address this alert promptly to restore service availability.

Steps to Fix the Alert

1. Check Service Status

First, verify the status of the service on your VM/EC2 instance. You can use the following command to check if the service is running:

systemctl status <service-name>

If the service is not running, attempt to start it:

sudo systemctl start <service-name>
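The two commands can be combined into a small bash function that starts the unit only when systemctl is-active reports it is not running, then checks again. The function name ensure_service_running is hypothetical; it assumes a systemd-managed service and sudo rights.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a systemd unit only if it is not already active.
ensure_service_running() {
  local svc="$1"
  # is-active --quiet prints nothing; exit status 0 means the unit is active
  if systemctl is-active --quiet "$svc"; then
    echo "$svc is active"
    return 0
  fi
  echo "$svc is not active; trying to start it" >&2
  sudo systemctl start "$svc"
  # Check again: a unit that fails during startup is caught here, but one
  # that crashes a few seconds later is not caught by this single check
  if systemctl is-active --quiet "$svc"; then
    echo "$svc started"
  else
    echo "$svc is still not active; see: journalctl -u $svc -n 50" >&2
    return 1
  fi
}
```

Usage: ensure_service_running <service-name>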

2. Review Service Logs

Examine the service logs for errors that explain why the service became unavailable. Depending on the service, logs are often written under /var/log/<service-name>/; for services managed by systemd, they can also be read from the journal:

journalctl -u <service-name>
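To narrow the output to likely causes, journalctl can filter by priority and time, and a plain grep covers services that write to their own files. In this bash sketch the function names, the default one-hour window, and the error keywords are assumptions; /var/log/<service-name>/ is only the conventional location mentioned above.

```shell
#!/usr/bin/env bash
# Journal entries at priority "err" or more severe for one unit,
# limited to a recent time window (default: the last hour)
recent_service_errors() {
  local svc="$1" since="${2:-1 hour ago}"
  journalctl -u "$svc" -p err --since "$since" --no-pager
}

# Last N lines (default 20) mentioning common failure keywords in a log directory;
# -r recurses, -h omits file names, -i ignores case, -E enables alternation
scan_log_dir() {
  local dir="$1" lines="${2:-20}"
  grep -rhiE 'error|fatal|panic|out of memory' "$dir" 2>/dev/null | tail -n "$lines"
}
```

Usage: recent_service_errors <service-name> or scan_log_dir /var/log/<service-name>/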

3. Check Resource Utilization

Ensure that your VM/EC2 instance has sufficient resources. Use the following commands to check CPU and memory usage:

top

or, if it is installed:

htop

If CPU, memory, or disk space is exhausted, consider moving to a larger instance type or reducing the service's resource usage.
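Since disk exhaustion is one of the common causes and top does not show it, a quick check of disk usage and available memory can be scripted too. This bash sketch reads df -P and /proc/meminfo (Linux-specific); the function names and the 90% default limit are assumptions. Interactively, df -h and free -m give the same information.

```shell
#!/usr/bin/env bash
# Percentage of space used on the filesystem holding a path (default /)
disk_used_pct() {
  # -P forces POSIX output: one line per filesystem, "Use%" in column 5
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Available memory in MiB, as estimated by the kernel (Linux 3.14 and later)
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Print usage, or warn on stderr and return 1 when usage reaches the limit
check_disk() {
  local path="${1:-/}" limit="${2:-90}" used
  used=$(disk_used_pct "$path")
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: filesystem holding $path is ${used}% full (limit ${limit}%)" >&2
    return 1
  fi
  echo "filesystem holding $path is ${used}% full"
}
```

Usage: check_disk / 90 && echo "Available memory: $(mem_available_mib) MiB"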

4. Network Connectivity

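Verify that the service is reachable over the network and that no network interface, firewall, or security group rule is blocking it. Use ping to test basic reachability and curl to test the service port itself. Security groups often block ICMP, so a failed ping does not by itself prove the host is down: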

ping <service-host>
curl http://<service-host>:<port>
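If ping gets no reply, that may only mean ICMP is blocked; a TCP connection attempt to the service port is a more direct test. This bash sketch uses bash's /dev/tcp redirection (available in most bash builds) with the coreutils timeout command, and curl to report the HTTP status code. The function names, the 3- and 5-second limits, and the default path / are assumptions.

```shell
#!/usr/bin/env bash
# Returns 0 if a TCP connection to host:port succeeds within 3 seconds
port_open() {
  local host="$1" port="$2"
  # /dev/tcp/HOST/PORT is a bash feature, not a real file; the inner bash
  # exits non-zero if the connection is refused or the timeout expires
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Prints the HTTP status code returned by the service ("000" if no response)
http_status() {
  local host="$1" port="$2" path="${3:-/}"
  curl -sS -o /dev/null -w '%{http_code}' --max-time 5 "http://$host:$port$path"
}
```

Usage: port_open <service-host> <port> && http_status <service-host> <port>. An open port with a 5xx status points at the service itself rather than the network.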

Additional Resources

For more information on managing services on Linux, visit the systemd documentation. To learn more about Prometheus alerts, check the Prometheus Alerting documentation.

Deep Sea Tech Inc. — Made with ❤️ in Bangalore & San Francisco 🏢

Doctor Droid