The OpenTelemetry Collector is a vendor-agnostic service that collects, processes, and exports telemetry data (metrics, logs, and traces) to various backends. It is designed to handle large volumes of telemetry data efficiently, providing a centralized point for data collection and processing.
When using the OpenTelemetry Collector, you might find that logs are being dropped. This typically happens when the volume of incoming logs exceeds the collector's processing capacity, leading to data loss.
The primary reason for log dropping in high-volume scenarios is that the collector's capacity is insufficient to handle the incoming data rate. This can be due to inadequate resource allocation, inefficient pipeline configuration, or lack of log sampling mechanisms.
Resource constraints such as CPU, memory, or network bandwidth can limit the collector's ability to process logs efficiently.
An improperly configured processing pipeline can lead to bottlenecks, causing logs to be dropped.
To handle higher log volumes, consider scaling up the resources allocated to the OpenTelemetry Collector. This can be achieved by increasing CPU and memory limits in your deployment configuration. For example, in a Kubernetes environment, you can modify the resource requests and limits in your Deployment or StatefulSet:
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1"
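Review and optimize the processing pipeline to ensure efficient handling of logs. This includes batching log records before export, placing a memory limiter early in the pipeline, and removing processors the pipeline does not need.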
Refer to the OpenTelemetry Collector Configuration documentation for detailed guidance on configuring pipelines.
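The pipeline advice above can be sketched as a collector configuration. This is a minimal sketch rather than a tuned setup: it assumes logs arrive over OTLP and are forwarded to an OTLP backend at the placeholder address backend.example.com:4317, and every numeric value is an illustrative starting point to adjust against your own load. The memory_limiter processor comes first so the collector can refuse data before it runs out of memory, batch groups records to reduce export overhead, and the exporter's sending queue absorbs short bursts.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Refuse incoming data when memory is nearly exhausted instead of crashing.
  memory_limiter:
    check_interval: 1s
    limit_mib: 800          # assumption: sized for a 1Gi container limit
    spike_limit_mib: 200
  # Group log records into larger export requests.
  batch:
    send_batch_size: 8192
    timeout: 5s

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder backend address
    sending_queue:
      enabled: true
      queue_size: 5000      # illustrative value, tune to burst size
    retry_on_failure:
      enabled: true

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```

Keeping limit_mib below the container's memory limit leaves headroom for the Go runtime, so the limiter can act before Kubernetes kills the pod.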
Implement log sampling to reduce the volume of logs processed by the collector. A sampling processor in the logs pipeline forwards only a subset of log records, which reduces the load on downstream processors and exporters. Here's an example using the probabilistic_sampler processor, which is available in the Collector contrib distribution and supports logs pipelines:
processors:
  probabilistic_sampler:
    # Keep approximately 10% of log records.
    sampling_percentage: 10
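A sampling processor only takes effect once it is listed in the logs pipeline. The fragment below is a sketch of that wiring, assuming the probabilistic_sampler processor from the contrib distribution and assuming that an otlp receiver, an otlp exporter, and the memory_limiter and batch processors are defined elsewhere in the same configuration file.

```yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      # Sample before batching so dropped records are never batched or exported.
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp]
```

Processors run in the order listed, so placing the sampler ahead of batch keeps discarded records from consuming batching and export capacity.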
Continuously monitor the performance of your OpenTelemetry Collector using metrics and logs. Adjust the configuration as needed to ensure optimal performance. Utilize tools like Prometheus for monitoring and Grafana for visualization.
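As one way to set this up, Prometheus can scrape the collector's own metrics. This is a minimal sketch, assuming the collector exposes its internal metrics on the default port 8888 and is reachable under the hypothetical hostname otel-collector:

```yaml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 15s
    static_configs:
      # Hypothetical hostname; 8888 is the collector's default internal metrics port.
      - targets: ["otel-collector:8888"]
```

Metrics worth graphing in Grafana include refused and failed log record counts, such as otelcol_receiver_refused_log_records and otelcol_exporter_send_failed_log_records, along with otelcol_exporter_queue_size. Exact metric names can differ between collector versions, so check the names your deployment actually exposes.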
By increasing the collector's capacity, optimizing the processing pipeline, and implementing log sampling, you can effectively address high-volume log dropping in the OpenTelemetry Collector. Regular monitoring and adjustments will help maintain the efficiency and reliability of your telemetry data collection system.