Decode Developer Lingo!

Ever stumbled upon phrases like 'API', 'latency', or 'Kafka', and felt like you've entered a sci-fi flick? Fret not! We've got you covered.

ACID

ACID is a standard compliance requirement in the database ecosystem. It represents the following four principles: Atomicity, Consistency, Isolation, a...

Airflow

Airflow is an open-source platform used for orchestrating complex workflows and data pipelines. It allows users to define, schedule, and monitor workf...

Alert Fatigue

Alert fatigue in software engineering is a state of exhaustion that occurs when a large number of alerts makes the individuals responsible for address...

Anomaly Detection

Anomaly detection is the process of identifying unexpected behaviours within a dataset by leveraging Machine Learning and statistical analysis....

Anonymization

Anonymization is the process of hiding sensitive information from log entries to protect privacy. It involves removing or altering data in a way that ...
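As a rough illustration of the idea, here is a minimal Python sketch that masks email addresses and IPv4 addresses in a log line. The regular expressions, the placeholder strings, and the `anonymize` helper are assumptions made for this example; real log formats will need their own patterns.

```python
import re

# Hypothetical patterns chosen for this example; tune them to your own logs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize(line: str) -> str:
    """Replace email addresses and IPv4 addresses with fixed placeholders."""
    line = EMAIL_RE.sub("<email>", line)
    return IPV4_RE.sub("<ip>", line)


masked = anonymize("login failed for jane@example.com from 10.0.0.7")
print(masked)
```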

Application Performance Monitoring

APM is a popular category within the realm of software monitoring and observability....

BigQuery

BigQuery is a cloud-based data warehouse and data analytics platform from Google Cloud. While it is a data platform similar to Snowflake or Redshift, o...

Custom metric

Custom metrics are specific performance indicators that teams define to monitor aspects of their applications or businesses that are unique to their n...

Data Archival

Data archival is the process of storing data in a structured manner for long-term retention, typically for compliance, reference, or backup purposes. ...

Data lake

A data lake is a large, unstructured pool of data where there is no need to structure the data upfront. Data lakes are effective for storing massive a...

Data Lakehouse

A data lakehouse is a relatively modern approach to managing and processing data. It combines the best features of data warehouses and data lakes by i...

Data Mart

A data mart is a subset of a data warehouse that contains only the information needed for a specific business requirement....

Data Retention

Data retention refers to the policies governing the management of data. A data retention policy consists of rules that determine the duration for whic...

Data warehouse

A data warehouse is a system commonly used to manage, store and analyse data at scale. In a data warehouse, there is a centralised repository of data,...

Distributed Tracing

Distributed tracing is a monitoring process that tracks the progress of application requests as they traverse through distributed systems or microserv...

ELT

ELT is a practice of data management in data warehousing where data is stored before being processed/transformed for usage. ELT stands for Extract, Lo...

Error budget

An error budget is a concept for defining acceptable reliability limits and managing its variance. It represents the acceptable amount of errors or d...
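One common way to make this concrete is to express the budget as allowed downtime for a given availability target. The sketch below assumes a time-based budget; the function names, the 99.9% target, and the 30-day window are illustrative choices, not values from the definition.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of downtime allowed in the window for an availability target."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, window_days: int, downtime_minutes: float) -> float:
    """Budget left after subtracting downtime already incurred (may go negative)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Assumed example: a 99.9% target over 30 days, with 10 minutes already spent.
print(round(error_budget_minutes(0.999, 30), 1))
print(round(budget_remaining(0.999, 30, 10.0), 1))
```

A negative remaining budget would mean the target has already been missed for that window.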

Error rate

Error rate is the frequency of errors that occur within an API or an application. It provides insights about the reliability and quality of the applic...
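In its simplest form, this is the number of failed requests divided by the total number of requests in a period. A minimal sketch, assuming you already have both counts (the `error_rate` helper is hypothetical):

```python
def error_rate(error_count: int, total_count: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if total_count == 0:
        return 0.0
    return error_count / total_count


# Assumed example: 5 failed requests out of 200.
print(f"{error_rate(5, 200):.1%}")
```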

ETL

ETL is a method of data management in data warehousing. It stands for Extract, Transform, and Load....

Incident Response

Incident response is a structured approach to handling and managing the operations after an incident or some disruption to the normal functioning of t...

Ingestion

Data ingestion refers to the process of collecting and importing data from various sources into a system or storage infrastructure for further process...

Instrumentation

Instrumentation is a crucial aspect of system monitoring and optimization. It involves equipping systems with the necessary tools and capabilities to ...

Jenkins

Jenkins is a Java-based open-source automation server renowned for its vast plugin ecosystem, enabling continuous integration and delivery (CI/CD). It...

Kafka

Apache Kafka is an open-source distributed event streaming platform, originally developed at LinkedIn and open-sourced in early 2011. It's designed fo...

Latency

Latency refers to the delay or waiting time (time lag) between when a request or operation is initiated within an application and when a response...
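A minimal sketch of measuring this delay around a single call, using Python's `time.perf_counter`. The `timed` wrapper is a hypothetical helper for illustration; production systems usually record latency through their metrics or tracing library instead.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Assumed example: time a cheap built-in call.
total, latency_ms = timed(sum, [1, 2, 3])
print(total, f"{latency_ms:.3f} ms")
```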

Log Enrichment

Log enrichment adds more details to the existing logs. Enriched logs provide a more comprehensive view of events....
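For instance, a service might attach deployment context to every log record before shipping it. The sketch below assumes logs are plain dictionaries; the field names (`service`, `region`) and the `enrich` helper are made up for illustration.

```python
def enrich(record: dict, context: dict) -> dict:
    """Return a new record with context fields added; original fields win on conflict."""
    enriched = dict(context)
    enriched.update(record)
    return enriched


# Assumed example context and log record.
context = {"service": "checkout", "region": "eu-west-1"}
record = {"level": "error", "message": "payment declined"}
print(enrich(record, context))
```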

Log parsing

Log parsing is a process of extracting the log data that is generated by systems or applications, and finding valuable insights from it, which involve...

Log Rotation

Log rotation is a method used to manage the size of log files on servers. When a log file reaches a specified limit, usually based on its size or numb...
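Python's standard library offers one way to do size-based rotation through `logging.handlers.RotatingFileHandler`. The sketch below uses a deliberately tiny `maxBytes` so rotation is visible after a few dozen messages; the file name, logger name, and limits are assumptions for the example.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def write_rotating_logs(directory: str, message_count: int) -> list[str]:
    """Write messages through a size-limited handler and list the resulting files."""
    path = os.path.join(directory, "app.log")
    # Rotate once the file would reach 200 bytes; keep at most 2 old files.
    handler = RotatingFileHandler(path, maxBytes=200, backupCount=2)
    logger = logging.getLogger("rotation_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        for i in range(message_count):
            logger.info("event number %d", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    print(write_rotating_logs(tmp, 50))
```

Older files beyond `backupCount` are deleted, which is what keeps total disk usage bounded.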

Logs

Logs are the records of what happened within the application that are either auto-generated or added by a software developer....

Metrics

Metrics are numerical values that quantify a certain behavioral aspect of your software. They are saved in time-series storage for viewing over a period o...

MTTR

MTTR, which stands for Mean Time to Repair, is a crucial metric in system management. It represents the average duration needed to diagnose, troublesh...
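As a minimal sketch, the average can be computed from pairs of (detected, resolved) timestamps. The `mttr` function and the sample incidents are hypothetical, and teams differ on exactly which timestamps mark the start and end of a repair.

```python
from datetime import datetime, timedelta


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean duration between detection and resolution across incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Assumed example: one 30-minute incident and one 90-minute incident.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 15, 30)),
]
print(mttr(incidents))
```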

Observability

Observability is an industry term for the practice of adding enough telemetry data (metadata) to an application such that an “unknown issue” (o...

OLAP

OLAP, which stands for Online Analytical Processing, is a type of analytical database. OLAP systems are designed for fast aggregation and analytics on data, hi...

On-call Rotation

On-call rotation is a practice commonly used in engineering/DevOps teams, where developers take turns being “on-call” during non-business hours, su...

OpenTelemetry

OpenTelemetry is an industry-wide standard created to define a common template for adding distributed traces (primarily) and metrics/logs into a...

Profiling

Profiling is the process of collecting performance data to analyze and make changes in the application. Its purpose is to improve the overall performa...

Redshift

Redshift is a cloud-based data warehouse solution from Amazon Web Services (AWS). It allows organizations to run analytics in the cloud at high speed....

Resource Utilization

Resource utilization refers to the management of the available tech resources in an engineering team, such as computing resources, human resources, te...

Sandbox

A sandbox environment is a controlled and isolated space to test a specific product....

SLA (Service Level Agreement)

An SLA, or Service Level Agreement, is the agreement that a service provider makes with its users about measurable metrics like uptime and responsiven...

SLI (Service Level Indicators)

SLIs, or Service Level Indicators, are the quantifiable metrics that are used to measure the performance or quality of a service or an applicatio...

SLOs (Service Level Objectives)

SLOs, or Service Level Objectives, are the parts of an SLA where promises are made on specific metrics. An SLO is made by the se...

Snowflake

Snowflake is a cloud-based data warehousing platform that provides fully managed and scalable solutions for storing and analysing data. It enables dat...

Streaming

Streaming is the process of sending or receiving data continuously, one record at a time, instead of storing it all in a single place first. This allows for r...

TCO (total cost of ownership)

Any project that is picked up involves much more than a tool. It involves people, process, and tooling: a one-time effort initially as well as a recurring effo...

Test Driven Development

Test Driven Development, or TDD, is a software development process in which all software requirements are broken down into test cases before building o...

Threat Detection

Threat detection is the process of identifying potential security threats or malicious activities within the organization’s resources. The primary goa...

Triaging

Triaging is the process of doing a preliminary analysis of issues to assess their urgency and impact on system reliability and user e...

Uptime

Uptime is a measure of the duration during which a service or application operates without any interruption or downtime. It is a direct indicator of t...
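Uptime is often reported as a percentage of a time window. A minimal sketch of that calculation, assuming you track downtime in minutes (the `uptime_percentage` helper and the sample numbers are illustrative):

```python
def uptime_percentage(window_minutes: float, downtime_minutes: float) -> float:
    """Share of the window during which the service was available, as a percentage."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return (window_minutes - downtime_minutes) / window_minutes * 100.0


# Assumed example: 43.2 minutes of downtime in a 30-day window (43,200 minutes).
print(round(uptime_percentage(43_200, 43.2), 3))
```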