Handling Log Volume and Index Management in Elasticsearch
Logs are a critical component in managing and debugging applications. However, as an application scales, log volumes can grow exponentially, straining storage resources and degrading query performance in Elasticsearch if not managed effectively. This is where strategic index management steps in.
By implementing proper index rotation, setting up Index Lifecycle Management (ILM), reducing log verbosity in production, and leveraging tools like Kafka or Filebeat for log aggregation, you can make your Elasticsearch environment efficient and scalable. This blog will walk you through these best practices, tailored for Spring Boot applications.
Table of Contents
- Why Manage Log Volume in Elasticsearch?
- Rotating Indices Daily or Weekly
- Setting Up Index Lifecycle Management (ILM)
- Reducing Log Verbosity in Production Environments
- Aggregating Logs via Kafka or Filebeat
- Summary
Why Manage Log Volume in Elasticsearch?
Elasticsearch is powerful, but poorly managed log volumes can disrupt performance. Here’s why effective log and index management is crucial:
- Optimized Search Performance: Large indices with overgrown logs can slow down search queries due to increased overhead.
- Cost Efficiency: Smaller, rotated indices with defined lifecycles prevent storage overuse, lowering infrastructure costs.
- Data Retention Compliance: Different environments (e.g., dev, prod) might need specific retention periods. Managing indices ensures you comply with data policies.
- Scalability: With effective log aggregation and reduced verbosity, Elasticsearch can scale to ingest even massive amounts of data from distributed microservices.
Managing your logs doesn't just improve performance; it also keeps day-to-day operations running smoothly.
Rotating Indices Daily or Weekly
Index rotation involves creating smaller time-based indices (e.g., daily or weekly) rather than a single monolithic one. This reduces the resource strain on Elasticsearch while improving query speeds for time-specific searches.
Step 1. Dynamic Index Creation in Spring Boot
To enable time-based rotation, configure indices dynamically using Spring Boot’s logging frameworks.
Example logback-spring.xml
Configuration for Daily Indices:
<configuration>
<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
<destination>localhost:5044</destination>
<encoder class="net.logstash.logback.encoder.LogstashEncoder">
<customFields>{"application":"my-app"}</customFields>
</encoder>
</appender>
<root level="INFO">
<appender-ref ref="LOGSTASH"/>
</root>
</configuration>
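The TCP appender above ships each log event as a JSON line, so Logstash needs a matching input in front of the output shown next. A minimal sketch, assuming Logstash listens on the same port 5044:
input {
  tcp {
    port => 5044
    codec => json_lines
  }
}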
Update the Logstash configuration to create dynamic daily indices:
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "spring-logs-%{+yyyy.MM.dd}"
}
}
With this setup, a new index is created each day (spring-logs-2025.06.13, spring-logs-2025.06.14, etc.).
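To confirm rotation is working, you can list the matching indices with the cat indices API (v adds column headers, s=index sorts by name):
GET _cat/indices/spring-logs-*?v&s=index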
Step 2. When to Choose Weekly vs. Daily Rotation
- Daily Rotation: Suitable for high-volume applications with frequent log activity (e.g., 1M+ logs/day). Smaller daily indices reduce resource overhead.
- Weekly Rotation: Ideal for moderate-volume applications. Weekly indices keep related logs together while reducing the number of indices.
Rotating indices also reduces noise during debugging: time-bounded queries, such as trace lookups, only need to touch the indices for the relevant period.
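If you opt for weekly rotation instead, only the index pattern in the Logstash output changes. A sketch using Joda-Time's week-based fields (xxxx is the week-year, ww the week number):
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "spring-logs-%{+xxxx.ww}"
  }
}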
Setting Up Index Lifecycle Management (ILM)
Index Lifecycle Management (ILM) automates index management tasks like hot-warm-cold tier transitions, archiving, and deletions, ensuring logs don’t overwhelm your cluster.
Step 1. Create ILM Policy in Elasticsearch
Define a lifecycle policy to manage log indices.
Example Policy for Retention and Deletion:
PUT _ilm/policy/log-retention-policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_age": "7d",
"max_size": "50gb"
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}
This policy:
- Rolls over indices after 7 days or 50GB, whichever comes first.
- Deletes indices older than 30 days.
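After creating the policy, you can read it back to verify it was stored as intended:
GET _ilm/policy/log-retention-policy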
Step 2. Apply the Policy to an Index
Attach the ILM policy to your index template:
PUT _index_template/spring-logs-template
{
"index_patterns": ["spring-logs-*"],
"data_stream": { },
"template": {
"settings": {
"index.lifecycle.name": "log-retention-policy"
}
}
}
Elasticsearch now manages your indices automatically, reducing manual effort.
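Note that because this template declares a data stream, ILM rollover handles the time-slicing for you: writers should target a single data stream name matching the pattern rather than the date-suffixed indices from the rotation approach above. A sketch using a hypothetical spring-logs-prod data stream, which you can create explicitly and then inspect with the ILM explain API:
PUT _data_stream/spring-logs-prod

GET spring-logs-prod/_ilm/explain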
Benefits of ILM:
- Improved Cluster Health: Prevents large outdated logs from exhausting resources.
- Simplified Retention: Clean-up happens automatically, saving time and ensuring compliance.
Reducing Log Verbosity in Production Environments
Excessive logging in production can quickly fill storage, generate noise, and overwhelm index resources. Adjusting log verbosity levels ensures that critical data is captured without unnecessary overhead.
Step 1. Update Spring Boot Logging Configurations
Reduce verbosity in production by changing the logging level in application-prod.properties:
logging.level.root=WARN
logging.level.com.example.myapp=INFO
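If you keep logging configuration in logback-spring.xml instead, Spring Boot's <springProfile> blocks achieve the same per-environment split. A minimal sketch, reusing the LOGSTASH appender defined earlier:
<configuration>
  <springProfile name="prod">
    <root level="WARN">
      <appender-ref ref="LOGSTASH"/>
    </root>
  </springProfile>
  <springProfile name="!prod">
    <root level="DEBUG">
      <appender-ref ref="LOGSTASH"/>
    </root>
  </springProfile>
</configuration>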
Step 2. Filter Out Unnecessary Log Fields
Exclude verbose fields like debug payloads or thread info in production:
<encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
    <providers>
        <timestamp />
        <loggerName />
        <message />
        <globalCustomFields>
            <customFields>{"environment":"production"}</customFields>
        </globalCustomFields>
    </providers>
</encoder>
Step 3. Enable Sampling for High-Volume Endpoints
Use Spring Cloud Sleuth to record only a fraction of trace data:
spring.sleuth.sampler.probability=0.1
This samples tracing for roughly 10% of requests, significantly lowering trace-related log volume without touching your regular application logs. (On Spring Boot 3, Sleuth's successor Micrometer Tracing exposes the equivalent management.tracing.sampling.probability property.)
Step 4. Periodically Validate Log Relevance
Regularly review which logs are essential to keep in production. Adjust log levels or exclude extraneous loggers accordingly.
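Spring Boot's Actuator can make these reviews easier: if you expose the loggers endpoint, you can inspect and adjust levels at runtime without a redeploy. A sketch, assuming the endpoint is enabled and using the com.example.myapp logger from earlier:
GET /actuator/loggers

POST /actuator/loggers/com.example.myapp
{"configuredLevel": "DEBUG"}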
Reduced verbosity not only boosts performance but also ensures actionable data is prioritized in Elasticsearch.
Aggregating Logs via Kafka or Filebeat
Tools like Kafka and Filebeat are indispensable for managing distributed logging pipelines, especially when handling high log volumes.
Choosing Between Kafka and Filebeat:
- Kafka:
  - Strengths: Handles massive log throughput with durability and buffering.
  - Use Case: Scenarios where logs require preprocessing or are sourced from multiple clusters.
- Filebeat:
  - Strengths: Lightweight, easy to configure, and integrates seamlessly with Elasticsearch.
  - Use Case: Smaller setups where logs can be directly shipped to Elasticsearch or Logstash.
Step 1. Aggregating Logs with Filebeat
Install Filebeat and configure it to collect logs from Spring Boot applications:
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log
    processors:
      - add_fields:
          fields:
            application_name: my-spring-app

# A custom index name requires a matching template name and pattern,
# and Filebeat's default ILM setup must be disabled so it does not
# override the index setting.
setup.ilm.enabled: false
setup.template.name: "spring-logs"
setup.template.pattern: "spring-logs-*"

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  index: "spring-logs-%{+yyyy.MM.dd}"
Step 2. Aggregating Logs with Kafka
* Add the Kafka appender dependency:
The KafkaAppender used in the next step is not part of Logback itself or of spring-kafka; it comes from the community logback-kafka-appender library, which brings in the Kafka producer client it needs:
<dependency>
    <groupId>com.github.danielwegener</groupId>
    <artifactId>logback-kafka-appender</artifactId>
    <version>0.2.0-RC2</version>
</dependency>
* Update logback-spring.xml
to include Kafka appender:
<configuration>
    <appender name="KAFKA" class="com.github.danielwegener.logback.kafka.KafkaAppender">
        <!-- Encoder for JSON logs -->
        <encoder class="net.logstash.logback.encoder.LogstashEncoder" />
        <topic>application-logs</topic>
        <!-- Send events without a key, asynchronously -->
        <keyingStrategy class="com.github.danielwegener.logback.kafka.keying.NoKeyKeyingStrategy" />
        <deliveryStrategy class="com.github.danielwegener.logback.kafka.delivery.AsynchronousDeliveryStrategy" />
        <!-- Kafka producer config: one key=value pair per producerConfig element;
             the appender handles serialization itself, so no serializer settings are needed -->
        <producerConfig>bootstrap.servers=localhost:9092</producerConfig>
    </appender>
    <!-- Root logger -->
    <root level="INFO">
        <appender-ref ref="KAFKA" />
    </root>
</configuration>
* Use Logstash to pull logs from Kafka and push to Elasticsearch:
input {
kafka {
bootstrap_servers => "localhost:9092"
topics => ["application-logs"]
group_id => "logstash-consumer"
codec => "json"
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "application-logs-%{+YYYY.MM.dd}"
}
}
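If events are not arriving in Elasticsearch, a quick way to isolate the problem is to check whether they are reaching Kafka at all, using the console consumer that ships with Kafka:
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic application-logs --from-beginning --max-messages 5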
Log aggregation simplifies distributed debugging and centralizes logs efficiently for query and visualization.
Summary
Managing log volume and indices in Elasticsearch is critical for maintaining a high-performing observability stack. Here’s what we covered:
- Index Rotation: Use daily or weekly indices for better performance and storage management.
- ILM Setup: Automate index deletion and rollover policies for scalability and cost efficiency.
- Reduced Log Verbosity: Optimize production logs by filtering unnecessary entries and enabling sampling.
- Log Aggregation: Leverage Kafka or Filebeat to centralize and streamline log ingestion.
Implement these strategies today to ensure your Elasticsearch environment remains robust, scalable, and ready to handle the growing log demands of your application!