uberAgent

Reducing the Data Volume

Since Splunk is licensed by daily indexed data volume, it is in every customer’s interest to keep the data volume generated by uberAgent as small as possible. uberAgent is prepared for that: it offers two default configurations and many ways to fine-tune them.

Choose Between Detail and Data Volume

Start by choosing either the default configuration, which provides full detail and high resolution, or the configuration optimized for data volume, which differs from the default in the following ways:

  • Process & application performance: information is collected only on the 10-15 most active processes in terms of CPU, RAM, disk, and network utilization. The processes included in the data collection are determined dynamically for every collection interval. One could say uberAgent "follows" the active processes.
  • Collection interval of 120 s instead of 30 s.

See this document for instructions on how to switch between the two configurations.

Take Stock

Before modifying the configuration, find out how much data is generated per endpoint by the default settings. The easiest way to do that is to have uberAgent tell you in the Data Volume dashboard.

Reduce the Data Volume per Endpoint

Once you know the currently generated data volume, you should have an idea of how much it needs to be reduced. Start with the endpoint configuration.

Through uberAgent’s configuration you can reduce the data volume in several ways:

Reduce the Frequency

By default, uberAgent collects performance data every 30 seconds. You can cut the volume nearly in half by changing the collection interval to 60 seconds. Any other value is possible, too.

You can fine-tune the data collection by adding timers. The data collection frequency can be set per timer. Move each metric to the timer with the desired frequency to balance accuracy and data volume. While optimizing, focus on the metrics that generate the highest data volume (the Data Volume dashboard shows you which metrics those are).
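The effect of the collection interval on data volume can be estimated with simple arithmetic. The following Python sketch is only an illustration, not part of uberAgent: the per-sample sizes, the metric name OtherMetric, and the event-driven share (data that is assumed not to depend on any timer) are hypothetical numbers chosen for the example. Real figures should come from the Data Volume dashboard.

```python
SECONDS_PER_DAY = 86_400


def daily_bytes(bytes_per_sample: float, interval_s: float) -> float:
    """Bytes per day produced by one metric sampled every interval_s seconds."""
    return bytes_per_sample * SECONDS_PER_DAY / interval_s


def total_daily_bytes(timers: dict, event_bytes: float = 0.0) -> float:
    """Sum the daily volume of all timers.

    timers maps an interval in seconds to {metric name: bytes per sample}.
    event_bytes is data assumed to be independent of any timer.
    """
    timed = sum(
        daily_bytes(size, interval)
        for interval, metrics in timers.items()
        for size in metrics.values()
    )
    return timed + event_bytes


# Hypothetical per-sample sizes in bytes (assumptions, not measured values).
baseline = {30: {"ProcessDetailFull": 20_000, "OtherMetric": 2_000}}
slower = {60: {"ProcessDetailFull": 20_000, "OtherMetric": 2_000}}
# Only the largest metric is moved to a slower timer.
mixed = {60: {"ProcessDetailFull": 20_000}, 30: {"OtherMetric": 2_000}}
event_bytes = 5_000_000  # hypothetical timer-independent data per day

ratio = total_daily_bytes(slower, event_bytes) / total_daily_bytes(baseline, event_bytes)
print(f"60 s interval produces {ratio:.0%} of the baseline volume")
print(f"mixed timers: {total_daily_bytes(mixed, event_bytes):,.0f} bytes per day")
```

With these assumed numbers, doubling the interval leaves about 54 percent of the baseline volume rather than exactly 50 percent, because the timer-independent share does not shrink. Moving only the largest metric to the slower timer achieves most of the same saving.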

Remove Metrics

By default, all metrics are enabled. If you do not need the information collected by some of them, turn them off by removing them from the configuration.

Special Treatment for ProcessDetail

As you can see in the Data Volume dashboard, the ProcessDetail metric generates by far the highest data volume. Consider replacing ProcessDetailFull with ProcessDetailTop5. Once you do that, uberAgent only collects performance data for processes with the highest activity. This may lead to a dramatic reduction in data volume.

ProcessDetailTop5

When ProcessDetailTop5 is configured, uberAgent collects ProcessDetail data only for the top 5 processes in each of the following categories:

  • Process CPU usage
  • Count of process I/O read/write operations
  • Data volume of process I/O read/write operations
  • Process consumed RAM
  • Process generated network traffic
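The selection described above can be pictured as the union of five top-5 lists. The Python sketch below is a simplified model under stated assumptions, not uberAgent's actual implementation: the field names (cpu, io_count, io_bytes, ram, net_bytes) and the sample data are hypothetical. It shows why the number of reported processes per interval lies between 5 and 25: a process that is busy in several categories is only counted once.

```python
from heapq import nlargest

# Hypothetical field names, one per selection criterion.
CRITERIA = ("cpu", "io_count", "io_bytes", "ram", "net_bytes")


def top_n_union(processes, n=5, criteria=CRITERIA):
    """Return the names of processes ranking in the top n for any criterion."""
    selected = set()
    for key in criteria:
        for proc in nlargest(n, processes, key=lambda p: p[key]):
            selected.add(proc["name"])
    return selected


# Synthetic sample of 40 processes with made-up activity values.
sample = [
    {
        "name": f"proc{i}",
        "cpu": i,
        "io_count": (i * 7) % 40,
        "io_bytes": (i * 11) % 40,
        "ram": (i * 13) % 40,
        "net_bytes": (i * 17) % 40,
    }
    for i in range(40)
]

active = top_n_union(sample)
print(len(active), "of", len(sample), "processes would be reported")
```

Because the selection is repeated for every collection interval, the set of reported processes changes over time as activity shifts.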

Event Data Filtering

Event Data Filtering is a powerful feature that replaces the previous allowlist and denylist options. This feature allows defining rules with conditions that are evaluated for every event before it is sent to the backend. With each matching rule, a pre-defined action is executed that controls whether the event is sent to the backend or not. Additionally, certain fields can be cleared before sending the event.

For detailed guidance, refer to the Event Data Filtering documentation.
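To make the idea of rule-based filtering concrete, here is a minimal Python model of the concept. It is not uberAgent's rule syntax or implementation: the Rule class, the action names ("deny", "allow", "clear"), the evaluation order, and the event field names are all assumptions made for illustration. Refer to the Event Data Filtering documentation for the real configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # evaluated against every event
    action: str                        # hypothetical: "deny", "allow" or "clear"
    clear_fields: tuple = ()           # fields to blank when action is "clear"


def apply_rules(event: dict, rules: list) -> Optional[dict]:
    """Return the event to send (possibly with cleared fields), or None to drop it."""
    result = dict(event)  # work on a copy, leave the input untouched
    for rule in rules:
        if not rule.condition(event):
            continue
        if rule.action == "deny":
            return None          # event is not sent to the backend
        if rule.action == "allow":
            return result        # event is sent, later rules are skipped
        if rule.action == "clear":
            for name in rule.clear_fields:
                if name in result:
                    result[name] = ""
    return result  # no deny rule matched: send the event


# Hypothetical rules and events.
rules = [
    Rule(lambda e: e.get("ProcName") == "idle.exe", "deny"),
    Rule(lambda e: "password" in e.get("CommandLine", ""), "clear", ("CommandLine",)),
]

print(apply_rules({"ProcName": "idle.exe", "CommandLine": ""}, rules))
print(apply_rules({"ProcName": "tool.exe", "CommandLine": "tool --password x"}, rules))
```

Dropping whole events reduces the data volume most, while clearing individual fields trims the size of the events that are still sent.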

Reduce the Number of Endpoints

If the data volume is still too high after optimizing the configuration as recommended above, you need to reduce the number of endpoints that send data to Splunk. To do that, stop and disable the uberAgent system service on selected endpoints.
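How many endpoints fit into a given daily volume is again a matter of arithmetic. The following Python sketch uses hypothetical numbers (the volume budget, the per-endpoint volume, and the endpoint count are assumptions); substitute the per-endpoint value shown in the Data Volume dashboard.

```python
def max_endpoints(daily_budget_mb: int, mb_per_endpoint_per_day: int) -> int:
    """Largest number of endpoints whose combined daily volume fits the budget."""
    return daily_budget_mb // mb_per_endpoint_per_day


def endpoints_to_disable(current_endpoints: int, daily_budget_mb: int,
                         mb_per_endpoint_per_day: int) -> int:
    """Number of endpoints that must stop sending data (never negative)."""
    allowed = max_endpoints(daily_budget_mb, mb_per_endpoint_per_day)
    return max(0, current_endpoints - allowed)


# Hypothetical figures: 100,000 MB per day available, 20 MB per endpoint per day.
print(max_endpoints(100_000, 20))
print(endpoints_to_disable(6_000, 100_000, 20))
```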
