doc.overops.com

Jenkins Plugin Guide

The purpose of the OverOps Jenkins Plugin is to let application owners, DevOps engineers, and SREs assess the quality of their code before promoting it to production. It does so by producing a Reliability Report that determines whether the build should be marked unstable.
The OverOps Jenkins plugin is usually used during the QA phase, and in some cases on staging. To use it, OverOps must be installed on the designated environment. Install and configure the Jenkins plugin according to this guide. After reviewing the OverOps Reliability Report, you can drill down into each specific error using the OverOps Automated Root Cause analysis screen to solve the issue.

Example:

OverOps Reliability Report

Quality Gates

OverOps provides six quality gates that can be configured to mark the build as unstable. Using the configuration screen (shown below), each gate can be configured to meet your specific needs. To skip a gate, leave its values blank or zero as described in Configuration.

1. New Error gate

This gate is used to check for any new errors in the build. This is critical to ensure no new errors are introduced into production. If any new errors are detected, the build will be marked as unstable.

2. Resurfaced Error gate

This gate is used to check for any resurfaced errors in the build. This is critical to ensure a previously fixed issue does not find its way back into production. If any resurfaced errors are detected, the build will be marked as unstable.

3. Total Error Volume gate

This gate is used to check the total number of errors for the build. If there is one unique error happening 15K times, then the total error volume would be 15K.

  • Example:
    If the value is set to 10K and after all test(s) are run, there are 15K total events, the build will be marked as unstable.

4. Unique Error Volume gate

This gate is used to check the total number of unique errors for the build. If there is one unique error happening 15K times, then the total unique error volume would be 1.

  • Example:
    If the value is set to 10 and after the regression test(s) run, there are 12 unique events, the build will be marked as unstable.
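
The difference between the two volume gates can be illustrated with a small sketch. This is not the plugin's implementation; the function name `volume_gates` and the idea of passing raw occurrences as a list of strings are assumptions made purely for illustration.

```python
from collections import Counter


def volume_gates(event_names, max_total, max_unique):
    """Return (total_volume, unique_volume, unstable) for a list of occurrences.

    Hypothetical helper: a threshold of 0 skips that gate, mirroring the
    "leave the values blank or zero" rule described for the gates.
    """
    counts = Counter(event_names)
    total = sum(counts.values())   # every occurrence counts
    unique = len(counts)           # each distinct error counts once
    total_exceeded = max_total > 0 and total > max_total
    unique_exceeded = max_unique > 0 and unique > max_unique
    return total, unique, total_exceeded or unique_exceeded


# One unique error happening 15K times: total volume 15000, unique volume 1.
occurrences = ["NullPointerException at Foo.bar"] * 15_000
print(volume_gates(occurrences, max_total=10_000, max_unique=10))
```

With the Total Error Volume gate set to 10K, the single repeated error is enough to mark the build unstable even though the Unique Error Volume gate is nowhere near its limit.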

5. Critical Exception Types gate

This gate is used to identify new critical exceptions (which you define) in your application. If any of the critical exceptions occur during your regression test(s), the build will be marked as unstable.

  • Example:
    • Values set in configuration: NullPointerException, IndexOutOfBoundsException, YourCustomException
    • Regression test(s) run and if any of the configured exceptions occurs, the build will be marked as unstable.
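
A minimal sketch of this check follows, assuming the configured value is a comma-separated string and that per-exception counts are available as a dictionary. Both the function name and the data shapes are hypothetical and are not taken from the plugin.

```python
def critical_exceptions_hit(configured, event_counts):
    """Return the configured critical exception types that occurred at least once.

    configured   -- comma-separated list as typed into the gate's field
    event_counts -- mapping of exception type to occurrence count (assumed shape)
    """
    critical = {name.strip() for name in configured.split(",") if name.strip()}
    return sorted(
        name for name, count in event_counts.items()
        if name in critical and count > 0
    )


configured = "NullPointerException, IndexOutOfBoundsException, YourCustomException"
counts = {
    "NullPointerException": 3,
    "TimeoutException": 40,           # high volume, but not configured as critical
    "IndexOutOfBoundsException": 0,   # configured, but did not occur
}
hits = critical_exceptions_hit(configured, counts)
print(hits, "-> unstable" if hits else "-> stable")
```

The gate ignores volume entirely: a single occurrence of a listed type is enough, while an unlisted type never triggers it regardless of count.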

6. Increasing Errors gate

This gate is used to compare an Active Time Window (either the current build as defined in the Deployment Name or the time set in the Active Time Window) to a Baseline Time Window. Once the windows are defined, the algorithm looks at several components to identify real issues.

Types of Events:

  1. New - new errors that were introduced to the build
  2. Severe New - errors introduced to the build that are considered severe based on the regression parameters
  3. Regressions - existing errors that have increased but are not considered severe based on the regression parameters
  4. Severe Regressions - existing errors that have increased and are considered severe based on the regression parameters

Configuration

Application Name

(Optional) Application Name as specified in OverOps

  • If populated, the plugin will filter the data for the specific application in OverOps.
  • If blank, no application filter will be applied in the query.

    Example:
    ${JOB_NAME}

Deployment Name

(Optional) Deployment Name as specified in OverOps, or use Jenkins environment variables.
Example:
${BUILD_NUMBER} or ${JOB_NAME}-${BUILD_NUMBER}

  • If populated, the plugin will filter the data for the specific deployment name in OverOps
  • If blank, no deployment filter will be applied in the query.

If using Jenkins environment variables, they must be added to the build’s manifest file for OverOps to use. See this link for details.

Environment ID

The OverOps environment identifier (e.g. S4567) to inspect data for this build. If no value is provided here, the value provided in the global Jenkins plugin settings will be used.

Regex Filter

A way to filter out specific event types from affecting the outcome of the OverOps Reliability report.

  • Sample list of event types: Uncaught Exception, Caught Exception, Swallowed Exception, Logged Error, Logged Warning, Timer
  • This filter enables the removal of one or more of these event types from the final results.
  • Example filter expression with a pipe-separated list: "type":\"s*(Logged Error|Logged Warning|Timer)
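
To see what the example expression would exclude, the sketch below applies it with Python's `re` module. It assumes, without confirmation from this guide, that the expression is searched against a compact JSON serialization of each event; the `apply_filter` helper and the event dictionaries are hypothetical.

```python
import json
import re

# The example expression from this guide, used verbatim.
EVENT_FILTER = re.compile(r'"type":\"s*(Logged Error|Logged Warning|Timer)')


def apply_filter(events):
    """Keep only events whose serialized form does NOT match the filter."""
    kept = []
    for event in events:
        # Assumption: compact JSON, i.e. no space after the colon.
        serialized = json.dumps(event, separators=(",", ":"))
        if not EVENT_FILTER.search(serialized):
            kept.append(event)
    return kept


events = [
    {"id": "1", "type": "Uncaught Exception"},
    {"id": "2", "type": "Timer"},
    {"id": "3", "type": "Logged Warning"},
]
print([event["id"] for event in apply_filter(events)])
```

Under these assumptions only the Uncaught Exception event survives, so Timer and Logged Warning events would no longer influence the report.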

Mark Build Unstable

If checked, the build will be marked unstable when any of the quality gates is triggered.

Show Top Issues

Prints the top X events (as provided by this parameter) with the highest volume of errors detected within the active time window. This is used in conjunction with Max Error Volume and Unique Error Volume to identify the errors that caused a build to fail.

Quality Gates

New Error Gate

If any new error is detected, the build will be marked as unstable.

Resurfaced Error Gate

If any resurfaced error is detected, the build will be marked as unstable.

Total Error Volume Gate

Set the maximum total error volume allowed. If exceeded, the build will be marked as unstable.

Unique Error Volume Gate

Set the maximum unique error volume allowed. If exceeded, the build will be marked as unstable.

Critical Exception Types Gate

A comma-delimited list of exception types that are deemed severe regardless of their volume. If any listed exception type has an event count greater than zero, the build will be marked as unstable.
Example:
NullPointerException,IndexOutOfBoundsException

Increasing Errors Gate

Combines the following parameters:

  • Event Volume Threshold
  • Event Rate Threshold
  • Regression Delta
  • Critical Regression Threshold
  • Apply Seasonality

Active Time Window (d - day, h - hour, m - minute)

The time window inspected to search for new issues and regressions. Set to zero to use the Deployment Name (which would be the current build).

  • Example: 1d would be one day active time window.

Baseline Time Window (d - day, h - hour, m - minute)

The time window against which events in the active window are compared to test for regressions. Must be set to a non-zero value.

  • Example: 14d would be a two week baseline time window.
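
The d/h/m notation used by both time windows can be read as in the sketch below. How the plugin itself parses these values is not described here, so the `window_to_minutes` helper and its handling of blank or malformed input are assumptions.

```python
import re

# Minutes per unit for the d / h / m suffixes.
_UNITS_IN_MINUTES = {"d": 24 * 60, "h": 60, "m": 1}


def window_to_minutes(value):
    """Convert a window such as '1d', '12h' or '30m' to minutes.

    '0' (or blank) returns 0, which for the Active Time Window means
    "use the Deployment Name instead of a time span".
    """
    text = value.strip().lower()
    if text in ("", "0"):
        return 0
    match = re.fullmatch(r"(\d+)([dhm])", text)
    if match is None:
        raise ValueError(f"unrecognized time window: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS_IN_MINUTES[unit]


print(window_to_minutes("1d"))    # one-day active window
print(window_to_minutes("14d"))   # two-week baseline window
```

A baseline of 0 would have to be rejected by the caller, since the Baseline Time Window must be non-zero.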

Event Volume Threshold

The minimal number of times an event of a non-critical type (e.g. uncaught) must take place to be considered severe.

  • If a New event has a count greater than the set value, it will be evaluated as severe and could break the build if its event rate is above the Event Rate Threshold.
  • If an Existing event has a count greater than the set value, it will be evaluated as severe and could break the build if its event rate is above the Event Rate Threshold and the Critical Regression Threshold.
  • If any event has a count less than the set value, it will not be evaluated as severe and will not break the build.

Event Rate Threshold (0-1)

The minimum rate at which an event of a non-critical type (e.g. uncaught) must take place to be considered severe. A rate of 0.1 means the event is allowed to take place <= 10% of the time.

  • If a New event has a rate greater than the set value, it will be evaluated as severe and could break the build if its event volume is above the Event Volume Threshold.
  • If an Existing event has a rate greater than the set value, it will be evaluated as severe and could break the build if its event volume is above the Event Volume Threshold and the Critical Regression Threshold.
  • If an event has a rate less than the set value, it will not be evaluated as severe and will not break the build.

Regression Delta (0-1)

The percentage change between an event's rate in the active time span and its rate in the baseline that is required for the event to be considered a regression. The active time span is the Active Time Window or the Deployment Name (whichever is populated). A value of 0.1 means the event's rate may rise by up to 10% relative to the baseline before it is marked as a regression.

  • If an Existing event has an error rate delta (active window compared to baseline) greater than the set value, it will be marked as a regression, but will not break the build.

Critical Regression Threshold (0-1)

The percentage change between an event's rate in the active time span and its rate in the baseline that is required for the event to be considered a critical regression. The active time span is the Active Time Window or the Deployment Name (whichever is populated). A value of 0.1 means the event's rate may rise by up to 10% relative to the baseline before it is marked as a severe regression.

  • If an Existing event has an error rate delta (active window compared to baseline) greater than the set value, it will be marked as a severe regression and will break the build.

Apply Seasonality

If peaks have been seen in baseline window, then this would be considered normal and not a regression. Should the plugin identify an equal or matching peak in the baseline time window, or two peaks of greater than 50% of the volume seen in the active window, the event will not be marked as a regression.
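
The seasonality rule can be sketched as follows. The guide does not say how the baseline is sliced, so this sketch assumes the baseline volume is available as a list of per-period counts comparable to the active window; the function name and that data shape are hypothetical.

```python
def seasonality_suppresses(active_volume, baseline_period_volumes):
    """Return True if earlier peaks in the baseline explain the active volume.

    Rule from the text: one baseline peak that equals or exceeds the active
    volume, or two baseline peaks above 50% of it, means "not a regression".
    """
    matching_peak = any(v >= active_volume for v in baseline_period_volumes)
    half_peaks = sum(1 for v in baseline_period_volumes if v > 0.5 * active_volume)
    return matching_peak or half_peaks >= 2


print(seasonality_suppresses(100, [10, 120, 8]))   # one matching peak
print(seasonality_suppresses(100, [10, 60, 55]))   # two peaks above 50%
print(seasonality_suppresses(100, [10, 60, 5]))    # only one partial peak
```

In the first two cases the spike looks like a recurring pattern and the event would not be marked as a regression; in the third it would still be evaluated.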

Debug Mode

If checked, all queries and results will be displayed in the OverOps Reliability Report. For advanced debugging purposes only.

Examples for Regression Testing

New Issues

Combines both Event Volume Threshold and Event Rate Threshold.

  • Event Volume Threshold
    used to identify the minimum number of occurrences of an event to be considered in the algorithm. If an error happens only once or twice, it is likely not as important as one that happens 10k times.
  • Event Rate Threshold
    used to identify the minimum error rate of an event to be considered in the algorithm. If an error happens at .0001%, it is likely not as important as one that happens at 2%.
    • Example combining the two thresholds
      • Event Volume Threshold set to 20
      • Event Rate Threshold set to .05 (5%)
      • If an event occurs 10 times, it will be excluded
      • If an event occurs 100 times at a rate of 5%, it will be marked as severe and break the build
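
The combined check for new issues can be written as a short sketch. The function name is hypothetical, and treating a value exactly at the threshold as meeting it is an assumption made so that the 100-occurrence, 5% case above comes out severe.

```python
def is_severe_new(count, rate, volume_threshold=20, rate_threshold=0.05):
    """A new event is severe only if it clears BOTH thresholds.

    count -- number of occurrences in the active time window
    rate  -- error rate between 0 and 1 (0.05 means 5%)
    Boundary handling (>=) is an assumption, not documented behavior.
    """
    return count >= volume_threshold and rate >= rate_threshold


print(is_severe_new(count=10, rate=0.50))    # too few occurrences: excluded
print(is_severe_new(count=100, rate=0.05))   # clears both: severe, breaks the build
print(is_severe_new(count=100, rate=0.001))  # frequent but low rate: not severe
```

The two thresholds act as a logical AND, so a rare event with a high rate and a frequent event with a negligible rate are both ignored.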

Existing Issues

Combines Event Volume Threshold, Event Rate Threshold, Regression Delta, Critical Regression Threshold and Apply Seasonality.

  • Event Volume Threshold
    used to identify the minimum number of occurrences of an event to be considered in the algorithm. If an error happens only once or twice, it is likely not as important as one that happens 10k times.
  • Event Rate Threshold
    used to identify the minimum error rate of an event to be considered in the algorithm. If an error happens at .0001%, it is likely not as important as one that happens at 2%.
  • Regression Delta
    used to measure the delta between the active time window’s error rate and the baseline time window’s error rate. OverOps will measure the difference to determine if the error rate is increasing over time. Any event whose error rate in the active time window has increased over the baseline by more than the defined value will be marked as a regression. These will not break the build.
  • Critical Regression Threshold
    used to measure the delta between the active time window’s error rate and the baseline time window’s error rate. OverOps will measure the difference to determine if the error rate is increasing over time. Any event whose error rate in the active time window has increased over the baseline by more than the defined value will be marked as a severe regression. These will break the build.
  • Apply Seasonality
    used to identify previous peaks in the baseline time window to rule out previous peaks compared to the active time window. Should the plugin identify an equal or matching peak in the baseline time window, or two peaks of greater than 50% of the volume seen in the active window, the event will not be marked as a regression.
    Example combining all thresholds:
    • Event Volume Threshold set to 20
    • Event Rate Threshold set to .05 (5%)
    • Regression Delta set to .1 (10%)
    • Critical Regression Threshold set to .2 (20%)
    • Apply Seasonality checked

Stable example: If an event occurs more than 20 times at an error rate of 10% in the active time window, it will be compared to the baseline time window. If the event was occurring at 9% in the baseline versus the current 10%, that is an 11% increase, which is above the Regression Delta (10%) but below the Critical Regression Threshold (20%). If no previous spikes are detected in the baseline window, the event is marked as a regression, and the build remains stable.
Unstable example: If an event occurs more than 20 times at an error rate of 12% in the active time window, it will be compared to the baseline time window. If the event was occurring at 9% in the baseline versus the current 12%, that is a 33% increase, which is above the Critical Regression Threshold (20%). If no previous spikes are detected in the baseline window, the event is marked as a severe regression, and the build is marked as unstable.
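
The arithmetic in these two examples can be reproduced with the sketch below. It computes the relative increase as (active rate - baseline rate) / baseline rate, which matches the 11% and 33% figures; the function name, the return labels, and the boundary handling are assumptions, and the seasonality check is left out for brevity.

```python
def classify_existing(count, active_rate, baseline_rate,
                      volume_threshold=20, rate_threshold=0.05,
                      regression_delta=0.1, critical_threshold=0.2):
    """Classify an existing event using the thresholds from the example.

    Hypothetical helper: rates are fractions (0.10 means 10%), and
    seasonality is deliberately not modeled here.
    """
    if baseline_rate <= 0:
        raise ValueError("an existing event needs a non-zero baseline rate")
    if count < volume_threshold or active_rate < rate_threshold:
        return "ignored"
    increase = (active_rate - baseline_rate) / baseline_rate
    if increase > critical_threshold:
        return "severe regression"   # breaks the build
    if increase > regression_delta:
        return "regression"          # reported, build stays stable
    return "no regression"


print(classify_existing(count=25, active_rate=0.10, baseline_rate=0.09))
print(classify_existing(count=25, active_rate=0.12, baseline_rate=0.09))
```

The first call corresponds to the stable example (about an 11% increase) and the second to the unstable example (about a 33% increase).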

Download

Jenkins Setup

Jenkins