Alerts

What are Galileo Alerts?

Galileo Alerts are the starting point of your data inspection journey. After you complete a run, Galileo surfaces a summary of the issues it found in your dataset in the Alerts section. Each Alert represents a problematic pocket of data that Galileo has identified. Clicking on an alert filters the dataset down to that problematic subset and lets you fix it.

Alerts also explain why a subset of your data might be causing issues and tell you how to fix it. You can think of Alerts as a partner Data Scientist working with you to find and fix issues in your data.

Alerts that we support today

We support a growing list of alerts, and are open to feature requests! Some of the highlights include:

Likely Mislabeled

Leverages our Likely Mislabeled algorithm to surface the samples we believe were incorrectly labeled by your annotators

Misclassified

Surfaces samples where the model's prediction disagrees with the ground truth label

Hard For The Model

Exposes the samples we believe were hard for your model to learn. These are samples with high Data Error Potential scores.

Low Performing Classes

Classes that performed significantly worse than average (e.g. their F1 score was more than one standard deviation below the mean F1 score)
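
To make that criterion concrete, here is a minimal sketch of how such a check could be computed. The class names and scores are made up for illustration; this is not Galileo's actual implementation:

```python
import numpy as np

# Hypothetical per-class F1 scores from a completed run (illustrative values)
f1_by_class = {"billing": 0.91, "shipping": 0.88, "returns": 0.52, "other": 0.86}

scores = np.array(list(f1_by_class.values()))
threshold = scores.mean() - scores.std()  # one standard deviation below the mean

low_performing = [c for c, f1 in f1_by_class.items() if f1 < threshold]
print(low_performing)  # ['returns']
```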

Low Performing Metadata

Slices the data by different metadata values and shows any subsets of data that perform significantly worse than average

High Class Imbalance is Impacting Performance

Exposes classes that have a low relative class distribution in the training set and perform poorly in the validation/test set
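
As a rough sketch of this criterion, one could pair each class's share of the training data with its validation score. The cutoffs below are assumptions for illustration, not Galileo's actual thresholds:

```python
import numpy as np

# Hypothetical training-set class shares and validation F1 scores
train_share = {"cat": 0.45, "dog": 0.45, "ferret": 0.10}
val_f1      = {"cat": 0.93, "dog": 0.91, "ferret": 0.55}

# Assumed criteria: "rare" = under half a uniform share; "poor" = below mean F1
rare_cutoff = (1 / len(train_share)) / 2
mean_f1 = np.mean(list(val_f1.values()))

flagged = [c for c in train_share
           if train_share[c] < rare_cutoff and val_f1[c] < mean_f1]
print(flagged)  # ['ferret']
```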

High Class Overlap

Surfaces classes that our Class Overlap algorithm detected the model confusing with one another

Out Of Coverage

Surfaces samples in your validation/test split that are fundamentally different from samples contained in your training set

PII

Identifies any Personally Identifiable Information (PII) in your data

Non-Primary Language

Exposes samples that are not in the primary language of your dataset

Semantic Cluster with High DEP

Surfaces semantic clusters of data found through our Clustering algorithm that have high Data Error Potential
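
A minimal sketch of the idea, using scikit-learn's KMeans as a stand-in for the clustering step. The embeddings, DEP scores, and the 0.5 cutoff are all synthetic assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical embeddings: two well-separated semantic clusters
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 8)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 8)),
])
# Hypothetical DEP scores: the second cluster is much harder for the model
dep = np.concatenate([rng.uniform(0.0, 0.3, 100), rng.uniform(0.6, 0.9, 100)])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# Flag clusters whose mean DEP exceeds an assumed cutoff
flagged = [c for c in np.unique(labels) if dep[labels == c].mean() > 0.5]
print(flagged)  # the high-DEP cluster
```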

High Uncertainty Samples

Surfaces samples that lie on or near the model's decision boundary
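
One common way to operationalize "on the decision boundary" is a small margin between the model's top two predicted probabilities. The sketch below uses that proxy with an assumed 0.1 cutoff; it is an illustration, not necessarily Galileo's method:

```python
import numpy as np

# Hypothetical softmax outputs for four samples (3-class problem)
probs = np.array([
    [0.48, 0.47, 0.05],  # near the decision boundary
    [0.90, 0.06, 0.04],
    [0.34, 0.33, 0.33],  # near the boundary
    [0.05, 0.90, 0.05],
])

top2 = np.sort(probs, axis=1)[:, -2:]
margin = top2[:, 1] - top2[:, 0]       # small margin => high uncertainty
uncertain = np.where(margin < 0.1)[0]  # assumed margin cutoff
print(uncertain)  # [0 2]
```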

[Inference Only] Data Drift

The data your model sees in this inference run has drifted from what it was trained on

[Named Entity Recognition Only] High Frequency Problematic Word

Shows you words that the model struggles with (i.e. words with high Data Error Potential) more than 50% of the time
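
A minimal sketch of such a check, assuming a per-token export with one row per word occurrence and its DEP score. The column names and the 0.7 "high DEP" cutoff are illustrative assumptions:

```python
import pandas as pd

# Hypothetical per-token export: one row per word occurrence with its DEP score
tokens = pd.DataFrame({
    "word": ["invoice", "invoice", "refund", "refund", "refund", "total"],
    "dep":  [0.82,      0.77,      0.30,     0.85,     0.25,     0.10],
})

HIGH_DEP = 0.7  # assumed cutoff for a "high DEP" occurrence
frac_high = tokens.assign(high=tokens["dep"] > HIGH_DEP).groupby("word")["high"].mean()

problematic = frac_high[frac_high > 0.5].index.tolist()
print(problematic)  # ['invoice']
```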

[Named Entity Recognition or Semantic Segmentation Only] False Positives

Surfaces spans or segments predicted by the model for which the Ground Truth has no annotation

[Named Entity Recognition Only] False Negatives

Surfaces spans for which the Ground Truth had an annotation but the model didn't predict any

[Named Entity Recognition Only] Shifted Spans

Surfaces spans where the beginning and end locations are not aligned between the Ground Truth and the Prediction

[Object Detection Only] Background Confusion Errors

Surfaces predictions that don’t overlap significantly with any Ground Truth

[Object Detection Only] Localization Mistakes

Surfaces detected objects that overlap poorly with their corresponding Ground Truth
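
Both Background Confusion Errors and Localization Mistakes come down to intersection-over-union (IoU) between predicted and Ground Truth boxes. A minimal sketch, with assumed 0.1 and 0.5 cutoffs that are not Galileo's actual thresholds:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

gt   = (10, 10, 50, 50)
pred = (30, 30, 70, 70)

score = iou(pred, gt)
if score < 0.1:    # assumed cutoff: essentially no overlap
    print("background confusion")
elif score < 0.5:  # assumed cutoff: overlaps, but poorly
    print("localization mistake", round(score, 2))
```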

[Object Detection Only] Missed Predictions

Surfaces annotations the model failed to make predictions for

[Object Detection Only] Misclassified Predictions

Surfaces objects that were assigned a different label than their associated Ground Truths

[Object Detection Only] Duplicate Predictions

Surfaces instances where multiple duplicate predictions were made for the same object

How to request a new alert?

Have a great idea for a new alert? We'd love to hear about it! File any requests under your Profile Avatar Menu > "Bug Report or Feature Request", and we'll get to your request right away 🔭
