
Natural Language Inference - Keras

​
Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), is a sequence classification problem: given two short, ordered documents, a premise and a hypothesis, the task is to determine the inference relation between them.
Each sample is assigned one of three labels, depending on whether the hypothesis is true (entailment), false (contradiction), or undetermined (neutral) given the premise. Here are three examples:
Premise: A man inspects the uniform of a figure in some East Asian country.
Hypothesis: The man is sleeping.
Label: contradiction
​
​
Premise: An older and younger man smiling.
Hypothesis: Two men are smiling and laughing at the cats playing on the floor.
Label: neutral
​
​
Premise: A soccer game with multiple males playing.
Hypothesis: Some men are playing a sport.
Label: entailment

Initializing Run

Galileo supports NLI as an extension of Text Classification.
Keras
import dataquality as dq
​
# 🔭🌕 Galileo logging - initialize project/run name
dq.login()
dq.init(task_type="text_classification", project_name="Sample_project", run_name="Sample_run")

Logging the Data Inputs

Log a human-readable version of your dataset. Galileo will join these samples with the model's outputs and present them in the Console.
Note: For NLI you must combine the premise and hypothesis documents into a single text for logging. We recommend joining the two documents with a separator such as <> to aid visualization in the Galileo console.
Keras
# Option 1) Logging Examples Manually
# One label per sample, in the same order as the texts below
labels = ["contradiction", "neutral", "entailment"]
texts = [
    {
        "Premise": "A man inspects the uniform of a figure in some East Asian country.",
        "Hypothesis": "The man is sleeping.",
    },
    {
        "Premise": "An older and younger man smiling.",
        "Hypothesis": "Two men are smiling and laughing at the cats playing on the floor.",
    },
    {
        "Premise": "A soccer game with multiple males playing.",
        "Hypothesis": "Some men are playing a sport.",
    },
]
# Premise and hypothesis joined into a single text with the <> separator
joined_texts = [
    "A man inspects the uniform of a figure in some East Asian country. <> The man is sleeping.",
    "An older and younger man smiling. <> Two men are smiling and laughing at the cats playing on the floor.",
    "A soccer game with multiple males playing. <> Some men are playing a sport.",
]
# Integer ids (literals with leading zeros such as 001 are a syntax error in Python 3)
ids = [1, 2, 3]
# 🔭🌕 Log your dataset to Galileo
dq.log_data_samples(texts=joined_texts, ids=ids, labels=labels, split="train")
dq.log_data_samples(texts=joined_texts, ids=ids, labels=labels, split="test")
​
# Option 2) Logging a Full Dataset
...
# train_dataset and test_dataset: a pandas/vaex/huggingface dataframe,
# or another iterable, containing (at least) the following columns:
# "text", "label", and "id"
​
# 🔭🌕 Log your datasets to Galileo
dq.log_dataset(train_dataset, split="train")
dq.log_dataset(test_dataset, split="test")
​
# 🔭🌕 Galileo logging
# Log the class labels in the order in which the model outputs them
labels_list = ["entailment", "contradiction", "neutral"]
dq.set_labels_for_run(labels_list)
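
The joining step described in the note above can also be done programmatically rather than by hand. The following is a minimal sketch, not part of the dataquality library: `join_pair` and `build_rows` are hypothetical helper names, and the `" <> "` separator simply follows the recommendation above. It produces one record per sample with the "text", "label", and "id" columns that Option 2 expects.

```python
# Hypothetical helpers (not part of dataquality): join each premise/hypothesis
# pair into one string using the " <> " separator recommended above.
SEPARATOR = " <> "


def join_pair(premise: str, hypothesis: str, sep: str = SEPARATOR) -> str:
    """Combine a premise and a hypothesis into a single text."""
    return f"{premise.strip()}{sep}{hypothesis.strip()}"


def build_rows(pairs, labels):
    """Return one dict per sample with the "id", "text", and "label" keys."""
    if len(pairs) != len(labels):
        raise ValueError("pairs and labels must have the same length")
    return [
        {"id": i, "text": join_pair(p["Premise"], p["Hypothesis"]), "label": label}
        for i, (p, label) in enumerate(zip(pairs, labels))
    ]


pairs = [
    {
        "Premise": "A man inspects the uniform of a figure in some East Asian country.",
        "Hypothesis": "The man is sleeping.",
    },
    {
        "Premise": "A soccer game with multiple males playing.",
        "Hypothesis": "Some men are playing a sport.",
    },
]
rows = build_rows(pairs, ["contradiction", "entailment"])
print(rows[0]["text"])
```

If pandas is available, `pandas.DataFrame(rows)` turns these records into a dataframe, which is one of the input types the Option 2 comment lists for `dq.log_dataset`.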

Logging the Model Outputs

Add our logging layers to your Keras model's definition. This works with the functional or sequential syntax for defining models in Keras.
Keras
from tensorflow import keras  # assumption: Keras as bundled with TensorFlow
from dataquality.integrations.keras import DataQualityLoggingLayer
​
model = keras.Sequential(
    [
        DataQualityLoggingLayer("ids"), # 🌕🔭
        ...
        DataQualityLoggingLayer("embs"), # 🌕🔭
        ...
        DataQualityLoggingLayer("probs"), # 🌕🔭
    ]
)
​
model.summary()
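
Why the order passed to `dq.set_labels_for_run` matters can be illustrated without Keras. The sketch below assumes that each logged probability vector is interpreted by position, so that index 0 refers to the first label in the list; `predicted_label` is a hypothetical helper and the probabilities are made up for illustration.

```python
# Label order as passed to dq.set_labels_for_run in the example above.
labels_list = ["entailment", "contradiction", "neutral"]


def predicted_label(probs, labels):
    """Map one probability vector to a label name by position (argmax)."""
    if len(probs) != len(labels):
        raise ValueError("expected one probability per label")
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best_index]


# Made-up probabilities for illustration only.
probs = [0.1, 0.7, 0.2]
print(predicted_label(probs, labels_list))

# The same vector read against a different label order gives a different label.
shuffled = ["contradiction", "neutral", "entailment"]
print(predicted_label(probs, shuffled))
```

The two printed labels differ even though the probabilities are identical, which is why the label list should follow the order of the model's output units.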

Training Loop Callback

Compile your model with run_eagerly=True if eager execution is not already the default, add the sample ids to your model's inputs, and add our callback to automatically log epochs and splits.
Keras
from dataquality.integrations.keras import add_ids_to_numpy_arr, DataQualityCallback
​
x_train = add_ids_to_numpy_arr(x_train, train_ids) # 🌕🔭 ids from dataset logging
​
model.compile(..., run_eagerly=True)
​
model.fit(x_train, y_train, ...,
          callbacks=[DataQualityCallback()]) # 🌕🔭

Uploading Data to Galileo

To finish, call dq.finish(). Your data will be uploaded to and processed by the Galileo API server, which may take a few minutes depending on the size of your dataset.
Keras
dq.finish() # 🔭🌕 This will wait until the run is processed by Galileo