Chunk Utilization Basic

Understand Galileo's Chunk Utilization Basic Metric

Definition: For each chunk retrieved in a RAG pipeline, Chunk Utilization measures the fraction of the text in that chunk that had an impact on the model's response.

Chunk Utilization ranges from 0 to 1. A value of 1 means that the entire chunk affected the response, while a lower value such as 0.5 means that part of the chunk (here, half of its text) was "extraneous" and did not affect the response.

Chunk Utilization is closely related to Chunk Attribution: Attribution measures whether or not a chunk affected the response, and Utilization measures how much of the chunk text was involved in the effect. Only chunks that were Attributed can have Utilization scores greater than zero.
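To make the relationship between the two metrics concrete, here is a minimal Python sketch. It assumes a hypothetical `ChunkScore` record (not part of any Galileo API) that pairs a chunk's Attribution flag with its Utilization score and rejects combinations that the definitions above rule out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkScore:
    """Hypothetical per-chunk result; field names are illustrative only."""

    attributed: bool    # did the chunk affect the response at all?
    utilization: float  # fraction of the chunk's text that affected it

    def __post_init__(self) -> None:
        # Utilization is a fraction, so it must lie in [0, 1].
        if not 0.0 <= self.utilization <= 1.0:
            raise ValueError("utilization must lie in [0, 1]")
        # Only attributed chunks can have utilization above zero.
        if not self.attributed and self.utilization > 0.0:
            raise ValueError("an unattributed chunk must have utilization 0")
```

The check is one-directional: an attributed chunk may still score low, but an unattributed chunk can never score above zero.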

Calculation: Chunk Utilization Basic is computed using a fine-tuned in-house Galileo evaluation model. The model is a transformer-based encoder trained to identify the relevant and utilized information given a query, context, and response. The same model is used to compute Chunk Adherence, Chunk Completeness, Chunk Attribution and Utilization, and a single inference call computes all the Basic metrics at once. The model is trained on carefully curated RAG datasets and optimized to closely align with the RAG Plus metrics.

For each token in the provided context, the model outputs a utilization probability, i.e., the probability that this token affected the response. Chunk Utilization Basic is then computed as the fraction of the chunk's tokens that have a high utilization probability.
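A minimal Python sketch of this calculation follows. It assumes the per-token utilization probabilities for the whole context are available as a flat list, that chunk boundaries are given as token counts, and that "high" means above a hypothetical cutoff of 0.5. The function names and the threshold are illustrative; the actual threshold and interfaces of Galileo's model are not documented here.

```python
from typing import List, Sequence

# Hypothetical cutoff for a "high" utilization probability (assumption).
THRESHOLD = 0.5


def chunk_utilization(token_probs: Sequence[float], threshold: float = THRESHOLD) -> float:
    """Fraction of a chunk's tokens whose utilization probability exceeds threshold."""
    if not token_probs:
        return 0.0  # empty chunk: score 0 rather than divide by zero
    utilized = sum(1 for p in token_probs if p > threshold)
    return utilized / len(token_probs)


def utilization_per_chunk(
    token_probs: Sequence[float],
    chunk_lengths: Sequence[int],
    threshold: float = THRESHOLD,
) -> List[float]:
    """Split context-level token probabilities into chunks and score each one."""
    if sum(chunk_lengths) != len(token_probs):
        raise ValueError("chunk lengths must cover every context token exactly once")
    scores = []
    start = 0
    for n in chunk_lengths:
        scores.append(chunk_utilization(token_probs[start:start + n], threshold))
        start += n
    return scores
```

For example, `utilization_per_chunk([0.9, 0.8, 0.1, 0.2, 0.05, 0.0, 0.0, 0.1], [4, 4])` returns `[0.5, 0.0]`: two of the four tokens in the first chunk clear the cutoff, and none in the second do. Scoring an empty chunk as 0.0 is a choice made for this sketch.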

We recommend starting with "Basic" and checking whether it covers your needs. If you need higher accuracy or would like explanations for the ratings, you can switch to Chunk Utilization Plus.
