tau_eval.metrics package

Submodules

tau_eval.metrics.bertscore.compute_bertscore(input_texts: str | list[str], output_texts: str | list[str], model_id: str = 'distilbert-base-uncased') → dict[str, list[float]]

Computes BERTScore for a list of input and output text pairs.

Parameters:
  • input_texts – A string or a list of input text strings.

  • output_texts – A string or a list of output text strings.

  • model_id – Identifier of the HuggingFace BERT-style model to use.

Returns:

A dictionary containing BERTScore scores for each input-output pair. The dictionary will contain keys “precision”, “recall”, and “f1”.

tau_eval.metrics.cola.cola_score(text: str, cola_tokenizer: PreTrainedTokenizer, cola_model: PreTrainedModel, device: str = 'cuda') → float

Calculates the CoLA score for a single piece of text.

Parameters:
  • text – The text to score.

  • cola_tokenizer – The tokenizer for the CoLA model.

  • cola_model – The CoLA model.

  • device – The device to run the model on (“cuda” or “cpu”).

Returns:

The CoLA score (a float between 0 and 1).

tau_eval.metrics.cola.compute_cola(output_texts: str | list[str], cola_tokenizer: PreTrainedTokenizer, cola_model: PreTrainedModel, device: str = 'cuda') → dict[str, list[float]]

Computes CoLA scores for one or more output texts.

Parameters:
  • output_texts – A string or a list of text strings to score.

  • cola_tokenizer – The tokenizer for the CoLA model.

  • cola_model – The CoLA model.

  • device – The device to run the model on (“cuda” or “cpu”).

Returns:

A dictionary containing a CoLA score for each output text.

tau_eval.metrics.cola.load_cola(model_name: str = 'textattack/roberta-base-CoLA', device: str = 'cuda') → tuple[PreTrainedModel, PreTrainedTokenizer]

Loads the CoLA (Corpus of Linguistic Acceptability) model and tokenizer.

Parameters:
  • model_name – HuggingFace model to load.

  • device – The device to load the model onto (“cuda” or “cpu”).

Returns:

A tuple containing the loaded model and tokenizer.
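A minimal usage sketch for the CoLA functions, based only on the signatures documented here. The wrapper name `score_fluency` is hypothetical, and the import is done lazily so that defining the helper does not itself require the package or a model download.

```python
def score_fluency(texts, device="cpu"):
    """Hypothetical helper: load the CoLA model once and score `texts`.

    `texts` may be a single string or a list of strings, matching the
    documented type of `output_texts`.
    """
    # Imported inside the function so this sketch can be defined without
    # tau_eval installed; calling it does require the package.
    from tau_eval.metrics.cola import compute_cola, load_cola

    # load_cola is documented to return (model, tokenizer), in that order.
    cola_model, cola_tokenizer = load_cola(device=device)

    # compute_cola takes the tokenizer before the model.
    return compute_cola(texts, cola_tokenizer, cola_model, device=device)
```

Calling `score_fluency(["The cat sat on the mat."])` would return the dictionary described under compute_cola. Note that load_cola returns the model first, whereas load_nli is documented to return the tokenizer first, so the two tuples should not be unpacked the same way.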

Sentence transformers version of LUAR

tau_eval.metrics.luar.compute_luar(input_texts: str | list[str], output_texts: str | list[str], sim_model: SentenceTransformer) → dict[str, list[float]]

Computes LUAR scores based on cosine similarity between the embeddings of original and rewritten texts.

Parameters:
  • input_texts – A string or a list of original texts.

  • output_texts – A string or a list of rewritten texts.

  • sim_model – The loaded SentenceTransformer model.

Returns:

A dictionary containing the LUAR scores for each pair of input texts. The dictionary has the key “luar” with a list of float values.

tau_eval.metrics.luar.load_luar(model_name: str = 'gabrielloiseau/LUAR-MUD-sentence-transformers', device: str = 'cuda') → SentenceTransformer

Loads the LUAR (Learning Universal Authorship Representations) sentence transformer model.

Parameters:
  • model_name – SentenceTransformers model to load.

  • device – The device to load the model onto (“cuda” or “cpu”).

Returns:

The loaded SentenceTransformer model.

tau_eval.metrics.meteor.compute_meteor(input_texts: str | list[str], output_texts: str | list[str], alpha: float = 0.9, beta: float = 3, gamma: float = 0.5) → dict[str, list[float]]

Computes METEOR scores for a list of input and output text pairs.

Parameters:
  • input_texts – A string or a list of input text strings.

  • output_texts – A string or a list of output text strings.

  • alpha – Parameter for controlling relative weights of precision and recall.

  • beta – Parameter for controlling shape of penalty function.

  • gamma – Relative weight of fragmentation penalty.

Returns:

A dictionary containing METEOR scores for each input-output pair.
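To make the roles of alpha, beta, and gamma concrete, the sketch below applies the standard METEOR formula to precomputed alignment counts. It assumes the same parameterization as NLTK's METEOR implementation, whose defaults match the ones above. Whether compute_meteor delegates to NLTK is an assumption, and the function name and count arguments are hypothetical.

```python
def meteor_from_counts(matches, hyp_len, ref_len, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    """Combine alignment counts into a METEOR score (illustrative only).

    matches: number of aligned unigrams between hypothesis and reference
    hyp_len: number of unigrams in the hypothesis (assumed: the output text)
    ref_len: number of unigrams in the reference (assumed: the input text)
    chunks:  number of contiguous aligned chunks
    """
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # alpha weights precision against recall in the harmonic mean.
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    # beta shapes the fragmentation penalty; gamma scales it.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean


# Four aligned words in a single contiguous chunk, equal lengths.
score = meteor_from_counts(matches=4, hyp_len=4, ref_len=4, chunks=1)
```

With one chunk the penalty is small, so the score stays close to the F-mean. Splitting the same four matches across more chunks raises the penalty and lowers the score.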

tau_eval.metrics.nli.compute_nli(input_texts: str | list[str], output_texts: str | list[str], nli_tokenizer: PreTrainedTokenizer, nli_model: PreTrainedModel, batch_size: int = 16, device: str = 'cuda', max_length: int = 128) → dict[str, list[float]]

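Computes the probability of entailment for each pair of input and output texts using the NLI model.

Parameters:
  • input_texts – A string or a list of premise texts.

  • output_texts – A string or a list of hypothesis texts.

  • nli_tokenizer – The tokenizer for the NLI model.

  • nli_model – The NLI model.

  • batch_size – The number of text pairs processed per batch.

  • device – The device to run the model on (“cuda” or “cpu”).

  • max_length – The maximum length, in tokens, of the tokenized input.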

Returns:

A dictionary containing the probability of entailment for each input-output pair. The dictionary has the key “entailment” with a list of float values.
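As an illustration of what an entailment probability means, the sketch below turns a classifier's raw logits into probabilities with a softmax and reads off one class. The function names are hypothetical, and the position of the entailment label depends on the model's label mapping, so the index used here is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entailment_probability(logits, entailment_index):
    """Probability assigned to the entailment class.

    entailment_index is model-specific; look it up in the model's label
    mapping rather than hard-coding it.
    """
    return softmax(logits)[entailment_index]


# Assumed layout for this example only: index 0 is the entailment class.
prob = entailment_probability([2.0, 0.0, 0.0], entailment_index=0)
```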

tau_eval.metrics.nli.load_nli(model_name: str = 'alisawuffles/roberta-large-wanli', device: str = 'cuda') → tuple[PreTrainedTokenizer, PreTrainedModel]

Loads the NLI (Natural Language Inference) model and tokenizer.

Parameters:
  • model_name – HuggingFace model to load.

  • device – The device to load the model onto (“cuda” or “cpu”).

Returns:

A tuple containing the loaded tokenizer and model.

tau_eval.metrics.perplexity.compute_perplexity(output_texts: str | list[str], model_id: str = 'gpt2') → dict[str, list[float]]

Computes perplexity scores for a list of output texts.

Parameters:
  • output_texts – A string or list of output text strings.

  • model_id – HuggingFace model to use.

Returns:

A dictionary containing a perplexity score for each output text.
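Perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below shows that relationship on precomputed token log-probabilities. It does not reproduce how compute_perplexity obtains them from the language model, and the function name is hypothetical.

```python
import math


def perplexity_from_log_probs(token_log_probs):
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity 4:
# it is as uncertain as a uniform choice among four options.
ppl = perplexity_from_log_probs([math.log(0.25)] * 3)
```

Lower values mean the language model finds the text more predictable.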

tau_eval.metrics.rouge.compute_rouge(input_texts: str | list[str], output_texts: str | list[str]) → dict[str, list[float]]

Computes ROUGE scores for a list of input and output text pairs.

Parameters:
  • input_texts – A string or a list of input text strings.

  • output_texts – A string or a list of output text strings.

Returns:

A dictionary containing ROUGE scores for each input-output pair.
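This reference does not state which ROUGE variants compute_rouge returns. As a rough illustration of the family, the sketch below computes ROUGE-1 F1 (unigram overlap) with plain whitespace tokenization. The function name is hypothetical, and real implementations usually add stemming and more careful tokenization.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate string."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Two of three unigrams are shared, so precision = recall = F1 = 2/3.
f1 = rouge1_f1("the cat sat", "the cat ran")
```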

tau_eval.metrics.sbert.compute_sbert(input_texts: str | list[str], output_texts: str | list[str], sim_model: SentenceTransformer) → dict[str, list[float]]

Computes the cosine similarity between the embeddings of original and rewritten texts.

Parameters:
  • input_texts – A string or a list of original texts.

  • output_texts – A string or a list of rewritten texts.

  • sim_model – The loaded SentenceTransformer model.

Returns:

A dictionary containing the similarity scores for each input text pair. The dictionary has the key “similarity” with a list of float values.
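The pairwise scoring can be sketched on precomputed embedding vectors as follows. The sketch assumes the i-th original is compared only with the i-th rewrite, and the helper names are hypothetical. In the library the embeddings would come from the SentenceTransformer model rather than being supplied by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def pairwise_similarity(original_embeddings, rewrite_embeddings):
    """Score the i-th original against the i-th rewrite."""
    scores = [
        cosine_similarity(u, v)
        for u, v in zip(original_embeddings, rewrite_embeddings)
    ]
    return {"similarity": scores}


result = pairwise_similarity([[1.0, 0.0], [1.0, 0.0]],
                             [[1.0, 0.0], [0.0, 1.0]])
```

Identical directions score 1.0 and orthogonal vectors score 0.0, so higher values indicate that a rewrite stays closer to its original in embedding space.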

tau_eval.metrics.sbert.load_sbert(model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', device: str = 'cuda') → SentenceTransformer

Loads the sentence similarity model.

Parameters:
  • model_name – SentenceTransformers model to load.

  • device – The device to load the model onto (“cuda” or “cpu”).

Returns:

The loaded SentenceTransformer model.