tau_eval.visualization module
- tau_eval.visualization.compute_correlation(data, metric1_name: str, metric2_name: str, method: str = 'kendall', aggregate_across_tasks: bool = False, task_name: str | None = None) -> dict
Computes the correlation between two metrics across models.
- Parameters:
data – Data structure containing metrics for models across tasks
metric1_name – Name of the first metric
metric2_name – Name of the second metric
method – Correlation method (‘kendall’, ‘pearson’, ‘spearman’)
aggregate_across_tasks – If True, compute a single correlation across all tasks
task_name – Specific task name to compute correlation (ignored if aggregate_across_tasks=True)
- Returns:
dict of {task_name: (correlation, p_value)} for all tasks
- Return type:
dict
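The per-task Kendall case can be sketched in plain Python. This assumes a hypothetical data layout, {task_name: {"original_metrics": {...}, model_name: {metric_name: value}}}, because the real Experiment JSON schema is not documented here; `kendall_tau_a` and `per_task_correlation` are illustrative names, not part of tau_eval. The sketch computes Kendall's tau-a without tie correction and a normal-approximation p-value, so its p-values will differ from library routines such as `scipy.stats.kendalltau`, which defaults to tau-b and uses exact p-values for small samples without ties.

```python
import math
from itertools import combinations

# Hypothetical layout (an assumption, not the documented Experiment schema):
# {task_name: {"original_metrics": {...}, model_name: {metric_name: value}}}
data = {
    "task_a": {
        "original_metrics": {"test_f1": 0.91},
        "model_1": {"test_f1": 0.80, "sbert": 0.70},
        "model_2": {"test_f1": 0.85, "sbert": 0.75},
        "model_3": {"test_f1": 0.90, "sbert": 0.95},
    },
}


def kendall_tau_a(xs, ys):
    """Kendall's tau-a with a two-sided normal-approximation p-value (no tie correction)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    # Variance of tau under the null hypothesis of independence.
    variance = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return tau, p_value


def per_task_correlation(data, metric1_name, metric2_name):
    """Return {task_name: (correlation, p_value)} over models reporting both metrics."""
    result = {}
    for task_name, entries in data.items():
        pairs = [
            (metrics[metric1_name], metrics[metric2_name])
            for name, metrics in entries.items()
            if name != "original_metrics"
            and metric1_name in metrics
            and metric2_name in metrics
        ]
        if len(pairs) >= 2:  # a correlation needs at least two models
            xs, ys = zip(*pairs)
            result[task_name] = kendall_tau_a(xs, ys)
    return result


print(per_task_correlation(data, "test_f1", "sbert"))
```

Each task maps to a (correlation, p_value) tuple, mirroring the documented return shape; the original-text scores are left out because they do not belong to any model.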
- tau_eval.visualization.get_all_dataset_names(data)
Extracts all dataset names from the Experiment data.
- tau_eval.visualization.get_all_model_method_names(data)
Extracts unique model/method names across all datasets.
- tau_eval.visualization.get_all_numeric_metric_names(data)
Extracts all unique numeric metric names present in model/method results or in ‘original_metrics’.
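The three name-extraction helpers can be approximated as follows. This is a sketch under an assumed layout, {dataset_name: {"original_metrics": {...}, model_name: {metric_name: value}}}; the helper names are hypothetical, and the real functions may read a different structure. It shows the two filters the descriptions imply: skipping the "original_metrics" key when listing models, and keeping only numeric values when listing metrics.

```python
# Hypothetical layout (an assumption, not the documented Experiment schema):
# {dataset_name: {"original_metrics": {...}, model_name: {metric_name: value}}}
data = {
    "dataset_a": {
        "original_metrics": {"test_accuracy": 0.90},
        "model_1": {"test_accuracy": 0.84, "bertscore_f1": 0.88, "notes": "run 1"},
    },
    "dataset_b": {
        "original_metrics": {"test_accuracy": 0.75},
        "model_2": {"test_accuracy": 0.70, "rougeL": 0.61},
    },
}


def dataset_names(data):
    """Top-level keys are treated as dataset names."""
    return list(data)


def model_method_names(data):
    """Unique model/method keys across datasets, excluding 'original_metrics'."""
    names = set()
    for entries in data.values():
        names.update(name for name in entries if name != "original_metrics")
    return sorted(names)


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def numeric_metric_names(data):
    """Metric names with numeric values in model results or 'original_metrics'."""
    names = set()
    for entries in data.values():
        for metrics in entries.values():
            names.update(k for k, v in metrics.items() if _is_number(v))
    return sorted(names)


print(dataset_names(data))         # → ['dataset_a', 'dataset_b']
print(model_method_names(data))    # → ['model_1', 'model_2']
print(numeric_metric_names(data))  # → ['bertscore_f1', 'rougeL', 'test_accuracy']
```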
- tau_eval.visualization.plot_all_metrics_for_model_on_dataset(data, dataset_name, model_name)
Plots all numeric metrics for a specific model on a specific dataset.
- Parameters:
data – The parsed JSON data from Experiment.
dataset_name – The name of the dataset.
model_name – The name of the model (e.g., ‘google/gemini-flash-1.5-8b’). Pass ‘Original Model’ to plot the metrics computed on the original texts.
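The "Original Model" convention can be illustrated by the data-selection step that precedes drawing. This sketch assumes that original-text scores live under a per-dataset "original_metrics" key (a hypothetical layout), and `metrics_for_bar_chart` is an illustrative name; the plotting call itself is omitted.

```python
ORIGINAL_LABEL = "Original Model"

# Hypothetical layout (an assumption, not the documented Experiment schema).
data = {
    "dataset_a": {
        "original_metrics": {"test_accuracy": 0.90, "source": "raw"},
        "google/gemini-flash-1.5-8b": {"test_accuracy": 0.84, "sbert": 0.88},
    },
}


def metrics_for_bar_chart(data, dataset_name, model_name):
    """Return sorted (metric, value) pairs that a bar chart would display."""
    entries = data[dataset_name]
    # "Original Model" is mapped onto the per-dataset 'original_metrics' entry.
    key = "original_metrics" if model_name == ORIGINAL_LABEL else model_name
    metrics = entries.get(key, {})
    # Keep numeric values only; strings such as "raw" cannot be drawn as bars.
    return sorted(
        (name, value)
        for name, value in metrics.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )


print(metrics_for_bar_chart(data, "dataset_a", "Original Model"))  # → [('test_accuracy', 0.9)]
```

The resulting pairs could then be handed to Matplotlib's `Axes.bar`, with metric names on one axis and values on the other.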
- tau_eval.visualization.plot_metric_comparison_across_datasets(data, metric_name, specific_models=None, show_original=True)
Compares a specific metric for selected models across all datasets.
- Parameters:
data – The parsed JSON data from Experiment.
metric_name – The metric to compare (e.g., ‘bertscore_f1’, ‘test_accuracy’).
specific_models (optional) – A list of model names to include. If None, includes all found models.
show_original – If True, includes ‘Original Model’ performance for the metric.
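How `specific_models` and `show_original` might shape the plotted series is sketched below. The nested-dict layout and the function name `metric_by_model_and_dataset` are assumptions for illustration; the output is one series per model label, keyed by dataset, which is the shape a grouped bar or line chart needs.

```python
ORIGINAL_LABEL = "Original Model"

# Hypothetical layout (an assumption, not the documented Experiment schema).
data = {
    "dataset_a": {
        "original_metrics": {"test_accuracy": 0.90},
        "model_1": {"test_accuracy": 0.84},
        "model_2": {"test_accuracy": 0.80},
    },
    "dataset_b": {
        "original_metrics": {"test_accuracy": 0.75},
        "model_1": {"test_accuracy": 0.70},
    },
}


def metric_by_model_and_dataset(data, metric_name, specific_models=None, show_original=True):
    """Return {label: {dataset_name: value}} for one metric."""
    series = {}
    for dataset_name, entries in data.items():
        for key, metrics in entries.items():
            if key == "original_metrics":
                if not show_original:
                    continue
                label = ORIGINAL_LABEL
            elif specific_models is not None and key not in specific_models:
                continue  # model not selected
            else:
                label = key
            # Datasets where the metric is missing simply leave a gap in the series.
            if metric_name in metrics:
                series.setdefault(label, {})[dataset_name] = metrics[metric_name]
    return series


print(metric_by_model_and_dataset(data, "test_accuracy", specific_models=["model_1"]))
```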
- tau_eval.visualization.plot_metric_distribution(data, metric_name, chart_type='hist')
Plots the distribution of a specific metric across all models/methods and datasets.
- Parameters:
data – The parsed JSON data from Experiment.
metric_name – The metric whose distribution is to be plotted.
chart_type – Type of chart: ‘hist’ for histogram, ‘box’ for box plot.
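The statistics behind the two chart types can be sketched with the standard library. This assumes the same hypothetical nested-dict layout and, as a further assumption, excludes original-text scores from the pool; the helper names are illustrative. `statistics.quantiles` with `method="inclusive"` matches NumPy's default linear percentile interpolation, which Matplotlib's box plots rely on.

```python
import statistics

# Hypothetical layout (an assumption, not the documented Experiment schema).
data = {
    "dataset_a": {
        "original_metrics": {"sbert": 1.0},
        "model_1": {"sbert": 0.1},
        "model_2": {"sbert": 0.2},
        "model_3": {"sbert": 0.3},
    },
    "dataset_b": {
        "model_1": {"sbert": 0.4},
        "model_2": {"sbert": 0.5},
    },
}


def metric_values(data, metric_name):
    """Pool one metric over all models/methods and datasets (original texts excluded)."""
    return [
        metrics[metric_name]
        for entries in data.values()
        for key, metrics in entries.items()
        if key != "original_metrics" and metric_name in metrics
    ]


def box_summary(values):
    """(min, Q1, median, Q3, max) for a box plot; needs at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def histogram_counts(values, bins=4):
    """Counts in equal-width bins spanning [min, max]; the top edge falls in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


values = metric_values(data, "sbert")
print(box_summary(values))
print(histogram_counts(values))
```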
- tau_eval.visualization.plot_radar_model_comparison(data, metric_name, model_list, ordered_dataset_keys)
Generates a single radar plot comparing the specified models/methods across datasets for a given metric.
- Parameters:
data – The parsed JSON data from Experiment.
metric_name – The metric to plot (e.g., ‘sbert’, ‘test_f1’).
model_list – A list of model/method names to include.
ordered_dataset_keys – List of dataset names, determining the order of axes on the radar.
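The role of `ordered_dataset_keys` is easiest to see in the geometry: each dataset gets one evenly spaced axis, in the given order, and each model's scores form a closed polygon. The sketch below computes those coordinates for one model; `radar_coordinates` and the NaN handling for missing scores are assumptions, not documented tau_eval behavior.

```python
import math


def radar_coordinates(values_by_dataset, ordered_dataset_keys):
    """Return closed (angles, values) lists for one model's polygon on a radar plot."""
    n = len(ordered_dataset_keys)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")
    # One axis per dataset, evenly spaced around the circle in the given order.
    angles = [2 * math.pi * i / n for i in range(n)]
    # Missing scores become NaN; plotting libraries typically leave a gap there.
    values = [values_by_dataset.get(key, math.nan) for key in ordered_dataset_keys]
    # Repeat the first point so the polygon closes.
    return angles + angles[:1], values + values[:1]


# Hypothetical scores for one model on one metric (e.g. 'sbert').
scores = {"dataset_a": 0.82, "dataset_b": 0.74, "dataset_c": 0.91}
angles, values = radar_coordinates(scores, ["dataset_a", "dataset_b", "dataset_c"])
print(angles, values)
```

To draw the polygons, the lists can be passed to `plot` and `fill` on a Matplotlib axis created with `projection="polar"`, once per entry in the model list.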
- tau_eval.visualization.plot_trade_off_metric(data, x_metric_name, y_metric_name, task_list=None, model_list=None)
Creates a scatter plot of a task performance metric vs. an anonymization/utility metric, with a legend for different model/method groups.
- Parameters:
data – The parsed JSON data from Experiment.
x_metric_name – Task performance metric for the x-axis (e.g., ‘test_f1’, ‘test_accuracy’).
y_metric_name – Anonymization/utility metric for the y-axis (e.g., ‘bertscore_f1’, ‘rougeL’, ‘sbert’).
model_list (optional) – Filter for specific models.
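The points behind the trade-off scatter can be gathered as shown below. Several details are assumptions: the nested-dict layout, treating `task_list` as a filter on top-level dataset keys (its meaning is not documented above), and grouping points by model name for the legend. `trade_off_points` is a hypothetical name.

```python
# Hypothetical layout (an assumption, not the documented Experiment schema).
data = {
    "dataset_a": {
        "original_metrics": {"test_f1": 0.91},
        "model_1": {"test_f1": 0.80, "sbert": 0.70},
        "model_2": {"test_f1": 0.85, "sbert": 0.75},
    },
    "dataset_b": {
        "model_1": {"test_f1": 0.60, "sbert": 0.65},
        "model_2": {"test_f1": 0.72},  # no sbert score, so this point is skipped
    },
}


def trade_off_points(data, x_metric_name, y_metric_name, task_list=None, model_list=None):
    """Return {model_name: [(x, y), ...]}, one point per task where both metrics exist."""
    points = {}
    for task_name, entries in data.items():
        # Assumption: task_list filters the top-level dataset keys.
        if task_list is not None and task_name not in task_list:
            continue
        for model_name, metrics in entries.items():
            if model_name == "original_metrics":
                continue
            if model_list is not None and model_name not in model_list:
                continue
            if x_metric_name in metrics and y_metric_name in metrics:
                points.setdefault(model_name, []).append(
                    (metrics[x_metric_name], metrics[y_metric_name])
                )
    return points


print(trade_off_points(data, "test_f1", "sbert"))
```

Each group could then become one `Axes.scatter` call with `label=model_name`, followed by `Axes.legend()` to produce the per-group legend.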