tau_eval.tasks.deidentification module
- class tau_eval.tasks.deidentification.DeIdentification(dataset: datasets.arrow_dataset.Dataset = None, name: str = '', s1: str = 'text', s2: str = '', max_rows: int = None)[source]
Bases: CustomTask
- dataset: Dataset = None
- max_rows: int = None
- name: str = ''
- tau_eval.tasks.deidentification.dataset_task_preprocessing(dataset_name: str, dataset_size: int = 2500) → Dataset[source]
- tau_eval.tasks.deidentification.extract_non_o_words(tokens, tags)[source]
Extract words associated with non-“O” tags by merging tokens.
- Parameters:
tokens (list of str) – The list of tokens.
tags (list of str) – The list of tags corresponding to each token.
- Returns:
A dictionary whose keys are the tags (e.g., “B-NAME”, “B-EMAIL”) and whose values are the merged words for each tag.
- Return type:
dict
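
The merging described above can be illustrated with a minimal sketch. This is not the tau_eval implementation: `merge_non_o_tokens` is a hypothetical stand-in, and it assumes BIO-style tags, that an "I-" token extends the most recent span with the same label, that merged tokens are joined with a single space, and that each key maps to a list of merged words. The real function may differ on any of these points.

```python
from collections import defaultdict


def merge_non_o_tokens(tokens, tags):
    """Hypothetical stand-in for extract_non_o_words (assumed behavior).

    Groups tokens whose tag is not "O" into merged words, keyed by the
    "B-" form of the tag.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans = defaultdict(list)
    open_label = None  # label of the span currently being extended, if any

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # An "O" tag closes whatever span was open.
            open_label = None
            continue

        prefix, _, label = tag.partition("-")
        label = label or prefix  # tolerate tags written without a B-/I- prefix
        key = f"B-{label}"

        if prefix == "I" and open_label == label:
            # Continuation token: append to the word started by the last "B-" tag.
            spans[key][-1] += " " + token
        else:
            # Start a new word under this tag.
            spans[key].append(token)
            open_label = label

    return dict(spans)


tokens = ["Contact", "Jane", "Doe", "at", "jane@example.com"]
tags = ["O", "B-NAME", "I-NAME", "O", "B-EMAIL"]
result = merge_non_o_tokens(tokens, tags)
print(result)
```

Under these assumptions the example prints `{'B-NAME': ['Jane Doe'], 'B-EMAIL': ['jane@example.com']}`. Storing a list per tag keeps repeated entities of the same type separate, which is a design choice of this sketch rather than something the documentation above specifies.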