alexandra_ai_eval

Evaluation of Finetuned Models

Quickstart

To install the package, run the following command in your terminal:

pip install alexandra-ai-eval

Benchmarking from the Command Line

The easiest way to benchmark pretrained models is via the command line interface. After having installed the package, you can benchmark your favorite model like so:

evaluate --model-id <model_id> --task <task>

Here model_id is the HuggingFace model ID, which can be found on the HuggingFace Hub, and task is the task you want to benchmark the model on, such as "ner" for named entity recognition. See all options by typing

evaluate --help

A specific model version can be selected by appending '@' and a revision to the model ID:

evaluate --model-id <model_id>@<commit>

The revision can be a branch name, a tag name, or a commit ID. It defaults to 'main', which points to the latest version.
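
The '@' convention above can be sketched as a small parsing step. This is a minimal illustration only: `split_model_id` is a hypothetical helper, not a function from alexandra_ai_eval, and the package may handle edge cases differently.

```python
from typing import Tuple


def split_model_id(model_id: str, default_revision: str = "main") -> Tuple[str, str]:
    """Split '<model_id>@<revision>' into (model_id, revision).

    Hypothetical helper for illustration; not part of alexandra_ai_eval.
    """
    # partition splits on the first '@'; sep is '' when no '@' is present
    name, sep, revision = model_id.partition("@")
    if not sep or not revision:
        # No revision given, so fall back to the default branch
        return name, default_revision
    return name, revision


# The model IDs below are placeholders
print(split_model_id("some-org/some-model"))
print(split_model_id("some-org/some-model@v1.0"))
```

Under these assumptions, an ID without '@' resolves to the 'main' revision, matching the default described above.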

Multiple models and tasks can be specified by repeating the corresponding arguments. Here is an example with two models:

evaluate --model-id <model_id1> --model-id <model_id2> --task ner
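
When the command is assembled programmatically, the repeated arguments can be generated from lists. The sketch below assumes the hyphenated `--model-id` spelling from the first example; `build_evaluate_command` is a hypothetical helper and the model IDs are placeholders.

```python
import shlex
from typing import List, Sequence


def build_evaluate_command(model_ids: Sequence[str], tasks: Sequence[str]) -> List[str]:
    """Build the argument list for the `evaluate` CLI (hypothetical helper)."""
    command = ["evaluate"]
    # One flag per model, then one flag per task
    for model_id in model_ids:
        command += ["--model-id", model_id]
    for task in tasks:
        command += ["--task", task]
    return command


command = build_evaluate_command(["org/model-a", "org/model-b"], ["ner"])
print(shlex.join(command))

# To actually run it (requires the package to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if a model ID contains unusual characters.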

Benchmarking from a Script

In a script, the syntax is similar to the command line interface. You simply initialise an object of the Evaluator class, and call this evaluator object with your favorite models and/or datasets:

>>> from alexandra_ai_eval import Evaluator
>>> evaluator = Evaluator()
>>> evaluator('<model_id>', '<task>')
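
To benchmark several models on several tasks from a script, the single call above can be wrapped in a loop. This is a sketch under assumptions: `evaluate_all` is a hypothetical helper, it relies only on the `evaluator(model_id, task)` call shown above, and it stores whatever that call returns without assuming its type.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, Tuple


def evaluate_all(
    evaluator: Callable[[str, str], Any],
    model_ids: Iterable[str],
    tasks: Iterable[str],
) -> Dict[Tuple[str, str], Any]:
    """Call `evaluator(model_id, task)` for every combination and collect results."""
    results: Dict[Tuple[str, str], Any] = {}
    for model_id, task in product(model_ids, tasks):
        # Store whatever the evaluator returns, keyed by (model, task)
        results[(model_id, task)] = evaluator(model_id, task)
    return results


# Usage with the real class (requires `pip install alexandra-ai-eval`):
# from alexandra_ai_eval import Evaluator
# results = evaluate_all(Evaluator(), ["<model_id1>", "<model_id2>"], ["ner"])
```

Because the helper accepts any callable, it can be exercised with a stand-in function before running a full benchmark.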

Contributors

If you feel like this package is missing a crucial feature, if you encounter a bug or if you just want to correct a typo in this readme file, then we urge you to join the community! Have a look at the CONTRIBUTING.md file, where you can check out all the ways you can contribute to this package. :sparkles:

_Your name here?_ :tada:

Maintainers

The following are the core maintainers of the alexandra_ai_eval package:

@saattrupdan (Dan Saattrup Nielsen; saattrupdan@alexandra.dk)
@AJDERS (Anders Jess Pedersen; anders.j.pedersen@alexandra.dk)

Project structure

.
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── workflows
│       ├── ci.yaml
│       └── docs.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── makefile
├── poetry.toml
├── pyproject.toml
├── src
│   ├── alexandra_ai_eval
│   │   ├── __init__.py
│   │   ├── automatic_speech_recognition.py
│   │   ├── cli.py
│   │   ├── co2.py
│   │   ├── config.py
│   │   ├── country_codes.py
│   │   ├── enums.py
│   │   ├── evaluator.py
│   │   ├── exceptions.py
│   │   ├── gui.py
│   │   ├── hf_hub_utils.py
│   │   ├── leaderboard_utils.py
│   │   ├── local_hf_utils.py
│   │   ├── local_pytorch_utils.py
│   │   ├── metric_configs.py
│   │   ├── model_adjustment.py
│   │   ├── model_loading.py
│   │   ├── named_entity_recognition.py
│   │   ├── question_answering.py
│   │   ├── scoring.py
│   │   ├── sequence_classification.py
│   │   ├── spacy_utils.py
│   │   ├── task.py
│   │   ├── task_configs.py
│   │   ├── task_factory.py
│   │   └── utils.py
│   └── scripts
│       ├── add_models_to_leaderboard.py
│       ├── fix_dot_env_file.py
│       └── versioning.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_cli.py
    ├── test_co2.py
    ├── test_config.py
    ├── test_country_codes.py
    ├── test_enums.py
    ├── test_evaluator.py
    ├── test_exceptions.py
    ├── test_gui.py
    ├── test_hf_hub_utils.py
    ├── test_leaderboard_utils.py
    ├── test_local_hf_utils.py
    ├── test_local_pytorch_utils.py
    ├── test_metric_configs.py
    ├── test_model_adjustment.py
    ├── test_model_loading.py
    ├── test_named_entity_recognition.py
    ├── test_question_answering.py
    ├── test_scoring.py
    ├── test_sequence_classification.py
    ├── test_spacy_utils.py
    ├── test_task.py
    ├── test_task_configs.py
    ├── test_task_factory.py
    └── test_utils.py

"""
.. include:: ../../README.md
"""

import logging
import os

import colorama
import pkg_resources
from termcolor import colored

from .evaluator import Evaluator  # noqa
from .utils import block_terminal_output

# Fetches the version of the package as defined in pyproject.toml
__version__ = pkg_resources.get_distribution("alexandra_ai_eval").version


# Block unwanted terminal outputs
block_terminal_output()


# Ensure that termcolor also works on Windows
colorama.init()


# Set up logging
fmt = colored("%(asctime)s", "light_blue") + " ⋅ " + colored("%(message)s", "green")
logging.basicConfig(level=logging.INFO, format=fmt, datefmt="%Y-%m-%d %H:%M:%S")


# Disable parallelisation when tokenizing, as that can lead to errors
os.environ["TOKENIZERS_PARALLELISM"] = "false"


# Enable MPS fallback to CPU
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


# Tell Windows machines to use UTF-8 encoding
os.environ["ConEmuDefaultCp"] = "65001"
os.environ["PYTHONIOENCODING"] = "UTF-8"

fmt = '%(asctime)s ⋅ %(message)s'
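
To see what a log line produced with this format looks like, the uncoloured format string can be combined with the date format from the module source using only the standard library. This is an illustrative sketch: the logger name and message are made up, and the colour codes added by termcolor are left out.

```python
import logging

# Uncoloured format string and date format, as in the module source
fmt = "%(asctime)s ⋅ %(message)s"
datefmt = "%Y-%m-%d %H:%M:%S"
formatter = logging.Formatter(fmt=fmt, datefmt=datefmt)

# Build a record by hand so the example does not touch global logging state
record = logging.LogRecord(
    name="example",            # hypothetical logger name
    level=logging.INFO,
    pathname="example.py",     # hypothetical file name
    lineno=1,
    msg="Benchmarking started",
    args=None,
    exc_info=None,
)

line = formatter.format(record)
print(line)
```

The printed line consists of a timestamp, the "⋅" separator, and the message.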