alexandra_ai_eval
Evaluation of Finetuned Models
Quickstart
To install the package simply write the following command in your favorite terminal:
pip install alexandra-ai-eval
Benchmarking from the Command Line
The easiest way to benchmark pretrained models is via the command line interface. After having installed the package, you can benchmark your favorite model like so:
evaluate --model-id <model_id> --task <task>
Here model_id is the HuggingFace model ID, which can be found on the HuggingFace Hub, and task is the task you want to benchmark the model on, such as "ner" for named entity recognition. See all options by typing
evaluate --help
A specific model version can be selected by appending '@' followed by a revision to the model ID:
evaluate --model-id <model_id>@<commit>
The revision can be a branch name, a tag name, or a commit ID. It defaults to 'main', which is the latest version.
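To make the '@' convention concrete, here is a minimal sketch of how such an identifier could be split into a model ID and a revision. The helper `split_model_id` is hypothetical and written only for illustration; it is not taken from the package and may differ from how the package actually parses the argument.

```python
from typing import Tuple


def split_model_id(model_id: str, default_revision: str = "main") -> Tuple[str, str]:
    """Split '<model_id>@<revision>' into its two parts.

    Hypothetical helper for illustration only. If no '@' is present, or
    nothing follows it, the default revision is returned instead.
    """
    name, separator, revision = model_id.partition("@")
    if not separator or not revision:
        return name, default_revision
    return name, revision


if __name__ == "__main__":
    # The placeholders mirror the ones used in the commands above
    print(split_model_id("<model_id>@<commit>"))
    print(split_model_id("<model_id>"))
```

The default of 'main' mirrors the behaviour described above; everything else about the helper is an assumption.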
Multiple models and tasks can be specified by repeating the corresponding argument. Here is an example with two models:
evaluate --model-id <model_id1> --model-id <model_id2> --task ner
See all the arguments and options available for the evaluate command by typing
evaluate --help
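When the list of models or tasks comes from a script, the repeated-argument pattern can be assembled programmatically. The sketch below is only an illustration: `build_evaluate_command` is a hypothetical helper, and it assumes the hyphenated `--model-id` spelling from the first example together with the `--task` option.

```python
import shlex
from typing import List, Sequence


def build_evaluate_command(model_ids: Sequence[str], tasks: Sequence[str]) -> List[str]:
    """Build an argument list for the `evaluate` command.

    Hypothetical helper: each model and each task gets its own repeated
    option, matching the two-model example above.
    """
    if not model_ids or not tasks:
        raise ValueError("At least one model ID and one task are required.")
    command = ["evaluate"]
    for model_id in model_ids:
        command += ["--model-id", model_id]
    for task in tasks:
        command += ["--task", task]
    return command


if __name__ == "__main__":
    command = build_evaluate_command(["<model_id1>", "<model_id2>"], ["ner"])
    # shlex.join quotes each argument so the line is safe to paste into a shell
    print(shlex.join(command))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.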
Benchmarking from a Script
In a script, the syntax is similar to the command line interface. You simply initialise an object of the Evaluator class, and call this evaluator object with your favorite models and/or datasets:
>>> from alexandra_ai_eval import Evaluator
>>> evaluator = Evaluator()
>>> evaluator('<model_id>', '<task>')
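To benchmark several models on several tasks from a script, one option is to loop over the combinations and call the evaluator once per pair. The sketch below assumes only what the snippet above shows, namely that the evaluator is callable with a model ID and a task. The helper `evaluate_all` is hypothetical, and since the return value of the call is not documented here, it is stored as-is.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def evaluate_all(
    evaluator: Callable[[str, str], Any],
    model_ids: Iterable[str],
    tasks: Iterable[str],
) -> Dict[Tuple[str, str], Any]:
    """Call `evaluator(model_id, task)` for every model/task combination.

    Hypothetical helper: results are keyed by (model_id, task) and kept
    exactly as the evaluator returned them.
    """
    task_list = list(tasks)  # materialise so the tasks can be reused per model
    results: Dict[Tuple[str, str], Any] = {}
    for model_id in model_ids:
        for task in task_list:
            results[(model_id, task)] = evaluator(model_id, task)
    return results


# Intended usage (requires the package to be installed):
#     from alexandra_ai_eval import Evaluator
#     results = evaluate_all(Evaluator(), ["<model_id1>", "<model_id2>"], ["ner"])
```

Passing the evaluator in as an argument keeps the loop easy to try out with a stand-in function before running real benchmarks.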
Contributors
If you feel like this package is missing a crucial feature, if you encounter a bug or if you just want to correct a typo in this readme file, then we urge you to join the community! Have a look at the CONTRIBUTING.md file, where you can check out all the ways you can contribute to this package. :sparkles:
- _Your name here?_ :tada:
Maintainers
The following are the core maintainers of the alexandra_ai_eval package:
- @saattrupdan (Dan Saattrup Nielsen; saattrupdan@alexandra.dk)
- @AJDERS (Anders Jess Pedersen; anders.j.pedersen@alexandra.dk)
Project structure
.
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── workflows
│       ├── ci.yaml
│       └── docs.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── makefile
├── poetry.toml
├── pyproject.toml
├── src
│   ├── alexandra_ai_eval
│   │   ├── __init__.py
│   │   ├── automatic_speech_recognition.py
│   │   ├── cli.py
│   │   ├── co2.py
│   │   ├── config.py
│   │   ├── country_codes.py
│   │   ├── enums.py
│   │   ├── evaluator.py
│   │   ├── exceptions.py
│   │   ├── gui.py
│   │   ├── hf_hub_utils.py
│   │   ├── leaderboard_utils.py
│   │   ├── local_hf_utils.py
│   │   ├── local_pytorch_utils.py
│   │   ├── metric_configs.py
│   │   ├── model_adjustment.py
│   │   ├── model_loading.py
│   │   ├── named_entity_recognition.py
│   │   ├── question_answering.py
│   │   ├── scoring.py
│   │   ├── sequence_classification.py
│   │   ├── spacy_utils.py
│   │   ├── task.py
│   │   ├── task_configs.py
│   │   ├── task_factory.py
│   │   └── utils.py
│   └── scripts
│       ├── add_models_to_leaderboard.py
│       ├── fix_dot_env_file.py
│       └── versioning.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_cli.py
    ├── test_co2.py
    ├── test_config.py
    ├── test_country_codes.py
    ├── test_enums.py
    ├── test_evaluator.py
    ├── test_exceptions.py
    ├── test_gui.py
    ├── test_hf_hub_utils.py
    ├── test_leaderboard_utils.py
    ├── test_local_hf_utils.py
    ├── test_local_pytorch_utils.py
    ├── test_metric_configs.py
    ├── test_model_adjustment.py
    ├── test_model_loading.py
    ├── test_named_entity_recognition.py
    ├── test_question_answering.py
    ├── test_scoring.py
    ├── test_sequence_classification.py
    ├── test_spacy_utils.py
    ├── test_task.py
    ├── test_task_configs.py
    ├── test_task_factory.py
    └── test_utils.py
"""
.. include:: ../../README.md
"""

import logging
import os

import colorama
import pkg_resources
from termcolor import colored

from .evaluator import Evaluator  # noqa
from .utils import block_terminal_output

# Fetches the version of the package as defined in pyproject.toml
__version__ = pkg_resources.get_distribution("alexandra_ai_eval").version


# Block unwanted terminal outputs
block_terminal_output()


# Ensure that termcolor also works on Windows
colorama.init()


# Set up logging
fmt = colored("%(asctime)s", "light_blue") + " ⋅ " + colored("%(message)s", "green")
logging.basicConfig(level=logging.INFO, format=fmt, datefmt="%Y-%m-%d %H:%M:%S")


# Disable parallelisation when tokenizing, as that can lead to errors
os.environ["TOKENIZERS_PARALLELISM"] = "false"


# Enable MPS fallback to CPU
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


# Tell Windows machines to use UTF-8 encoding
os.environ["ConEmuDefaultCp"] = "65001"
os.environ["PYTHONIOENCODING"] = "UTF-8"
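
The version lookup above relies on `pkg_resources`, which is deprecated in recent setuptools releases. As a possible alternative, the sketch below reads the installed version through the standard library's `importlib.metadata` (Python 3.8+). The helper name `get_package_version` and the `"0.0.0"` fallback are assumptions made for illustration, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def get_package_version(distribution_name: str, default: str = "0.0.0") -> str:
    """Return the installed version of a distribution, or a fallback.

    Hypothetical helper: the fallback covers the case where the package is
    imported from a source checkout without being installed.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return default


if __name__ == "__main__":
    print(get_package_version("alexandra_ai_eval"))
```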