Created by: Samoed
Code Quality
- Code Formatted: Format the code using make lint to maintain consistent style.
Documentation
- Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.
Testing
- New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
- Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
- I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
  - sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  - intfloat/multilingual-e5-small
- I have checked that the performance is neither trivial (both models achieve near-perfect scores) nor random (both models score close to chance).
- If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
- I have filled out the metadata object in the dataset file (find documentation on it here).
- Run tests locally to make sure nothing is broken using make test.
- Run the formatter to format the code using make lint.
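The "neither trivial nor random" check can be made concrete with a small helper. This is a hypothetical sketch, not part of mteb: the name check_score_range and its parameters are illustrative, it assumes each model's main score and the task's random baseline are on the same scale (e.g. accuracy in [0, 1]), and the 0.05 margin is an arbitrary tolerance chosen for the example.

```python
def check_score_range(scores, random_baseline, perfect=1.0, margin=0.05):
    """Classify a task as "trivial", "random", or "ok" from per-model main scores.

    Hypothetical helper: assumes scores, random_baseline and perfect share one
    scale (e.g. accuracy in [0, 1]); margin is an illustrative tolerance.
    """
    if not scores:
        raise ValueError("need at least one model score")
    # Every model is within `margin` of a perfect score: the task is too easy.
    if all(perfect - s <= margin for s in scores):
        return "trivial"
    # Every model is within `margin` of chance: the task may be broken or too hard.
    if all(abs(s - random_baseline) <= margin for s in scores):
        return "random"
    return "ok"
```

For example, with a random baseline of 0.5 for a balanced binary task, scores of 0.99 and 0.98 for the two models above would be flagged as trivial, while 0.71 and 0.78 would pass.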
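To illustrate what stratified subsampling does, here is a minimal pure-Python sketch. It is not mteb's self.stratified_subsampling() implementation; the function name stratified_subsample, its parameters, and the largest-remainder allocation are assumptions made for this example. It shrinks a split to n_samples examples while keeping each label's share of the data approximately unchanged.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed=42):
    """Return sorted indices of a size-n_samples subsample that keeps each
    label's proportion approximately unchanged.

    Hypothetical helper for illustration; not mteb's implementation.
    """
    if n_samples >= len(labels):
        # Nothing to cut: keep every example.
        return list(range(len(labels)))

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # Ideal (fractional) number of examples per label, proportional to its frequency.
    total = len(labels)
    quotas = {lab: len(idxs) * n_samples / total for lab, idxs in by_label.items()}
    counts = {lab: int(q) for lab, q in quotas.items()}

    # Largest-remainder rounding: leftover slots go to labels with the biggest fractional parts.
    remaining = n_samples - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda lab: quotas[lab] - counts[lab], reverse=True)
    for lab in by_remainder[:remaining]:
        counts[lab] += 1

    # Draw the allotted number of examples from each label with a seeded RNG.
    rng = random.Random(seed)
    chosen = []
    for lab, idxs in by_label.items():
        chosen.extend(rng.sample(idxs, counts[lab]))
    return sorted(chosen)
```

For instance, subsampling 1,000 examples split 800/200 between two labels down to 100 keeps 80 and 20 of each. In an actual task, the built-in self.stratified_subsampling() inside dataset_transform() is still the method to use; the sketch only shows the idea.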
Adding a model checklist
- I have filled out the ModelMeta object to the extent possible.
- I have ensured that my model can be loaded using mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision).
- I have tested that the implementation works on a representative set of tasks.