Add `text-ranking` pipeline tag #1267

tomaarsen · 2025-03-11T10:45:48Z

Hello!

Pull Request overview

Add text-ranking pipeline tag
Slightly update the docs for sentence-similarity

Details

This PR adds a text-ranking pipeline tag for reranker models, e.g.:

from sentence_transformers import CrossEncoder

# 1. Load a pre-trained CrossEncoder model
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

# 2a. Either: predict scores for a pair of sentences
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
# => array([ 8.607138 , -4.3200774], dtype=float32)

# 2b. Or: rank a list of passages for a query
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
    "In 2014, the city state Berlin had 37,368 live births (+6.6%), a record number since 1991.",
    "The urban area of Berlin comprised about 4.1 million people in 2014, making it the seventh most populous urban area in the European Union.",
    "The city of Paris had a population of 2,165,423 people within its administrative city limits as of January 1, 2019",
    "An estimated 300,000-420,000 Muslims reside in Berlin, making up about 8-11 percent of the population.",
    "Berlin is subdivided into 12 boroughs or districts (Bezirke).",
    "In 2015, the total labour force in Berlin was 1.85 million.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
    "Berlin has a yearly total of about 135 million day visitors, which puts it in third place among the most-visited city destinations in the European Union.",
]
ranks = model.rank(query, passages)

# Print the scores
print("Query:", query)
for rank in ranks:
    print(f"{rank['score']:.2f}\t{passages[rank['corpus_id']]}")
"""
Query: How many people live in Berlin?
8.92    The urban area of Berlin comprised about 4.1 million people in 2014, making it the seventh most populous urban area in the European Union.
8.61    Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.
8.24    An estimated 300,000-420,000 Muslims reside in Berlin, making up about 8-11 percent of the population.
7.60    In 2014, the city state Berlin had 37,368 live births (+6.6%), a record number since 1991.
6.35    In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.
5.42    Berlin has a yearly total of about 135 million day visitors, which puts it in third place among the most-visited city destinations in the European Union.
3.45    In 2015, the total labour force in Berlin was 1.85 million.
0.33    Berlin is subdivided into 12 boroughs or districts (Bezirke).
-4.24   The city of Paris had a population of 2,165,423 people within its administrative city limits as of January 1, 2019
-4.32   Berlin is well known for its museums.
"""

I haven't created a spec for the API here, as I think that's better left to those who've created other specs. I think we might already have a Sentence Ranking API that we might not want to break.

This is slightly blocking the next Sentence Transformers release, as I'd like to know whether I can tag CrossEncoder (a.k.a. reranker) models as text-ranking.
Related to this PR: https://github.com/huggingface-internal/moon-landing/pull/12877 (private repo).

Tom Aarsen

packages/tasks/src/pipelines.ts

I forgot to replace these when I locally renamed the tag from `sentence-ranking` to `text-ranking`

Wauplin

👍 on my side but let's wait for an approval from @merveenoyan or @pcuenca who are more used to updating the tasks files.

Wauplin · 2025-03-13T09:23:45Z

packages/tasks/src/tasks/text-ranking/data.ts

+			id: "microsoft/ms_marco",
+		},
+	],
+	demo: {


Where is the "demo" data used? The inputs don't seem to follow the API you've described in the PR description (with query: str + passages: List[str]).

There is no API for text-ranking yet, nor consensus on what the spec should be. Because of this, I adopted the format from https://huggingface.co/tasks/sentence-similarity

I was under the impression that this was only used to format this box:

I can definitely remove the demo section if you prefer.

having the one you provided makes sense imo @tomaarsen

But how will it render?

No idea, I can't get moon-landing to work anymore, not with the "Easy mode" nor with the default one. Not with Windows, not with WSL, not with Docker, not with local builds. I'll try again some other time.

we can make sure it works after merging (when updating the dependency in moon-landing), no worries 👍

Sounds good, thanks

merveenoyan

thanks a lot (also for adding task page!)

packages/tasks/src/pipelines.ts

packages/tasks/src/tasks/text-ranking/about.md

merveenoyan · 2025-03-17T09:50:41Z

packages/tasks/src/tasks/text-ranking/data.ts

+			id: "microsoft/ms_marco",
+		},
+	],
+	demo: {


having the one you provided makes sense imo @tomaarsen

packages/tasks/src/tasks/text-ranking/about.md

pcuenca · 2025-03-17T10:26:50Z

packages/tasks/src/tasks/text-ranking/data.ts

+			id: "microsoft/ms_marco",
+		},
+	],
+	demo: {


But how will it render?

pcuenca · 2025-03-17T10:29:59Z

packages/tasks/src/tasks/text-ranking/about.md

+-4.24   The city of Paris had a population of 2,165,423 people within its administrative city limits as of January 1, 2019
+-4.32   Berlin is well known for its museums.
+"""
+```


I suppose this will be faster, right?

You mean the model.rank vs model.predict? No, the model.rank is just a more convenient interface to the model. They're equally fast.
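
To illustrate why the two would be equally fast: ranking amounts to scoring every (query, passage) pair once and then sorting the scores. Below is a minimal sketch of that idea, not the actual Sentence Transformers implementation. `rank_with_predict` and `toy_predict` are hypothetical names introduced here, and `toy_predict` is only a stand-in for a real `model.predict`; the `corpus_id` and `score` keys mirror the output shown in the PR description.

```python
from typing import Callable, Dict, List, Sequence, Tuple, Union


def rank_with_predict(
    predict: Callable[[List[Tuple[str, str]]], Sequence[float]],
    query: str,
    passages: List[str],
) -> List[Dict[str, Union[int, float]]]:
    """Score every (query, passage) pair once, then sort by score, highest first."""
    # Build one pair per passage; this is the same input shape predict() takes.
    pairs = [(query, passage) for passage in passages]
    scores = predict(pairs)
    # Keep the original passage index so callers can look the text up again.
    results = [{"corpus_id": i, "score": float(s)} for i, s in enumerate(scores)]
    return sorted(results, key=lambda r: r["score"], reverse=True)


def toy_predict(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in scorer: counts words shared by query and passage."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]


if __name__ == "__main__":
    query = "How many people live in Berlin?"
    passages = [
        "Berlin is well known for its museums.",
        "How many people live in Berlin is a common question.",
    ]
    for hit in rank_with_predict(toy_predict, query, passages):
        print(f"{hit['score']:.2f}\t{passages[hit['corpus_id']]}")
```

With a real CrossEncoder, passing `model.predict` as the `predict` argument should give the same ordering as `model.rank`, since the only extra work is the sort.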

tomaarsen added 3 commits March 11, 2025 11:42

Add text-ranking pipeline tag

5053f21

Update sentence-similarity docs slightly

03ff709

Update model names following rename

c1900c5

tomaarsen requested review from SBrandeis, gary149, Wauplin, julien-c, pcuenca and ngxson as code owners March 11, 2025 10:45

Wauplin reviewed Mar 11, 2025

Sentence Ranking -> Text Ranking

4865124

tomaarsen requested a review from Wauplin March 13, 2025 09:09

Wauplin reviewed Mar 13, 2025

merveenoyan approved these changes Mar 17, 2025

pcuenca approved these changes Mar 17, 2025

Expanded on docs; clearer example; more links to docs; explain why a …

49f7cd1

…reranker

julien-c approved these changes Mar 18, 2025

tomaarsen merged commit 1540c48 into huggingface:main Mar 18, 2025
4 checks passed
