
How can I make sentence-BERT throw an exception if the text exceeds max_seq_length, and what is the max possible max_seq_length for all-MiniLM-L6-v2?

I'm using sentence-BERT from Huggingface in the following way:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
model.max_seq_length = 512
model.encode(text)

When text is long and contains more than 512 tokens, it does not throw an exception. I assume it automatically truncates the input to 512 tokens.

How can I make it throw an exception when the input length is larger than max_seq_length?

Further, what is the maximum possible max_seq_length for all-MiniLM-L6-v2?

First of all, note that the sentence transformer supports a different sequence length than the underlying transformer. You can check both values with:

# that's the sentence transformer
print(model.max_seq_length)
# that's the underlying transformer
print(model[0].auto_model.config.max_position_embeddings)

Output:

256
512

That means the position embedding layer of the underlying transformer covers 512 positions, so 512 is the largest value max_seq_length can take, but the sentence transformer only uses, and was only trained with, the first 256 of them. You should therefore be careful about increasing the value above 256. It will work from a technical perspective, but the position embedding weights beyond position 256 are not properly trained and can therefore mess up your results. Please also check this StackOverflow post.

Regarding throwing an exception: I think the library does not offer that, so you have to write a workaround yourself:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
my_text = "this is a test " * 1000

try:
    # tokenize without truncation; input_ids includes the special tokens
    o = model[0].tokenizer(my_text, return_attention_mask=False, return_token_type_ids=False)
    if len(o.input_ids) > model.max_seq_length:
        raise ValueError("Oh no!")
except ValueError:
    # handle the over-long input here
    ...

model.encode(my_text)
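The workaround above can also be wrapped in a small reusable helper. The following is only a sketch: the names InputTooLongError, check_length and encode_strict are hypothetical and not part of sentence-transformers. It assumes the tokenizer is a callable that returns a mapping with an "input_ids" list, as Hugging Face tokenizers do, and that the model exposes model[0].tokenizer and model.max_seq_length as in the snippets above.

```python
class InputTooLongError(ValueError):
    """Hypothetical error type: the text has more tokens than the model accepts."""


def check_length(tokenizer, text, max_seq_length):
    """Return the token count of `text`, or raise if it exceeds `max_seq_length`.

    `tokenizer` is assumed to be a callable returning a mapping with an
    "input_ids" list (Hugging Face tokenizers behave this way).
    """
    # No truncation is requested, so the full token sequence is returned.
    encoded = tokenizer(
        text,
        return_attention_mask=False,
        return_token_type_ids=False,
    )
    n_tokens = len(encoded["input_ids"])
    if n_tokens > max_seq_length:
        raise InputTooLongError(
            f"input has {n_tokens} tokens, limit is {max_seq_length}"
        )
    return n_tokens


def encode_strict(model, text):
    """Encode `text`, but refuse input that would otherwise be truncated.

    `model` is assumed to be a SentenceTransformer whose first module
    exposes a `tokenizer` attribute, as in the answer above.
    """
    check_length(model[0].tokenizer, text, model.max_seq_length)
    return model.encode(text)
```

With the model and my_text from the workaround above, encode_strict(model, my_text) would raise InputTooLongError instead of silently truncating, while short inputs are passed on to model.encode unchanged. Because InputTooLongError subclasses ValueError, an existing except ValueError handler would still catch it.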
