{"id":1018,"hash":"07507662b8826568f63cac752468fe8ccca4f8d59197b521907a9f8a90a24b58","pattern":"ValueError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] - Tokenizing BERT / Distilbert Error","full_message":"import pandas as pd\nfrom sklearn.model_selection import train_test_split\n\ndef split_data(path):\n  df = pd.read_csv(path)\n  return train_test_split(df, test_size=0.1, random_state=100)\n\ntrain, test = split_data(DATA_DIR)\ntrain_texts, train_labels = train['text'].to_list(), train['sentiment'].to_list()\ntest_texts, test_labels = test['text'].to_list(), test['sentiment'].to_list()\n\ntrain_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=0.1, random_state=100)\n\nfrom transformers import DistilBertTokenizerFast\ntokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')\n\ntrain_encodings = tokenizer(train_texts, truncation=True, padding=True)\nval_encodings = tokenizer(val_texts, truncation=True, padding=True)\ntest_encodings = tokenizer(test_texts, truncation=True, padding=True)\n\nWhen I tried to tokenize the texts split from the DataFrame with the BERT tokenizer, I got this error.","ecosystem":"pypi","package_name":"tokenize","package_version":null,"solution":"I had the same error. The problem was that my list contained None, and the tokenizer only accepts strings, e.g.:\n\nimport pandas as pd\nfrom transformers import DistilBertTokenizerFast\n\ntokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-german-cased')\n\n# create test dataframe\ntexts = ['Vero Moda Damen Übergangsmantel Kurzmantel Chic Business Coatigan SALE',\n         'Neu Herren Damen Sportschuhe Sneaker Turnschuhe Freizeit 1975 Schuhe Gr. 36-46',\n         'KOMBI-ANGEBOT Zuckerpaste STRONG / SOFT / ZUBEHÖR -Sugaring Wachs Haarentfernung',\n         None]\n\nlabels = [1, 2, 3, 1]\n\nd = {'texts': texts, 'labels': labels}\ntest_df = pd.DataFrame(d)\n\nSo, before converting the DataFrame columns to lists, I removed all rows containing None:\n\n# drop rows with missing values so the tokenizer only receives strings\ntest_df = test_df.dropna()\ntexts = test_df[\"texts\"].tolist()\ntexts_encodings = tokenizer(texts, truncation=True, padding=True)\n\nThis worked for me.","confidence":0.95,"source":"stackoverflow","source_url":"https://stackoverflow.com/questions/63517293/valueerror-textencodeinput-must-be-uniontextinputsequence-tupleinputsequence","votes":53,"created_at":"2026-04-19T04:52:12.287491+00:00","updated_at":"2026-04-19T04:52:12.287491+00:00"}