{"id":1049,"hash":"07b17be6ee8f3e0a0a02baf715de7e04641a9bd0ccc1694097d6f09f6378387a","pattern":"RandomForestClassfier.fit(): ValueError: could not convert string to float","full_message":"Given is a simple CSV file:\n\nA,B,C\nHello,Hi,0\nHola,Bueno,1\n\nObviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so:\n\ncols = ['A','B','C']\ncol_types = {'A': str, 'B': str, 'C': int}\ntest = pd.read_csv('test.csv', dtype=col_types)\n\ntrain_y = test['C'] == 1\ntrain_x = test[cols]\n\nclf_rf = RandomForestClassifier(n_estimators=50)\nclf_rf.fit(train_x, train_y)\n\nBut I just get this traceback when invoking fit():\n\nValueError: could not convert string to float: 'Bueno'\n\nscikit-learn version is 0.16.1.","ecosystem":"pypi","package_name":"scikit-learn","package_version":null,"solution":"You have to encode the string columns before calling fit(). As has been pointed out, fit() does not accept strings, but you can work around this.\n\nThere are several classes that can be used:\n\nLabelEncoder: maps each distinct string to an integer value\nOneHotEncoder: uses a one-of-K scheme to turn each distinct string into its own binary column\n\nPersonally, I posted almost the same question on Stack Overflow some time ago. I wanted a scalable solution, but didn't get any answer. I selected OneHotEncoder, which binarizes all the strings. It is quite effective, but if you have a lot of distinct strings the matrix grows very quickly and needs a lot of memory.","confidence":0.95,"source":"stackoverflow","source_url":"https://stackoverflow.com/questions/30384995/randomforestclassfier-fit-valueerror-could-not-convert-string-to-float","votes":95,"created_at":"2026-04-19T04:52:13.804098+00:00","updated_at":"2026-04-19T04:52:13.804098+00:00"}