{"id":260,"hash":"f36699a12a9e887cef088c24d842248b2c50565b48c8b56ae1a2cbffeda545de","pattern":"UnicodeDecodeError when reading CSV file in Pandas","full_message":"I'm running a program that processes 30,000 similar files. Some of them, seemingly at random, fail with this error:\n\n  File \"C:\\Importer\\src\\dfman\\importer.py\", line 26, in import_chr\n    data = pd.read_csv(filepath, names=fields)\n  File \"C:\\Python33\\lib\\site-packages\\pandas\\io\\parsers.py\", line 400, in parser_f\n    return _read(filepath_or_buffer, kwds)\n  File \"C:\\Python33\\lib\\site-packages\\pandas\\io\\parsers.py\", line 205, in _read\n    return parser.read()\n  File \"C:\\Python33\\lib\\site-packages\\pandas\\io\\parsers.py\", line 608, in read\n    ret = self._engine.read(nrows)\n  File \"C:\\Python33\\lib\\site-packages\\pandas\\io\\parsers.py\", line 1028, in read\n    data = self._reader.read(nrows)\n  File \"parser.pyx\", line 706, in pandas.parser.TextReader.read (pandas\\parser.c:6745)\n  File \"parser.pyx\", line 728, in pandas.parser.TextReader._read_low_memory (pandas\\parser.c:6964)\n  File \"parser.pyx\", line 804, in pandas.parser.TextReader._read_rows (pandas\\parser.c:7780)\n  File \"parser.pyx\", line 890, in pandas.parser.TextReader._convert_column_data (pandas\\parser.c:8793)\n  File \"parser.pyx\", line 950, in pandas.parser.TextReader._convert_tokens (pandas\\parser.c:9484)\n  File \"parser.pyx\", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas\\parser.c:10642)\n  File \"parser.pyx\", line 1046, in pandas.parser.TextReader._string_convert (pandas\\parser.c:10853)\n  File \"parser.pyx\", line 1278, in pandas.parser._string_box_utf8 (pandas\\parser.c:15657)\nUnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 6: invalid continuation byte\n\nAll of these files come from the same source.
What's the best way to fix this so the import can proceed?","ecosystem":"pypi","package_name":"pandas","package_version":null,"solution":"read_csv takes an encoding option for reading files saved in different encodings. I mostly use read_csv('file', encoding=\"ISO-8859-1\"), or alternatively encoding=\"utf-8\" (the default) for reading, and generally utf-8 for to_csv.\n\nYou can also use an alias such as 'latin' or 'latin1' instead of 'ISO-8859-1', or the closely related Windows code page 'cp1252'. See the Python docs on standard encodings, which also list the many other encodings you may encounter. Note that ISO-8859-1 maps every possible byte to a character, so it never raises a UnicodeDecodeError; if a file is actually in some other encoding, you silently get the wrong characters instead of an error.\n\nSee the relevant pandas documentation, the Python docs examples on CSV files, and plenty of related questions here on SO. A good background resource is \"What every developer should know about Unicode and character sets\".\n\nTo detect the encoding (assuming the file contains non-ASCII characters), you can use enca (see its man page) or file -i on Linux and file -I on macOS (see the file man page).","confidence":0.95,"source":"stackoverflow","source_url":"https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas","votes":719,"created_at":"2026-04-19T04:41:41.014469+00:00","updated_at":"2026-04-19T04:51:53.056620+00:00"}