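1. Detect the file's original encoding.
2. Open the file and read its contents.
3. Convert the contents to the target encoding.
4. Write the converted contents to a new file.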
```python
import codecs

import chardet


def convert_encoding(source_file_path, target_file_path, source_encoding=None, target_encoding='UTF-8'):
    # Detect the source encoding unless the caller supplied one.
    if source_encoding is None:
        with open(source_file_path, 'rb') as f_in:
            data = f_in.read()
        source_encoding = chardet.detect(data)['encoding']
        if source_encoding is None:
            raise ValueError(f"Could not detect the encoding of {source_file_path}")

    # Convert only when the source encoding differs from the target encoding.
    # codecs.lookup() normalizes aliases, so 'utf-8' and 'UTF-8' compare equal.
    if codecs.lookup(source_encoding).name != codecs.lookup(target_encoding).name:
        with codecs.open(source_file_path, 'r', source_encoding) as source_file:
            contents = source_file.read()
        with codecs.open(target_file_path, 'w', target_encoding) as target_file:
            target_file.write(contents)
        print(f"Converted {source_file_path} from {source_encoding} to {target_encoding}")
    else:
        print(f"{source_file_path} is already encoded in {target_encoding}, no conversion needed.")


# Example usage
path_to_convert = 'path/to/your/files'  # replace with the actual file path
convert_encoding(path_to_convert, path_to_convert, source_encoding='GBK', target_encoding='UTF-8')
```
Be sure to replace `path_to_convert` with the path of the file you want to convert, and adjust the source and target encodings as needed.
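The example call passes the same path as both source and target, so the file is overwritten in place. If that is what you want, one cautious option is to write the converted bytes to a temporary file first and only then swap it over the original. The sketch below assumes you already know the source encoding and uses only the standard library; `convert_in_place` and the demo file name are hypothetical names chosen for illustration, not part of the function above.

```python
import os
import tempfile


def convert_in_place(path, source_encoding, target_encoding="utf-8"):
    """Re-encode `path` in place (hypothetical helper for illustration)."""
    # Read raw bytes and decode strictly, so bytes that are invalid in
    # source_encoding raise UnicodeDecodeError before anything is written.
    with open(path, "rb") as f:
        text = f.read().decode(source_encoding)

    # Write the re-encoded bytes to a temporary file in the same directory,
    # then swap it over the original with a single os.replace() call.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(text.encode(target_encoding))
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if encoding or replacing failed.
        os.remove(tmp_path)
        raise


if __name__ == "__main__":
    # Demo: create a small GBK file in a scratch directory and convert it.
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "demo.txt")
    with open(demo_path, "wb") as f:
        f.write("你好，世界".encode("gbk"))
    convert_in_place(demo_path, "gbk")
    with open(demo_path, encoding="utf-8") as f:
        print(f.read())
```

Strict decoding only catches byte sequences that are invalid in the stated encoding; a wrong but permissive encoding can still decode without error, so it is worth checking the result by eye. Also note that the replaced file takes on the temporary file's permissions, which may differ from the original's.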