To remove duplicate words from a string in Python, you can use any of the following approaches:
1. Using a set:

```python
def remove_duplicate_words(sentence):
    # A set removes duplicates, but it does NOT preserve word order
    words = sentence.split()
    unique_words = set(words)
    return ' '.join(unique_words)

sentence = "Python is great and Java is also great"
print(remove_duplicate_words(sentence))  # words appear in arbitrary order
```
2. Using `dict.fromkeys()`:

```python
def remove_duplicate_words_ordered(sentence):
    # dict.fromkeys() keeps only the first occurrence of each word,
    # and dicts preserve insertion order (Python 3.7+)
    words = sentence.split()
    unique_words = list(dict.fromkeys(words))
    return ' '.join(unique_words)

sentence = "Python is great and Java is also great"
print(remove_duplicate_words_ordered(sentence))  # Python is great and Java also
```
3. Using a list comprehension:

```python
def remove_duplicate_words_list_comprehension(sentence):
    # Keep a word only if it has not already appeared earlier in the list.
    # Note: the `word not in words[:i]` check makes this O(n^2) overall.
    words = sentence.split()
    unique_words = [word for i, word in enumerate(words) if word not in words[:i]]
    return ' '.join(unique_words)

sentence = "Python is great and Java is also great"
print(remove_duplicate_words_list_comprehension(sentence))  # Python is great and Java also
```
4. 使用`nltk`库进行分词和去重(保留顺序):
import nltkdef remove_duplicate_words_nltk(sentence):nltk.download('punkt')tokens = nltk.word_tokenize(sentence)unique_tokens = list(dict.fromkeys(tokens))return ' '.join(unique_tokens)sentence = "The Sky is blue also the ocean is blue also Rainbow has a blue colour."print(remove_duplicate_words_nltk(sentence))
Each of these approaches has trade-offs, so choose the one that fits your needs. Note that deduplicating with a set loses the original word order, while `dict.fromkeys()` and the list comprehension preserve it. For more complex handling, such as case-insensitive matching or lemmatization, you may want a more advanced natural language processing library such as `nltk` or `spaCy`.
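As a sketch of the case-insensitive handling mentioned above (the function name `remove_duplicates_case_insensitive` and the `casefold()`-based comparison are illustrative choices, not from the original), you can compare words by a normalized key while keeping the first occurrence's original spelling:

```python
def remove_duplicates_case_insensitive(sentence):
    # Compare words by their casefolded form, but emit the first
    # occurrence's original spelling in the output.
    seen = set()
    unique_words = []
    for word in sentence.split():
        key = word.casefold()  # casefold() is a more aggressive, Unicode-aware lower()
        if key not in seen:
            seen.add(key)
            unique_words.append(word)
    return ' '.join(unique_words)

print(remove_duplicates_case_insensitive("Blue sky and blue ocean"))
# Blue sky and ocean
```

The same seen-set pattern extends to other normalizations, e.g. stripping punctuation or lemmatizing each word before using it as the key.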
