The basic steps for running a K-means clustering analysis in Python are as follows:
Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
Prepare the data
Generate a sample dataset of 300 two-dimensional points drawn from 4 Gaussian blobs:
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
Implement the K-means algorithm
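def k_means(X, n_clusters, max_iters=100):
    # Pick n_clusters distinct data points as the initial centroids
    centroids = X[np.random.choice(len(X), n_clusters, replace=False)]
    for _ in range(max_iters):
        # clusters[k] holds the row indices of X assigned to centroid k
        clusters = [[] for _ in range(n_clusters)]
        for idx, x in enumerate(X):
            # Euclidean distance from x to every centroid
            distances = np.linalg.norm(x - centroids, axis=1)
            closest_centroid = np.argmin(distances)
            clusters[closest_centroid].append(idx)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty
        new_centroids = np.array([
            X[cluster].mean(axis=0) if cluster else centroids[k]
            for k, cluster in enumerate(clusters)
        ])
        # Stop once the centroids no longer move
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, clusters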
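The assignment step (finding the nearest centroid for every point) can also be written without a Python loop. The following is a minimal sketch using NumPy broadcasting; the helper name assign_clusters and the tiny test arrays are hypothetical and only serve to illustrate the idea, they are not part of the function above.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Return, for each row of X, the index of its nearest centroid."""
    # Broadcasting: (n_samples, 1, n_features) - (1, n_clusters, n_features)
    # gives an array of shape (n_samples, n_clusters, n_features).
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Euclidean distance of every point to every centroid
    distances = np.linalg.norm(diffs, axis=2)
    # Index of the closest centroid for each point
    return np.argmin(distances, axis=1)

# Hypothetical toy data: two points near the origin, two near (5, 5)
points = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_clusters(points, centers)
print(labels)  # [0 0 1 1]
```

This version returns one label per point instead of a list of index lists, which tends to be faster on larger datasets at the cost of building an intermediate array of size n_samples × n_clusters × n_features.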
Run K-means clustering
centroids, clusters = k_means(X, n_clusters=4)
Visualize the results
for i in range(len(centroids)):
    # clusters[i] is used here as a list of row indices into X
    plt.scatter(X[clusters[i], 0], X[clusters[i], 1], label=f'Cluster {i+1}')
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x', label='Centroids')
plt.legend()
plt.show()
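The steps above show how to run a K-means clustering analysis in Python. You can adjust the dataset, the number of clusters, and other parameters such as max_iters or cluster_std to suit your needs.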
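When adjusting the number of clusters, one common way to compare candidates is to look at how the within-cluster sum of squared distances (inertia) changes with k. The sketch below is an assumption-laden illustration rather than part of the steps above: it swaps the hand-written function for scikit-learn's KMeans, and the range of k values tried (1 to 7) is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same sample dataset as in the steps above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# Fit scikit-learn's KMeans for several cluster counts and record the inertia
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Inertia generally keeps shrinking as k grows, so the value to look for is the point where the improvement levels off rather than the minimum itself.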