The basic steps for K-means clustering in Python are as follows:
Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
Prepare the data
# Generate a sample dataset
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
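Implement the K-means algorithm
def k_means(X, n_clusters, max_iters=100):
    # Initialize the centroids with n_clusters distinct points drawn at random from X
    centroids = X[np.random.choice(len(X), n_clusters, replace=False)]
    for _ in range(max_iters):
        # Assignment step: clusters[k] collects the row indices of X nearest to centroid k
        clusters = [[] for _ in range(n_clusters)]
        for i, x in enumerate(X):
            distances = np.linalg.norm(x - centroids, axis=1)
            closest_centroid = np.argmin(distances)
            clusters[closest_centroid].append(i)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid to avoid a NaN mean)
        new_centroids = np.array([
            X[idx].mean(axis=0) if idx else centroids[k]
            for k, idx in enumerate(clusters)
        ])
        # Stop once the centroids no longer move
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, clusters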
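The per-point loop in the assignment step can also be written with NumPy broadcasting, which computes every point-to-centroid distance at once. The following is a minimal sketch, not part of the original function; the helper name assign_labels and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def assign_labels(X, centroids):
    # Hypothetical helper: X is (n_samples, n_features),
    # centroids is (n_clusters, n_features).
    # Broadcasting yields a (n_samples, n_clusters, n_features) array of differences.
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Euclidean distance from every point to every centroid: (n_samples, n_clusters)
    distances = np.linalg.norm(diffs, axis=2)
    # Index of the nearest centroid for each point
    return np.argmin(distances, axis=1)

# Toy data: two points near (0, 0) and two near (5, 5)
points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_labels(points, centers)
print(labels)  # [0 0 1 1]
```

Note that the newaxis form with axis=2 only works on the full 2-D array; for a single 1-D point, subtracting the centroids and reducing over axis=1 is enough.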

Run K-means clustering
centroids, clusters = k_means(X, n_clusters=4)
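One way to sanity-check the returned result is to compute the within-cluster sum of squared distances, the quantity K-means tries to minimize. The sketch below assumes that clusters[i] holds row indices into X, as the X[clusters[i], 0] indexing in the plotting step implies; the function name within_cluster_ss and the demo arrays are hypothetical.

```python
import numpy as np

def within_cluster_ss(X, centroids, clusters):
    # Sum of squared distances from each point to its assigned centroid.
    # clusters[k] is assumed to be a list of row indices into X.
    total = 0.0
    for centroid, idx in zip(centroids, clusters):
        if len(idx) > 0:  # skip empty clusters
            total += np.sum((X[idx] - centroid) ** 2)
    return total

# Hand-made example: two clusters of two points, each point 1 unit from its centroid
X_demo = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids_demo = np.array([[0.0, 1.0], [10.0, 1.0]])
clusters_demo = [[0, 1], [2, 3]]
print(within_cluster_ss(X_demo, centroids_demo, clusters_demo))  # 4.0
```

A lower value means tighter clusters. Because the initial centroids are chosen at random, repeated runs may give slightly different values.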
Visualize the results
for i, centroid in enumerate(centroids):
    plt.scatter(X[clusters[i], 0], X[clusters[i], 1], label=f'Cluster {i+1}')
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x', label='Centroids')
plt.legend()
plt.show()
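These steps show how to perform K-means clustering in Python. You can adjust the dataset, the number of clusters, and the other parameters to suit your needs.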
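When adjusting the number of clusters, it can help to compare several values side by side. The sketch below swaps in scikit-learn's KMeans estimator rather than the hand-written k_means function, and prints the inertia (within-cluster sum of squares) for each k; the range 1 to 6 and n_init=10 are arbitrary choices made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic dataset as in the steps above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

inertias = {}
for k in range(1, 7):
    # n_init=10 restarts the algorithm from 10 random initializations and keeps the best
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 1))
```

Inertia generally falls as k grows, so a common heuristic is to look for the k after which the drop levels off rather than picking the smallest value.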
