The CFAModelCluster class provides several clustering algorithms and is mainly used for:
pip install FreeAeon-ML
| Parameter | Type | Description |
|---|---|---|
| cluster_count | int | Number of clusters |
| models | dict | Dictionary of custom models (optional) |
def fit_predict(self, df_sample)
Clusters the data and returns a cluster label for each sample.
Example:
from FreeAeonML.FAModelCluster import CFAModelCluster
from FreeAeonML.FASample import CFASample
df_sample = CFASample.get_random_cluster()
cluster_model = CFAModelCluster(cluster_count=3)
df_result = cluster_model.fit_predict(df_sample)
print(df_result)
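To make "a cluster label for each sample" concrete, here is a minimal pure-Python k-means sketch. It is not the implementation used by CFAModelCluster; the function name `kmeans_labels`, the first-k initialization, and the toy points are assumptions made for illustration only.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans_labels(points, k, iterations=20):
    """Tiny k-means sketch (hypothetical helper): one label per input point."""
    # Deterministic init: use the first k points as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Toy data: two well-separated groups, interleaved.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
labels = kmeans_labels(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The result has the same length as the input, and position i holds the cluster index of sample i.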
def evaluate(self, df_sample)
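Computes clustering quality metrics:
- silhouette_score: silhouette coefficient (higher is better)
- calinski_harabasz_score: Calinski-Harabasz (CH) index (higher is better)
- davies_bouldin_score: Davies-Bouldin (DB) index (lower is better)

Example: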
df_perf = cluster_model.evaluate(df_sample)
print(df_perf)
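As a rough illustration of what the silhouette coefficient measures, the sketch below computes it by hand in pure Python. The metric names above match scikit-learn functions of the same names, but whether `evaluate` delegates to them is an assumption; `mean_silhouette` and the toy data are hypothetical and not part of FreeAeon-ML.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_silhouette(points, labels):
    """Mean silhouette over all samples; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the sample's own cluster.
        own = [euclidean(p, q)
               for j, (q, other_lab) in enumerate(zip(points, labels))
               if other_lab == lab and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in clusters if c != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
good = mean_silhouette(points, [0, 1, 0, 1, 0, 1])  # labels follow the two groups
bad = mean_silhouette(points, [0, 0, 0, 1, 1, 1])   # labels cut across the groups
print(good > bad)  # True
```

A value near 1 means samples sit much closer to their own cluster than to the nearest other cluster, which is why a higher score is better.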
def sample_cluster(self, df_cluster_result, df_sample, model_name)
Adds the cluster labels to the original data.
Example:
df_clustered = cluster_model.sample_cluster(df_result, df_sample, "KMeans")
print(df_clustered.head())
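Conceptually, this step joins one label per row onto the samples. The sketch below shows the idea with plain dictionaries instead of DataFrames. It assumes the clustering result holds one label sequence per model name and that the label column is called `_cluster`; `attach_labels` is a hypothetical helper, not the library's code.

```python
def attach_labels(rows, labels_by_model, model_name, column="_cluster"):
    """Return copies of the rows with the chosen model's label added."""
    labels = labels_by_model[model_name]
    if len(labels) != len(rows):
        raise ValueError("need exactly one label per sample")
    # Build new dicts so the original samples stay unchanged.
    return [{**row, column: label} for row, label in zip(rows, labels)]

# Made-up samples and labels, for illustration only.
rows = [{"x": 0.0, "y": 0.1}, {"x": 5.0, "y": 5.2}, {"x": 0.2, "y": 0.0}]
labels_by_model = {"KMeans": [0, 1, 0]}

clustered = attach_labels(rows, labels_by_model, "KMeans")
print(clustered[0])  # {'x': 0.0, 'y': 0.1, '_cluster': 0}
```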
import matplotlib.pyplot as plt
from FreeAeonML.FAModelCluster import CFAModelCluster
from FreeAeonML.FASample import CFASample
# Generate sample data
df_sample = CFASample.get_random_cluster()
# Run clustering
cluster_model = CFAModelCluster(cluster_count=3)
df_result = cluster_model.fit_predict(df_sample)
# Evaluate
df_perf = cluster_model.evaluate(df_sample)
print("Clustering performance:")
print(df_perf.sort_values('silhouette_score', ascending=False))
# Visualize the best model (highest silhouette_score)
best_model = df_perf.loc[df_perf['silhouette_score'].idxmax(), 'model_name']
df_clustered = cluster_model.sample_cluster(df_result, df_sample, best_model)
plt.figure(figsize=(10, 6))
for cluster in df_clustered['_cluster'].unique():
    subset = df_clustered[df_clustered['_cluster'] == cluster]
    plt.scatter(subset.iloc[:, 0], subset.iloc[:, 1], label=f'Cluster {cluster}')
plt.legend()
plt.title(f'{best_model} Clustering')
plt.show()
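The example above picks the best model by silhouette_score alone. When all three metrics should count, their directions differ (davies_bouldin_score is lower-is-better), so one option is to rank the models per metric and sum the ranks. The following is a minimal sketch of that idea; `rank_models`, the model names `ModelA` and `ModelB`, and every score are made-up illustrations, not output from the library.

```python
# Direction of each metric: True means higher is better.
HIGHER_IS_BETTER = {
    "silhouette_score": True,
    "calinski_harabasz_score": True,
    "davies_bouldin_score": False,
}

def rank_models(scores):
    """Return model names ordered by summed per-metric rank, best first."""
    totals = {name: 0 for name in scores}
    for metric, higher in HIGHER_IS_BETTER.items():
        # Best model for this metric gets rank 0, next gets 1, and so on.
        ordered = sorted(scores, key=lambda n: scores[n][metric], reverse=higher)
        for rank, name in enumerate(ordered):
            totals[name] += rank
    return sorted(totals, key=totals.get)

# Hypothetical scores, for illustration only.
scores = {
    "KMeans": {"silhouette_score": 0.62, "calinski_harabasz_score": 410.0,
               "davies_bouldin_score": 0.55},
    "ModelA": {"silhouette_score": 0.48, "calinski_harabasz_score": 300.0,
               "davies_bouldin_score": 0.80},
    "ModelB": {"silhouette_score": 0.30, "calinski_harabasz_score": 150.0,
               "davies_bouldin_score": 1.20},
}

ranking = rank_models(scores)
print(ranking)  # ['KMeans', 'ModelA', 'ModelB']
```

Rank sums avoid comparing raw values across metrics with very different scales, at the cost of ignoring how large the gaps between models are.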