Nearest-Neighbor Classification in Python with scikit-learn: A Worked Example

Updated: 2023-02-28 09:21:54 Author: 吃肉的小馒头

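scikit-learn ships with ready-made implementations of many data mining algorithms. This article uses it to build a nearest-neighbor classifier and walks through the example code step by step.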

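The scikit-learn library

scikit-learn already packages many data mining algorithms.

A data mining workflow in scikit-learn is built from three kinds of components:

1. Transformers, which preprocess and transform data

2. Pipelines, which chain the steps of a workflow into a single reusable object

3. Estimators, which perform classification, clustering, and regression (the algorithm objects themselves)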
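
As a rough illustration of how these three building blocks fit together, the sketch below chains a transformer and an estimator inside a pipeline. The choice of `MinMaxScaler`, the step names `"scale"` and `"predict"`, and the built-in iris dataset are assumptions made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: the built-in iris dataset stands in for real data.
dataset = load_iris()
X_demo, y_demo = dataset.data, dataset.target

# Transformer (MinMaxScaler) followed by an estimator (KNeighborsClassifier),
# wrapped in a Pipeline so the whole workflow behaves like one object.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("predict", KNeighborsClassifier()),
])

pipeline.fit(X_demo, y_demo)
predictions = pipeline.predict(X_demo[:5])
print(predictions)
```

Because the pipeline exposes the same fit() and predict() calls as a single estimator, it can be passed anywhere an estimator is expected.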

Estimators share two main methods:

fit(): training

Usage: estimator.fit(X_train, y_train)

estimator = KNeighborsClassifier() is a scikit-learn algorithm object

X_train = dataset.data is a NumPy array

y_train = dataset.target is a NumPy array

predict(): prediction

Usage: estimator.predict(X_test)

estimator = KNeighborsClassifier() is a scikit-learn algorithm object

X_test = dataset.data is a NumPy array
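
The two calls above can be tried end to end with a small sketch. It assumes the built-in iris dataset as `dataset`; for brevity it predicts on rows the model has already seen, which is not how a model should be evaluated.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris is used only because it ships with scikit-learn.
dataset = load_iris()
X_train = dataset.data      # NumPy array of shape (150, 4)
y_train = dataset.target    # NumPy array of shape (150,)

estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)               # training
y_predicted = estimator.predict(X_train[:3])  # prediction for three rows
print(y_predicted.shape)
```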

Example

# Jupyter magic: render plots inline (remove the next line in a plain Python script)
%matplotlib inline
# Ionosphere dataset
# https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/
# Download ionosphere.data and ionosphere.names and place them in the ./data/ directory
import os
home_folder = os.path.expanduser("~")
print(home_folder)  # the user's home directory
# Change this to the location of your dataset
home_folder = "."  # use the current directory instead
data_folder = os.path.join(home_folder, "data")
print(data_folder)
data_filename = os.path.join(data_folder, "ionosphere.data")
print(data_filename)
import csv
import numpy as np

# The dataset's shape is known in advance: 351 samples, 34 features
X = np.zeros((351, 34), dtype='float')
y = np.zeros((351,), dtype='bool')


with open(data_filename, 'r') as input_file:
    reader = csv.reader(input_file)
    for i, row in enumerate(reader):
        # Get the data, converting each item to a float
        data = [float(datum) for datum in row[:-1]]
        # Overwrite the zero-initialized row with the real values
        X[i] = data
        # True if the class is 'g', False otherwise
        y[i] = row[-1] == 'g'  # equivalent to: y[i] = 1 if row[-1] == 'g' else 0

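# Data preprocessing: split into training and test sets
# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)
print("Training set has {} samples".format(X_train.shape[0]))
print("Test set has {} samples".format(X_test.shape[0]))
print("Each sample has {} features".format(X_train.shape[1]))

Output:

Training set has 263 samples
Test set has 88 samples
Each sample has 34 features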
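
The 263/88 split follows from the default test size of `train_test_split`, which holds out 25% of the rows and rounds the test count up. A minimal sketch, using zero-filled placeholder arrays (an assumption; only the row counts matter here) with the same shape as the Ionosphere data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays shaped like the Ionosphere data; the values are irrelevant.
X_demo = np.zeros((351, 34))
y_demo = np.zeros((351,), dtype=bool)

# No test_size given, so the default of 0.25 applies.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=14)
n_train, n_test = X_tr.shape[0], X_te.shape[0]
print(n_train, n_test)  # 263 88
```

Passing `test_size` explicitly (for example `test_size=0.2`) changes the proportion.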

# Instantiate the estimator -> train -> predict -> evaluate
from sklearn.neighbors import KNeighborsClassifier

estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)
y_predicted = estimator.predict(X_test)
accuracy = np.mean(y_test == y_predicted) * 100
print("Accuracy: {0:.1f}%".format(accuracy))

# Another way to evaluate: cross-validation
from sklearn.model_selection import cross_val_score
# cv=3 was the default when sklearn.cross_validation existed; newer releases default to 5 folds
scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=3)
average_accuracy = np.mean(scores) * 100
print("Average accuracy: {0:.1f}%".format(average_accuracy))

# Try n_neighbors from 1 to 20 and record the cross-validation scores for each value
avg_scores = []
all_scores = []
parameter_values = list(range(1, 21))  # Including 20
for n_neighbors in parameter_values:
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=3)
    avg_scores.append(np.mean(scores))
    all_scores.append(scores)

Output:

Accuracy: 86.4%
Average accuracy: 82.3%
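
The accuracy computed by hand above, as the mean of an element-wise comparison, is the same quantity that `sklearn.metrics.accuracy_score` returns. A small sketch with made-up labels (the arrays `y_true` and `y_pred` are illustrative and not taken from the Ionosphere run):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative labels only: two of the four predictions are correct.
y_true = np.array([True, False, True, True])
y_pred = np.array([True, True, True, False])

manual = np.mean(y_true == y_pred)        # fraction of matching entries
library = accuracy_score(y_true, y_pred)  # same metric from scikit-learn
print(manual, library)  # 0.5 0.5
```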

from matplotlib import pyplot as plt
# Mean accuracy for each value of n_neighbors
plt.figure(figsize=(32,20))
plt.plot(parameter_values, avg_scores, '-o', linewidth=5, markersize=24)
#plt.axis([0, max(parameter_values), 0, 1.0])

# Per-fold scores for each value of n_neighbors
for parameter, scores in zip(parameter_values, all_scores):
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')

# The same per-fold scores, drawn as blue crosses
plt.plot(parameter_values, all_scores, 'bx')

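from collections import defaultdict
# Repeat 10-fold cross-validation 100 times for each value of n_neighbors
# Note: with an integer cv the folds are not shuffled, so every repeat yields the same scores
all_scores = defaultdict(list)
parameter_values = list(range(1, 21))  # Including 20
for n_neighbors in parameter_values:
    for i in range(100):
        estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
        scores = cross_val_score(estimator, X, y, scoring='accuracy', cv=10)
        all_scores[n_neighbors].append(scores)
# Plot every recorded score against its n_neighbors value
for parameter in parameter_values:
    scores = all_scores[parameter]
    n_scores = len(scores)
    plt.plot([parameter] * n_scores, scores, '-o')

# Mean accuracy from the earlier 3-fold sweep
plt.plot(parameter_values, avg_scores, '-o')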
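
If the goal of repeating cross-validation is to see how much the scores vary between runs, one option is a splitter that reshuffles the data on each repeat. The sketch below swaps in `RepeatedStratifiedKFold`, which the original code does not use, and runs on synthetic data from `make_classification` as a stand-in for the Ionosphere arrays; the fold counts and the candidate values 1, 5, and 9 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic data stands in for the Ionosphere X and y.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=14)

# 5 stratified folds, repeated 3 times with a different shuffle each time.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=14)

mean_scores = {}
for n_neighbors in (1, 5, 9):
    estimator = KNeighborsClassifier(n_neighbors=n_neighbors)
    scores = cross_val_score(estimator, X_demo, y_demo, scoring="accuracy", cv=cv)
    mean_scores[n_neighbors] = np.mean(scores)  # averaged over 15 scores

# Pick the setting with the highest mean accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores, best_k)
```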

