How to Compute Statistics on Categorical Columns in pandas: Hands-on Examples

Updated: 2022-08-23 11:31:16  Author: 菜鸟实战

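This article walks through hands-on examples of computing statistics on Categorical columns in pandas, plus a few related pandas tasks on stock data and randomly generated tables. It should serve as a useful reference.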

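1. How to compute statistics on Categorical columns in pandas

Scenario: computing statistics on Categorical columns. Categorical is a special pandas dtype: each distinct value (a category) is stored only once, and every row holds an integer code that points to its category.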
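To make the "categories plus integer codes" idea concrete, here is a minimal sketch on a small made-up Series (the values are hypothetical, not taken from the Telco dataset). It only uses the standard `.cat` accessor and `memory_usage`.

```python
import pandas as pd

# A small, made-up column with repeated string values
s = pd.Series(["Yes", "No", "No", "Yes", "No"], dtype="category")

# The distinct labels are stored once; by default they are sorted
print(s.cat.categories.tolist())  # ['No', 'Yes']

# Each row stores only a small integer code that points into the categories
print(s.cat.codes.tolist())  # [1, 0, 0, 1, 0]

# With many repeated values, codes take far less memory than Python string objects
big = pd.Series(["Yes", "No"] * 5000)
print(big.memory_usage(deep=True), big.astype("category").memory_usage(deep=True))
```

The memory comparison is why converting low-cardinality text columns (such as Yes/No flags) to Categorical is usually worthwhile; the exact byte counts depend on the pandas and Python versions.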

1.1 Key points

File I/O
Basic syntax
Pandas
read_csv

Hands-on:

1.2 Create the Python file

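import pandas as pd

# Read the CSV file
df = pd.read_csv("Telco-Customer-Churn.csv")

# TotalCharges is read as text because some rows contain a single space.
# Convert it to numbers (the blanks become NaN), then fill the gaps with the median.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# Make the three numeric columns float and convert every other column to Categorical
number_columns = ['tenure', 'MonthlyCharges', 'TotalCharges']
for column in number_columns:
    df[column] = df[column].astype(float)
for column in df.columns.difference(number_columns):
    df[column] = pd.Categorical(df[column])

# info() prints its report itself and returns None, hence the "None" line in the output
print(df.info())
# Summarize only the category columns: count, unique, top (most frequent value), freq
print(df.describe(include=["category"]))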

1.3 Output

RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null category
1 gender 7043 non-null category
2 SeniorCitizen 7043 non-null category
3 Partner 7043 non-null category
4 Dependents 7043 non-null category
5 tenure 7043 non-null float64
6 PhoneService 7043 non-null category
7 MultipleLines 7043 non-null category
8 InternetService 7043 non-null category
9 OnlineSecurity 7043 non-null category
10 OnlineBackup 7043 non-null category
11 DeviceProtection 7043 non-null category
12 TechSupport 7043 non-null category
13 StreamingTV 7043 non-null category
14 StreamingMovies 7043 non-null category
15 Contract 7043 non-null category
16 PaperlessBilling 7043 non-null category
17 PaymentMethod 7043 non-null category
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null float64
20 Churn 7043 non-null category
dtypes: category(18), float64(3)
memory usage: 611.1 KB
None
customerID gender SeniorCitizen Partner ... Contract PaperlessBilling PaymentMethod Churn
count 7043 7043 7043 7043 ... 7043 7043 7043 7043
unique 7043 2 2 2 ... 3 2 4 2
top 0002-ORFBO Male 0 No ... Month-to-month Yes Electronic check No
freq 1 3555 5901 3641 ... 3875 4171 2365 5174

[4 rows x 18 columns]
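describe() only reports count, unique, top and freq for category columns. The sketch below shows a few other common ways to tally Categorical data (value_counts, crosstab, groupby). It runs on a tiny hypothetical frame: the column names mirror the Telco dataset, but the values are invented.

```python
import pandas as pd

# Hypothetical rows; column names mirror the Telco dataset, values are invented
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year", "Month-to-month"],
    "Churn": ["Yes", "No", "No", "No", "Yes"],
    "MonthlyCharges": [70.0, 55.5, 80.2, 20.0, 99.9],
})
df["Contract"] = pd.Categorical(df["Contract"])
df["Churn"] = pd.Categorical(df["Churn"])

# Frequency of each category (categories with zero rows would also be listed)
counts = df["Contract"].value_counts()
print(counts)

# Share of each category instead of raw counts
print(df["Churn"].value_counts(normalize=True))

# Cross-tabulate two categorical columns
table = pd.crosstab(df["Contract"], df["Churn"])
print(table)

# Mean of a numeric column per category; observed=True skips unused categories
print(df.groupby("Contract", observed=True)["MonthlyCharges"].mean())
```

On the real dataset the same calls would apply directly to the converted df, for example `pd.crosstab(df["Contract"], df["Churn"])`.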

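2. How to find the row with the lowest closing price in stock data

Scenario: use pandas to find the row with the lowest closing price in a stock data set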

2.1 Key points

File I/O
Basic syntax
Pandas
read_csv

2.2 Create the Python file

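"""
The data is in CSV format
1. Load it into a DataFrame
2. Find the index of the lowest closing price
3. Use that index to fetch the data row
4. Print the resulting row
"""
import pandas as pd

df = pd.read_csv("./00700.HK.csv")
# Parse the Date column and derive Year and Month columns from it
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
print(df)
# Average closing price per year
print(df.groupby("Year")["Close"].mean())
# Summary statistics of the numeric columns
print(df.describe())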

2.3 Output (the df.describe() output is not shown)

Date Open High Low Close Volume Year Month
0 2021-09-30 456.000 464.600 453.800 461.400 17335451 2021 9
1 2021-09-29 461.600 465.000 450.200 465.000 18250450 2021 9
2 2021-09-28 467.000 476.200 464.600 469.800 20947276 2021 9
3 2021-09-27 459.000 473.000 455.200 464.600 17966998 2021 9
4 2021-09-24 461.400 473.400 456.200 460.200 16656914 2021 9
... ... ... ... ... ... ... ... ...
4262 2004-06-23 4.050 4.450 4.025 4.425 55016000 2004 6
4263 2004-06-21 4.125 4.125 3.950 4.000 22817000 2004 6
4264 2004-06-18 4.200 4.250 3.950 4.025 36598000 2004 6
4265 2004-06-17 4.150 4.375 4.125 4.225 83801500 2004 6
4266 2004-06-16 4.375 4.625 4.075 4.150 439775000 2004 6

[4267 rows x 8 columns]
Year
2004 4.338686
2005 6.568927
2006 15.865951
2007 37.882724
2008 54.818367
2009 96.369679
2010 157.299598
2011 189.737398
2012 228.987045
2013 337.136066
2014 271.291498
2015 144.824291
2016 176.562041
2017 291.066667
2018 372.678862
2019 346.225203
2020 479.141129
2021 586.649189
Name: Close, dtype: float64
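Steps 2 to 4 of the docstring (find the index of the lowest close, fetch that row, print it) map directly onto `idxmin` and `loc`. A minimal, self-contained sketch, using five rows copied from the output above in place of the full 00700.HK.csv and assuming the same Date/Close column names:

```python
import pandas as pd

# Five rows copied from the output above, standing in for 00700.HK.csv
df = pd.DataFrame({
    "Date": ["2004-06-23", "2004-06-21", "2004-06-18", "2004-06-17", "2004-06-16"],
    "Close": [4.425, 4.000, 4.025, 4.225, 4.150],
})
df["Date"] = pd.to_datetime(df["Date"])

# Step 2: index label of the smallest closing price
min_idx = df["Close"].idxmin()

# Step 3: fetch that row by its label
lowest_row = df.loc[min_idx]

# Step 4: print the row
print(lowest_row)

# Alternative: nsmallest returns the lowest row(s) as a DataFrame
print(df.nsmallest(1, "Close"))
```

If several rows share the minimum close, `idxmin` returns only the first one; `df[df["Close"] == df["Close"].min()]` would return all of them.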

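3. How to add year and month columns to stock data

Scenario: use pandas to add year and month columns to stock data

3.1 Key points

File I/O
Basic syntax
Pandas
pandas Series objects
DataFrame

Hands-on:

3.2 Create the Python file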

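"""
Add year and month columns to stock data
"""
import pandas as pd

df = pd.read_csv("./00100.csv")
print(df)

# to_datetime converts the Date strings to datetime values
df["Date"] = pd.to_datetime(df["Date"])
# The .dt accessor extracts the year and month as integers
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

print(df)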

3.3 Output

Date Open High Low Close Volume
0 2021-09-30 456.000 464.600 453.800 461.400 17335451
1 2021-09-29 461.600 465.000 450.200 465.000 18250450
2 2021-09-28 467.000 476.200 464.600 469.800 20947276
3 2021-09-27 459.000 473.000 455.200 464.600 17966998
4 2021-09-24 461.400 473.400 456.200 460.200 16656914
... ... ... ... ... ... ...
4262 2004-06-23 4.050 4.450 4.025 4.425 55016000
4263 2004-06-21 4.125 4.125 3.950 4.000 22817000
4264 2004-06-18 4.200 4.250 3.950 4.025 36598000
4265 2004-06-17 4.150 4.375 4.125 4.225 83801500
4266 2004-06-16 4.375 4.625 4.075 4.150 439775000

[4267 rows x 6 columns]
Date Open High Low Close Volume Year Month
0 2021-09-30 456.000 464.600 453.800 461.400 17335451 2021 9
1 2021-09-29 461.600 465.000 450.200 465.000 18250450 2021 9
2 2021-09-28 467.000 476.200 464.600 469.800 20947276 2021 9
3 2021-09-27 459.000 473.000 455.200 464.600 17966998 2021 9
4 2021-09-24 461.400 473.400 456.200 460.200 16656914 2021 9
... ... ... ... ... ... ... ... ...
4262 2004-06-23 4.050 4.450 4.025 4.425 55016000 2004 6
4263 2004-06-21 4.125 4.125 3.950 4.000 22817000 2004 6
4264 2004-06-18 4.200 4.250 3.950 4.025 36598000 2004 6
4265 2004-06-17 4.150 4.375 4.125 4.225 83801500 2004 6
4266 2004-06-16 4.375 4.625 4.075 4.150 439775000 2004 6

[4267 rows x 8 columns]
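A small self-contained sketch of the same `.dt` extraction, using three dates from the output above instead of a CSV file. The `to_period("M")` line is an extra alternative not used in the original code: it produces a single year-month label, which is handy for grouping by month.

```python
import pandas as pd

# Three dates taken from the output above
df = pd.DataFrame({"Date": ["2021-09-30", "2021-09-29", "2004-06-16"]})
df["Date"] = pd.to_datetime(df["Date"])

# .dt exposes datetime components as integer Series
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# Alternative: one year-month label (a Period), convenient for monthly grouping
df["YearMonth"] = df["Date"].dt.to_period("M")
print(df)
```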

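4. How to get table information and basic statistics with pandas

Scenario: use pandas to inspect a table's structure and compute basic statistics

4.1 Key points

File I/O
Basic syntax
Pandas
pandas Series objects
numpy

Hands-on:

4.2 Create the Python file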

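import pandas as pd
import numpy as np

df = pd.DataFrame(
    data={
        "norm": np.random.normal(loc=0, scale=1, size=1000),
        "uniform": np.random.uniform(low=0, high=1, size=1000),
        "binomial": np.random.binomial(n=1, p=0.2, size=1000),
    },
    index=pd.date_range(start='2021-01-01', periods=1000),
)

# df.info(): number of rows and columns, dtypes, non-null counts and memory usage.
# It prints the report itself and returns None, which is why "None" appears in the output.
print(df.info())
print()
# df.describe(): count, mean, std, min, quartiles (50% is the median) and max of each numeric column
print(df.describe())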

4.3 Output

DatetimeIndex: 1000 entries, 2021-01-01 to 2023-09-27
Freq: D
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 norm 1000 non-null float64
1 uniform 1000 non-null float64
2 binomial 1000 non-null int32
dtypes: float64(2), int32(1)
memory usage: 27.3 KB
None

norm uniform binomial
count 1000.000000 1000.000000 1000.000000
mean -0.028664 0.496156 0.215000
std 0.987493 0.292747 0.411028
min -3.110249 0.000629 0.000000
25% -0.697858 0.238848 0.000000
50% -0.023654 0.503438 0.000000
75% 0.652157 0.746672 0.000000
max 3.333271 0.997617 1.000000
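The numbers in describe() can be reproduced with ordinary Series methods, which helps when interpreting them. A minimal sketch, assuming a seeded `np.random.default_rng(0)` generator (the original code is unseeded, so its exact values will differ): the std row is the sample standard deviation (ddof=1), and the 25%/50%/75% rows are linearly interpolated quantiles.

```python
import numpy as np
import pandas as pd

# Seeded generator so the numbers are repeatable (an assumption; the original is unseeded)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "norm": rng.normal(loc=0, scale=1, size=1000),
        "uniform": rng.uniform(low=0, high=1, size=1000),
        "binomial": rng.binomial(n=1, p=0.2, size=1000),
    },
    index=pd.date_range(start="2021-01-01", periods=1000),
)

stats = df.describe()

# The "std" row is the sample standard deviation (ddof=1)
manual_std = df["norm"].std(ddof=1)
# The 25% / 50% / 75% rows are quantiles with linear interpolation
manual_q = df["uniform"].quantile([0.25, 0.5, 0.75])
# For 0/1 data such as "binomial", the mean is simply the share of 1s
share_of_ones = df["binomial"].sum() / len(df)

print(stats.loc["std", "norm"], manual_std)
print(stats.loc[["25%", "50%", "75%"], "uniform"].to_numpy(), manual_q.to_numpy())
print(stats.loc["mean", "binomial"], share_of_ones)
```

The last line also explains the binomial column in the output above: a mean of 0.215 means about 21.5% of the rows are 1, close to p=0.2.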

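5. How to generate tabular data from dates and random numbers with pandas

Scenario: use pandas to build a table from a date index and random numbers

5.1 Key points

File I/O
Basic syntax
Pandas
pandas Series objects
numpy

Hands-on:

5.2 Create the Python file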

"""
Output: a DataFrame with three columns
Index: 1000 dates starting from 2021-01-01
Column norm: 1000 normally distributed random numbers, loc=0, scale=1
Column uniform: 1000 uniformly distributed random numbers, low=0, high=1
Column binomial: 1000 binomially distributed random numbers, n=1, p=0.2
"""

import pandas as pd
import numpy as np

# Build the index: 1000 consecutive days
date_range = pd.date_range(start='2021-01-01', periods=1000)

data = {
    'norm': np.random.normal(loc=0, scale=1, size=1000),
    'uniform': np.random.uniform(low=0, high=1, size=1000),
    'binomial': np.random.binomial(n=1, p=0.2, size=1000),
}
df = pd.DataFrame(data=data, index=date_range)
print(df)

5.3 Output

norm uniform binomial
2021-01-01 1.387663 0.223985 0
2021-01-02 2.080345 0.704094 0
2021-01-03 1.615880 0.012283 0
2021-01-04 0.523260 0.053396 0
2021-01-05 -0.872305 0.973047 0
... ... ... ...
2023-09-23 -1.601608 0.423913 0
2023-09-24 -0.712566 0.727326 1
2023-09-25 -0.188441 0.879798 0
2023-09-26 2.249404 0.229298 0
2023-09-27 2.132976 0.472873 0

[1000 rows x 3 columns]
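Because the code above sets no seed, every run prints different numbers. A minimal sketch of a reproducible variant, assuming NumPy's Generator API (`np.random.default_rng`); `make_frame` is a hypothetical helper name introduced here for illustration. It also confirms that 1000 daily periods starting on 2021-01-01 end on 2023-09-27, matching the last row above.

```python
import numpy as np
import pandas as pd


def make_frame(seed):
    """Hypothetical helper: build the same three-column frame from a seeded generator."""
    rng = np.random.default_rng(seed)
    index = pd.date_range(start="2021-01-01", periods=1000)
    data = {
        "norm": rng.normal(loc=0, scale=1, size=1000),
        "uniform": rng.uniform(low=0, high=1, size=1000),
        "binomial": rng.binomial(n=1, p=0.2, size=1000),
    }
    return pd.DataFrame(data=data, index=index)


df_a = make_frame(42)
df_b = make_frame(42)

# Same seed gives identical data; the 1000th day is 2023-09-27
print(df_a.equals(df_b), df_a.index[-1].date())  # True 2023-09-27
```

Passing a generator around, rather than calling the global `np.random.*` functions, keeps the randomness local to the function and makes tests repeatable.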

That concludes this article on computing statistics on Categorical columns in pandas. For more on pandas Categorical statistics, see the related articles on 脚本之家.
