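Python Basics Guide: Creating Sets and How Automatic Deduplication Works

Updated: 2026-06-15 08:51:57  Author: 星河耀银海

The set is one of Python's four core built-in container types (list, tuple, dict, set). Its two defining traits are that elements never repeat and that it has no order. This article introduces the basic concepts of Python sets and their automatic deduplication behavior.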

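1. Opening: a data type that removes duplicates for you

Consider a common scenario: you have a file with 1,000 user access records and want to know how many distinct IP addresses visited your site. With only a list, you would write a loop that checks each IP against those already recorded, which is clumsy and slow.

With a Python set, one line does it:

ip_list = ['192.168.1.1', '10.0.0.5', '192.168.1.1', '172.16.0.3', '10.0.0.5']
unique_ips = set(ip_list)
print(unique_ips)  # {'192.168.1.1', '10.0.0.5', '172.16.0.3'} (order may vary)
print(f'Unique IP count: {len(unique_ips)}')  # 3

The set is one of Python's four core built-in container types (list, tuple, dict, set). Its two defining traits are unique elements and the absence of order, which make it a natural fit for deduplication, membership tests, and set algebra (intersection, union, difference).

This article breaks down how sets are created and how automatic deduplication works. The next article covers intersection, union, and difference.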

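2. Basic concepts

2.1 What is a set?

A set is an unordered container of unique elements. It follows the mathematical notion of a set: elements cannot repeat, and {1, 2} and {2, 1} are the same set because order does not matter.

# Three core characteristics of a set
# 1. Unique elements (automatic deduplication)
s = {1, 2, 2, 3, 3, 3}
print(s)  # {1, 2, 3}: the repeated 2 and 3 are dropped

# 2. Unordered: storage order is unrelated to insertion order
s = {3, 1, 4, 1, 5, 9, 2, 6}
print(s)  # the output order is usually not the insertion order

# 3. Mutable: elements can be added and removed (but each element must itself be hashable)
s = {1, 2, 3}
s.add(4)      # ✅ adding works
s.remove(1)   # ✅ removing works
print(s)      # {2, 3, 4}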

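2.2 Where sets sit in Python's type system

Like lists and dicts, sets are mutable. The elements of a set, however, must be hashable, which for the built-in types means immutable:

# ✅ These can go into a set: all are hashable
valid_set = {1, 3.14, 'hello', (1, 2), True, None, frozenset([1, 2])}
print(valid_set)  # 6 elements, not 7: True == 1, so True merges with 1

# ❌ These cannot go into a set: they are mutable
# {[1, 2]}      # TypeError: unhashable type: 'list'
# {{'a': 1}}    # TypeError: unhashable type: 'dict'
# {{1, 2}}      # TypeError: unhashable type: 'set' (use frozenset instead)

This is the same requirement that applies to dict keys, because a set is also built on a hash table. In CPython the two types have separate implementations that follow closely related designs.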
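As a rough illustration of the rule above, the sketch below tests whether a value can be a set element by calling the built-in hash() on it. The helper name is_hashable is made up for this example and is not part of the standard library.

```python
def is_hashable(obj):
    """Return True if obj can be used as a set element (or dict key)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# The first four are immutable; the last four are, or contain, a mutable object.
candidates = [42, 'hello', (1, 2), frozenset([1, 2]),
              [1, 2], {'a': 1}, {1, 2}, (1, [2, 3])]
results = [is_hashable(c) for c in candidates]

for candidate, ok in zip(candidates, results):
    print(f'{candidate!r}: {"hashable" if ok else "unhashable"}')
```

Note the last candidate: a tuple is hashable only if everything inside it is hashable too.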

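2.3 Set vs list vs dict

Feature	list	dict	set
Unique elements	❌ duplicates allowed	✅ unique keys	✅ unique elements
Ordering	✅ ordered	✅ insertion order (3.7+)	❌ unordered
Mutability	✅ mutable	✅ mutable	✅ mutable
Index access	✅ `lst[0]`	✅ `d[key]`	❌ not supported
Lookup speed	O(n)	O(1) average	O(1) average
Underlying structure	dynamic array	hash table	hash table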

3. Ways to create a set

3.1 Curly-brace literal

# The most basic way: curly braces
fruits = {'apple', 'banana', 'orange'}
print(fruits)  # {'banana', 'orange', 'apple'} (order may vary)
print(type(fruits))  # <class 'set'>

# Empty set: ⚠️ you must use set(), because {} is an empty dict!
empty_set = set()
print(type(empty_set))  # <class 'set'>

empty_dict = {}
print(type(empty_dict))  # <class 'dict'>: note the difference!

# Elements are deduplicated automatically
numbers = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4}
print(numbers)  # {1, 2, 3, 4}
print(len(numbers))  # 4 (not 10)

3.2 The set() constructor: build from any iterable

# set() accepts any iterable

# From a list
s1 = set([1, 2, 3, 2, 1])
print(s1)  # {1, 2, 3}

# From a tuple
s2 = set((10, 20, 30, 20))
print(s2)  # {10, 20, 30}

# From a string: each character becomes an element
s3 = set('hello world')
print(s3)  # {'h', 'e', 'l', 'o', ' ', 'w', 'r', 'd'} (order may vary)
# Note: the three 'l's and the two 'o's are each kept once

# From a range object
s4 = set(range(10))
print(s4)  # {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

# From a dict (takes the keys)
s5 = set({'a': 1, 'b': 2, 'c': 3})
print(s5)  # {'a', 'b', 'c'} (order may vary)

# From a generator expression
s6 = set(x ** 2 for x in range(5))
print(s6)  # {0, 1, 4, 9, 16}

# From a file object (one element per line, trailing newline included)
# with open('file.txt') as f:
#     lines = set(f)  # the set of distinct lines
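Besides the literal and the constructor, a set comprehension builds a set while transforming or filtering the input. The sketch below is only an illustration: io.StringIO stands in for a real file so that it runs anywhere, and the sample names are invented.

```python
import io

# io.StringIO plays the role of an open text file in this sketch.
fake_file = io.StringIO('alice\nbob\n\nalice\n  bob  \ncarol\n')

# Set comprehension: strip whitespace, skip blank lines, deduplicate.
names = {line.strip() for line in fake_file if line.strip()}
print(sorted(names))  # ['alice', 'bob', 'carol']

# The same syntax works for any "transform plus filter" pattern.
even_squares = {x * x for x in range(10) if x % 2 == 0}
print(sorted(even_squares))  # [0, 4, 16, 36, 64]
```

Compared with set(f), stripping each line first means that 'bob' and '  bob  ' count as the same entry.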

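3.3 Sets holding elements of different types

# Mixed-type set
mixed = {42, 'hello', 3.14, True, None, (1, 2)}
print(mixed)  # {True, 3.14, 'hello', 42, None, (1, 2)} (order may vary)

# ⚠️ Note: inside a set, True equals 1 and False equals 0
# because bool is a subclass of int: True == 1, False == 0
numbers_with_bool = {1, True, 2, False, 0, 3}
print(numbers_with_bool)  # {False, 1, 2, 3}: True merged into 1, and 0 merged into False
# Of two equal values, the one inserted first is the one kept (CPython behavior)

# Verify
print(True == 1)   # True
print(False == 0)  # True
print({1, True})   # {1}: they are the same value
print({0, False})  # {0}: same reasoning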

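4. A closer look at automatic deduplication

4.1 The basis for deduplication: __hash__ and __eq__

A set decides whether two elements are "the same" by combining their hash values with an equality comparison:

# Deduplication steps:
# 1. Compute the new element's hash value, hash(element)
# 2. Find the corresponding slot in the hash table
# 3. If the slot already holds an element, compare with ==
# 4. If they are equal, do not insert (deduplicated)
# 5. If they differ (hash collision), probe the next slot using open addressing

# Demo: deduplication behavior of a custom class
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __hash__(self):
        """Hash value, based on name only."""
        return hash(self.name)

    def __eq__(self, other):
        """Equality, based on both name and age."""
        if not isinstance(other, Person):
            return False
        return self.name == other.name and self.age == other.age

    def __repr__(self):
        return f'Person({self.name}, {self.age})'


# Same hash but == is False → both elements are kept (hash collision)
p1 = Person('Alice', 25)  # hash based on 'Alice'
p2 = Person('Alice', 30)  # hash still based on 'Alice', but age differs
s = {p1, p2}
print(s)  # {Person(Alice, 25), Person(Alice, 30)}: both kept (order may vary)

# Same hash and == is True → deduplicated
p3 = Person('Alice', 25)  # equal to p1
p4 = Person('Alice', 25)  # also equal to p1
s2 = {p1, p3, p4}
print(s2)  # {Person(Alice, 25)}: only one remains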
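Writing __hash__ and __eq__ by hand is one option. As an alternative sketch (not the approach the Person example uses), a frozen dataclass lets Python generate both methods from the fields. The class names Point and Plain are invented for this illustration. The second class shows what happens when only __eq__ is defined: Python sets __hash__ to None, so instances cannot go into a set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """frozen=True together with the default eq=True yields a field-based __hash__."""
    x: int
    y: int


class Plain:
    """Defines __eq__ only, so instances become unhashable."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Plain) and self.value == other.value


points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))  # 2

try:
    {Plain(1)}
    plain_hashable = True
except TypeError:
    plain_hashable = False
print(plain_hashable)  # False
```

Keeping the object immutable also avoids a subtle problem: if a field used in the hash changes after insertion, the set can no longer find the element.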

4.2 Deduplication behavior by type

# Integers: deduplicated by value
s = {1, 2, 3, 1, 2}
print(s)  # {1, 2, 3}

# Floats: deduplicated by value
s = {1.0, 2.0, 1.0, 3.0}
print(s)  # {1.0, 2.0, 3.0}

# ⚠️ Note: 1 and 1.0 are the same element in a set!
s = {1, 1.0, 2, 3}
print(s)  # {1, 2, 3}: 1 and 1.0 are treated as equal
print(1 == 1.0)  # True
print(hash(1))   # 1
print(hash(1.0)) # 1: the same hash value!

# Strings: deduplicated by content
s = {'hello', 'world', 'hello', 'HELLO'}
print(s)  # {'hello', 'world', 'HELLO'}: 'hello' is deduplicated, 'HELLO' is different (order may vary)

# Tuples: deduplicated by content
s = {(1, 2), (3, 4), (1, 2), (2, 1)}
print(s)  # {(1, 2), (3, 4), (2, 1)}: (1, 2) is deduplicated, (2, 1) is a different tuple (order may vary)

# None: there is only one None
s = {None, None, None}
print(s)  # {None}
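When two values compare equal but have different types, which one survives? The sketch below suggests that the element inserted first is kept. This is observed CPython behavior rather than something the language reference spells out, so treat it as an assumption. The set() constructor is used so that the insertion order is unambiguous.

```python
# Each set ends up with a single element; only its type differs.
int_first = set([1, 1.0, True])
float_first = set([1.0, 1, True])
bool_first = set([True, 1, 1.0])

print(int_first, float_first, bool_first)  # {1} {1.0} {True}

# Adding an equal value later does not replace the stored one.
float_first.add(1)
kept = next(iter(float_first))
print(type(kept).__name__)  # float
```

If the distinction between 1, 1.0 and True matters for your data, a set of (type, value) tuples is one way to keep them apart.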

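4.3 Deduplication performance in practice

Set deduplication relies on a hash table, where lookup and insertion both take O(1) time on average, so deduplicating large batches of data is very fast:

import time
import random

# Generate 100,000 integers with duplicates
data = [random.randint(1, 10000) for _ in range(100000)]

# Method 1: manual deduplication (list plus "in" check)
start = time.perf_counter()
unique_list = []
for x in data:
    if x not in unique_list:  # O(n) per check
        unique_list.append(x)
print(f'Manual dedup: {time.perf_counter() - start:.4f}s, result size: {len(unique_list)}')

# Method 2: deduplication with a set
start = time.perf_counter()
unique_set = set(data)  # O(1) average per element, O(n) in total
print(f'Set dedup: {time.perf_counter() - start:.4f}s, result size: {len(unique_set)}')

# The gap is often hundreds to thousands of times, depending on data size and how many distinct values there are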
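set(data) is fast but discards the original order. When the order of first appearance matters, a common pattern is to keep a set of values already seen while appending to a list. The function name dedupe_keep_order is made up for this sketch.

```python
def dedupe_keep_order(items):
    """Return the distinct items in order of first appearance."""
    seen = set()      # O(1) average membership test
    result = []       # preserves the order of first appearance
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(dedupe_keep_order(data))  # [3, 1, 4, 5, 9, 2, 6]
```

This keeps the O(n) total cost of the set approach, at the price of holding both the set and the list in memory.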

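5. Adding and removing elements

Set operations get a detailed treatment later, but here is a quick look at basic additions and removals:

5.1 Adding elements

s = {1, 2, 3}

# add(): add a single element
s.add(4)
print(s)  # {1, 2, 3, 4}

# Adding a duplicate is silently ignored, with no error
s.add(4)
print(s)  # {1, 2, 3, 4}: unchanged

# update(): add in bulk (from one or more iterables)
s.update([5, 6, 7])
print(s)  # {1, 2, 3, 4, 5, 6, 7}

s.update({8, 9}, (10, 11))
print(s)  # {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

# update() also accepts a string (adds it character by character)
s2 = set()
s2.update('abc')
print(s2)  # {'a', 'b', 'c'} (order may vary)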

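5.2 Removing elements

s = {1, 2, 3, 4, 5}

# remove(): delete the given element; raises an error if it is missing
s.remove(3)
print(s)  # {1, 2, 4, 5}
# s.remove(99)  # KeyError: 99

# discard(): delete the given element; no error if it is missing
s.discard(4)
print(s)  # {1, 2, 5}
s.discard(99)  # no error

# pop(): remove and return an arbitrary element
# ⚠️ Note: sets are unordered, so pop() removes "some" element; it is neither random nor something to rely on
popped = s.pop()
print(f'Popped: {popped}, remaining: {s}')

# clear(): empty the set
s.clear()
print(s)  # set()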

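5.3 Comparison of the removal methods

Method	Behavior	If the element is missing	Return value
`remove(x)`	removes x	KeyError	None
`discard(x)`	removes x	no error	None
`pop()`	removes an arbitrary element	KeyError (if the set is empty)	the removed element
`clear()`	removes everything	no error	None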
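The differences in the table can be seen in a short sketch. The helper safe_remove is invented for illustration; it wraps remove() to report whether the element was actually present, which discard() does not tell you.

```python
def safe_remove(s, x):
    """Remove x from s and report whether it was there."""
    try:
        s.remove(x)
        return True
    except KeyError:
        return False


s = {1, 2, 3}
print(safe_remove(s, 2))   # True
print(safe_remove(s, 99))  # False
s.discard(99)              # silently does nothing

# pop() on an empty set raises KeyError, just as remove() does for a missing element.
empty = set()
try:
    empty.pop()
    pop_raised = False
except KeyError:
    pop_raised = True
print(pop_raised)  # True
```

In short: reach for discard() when absence is fine, and for remove() when a missing element signals a bug.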

6. Membership tests and iteration

6.1 Membership tests: O(1) lookup

# The biggest performance advantage of a set: membership tests are O(1) on average
# A list membership test is O(n); a set membership test is O(1)

import time

s = set(range(100000))

# Set lookup
start = time.perf_counter()
for _ in range(1000):
    _ = 99999 in s
print(f'set in: {time.perf_counter() - start:.6f}s')

# List lookup (for comparison)
lst = list(range(100000))
start = time.perf_counter()
for _ in range(1000):
    _ = 99999 in lst
print(f'list in: {time.perf_counter() - start:.6f}s')

# Practical example: fast filtering
all_users = {'alice', 'bob', 'charlie', 'david', 'eve', 'frank', 'grace'}
vip_users = {'alice', 'charlie', 'grace'}

# Check for VIP status: O(1)
print('alice' in vip_users)    # True
print('bob' in vip_users)      # False

# Check for access rights
def has_access(username):
    return username in all_users

print(has_access('hacker'))  # False

6.2 Iterating over a set

s = {'apple', 'banana', 'orange', 'grape', 'mango'}

# for loop: the order is not guaranteed!
print('Iterating over the set:')
for fruit in s:
    print(f'  {fruit}', end='')
print()

# Iterate in sorted order
print('Sorted iteration:')
for fruit in sorted(s):
    print(f'  {fruit}', end='')
print()

# Iterate sorted by length
print('Sorted by length:')
for fruit in sorted(s, key=len):
    print(f'  {fruit}({len(fruit)})', end='')
print()

# enumerate (with a counter)
print('Numbered iteration:')
for i, fruit in enumerate(s, 1):
    print(f'  {i}. {fruit}')

# ⚠️ Do not change the size of a set while iterating over it!
# for item in s:
#     s.add('new')  # RuntimeError: Set changed size during iteration

7. Common built-in operations on sets

7.1 Basic information

s = {1, 2, 3, 4, 5}

# len(): number of elements
print(len(s))  # 5

# max() / min(): largest and smallest element
print(max(s))  # 5
print(min(s))  # 1

# sum(): total
print(sum(s))  # 15

# any() / all()
print(any({0, False, 1}))  # True (at least one element is truthy)
print(all({1, True, 2}))   # True (every element is truthy)
print(all({0, 1, 2}))      # False (0 is falsy)

7.2 Relationships between sets

a = {1, 2, 3}
b = {1, 2, 3, 4, 5}
c = {1, 2, 3}

# Subset test
print(a.issubset(b))      # True: a is a subset of b
print(a <= b)             # True: operator form
print(a < b)              # True: proper subset (a ⊂ b and a != b)

# Superset test
print(b.issuperset(a))    # True: b is a superset of a
print(b >= a)             # True: operator form
print(b > a)              # True: proper superset

# Equality (same contents)
print(a == c)             # True
print(a == b)             # False

# Disjointness
d = {4, 5, 6}
print(a.isdisjoint(d))    # True: no elements in common
print(a.isdisjoint(b))    # False: they share 1, 2 and 3

8. Practical scenarios

8.1 Scenario 1: deduplicating and cleaning data

# Scenario: extract the distinct action types from a user activity log
log_entries = [
    'LOGIN - alice',
    'LOGOUT - alice',
    'LOGIN - bob',
    'PURCHASE - alice',
    'LOGIN - alice',      # repeated action
    'VIEW_PAGE - charlie',
    'LOGIN - bob',         # repeated action
    'PURCHASE - bob',
    'VIEW_PAGE - alice',
    'LOGOUT - bob',
]

# Extract all distinct action types
action_types = {entry.split(' - ')[0] for entry in log_entries}
print(f'Action types: {action_types}')
# {'LOGIN', 'LOGOUT', 'PURCHASE', 'VIEW_PAGE'} (order may vary)

# Extract all distinct users
users = {entry.split(' - ')[1] for entry in log_entries}
print(f'Active users: {users}')
# {'alice', 'bob', 'charlie'} (order may vary)

8.2 Scenario 2: allowlists and blocklists

# Scenario: IP whitelist and blacklist for an API endpoint
class IPFilter:
    """IP whitelist/blacklist filter with O(1) lookups backed by sets."""

    def __init__(self):
        self._whitelist = set()  # whitelist (takes priority)
        self._blacklist = set()  # blacklist

    def add_whitelist(self, ip):
        self._whitelist.add(ip)

    def add_blacklist(self, ip):
        self._blacklist.add(ip)

    def remove_whitelist(self, ip):
        self._whitelist.discard(ip)

    def remove_blacklist(self, ip):
        self._blacklist.discard(ip)

    def is_allowed(self, ip):
        """Decide whether an IP may access the service."""
        # The whitelist wins: a whitelisted IP is allowed even if it is also blacklisted
        if ip in self._whitelist:
            return True
        # On the blacklist
        if ip in self._blacklist:
            return False
        # On neither list: allowed by default
        return True

    def load_from_file(self, filename, list_type):
        """Load IPs from a file, one per line."""
        target = self._whitelist if list_type == 'whitelist' else self._blacklist
        with open(filename) as f:
            for line in f:
                ip = line.strip()
                if ip:
                    target.add(ip)
        print(f'{list_type} now holds {len(target)} IPs')


# Usage example
ip_filter = IPFilter()
ip_filter.add_blacklist('10.0.0.99')
ip_filter.add_blacklist('192.168.1.100')
ip_filter.add_whitelist('10.0.0.1')  # admin IP, always allowed

# Checks
print(ip_filter.is_allowed('10.0.0.99'))     # False: blacklisted
print(ip_filter.is_allowed('10.0.0.1'))      # True: whitelisted
print(ip_filter.is_allowed('172.16.0.5'))    # True: on neither list

8.3 Scenario 3: text analysis, finding distinct words

# Scenario: compare the vocabulary of two articles
article1 = """
Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience without being explicitly
programmed machine learning focuses on the development of computer
programs that can access data and use it to learn for themselves
"""

article2 = """
Deep learning is a subset of machine learning in artificial intelligence
that has networks capable of learning unsupervised from data that is
unstructured or unlabeled deep learning is also known as deep neural
network learning
"""

# Build the set of words (punctuation stripped, lowercased, single-character tokens skipped)
def extract_words(text):
    return {word.strip('.,;:!?()[]{}\'"').lower() for word in text.split() if len(word) > 1}

words1 = extract_words(article1)
words2 = extract_words(article2)

print(f'Distinct words in article 1: {len(words1)}')
print(f'Distinct words in article 2: {len(words2)}')

# Words found only in article 1
only_in_1 = words1 - words2
print(f'\nWords only in article 1 ({len(only_in_1)}): {only_in_1}')

# Words found only in article 2
only_in_2 = words2 - words1
print(f'Words only in article 2 ({len(only_in_2)}): {only_in_2}')

# Words found in both articles
common = words1 & words2
print(f'Words in both ({len(common)}): {common}')

# Every word that appears anywhere
all_words = words1 | words2
print(f'Total vocabulary: {len(all_words)}')

8.4 Scenario 4: quickly counting distinct elements

# Scenario: count how many different people took part in a forum's threads,
# including both the original posters and everyone who replied

posts = [
    {'author': 'alice', 'replies': ['bob', 'charlie', 'bob']},
    {'author': 'bob', 'replies': ['alice', 'david']},
    {'author': 'charlie', 'replies': ['alice', 'eve', 'alice', 'david']},
    {'author': 'david', 'replies': ['eve']},
]

# Collect all participants (deduplicated)
all_participants = set()
for post in posts:
    all_participants.add(post['author'])      # original poster
    all_participants.update(post['replies'])  # repliers

print(f'Number of participants: {len(all_participants)}')
print(f'Participants: {all_participants}')

# Split participants by role: who posted, who replied, who did both
authors = {post['author'] for post in posts}
repliers = set()
for post in posts:
    repliers.update(post['replies'])

print(f'Posted but never replied: {authors - repliers}')
print(f'Replied but never posted: {repliers - authors}')
print(f'Both posted and replied: {authors & repliers}')

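9. A brief look under the hood

9.1 Hash tables

Like a dict, a set is built on a hash table. A hash function maps each element to an integer, and that integer determines where the element is stored in the table:

# Inspect the hash value of an element
print(hash('hello'))   # some integer (differs between runs: string hashing is randomized per process)
print(hash(42))        # 42 (small non-negative ints hash to themselves; hash(-1) is -2 in CPython)
print(hash((1, 2, 3))) # some integer

# How the hash table works, conceptually:
# 1. Compute hash(x) for element x
# 2. Use hash(x) % table_size to pick a slot (CPython masks the low bits of the hash instead)
# 3. If the slot is empty, store the element there
# 4. If the slot is taken (hash collision), probe another slot using open addressing
# 5. If an equal element is found (__eq__ returns True), do not insert (deduplicated)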
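The five steps above can be sketched as a toy class. This is a deliberately simplified illustration, not how CPython implements set: it uses a fixed capacity and plain linear probing, with no resizing or deletion. The name ToyHashSet and its methods are invented for this example.

```python
class ToyHashSet:
    """Fixed-capacity hash set using linear probing (no deletion, no resizing)."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        self._slots = [self._EMPTY] * capacity
        self._count = 0

    def _probe(self, item):
        """Return the slot holding item, else the first empty slot, else None."""
        size = len(self._slots)
        index = hash(item) % size                  # steps 1 and 2
        for _ in range(size):
            current = self._slots[index]
            if current is self._EMPTY or current == item:
                return index
            index = (index + 1) % size             # step 4: collision, try next slot
        return None                                # every slot holds a different element

    def add(self, item):
        index = self._probe(item)
        if index is None:
            raise RuntimeError('table is full')
        if self._slots[index] is self._EMPTY:      # step 3: empty slot, store it
            self._slots[index] = item
            self._count += 1
        # otherwise step 5: an equal element is already stored, so do nothing

    def __contains__(self, item):
        index = self._probe(item)
        return index is not None and self._slots[index] is not self._EMPTY

    def __len__(self):
        return self._count


toy = ToyHashSet()
for value in [1, 9, 1.0, 17, 9]:   # 1, 9 and 17 all map to slot 1 when capacity is 8
    toy.add(value)

print(len(toy))              # 3
print(9 in toy, 25 in toy)   # True False
```

Here 1.0 is dropped because it equals the stored 1, while 9 and 17 collide with 1 and end up in neighboring slots.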

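9.2 Why sets are unordered

# "Unordered" does not mean "random"; it means "laid out by hash value"
# An element's position is determined by its hash, not by insertion order

s = set()
s.add(1)
s.add(2)
s.add(3)
print(s)  # the output usually looks sorted, because small ints hash to themselves

# For strings this no longer holds
s = {'banana', 'apple', 'cherry'}
print(s)  # the output order is unpredictable

# ⚠️ If you need a set that preserves order, you can:
# 1. Use dict keys (insertion-ordered since Python 3.7)
ordered_unique = dict.fromkeys([3, 1, 4, 1, 5, 9, 2, 6])
print(list(ordered_unique))  # [3, 1, 4, 5, 9, 2, 6]: order-preserving deduplication

# 2. Use sorted() for sorted output
print(sorted(s))  # ['apple', 'banana', 'cherry']: alphabetical order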

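9.3 Memory footprint

import sys

# A set carries some memory overhead for maintaining its hash table
numbers = list(range(1000))
lst = numbers
s = set(numbers)

print(f'List memory: {sys.getsizeof(lst)} bytes')
print(f'Set memory: {sys.getsizeof(s)} bytes')
# The set usually takes more memory: that is the cost of the hash table
# What you get in return is O(1) lookup and deduplication

# The real footprint also includes the elements themselves
# sys.getsizeof measures only the container object
# For precise measurements, use a dedicated profiling tool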
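As a rough approximation of the "container plus elements" footprint mentioned above, the sketch below adds the size of each element to the size of the container. It goes only one level deep and counts an object twice if it appears in two containers, so treat the numbers as estimates. The function name shallow_plus_elements is invented for this example.

```python
import sys


def shallow_plus_elements(container):
    """Size of the container object plus the size of each element (one level deep)."""
    return sys.getsizeof(container) + sum(sys.getsizeof(x) for x in container)


words = ['alpha', 'beta', 'gamma', 'delta']
as_list = list(words)
as_set = set(words)

print(f'list: container={sys.getsizeof(as_list)}, with elements={shallow_plus_elements(as_list)}')
print(f'set:  container={sys.getsizeof(as_set)}, with elements={shallow_plus_elements(as_set)}')
```

The exact byte counts depend on the Python version and platform, which is why no expected output is shown.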

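10. Common pitfalls

10.1 Pitfall 1: {} is not an empty set

# ❌ This is not an empty set!
empty = {}
print(type(empty))  # <class 'dict'>: this is a dict!

# ✅ This is an empty set
empty_set = set()
print(type(empty_set))  # <class 'set'>

# Why? The {} literal for an empty dict came first
# Set literal syntax was added later and had to avoid the ambiguity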

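10.2 Pitfall 2: mutable elements cannot go into a set

# ❌ A list cannot go into a set
# s = {[1, 2, 3]}  # TypeError: unhashable type: 'list'

# ❌ A dict cannot go into a set
# s = {{'a': 1}}   # TypeError: unhashable type: 'dict'

# ❌ A set cannot go into a set
# s = {{1, 2}}     # TypeError: unhashable type: 'set'

# ✅ But a frozenset can
s = {frozenset([1, 2]), frozenset([3, 4])}
print(s)  # {frozenset({1, 2}), frozenset({3, 4})} (order may vary)

# ❌ A tuple that contains a mutable element cannot go into a set either
# s = {(1, [2, 3])}  # TypeError: unhashable type: 'list'
# The list inside the tuple is unhashable, which makes the whole tuple unhashable

# ✅ A tuple made only of hashable elements is fine
s = {(1, 2), (3, 4)}
print(s)  # {(1, 2), (3, 4)} (order may vary)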

10.3 Pitfall 3: modifying a set while iterating over it

s = {1, 2, 3, 4, 5}

# ❌ Changing the set's size during iteration raises RuntimeError
# for x in s:
#     if x % 2 == 0:
#         s.remove(x)  # RuntimeError: Set changed size during iteration

# ✅ Option 1: iterate over a copy
for x in s.copy():
    if x % 2 == 0:
        s.remove(x)
print(s)  # {1, 3, 5}

# ✅ Option 2: build a new set with a set comprehension
s = {1, 2, 3, 4, 5}
s = {x for x in s if x % 2 != 0}
print(s)  # {1, 3, 5}

# ✅ Option 3: collect what to remove first, then remove it in one step
s = {1, 2, 3, 4, 5}
to_remove = {x for x in s if x % 2 == 0}
s -= to_remove
print(s)  # {1, 3, 5}

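10.4 Pitfall 4: sets cannot be indexed

s = {10, 20, 30, 40, 50}

# ❌ Sets do not support index access
# print(s[0])  # TypeError: 'set' object is not subscriptable

# ❌ Sets do not support slicing
# print(s[1:3]) # TypeError: 'set' object is not subscriptable

# If you need "the n-th element", convert to a list first
# But note: the order of the resulting list is not defined!
lst = list(s)
print(lst[0])  # works, but you cannot know in advance which element it is

# If you need a defined order, use sorted()
for i, item in enumerate(sorted(s)):
    print(f'{i}: {item}')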

10.5 Pitfall 5: confusing True with 1 and False with 0

# ⚠️ In a set, True == 1 and False == 0
s = {1, True, 0, False}
print(s)  # {0, 1}: only two elements

# Meaning: True and 1 are treated as the same value in a set
# Likewise, False and 0 are treated as the same value
# This is because bool is a subclass of int

# Verify
print(issubclass(bool, int))  # True
print(True == 1)              # True
print(False == 0)             # True
print(hash(True))             # 1
print(hash(1))                # 1
print(hash(False))            # 0
print(hash(0))                # 0

11. Set vs list: when to use which

# Decision helper for choosing a container
def choose_collection(requirements):
    """
    Suggest a suitable container type for the given requirements.
    """
    needs_order = requirements.get('order')      # must keep order
    needs_unique = requirements.get('unique')    # must be unique
    needs_index = requirements.get('index')      # must support access by position
    needs_search = requirements.get('search')    # must support fast lookup
    needs_modify = requirements.get('modify')    # frequent modification (not used by the rules below)

    if needs_unique and needs_search:
        if needs_order:
            return 'dict (insertion-ordered since Python 3.7, unique keys, usable as an ordered set)'
        return 'set'  # deduplication plus fast lookup
    elif needs_order and needs_index:
        return 'list'  # ordered plus index access
    elif needs_unique and needs_order:
        return 'keys of an OrderedDict, or dict.fromkeys()'
    else:
        return 'depends on the specific case'


# Examples
print(choose_collection({'unique': True, 'search': True}))
# set

print(choose_collection({'order': True, 'index': True}))
# list

print(choose_collection({'unique': True, 'order': True}))
# keys of an OrderedDict, or dict.fromkeys()

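12. Summary

The set is Python's data structure for deduplication and fast lookup:

Core properties:

Unique elements: automatic deduplication (based on __hash__ and __eq__)
Unordered: storage order is determined by hash values; insertion order is not preserved
Mutable: supports add and remove (but each element must itself be hashable)
O(1) lookup: membership tests (in) are very fast

Ways to create a set:

{1, 2, 3}: curly-brace literal (an empty set must be written set())
set(iterable): build from any iterable
Set comprehension: {x for x in iterable if condition}

Common operations:

Add: add() (single element), update() (bulk)
Remove: remove() (error if missing), discard() (no error if missing), pop() (removes an arbitrary element), clear()
Lookup: in (O(1) membership test)
Relations: issubset(), issuperset(), isdisjoint()

With set creation and deduplication covered, the next article turns to the most powerful feature of sets: intersection, union, difference, and symmetric difference. These operations are widely used in data processing, permission management, graph algorithms, and other areas.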
