How to Convert Speech to Text in Python

Updated: 2024-01-26 08:56:58 Author: 无水先生

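Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library. Code is included throughout for reference.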

1. Introduction

Learn how to use the SpeechRecognition Python library to perform speech recognition and convert audio speech to text in Python.

2. The Speech Recognition Library

2.1 A Capable Speech-to-Text Library

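Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library.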

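With this library, we do not need to build any machine learning model from scratch: it provides convenient wrappers around several well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, and others).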

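Note that if you would rather run inference on a machine learning model directly instead of calling an API, there is a separate tutorial on performing speech recognition in Python with current state-of-the-art machine learning models.

For other ways to perform ASR, see the comprehensive speech recognition tutorial.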


2.2 Installation

Let's start by installing the library with pip:

pip3 install SpeechRecognition pydub

Now open a new Python file and import it:

import speech_recognition as sr

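A nice feature of this library is that it supports several recognition engines:

CMU Sphinx (offline)
Google Speech Recognition
Google Cloud Speech API
Wit.ai
Microsoft Bing Voice Recognition
Houndify API
IBM Speech to Text
Snowboy Hotword Detection (offline)

We will use Google Speech Recognition here because it is simple and does not require you to supply an API key.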

2.3 Transcribing an Audio File

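Make sure the current directory contains an audio file with English speech (the original post provides a sample file if you want to follow along):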

filename = "16-122828-0002.wav"

This file comes from the LibriSpeech dataset, but you can use any WAV audio file you like; just change the file name. Let's initialize our speech recognizer:

# initialize the recognizer
r = sr.Recognizer()

The code below loads the audio file and converts the speech to text using Google Speech Recognition:

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This takes a few seconds to complete, because the file is uploaded to Google and the result is fetched back. Here is my result:

I believe you're just talking nonsense

The code above works well for small or medium-sized audio files. In the next section, we will write code for large files.

2.4 Transcribing Large Audio Files

If you want to perform speech recognition on a long audio file, the functions below handle that well:

# importing libraries 
import speech_recognition as sr 
import os 
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function to recognize speech in the audio file
# so that we don't repeat ourselves in other functions
def transcribe_audio(path):
    # use the audio file as the audio source
    with sr.AudioFile(path) as source:
        audio_listened = r.record(source)
        # try converting it to text
        text = r.recognize_google(audio_listened)
    return text

# a function that splits the audio file into chunks on silence
# and applies speech recognition
def get_large_audio_transcription_on_silence(path):
    """Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)  
    # split audio sound where silence is 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence around each chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text
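Note: you need to install Pydub with pip for the code above to work.

The function above uses split_on_silence() from the pydub.silence module to split the audio data into chunks wherever there is silence. The min_silence_len parameter is the minimum length of silence (in milliseconds) that triggers a split.

silence_thresh is the threshold: anything quieter than this is treated as silence. I set it to the average dBFS minus 14. The keep_silence parameter is the amount of silence (in milliseconds) to leave at the beginning and end of each detected chunk.

These values will not suit every audio file, so experiment with them for your own large audio files.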
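
To make the role of the three parameters concrete, here is a simplified, pure-Python illustration of silence-based splitting. It is not pydub's implementation: it works on a hypothetical list `levels_db` holding one loudness value (in dBFS) per millisecond, and the function name `split_on_silence_sketch` is made up for this example.

```python
def split_on_silence_sketch(levels_db, min_silence_len, silence_thresh, keep_silence):
    """Return (start_ms, end_ms) ranges of non-silent chunks.

    levels_db[i] is an assumed loudness (dBFS) for millisecond i.
    This only illustrates the parameters; pydub's real algorithm differs.
    """
    total = len(levels_db)

    # 1) collect silent runs that last at least min_silence_len ms
    silent_runs = []
    run_start = None
    for i, level in enumerate(levels_db):
        if level < silence_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence_len:
                silent_runs.append((run_start, i))
            run_start = None
    if run_start is not None and total - run_start >= min_silence_len:
        silent_runs.append((run_start, total))

    # 2) everything between those silent runs is a chunk
    chunks = []
    prev_end = 0
    for start, end in silent_runs:
        if start > prev_end:
            chunks.append((prev_end, start))
        prev_end = end
    if prev_end < total:
        chunks.append((prev_end, total))

    # 3) pad each chunk with up to keep_silence ms on both sides
    #    (in this sketch, padded neighbours may overlap)
    return [(max(0, s - keep_silence), min(total, e + keep_silence)) for s, e in chunks]


# speech at -20 dBFS, a 600 ms pause, more speech, a 200 ms pause, more speech
levels = [-20] * 1000 + [-60] * 600 + [-20] * 800 + [-60] * 200 + [-20] * 400
# threshold 14 dB below an assumed average level of -20 dBFS
print(split_on_silence_sketch(levels, min_silence_len=500,
                              silence_thresh=-20 - 14, keep_silence=100))
```

Only the 600 ms pause is long enough to cause a split; the 200 ms pause is shorter than min_silence_len and stays inside the second chunk.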

After that, we iterate over all the chunks, convert the speech in each one to text, and concatenate the results. Here is an example run:

path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription_on_silence(path))

Note: 7601-291468-0006.wav is the sample file used here; the original post provides it for download.

Output:

audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat. 
audio-chunks\chunk2.wav : At a short distance from the city. 
audio-chunks\chunk3.wav : Just at what is now called dutch street. 
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity. 
audio-chunks\chunk5.wav : Patent smokejacks. 
audio-chunks\chunk6.wav : It required a horse to work some. 
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire. 
audio-chunks\chunk8.wav : Carts that went before the horses. 
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances. 
audio-chunks\chunk10.wav : So just understand can found it all beholders. 

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.

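So the function automatically creates a folder for us, stores the chunks of the audio file we specified in it, and then runs speech recognition on all of them.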
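
One detail in the output above is that "dutch street" appears in lowercase. That is consistent with how `str.capitalize()` behaves: it uppercases the first character and lowercases everything else. If the original casing matters, a possible alternative (an assumption, not part of the original code) is to uppercase only the first character. The input string below is a hypothetical transcript used purely for illustration.

```python
def sentence_case(text):
    """Uppercase only the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]


# hypothetical raw transcript, for illustration only
raw = "just at what is now called Dutch Street"

print(raw.capitalize())    # lowercases everything after the first character
print(sentence_case(raw))  # keeps the original casing of later words
```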

If you want to split the audio file into fixed intervals instead, you can use the following function:

# a function that splits the audio file into fixed interval chunks
# and applies speech recognition
def get_large_audio_transcription_fixed_interval(path, minutes=5):
    """Splitting the large audio file into fixed interval chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)  
    # split the audio file into chunks
    chunk_length_ms = int(1000 * 60 * minutes) # convert to milliseconds
    chunks = [sound[i:i + chunk_length_ms] for i in range(0, len(sound), chunk_length_ms)]
    folder_name = "audio-fixed-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text

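The function above splits a large audio file into 5-minute chunks. You can change the minutes parameter to suit your needs. Since my audio file is not that large, I tried splitting it into 10-second chunks: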

print("\nFull text:", get_large_audio_transcription_fixed_interval(path, minutes=1/6))

Output:

audio-fixed-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called.
audio-fixed-chunks\chunk2.wav : Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some.
audio-fixed-chunks\chunk3.wav : Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head.
audio-fixed-chunks\chunk4.wav : Contrivances that astonished and confound it all beholders.

Full text: His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called. Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some. Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head. Contrivances that astonished and confound it all beholders.
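
The slicing arithmetic behind the fixed-interval function can be checked without any audio. The sketch below reproduces only the boundary computation; `chunk_boundaries` is a hypothetical helper, and the 35-second length is an assumed value chosen for illustration, not the measured length of the sample file.

```python
def chunk_boundaries(total_ms, minutes=5):
    """Return (start_ms, end_ms) pairs for fixed-length chunks.

    Mirrors the range()-based slicing in the function above; the last
    chunk is simply shorter when the length is not an exact multiple.
    """
    chunk_length_ms = int(1000 * 60 * minutes)  # convert minutes to milliseconds
    return [
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


# an assumed 35-second recording split into 10-second chunks
for start, end in chunk_boundaries(35_000, minutes=1/6):
    print(f"{start / 1000:.0f}s to {end / 1000:.0f}s")
```

Because the cut points ignore the speech itself, a boundary can fall in the middle of a word, which is why the fixed-interval output above breaks sentences in odd places.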

2.5 Reading from the Microphone

This requires PyAudio to be installed on your machine. The installation steps depend on your operating system:

Windows

You can install it directly with pip:

$ pip3 install pyaudio

Linux

You need to install the dependencies first:

$ sudo apt-get install python-pyaudio python3-pyaudio
$ pip3 install pyaudio

macOS

You need to install portaudio first, then you can install PyAudio with pip:

$ brew install portaudio
$ pip3 install pyaudio

Now let's use the microphone to convert our speech:

import speech_recognition as sr

# initialize the recognizer
r = sr.Recognizer()

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

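This listens to your microphone for 5 seconds and then tries to convert the speech to text.

It is very similar to the previous code, but here we use the Microphone() object to read audio from the default microphone. The duration parameter of record() stops reading after 5 seconds, and the audio data is then uploaded to Google to get the output text.

You can also use the offset parameter of record() to start recording after offset seconds.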

In addition, you can recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech, you can use:

text = r.recognize_google(audio_data, language="es-ES")

See this StackOverflow answer for the supported languages.

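3. Conclusion

As you can see, converting speech to text with this library is simple and straightforward. The library is widely used in the wild. See the official documentation for more.

If you also want to convert text to speech in Python, check out this tutorial.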
