Converting txt Files to XML in Python

Updated: 2022-12-14 17:11:34  Author: LGDDDDDD

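This article shows how to convert txt label files to XML in Python. I hope it is a useful reference; if you spot mistakes or cases I have not considered, corrections are welcome.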

Converting txt files to XML

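Many object detection models expect their input in the VOC file format by default.

The labels for the data I have are txt files.

To avoid unnecessary bugs, I chose to convert them to the VOC format.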

Arranging the data in the VOC layout

Folder	Contents
Annotations	Generated XML files
JPEGImages	JPG images
ImageSets	txt files listing the training and test sets
Labelss	Label files in txt format
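
The folders in the table can be created up front before running the conversion. A minimal sketch, assuming the dataset root is a folder of your choosing; `make_voc_layout`, `VOC_DIRS`, and `root` are hypothetical names that do not appear in the original script:

```python
import os

# Folder names taken from the table above ("Labelss" is spelled as the script expects)
VOC_DIRS = ["Annotations", "JPEGImages", "ImageSets", "Labelss"]


def make_voc_layout(root):
    """Create the four folders under root and return their paths."""
    paths = []
    for name in VOC_DIRS:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        paths.append(path)
    return paths
```

Calling `make_voc_layout(".")` from the dataset folder leaves any folders that already exist untouched.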

# -*- coding: utf-8 -*-

from xml.dom.minidom import Document
import os
from PIL import Image  # Pillow, used only to read each image's size


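# Folder paths relative to the working directory. os.path.join(name, "")
# appends the platform's path separator, so the script is not tied to Windows.
xml_path = os.path.join("Annotations", "")  # output XML files
img_path = os.path.join("JPEGImages", "")   # source images
ann_path = os.path.join("Labelss", "")      # txt label files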

if not os.path.exists(xml_path):
    os.mkdir(xml_path)


def writeXml(tmp, imgname, w, h, objbud, wxml):
    doc = Document()
    # Root <annotation> element
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)
    # <folder> and <filename>
    folder = doc.createElement('folder')
    annotation.appendChild(folder)
    folder_txt = doc.createTextNode("VOC2007")
    folder.appendChild(folder_txt)

    filename = doc.createElement('filename')
    annotation.appendChild(filename)
    filename_txt = doc.createTextNode(imgname)
    filename.appendChild(filename_txt)
    # <source> block: database, annotation, image
    source = doc.createElement('source')
    annotation.appendChild(source)

    database = doc.createElement('database')
    source.appendChild(database)
    database_txt = doc.createTextNode("The VOC2007 Database")
    database.appendChild(database_txt)

    annotation_new = doc.createElement('annotation')
    source.appendChild(annotation_new)
    annotation_new_txt = doc.createTextNode("PASCAL VOC2007 ")
    annotation_new.appendChild(annotation_new_txt)

    image = doc.createElement('image')
    source.appendChild(image)
    image_txt = doc.createTextNode("flickr")
    image.appendChild(image_txt)
    # <size> block: width, height, depth
    size = doc.createElement('size')
    annotation.appendChild(size)

    width = doc.createElement('width')
    size.appendChild(width)
    width_txt = doc.createTextNode(str(w))
    width.appendChild(width_txt)

    height = doc.createElement('height')
    size.appendChild(height)
    height_txt = doc.createTextNode(str(h))
    height.appendChild(height_txt)

    depth = doc.createElement('depth')
    size.appendChild(depth)
    depth_txt = doc.createTextNode("3")
    depth.appendChild(depth_txt)
    # <segmented> flag
    segmented = doc.createElement('segmented')
    annotation.appendChild(segmented)
    segmented_txt = doc.createTextNode("0")
    segmented.appendChild(segmented_txt)


    # <object> block: one annotated box
    object_new = doc.createElement("object")
    annotation.appendChild(object_new)

    name = doc.createElement('name')
    object_new.appendChild(name)
    # The class name is hard-coded here; objbud[0] holds the class read from the label file
    name_txt = doc.createTextNode('cancer')
    name.appendChild(name_txt)

    pose = doc.createElement('pose')
    object_new.appendChild(pose)
    pose_txt = doc.createTextNode("Unspecified")
    pose.appendChild(pose_txt)

    truncated = doc.createElement('truncated')
    object_new.appendChild(truncated)
    truncated_txt = doc.createTextNode("0")
    truncated.appendChild(truncated_txt)

    difficult = doc.createElement('difficult')
    object_new.appendChild(difficult)
    difficult_txt = doc.createTextNode("0")
    difficult.appendChild(difficult_txt)
    # <bndbox>: box corners, taken from objbud
    bndbox = doc.createElement('bndbox')
    object_new.appendChild(bndbox)

    xmin = doc.createElement('xmin')
    bndbox.appendChild(xmin)
    
    # objbud holds [class, xmin, ymin, xmax, ymax] as strings
    xmin_txt = doc.createTextNode(objbud[1])
    xmin.appendChild(xmin_txt)

    ymin = doc.createElement('ymin')
    bndbox.appendChild(ymin)
    ymin_txt = doc.createTextNode(objbud[2])
    ymin.appendChild(ymin_txt)

    xmax = doc.createElement('xmax')
    bndbox.appendChild(xmax)
    xmax_txt = doc.createTextNode(objbud[3])
    xmax.appendChild(xmax_txt)

    ymax = doc.createElement('ymax')
    bndbox.appendChild(ymax)
    ymax_txt = doc.createTextNode(objbud[4])
    ymax.appendChild(ymax_txt)
    # End of <bndbox> and <object>

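    # toprettyxml() emits an XML declaration as the first line. The script
    # drops it by writing to a temporary file first and then copying
    # everything except that first line into the final file.
    tempfile = tmp + "test.xml"
    with open(tempfile, "wb") as f:
        f.write(doc.toprettyxml(indent="\t", newl="\n", encoding="utf-8"))

    with open(tempfile, "r", encoding="utf-8") as rewrite:
        lines = rewrite.read().split('\n')
    # Skip the declaration (first item) and the empty string after the final newline
    newlines = lines[1:len(lines) - 1]

    with open(wxml, "w", encoding="utf-8") as fw:
        for line in newlines:
            fw.write(line + '\n')

    os.remove(tempfile)
    return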


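# os.walk yields (dirpath, dirnames, filenames) tuples. A raw string keeps the
# backslashes in the Windows path literal. The label files themselves are read
# from ann_path, so this path should point at that same folder.
for files in os.walk(r'E:\ssd_pytorch_cancer\data\cancer_or_not\Labels'):
    print(files)
    temp = "/temp/"
    if not os.path.exists(temp):
        os.mkdir(temp)
    for file in files[2]:
        print(file + "-->start!")
        img_name = os.path.splitext(file)[0] + '.jpg'
        fileimgpath = img_path + img_name
        # Open the image only to read its width and height
        with Image.open(fileimgpath) as im:
            width, height = im.size

        # Each label file holds one box: class xmin ymin xmax ymax.
        # split() with no argument also strips the trailing newline.
        with open(ann_path + file, "r") as filelabel:
            obj = filelabel.read().split()

        filename = xml_path + os.path.splitext(file)[0] + '.xml'
        writeXml(temp, img_name, width, height, obj, filename)
    os.rmdir(temp)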
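
A quick way to check a generated file is to read it back. The following is a minimal sketch using the standard library's `xml.etree.ElementTree`; `read_voc_boxes` is a hypothetical helper, not part of the script above, and it assumes the box coordinates are integers:

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_file):
    """Return (filename, width, height, boxes) parsed from one annotation file."""
    root = ET.parse(xml_file).getroot()
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Assumption: coordinates were written as integer pixel values
        boxes.append((
            obj.findtext("name"),
            int(box.findtext("xmin")),
            int(box.findtext("ymin")),
            int(box.findtext("xmax")),
            int(box.findtext("ymax")),
        ))
    return filename, width, height, boxes
```

Comparing the returned width and height with the actual image, and the boxes with the txt label, catches most conversion mistakes early.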

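Note that this code only works when each label file contains a single bounding box. To handle several boxes, wrap the code that builds the object and bndbox nodes in a loop.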
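
One possible shape for that loop is sketched below. It assumes each non-empty line of the label file holds one box as `class xmin ymin xmax ymax`; the original only describes the single-box case, so this layout is an assumption. `parse_label_lines` and `append_objects` are hypothetical helpers, and unlike the script above the class name is taken from the label file rather than hard-coded:

```python
from xml.dom.minidom import Document


def parse_label_lines(text):
    """Split label text into [class, xmin, ymin, xmax, ymax] lists, one per valid line."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5:  # assumed layout: class xmin ymin xmax ymax
            boxes.append(fields)
    return boxes


def append_objects(doc, annotation, boxes):
    """Append one <object> node (with its <bndbox>) to annotation for each box."""
    for cls, xmin, ymin, xmax, ymax in boxes:
        obj = doc.createElement("object")
        annotation.appendChild(obj)
        # Same fixed fields as the single-box script, except for the class name
        for tag, value in (("name", cls), ("pose", "Unspecified"),
                           ("truncated", "0"), ("difficult", "0")):
            node = doc.createElement(tag)
            node.appendChild(doc.createTextNode(value))
            obj.appendChild(node)
        bndbox = doc.createElement("bndbox")
        obj.appendChild(bndbox)
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            node = doc.createElement(tag)
            node.appendChild(doc.createTextNode(value))
            bndbox.appendChild(node)
```

Inside `writeXml`, a call such as `append_objects(doc, annotation, boxes)` would take the place of the single object block, with `boxes` produced by `parse_label_lines` from the label file's contents.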

Summary

The above is based on personal experience; I hope it serves as a useful reference.
