fdupe: a Perl script for finding duplicate files

Updated: 2013-03-23 12:55:46  Author:

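fdupe is a small Perl script that searches the given directories and reports duplicate files. It identifies duplicates by file content, not by file name. fdupe needs no other Perl modules or scripts, and it is fast because it only compares files of exactly the same size: it first groups files by size, then uses the first 32 bytes of each file to narrow the candidates, and only then compares full contents.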
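Content-based detection of this kind can be sketched in a few lines of Perl. This is not fdupe itself, only a minimal illustration that assumes the core modules File::Find and File::Compare (both ship with Perl); the helper name `find_dupes` is hypothetical. Files are first bucketed by size, and only buckets holding more than one file are compared by content with `File::Compare::compare`, which returns 0 when two files are identical.

```perl
use strict;
use warnings;
use File::Find    qw(find);
use File::Compare qw(compare);

# Hypothetical helper (not part of fdupe): returns a list of array
# references, each holding paths whose contents are byte-for-byte equal.
sub find_dupes {
    my @roots = @_;
    my %by_size;

    # Stage 1: bucket regular, non-symlink files by size.
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return if -l $path or not -f _;      # _ reuses the lstat from -l
            push @{ $by_size{ (-s _) || 0 } }, $path;
        },
    }, @roots);

    # Stage 2: within each size bucket, compare contents.
    my @groups;
    for my $size (sort { $a <=> $b } keys %by_size) {
        my @candidates = @{ $by_size{$size} };
        next if @candidates < 2;                 # unique size: cannot be a duplicate
        my @seen;                                # each element: [first file, copies...]
        FILE: for my $file (@candidates) {
            for my $group (@seen) {
                # compare() returns 0 if equal, 1 if different, -1 on error
                if (compare($group->[0], $file) == 0) {
                    push @$group, $file;
                    next FILE;
                }
            }
            push @seen, [$file];
        }
        push @groups, grep { @$_ > 1 } @seen;
    }
    return @groups;
}
```

Called as `find_dupes('.')`, the sketch returns one array reference per group of identical files. Unlike fdupe, it does not read a 32-byte prefix first and does not treat hard links specially.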


#!/usr/bin/perl
#
# fdupe tool - finding duplicate files
#
# $Id: fdupe,v 1.7 2011/10/14 20:11:21 root Exp root $
#
# Source code Copyright (c) 1998,2011 Bernhard Schneider.
# May be used only for non-commercial purposes with
# appropriate acknowledgement of copyright.
#
# FILE :        fdupe
# DESCRIPTION : script finds duplicate Files.
# AUTHOR:       Bernhard Schneider <bernhard@neaptide.org>
# hints, corrections & ideas are welcome
#
# usage: fdupe.pl <path> <path> ...
#        find / -xdev | fdupe.pl
#
# how to select and remove duplicates:
#   redirect output to >file, edit the file and mark lines you
#   wish to move/delete with a preceding dash (-)
#   Use following script to delete marked files:
#   #!/usr/bin/perl -n
#   chomp; unlink if s/^-//;
#
# history:
# 12.05.99 - goto statement replaced with next
# 14.05.99 - minor changes
# 18.05.99 - removed confusing 'for $y'
#            included hash-search
# 20.05.99 - minor changes
# 02.03.00 - some functions rewritten, optimized for speed
# 10.01.01 - hint-fix by Ozzie |ozric at kyuzz.org|
# 05.03.02 - fixed hangups by reading block/char-Devices
# 08.09.11 - skips checking of hard links
# 14.10.11 - accept file names from stdin
#
#use strict; # uncomment for debugging

$|=1;
local (*F1,*F2); my %farray = (); my $statF1;

# ------------------------------
# traverse directories
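sub scan ($) {
    my ($dir) = $_[0];
    # lexical dirhandle, so recursive calls do not share one global DIR;
    # unreadable directories are reported and skipped
    opendir (my $dh, $dir) or do { warn "($dir) $!\n"; return };
    my @entries = map "$dir/$_", grep !/^\.\.?$/, readdir ($dh);
    closedir ($dh);
    for (@entries) {
        # skip symlinks, sockets, pipes and char/block devices
        next if (-l or -S or -p or -c or -b);
        (-d) ? scan ($_) : push @{$farray{(-s $_) || 0}}, $_;
    }
}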

# ------------------------------
# get chunk of bytes from a file
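sub getchunk ($$) {
    my ($fsize, $pfname) = @_;
    my $chunksize = 32;
    my ($nread, $buff);

    # three-argument open: names with leading/trailing blanks or
    # '<', '>' characters are opened literally
    return undef unless open (F1, '<', $$pfname);

    # remember link count, inode and device of the current file
    $statF1 = [(stat F1)[3,1,0]];
    binmode F1;
    $nread = read (F1, $buff, $chunksize);
    # a full chunk, or the whole file if it is shorter than $chunksize
    ($nread == $chunksize || $nread == $fsize) ? $buff : undef;
}

# ------------------------------
# compare two files: the current file (already open as F1) and the
# file referenced by $fptr; returns 0 if equal, -1 otherwise
sub mycmp ($) {
    my ($fptr) = $_[0];
    my ($buffa, $buffb);
    my ($nread1, $nread2);
    my $statF2;
    my ($buffsize) = 16*1024;

    return -1 unless (open (F2, '<', $$fptr));

    $statF2 = [(stat F2)[3,1,0]];

    # hard links (same inode on the same device) are identical by
    # definition, no need to read them; inode numbers alone are only
    # unique per device
    return 0
        if ($statF2->[0] > 1
            && $statF1->[1] == $statF2->[1]
            && $statF1->[2] == $statF2->[2]);

    binmode F2;
    seek (F1, 0, 0);

    do {
        $nread1 = read (F1, $buffa, $buffsize);
        $nread2 = read (F2, $buffb, $buffsize);

        if (($nread1 != $nread2) || ($buffa cmp $buffb)) {
            return -1;
        }
    } while ($nread1);

    return 0;
}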

# ------------------------------

print "collecting files and sizes ...\n";

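if (-t STDIN) {
    @ARGV = ('.') unless @ARGV;   # use working directory if no arguments given
    scan ($_) for @ARGV;
} else {
    while (<STDIN>) {
        s/[\r\n]+$//;             # strip trailing CR/LF
        # keep regular files only: directories, symlinks, sockets,
        # pipes and char/block devices are skipped
        push @{$farray{(-s $_) || 0}}, $_
            if (-f $_ and not -l $_);
    }
}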

print "now comparing ...\n";
for my $fsize (sort { $b <=> $a } keys %farray) {   # largest files first

my ($i,$fptr,$fref,$pnum,%dupes,%index,$chunk);

# skip files with unique file size
next if $#{$farray{$fsize}} == 0;

$pnum = 0;
%dupes = %index = ();

nx:
for my $nx (0 .. $#{$farray{$fsize}}) {   # index of each file with this size
    $fptr  = \$farray{$fsize}[$nx];      # ref to the current file name
    $chunk = getchunk $fsize, $fptr;     # first 32 bytes; file stays open as F1
    if ($pnum) {
        # only compare against groups whose first chunk matches
        for $i (@{$index{$chunk}}) {
            $fref = ${$dupes{$i}}[0];    # first file of group $i
            unless (mycmp $fref) {
                # found duplicate, collecting
                push @{$dupes{$i}}, $fptr;
                next nx;
            }
        }
    }

    # nothing found, start a new group
    push @{$dupes{$pnum}}, $fptr;
    push @{$index{$chunk}}, $pnum++;
}
# show the duplicate groups found for the current size
for $i (sort { $a <=> $b } keys %dupes) {
    $#{$dupes{$i}} || next;
    print "\n size: $fsize\n\n";
    for (@{$dupes{$i}}) {
        print $$_,"\n";
    }
}
}

close F1;
close F2;
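The header comment of the script describes the removal workflow: redirect the report to a file, mark unwanted files with a leading dash, then delete them with a two-line `perl -n` script. A more cautious variant is sketched below, assuming a dry run is wanted before anything is deleted; the names `marked_files` and `remove_marked` are hypothetical and not part of fdupe.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paths in an edited fdupe report
# that were marked for removal with a leading dash.
sub marked_files {
    my @paths;
    for my $line (@_) {
        (my $copy = $line) =~ s/[\r\n]+$//;    # strip the line ending
        push @paths, $1 if $copy =~ /^-(.+)$/;
    }
    return @paths;
}

# Hypothetical helper: with $really_delete false, only print what
# would be removed (dry run); otherwise unlink each marked path.
# Returns the number of files actually removed.
sub remove_marked {
    my ($really_delete, @lines) = @_;
    my $removed = 0;
    for my $path (marked_files(@lines)) {
        if (!$really_delete) {
            print "would remove: $path\n";
        } elsif (unlink $path) {
            $removed++;
        } else {
            warn "cannot remove $path: $!\n";
        }
    }
    return $removed;
}
```

For example, `remove_marked(0, <STDIN>)` previews the deletions for an edited report read from standard input; passing 1 instead of 0 performs them.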
