Wednesday, 31 August 2016

BASH Script: Compare File Sizes Within Given Date Range

BASH Script
After the Previous Post, let's discuss another important task: comparing file and folder sizes between one location and another. Linux offers many ways to do this, and several commands that can help. We will discuss one specific scenario, as follows.

Let's assume that you want to compare folders and files from source to destination, but only when the destination location already has the specified folders or files. The operation spans a range of dates, from a start date to an end date. The directory structure also matters a lot: you need to decide how it is preserved in the destination location. In our case, the directories are named after dates in YYYYMMDD format (e.g. 20160101). You can use any directory naming format that suits your own requirements, but I am going to discuss only the scenario described above. If you change the code, make sure your changes reflect the task you want to perform.
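To make the directory structure requirement concrete, here is a minimal sketch of how a source file can be mapped to its expected destination path while keeping the date directory. The function name dest_path_for and the /data/... paths are hypothetical and used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a source file to its expected destination path,
# keeping everything below the source root (including the date directory).
dest_path_for() {
    local src_root=$1 dest_root=$2 src_file=$3
    # Strip the leading "<src_root>/" and prepend the destination root instead.
    printf '%s\n' "$dest_root/${src_file#"$src_root"/}"
}

# Example call with made-up paths.
dest_path_for /data/src /data/dest /data/src/20160101/abc.txt
```

Because only the root prefix is swapped, the date directory and the file name stay identical on both sides, which is what the comparison relies on.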

Summary of the scenario:
1. Compare file sizes only if the Destination location has the specified files. Here we will use the "ls" command and the "awk" command.
2. The directory structure in the Destination location must match that of the Source location.
3. The date range should be in YYYYMMDD format.
4. The script should accept the date range as command line arguments.
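Points 3 and 4 can be sketched as a small loop that walks from a start date to an end date and prints each YYYYMMDD value. This is only an illustration: the function name list_dates is hypothetical, and the sketch assumes GNU date (the -d option), which is what most Linux systems provide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every YYYYMMDD value from $1 to $2, inclusive.
# Assumes GNU date, which understands -d "<date> + 1 day".
list_dates() {
    local current end
    current=$(date -d "$1" +%Y%m%d) || return 1
    end=$(date -d "$2" +%Y%m%d) || return 1
    while [ "$current" -le "$end" ]; do
        echo "$current"
        current=$(date -d "$current + 1 day" +%Y%m%d)
    done
}

# Example call: one line per date directory name in the range.
list_dates 20160101 20160103
```

Since YYYYMMDD values sort the same way numerically and chronologically, a plain integer comparison is enough to detect the end of the range.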

The following is sample code for the implementation. You can also get it from my GitHub repository.

# This code compares file sizes within the given date range.
# This code creates 3 output txt files as follows:
# 1. Matched Files
# 2. Mismatched Files
# 3. Missing Files
# Please check src & dest path before you run the code.

StartDate=$(date +"%Y%m%d" -d "$1")  # e.g. "20160121"
EndDate=$(date +"%Y%m%d" -d "$2")    # e.g. "20160123"

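# Please change src and dest location as per your need.
# The paths below are the ones shown in the sample output of this post.
src="/home/yogesh/bash-tp"
dest="/home/yogesh/bash-tp/desti"

# Exit codes used by the checks below (assumed values; any non-zero code works).
E_BADARGS=85
E_NOFILE=86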

if [ $# -ne 2 ]
then
    echo "Usage: $(basename "$0") Start_Date End_Date"
    echo "e.g. : bash CompareFiles.bash 20160621 20160625"
    # Falls back to 1 if E_BADARGS has not been defined.
    exit "${E_BADARGS:-1}"
fi

if [[ ! -d $src || ! -d $dest ]]
then
    echo "Given source or destination path doesn't exist."
    # Falls back to 1 if E_NOFILE has not been defined.
    exit "${E_NOFILE:-1}"
fi

echo "Now: $StartDate"
echo "End: $EndDate"
echo "src: $src"
echo "dest: $dest"

# Delete output files from a previous run, if they exist.
for i in OutM*; do if [[ -f "$i" ]]; then rm -f "$i"; fi; done

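function CompareFiles {
    # Directory for the date being processed. This assignment and the destfile
    # one below are reconstructed from the paths shown in the sample output.
    srcpath="$src/$StartDate"

    for srcfile in "$srcpath"/*
    do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$srcfile" ] || continue
        destfile="$dest/$StartDate/$(basename "$srcfile")"

        echo "Srcfilepath: $srcfile"
        echo "Destfilepath: $destfile"

        if [ -f "$destfile" ]
        then
            filesize=$(ls -l "$srcfile" | awk '{print $5}')
            destfilesize=$(ls -l "$destfile" | awk '{print $5}')

            echo "SrcFilesize: $filesize"
            echo "DestFilesize: $destfilesize"
            if [ "$filesize" = "$destfilesize" ]
            then
                # File names which match in size are written to the following file.
                echo "$destfile" >>OutMatchFiles.txt
            else
                # File names which do not match in size are written to the following file.
                echo "$destfile" >>OutMismatchFiles.txt
            fi
        else
            # File names which do not exist in the destination path are written to the following file.
            echo "$destfile" >>OutMissingFiles.txt
        fi
    done # For complete
}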

while [ "$StartDate" -le "$EndDate" ]
do
    echo "Date being Processed: $StartDate"

    # Compare all files in the directory for the current date.
    CompareFiles

    StartDate=$(date +"%Y%m%d" -d "$StartDate + 1 day")
done

echo "All Done"

A sample command to run the above script is as follows.

$ bash CompareFiles.bash 20160621 20160625

Let's discuss the code above. The BASH script takes two command line arguments: the Start Date first and the End Date second. It checks that exactly two arguments are provided when the script is executed; otherwise it prints a "Usage" message and exits without processing further. It also checks that both the Source and Destination paths exist.
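The script itself only counts the arguments. If you also want to reject values that are not real YYYYMMDD dates, a check along the following lines could be added. This is a sketch, not part of the original script: the function name is_valid_date is hypothetical, and it assumes GNU date.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is a real calendar date in YYYYMMDD form.
is_valid_date() {
    # Exactly eight digits, nothing else.
    [[ $1 =~ ^[0-9]{8}$ ]] || return 1
    # GNU date rejects impossible dates such as 20160230.
    date -d "$1" +%Y%m%d >/dev/null 2>&1
}

# Example: report which candidate values would be accepted.
for candidate in 20160621 20160230 2016-06-21; do
    if is_valid_date "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```

The digit check runs first because GNU date on its own accepts many other spellings of a date, which would not match the directory names.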

If both checks pass, the script goes through each date in the range, checks whether each destination file exists, and compares the file sizes of the source and destination files. Please note that we have used the "ls" and "awk" commands to get the file size. Linux offers various ways to get a file's size; we just used one of them.
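As one of those other ways, here is a small sketch that reads the size in bytes with "stat" and falls back to "wc -c" where the GNU form of "stat" is not available. The function name file_size is hypothetical; the script above does not use it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the size of a file in bytes.
file_size() {
    # GNU stat prints the size with -c %s. If that fails (for example on a
    # system without GNU stat), count the bytes with wc -c instead.
    stat -c %s -- "$1" 2>/dev/null || wc -c < "$1" | tr -d '[:space:]'
}

# Example: compare the sizes of two temporary files.
a=$(mktemp) && b=$(mktemp)
printf 'abc' > "$a"
printf 'xyz' > "$b"
if [ "$(file_size "$a")" = "$(file_size "$b")" ]; then
    echo "sizes match"
else
    echo "sizes differ"
fi
rm -f "$a" "$b"
```

Unlike parsing "ls -l", this does not depend on the column layout of the listing.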
After the script runs, at most 3 text files will be created in the working directory, listing the Matched Files, Mismatched Files and Missing Files.
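For a quick overview after a run, the three output files can be summarized with a few lines like the following. This is an optional sketch; the function name summarize_reports is hypothetical, and only the three file names come from the script above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many file names each report contains.
# A report that was never created is counted as 0.
summarize_reports() {
    local report count
    for report in OutMatchFiles.txt OutMismatchFiles.txt OutMissingFiles.txt; do
        count=0
        if [ -f "$report" ]; then
            count=$(wc -l < "$report")
        fi
        # $((count)) normalizes any padding that wc may add.
        echo "$report: $((count))"
    done
}

# Run it from the directory where the comparison script was executed.
summarize_reports
```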

Following is the sample output of the script for quick reference.

$ bash Compare-File.bash 20160101 20160103
Now: 20160101
End: 20160103
src: /home/yogesh/bash-tp
dest: /home/yogesh/bash-tp/desti
Date being Processed: 20160101
Srcfilepath: /home/yogesh/bash-tp/20160101/abc.txt
Destfilepath: /home/yogesh/bash-tp/desti/20160101/abc.txt
SrcFilesize: 3
DestFilesize: 3
Srcfilepath: /home/yogesh/bash-tp/20160101/func.bash
Destfilepath: /home/yogesh/bash-tp/desti/20160101/func.bash
SrcFilesize: 397
DestFilesize: 397
Date being Processed: 20160102
Srcfilepath: /home/yogesh/bash-tp/20160102/abcd.txt
Destfilepath: /home/yogesh/bash-tp/desti/20160102/abcd.txt
SrcFilesize: 3
DestFilesize: 3
Srcfilepath: /home/yogesh/bash-tp/20160102/func.bash
Destfilepath: /home/yogesh/bash-tp/desti/20160102/func.bash
SrcFilesize: 397
DestFilesize: 397
Date being Processed: 20160103
Srcfilepath: /home/yogesh/bash-tp/20160103/abcde.txt
Destfilepath: /home/yogesh/bash-tp/desti/20160103/abcde.txt
Srcfilepath: /home/yogesh/bash-tp/20160103/func.bash
Destfilepath: /home/yogesh/bash-tp/desti/20160103/func.bash
All Done

I hope you understood the discussion so far and liked the post.
Thank you for visiting the website and going through the post. Stay tuned for more interesting stuff.

==>Posted By Yogesh B. Desai
