                Find Duplicates - discover duplicated files

Find Duplicates was written to allow you to control your disk space usage by
discovering files that are duplicated and, should you so wish, deleting one 
or more of these duplicates.  There are many ways in which duplicate files 
can be deposited on your hard disk, for example programs which don't check to 
see if you have a particular DLL installed and install their own private copy 
in any case, or other programs that install a DLL in your \Windows folder 
when it is already in \Windows\System32.  You can also use Find Duplicates to 
see if any files on a floppy are already present anywhere on your hard disk.  
For safety, Find Duplicates moves files to the Recycle Bin rather than 
deleting them outright.


REGISTRATION

While this software is offered free-of-charge, if you want to say "thank-you"
or you wish technical support you need to register the software.  You can do
this on my Web site:

  http://www.satsignal.net  => Software, Register, Disk Tools


BASIC USAGE

Extract FindDupl.exe from the archive to a folder of your choice and run it. 


HOW DOES Find Duplicates WORK?

Find Duplicates scans one or more disks on your system to find duplicated 
files, in a two-phase process.  First it scans all the folders and sorts 
all the files it finds into size order (files HAVE to be the same size to 
be identical).  You can limit the scan to one folder tree, if you wish.  
It then compares files of the same size to see if the contents are actually 
identical, and lists identical files in size order.  You can then 
double-click on any file to examine its properties, and optionally move it 
to the Recycle Bin.
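
The two phases can be sketched in Python as follows.  This is only an 
illustration of the idea described above, not the program's own code (which 
is written in Delphi); the function names are hypothetical.

```python
import filecmp
import os
from collections import defaultdict


def group_by_size(root):
    """Phase 1: walk the folder tree and bucket file paths by size."""
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return by_size


def find_duplicates(root):
    """Phase 2: within each size bucket, group files with equal contents."""
    results = []
    by_size = group_by_size(root)
    for size in sorted(by_size):  # report in size order
        groups = []  # each group holds paths whose contents match
        for path in by_size[size]:
            for group in groups:
                # shallow=False forces a comparison of the file contents
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                groups.append([path])
        results.extend((size, g) for g in groups if len(g) > 1)
    return results
```

Calling find_duplicates on a folder returns a list of (size, paths) pairs, 
one for each set of identical files.  Files with a unique size are never 
opened, which is why the size sort comes first.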


HOW IS THE PROCESS SPEEDED UP?

This process can take some time, so Find Duplicates will first perform one of
two preliminary checks to see if the files might actually be identical 
without having to actually examine the whole file.  By default, it checks the 
modification date and time of the files, and only compares the files byte-by-
byte if the timestamps are the same.  But it is possible for two files to 
have the same contents without having the same timestamp, so you can enable 
an option whereby the first 512 bytes of each file are checksummed.  This 
improves the recognition of identical files, but it is slower, and since it 
involves a file access, the file's last access date will be altered.  
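
As a rough sketch of the two preliminary checks, assuming CRC-32 as the 
checksum (the actual checksum used by the program is not stated here) and 
with hypothetical function names:

```python
import os
import zlib


def head_checksum(path, length=512):
    """Checksum of the first `length` bytes (CRC-32 is an assumption)."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(length))


def might_be_identical(path_a, path_b, use_checksum=False):
    """Cheap pre-check run before any full byte-by-byte comparison."""
    if use_checksum:
        # Slower: opens both files, but does not depend on timestamps.
        return head_checksum(path_a) == head_checksum(path_b)
    # Default: only the modification date and time are compared.
    return os.path.getmtime(path_a) == os.path.getmtime(path_b)
```

Only pairs for which might_be_identical returns True would go on to the full 
comparison.  The timestamp check never opens the files, which is what makes 
it the faster of the two.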

By default, the timestamp comparison is selected rather than the checksum 
comparison.  In either case, the filename is normally ignored, so simply 
renaming a file will not hide the fact that it is a duplicate.  The 
timestamp of zero-length files is ignored.  If you wish, you can also 
require that duplicate files must have the same file name.  You may be 
rather surprised to discover what duplicates by content actually exist in 
some popular office suites!

You can turn off the timestamp checking in favour of the slower checksum 
method should you so wish.  For example, if many identically sized and 
timestamped files are found from the initial search, using just timestamps 
might miss some duplicate files since the duplicates may not be adjacent in 
the name-ordered list produced by the folder scan.  The program was not 
designed for this sort of duplicate search, but will perform adequately with 
timestamp checking turned off.  You might also wish to disable timestamp 
checking if you suspect that different products have installed identical 
support DLLs.


USAGE DETAILS

Extract FindDupl.exe from the zip file to a convenient location, and run it!
Only the FindDupl.exe file is required from the archive.  You will be 
presented with a dialog box showing your disk drives, with your local hard 
disk drives selected.  You can optionally enter a file spec such as *.EXE and 
a folder specification such as \windows to limit the search.  Note that if 
you enter a folder specification, only that folder will be searched on each 
drive (e.g. c:\windows, d:\windows and so on).  Press the Start Search button 
to find duplicate files.
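
To illustrate how a folder specification applies to every selected drive, 
and how a file spec narrows the search, here is a small Python sketch.  The 
function names are hypothetical and the case-insensitive matching is an 
assumption based on normal Windows behaviour.

```python
import fnmatch


def search_roots(drives, folder_spec=""):
    """Return one starting folder per drive for the given folder spec."""
    folder = folder_spec.strip("\\")
    return [drive + ":\\" + folder for drive in drives]


def matches_spec(filename, file_spec="*"):
    """Case-insensitive wildcard match, e.g. SETUP.EXE against *.exe."""
    return fnmatch.fnmatchcase(filename.lower(), file_spec.lower())
```

With drives c and d and a folder spec of \windows, search_roots gives 
c:\windows and d:\windows, matching the example above.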


FINDING AND DELETING FILES

There is a status bar which will keep you informed on the progress of both 
the folder scan phase, and the file comparison phase.  Once the main list box 
has filled up with file names, you can double-click on a file name to get a 
pseudo Properties dialog box (actually written in Delphi, not derived from 
the system right-click -> Properties box).  You will see a delete button 
which allows you to delete the file.  Note that you cannot right-click on 
the file name; you must double-click instead.

If you prefer, you can press the Delete key while a file name is selected, 
and you will be asked to confirm the deletion.  As in Windows, if you hold 
down the Shift key while pressing Delete, the file will be deleted 
permanently rather than being sent to the Recycle Bin.  Please be sure you 
really want to delete the file before using the Shift key option.


CAN I SEE IF ANY FILES ON A FLOPPY ARE ALREADY ON MY HARD DISK?

If a floppy disk (specifically drive A:) is included in the selected drives 
to scan, the program will normally assume that you wish to find files in 
common between the floppy and the other disk drives, so that during the 
folder scan phase Find Duplicates will only record files on the other drives 
that are the same size as files found on the floppy.  This makes the scanning 
faster and allows you to ask the question "Do I already have any files on my 
hard disk that are on this floppy?"  You can treat floppies just as ordinary 
disks by unchecking the "Treat floppy as master" check box.  You may notice a 
slightly different message in the status bar during the folder scan phase 
in this case.
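
The "floppy as master" idea amounts to filtering the other drives by the set 
of file sizes seen on the floppy.  A minimal Python sketch, using a 
hypothetical function name and plain (path, size) pairs in place of a real 
folder scan:

```python
def filter_by_master_sizes(master_files, other_files):
    """Keep only files whose size also occurs on the master (floppy).

    Both arguments are iterables of (path, size) pairs.
    """
    master_sizes = {size for _path, size in master_files}
    return [(path, size) for path, size in other_files
            if size in master_sizes]
```

Because a floppy holds few files, the set of sizes is small and most files 
on the hard disk are discarded at once during the folder scan.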


HOW ARE SPECIAL FOLDERS HANDLED?

Windows has a special hidden folder called SYSBCKUP where backup copies of
critical system files are stored.  Find Duplicates will recognise a folder 
with \SYSBCKUP\ in the path name, and ignore any files in that folder.  To 
disable this safety feature, uncheck the "Skip SYSBCKUP folder" check box.  
The status bar will indicate that the folder is being skipped, but you'll 
have to be quick to see that message!  Other hidden folders are scanned 
normally, except that folders which have the file DESKTOP.INI, and are 
therefore special folders, are skipped (e.g. fonts, Internet Explorer history 
and channels).  You can force the scanning of these folders from the Advanced 
options, by unchecking "Skip Desktop.ini folders".
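
The skip rules above could be expressed along these lines in Python.  This 
is a sketch only; the function and parameter names are hypothetical, and 
the parameters simply mirror the two check boxes.

```python
def should_skip_folder(path, filenames,
                       skip_sysbckup=True, skip_desktop_ini=True):
    """Decide whether a folder is left out of the scan.

    path      -- full folder path, e.g. C:\\WINDOWS\\SYSBCKUP
    filenames -- names of the files found directly in that folder
    """
    # Pad with a trailing backslash so the folder itself also matches.
    padded = path.replace("/", "\\").upper().rstrip("\\") + "\\"
    if skip_sysbckup and "\\SYSBCKUP\\" in padded:
        return True
    if skip_desktop_ini and any(n.upper() == "DESKTOP.INI"
                                for n in filenames):
        return True
    return False
```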


WHAT ABOUT DUPLICATED FILES IN THE ROOT FOLDER?

You may find files such as Command.com that appear in both the root folder 
(C:\) and in the Windows or DOS folder (C:\Windows\Command.com, 
C:\DOS\Command.com).  Unless you are completely sure you know what you are 
doing, don't delete either the files in the root folder or their copies, or 
your system may not boot again!


HOW CAN I CONTROL WHICH FOLDERS ARE SCANNED OR SKIPPED?

Multiple folder trees to scan can be specified in the "In:" edit box, with
the folder names separated by commas.  Folder names containing spaces must be
entered with quotes - "\My Files".  A leading "\" will be supplied if you 
omit it.  If you enter multiple folders and one folder contains another, the
contained folder is ignored so that it is not scanned twice.  For example, 
if \Windows and \Windows\System had been entered, 
\Windows\System would be removed from the list of folders to scan as files 
there would be found when the \Windows tree was scanned.
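
A Python sketch of how such a folder list might be handled: split on commas, 
honour quotes, supply the leading "\", and drop folders contained in 
another.  The function names are hypothetical, and dropping a bare "\" entry 
is a simplification made for this sketch.

```python
import csv


def parse_folder_list(text):
    """Split comma-separated folder names; quoted names may have spaces."""
    row = next(csv.reader([text], skipinitialspace=True), [])
    folders = []
    for item in row:
        name = item.strip().rstrip("\\")
        if not name:
            continue  # empty entry (or a bare "\") is dropped here
        if not name.startswith("\\"):
            name = "\\" + name  # supply the missing leading backslash
        folders.append(name)
    return folders


def remove_contained(folders):
    """Drop any folder that lies inside another folder in the list."""
    kept = []
    for folder in sorted(folders, key=len):  # parents sort before children
        child = folder.lower() + "\\"
        if not any(child.startswith(parent.lower() + "\\")
                   for parent in kept):
            kept.append(folder)
    return kept
```

The trailing backslash added before comparing stops \WindowsOld from being 
treated as if it were inside \Windows.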

Should you wish to skip one or more folders, you can enter a single folder or 
a list of folders to skip in the "Skip:" edit box.  Suppose you have files 
from a service pack that you know are duplicated in your Windows directory, 
and that these files are stored in a folder called SP3.  Entering SP3 as the 
folder to skip will cause all folders having \SP3\ as part of the path name 
to be ignored.  You can use comma-separated text to enter more than one 
folder - SP2, SP3.  If your folder name contains spaces, you'll need to 
enclose the name in quotes - "Program Files".  An entry of just "\" will be 
ignored.  The list of folders to skip will be saved in the registry between 
runs.
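
The matching rule for the skip list (the name must appear as a complete 
\NAME\ component of the path) can be sketched in Python like this.  The 
function name is hypothetical, and ignoring upper and lower case is an 
assumption based on normal Windows behaviour.

```python
def is_skipped(folder_path, skip_names):
    """True if any skip name is a whole component of the folder path."""
    # Surround the path with backslashes so first and last parts match too.
    padded = "\\" + folder_path.strip("\\").upper() + "\\"
    for name in skip_names:
        name = name.strip("\\").upper()
        if not name:
            continue  # an entry of just "\" is ignored
        if "\\" + name + "\\" in padded:
            return True
    return False
```

With SP3 in the skip list, C:\Updates\SP3\i386 is skipped but 
C:\Updates\SP30 is not, because only whole folder names match.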


HOW ARE ZERO-LENGTH FILES HANDLED?

Find Duplicates will ignore files that have zero length, because the data in
such files does not occupy disk space, and they are often simply marker files
(e.g. hidden files to show that a folder was created by installing an
application and not a user).  If you prefer to find these files, uncheck the
"Skip zero-length files" check box.  Be aware that these files actually take 
up at least 32 bytes of directory space, but that since the folder must be at 
least a cluster size long (e.g. 4096, 8192 bytes) there will typically be 
very little overhead for a single zero-length file within a moderately full 
folder.


HOW CAN I FIND FILES WITH THE SAME NAME IRRESPECTIVE OF SIZE?

The program will allow you to find "duplicates" which are not the same size.  
This special mode will help those trying to sort out multiple incompatible 
DLLs (such as CTL3D.dll) which can get dumped on your system.
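
In this mode the grouping key is the file name rather than the size.  A 
small Python sketch of that grouping, with a hypothetical function name and 
assuming names are compared without regard to case:

```python
import ntpath
from collections import defaultdict


def group_by_name(paths):
    """Group Windows-style paths by file name, ignoring case."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[ntpath.basename(path).lower()].append(path)
    # Only names that occur more than once are of interest.
    return {name: found for name, found in by_name.items()
            if len(found) > 1}
```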


I'D LIKE TO CONTINUE AT A LATER TIME, ARE THE RESULTS SAVED?

Upon exiting, Find Duplicates will try to save the list of duplicates in a 
file named FindDupl.lis in the same folder as the FindDupl.exe program file.  
If this file is present on starting the program, Find Duplicates will ask if 
you would like to reload the list.  This allows you to split the task of 
deleting duplicate files into short sessions without having to run the 
time-consuming scan and compare phases every time.  Do not edit FindDupl.lis.  
Note that this file may take a few seconds to load, hence the hour-glass 
cursor.


HOW CAN I SKIP CERTAIN FILES EVERY TIME I RUN Find Duplicates?

Find Duplicates will save a list of duplicates it has discovered in 
NewDupl.lis in the same folder as the program.  You may copy this list to a 
file named SkipDupl.lis which may be used to skip duplicates that are already 
known.  The two separate files allow you to build up a composite list for 
your whole system from a number of separate runs, adding in parts of 
NewDupl.lis to SkipDupl.lis.  Note that you should only list the duplicate 
files, not the master files, in SkipDupl.lis, otherwise when new duplicate 
files appear on the disk, there will be nothing for them to match against, 
and they will not be detected.  Eventually, I may make this a drag-and-drop 
function within the program.


ARE THE DELETED DUPLICATES GONE FOR EVER?

For safety, Find Duplicates will not actually delete files, but instead will 
move them to the Recycle Bin.  This means that the disk space will not 
actually be returned until the Recycle Bin is emptied.  Right-click on the 
Recycle Bin to access the Empty Recycle Bin function.  However, if you used 
the Shift-Delete key combination to remove the file, it IS gone forever.


  +------------------------------ WARNING ---------------------------------+
  |                                                                        |
  |  You take sole responsibility if you choose to delete a file.  Find    |
  |  Duplicates makes no attempt to check if the file is in use or key to  |
  |  the functioning of your computer. Take backups before making changes. |
  |                                                                        |
  +------------------------------ WARNING ---------------------------------+


RECENT RELEASE INFORMATION

V5.1.0  2003 Mar 11  Remove requirement for runtime library


RECENT ACKNOWLEDGEMENTS

Henrik Gemal (gemal@dk.net) suggested many improvements including the View 
button, and Neil Carter has made useful suggestions.


CONTACTING THE AUTHOR

This program is freeware, and remains copyright of David J Taylor, Edinburgh, 
1997-2003.  This program is provided "as is", without any support.  Whilst I 
cannot answer queries relating to the use of this program, I'd welcome any 
comments or suggestions for improvements you may have, and I would like to 
thank those who have contributed such feedback which has helped mould the 
present version of the program.


Web site:    www.satsignal.net
E-mail:      davidtaylor@writeme.com
2003 Jan 14
