SQLite – i – File system interworking

There is one built-in command, dbFileList, that creates a detailed list of the folders/files under a given user-defined starting folder and places this list in a database table. I find it a powerful capability when doing mass actions on the folder/file system.

The structure of the created table is:

  • path – path of the current folder/file, without the name of the current folder/file, and terminated by a "\" character
  • name – name of the current folder/file
  • ext – for files, the file extension, in lowercase!
  • attribute – a decimal integer representation of the file's attributes
  • type – a string, "folder" or "file", derived from the attribute column!
  • size – size of the file in bytes
  • creation – creation date in the format "yyyy-mm-dd hh:mm"
  • access – last access date in the format "yyyy-mm-dd hh:mm"
  • write – last write date in the format "yyyy-mm-dd hh:mm"
  • level – integer showing the level of folder/subfolder nesting
  • folderkey – an integer which has a different value for each folder/subfolder
  • hash – the result of the hash digest/signature operation
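To illustrate the kind of SQL you can then run against the resulting table, here is a hypothetical sketch using Python's sqlite3 module. The column names follow the list above; the table name and the sample rows are invented for the demonstration:

```python
import sqlite3

# In-memory database standing in for the one dbFileList would populate.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE files (
        path TEXT, name TEXT, ext TEXT, attribute INTEGER, type TEXT,
        size INTEGER, creation TEXT, access TEXT, "write" TEXT,
        level INTEGER, folderkey INTEGER, hash TEXT)
""")
# Invented sample rows -- dbFileList would fill these in for real.
rows = [
    ("c:\\pics\\", "a.jpg", "jpg", 32, "file", 1000, "", "", "", 1, 1, ""),
    ("c:\\pics\\", "b.jpg", "jpg", 32, "file", 3000, "", "", "", 1, 1, ""),
    ("c:\\pics\\", "notes.txt", "txt", 32, "file", 500, "", "", "", 1, 1, ""),
]
con.executemany("INSERT INTO files VALUES (" + ",".join("?" * 12) + ")", rows)

# Example mass action: total bytes per extension, largest first.
for ext, total in con.execute(
        "SELECT ext, SUM(size) FROM files WHERE type='file' "
        "GROUP BY ext ORDER BY SUM(size) DESC"):
    print(ext, total)
```

Note that "write" is quoted in the DDL purely as a precaution; any reporting query you can express in SQL works the same way against the real table.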

dbFileList

A built-in command that lists the folders/files under a given starting folder and writes the results to a database table.

Parameter 1: OPTIONAL <string expression defining a valid SQL schema>

Specifies the database schema in which the resulting table is to be created. Default = "main".

Parameter 2: OPTIONAL <string expression specifying the name of the table to be created>

Specifies the database table into which the resulting folder/file information is to be placed. Default = "myTable".

If the specified table already exists it will be overwritten!

Parameter 3: <string expression: base/starting folder for the action>

There are 3 cases to be considered for this parameter:

  • If it specifies a FILE, then the information written to the table will only be for that specific file, i.e. the created table will have only 1 row!
  • If it specifies a FOLDER, but is not terminated by the \ character, then the information written to the table will only be for that specific folder, i.e. the created table will have only 1 row!
  • If it specifies a FOLDER, and is terminated by the \ character, then the information written will be based on the folders/files under the specified FOLDER, i.e. the created table might contain many rows.
Parameter 4: OPTIONAL <integer expression specifying the number of folder levels to be included in this action> : default = -1 (= all)

This parameter is only relevant if parameter 3 specifies a FOLDER that is terminated by the \ character.

This parameter controls how many levels of nested folders are to be included in this action.
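The idea of limiting a recursive walk to a fixed number of folder levels can be sketched in Python. This is an analogy, not the command's actual implementation; the function name and the -1 = unlimited convention mirror the parameter described above:

```python
import os
import tempfile

def walk_limited(root, max_levels=-1):
    """Yield (level, path) for folders under root, stopping at max_levels.

    max_levels = -1 means unlimited, mirroring the dbFileList default.
    """
    root = root.rstrip(os.sep)
    for dirpath, dirnames, _ in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        yield level, dirpath
        if max_levels != -1 and level >= max_levels:
            dirnames.clear()   # prune: do not descend any further

# Demonstrate on a throwaway a/b/c tree: with max_levels=2 the
# folder "c" at level 3 is never visited.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "a", "b", "c"))
levels = sorted(level for level, _ in walk_limited(base, max_levels=2))
print(levels)   # [0, 1, 2]
```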

Parameter 5: OPTIONAL <string expression specifying a filter to be applied> : default = "*"

Technically this command uses the Microsoft Windows FindFirstFile (et al) API, which supports a wildcard filtering capability: * matches an unknown character string, and ? matches an unknown single character.

So, for example, to limit the action to only files with the extension jpg you could set the filter to be "*.jpg".
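The same * and ? semantics can be sketched with Python's fnmatch module. This is only an analogy to the Windows API's matching (the real FindFirstFile has some extra quirks), with invented file names:

```python
from fnmatch import fnmatch

names = ["holiday.jpg", "holiday.jpeg", "img_001.jpg", "readme.txt"]

# "*" matches any run of characters.
jpg_only = [n for n in names if fnmatch(n, "*.jpg")]
print(jpg_only)    # ['holiday.jpg', 'img_001.jpg']

# "?" matches exactly one character.
one_digit = [n for n in names if fnmatch(n, "img_00?.jpg")]
print(one_digit)   # ['img_001.jpg']
```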

Parameter 6: OPTIONAL <string expression specifying a hash algorithm> : default = ""

I think this is a somewhat unusual capability, but I have used it a number of times.

When the dbFileList command is running it can compute a hash signature/digest of each file as it is examined.

The supported hash algorithms are:

  • SHA512
  • SHA384
  • SHA256
  • SHA1
  • MD5
  • MD4
  • MD1

Be aware that generating a hash signature/digest is a relatively slow process: perhaps fewer than 10 files per second, depending on file size and the hash algorithm selected.
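For reference, the kind of per-file digest involved can be sketched with Python's hashlib. This is an illustration rather than the command's actual code; the chunked reads keep memory use flat on large files, which is part of why hashing dominates the run time:

```python
import hashlib
import io

def file_digest(stream, algorithm="md5", chunk_size=65536):
    """Hash a binary stream in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# Demonstrate on an in-memory "file"; a real script would pass open(path, "rb").
digest = file_digest(io.BytesIO(b"hello world"))
print(digest)   # 5eb63bbbe01eeed093cb22bb8f5acdc3
```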


So a mini-script to find the duplicate files under a given folder could be:

if ( ! dbOpen ) then
   dbOpen ( "memory")
End IF

dim myPath = "c:\users\user\pictures\\"
dbFileList ( "temp", "files" , myPath , -1 , , "MD5")

drop table if exists temp.dupHash
create table temp.dupHash as select hash from temp.files where type = "file" group by hash having count(*) > 1

drop table if exists temp.duplicates
create table temp.duplicates as select * from temp.files where hash in ( select hash from temp.dupHash) order by hash
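The duplicate-finding logic above can be mirrored in standard SQLite. Here is a hypothetical sketch with Python's sqlite3, using the same GROUP BY ... HAVING COUNT(*) > 1 approach; the table layout is simplified and the rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (path TEXT, name TEXT, type TEXT, hash TEXT)")
con.executemany("INSERT INTO files VALUES (?,?,?,?)", [
    ("c:\\p\\", "a.jpg", "file", "aaa"),
    ("c:\\p\\", "copy_of_a.jpg", "file", "aaa"),   # same content hash: duplicate
    ("c:\\p\\", "b.jpg", "file", "bbb"),
])

# Step 1: hashes that occur more than once.
con.execute("""
    CREATE TABLE dupHash AS
    SELECT hash FROM files WHERE type = 'file'
    GROUP BY hash HAVING COUNT(*) > 1
""")

# Step 2: every row carrying one of those hashes.
dupes = con.execute("""
    SELECT name FROM files
    WHERE hash IN (SELECT hash FROM dupHash)
    ORDER BY hash, name
""").fetchall()
print(dupes)   # [('a.jpg',), ('copy_of_a.jpg',)]
```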


You will find a mini-script to delete all empty folders under a given starting point, ns_delete_empty_folders, here.