SQLite – File system interworking

There is one built-in command, dbFileList , that creates a detailed list of the folders/files under a given user defined starting folder, and places this list in a database table. I find it a powerful capability when doing mass actions on the folder/file system.

The structure of the created table is:

  • path    – path name of the current folder/file, without the name of the current folder/file, and terminated by a “\” character
  • name – name of the current folder/file
  • ext      – for files, this is the file extension  in lowercase!
  • attribute – a decimal integer representation of the file’s attributes
  • type         –  a string , = “folder”  or  “file”   derived from the   attribute column!
  • size        – size of the file in bytes
  • creation – creation date in the format “yyyy-mm-dd hh:mm”
  • access   – last access date in the format “yyyy-mm-dd hh:mm”
  • write     – last write date in the format “yyyy-mm-dd hh:mm”
  • level      – integer showing the level of folder/subfolder nesting
  • folderkey – a integer which has a different value for each folder/subfolder
  • hash    – the result of the hash digest/signature operation

dbFileList

A built-in command for writing the results of a SQLite query to an Excel sheet.

Parameter 1: OPTIONAL <string expression defining a valid SQL schema>

Specifies the database schema in to which the resulting table is to be created. Default = “main”

Parameter 2: OPTIONAL <string expression specifing the name of the table to be created >

Specifies the database table in to which the resulting folder/file information is to be placed. Default = “myTable”.

If the specified table already exists it will be overwritten!

Parameter 3: <string expression: base/starting folder for the action>

There are 3 cases to be considered for this parameter:

  • If it specifies a FILE then the information written to the table will only be for that specific file. i.e. the created table will have only 1 row!
  • If it specifies a FOLDER, but is not terminated by the  \ character then the information written to the table will only be for that specific folder  . i.e. the created table will have only 1 row!
  • If it specifies a FOLDER, and is terminated by the  \ character then the information written will be based on the folders/files under the specified FODLER. i.e. the create table might contain many rows.
Parameter 4: OPTIONAL <integer expression specifing the number of folder levels to be included in this action> : default =   -1  (= all)

This parameter is only relevant if parameter 3 specifies a FOLDER that is terminated by the \ character.

This parameter controls how many levels of nested folders is to be included in this action.

Parameter 5: OPTIONAL <string expression specifing a filter to be applied> : default = “*”

Technically this command uses the Microsoft Windows FindFirstFile (et al) API which supports a wildcard filtering capability, using * for a unknown character string, and ? for a unknown single character.

So, for example, to limit the action to only files with the extension jpg you could set the filter to be “*.jpg”.

Parameter 6: OPTIONAL <string expression specifing a hash algorithum> : default = “”

I think that this is a bit unusual capability but I have used it a nu,mber of times.

When the dbFileList command is running it can evaluate hash signature/digest the current file being examined.

The supported hash algorithms are:

  • SHA512
  • SHA384
  • SHA256
  • SHA1
  • MD5
  • MD4
  • MD1

Be aware that generating a hash signature/digest is a relatively slow process. Maybe less that 10 files per second depending on file size and the hash algorithm selected.


So a mini-script to find the duplicate files under a given folder could be:

if ( ! dbOpen ) then
   dbOpen ( "memory")
End IF

dim myPath = "c:\users\user\pictures\\"
dbFileList ( "temp", "files" , myPath , -1 , , "MD5")

drop table if exists temp.dupHash
create table temp.dupHash as select hash from temp.files where type ="file" group by hash having count (*) > 1

drop table if exists temp.duplicates
create table temp.duplicates as select * from temp.files where hash in ( select hash from temp.dupHash) order by hash


You will find a mini-script to delete all empty folders under a given starting point ns_delete_empty_folders here.