Extracting the MD5 for all WavPack files and including the filepath

Running
Code: [Select]
wvunpack -f *.wv 
lists files with their attributes including MD5, but omits the full path to the file.

Is there a way to have wvunpack return the data points including the full file path rather than just the filename?

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #1
Presuming you're talking Windows, you could pass each file into wvunpack individually, and report both the file path and the wvunpack results.

Something like (untested):

Code: [Select]
for %G in (*.wv) do ( set /P "_dummy=%~dpG" <nul & wvunpack -f "%G" )

The "set /P..." is a means to echo a string without the CRLF at the end, so it provides a prefix to anything else getting echoed – in this case %~dpG is the drive and path to the entity in %G, so that when wvunpack reports the filename, it has been prefixed by the path. The two commands inside the parentheses are chained with &, since cmd does not treat a semicolon as a command separator.

If you're going to put this into a .BAT rather than just use on the command line, you will need to substitute %%G instead of %G.

This might need some tweaking according to exactly what wvunpack outputs and exactly what you want to see.  On the other hand, as you seem to be running wvunpack in a particular directory, I would have thought it was pretty easy to keep track of the path.
It's your privilege to disagree, but that doesn't make you right and me wrong.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #2
Hmm. The problem is that WavPack doesn't get the full filepath in that case. I suppose there is a non-portable way to get that, but I never looked into it.

However, what you can do is specify the full filepath that you want included. For example this worked for me (on Linux):

Code: [Select]
wvunpack -f /home/david/Music/Miloš\ Karadaglić/Baroque/*.wv
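
A minimal sketch of the same idea for whatever directory you happen to be in, assuming (as in the post above) that wvunpack reports each name exactly as it was passed on the command line. The function name wv_info_abs is made up for illustration; it resolves the directory to an absolute path first, so the glob expands to full paths.

```bash
#!/usr/bin/env bash
# wv_info_abs [DIR]
# Hypothetical helper: run "wvunpack -f" on every .wv file directly inside DIR
# (default: current directory), passing absolute paths so the reported
# filename field carries the full path.
wv_info_abs() {
    local dir
    # Resolve DIR to an absolute path without depending on realpath.
    dir="$(cd -- "${1:-.}" && pwd)" || return 1
    wvunpack -f "$dir"/*.wv
}
```

Usage would be something like `wv_info_abs .` from inside an album folder. If the folder holds no .wv files, the unexpanded pattern is passed through and wvunpack will complain about it.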


Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #3
Thanks.  I’m also using Linux.  Hoping to trawl my music drives and grab the full path and md5 for *.wv in the tree.  It’s a great way for finding duplicated tracks and albums (sort by md5 and concatenate into a long string), regardless of metadata. Looks like I might have to cobble something together with a bash script.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #4
Yes, a bash script is the way to go, along similar lines as what I suggested for Windows.  I would have offered a recursive "for" had you been more specific.

However, I don't think there's an easy way to recurse directories in a for loop (unlike Windows).  I think you'll have to use 'find' to locate the directories, feed that output into a 'for' loop on *.wv into 'wvunpack'.  I believe (eg) $1 as a parameter (representing a file found by 'for') is the whole file path, so feeding that into 'wvunpack' should (according to the above posts) produce your prefixed output.
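
For what it's worth, here is a rough sketch of the 'find' route described above. It assumes wvunpack accepts several files in one call (as the *.wv examples in this topic suggest) and reports each name as it was passed in. The function name wv_info_tree is only illustrative.

```bash
#!/usr/bin/env bash
# wv_info_tree [DIR]
# Hypothetical helper: recurse DIR (default: current directory) with find and
# hand every .wv file to "wvunpack -f" using its absolute path.
wv_info_tree() {
    local root
    # Start find from an absolute path so every match is reported as one.
    root="$(cd -- "${1:-.}" && pwd)" || return 1
    # "{} +" batches many files per wvunpack call instead of one call per file.
    find "$root" -type f -name '*.wv' -exec wvunpack -f {} +
}
```

Called as `wv_info_tree /mnt/music > wv_info.txt`, this leaves find to do the recursion, so no explicit 'for' loop is needed.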

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #5
Presuming you're talking Windows

Thanks for the detailed response - I’ll need to do something similar in Linux.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #6
Check reply above.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #7
Depending on the size of the music collection you're talking about, something as boneheaded as this might work (and this was on a Raspberry Pi):
Code: [Select]
pi@reprise-center:/mnt/usb $ wvunpack -f */*/*.wv */*/*/*.wv | wc -l
9009
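
If the folder depth varies, a variation on the same glob trick is bash's globstar option, which lets ** match any number of directory levels. This is only a sketch: it needs bash 4 or later, the function name wv_info_globstar is invented here, and on a very large collection the expanded file list could exceed the system's argument-length limit.

```bash
#!/usr/bin/env bash
# wv_info_globstar [DIR]
# Hypothetical helper: expand every .wv file under DIR (any depth) with
# bash's globstar and pass the absolute paths to "wvunpack -f".
# The body runs in a subshell so the shopt and cd changes do not leak out.
wv_info_globstar() (
    shopt -s globstar nullglob
    cd -- "${1:-.}" || exit 1
    # **/ also matches zero directories, so top-level files are included.
    files=( "$PWD"/**/*.wv )
    (( ${#files[@]} )) || { echo "no .wv files found" >&2; exit 1; }
    wvunpack -f "${files[@]}"
)
```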

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #8
Thanks, I tried that in a current tree and it threw errors.  Here's something I conjured:

Code: [Select]
#!/bin/bash
# bash script to find all .wv files in a given tree

# Define the search directory (default to current directory if not provided)
SEARCH_DIR="${1:-.}"

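# Find all .wv files and process them
# (NUL-delimited, so names with odd whitespace or backslashes survive intact)
find "$SEARCH_DIR" -type f -name "*.wv" -print0 | while IFS= read -r -d '' file; do
    full_path="$(realpath "$file")"
    # Print the path without a newline so wvunpack's MD5 lands on the same line
    printf '%s | ' "$full_path"
    wvunpack -f7 "$file"
done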

The output is easily redirected to a CSV file and imported into a database table.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #9
I've enhanced it a little further to make it plug and play for my SQL script that finds duplicate albums:

Code: [Select]
#!/bin/bash
# bash script to find all .wv files in a given tree and return the full path including filename, the path excluding filename, the filename and the wavpack md5 of the audio stream

# Define the search directory (default to current directory if not provided)
SEARCH_DIR="${1:-.}"

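# Find all .wv files and process them
# (NUL-delimited, so names with odd whitespace or backslashes survive intact)
find "$SEARCH_DIR" -type f -name "*.wv" -print0 | while IFS= read -r -d '' file; do
    full_path="$(realpath "$file")"
    dir_path="$(dirname "$full_path")"
    file_name="$(basename "$full_path")"
    # Emit the three path fields without a newline; wvunpack appends the MD5
    printf '%s|%s|%s|' "$full_path" "$dir_path" "$file_name"
    wvunpack -f7 "$file"
done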

If you redirect the output of that script to a CSV file, using >> to append to the file for every subtree you add, you can find duplicate albums by running the following code in SQLite (obviously you need to create a database first):
Code: [Select]
CREATE TABLE alib (
    __path     TEXT,
    __dirpath  TEXT,
    __filename TEXT,
    __md5sig   TEXT
);

Import the csv file into the table
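
One way to do the import from the shell, sketched with the sqlite3 command-line tool. The function name import_wv_list and the file names in the usage line are placeholders, and it assumes no path contains a | or a single quote, since the list uses | as its field separator.

```bash
#!/usr/bin/env bash
# import_wv_list DB LIST
# Hypothetical helper: load the pipe-delimited list produced by the script
# above into the alib table of the SQLite database DB.
import_wv_list() {
    local db=$1 list=$2
    sqlite3 "$db" <<SQL
CREATE TABLE IF NOT EXISTS alib (
    __path     TEXT,
    __dirpath  TEXT,
    __filename TEXT,
    __md5sig   TEXT
);
.separator "|"
.import '$list' alib
SQL
}
```

For example: `import_wv_list music.db wv_list.csv`. Running it again appends the same rows a second time, so import each list only once.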

Code: [Select]
DROP TABLE IF EXISTS __dirpath_content_concat__md5sig;

CREATE TABLE __dirpath_content_concat__md5sig (__dirpath TEXT, concat__md5sig TEXT);

INSERT INTO
  __dirpath_content_concat__md5sig (__dirpath, concat__md5sig)
SELECT
  __dirpath,
  group_concat (__md5sig, ' | ')
FROM
  (
    SELECT
      __dirpath,
      __md5sig
    FROM
      alib
    ORDER BY
      __dirpath,
      __md5sig
  )
GROUP BY
  __dirpath;

DROP TABLE IF EXISTS __dirpaths_with_same_content;

CREATE TABLE __dirpaths_with_same_content (killdir TEXT, __dirpath TEXT, concat__md5sig TEXT);

INSERT INTO
  __dirpaths_with_same_content (__dirpath, concat__md5sig)
SELECT
  __dirpath,
  concat__md5sig
FROM
  __dirpath_content_concat__md5sig
WHERE
  concat__md5sig IN (
    SELECT
      concat__md5sig
    FROM
      __dirpath_content_concat__md5sig
    GROUP BY
      concat__md5sig
    HAVING
      count(*) > 1
  )
ORDER BY
  concat__md5sig,
  __dirpath;

DROP TABLE IF EXISTS __dirpaths_with_albums_to_kill;

CREATE TABLE __dirpaths_with_albums_to_kill (__dirpath TEXT, concat__md5sig TEXT);

INSERT INTO
  __dirpaths_with_albums_to_kill (__dirpath, concat__md5sig)
SELECT
  __dirpath,
  concat__md5sig
FROM
  __dirpaths_with_same_content
WHERE
  rowid NOT IN (
    SELECT
      max(rowid)
    FROM
      __dirpaths_with_same_content
    GROUP BY
      concat__md5sig
  );
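
If you'd rather not set up a database, roughly the same grouping can be sketched with sort and awk straight from the pipe-delimited list. This is an alternative, not the SQL above: it only lists the directories that share an identical set of MD5s and does not pick which copy to keep. The function name dup_album_dirs is made up, and it assumes no path contains a | character.

```bash
#!/usr/bin/env bash
# dup_album_dirs LIST
# LIST holds lines of the form: full_path|dir_path|file_name|md5
# Prints "signature<TAB>directory" for every directory whose sorted set of
# MD5s also occurs in at least one other directory.
dup_album_dirs() {
    # Sort by directory, then MD5, so a directory's signature does not
    # depend on the order in which its files were listed.
    LC_ALL=C sort -t'|' -k2,2 -k4,4 "$1" |
    awk -F'|' '
        # One signature per directory: its MD5s joined by " | ".
        { sig[$2] = ($2 in sig) ? sig[$2] " | " $4 : $4 }
        END {
            for (d in sig) count[sig[d]]++
            for (d in sig) if (count[sig[d]] > 1) print sig[d] "\t" d
        }
    ' | LC_ALL=C sort
}
```

The final sort puts directories with the same signature on adjacent lines, which makes the duplicate groups easy to eyeball.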

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #10
If you don't feel like scripting it - after all, you will have to process the file it generates too:
* Wine up foobar2000.
* Set up a ReFacets panel with a number-of-items column and an MD5 column. It will count the number of entries for each MD5 sum it encounters.
* Sort the number column. You'll get the duplicates/multiplicates first.

But: if you have CDs as images with cuesheets, then fb2k reads them as one entry per track. You should then probably first filter to get those with track number not greater than 1. (That catches the missing ones.) Other caveats apply, like HTOA tracks. And say your ripping/tagging application starts enumeration on track 2 for audio on "Playstation" type CDs (they have the data session as track number 1).
Also, WavPack calculates the MD5 using the source file's endianness, so if you have the same audio encoded from AIFF and from WAVE, the MD5 sums may not match, etc.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #11
Actually, if you use tracks: I have quite a few duplicates (that contain truly identical audio). CDs that have these silent tracks until a bonus track or two at the end.
Biggest number of hits have an MD5 of 1234DD57F3AF7775D57493B54D59BCEB . That is precisely 4 seconds - 176 400 samples - of silence. Hundreds of "tracks", actually. Also more than one CD has a silent track of 178 164 samples.
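
For reference, the arithmetic behind those sample counts, assuming ordinary CD audio at 44,100 samples per second with 588 samples per CD frame (1/75 of a second). The helper name samples_info is just for illustration.

```bash
#!/usr/bin/env bash
# samples_info SAMPLES
# Convert a sample count to whole CD frames and to seconds (3 decimals),
# assuming 44.1 kHz CD audio: 44100 samples/s, 588 samples per frame.
samples_info() {
    local samples=$1
    printf '%d samples = %d CD frames, %d.%03d s\n' \
        "$samples" \
        $((samples / 588)) \
        $((samples / 44100)) \
        $((samples % 44100 * 1000 / 44100))
}

samples_info 176400   # 176400 samples = 300 CD frames, 4.000 s
samples_info 178164   # 178164 samples = 303 CD frames, 4.040 s
```

So the second silent track is exactly three CD frames longer than the four-second one.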

Think twice over the need to dedupe them - they don't take up much space, and deleting them will make it much harder to do AccurateRip verification and CUETools repairs.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #12
The scripting I provided here is not intended to identify identical tracks, it's focused on albums.  The SQL works for any format that has an embedded md5 of the audio stream as long as it's accessible and stored in the table schema as __md5sig.

I agree, finding identical tracks across a collection is not a good idea, unless you're looking for identical tracks in the same folder, in which case it's probably something you'd want to delete.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #13
Think twice over the need to dedupe them - they don't take up much space, and deleting them will make it much harder to do AccurateRip verification and CUETools repairs.
The Second Coming (Stone Roses) has 83 tracks containing 176,400 samples of silence.

I keep one and use a *.silent.flac suffix and create a CUE for AccurateRip:
Code: [Select]
...
FILE "01.13.silent.flac" WAVE
  TRACK 13 AUDIO
    INDEX 01 00:00:00
FILE "01.13.silent.flac" WAVE
  TRACK 14 AUDIO
    INDEX 01 00:00:00
FILE "01.13.silent.flac" WAVE
  TRACK 15 AUDIO
    INDEX 01 00:00:00
...

Then in foobar I exclude:
Code: [Select]
*.silent.flac
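
With 80-odd silent tracks, the repeated FILE/TRACK entries in a CUE like the one above could be generated rather than typed. A small sketch, where the function name silent_cue_entries and its arguments are hypothetical and the layout simply copies the excerpt:

```bash
#!/usr/bin/env bash
# silent_cue_entries FILE FIRST LAST
# Print one FILE/TRACK/INDEX entry per track number from FIRST to LAST,
# all pointing at the same silent file, in the layout of the CUE excerpt.
silent_cue_entries() {
    # 10# forces base 10 so zero-padded numbers like 08 are not read as octal.
    local file=$1 first=$((10#$2)) last=$((10#$3)) n
    for ((n = first; n <= last; n++)); do
        printf 'FILE "%s" WAVE\n  TRACK %02d AUDIO\n    INDEX 01 00:00:00\n' "$file" "$n"
    done
}
```

For example, `silent_cue_entries 01.13.silent.flac 13 15` reproduces the three entries shown above; the output still has to be pasted between the real tracks of the CUE.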

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #14
Think twice over the need to dedupe them - they don't take up much space, and deleting them will make it much harder to do AccurateRip verification and CUETools repairs.
The Second Coming (Stone Roses) has 83 tracks containing 176,400 samples of silence.

I keep one and use a *.silent.flac suffix and create a CUE for AccurateRip:

That looks like a "Hah, I can and therefore I do!" - and of course, "not that there's anything wrong with that!"  ;)
But, presuming 4 kiB file clusters ... worth it for 328 kiB?

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #15
That looks like a "Hah, I can and therefore I do!" - and of course, "not that there's anything wrong with that!"  ;)
But, presuming 4 kiB file clusters ... worth it for 328 kiB?
I used that particular example because you'd said:
Think twice over the need to dedupe them - they don't take up much space, and deleting them will make it much harder to do AccurateRip verification and CUETools repairs.
If I remember correctly the person responsible for sending me down this rabbit hole (manipulating CUE sheets to keep AccurateRip happy) in the first place was Robbie Williams!

My wife was/is a big Robbie fan, which is fine, but he has a habit of including multiple songs on the last track of his albums, separated by huge amounts of silence. Using I've Been Expecting You as an example, the last track consists of 3 songs split by a total of 19 minutes of silence. Wifey wasn't impressed with Robbie, once I'd convinced her I wasn't to blame.

I'd previously split tracks with multiple indexes (in order to more accurately define the track title), so knowing that it was easy to do, I decided to do the same with Robbie's albums, bringing them back together for AccurateRip purposes with a single track in the CUE having multiple indexes. Quite straightforward really with sox or ffmpeg.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #16
Oh, yeah. Now you want to get rid of them from your library altogether. That is something other than deduplication, and it also helps if you are tired of artists who, instead of identical zero tracks, put a few minutes of, say, dither or vinyl surface noise there. Get a separate fb2k renaming pattern to .silent.flac or .ignore.flac. (If you just want silence excluded, put an $if condition on bitrate <2 or something.)

Then when you exclude *.silent.flac they are gone from the library - but they will still be seen by e.g. CUETools (or https://www.dbpoweramp.com/Help/perfecttunes/accuraterip.htm ) which will check the entire album.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #17
Oh, yeah. Now you want to get rid of them from your library altogether.
Other than the Stone Roses album that has 80+ tracks of identical silence, nothing is physically deleted; all I want to do is remove the silent tracks, or silent parts of tracks, from playback.

So, the Robbie Williams example becomes:
  • Split the track into parts

Code: [Select]
sox 01.12.flac 01.12.%2n.flac trim 00:00:00.00 00:05:09.00 : newfile : trim 00:00:00.00 00:06:17.00 : newfile : trim 00:00:00.00 00:03:33.00 : newfile : trim 00:00:00.00 00:12:33.00 : newfile : trim 00:00:00.00

  • Rename the output giving the silent parts the extension .silent.flac
Code: [Select]
01.12#01.flac
01.12#02.silent.flac
01.12#03.flac
01.12#04.silent.flac
01.12#05.flac


  • Build a CUE so they verify with CUETools
Code: [Select]
...
FILE "01.12#01.flac" WAVE
  TRACK 12 AUDIO
    INDEX 01 00:00:00
FILE "01.12#02.silent.flac" WAVE
    INDEX 02 00:00:00
FILE "01.12#03.flac" WAVE
    INDEX 03 00:00:00
...

  • Add the exclude pattern *.silent.flac to my server

I'm sure there's a component in foobar that skips silence but we don't use foobar for playback.
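
The rename step in the list above could be scripted along these lines. It's a sketch under two assumptions: sox has written the parts as BASE.01.flac, BASE.02.flac and so on, and the silent parts are the even-numbered ones, as in the Robbie Williams example. The function name rename_parts is invented for illustration.

```bash
#!/usr/bin/env bash
# rename_parts BASE
# Rename BASE.NN.flac (sox "newfile" output) to BASE#NN.flac, giving the
# even-numbered parts the .silent.flac extension.
# Assumption: songs and silence alternate, starting with a song.
rename_parts() {
    local base=$1 part n
    for part in "$base".[0-9][0-9].flac; do
        [ -e "$part" ] || continue      # glob matched nothing
        n=${part#"$base".}              # strip "BASE." prefix
        n=${n%.flac}                    # strip ".flac", leaving NN
        if (( 10#$n % 2 == 0 )); then   # 10# avoids octal for 08, 09
            mv -- "$part" "${base}#${n}.silent.flac"
        else
            mv -- "$part" "${base}#${n}.flac"
        fi
    done
}
```

Run as `rename_parts 01.12` in the album folder after the sox split. If an album doesn't alternate song/silence so neatly, the even/odd test would need replacing with an explicit list of the silent part numbers.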