Pipeline

A description of the complete technical flow, from audio files to the web.


Overview

FLAC / DSF files
        │
        ▼
   drscan.sh
  (dr14meter + mediainfo)
        │
        ▼
   tracks_dr.csv
        │
        ▼
   create_db.sh
  (imports CSV → SQLite)
        │
        ▼
   music.db  (single source of truth)
        │
        ├──▶ generate_conclusions.py  →  JSON (tables, KPIs, insights)
        │
        └──▶ generate_charts.py       →  PNG (matplotlib charts)
                        │
                        ▼
                  mkdocs build
                        │
                        ▼
                  site/  (nginx → dr.priet.us)

Step 1 — Audio scan: drscan.sh

The script traverses the FLAC and DSF directories in parallel (3 threads). For each file it extracts metadata using mediainfo and calculates DR with dr14meter.

To avoid reprocessing files that have already been analysed, the script leaves a marker file (file.flac.dr_done) alongside each track. Deleting those markers forces a full re-scan.
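The skip logic can be sketched as follows. This is a minimal, self-contained illustration under assumed variable names, not the literal drscan.sh code:

```shell
#!/usr/bin/env bash
# Sketch of the marker-file skip: tracks with a *.dr_done sibling are
# not re-analysed. Paths and the analysis step are placeholders.
set -u
dir=$(mktemp -d)
touch "$dir/a.flac" "$dir/b.flac" "$dir/b.flac.dr_done"   # b was scanned before

for f in "$dir"/*.flac; do
    marker="${f}.dr_done"
    [[ -e "$marker" ]] && continue      # skip tracks already analysed
    echo "analysing ${f##*/}"           # the real script runs dr14meter here
    touch "$marker"                     # leave the marker for the next run
done
```

On a second run the loop skips every track, which is what makes incremental scans cheap.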

#!/usr/bin/env bash
set -u

BASE_FLAC="...Media/rips"
BASE_DSF="...Media/dsf"
OUT="...Media/tracks_dr.csv"
DRBIN="$HOME/.local/bin/dr14meter"
THREADS=3

Metadata extraction

The artist is resolved through a chain of fallbacks to handle inconsistent tagging: MusicBrainz field → Album/Performer → Performer → folder name. Lists of musicians (fields containing ;) are discarded, and noise such as (guitar) is stripped.

# 1. MusicBrainz / modern field
artist=$(mediainfo --Inform="General;%ARTISTS%" "$f")

# 2. classic fallbacks
[[ -z "$artist" ]] && artist=$(mediainfo --Inform="General;%Album/Performer%" "$f")
[[ -z "$artist" ]] && artist=$(mediainfo --Inform="General;%Performer%" "$f")

# 3. folder fallback
[[ -z "$artist" ]] && artist="${base%% - *}"

Album and title follow the same logic: embedded metadata first, filename or folder name as a last resort.
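As an illustration of that last-resort path, a title can be recovered from the filename alone. The path and variable names below are hypothetical, not the script's actual code:

```shell
#!/usr/bin/env bash
# Hypothetical filename fallback for the title: strip the directory,
# the extension and a leading "NN - " track-number prefix.
f="/music/Some Artist - Some Album/03 - Example Song.flac"
base="${f##*/}"          # "03 - Example Song.flac"
title="${base%.*}"       # drop the extension
title="${title#*- }"     # drop the "03 - " prefix
echo "$title"
```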

Technical file quality

In addition to musical metadata, the script extracts information about the technical nature of each file:

Field            What it measures
---------------  --------------------------------------------------------------
sample_rate_hz   Sampling frequency (44100 Hz, 96000 Hz, 2822400 Hz for DSD64…)
bit_depth        Bit depth (16-, 24- or 32-bit PCM; 1-bit DSD)
channels         Number of channels (2 = stereo)
duration_s       Duration in seconds
bitrate_kbps     Bit rate in kbps
dsd_rate         Human-readable DSD label: DSD64, DSD128, DSD256 or DSD512

sample_rate=$(mediainfo --Inform="Audio;%SamplingRate%" "$f")
bit_depth=$(mediainfo --Inform="Audio;%BitDepth%" "$f")
channels=$(mediainfo --Inform="Audio;%Channels%" "$f")

# DSD rate label derived from sampling frequency
case "$sample_rate" in
    2822400)  dsd_rate="DSD64"  ;;
    5644800)  dsd_rate="DSD128" ;;
    11289600) dsd_rate="DSD256" ;;
    22579200) dsd_rate="DSD512" ;;
    *)        dsd_rate="NA"     ;;
esac

DR calculation

result=$("$DRBIN" -f "$f" 2>&1 < /dev/null)
dr=$(echo "$result" | grep -Eo "DR[[:space:]]*[0-9]+" | head -n1 | grep -Eo "[0-9]+")
dr=${dr:-NA}

dr14meter analyses the file and returns the DR value according to the crest-factor algorithm of the DR Database standard. The result is parsed with grep to extract only the number.

Concurrent CSV writes

Since the script processes several files in parallel, it uses flock so that appends to the CSV are atomic and lines from different workers never interleave:

(
    flock 200
    echo "$line" >> "$OUT"
) 200>"$LOCK"

Step 2 — SQLite import: create_db.sh

Imports the resulting CSV into a SQLite database as the tracks table, and creates pre-calculated views (albums, artists, genres) to speed up common queries.
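The import step can be sketched with the sqlite3 shell. The table name tracks and the view names come from the text above; the column list, the sample CSV and the view definition are illustrative assumptions, not the real schema in create_db.sh:

```shell
#!/usr/bin/env bash
# Sketch of the CSV → SQLite import using sqlite3 dot-commands.
# The headerless sample CSV and the column list are made up for the demo.
set -eu
cd "$(mktemp -d)"

cat > tracks_dr.csv <<'CSV'
Steely Dan,Aja,Deacon Blues,14
Steely Dan,Aja,Peg,13
CSV

sqlite3 music.db <<'SQL'
CREATE TABLE tracks (artist TEXT, album TEXT, title TEXT, dr INTEGER);
.mode csv
.import tracks_dr.csv tracks
-- one pre-calculated view per grouping (albums, artists, genres)
CREATE VIEW albums AS
SELECT artist, album, ROUND(AVG(dr), 1) AS dr_avg, COUNT(*) AS n_tracks
FROM tracks GROUP BY artist, album;
SQL

sqlite3 music.db "SELECT artist, album, dr_avg, n_tracks FROM albums;"
```

Because the table exists before .import runs, every CSV row is appended as data, which is why the sample file carries no header line.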


Step 3 — Web data generation

Two Python scripts read music.db and generate the static artefacts:

generate_conclusions.py — produces the JSON files that feed the conclusions and insights pages: global summary, top albums, top artists, genres, decades, exceptional albums and consistent artists.

generate_charts.py — generates the six matplotlib charts (histogram, format, genre, decade, top artists, top albums) and writes graficas.md with captions calculated from the actual data.

# Regenerate everything
python3 docs/generate_conclusions.py
python3 docs/generate_charts.py

Step 4 — Build and publish

~/.local/bin/mkdocs build

MkDocs compiles the Markdown files and static artefacts (PNGs, JSONs) into the site/ directory, which nginx serves directly as dr.priet.us.

There is no separate deploy step: the build writes directly to the root that the web server uses.

# Full cycle
make data && make build
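Those two targets presumably wrap the commands shown earlier on this page. The real Makefile is not reproduced here, so the recipe bodies below are an assumption based on those commands:

```makefile
# Hypothetical Makefile wiring the documented commands together
data:                             # regenerate JSON + charts from music.db
	python3 docs/generate_conclusions.py
	python3 docs/generate_charts.py

build:                            # compile docs/ into site/
	~/.local/bin/mkdocs build
```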