# Pipeline

A description of the complete technical flow, from audio files to the web.
## Overview

```text
FLAC / DSF files
      │
      ▼
  drscan.sh
  (dr14meter + mediainfo)
      │
      ▼
  tracks_dr.csv
      │
      ▼
  create_db.sh
  (imports CSV → SQLite)
      │
      ▼
  music.db  (single source of truth)
      │
      ├──▶ generate_conclusions.py → JSON (tables, KPIs, insights)
      │
      ├──▶ generate_charts.py → PNG (matplotlib charts)
      │
      ▼
  mkdocs build
      │
      ▼
  site/  (nginx → dr.priet.us)
```
## Step 1 — Audio scan: `drscan.sh`

The script traverses the FLAC and DSF directories in parallel (3 threads).
For each file it extracts metadata using mediainfo and calculates DR with dr14meter.
To avoid reprocessing already-analysed files, it leaves a marker (`file.flac.dr_done`)
alongside each track. Deleting those markers forces a full re-scan.
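The skip-marker logic can be sketched as follows. This is a minimal sketch: `process_file` and its echoed output are hypothetical stand-ins for the real mediainfo/dr14meter work in `drscan.sh`.

```shell
# Minimal sketch of the marker-file skip logic.
# process_file is a hypothetical stand-in; the real script runs
# mediainfo + dr14meter where the "scan" line is printed.
process_file() {
  local f="$1"
  if [[ -e "$f.dr_done" ]]; then
    echo "skip: $f"            # already analysed in a previous run
    return 0
  fi
  echo "scan: $f"              # extract metadata + compute DR here
  touch "$f.dr_done"           # marker so the next run skips this file
}
```

Removing the `*.dr_done` files (or just one of them) is enough to force those tracks through the scan again.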
```bash
#!/usr/bin/env bash
set -u

BASE_FLAC="...Media/rips"
BASE_DSF="...Media/dsf"
OUT="...Media/tracks_dr.csv"
DRBIN="$HOME/.local/bin/dr14meter"
THREADS=3
```
### Metadata extraction

The artist is resolved through a chain of fallbacks to handle inconsistent tagging:
MusicBrainz field → Album/Performer → Performer → folder name.
Lists of musicians (fields containing `;`) are discarded, and noise such as `(guitar)` is stripped.
```bash
# 1. MusicBrainz / modern field
artist=$(mediainfo --Inform="General;%ARTISTS%" "$f")
# 2. classic fallback
[[ -z "$artist" ]] && artist=$(mediainfo --Inform="General;%Album/Performer%" "$f")
# 3. folder fallback
[[ -z "$artist" ]] && artist="${base%% - *}"
```
Album and title follow the same logic: embedded metadata first, filename or folder name as a last resort.
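The clean-up described above (discarding `;`-separated musician lists, stripping role noise) could look something like this. The exact regex used by the real script is not shown on this page, so this is an illustrative sketch only:

```shell
# Illustrative sketch of the artist tag clean-up (regex is an assumption).
clean_artist() {
  local a="$1"
  # Fields listing several musicians ("A; B; C") are discarded outright
  if [[ "$a" == *";"* ]]; then
    echo ""
    return 0
  fi
  # Strip parenthesised role noise such as "(guitar)"
  echo "$a" | sed -E 's/ *\([^)]*\)//g'
}
```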
### Technical file quality

In addition to musical metadata, the script extracts information about the technical nature of each file:

| Field | What it measures |
|---|---|
| `sample_rate_hz` | Sampling frequency (44100 Hz, 96000 Hz, 2822400 Hz for DSD64…) |
| `bit_depth` | Bit depth (16, 24, 32-bit PCM; 1-bit DSD) |
| `channels` | Number of channels (2 = stereo) |
| `duration_s` | Duration in seconds |
| `bitrate_kbps` | Bit rate in kbps |
| `dsd_rate` | Human-readable DSD label: DSD64, DSD128, DSD256, DSD512 |
```bash
sample_rate=$(mediainfo --Inform="Audio;%SamplingRate%" "$f")
bit_depth=$(mediainfo --Inform="Audio;%BitDepth%" "$f")
channels=$(mediainfo --Inform="Audio;%Channels%" "$f")

# DSD rate label derived from the sampling frequency
case "$sample_rate" in
    2822400)  dsd_rate="DSD64"  ;;
    5644800)  dsd_rate="DSD128" ;;
    11289600) dsd_rate="DSD256" ;;
    22579200) dsd_rate="DSD512" ;;
    *)        dsd_rate="NA"     ;;
esac
```
### DR calculation

```bash
result=$("$DRBIN" -f "$f" 2>&1 < /dev/null)
dr=$(echo "$result" | grep -Eo "DR[[:space:]]*[0-9]+" | head -n1 | grep -Eo "[0-9]+")
dr=${dr:-NA}
```

dr14meter analyses the file and returns the DR value according to the
crest-factor algorithm of the DR Database standard. The result is parsed
with grep to extract only the number, falling back to `NA` when no value is found.
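To illustrate the parse, here is the same grep pipeline run over a canned line. The exact wording of dr14meter's output is an assumption here; only the `DR <number>` pattern matters:

```shell
# Canned line standing in for dr14meter output (format assumed).
result='Official DR value: DR 12'
dr=$(echo "$result" | grep -Eo "DR[[:space:]]*[0-9]+" | head -n1 | grep -Eo "[0-9]+")
dr=${dr:-NA}
echo "$dr"   # extracts 12
```

`head -n1` keeps only the first match, so per-track `DR` lines later in the output cannot override the value.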
### Concurrent CSV writes

Since the script processes several files in parallel, it uses flock to
guarantee atomic writes to the CSV, preventing interleaved lines:

```bash
(
    flock 200
    echo "$line" >> "$OUT"
) 200>"$LOCK"
```
## Step 2 — SQLite import: `create_db.sh`

Imports the resulting CSV into a SQLite database with the `tracks` table
and creates pre-calculated views (albums, artists, genres) to speed up
common queries.
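A minimal, runnable sketch of what the import might look like. The `tracks` table and `albums` view come from the description above; the column names, the toy CSV, and the exact SQL are assumptions, not the real `create_db.sh`:

```shell
# Hypothetical sketch of the CSV → SQLite import.
# A toy CSV stands in for the real tracks_dr.csv; column names are assumed.
cd "$(mktemp -d)"
printf 'artist,album,dr\nArtistA,AlbumX,12\nArtistA,AlbumX,10\n' > tracks_dr.csv

sqlite3 music.db <<'SQL'
.mode csv
.import tracks_dr.csv tracks
-- pre-calculated view for common per-album queries
CREATE VIEW IF NOT EXISTS albums AS
  SELECT artist, album,
         ROUND(AVG(dr), 1) AS avg_dr,
         COUNT(*)          AS n_tracks
  FROM tracks
  GROUP BY artist, album;
SQL

sqlite3 music.db 'SELECT artist, avg_dr, n_tracks FROM albums;'
```

With `.mode csv`, `.import` creates the table from the CSV header row when it does not already exist; queries then hit the view instead of re-aggregating on every page build.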
## Step 3 — Web data generation

Two Python scripts read `music.db` and generate the static artefacts:

- `generate_conclusions.py` — produces the JSON files that feed the conclusions
  and insights pages: global summary, top albums, top artists, genres, decades,
  exceptional albums and consistent artists.
- `generate_charts.py` — generates the six matplotlib charts (histogram,
  format, genre, decade, top artists, top albums) and writes `graficas.md`
  with captions calculated from the actual data.
```bash
# Regenerate everything
python3 docs/generate_conclusions.py
python3 docs/generate_charts.py
```
## Step 4 — Build and publish

```bash
~/.local/bin/mkdocs build
```

MkDocs compiles the Markdown files and static artefacts (PNGs, JSONs)
into the `site/` directory, which nginx serves directly as dr.priet.us.
There is no separate deploy step: the build writes directly to the root that the web server uses.

```bash
# Full cycle
make data && make build
```