Methodology
This section describes the process used to build the project database and generate the dynamic range analysis.
The main goal has been to keep the process simple, reproducible and robust enough for a heterogeneous music collection made up of different formats, origins and tagging styles.
1. Source files
The analysed collection consists mainly of audio files obtained from:
- personal CD rips
- SACD rips converted to DSF
- special editions such as XRCD, SHM-CD and similar
- files stored locally in a folder-structured library
The formats currently analysed are:
- FLAC
- DSF
2. Metadata extraction
For each track, metadata is extracted directly from the file, prioritising embedded tags whenever available.
The fields used include:
- artist
- album
- track title
- year
- genre
- format
Where metadata is incomplete or inconsistent, fallback rules are applied based on the filename or containing folder.
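As an illustration of such a fallback rule, here is a minimal sketch. It assumes a hypothetical library layout of `<Artist>/<Album>/<NN> - <Title>.<ext>`; the function name `fallback_metadata` and the pattern are illustrative assumptions, not the project's actual rules. Embedded tags always take priority, and path-derived values only fill the gaps.

```python
import re
from pathlib import PurePosixPath

# Assumed (hypothetical) layout: <library>/<Artist>/<Album>/<NN> - <Title>.<ext>
# PurePosixPath keeps the example deterministic; Windows paths would need
# PureWindowsPath instead.
TRACK_PATTERN = re.compile(r"^\s*(?P<number>\d{1,3})\s*[-._]\s*(?P<title>.+)$")


def fallback_metadata(path: str, tags: dict) -> dict:
    """Fill missing fields from the path; non-empty embedded tags always win."""
    p = PurePosixPath(path)
    match = TRACK_PATTERN.match(p.stem)
    guessed = {
        "artist": p.parent.parent.name,
        "album": p.parent.name,
        # Strip a leading track number when the filename has one.
        "title": match.group("title").strip() if match else p.stem,
        "format": p.suffix.lstrip(".").upper(),
    }
    merged = dict(guessed)
    # Only tags that are present and non-blank override the guesses.
    merged.update({k: v for k, v in tags.items() if v and str(v).strip()})
    return merged
```

For example, calling `fallback_metadata("Music/Miles Davis/Kind of Blue/01 - So What.flac", {"artist": ""})` takes the artist and album from the folder names, because the embedded artist tag is blank.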
3. Dynamic range calculation
The DR (Dynamic Range) value is computed per track by running an external analysis tool on each audio file.
The aim of this metric is to provide a quantitative approximation of the perceived dynamic range of a given mastering.
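To give a feel for what a metric of this kind captures, the sketch below computes a plain peak-to-RMS ratio (crest factor) in decibels. This is only the general idea behind dynamic range measurements; it is not the algorithm used by the external tool, which this section does not name or describe. The function name `crest_factor_db` is an assumption made for illustration.

```python
import math


def crest_factor_db(samples: list[float]) -> float:
    """Peak-to-RMS ratio in dB for samples normalised to [-1.0, 1.0].

    Simplified illustration only: NOT the external tool's DR algorithm.
    """
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("silent signal")
    return 20.0 * math.log10(peak / rms)
```

A signal that sits at full level all the time (a square wave) has a ratio of 0 dB, while a signal with short peaks over a quiet background yields a larger value. Heavily compressed masterings tend towards the former.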
Important
The DR value:
- does not replace listening
- does not measure audio quality on its own
- should be interpreted as a complementary indicator
That said, it is useful for detecting general trends and comparing different editions.
4. Cleaning and normalisation
One of the most important steps in the project is metadata normalisation, since a real-world collection typically contains many irregular cases.
Corrections applied may include:
- removal of extra whitespace
- cleaning of problematic characters
- UTF-8 encoding correction
- resolution of duplicate or contaminated fields
- partial normalisation of artist / album / track
- handling of non-homogeneous date formats
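Some of the corrections above can be sketched as follows. This is a minimal example under stated assumptions: the helper names `clean_text` and `extract_year` are hypothetical, the accepted year range (1800 to 2099) is an arbitrary choice, and repair of mis-decoded UTF-8 text is not covered here.

```python
import re
import unicodedata
from typing import Optional

# Assumption: a plausible release year is a standalone 4-digit number, 1800-2099.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[89]\d{2}|20\d{2})(?!\d)")


def clean_text(value: str) -> str:
    """Normalise Unicode, drop control characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", value)
    # Control characters (category Cc, including tabs and newlines) become
    # spaces so that the whitespace collapse below removes them.
    text = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in text)
    return re.sub(r"\s+", " ", text).strip()


def extract_year(value: str) -> Optional[int]:
    """Pull a 4-digit year out of date strings written in different styles."""
    match = YEAR_PATTERN.search(value)
    return int(match.group(1)) if match else None
```

With these helpers, `"1959-08-17"` and `"17/08/1959"` both reduce to the year 1959, and a tag with stray tabs or doubled spaces becomes a single-spaced string.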
The goal is not absolute perfection, but a database consistent enough for statistical analysis and musical exploration.
5. Storage
The processed data is stored in a SQLite database, which allows:
- fast queries
- aggregations by artist, album or genre
- statistics generation
- export to other formats if needed
The SQLite database acts as the single source of truth for the project.
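The kind of aggregation this enables can be sketched with Python's standard `sqlite3` module. The table name `tracks`, its columns and the sample rows are all assumptions made up for this example; the project's real schema may differ.

```python
import sqlite3

# Hypothetical schema: one row per analysed track.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id     INTEGER PRIMARY KEY,
    artist TEXT NOT NULL,
    album  TEXT NOT NULL,
    title  TEXT NOT NULL,
    year   INTEGER,
    genre  TEXT,
    format TEXT,
    dr     INTEGER
)
"""


def average_dr_by_album(conn: sqlite3.Connection) -> list:
    """Return (artist, album, average DR, track count), highest average first."""
    cursor = conn.execute(
        """
        SELECT artist, album, ROUND(AVG(dr), 1) AS avg_dr, COUNT(*) AS n
        FROM tracks
        WHERE dr IS NOT NULL
        GROUP BY artist, album
        ORDER BY avg_dr DESC, artist, album
        """
    )
    return cursor.fetchall()


# Placeholder rows for illustration only, not real measurements.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO tracks (artist, album, title, year, genre, format, dr) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Artist A", "Album X", "Track 1", 1990, "Jazz", "FLAC", 12),
        ("Artist A", "Album X", "Track 2", 1990, "Jazz", "FLAC", 14),
        ("Artist B", "Album Y", "Track 1", 2010, "Rock", "DSF", 7),
    ],
)
```

Because every query reads from the same table, swapping the `GROUP BY` columns for `artist` or `genre` gives the other aggregations without touching the extraction step.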
6. Generating visualisations and conclusions
From the database, various artefacts are generated for the web:
- aggregated tables
- JSON files
- interactive visualisations
- analytical summaries
This provides a clear separation between:
- the extraction and calculation layer
- the presentation layer
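As a rough sketch of how the two layers can be kept apart, the function below reads from the database and writes a JSON file for the web side to consume. It assumes a hypothetical `tracks` table with `genre` and `dr` columns; the function name and the output shape are illustrative, not the project's actual export format.

```python
import json
import sqlite3


def export_genre_summary(conn: sqlite3.Connection, out_path: str) -> list:
    """Write per-genre track counts and average DR to a JSON file."""
    rows = conn.execute(
        """
        SELECT genre, COUNT(*), ROUND(AVG(dr), 1)
        FROM tracks
        WHERE dr IS NOT NULL
        GROUP BY genre
        ORDER BY genre
        """
    ).fetchall()
    summary = [
        {"genre": genre, "tracks": count, "avg_dr": avg_dr}
        for genre, count, avg_dr in rows
    ]
    # ensure_ascii=False keeps accented genre or artist names readable.
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(summary, fh, ensure_ascii=False, indent=2)
    return summary
```

The presentation layer then only needs the generated file, so the visualisations can be rebuilt or restyled without rerunning the analysis.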
7. Limitations
As with any project of this kind, some limitations are unavoidable:
- inconsistent metadata across editions
- differences between manual and automatic tagging
- non-homogeneous repertoire across genres
- possible presence of outliers or occasional errors
This analysis should therefore be understood as an exploratory support tool, not a definitive ranking.
General approach
The philosophy throughout the project has been:
prioritise a maintainable, understandable and practically useful solution over pursuing technical perfection that does not justify the effort.
That balance allows the database to remain alive, extensible and enjoyable.