DBT a flawed method for evaluating Hi-Fi ?



 
 
#1
January 6th 05, 10:04 AM, posted to uk.rec.audio
Richard Wall

First, let me point out that DBT in its various forms is a well-proven
statistical evaluation technique for testing a wide range of diverse items.
For the vast majority it works extremely well. However, it is already
recognised that where the outcome of the test is not a clear yes/no answer,
e.g. testing pharmaceuticals, it is necessary to have both a large sample set
and three groups, to measure against the so-called placebo effect.

I believe that DBT is unsuitable for Hi-Fi evaluation because the detection
method (you, the listener) is anticipating a change to occur at some point,
and will be listening for changes rather than trying to enjoy music. Couple
this with the fact that what you hear is just an interpretation of the
vibrations received at the ear, and the fact that the sound will change as
the listeners move, and it is no wonder that DBT has only given positive
results for major changes in sound.

Can anyone point to a Hi-Fi DBT test where no changes were actually made in
the source material, to check that the audience could perceive no change? I
think that the false positives raised by such a test would invalidate this
testing regime. Seeing as a proportion of contributors to this group
insist on such validation to prove differences, it is surely important that
they should first prove that their testing procedure is without flaws.
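[Editorial sketch, not part of the original post: the "no actual change" scenario above can be quantified. If A and B are genuinely identical, each trial of a 16-trial ABX session is a coin flip, so we can simulate how often pure guessing clears the conventional 12-of-16 "significant" bar. The session length and pass mark are assumed conventional figures, not numbers from this thread.]

```python
# If no change was actually made, a listener can only guess, so each
# 16-trial session is 16 coin flips.  How often does guessing alone
# clear the conventional >= 12-of-16 significance bar?
import random

random.seed(1)

TRIALS_PER_SESSION = 16
PASS_MARK = 12          # ~p < 0.04 under the null hypothesis
SESSIONS = 100_000

false_positives = 0
for _ in range(SESSIONS):
    correct = sum(random.random() < 0.5 for _ in range(TRIALS_PER_SESSION))
    if correct >= PASS_MARK:
        false_positives += 1

print(f"Sessions 'passed' by guessing alone: {false_positives / SESSIONS:.3%}")
# Roughly 3.8% of null sessions pass - exactly the false-positive rate
# the significance threshold was chosen to tolerate.
```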

With a person as the poor confirmation device for identifying changes in
sound, the short time-interval switching proposed by some advocates must
surely be impossible to identify in a piece of music. Music is far too
complex to expect anyone to correctly identify all but the most severe
changes unless a long time period is set between them. Should the test
actually be made using white noise? Using a long period between changes is
also flawed, as audio "memory" of how a track sounded is unreliable.

Is there a solution? For short-term testing, no. Some changes can be
detected electronically if the material remains in the digital domain, but
once the sound has left the speakers there are too many variables. For our
Hi-Fi club we still use what is, for us, a less flawed method: A-B
comparison (preferably blind) on repetition of a range of tracks, followed
by a longer-term evaluation over the next few weeks.
I look forward to any proof that DBT for Hi-Fi has been validated.
Regards Richard Wall
New Ash Green Hi-Fi Club


#2
January 6th 05, 10:17 AM, posted to uk.rec.audio
Don Pearce

On Thu, 6 Jan 2005 11:04:55 -0000, "Richard Wall"
wrote:

Can anyone point to a Hi-Fi DBT test where no changes were actually made in
the source material to check that the audience could perceive no change ?


That would be EVERY DBT test. Each sample is randomly selected from
the two possibilities - that means that in each test the source has
either changed or not. So false positives and false negatives both
show up as failures, and this is as it should be.
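[Editorial sketch, not part of the original post: Don's point can be made numerically. Because X is chosen at random each trial, a listener with no real discrimination scores about 50% however biased their answers are, and the chance tail probabilities are exact binomial sums. The trial counts below are hypothetical illustrations.]

```python
# Exact probability of k-or-more correct answers in n trials when each
# answer is effectively a coin flip (i.e. no audible difference exists).
from math import comb

def guess_tail(n: int, k: int) -> float:
    """P(at least k correct in n trials) under pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

for k in (9, 12, 14):
    print(f"P(>= {k}/16 by chance) = {guess_tail(16, k):.4f}")
# A bare majority (9/16) happens by chance ~40% of the time, which is
# why only scores well above half count as evidence of a difference.
```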

I think the reason you have problems with DBTs is not that they fail,
but that they pretty much always succeed. They succeed in showing that
there is in fact no sonic difference between the components under
test. Again, these days that is as it should be - we have advanced far
enough along the Hi-Fi road that pretty much all components should
sound the same.

d

Pearce Consulting
http://www.pearce.uk.com
#3
January 6th 05, 10:52 AM, posted to uk.rec.audio
John Phillips

In article , Richard Wall wrote:
I believe that DBT is unsuitable for Hi-Fi evaluation as the detection
method (you the listener) are anticipating a change to occur at some point
and will be listening to changes not trying to enjoy music.


The testing and analysis method does indeed have to deal with this
tendency to anticipate change.

Can anyone point to a Hi-Fi DBT test where no changes were actually made in
the source material to check that the audience could perceive no change ? I
think that the false positives raised by such a test would invalidate this
testing regime ?


You may be interested to read some relevant comments at:

http://www.bostonaudiosociety.org/ba...l_thinking.htm

(Note: I have only scanned the article and not evaluated it - Caveat
Lector.)

--
John Phillips
#4
January 6th 05, 01:55 PM, posted to uk.rec.audio
Jim Lesurf

In article , Richard Wall
wrote:
First let me point out that DBT in its various forms is a well proven
statistical evaluation technique for testing a wide range of diverse
items. For the vast majority it works extremely well however it is
already recognised that where the outcome of the test is not a clear
yes/no answer e.g. testing pharmaceuticals it is necessary to have both
a large sample set and three groups to measure against the so called
placebo effect.


In general terms, I would agree. However there are a number of issues here
which I would like to distinguish.

One is that DBT is one method, and ABX is another, and if we wish, we can
use both at the same time. The purpose of ABX is to try and deal with some
of the problems you raise (and others you have not). Thus I would
prefer/hope that Iain will be using an ABX technique as well as DBT.


I believe that DBT is unsuitable for Hi-Fi evaluation as the detection
method (you the listener) are anticipating a change to occur at some
point and will be listening to changes not trying to enjoy music.


This depends what interpretation we wish to place on any results. And the
details of how the test was actually carried out. (Please see further
comments, below.)


Couple this to the fact that what you hear is just an interpretation of
the vibrations received at the ear and the fact that sound will change
as the listener/s moves etc.


I agree in general terms. However I would make the following points:

I'd agree that slight movements of the head, variations in the
circumstances of the listener, etc, will cause their perceptions to alter.
However if we are making a system change that produces an effect which is
so small as to become indistinguishable with any reliability due to such
things then I would be inclined to invoke 'Spock's Law'.

My point here is that if an effect is so small that it can't clearly and
reliably be heard above the effects of small head movements, etc, then it
may perhaps be regarded as being too small to be of any significance when
actually listening to music. Thus it may fall into the category of a
'difference' that is so small that makes no real 'difference', and hence in
practice it is indistinguishable from 'no audible difference'.

If a change of component produces an effect smaller than, and similar to, a
slight head movement, then this implies that no-one would ever need to
bother with buying and using such a different component, as they can
achieve an equivalent result by a slight head movement.

FWIW I agree with you in that it is my impression that some of the effects
people argue about seem to me to be small compared with small head
movements. This being the case, I don't personally worry about them very
much.

The second point I would make is w.r.t. the context of Iain's proposed
tests. As I understand it, he proposes to use a range of 'experienced'
listeners in a controlled (standard) listening environment. If I understand
him correctly they would also be using well-regarded professional
equipment, etc. This does lead me to feel that if they can't easily or
reliably 'hear a difference' then it implies that - under the conditions of
the test - the difference can be regarded as 'small' and may therefore fall
into the class I describe above.

My third point is that if the test is also listener-controlled ABX then the
listener can check for themselves by doing things like keeping their head
still whilst switching, or moving as they please, to test for confounding
or confusing effects.
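[Editorial sketch, not part of the original post: the listener-controlled ABX protocol Jim describes can be outlined in a few lines. This is a hypothetical harness, not anyone's actual test rig; the point is that X is drawn at random each trial, so neither the presenter nor the listener chooses it.]

```python
# Minimal ABX harness: X is randomly A or B each trial; the listener
# commits to an answer and is scored on matches.
import random

def run_abx(trials: int, answer_fn) -> int:
    """Run `trials` ABX trials; `answer_fn(x)` must return 'A' or 'B'."""
    correct = 0
    for _ in range(trials):
        x = random.choice("AB")   # double-blind: nobody picks X
        if answer_fn(x) == x:
            correct += 1
    return correct

# A 'listener' who genuinely cannot tell A from B just guesses:
random.seed(0)
score = run_abx(16, lambda x: random.choice("AB"))
print(f"Guessing listener scored {score}/16")
```

Because the answer function never sees anything but the stimulus itself, head-position experiments like those Jim suggests change only what the listener does between audition and answer, not the scoring.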


It is no wonder that DBT has only given positive results for major
changes in sound.


Here you make a conclusion which is based upon you making a hypothesis and
assuming it is the correct model. This may not always be the case.

Also bear in mind that if the test uses an ABX protocol then this may help
deal with some of the statistical and uncontrolled variable effects.

Can anyone point to a Hi-Fi DBT test where no changes
were actually made in the source material to check that the audience
could perceive no change ? I think that the false positives raised by
such a test would invalidate this testing regime ?


Afraid I don't see why you would reach such an absolute and sweeping
conclusion. I would agree that a given test will have a level below which
some effect may become too small to be reliably detected. However this is
the reason why I have been suggesting various protocols/methods and
calibrations for Iain to consider as these will help limit the problems you
describe, and help define their level.

Seeing as a proportion of contributors to this group insist on
such validation to prove differences it is very important that they
should first prove that their testing procedure is without flaws ?


I am unaware of any measurement technique that is "without flaws" if you
mean it must be *absolutely* reliable, tell us about *all* circumstances,
and give results of *absolute* accuracy. Real measurements will provide
finite accuracy, confidence, and reliability for a limited set of
circumstances / assumptions. Thus the question seems to me to be, "for what
purposes, and to what level of reliability / confidence is a given test /
measurement likely to be useful?"
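[Editorial sketch, not part of the original post: Jim's "finite accuracy, confidence, and reliability" can be put in numbers. One standard way is a confidence interval for the listener's true per-trial detection rate; the 12-of-16 score below is a hypothetical figure, not a result from any real test.]

```python
# 95% Wilson score interval for a binomial proportion: turns an observed
# ABX score into a plausible range for the true detection rate.
from math import sqrt

def wilson_interval(correct: int, trials: int, z: float = 1.96):
    """Return (low, high) bounds of the 95% Wilson interval."""
    p = correct / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return centre - half, centre + half

lo, hi = wilson_interval(12, 16)
print(f"12/16 correct: true rate plausibly in [{lo:.2f}, {hi:.2f}]")
# The lower bound barely clears 0.5, which is why one short session
# rarely settles the question with much confidence either way.
```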

It seems to me that Iain's proposed tests have the potential to be carried
out in better controlled and better defined conditions than is the case for
most domestic audio situations. Thus I would say that - if they use a
reasonable protocol, etc - that the result would stand a good chance of
showing effects if they are large enough to show up clearly and often in
domestic cases. Beyond, that, I don't think anyone could say more without
knowing a lot more about the details of what will actually be done.

I also feel that an ABX protocol is really needed, not just DBT, otherwise
the problems you raise will be more difficult to assess or deal with.

With a person as the poor confirmation device for identifying changes in
sound, the short time-interval switching proposed by some advocates must
surely be impossible to identify in a piece of music ? Music is far
too complex to expect anyone to correctly identify all but the most
severe changes unless a long time period is set between them ?


I am also less than happy with the 'time' method Iain has proposed, using
'how long after a change does the listener notice (if ever)'. I would
therefore agree with you that this does not seem an optimum method if we
are interested in deciding what levels/types of changes might be noticeable
or significant in a domestic context. The problem here is trying to
interpret the results of the test as telling us something about a different
set of circumstances.

However this depends on the actual 'questions' which we hope the evidence
from the test results will help us to answer.

Should the test actually be made using white noise ?


Personally I'd see that as a reasonable inclusion, along with music.
However simple Gaussian noise may not evoke responses w.r.t. any audible
mechanisms that sense coherent transients. And people will be more
accustomed to listening to music. Hence I would not regard white noise
alone as a reasonable substitute for music as it takes us further away from
what really interests me, which is listening to music in a domestic
situation.

Using a long period between changes is also flawed as audio "memory" of
how a track sounded is also unreliable.


I would agree. Hence my preference for allowing the listener to ABX-switch
as and when they choose, so they can try for themselves to see what may make
any difference more noticeable, and to help them identify 'X'.

Is there a solution ? For short term testing no.


I think this depends on the question you are trying to "solve". :-)

Hence I would not share your blanket conclusion, as the 'solution' will
depend on what we are trying to decide, and how we proceed.

Some changes can be detected electronically if the material remains in
the digital domain but once the sound has left the speakers there are
too many variables. For our Hi-Fi club we still use what for us is a
less flawed method of a-b comparison (preferably blind) on repetition of
a range of tracks followed by a longer term evaluation over the next few
weeks. I look forward to any proof that DBT for Hi-Fi has been
validated.


I'm afraid that "proof" isn't really something that experimental science
provides. Experiments provide *evidence* in terms of results which then
have to be assessed by understanding the experimental process actually
applied in the specific case. We can then decide what may have been
established as either reliable or unreliable. Science is not a matter of
"proof", but of testing to see if a hypothesis is supported or confounded
by suitable tests.

When you say you use "A-B comparison (preferably blind)" do you mean ABX?
I'd be interested to know what protocol and method you use and feel is
better than what Iain is proposing.

Slainte,

Jim

--
Electronics http://www.st-and.ac.uk/~www_pa/Scot...o/electron.htm
Audio Misc http://www.st-and.demon.co.uk/AudioMisc/index.html
Armstrong Audio http://www.st-and.demon.co.uk/Audio/armstrong.html
Barbirolli Soc. http://www.st-and.demon.co.uk/JBSoc/JBSoc.html
#5
January 6th 05, 03:55 PM, posted to uk.rec.audio
Richard Wall


"Don Pearce" wrote in message
...
On Thu, 6 Jan 2005 11:04:55 -0000, "Richard Wall"
wrote:

Can anyone point to a Hi-Fi DBT test where no changes were actually made
in
the source material to check that the audience could perceive no change ?


That would be EVERY DBT test. Each sample is randomly selected from
the two possibilities - that means that in each test the source has
either changed or not. So false positives and false negatives both
show up as failures, and this is as it should be.

I think the reason you have problems with DBTs is not that they fail,
but that they pretty much always succeed. They succeed in showing that
there is in fact no sonic difference between the components under
test. Again, these days that is as it should be - we have advanced far
enough along the Hi-Fi road that pretty much all components should
sound the same.

Why do you think they always succeed? I would have thought that most
manufacturers would work to make their systems sound different, and the vast
array of suppliers would surely not exist if they all sounded the same. It
should easily be possible to take three similar amplifiers, modify their
frequency response to just above the limit at which a difference can be
heard, and then get conclusive proof of the effectiveness of the testing
method. Has it been done?
It may be easy enough to make all kit sound the same, but even if they did
all make the perfect reproduction equipment, it is unlikely that everyone
would prefer it. I think all the systems belonging to members in our club
sound quite distinct; it would however be impossible to test this opinion.
You have also missed my point that although the testing method is proven,
the results on Hi-Fi are not. Hearing is not absolute, and will change over
the listening period. Trying to listen for differences is not the same as
listening to appreciate music. Unless you have data to prove otherwise, the
sample sets used to evaluate Hi-Fi are small, so with false positives,
false negatives and perception changes, is it any wonder that the statistics
say that most things cannot be differentiated?
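[Editorial sketch, not part of the original post: the small-sample worry can be quantified as statistical power. Even a listener who genuinely hears a difference on a given fraction of trials will often fail a short session. The 70% per-trial detection rate and the session sizes below are hypothetical figures, not data from any real test.]

```python
# P(scoring >= pass_mark out of n trials) for a listener whose true
# per-trial detection rate is p_detect - i.e. the test's power.
from math import comb

def power(n: int, pass_mark: int, p_detect: float) -> float:
    """Exact binomial tail probability of passing the session."""
    return sum(
        comb(n, k) * p_detect**k * (1 - p_detect)**(n - k)
        for k in range(pass_mark, n + 1)
    )

for n, mark in ((10, 9), (16, 12), (40, 26)):
    print(f"{n} trials, pass at {mark}: power = {power(n, mark, 0.7):.2f}")
# Short sessions miss real-but-subtle differences surprisingly often;
# only the longer run gives a good chance of a positive result.
```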




#6
January 6th 05, 04:13 PM, posted to uk.rec.audio
Don Pearce

On Thu, 6 Jan 2005 16:55:31 -0000, "Richard Wall"
wrote:


"Don Pearce" wrote in message
...
On Thu, 6 Jan 2005 11:04:55 -0000, "Richard Wall"
wrote:

Can anyone point to a Hi-Fi DBT test where no changes were actually made
in
the source material to check that the audience could perceive no change ?


That would be EVERY DBT test. Each sample is randomly selected from
the two possibilities - that means that in each test the source has
either changed or not. So false positives and false negatives both
show up as failures, and this is as it should be.

I think the reason you have problems with DBTs is not that they fail,
but that they pretty much always succeed. They succeed in showing that
there is in fact no sonic difference between the components under
test. Again, these days that is as it should be - we have advanced far
enough along the Hi-Fi road that pretty much all components should
sound the same.

Why do you think they always succeed? I would have thought that most
manufacturers would work to make their systems sound different, and the vast
array of suppliers would surely not exist if they all sounded the same. It
should easily be possible to take three similar amplifiers, modify their
frequency response to just above the limit at which a difference can be
heard, and then get conclusive proof of the effectiveness of the testing
method. Has it been done?
It may be easy enough to make all kit sound the same, but even if they did
all make the perfect reproduction equipment, it is unlikely that everyone
would prefer it. I think all the systems belonging to members in our club
sound quite distinct; it would however be impossible to test this opinion.
You have also missed my point that although the testing method is proven,
the results on Hi-Fi are not. Hearing is not absolute, and will change over
the listening period. Trying to listen for differences is not the same as
listening to appreciate music. Unless you have data to prove otherwise, the
sample sets used to evaluate Hi-Fi are small, so with false positives,
false negatives and perception changes, is it any wonder that the statistics
say that most things cannot be differentiated?



Hi-Fi manufacturers are the same as any. They change things for two
purposes. The first is to make them cheaper, to improve the margins. The
second is to make them appear different, so that not only new buyers but
also old ones will purchase the new item.

There is of course absolutely no need to make them sound different -
actual improvement ceased long ago - while one-upmanship and other
psychological effects do such a fine job without effort on the part of
the manufacturer.

The one area where none of the above applies is the loudspeaker, which
is still frankly a disgrace. You don't need a DBT to reveal the
differences between speakers, and unlike the other stuff, the
differences persist under DBT conditions.

As for DBTs usually succeeding - it is simply true. It is very easy to
run a good DBT. But maybe my definition of success is not the same as
yours. Mine is that the DBT reveals the truth, whatever that is. Maybe
you consider a DBT to have failed if it does not show a difference?
That would be a reasonable conclusion to your assertion that DBTs
don't work for Hi-Fi. If things cannot be differentiated in DBT, it is
because they actually aren't different.

d

Pearce Consulting
http://www.pearce.uk.com
#7
January 6th 05, 04:36 PM, posted to uk.rec.audio
Richard Wall


Hence I would not share your blanket conclusion, as the 'solution' will
depend on what we are trying to decide, and how we proceed.

Some changes can be detected electronically if the material remains in
the digital domain but once the sound has left the speakers there are
too many variables. For our Hi-Fi club we still use what for us is a
less flawed method of a-b comparison (preferably blind) on repetition of
a range of tracks followed by a longer term evaluation over the next few
weeks. I look forward to any proof that DBT for Hi-Fi has been
validated.


I'm afraid that "proof" isn't really something that experimental science
provides. Experiments provide *evidence* in terms of results which then
have to be assessed by understanding the experimental process actually
applied in the specific case. We can then decide what may have been
established as either reliable or unreliable. Science is not a matter of
"proof", but of testing to see if a hypothesis is supported or confounded
by suitable tests.

Sorry, I thought that the above was sufficient to be defined as proof.

When you say you use "A-B comparison (preferably blind)" do you mean ABX?
I'd be interested to know what protocol and method you use and feel is
better than what Iain is proposing.

I cannot offer a better test protocol for Iain, as I fear that they will all
be affected by the listener. I remember attending one of the London shows
where a supplier had an amplifier with standard capacitors and "special"
capacitors (Black Gates ??). The amp was connected to a pair of headphones,
and you had a switch to change from A to B and, once you had convinced
yourself whether there was a difference, a flap that when lifted showed
which was which. I also remember thinking that the "special" capacitors
sounded slightly clearer, but at the price premium I was not about to try
replacing all the ones in my amp.

Our evaluation procedure is very rudimentary. We start with the system (say
A) as is and listen for about 40 minutes, then listen to three specific
tracks before changing to component B. We then listen to the same three
tracks. If a difference is significant it can usually be heard by all
attendees within the first few bars; however, the opinion as to whether this
represents an improvement is not always unanimous, and not always the same
for all three of the tracks. If the general perception has been of a
benefit, we usually listen for the rest of the evening in the B
configuration before finally returning to A to repeat the three tracks. In
some venues the equipment is in another room or away from the listening
area, allowing some changes to be made, or not made, out of view of the
listeners. Whilst we try to keep the volume setting the same, this is not
always possible, and alcohol is partaken of. I am sure most of the
differences we hear are not due to component changes.

My big problem with the advocates of ABX/DBT is the opinion, based on their
claimed tests, that most components sound the same and that the results
obtained by these tests prove this. My experience is to the contrary, in
that upgrades I have made in CD player, amplifier and others still, to my
hearing, sound like upgrades, and when the old component is slotted back
into the system I can hear the difference. I am not happy with simple A/B
either, as this can easily create false positives, and I have found that if
we spend a lot of time switching from A to B and C backwards and forwards,
at the end of an evening I am tired, I have not enjoyed the music, and I
struggle to tell the difference between anything.
If an A/B listening session has been perceived a success, it is most likely
that either a club member brought the item with them or they are about to
buy it, so I have been able to subsequently borrow it to listen at leisure
at home. I can then listen for longer with my choice of music. I find most
changes are just subjective: it sounds different, but rarely consistently
better, and is rarely worth the investment. What has however shocked me
recently is mains cables, which I have always felt should make no
difference at all. I now however have a load of Kimber cables !!!

Regards Richard



#8
January 6th 05, 04:36 PM, posted to uk.rec.audio
Richard Wall

Er, thanks. I will try and read through it!
"John Phillips" wrote in message
...
In article , Richard Wall wrote:
I believe that DBT is unsuitable for Hi-Fi evaluation as the detection
method (you the listener) are anticipating a change to occur at some
point
and will be listening to changes not trying to enjoy music.


The testing and analysis method does indeed have to deal with this
tendency to anticipate change.

Can anyone point to a Hi-Fi DBT test where no changes were actually made
in
the source material to check that the audience could perceive no change ?
I
think that the false positives raised by such a test would invalidate
this
testing regime ?


You may be interested to read some relevant comments at:

http://www.bostonaudiosociety.org/ba...l_thinking.htm

(Note: I have only scanned the article and not evaluated it - Caveat
Lector.)

--
John Phillips



#9
January 6th 05, 05:27 PM, posted to uk.rec.audio
John Phillips

In article , Richard Wall wrote:
er thanks
I will try and read through it!
"John Phillips" wrote in message
...
In article , Richard Wall wrote:
I believe that DBT is unsuitable for Hi-Fi evaluation as the detection
method (you the listener) are anticipating a change to occur at some
point and will be listening to changes not trying to enjoy music.


The testing and analysis method does indeed have to deal with this
tendency to anticipate change.

Can anyone point to a Hi-Fi DBT test where no changes were actually made
in
the source material to check that the audience could perceive no change ?
I
think that the false positives raised by such a test would invalidate
this
testing regime ?


You may be interested to read some relevant comments at:

http://www.bostonaudiosociety.org/ba...l_thinking.htm

(Note: I have only scanned the article and not evaluated it - Caveat
Lector.)


BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).

--
John Phillips
#10
January 6th 05, 07:27 PM, posted to uk.rec.audio
Rob

John Phillips wrote:
In article , Richard Wall wrote:

snip


BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).


That's quite interesting - some results I'd expect (speakers, tape
decks, encode-decode, cables), but some are pretty wild - a 450W
solid-state amp and a 50W valve amp sounding the same, for example.

I've had a quick look at a journal database and I can't see much on DBT
as a valid scientific method (I don't know about the journals on the abx
page - they don't look to be the peer-reviewed kind). Its theoretical
basis seems psychological. Is there any methodological and empirical
analysis of this process?

I've always thought that it's not what people think, but why they think it,
that is the most interesting thing. I'm not (quite!) enough of a pedant to
ponder on 'the same' results, but some qualitative analysis of
'different' would be interesting - does anyone know of DBTs that
include this (not just audio)?

Linked to this is another thing I don't understand - do people know
what's going on in these experiments ('this is an abx test of cables',
for example)?
Just curious!

Rob
 



