January 6th 05, 01:55 PM, posted to uk.rec.audio
Jim Lesurf
Default DBT a flawed method for evaluating Hi-Fi ?

In article , Richard Wall wrote:
> First let me point out that DBT in its various forms is a well-proven
> statistical evaluation technique for testing a wide range of diverse
> items. For the vast majority it works extremely well. However, it is
> already recognised that where the outcome of the test is not a clear
> yes/no answer (e.g. testing pharmaceuticals), it is necessary to have
> both a large sample set and three groups, to measure against the
> so-called placebo effect.


In general terms, I would agree. However there are a number of issues here
which I would like to distinguish.

One is that DBT is one method, and ABX is another, and if we wish, we can
use both at the same time. The purpose of ABX is to try and deal with some
of the problems you raise (and others you have not). Thus I would
prefer/hope that Iain will be using an ABX technique as well as DBT.
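As a concrete aside (my own illustration, not something from the thread): an ABX run is conventionally scored against pure guessing, since a listener who hears no difference still matches 'X' to A or B correctly half the time. A minimal sketch of the one-tailed binomial tail:

```python
# Illustrative sketch (my addition): scoring an ABX run. Under the
# null hypothesis the listener is guessing, so each trial is a fair
# coin; we compute P(at least `correct` right answers out of `trials`).
from math import comb

def abx_p_value(correct, trials):
    """One-tailed binomial tail probability for a guessing listener."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# Example: 12 correct answers in 16 trials.
print(round(abx_p_value(12, 16), 4))  # 0.0384, below the usual 0.05
```

With this scoring, 12 correct of 16 gives p of roughly 0.04, which is one reason a short ABX run can only hope to reveal fairly clear differences.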


> I believe that DBT is unsuitable for Hi-Fi evaluation, as the
> detection method (you, the listener) is anticipating a change to
> occur at some point and will be listening for changes, not trying to
> enjoy music.


This depends what interpretation we wish to place on any results. And the
details of how the test was actually carried out. (Please see further
comments, below.)


> Couple this to the fact that what you hear is just an interpretation
> of the vibrations received at the ear, and the fact that sound will
> change as the listener/s moves, etc.


I agree in general terms. However I would make the following points:

I'd agree that slight movements of the head, variations in the
circumstances of the listener, etc, will cause their perceptions to alter.
However if we are making a system change that produces an effect which is
so small as to become indistinguishable with any reliability due to such
things then I would be inclined to invoke 'Spock's Law'.

My point here is that if an effect is so small that it can't clearly and
reliably be heard above the effects of small head movements, etc, then it
may perhaps be regarded as being too small to be of any significance when
actually listening to music. Thus it may fall into the category of a
'difference' that is so small that it makes no real 'difference', and
hence in practice it is indistinguishable from 'no audible difference'.

If a change of component produces an effect smaller than, and similar to, a
slight head movement, then this implies that no-one would ever need to
bother with buying and using such a different component, as they can
achieve an equivalent result by a slight head movement.

FWIW I agree with you in that it is my impression that some of the effects
people argue about seem to me to be small compared with small head
movements. This being the case, I don't personally worry about them very
much.

The second point I would make is w.r.t. the context of Iain's proposed
tests. As I understand it, he proposes to use a range of 'experienced'
listeners in a controlled (standard) listening environment. If I understand
him correctly they would also be using well-regarded professional
equipment, etc. This does lead me to feel that if they can't easily or
reliably 'hear a difference' then it implies that - under the conditions of
the test - the difference can be regarded as 'small' and may therefore fall
into the class I describe above.

My third point is that if the test is also listener-controlled ABX then the
listener can check for themselves by doing things like keeping their head
still whilst switching, or moving as they please, to test for confounding
or confusing effects.


> It is no wonder that DBT has only given positive results for major
> changes in sound.


Here you draw a conclusion based upon forming a hypothesis and assuming it
is the correct model. This may not always be the case.

Also bear in mind that if the test uses an ABX protocol then this may help
deal with some of the statistical and uncontrolled variable effects.

> Can anyone point to a Hi-Fi DBT test where no changes were actually
> made in the source material, to check that the audience could
> perceive no change? I think that the false positives raised by such a
> test would invalidate this testing regime?


Afraid I don't see why you would reach such an absolute and sweeping
conclusion. I would agree that a given test will have a level below which
some effect may become too small to be reliably detected. However this is
the reason why I have been suggesting various protocols/methods and
calibrations for Iain to consider as these will help limit the problems you
describe, and help define their level.
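One such calibration can be sketched directly (my illustration, with made-up numbers): simulate runs in which no change at all is made, so every answer is a guess, and count how often a listener would still "pass" a fixed criterion. That rate is exactly the false-positive level Richard asks about:

```python
# Hypothetical sketch: how often would a purely guessing listener pass
# a fixed criterion (here, at least 12 correct of 16 ABX trials) when
# NO real change exists? Numbers are illustrative, not from the post.
import random

def null_trial_pass_rate(threshold=12, trials=16, runs=100_000, seed=1):
    rng = random.Random(seed)
    passes = 0
    for _ in range(runs):
        correct = sum(rng.random() < 0.5 for _ in range(trials))
        if correct >= threshold:
            passes += 1
    return passes / runs

print(null_trial_pass_rate())  # near 0.038, the exact binomial tail
```

Running such 'no change' controls does not invalidate the regime; it measures the regime's false-positive level so that real results can be judged against it.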

> Seeing as a proportion of contributors to this group insist on such
> validation to prove differences, it is very important that they
> should first prove that their testing procedure is without flaws?


I am unaware of any measurement technique that is "without flaws" if you
mean it must be *absolutely* reliable, tell us about *all* circumstances,
and give results of *absolute* accuracy. Real measurements will provide
finite accuracy, confidence, and reliability for a limited set of
circumstances / assumptions. Thus the question seems to me to be, "for what
purposes, and to what level of reliability / confidence is a given test /
measurement likely to be useful?"
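To make "finite accuracy, confidence, and reliability" concrete (again my own sketch, standard statistics rather than anything specific to Iain's test): a Wilson 95% interval for the listener's true success probability, given k correct out of n ABX trials, can be computed as follows:

```python
# Hypothetical sketch: a Wilson 95% confidence interval for the
# listener's true probability of a correct ABX answer, given k correct
# out of n trials. Standard statistics, not a claim about any real test.
from math import sqrt

def wilson_interval(k, n, z=1.96):
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

lo, hi = wilson_interval(12, 16)
print(f"{lo:.2f} .. {hi:.2f}")  # lower bound just above 0.5 (chance)
```

For 12 of 16 the lower bound sits only just above 0.5, i.e. the same data that pass a 0.05 significance test still leave the true detection rate quite loosely pinned down; hence results are a matter of confidence levels, not "proof".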

It seems to me that Iain's proposed tests have the potential to be carried
out in better controlled and better defined conditions than is the case for
most domestic audio situations. Thus I would say that - if they use a
reasonable protocol, etc - the results would stand a good chance of
showing effects if they are large enough to show up clearly and often in
domestic cases. Beyond that, I don't think anyone could say more without
knowing a lot more about the details of what will actually be done.

I also feel that an ABX protocol is really needed, not just DBT, otherwise
the problems you raise will be more difficult to assess or deal with.

> With the poor confirmation device of a person to identify changes in
> sound, the short time interval switching proposed by some advocates
> must surely be impossible to identify in a piece of music? Music is
> far too complex to expect anyone to correctly identify all but the
> most severe changes unless a long time period is set between changes?


I am also less than happy with the 'time' method Iain has proposed, using
'how long after a change does the listener notice (if ever)?'. I would
therefore agree with you that this does not seem an optimum method if we
are interested in deciding what levels/types of changes might be noticeable
or significant in a domestic context. The problem here is trying to
interpret the results of the test as telling us something about a different
set of circumstances.

However this depends on the actual 'questions' which we hope the evidence
from the test results will help us to answer.

> Should the test actually be made using white noise?


Personally I'd see that as a reasonable inclusion, along with music.
However, simple Gaussian noise may not evoke responses w.r.t. any audible
mechanisms that sense coherent transients. And people will be more
accustomed to listening to music. Hence I would not regard white noise
alone as a reasonable substitute for music, as it takes us further away
from what really interests me, which is listening to music in a domestic
situation.

> Using a long period between changes is also flawed, as audio "memory"
> of how a track sounded is also unreliable.


I would agree. Hence my preference for allowing the listener to ABX-switch
as and when they choose, so they can try for themselves whatever may make
any difference more noticeable, and so help them to identify 'X'.

> Is there a solution? For short-term testing, no.


I think this depends on the question you are trying to "solve". :-)

Hence I would not share your blanket conclusion, as the 'solution' will
depend on what we are trying to decide, and how we proceed.

> Some changes can be detected electronically if the material remains
> in the digital domain, but once the sound has left the speakers there
> are too many variables. For our Hi-Fi club we still use what for us
> is a less flawed method of a-b comparison (preferably blind) on
> repetition of a range of tracks, followed by a longer-term evaluation
> over the next few weeks. I look forward to any proof that DBT for
> Hi-Fi has been validated.


I'm afraid that "proof" isn't really something that experimental science
provides. Experiments provide *evidence* in terms of results which then
have to be assessed by understanding the experimental process actually
applied in the specific case. We can then decide what may have been
established as either reliable or unreliable. Science is not a matter of
"proof", but of testing to see if a hypothesis is supported or confounded
by suitable tests.

When you say you use "a-b comparison (preferably blind)" do you mean ABX?
I'd be interested to know what protocol and method you use and feel is
better than what Iain is proposing.

Slainte,

Jim

--
Electronics http://www.st-and.ac.uk/~www_pa/Scot...o/electron.htm
Audio Misc http://www.st-and.demon.co.uk/AudioMisc/index.html
Armstrong Audio http://www.st-and.demon.co.uk/Audio/armstrong.html
Barbirolli Soc. http://www.st-and.demon.co.uk/JBSoc/JBSoc.html