
January 7th 05, 02:26 AM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
"Richard Wall" wrote in message
First let me point out that DBT in its various forms is a well-proven
statistical evaluation technique for testing a wide range of diverse
items. For the vast majority it works extremely well; however, it is
already recognised that where the outcome of the test is not a clear
yes/no answer, e.g. testing pharmaceuticals, it is necessary to have
both a large sample set and three groups to measure against the
so-called placebo effect.
I believe that DBT is unsuitable for Hi-Fi evaluation as the detection
method (you, the listener) is anticipating a change to occur at some
point and will be listening for changes, not trying to enjoy music.
This would be true whether the test was a DBT or not. IOW, if you do a
sighted test, you (the listener) are anticipating a change to occur at
some point and will be listening for changes, not trying to enjoy music.
If this were a serious problem, then virtually every listening test that
involved listening for a change would be invalid.
Couple this to the fact that what you hear is just an interpretation of
the vibrations received at the ear, and the fact that sound will change
as the listener(s) moves etc.
This would be true whether the test was a DBT or not. IOW, in a sighted
test the sound will also change as the listener(s) moves etc.
If this were a serious problem, then virtually every listening test that
allowed the listener to move his ears would be invalid.
It is no wonder that DBT has only given positive results for major
changes in sound.
That's just not true. DBTs give positive results for very small changes in
sound - changes that are at the known thresholds of hearing as established
by other independent methods. The *only* differences that don't give
positive results in DBTs are differences that are not known to be audible.
Can anyone point to a Hi-Fi DBT test where no
changes were actually made in the source material to check that the
audience could perceive no change ?
Every test I've ever been involved with. The source material is played twice
or more often, either with or without the change being tested.
I think that the false positives raised by such a test would invalidate
this testing regime?
Most if not all DBT test methodologies are highly resistant to false
positives.
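As a rough illustration of why, here is a sketch (the 16-trial run and 12-of-16 pass criterion are assumed for illustration, not taken from any particular test) simulating a listener who hears no difference at all and simply guesses:

```python
import random

random.seed(1)

TRIALS_PER_RUN = 16   # length of one ABX run (assumed)
CRITERION = 12        # correct answers needed to "pass" (assumed)
RUNS = 100_000

# Simulate many complete ABX runs by a listener who guesses at random.
false_positives = 0
for _ in range(RUNS):
    correct = sum(random.random() < 0.5 for _ in range(TRIALS_PER_RUN))
    if correct >= CRITERION:
        false_positives += 1

rate = false_positives / RUNS
# The exact binomial tail probability is about 0.038, so only around 4%
# of pure-guessing runs reach the criterion by luck alone.
print(f"false positive rate: {rate:.3f}")
```

So a single lucky run is possible but uncommon, and repeating the run drives the chance of a sustained false positive down very quickly.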
Seeing as though a proportion of contributors to this group insist on
such
validation to prove differences it is very important that they should
first prove that their testing procedure is without flaws ?
There's no need to prove that a testing methodology has no flaws. All you
have to do is show that it is a great improvement over the alternatives.
As a rule, nothing in the real world is perfectly free from flaws.
With a person as the imperfect confirmation device for identifying
changes in sound, the short time-interval switching proposed by some
advocates must surely be impossible to identify in a piece of music?
Please clarify. What is the exact nature of these switching time
intervals, and who proposed them?
Music is far too complex to expect anyone to correctly identify all but
the most severe changes unless a long time period is set between
changes?
You've got things exactly backwards. Quick switching and comparisons that
are close together in time are generally the most sensitive.
Should the test actually be made using white noise ?
White or pink noise can and has been used, but as a rule music is used.
Using a long period between changes is also
flawed as audio "memory" of how a track sounded is also unreliable.
Agreed. So, don't use long time periods between changes.
Is there a solution ? For short term testing no. Some changes can
be detected electronically if the material remains in the digital
domain but once the sound has left the speakers there are too many
variables.
This would be true whether the test was a DBT or not. IOW, in a sighted
test, once the sound has left the speakers there are just as many
variables.
If this were a serious problem, then virtually every listening test that
allowed the sound to leave the speakers would be invalid.
For our Hi-Fi club we still use what for us is a less
flawed method of a-b comparison (preferably blind) on repetition of a
range of tracks followed by a longer term evaluation over the next
few weeks.
There's nothing about a DBT that keeps the evaluation from being long
term. This has been tried. It doesn't help.
I look forward to any proof that DBT for Hi-Fi has been validated.
Your thinking is highly flawed, as you have *proven* that all listening
tests are invalid at least 3 times.
Try getting some practice with real-world DBTs - visit www.pcabx.com .
January 7th 05, 05:48 AM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
On Thu, 06 Jan 2005 20:27:55 +0000, Rob
wrote:
John Phillips wrote:
In article , Richard Wall wrote:
snip
BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).
That's quite interesting - some results I'd expect (speakers, tape
decks, encode-decode, cables) but some are pretty wild - a 450w ss amp
and a 50w valve amp sound the same for example.
If it's a *good* valve amp, this is quite reasonable. These tests are
of course conducted below the clipping point of the least powerful
amp, so the sheer power difference doesn't count. No one ever
suggested that you can't hear an amp clipping! :-)
I've had a quick look at a journal database and I can't see much on DBT
as a valid scientific method (I don't know about the journals on the abx
page - they don't look to be the peer reviewed kind). Its theoretical
basis seems psychological. Is there any methodological and empirical
analysis of this process?
Lots, both in psy journals and medical journals.
I've always thought it's not what people think, but why they think it,
is the most interesting thing. I'm not (quite!) enough of a pedant to
ponder on 'the same' results, but some qualitative analysis of
'different' would be interesting - does anyone know of DBTs that
includes this (not just audio)?
Try any medical journal. DBTs have been standard in drug trials for
many decades.
Linked to this is another thing I don't understand - do people know
what's going on in these experiments ('this is an abx test of cables'
for example).
Yes, people know what is being compared, they just don't know what is
actually connected at any given time. In an ABX trial, you can always
select A or B as known devices, only the identity of 'X' (A or B) is
unknown.
--
Stewart Pinkerton | Music is Art - Audio is Engineering
January 7th 05, 06:31 AM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
In article , Rob wrote:
John Phillips wrote:
In article , Richard Wall wrote:
snip
BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).
That's quite interesting - some results I'd expect (speakers, tape
decks, encode-decode, cables) but some are pretty wild - a 450w ss amp
and a 50w valve amp sound the same for example.
On the power difference, with "reasonable" loudspeakers you may be
using no more than 5 Watts anyway, so unless things get loud that may
not matter. As for the technology difference, I suspect the lack of
perceived difference says they're both good amplifiers.
Linked to this is another thing I don't understand - do people know
what's going on in these experiments ('this is an abx test of cables'
for example).
I think you don't need to (probably can't) keep that information from
the participants.
Just curious!
Just curious too since I don't do comparative tests myself except
sighted tests for once-in-a-blue-moon personal purchases.
I remain puzzled about the postulated audible superiority of teflon
dielectrics (and occasionally paper-oil - for the right type or oil
I assume).
1. The published curves for dielectric absorption etc. show what I
think of as trifling differences compared to polypropylene,
polyethylene and some other dielectrics. There are several
dielectrics which should be audibly good enough to be
indistinguishable.
2. DA and the other usual capacitor defects are primarily linear effects,
regardless of what people imply when they talk about smearing of
pulses. They result in ripples in frequency response (which are
mathematically equivalent to the "pulse smearing"). These ripples
can be made so small by good design of the circuits surrounding the
capacitor that it should be a non-issue. Loudspeakers have ripples
in frequency response that are orders of magnitude worse.
3. No-one that I can find right now has measured significant
non-linearity (except for some ceramics), i.e. above the level
produced by an amplifier as a whole. Perhaps someone can point me
in the right direction for material?
4. If there is an audible difference then I cannot imagine why (yet)
and it would be interesting to think of some good hypotheses for
the difference.
--
John Phillips
January 7th 05, 07:56 AM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
Stewart Pinkerton wrote:
On Thu, 06 Jan 2005 20:27:55 +0000, Rob
wrote:
John Phillips wrote:
In article , Richard Wall wrote:
snip
BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).
That's quite interesting - some results I'd expect (speakers, tape
decks, encode-decode, cables) but some are pretty wild - a 450w ss amp
and a 50w valve amp sound the same for example.
If it's a *good* valve amp, this is quite reasonable. These tests are
of course conducted below the clipping point of the least powerful
amp, so the sheer power difference doesn't count. No one ever
suggested that you can't hear an amp clipping! :-)
Just sounded wild to me - it's not an expensive valve amp, in fact Keith
G has some that I've heard. Another one to add to the 'good valve amp'
list then. In fact, although it doesn't give me unbridled pleasure to
say this, the Beard power amp I have sounds pretty similar to one SS
amp I have (a huge Hitachi power amp). But this has to be pinned down to
speakers - Dynaudio Contours. The bass is tangibly different across a
range of amplifiers.
I've had a quick look at a journal database and I can't see much on DBT
as a valid scientific method (I don't know about the journals on the abx
page - they don't look to be the peer reviewed kind). Its theoretical
basis seems psychological. Is there any methodological and empirical
analysis of this process?
Lots, both in psy journals and medical journals.
Yes, I've found a few that use the process but none that *critique the
process*. There is, from what I've seen, no qualitative analysis of
respondent, environment, time etc. I'll have a look when I get time.
Methodological analysis is a very particular kind. I'd have thought
there is a social aspect to this - hence my curiosity about this aspect.
I spoke to my 'empiricist' friend last night and we did have a fairly
fruitful discussion about this ...
I've always thought it's not what people think, but why they think it,
is the most interesting thing. I'm not (quite!) enough of a pedant to
ponder on 'the same' results, but some qualitative analysis of
'different' would be interesting - does anyone know of DBTs that
includes this (not just audio)?
Try any medical journal. DBTs have been standard in drug trials for
many decades.
Linked to this is another thing I don't understand - do people know
what's going on in these experiments ('this is an abx test of cables'
for example).
Yes, people know what is being compared, they just don't know what is
actually connected at any given time. In an ABX trial, you can always
select A or B as known devices, only the identity of 'X' (A or B) is
unknown.
.... an issue, if I may, is that a number of components have 'social
loading'. Anybody, no matter their qualifications, is instantly shot
down on this ng if they mention the value of certain components -
capacitors, valves, cables for example. To participate in a test of these
variables instantly injects a certain amount of fear in some people and
they'd be less likely to identify change. I'd suggest.
Selection of 'X' also seems to be an important factor.
Rob
January 7th 05, 08:04 AM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
John Phillips wrote:
In article , Rob wrote:
John Phillips wrote:
In article , Richard Wall wrote:
snip
BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).
That's quite interesting - some results I'd expect (speakers, tape
decks, encode-decode, cables) but some are pretty wild - a 450w ss amp
and a 50w valve amp sound the same for example.
On the power difference, with "reasonable" loudspeakers you may be
using no more than 5 Watts anyway, so unless things get loud that may
not matter. As for the technology difference, I suspect the lack of
perceived difference says they're both good amplifiers.
Linked to this is another thing I don't understand - do people know
what's going on in these experiments ('this is an abx test of cables'
for example).
I think you don't need to (probably can't) keep that information from
the participants.
Just curious!
Just curious too since I don't do comparitive tests myself except
sighted tests for once-in-a-blue-moon personal purchases.
I remain puzzled about the postulated audible superiority of teflon
dielectrics (and occasionally paper-oil - for the right type or oil
I assume).
1. The published curves for dielectric absorption etc. show what I
think of as trifling differences compared to polypropylene,
polyethylene and some other dielectrics. There are several
dielectrics which should be audibly good enough to be
indistinguishable.
2. DA and the other usual capacitor defects are primarily linear effects,
regardless of what people imply when they talk about smearing of
pulses. They result in ripples in frequency response (which are
mathematically equivalent to the "pulse smearing"). These ripples
can be made so small by good design of the circuits surrounding the
capacitor that it should be a non-issue. Loudspeakers have ripples
in frequency response that are orders of magnitude worse.
3. No-one that I can find right now has measured significant
non-linearity (except for some ceramics), i.e. above the level
produced by an amplifier as a whole. Perhaps someone can point me
in the right direction for material?
4. If there is an audible difference then I cannot imagine why (yet)
and it would be interesting to think of some good hypotheses for
the difference.
I find it odd too. There seems to be so much 'rumour' evidence, yet
very little ABX or 'physical' evidence to support this type of change.
I think the traffic generated by those wishing to try an ABX on
capacitors explains in part why people don't bother. But I also have
some doubts about ABX as a method for those that use it rigorously.
Perhaps there's more to listening than hearing difference - I don't know!
Rob
January 7th 05, 08:19 AM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
On 07 Jan 2005 07:31:51 GMT, John Phillips
wrote:
I remain puzzled about the postulated audible superiority of teflon
dielectrics (and occasionally paper-oil - for the right type or oil
I assume).
1. The published curves for dielectric absorption etc. show what I
think of as trifling differences compared to polypropylene,
polyethylene and some other dielectrics. There are several
dielectrics which should be audibly good enough to be
indistinguishable.
Do bear in mind that the tan d loss is in series with whatever circuit
impedances are present. A tan d of the odd ohm or so is massively
swamped by the many kilohms of the average circuit coupling situation.
So in most circumstances of capacitor use, tan d is simply not a
relevant issue.
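A back-of-envelope sketch of that swamping (the component values here are assumptions for illustration: roughly 1 ohm of effective series loss against a 10 kilohm coupling impedance):

```python
import math

R_LOAD = 10_000.0   # circuit coupling impedance, ohms (assumed)
R_LOSS = 1.0        # effective series loss resistance from tan d, ohms (assumed)

# The loss resistance forms a voltage divider with the circuit impedance,
# so the signal level change it causes is:
attenuation = R_LOAD / (R_LOAD + R_LOSS)
level_change_db = 20 * math.log10(attenuation)

# Less than a thousandth of a dB - far below any plausible audibility threshold.
print(f"level change: {level_change_db:.5f} dB")
```

With numbers like these, the loss term really is swamped, as the post says.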
2. DA and the other usual capacitor defects are primarily linear effects,
regardless of what people imply when they talk about smearing of
pulses. They result in ripples in frequency response (which are
mathematically equivalent to the "pulse smearing"). These ripples
can be made so small by good design of the circuits surrounding the
capacitor that it should be a non-issue. Loudspeakers have ripples
in frequency response that are orders of magnitude worse.
Pulse smearing or dispersion is nothing to do with the quality of a
capacitor, but its value - or more specifically the type of filter in
which it is used. A Bessel filter and a Chebyshev filter might both
use the identical capacitor. The Bessel will not smear, the Chebyshev
will. This effect is found particularly in loudspeaker crossovers,
where filters are typically operating in the centre of the audio band.
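The filter-type point can be sketched numerically (the specific choices here are assumptions for illustration: 4th-order low-pass sections at a 3 kHz crossover, 48 kHz sample rate), comparing how much the group delay of each filter varies across its passband:

```python
import numpy as np
from scipy import signal

FS = 48_000   # sample rate, Hz (assumed)
FC = 3_000    # crossover frequency, Hz (assumed)

# Two 4th-order low-pass filters with the same cutoff frequency.
bessel = signal.bessel(4, FC, btype='low', fs=FS)
cheby = signal.cheby1(4, 1, FC, btype='low', fs=FS)  # 1 dB passband ripple

w, gd_bessel = signal.group_delay(bessel, fs=FS)
_, gd_cheby = signal.group_delay(cheby, fs=FS)

# Spread of group delay (in samples) within the passband: a large spread
# means different frequencies are delayed by different amounts, i.e. the
# pulse shape is "smeared".
band = w < FC
spread_bessel = np.ptp(gd_bessel[band])
spread_cheby = np.ptp(gd_cheby[band])

print(f"Bessel delay spread:    {spread_bessel:.2f} samples")
print(f"Chebyshev delay spread: {spread_cheby:.2f} samples")
```

The Bessel design trades steepness of cutoff for near-constant group delay, while the Chebyshev's sharper cutoff comes with much larger delay variation near the band edge, which is exactly the dispersion described above.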
3. No-one that I can find right now has measured significant
non-linearity (except for some ceramics), i.e. above the level
produced by an amplifier as a whole. Perhaps someone can point me
in the right direction for material?
You need to design a circuit specifically to highlight those
non-linearities. They certainly don't appear in normal audio use of a
capacitor.
4. If there is an audible difference then I cannot imagine why (yet)
and it would be interesting to think of some good hypotheses for
the difference.
I have yet to see anything other than anecdotal evidence that there is
any audible difference. I think that hypotheses as to cause should
wait for a demonstration of an actual difference. What that bloke down
the pub said isn't necessarily gospel.
d
Pearce Consulting
http://www.pearce.uk.com
January 7th 05, 11:53 AM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
"Rob" wrote in message
John Phillips wrote:
In article , Richard Wall wrote:
snip
BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).
That's quite interesting - some results I'd expect (speakers, tape
decks, encode-decode, cables) but some are pretty wild - a 450w ss amp
and a 50w valve amp sound the same for example.
I was there when it happened. It's a matter of picking the amps and the
speakers. The key to the test was picking amps and speakers that resulted in
relatively flat frequency response. This can take a little planning if one
of the amps is tubed.
I've had a quick look at a journal database and I can't see much on
DBT as a valid scientific method
Rob, your research is too superficial. The validity of DBT is not going to
be proven anew every day, given that it is so widely accepted. You
won't find too many new papers about litmus paper, either. Do you even know
what litmus paper is?
(I don't know about the journals on
the abx page - they don't look to be the peer reviewed kind).
They are a mix, and it's a testimony to your superficiality that you
can't figure out which is which.
Its
theoretical basis seems psychological. Is there any methodological
and empirical analysis of this process?
Of course Rob, and you have a lot of it before you. However, there is no
Cliff's Notes or Classics Comics version. I think that pretty much leaves
you out in the cold.
I've always thought it's not what people think, but why they think it,
is the most interesting thing.
I'm still looking for evidence that you know how to think, given the mess
you've dropped on this forum.
I'm not (quite!) enough of a pedant to
ponder on 'the same' results, but some qualitative analysis of
'different' would be interesting - does anyone know of DBTs that
includes this (not just audio)?
Check the medical literature.
Linked to this is another thing I don't understand - do people know
what's going on in these experiments ('this is an abx test of cables'
for example).
Yes, as a rule. Not telling them doesn't seem to help the sensitivity of the
experiment.
January 7th 05, 12:43 PM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
Arny Krueger wrote:
"Rob" wrote in message
John Phillips wrote:
In article , Richard Wall wrote:
snip
BTW a set of ABX tests and results (including positive and negative)
are to be found at http://www.pcavtech.com/abx/abx_data.htm (down the
page, including a capacitor test).
That's quite interesting - some results I'd expect (speakers, tape
decks, encode-decode, cables) but some are pretty wild - a 450w ss amp
and a 50w valve amp sound the same for example.
I ws there when it happened. It's a matter of picking the amps and the
speakers. The key to the test was picking amps and speakers that resulted in
relatively flat frequency response. This can take a little planning if one
of the amps is tubed.
It must have been a task, but it's good to know thought went into
making the process run on an equal footing.
I've had a quick look at a journal database and I can't see much on
DBT as a valid scientific method
Rob, your research is too superficial.
I know, thanks.
The validity of DBT is not going to
be proven anew every day, given that it is so widely accepted.
I've found it to be widely used, and I've downloaded 6 or so medical
papers. But none actually question DBT as a method. Clearly they're out
there, and I didn't mean to make this a large job for you, or anyone -
just wondering if anyone had the info to hand.
You
won't find too many new papers about litmus paper, either.
That's fine, thanks. I think the two procedures are rather different -
one involves people, the other involves identifying acids. I can see
where your mind is going though.
Do you even know
what litmus paper is?
Yes, but it's been a while since I used it!
(I don't know about the journals on
the abx page - they don't look to be the peer reviewed kind).
They are a mix, and it's a testimony to your superficiality that you
can't figure out which is which.
I thought that as well! A bit of googling soon sorted out the peer
review question, thanks. I've since found quite a lot on this subject.
Its
theoretical basis seems psychological. Is there any methodological
and empirical analysis of this process?
Of course Rob, and you have a lot of it before you. However, there is no
Cliff's Notes or Classics Comics version. I think that pretty much leaves
you out in the cold.
You're absolutely right - I have little idea of the methodological basis
of DBT. It strikes me as a highly rational, positivist approach. But I
find the implementation nuanced - hence my confusion.
I've always thought it's not what people think, but why they think it,
is the most interesting thing.
I'm still looking for evidence that you know how to think,
:-)
given the mess
you've dropped on this forum.
I think we could both clean up a little!
I'm not (quite!) enough of a pedant to
ponder on 'the same' results, but some qualitative analysis of
'different' would be interesting - does anyone know of DBTs that
includes this (not just audio)?
Check the medical literature.
I'm not sure if you understand my question - it's not the application of
DBT but the reasoned adoption of it as a method in the light of
alternative approaches. But in essence you may be correct either way -
any scientific paper doesn't just use a method without qualification.
Well they do, but I don't think they should. And this brings us back to
the 'litmus paper' analogy that you chose. I find it hard to believe
that the field of DBT is that clear cut, although I defer to your
knowledge on the subject for the time being.
Linked to this is another thing I don't understand - do people know
what's going on in these experiments ('this is an abx test of cables'
for example).
Yes, as a rule. Not telling them doesn't seem to help the sensitivity of the
experiment.
I've commented on this in another part of this thread. Again, if you say
so Arnie - more of my mess?!
While I'm on - the ABX Double Blind Comparator Data doesn't seem to link
from your home page.
b/w
Rob
January 7th 05, 04:01 PM
posted to uk.rec.audio
DBT a flawed method for evaluating Hi-Fi ?
In article , Richard Wall
wrote:
I would have thought that most manufacturers would work to make their
systems sound different
Some may wish a specific 'sound'. Others may not. Although it depends what
you mean. It also depends if you are talking about amps, speakers, or other
types of equipment.
and the vast array of suppliers would surely not exist if they all
sounded the same ?
Depends. There are also a variety of required conditions of use, and
prices people are willing to pay.
It should easily be possible to take three similar amplifiers, modify
their frequency response by just more than the threshold of audibility,
and then get conclusive proof of the effectiveness of the testing method.
Has it been done?
I'm not clear from what you write above what hypothesis you are wishing to
test.
It may be easy enough to make all kit sound the same,
See above. Speakers and listening rooms can vary quite a lot. So can the
requirements of listeners. My experience is that compared with this the
differences between many amplifiers are quite small (or effectively
undetectable in use).
but even if they did all make perfect reproduction equipment, it is
unlikely that everyone would prefer it.
Only if they all have the same room acoustics, same requirements, etc.
I like quad ESLs. But that does not mean I expect them to suit everyone.
This isn't just a matter of what features of the resulting sound matter to
me. The room also matters, as does the taste in music.
I think all the systems belonging to members in our club sound quite
distinct, it would however be impossible to test this opinion. You
have also missed my point that although the testing method is proven the
results on Hi-Fi are not.
Afraid I am not clear what you mean by either of the above statements.
Hearing is not an absolute and will change over the listening
period.
Agreed.
Trying to listen for differences is not the same as listening
to appreciate music.
Agreed.
Unless you have data to prove otherwise, the sample sets used to evaluate
Hi-Fi are small, so with false positives, false negatives and perception
changes is it any wonder that the statistics say that most things cannot
be differentiated?
The method of a controlled ABX type of test is to allow the listener to
decide for themselves when they think they can identify which choice is 'X'.
And to do this in a manner that aids us in drawing statistical results from
repeated sets of such tests. The idea is that if A and B can't really be
told apart under the conditions of the test, then the resulting 'choices'
will fail to correlate with the real identity to a level that shows
statistical significance.
However such tests only can provide results that are directly relevant for
the conditions under which the tests were performed. And our ability to use
the results for other purposes will depend on how reliably they fit a given
situation. The results can also only be regarded as useful if the tests are
performed often enough in well-enough controlled circumstances to give the
resulting data statistical significance to the required level.
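To give a feel for the numbers involved, here is a sketch using the standard binomial model of guessing (the 5% significance level is the usual convention; the trial counts are just examples), showing the score a listener needs before a run means anything:

```python
from math import comb

def min_correct(trials, alpha=0.05):
    """Smallest number of correct answers in `trials` ABX trials whose
    probability under pure guessing (p = 0.5) falls below `alpha`."""
    for k in range(trials + 1):
        # Probability of getting k or more correct purely by chance.
        tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
        if tail <= alpha:
            return k
    return None

for n in (10, 16, 20):
    print(f"{n} trials: need {min_correct(n)} correct")
```

With 10 trials a listener needs 9 correct, with 16 trials 12, and with 20 trials 15, which is why short, casual runs tell you very little either way.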
The problem is this is quite a demanding and time consuming process. As you
say, it ends up being quite different to sitting down and enjoying some
music. However in my personal view, if differences are so slight as to
require extended tests like this to show up, then perhaps they don't really
matter much in practice. :-)
FWIW although I have often heard differences (or at least *thought* I
have!) between various items, I often then am not bothered if they are
small compared with choice of speakers, position of speakers, etc. Thus to
a large extent worrying about something like choice of caps tends to end up
seeming to me like a waste of effort once the effect is swamped by a slight
movement of the head. I'd rather spend the time enjoying music on my
imperfect audio systems. :-)
Slainte,
Jim
--
Electronics http://www.st-and.ac.uk/~www_pa/Scot...o/electron.htm
Audio Misc http://www.st-and.demon.co.uk/AudioMisc/index.html
Armstrong Audio http://www.st-and.demon.co.uk/Audio/armstrong.html
Barbirolli Soc. http://www.st-and.demon.co.uk/JBSoc/JBSoc.html