Older separates vs new system
On Sun, 19 Sep 2004 07:45:08 +0000 (UTC), "Alan Murphy"
wrote: "Stewart Pinkerton" wrote in message .. . On Fri, 17 Sep 2004 07:53:59 +0000 (UTC), "Alan Murphy" wrote: I'll bet you an even £1000 that on my system, playing my music, I can tell the difference between a Technics CD player SL-PG490 alone, and the same player with a Meridian DAC 203 optically connected, in more than 67% instances." ~~~~~~~~~~~ That's simple enough isn't it? What is it that you don't understand? What is confusing you? What I don't understand is why you're so terrified of simply setting the volumes to the same level. Scared to admit that you wasted money on the DAC? Your crude and patronising approach to subtle audio differences, offering a paltry sum of money to take part in an A/B test that masks all but gross differences, is doing the industry a disservice and is as distasteful as snake oil, IMHO. My approach is certainly not crude, it's the same method used by the Subjective Evaluation Group at Harman International (brand names include Mark Levinson, Madrigal and JBL, among others). ABX testing is used because it *reveals* the most subtle differences, which are often concealed by other methods, sighted listening being demonstrably useless for discriminating subtle differences. That's why Harman (and B&W, KEF etc etc) use it every day in product development. Its other great strength is that it demonstrates the nonexistence of *imagined* differences, such as those between cables, so that the development engineer can check if that 'audiophile approved' component really does make an audible difference, or if he just wanted it to. What is 'patronising' is your presumption (with no actual proof) that you really can hear a difference in sound *quality* between your player and your DAC. Since you think that £1,000 is a paltry sum (even though *you* suggested it), let's make it an even £10,000 wager. You've read the small print and are in complete agreement? What 'small print'? Levels are equalised to +/- 0.1dB, the test protocol is double-blind, and you need to score more than 20 correct out of 30 trials. That's it. Unless of course you're thinking of something fundamentally underhand, like turning the volume *way* up and just listening to the noise floors. That would of course be cynical and pointless, and no cigar. Another clown tried that game once, with amplifiers. If you think you can tell a difference in sound quality on music, then that's just fine, you can choose any partnering gear and any music you like, as you stated in your original challemge at the top of this post. All I ask is an *honest* comparison, as always. Let me know when you've remortgaged the house and I can arrange escrow. Cash available whenever you need the only glimpse of it you'll ever get, and when you've ponied up your end. Care to suggest a neutral test proctor? -- Stewart Pinkerton | Music is Art - Audio is Engineering |
Older separates vs new system
On Sun, 19 Sep 2004 11:34:38 +0000 (UTC), Stewart Pinkerton
wrote: [big snip of the wager exchange, quoted in full above] What 'small print'? Levels are equalised to +/- 0.1dB, the test protocol is double-blind, and you need to score more than 20 correct out of 30 trials. That's it.

For such a small statistical population, 20 out of 30 is extraordinarily generous, especially considering that the subject would be claiming something very close to 100%, particularly if they get to choose the listening material. Although in strict chi-squared terms it is a good score, Sod's law dictates that it can happen quite easily by chance. Having achieved the 20, it would be necessary to do it again to show it wasn't a fluke. d Pearce Consulting http://www.pearce.uk.com
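To put Don's point about chance in numbers: under pure guessing each trial is a coin flip, so the odds of clearing the 20-of-30 bar follow directly from the binomial distribution. A minimal sketch in Python, assuming nothing beyond the criterion Stewart states above:

```python
# Chance of passing the 20-of-30 criterion by guessing alone
# (p = 0.5 per trial), and of doing it twice running as Don suggests.
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_pass = p_at_least(20, 30)
print(f"P(>=20/30 by chance)  = {p_pass:.4f}")     # ~0.0494, about 1 in 20
print(f"P(two passing runs)   = {p_pass**2:.6f}")  # ~0.0024, about 1 in 400
```

About one guesser in twenty would pass a single 30-trial run, which is presumably why Don wants the score repeated: the chance of fluking two consecutive runs drops to roughly 1 in 400.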
Older separates vs new system
Don't they still sell the ATC kits?? Yes they do, and they're much better value than anything from the usual brand names etc., IMO. Actually very simple to build. For a paint finish, take them to a car body spray shop to be filled, sanded and painted; candy lacquers add amazing depth to paint jobs. Veneering can be tricky to do yourself without a vacuum press, but a local cabinet maker would do it for you for a small fee. Stew.
Older separates vs new system
In article , Stewart Pinkerton
wrote: ... level-matched time-proximate ABX (and ABChr) testing has proven over many decades to be the *most* sensitive test for audible differences in sound quality.

But typical ABX tests are often not as sensitive as they are thought to be. ABX is an elegant scheme for data collection, but data collection is only part of an experiment. It is the entire experiment's sensitivity that matters. Recall that a test is *sensitive* for a difference if the test is likely to detect that difference when the difference is present; a test is *specific* (i.e., selective) if it is unlikely to report a false positive result (when the difference is not present).

Here is an example: If someone does not ever detect a difference, he will still get correct answers on 50% of trials in the long run just by random guessing. If someone always detects a difference, he will of course be able to score 100% correct. Often people take as a threshold the size of difference where someone would get 75% correct answers in an infinite sequence of repeated trials.

Consider a difference large enough that a certain listener would get the correct answer for 90% of all trials (well above the 75% threshold). If we did an ABX test with just one trial for that subject, the sensitivity would be .90 but the chance of a false positive would be .50 --- way too high. So, we do a test with 16 trials with a passing score if the subject gets at least 14 correct responses. Now the type 1 error risk is small (< .01) but the sensitivity is only .79. In other words, we have made a more specific test, but it is _less_ sensitive than a single-trial test! A difference has to be so large that subjects get correct answers on about 95% of individual trials before a 14-of-16 test is as sensitive as a single-trial test. (A 12-of-16 test is less specific, but far more sensitive.)
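John's figures can be checked directly; the sketch below (the 90%-per-trial listener and the 14-of-16 and 12-of-16 pass marks are all taken from his post) simply evaluates the binomial tail under guessing (p = 0.5) and under detection (p = 0.9):

```python
# Reproducing John Corbett's sensitivity/specificity numbers.
from math import comb

def p_at_least(k, n, p):
    """Probability of k or more correct in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for passmark in (14, 12):
    alpha = p_at_least(passmark, 16, 0.5)  # false-positive risk under pure guessing
    power = p_at_least(passmark, 16, 0.9)  # sensitivity for the 90%-per-trial listener
    print(f"{passmark}-of-16: type 1 risk = {alpha:.3f}, sensitivity = {power:.2f}")

# Output:
# 14-of-16: type 1 risk = 0.002, sensitivity = 0.79
# 12-of-16: type 1 risk = 0.038, sensitivity = 0.98
```

The numbers show the trade-off he describes: raising the pass mark from 12 to 14 cuts the false-positive risk from about .038 to about .002, but the chance that a genuinely 90%-reliable listener passes falls from .98 to .79.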
Older separates vs new system
"Jim Lesurf" wrote in message
... In article , Alan Murphy wrote: "Don Pearce" wrote in message ... On Thu, 16 Sep 2004 08:48:58 +0000 (UTC), "Alan Murphy" wrote: Because the DAC is much quieter and I can identify it every time :-) What do you mean by quieter - less background noise or less volume?

Both, but I was just trying to make a point really, Don, about the difficulty of establishing proper procedures when testing sensory discrimination. In the visual field, with which I am familiar, very slight alterations in test procedure, such as separating contiguous samples by a few mm or so, can decrease discrimination of colour difference by an order of magnitude.

I can see the above as being a justification for equalising the output levels in order to remove a controllable variable - assuming that the concern is to see if differences of some other kind can be perceived. Assuming that your belief is that you can hear differences which do *not* come from sound level inequalities alone. Please see below as I'd like to clarify your point of view on this...

Presenting the samples, in series, in A/B fashion, further greatly decreases discrimination depending on the time interval between viewings.

The above seems to me to be a generalised assertion - but I am not sure that it can be shown to always be reliable. I'd presume it would depend upon the circumstances and the manner of any 'difference' which the subjects are being tested for perception of.

The differences are still there of course but are masked by the method of testing.

To say "masked" here seems to me to be presupposing that any difference was both due to some effect *other* than the level inequality and was then indeed 'masked' rather than being 'removed' by equalising the levels. Again, I'd like to clarify this below...

Resort to instrumentation is not helpful in judging differences below about 5 - 10 jnds, depending on position in colour space, due to the acuity of the visual system. I suspect the same holds true for auditory differences.

My interest here is not in what you "suspect", but in trying to clarify your views and their implications. To do this I'd like to post a series of questions and I'd be interested in your answers. Apologies if what follows is not clearly expressed...

Firstly, am I correct in understanding that you are saying that you would be willing to do a comparison test if there were no attempt to correct for measurable level differences, but would not be willing to take such a test if the levels were equalised beforehand?

If the answer to the above is essentially "Yes", the next question is: Is this because you are confident you can hear a difference when the levels are different, but not when they have been equalised?

If the answer to the above is essentially "Yes", the next question is: Is it your belief that the anticipated inability to perceive a difference is somehow "masked" (using your term) by level equalisation?

If the answer to the above is essentially "Yes", the next question is: What test do you suggest that can be carried out to distinguish between the hypothesis that the failure to discriminate when the levels are equalised is due to "masking", rather than the alternative hypothesis that the only audible distinction was simply due to a difference in level? i.e. that the difference perceived was solely due to a level difference.
If you cannot suggest a performable test that could falsify one hypothesis and support the other, how can you regard the hypothesis that the level equalisation "masks" rather than simply "removes" the difference as being scientifically or academically supportable?

So far as I am concerned, the above questions assume you could arrange the protocol to choose lengths and patterns of sampling "A"/"B"/"X" as you would feel most reasonable. So - for example - if you'd personally prefer each "A", etc., to be half an hour, or twenty seconds, this can be assumed to be as you prefer.

As a separate point, I would also be curious about your view on how significant a difference may be if it becomes undetectable when the levels are equalised and each player is giving the desired loudness which you might prefer (without knowing which player was in use). As someone who used to design equipment I can appreciate there are times when even tiny effects may be worth pursuing. However, when in normal use, given all the other uncontrolled variables of domestic listening, it does seem to me that many of the 'differences' people debate may be of doubtful concern when simply listening to and enjoying music. Slainte, Jim

I think it might be clearer, Jim, if I just outline my views on the subject and hope that you find this acceptable. I agree that, for the results to be meaningful in AB testing, levels should be equalised, and regret that my devious attempts to wind up Stewart were misinterpreted. I do feel however that AB testing is possibly not a suitable test for revealing differences close to 1 jnd, and is accurate to perhaps 5 jnd. My reasoning is that this is analogous to the case in the visual field where it is possible to view samples both simultaneously and serially, but where discrimination is greatly reduced in the case of serial viewing. As you probably know, the specification of small colour differences is of significant commercial importance, and international standards and an extensive body of research, both published and unpublished, exist. I published an early computer study on equal visual spacing as long ago as 1977 ("A two-dimensional colour diagram based on the sensitivity functions of cone vision", JOCCA, 1977, 60, 307-310).

As for a test to determine whether AB testing is sufficiently sensitive to distinguish small audio differences, I would propose the following: It should be possible to determine minimum audible differences of 1 jnd over a discrete range of frequencies on a test setup, say from 1000 to 15000 Hz at 2000 Hz intervals. A set of digital AB samples to Red Book CD standard at normal listening levels would then be prepared, one of which would be a pure tone at each of the frequencies and the other would be the same tone corrupted at alternate values by positive then negative random increments of digital noise varying from 0 to 5 jnd. My prediction is that an AB test on these samples would not be able to distinguish differences of less than 5 jnd. Over to you.

BTW Europe won the Ryder Cup by a record margin but I am sure that in St Andrews you already knew that :-) Alan.
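For what it's worth, Alan's proposed sample set is straightforward to sketch in code. The version below is only one reading of his description: it takes the "positive then negative random increments" to be level offsets applied to the tone, and assumes a mapping of roughly 1 dB per level jnd; both the interpretation and the mapping are assumptions, not part of his post.

```python
# A sketch of Alan's proposed AB sample set: pure tones 1-15 kHz in
# 2 kHz steps, each paired with a version differing by a random 0-5 jnd,
# quantised to Red Book (44.1 kHz, 16-bit) format.
import numpy as np

FS, DUR = 44100, 2.0   # Red Book sample rate; 2 s per test sample (assumed)
DB_PER_JND = 1.0       # assumed mapping: roughly 1 dB per level jnd

def tone(freq, amp=0.5, offset_db=0.0):
    """Pure sine at freq, optionally shifted in level by offset_db."""
    t = np.arange(int(FS * DUR)) / FS
    return amp * 10 ** (offset_db / 20) * np.sin(2 * np.pi * freq * t)

def to_redbook(x):
    """Quantise to 16-bit PCM, clipping at digital full scale."""
    return np.clip(np.round(x * 32767), -32768, 32767).astype(np.int16)

rng = np.random.default_rng()
pairs = []
for i, freq in enumerate(range(1000, 15001, 2000)):  # 1 kHz to 15 kHz
    jnd = rng.uniform(0.0, 5.0)                      # random size of difference
    sign = 1 if i % 2 == 0 else -1                   # "positive then negative"
    a = to_redbook(tone(freq))                       # uncorrupted reference
    b = to_redbook(tone(freq, offset_db=sign * jnd * DB_PER_JND))
    pairs.append((freq, jnd, a, b))
```

Writing each int16 pair out as 44.1 kHz WAV files would complete the Red Book requirement; the listening test would then present each (a, b) pair under the chosen A/B protocol.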
Older separates vs new system
I took my dog out for a walk.
While it was ****ing on The Milkman's leg, he seemed distracted by: "Triffid" wrote: While it was ****ing on Smeghead's leg, he seemed distracted by:

If you have hands, the ability to use them and a spare moment, you could build a kit loudspeaker from the likes of Wilmslow or Falcon, IPL etc. and have much better bang for the buck. Once upon a time, that was true. Don't they still sell the ATC kits??

They do, but I wouldn't trust my woodworking skills to make a decent bird-table, let alone a cabinet worthy of something of that quality. -- Despite appearances, it is still legal to put sugar on cornflakes.
Older separates vs new system
In article , Alan Murphy
wrote: "Jim Lesurf" wrote in message [big snip to avoid repeating my sequence of questions] I think it might be clearer, Jim, if I just outline my views on the subject and hope that you find this acceptable. I'm afraid that your mode of response does not actually seem 'clearer' to me as you have not dealt with the main issue I was specifically asking you about. (Please see below.) I agree that, for the results to be meaningful in AB testing, levels should be equalised and regret that my devious attempts to wind up Stewart were misinterpreted. OK. I do feel however that AB testing is possibly not a suitable test for revealing differences close to 1 jnd and is accurate to perhaps 5 jnd. [snip] I appreciate that you may "feel" something. I also appreciate that you might be correct. However, since you seem to be arguing on the basis of taking an academic approach founded upon applying the scientific method, my questions were to invite you to apply this to your own statements. As for a test to determine whether AB testing is sufficiently sensitive to distinguish small audio differences I would propose the following: It should be possible to determine minimum audible differences of 1 jnd over a discrete range of frequencies on a test setup, say from 1000 to 15000Hz at 2000 Hz intervals. A set of digital AB samples to Red Book CD standard at normal listening levels would then be prepared, one of which would be a pure tone at each of the frequencies and the other would be the same tone corrupted at alternate values by positive then negative random increments of digital noise varying from 0 to 5 jnd. My prediction is that an AB test on these samples would not be able to distinguish differences of less than 5 jnd. Over to you. My series of questions was partly to establish if I had understood you correctly. Partly to establish what test you had in mind that could be carried out and whose results could distinguish between your hypothesis that the failure was due to 'masking' and the alternative hypothesis that the failure was due to 'removal' of the actual differences. Unfortunately, your reply does not deal with this point. I carefully arranged my series of questions so that all but the last could be answered fairly quickly and simply with a 'yes' or a 'no'. I would have preferred this as it seems clearer to me that your restatement. However the key question was the last one (restated above) so I'd like to know your answer to this. Or do you accept that when you argue that the failure is due to 'masking' this is no more than a personal belief? The outline you give from visual experiments is an analogy. This may or may not be an appropriate analogy. To test this we would require a response to the question which you did not deal with. So. Can you now say what practical test/experiment you can suggest that would be useful to test your hypothesis that the failure is due to 'masking' rather than 'removal' of the audible difference? BTW Europe won the Ryder Cup by a record margin but I am sure that in St Andrews you already knew that :-) Heard about it on the radio. :-) However I tend to leave golf to the tourists. ;- Slainte, Jim -- Electronics http://www.st-and.ac.uk/~www_pa/Scot...o/electron.htm Audio Misc http://www.st-and.demon.co.uk/AudioMisc/index.html Armstrong Audio http://www.st-and.demon.co.uk/Audio/armstrong.html Barbirolli Soc. http://www.st-and.demon.co.uk/JBSoc/JBSoc.html |
Older separates vs new system
"John Corbett" wrote in message ... In article , Stewart Pinkerton wrote: ... level-matched time-proximate ABX (and ABChr) testing has proven over many decades to be the *most* sensitive test for audible differences in sound quality. But typical ABX tests are often not as sensitive as they are thought to be. ABX is an elegant scheme for data collection, but data collection is only part of an experiment. It is the entire experiment's sensitivity that matters. Recall that a test is *sensitive* for a difference if the test is likely to detect that difference when the difference is present; a test is *specific* (i.e., selective) if it is unlikely to report a false positive result (when the difference is not present). Here is an example: If someone does not ever detect a difference, he will still get correct answers on 50% of trials in the long run just by random guessing. If someone always detects a difference, he will of course be able to score 100% correct. Often people take as a threshold the size of difference where someone would get 75% correct answers in an infinite sequence or repeated trials. Consider a difference large enough that a certain listener would get the correct answer for 90% of all trials (well above the 75% threshold). If we did an ABX test with just one trial for that subject, the sensitivity would be .90 but the chance of a false positive would be .50---way too high. So, we do a test with 16 trials with a passing score if the subject gets at least 14 correct responses. Now the type 1 error risk is small ( .01) but the sensitivity is only .79. In other words, we have made a more specific test, but it is _less_ sensitive that a single-trial test! A difference has to so large that subjects get correct answers on about 95% of individual trials before a 14-of-16 test is as sensitive as a single-trial test. (A 12-of-16 test is less specific, but far more sensitive.) Didn't understand a word of that myself, but I await the response to it with eager anticipation! :-) |
Older separates vs new system
On Thu, 23 Sep 2004 14:14:11 +0100, "Keith G"
wrote: "John Corbett" wrote in message ... In article , Stewart Pinkerton wrote: ... level-matched time-proximate ABX (and ABChr) testing has proven over many decades to be the *most* sensitive test for audible differences in sound quality. But typical ABX tests are often not as sensitive as they are thought to be. ABX is an elegant scheme for data collection, but data collection is only part of an experiment. It is the entire experiment's sensitivity that matters. Recall that a test is *sensitive* for a difference if the test is likely to detect that difference when the difference is present; a test is *specific* (i.e., selective) if it is unlikely to report a false positive result (when the difference is not present). Here is an example: If someone does not ever detect a difference, he will still get correct answers on 50% of trials in the long run just by random guessing. If someone always detects a difference, he will of course be able to score 100% correct. Often people take as a threshold the size of difference where someone would get 75% correct answers in an infinite sequence or repeated trials. Consider a difference large enough that a certain listener would get the correct answer for 90% of all trials (well above the 75% threshold). If we did an ABX test with just one trial for that subject, the sensitivity would be .90 but the chance of a false positive would be .50---way too high. So, we do a test with 16 trials with a passing score if the subject gets at least 14 correct responses. Now the type 1 error risk is small ( .01) but the sensitivity is only .79. In other words, we have made a more specific test, but it is _less_ sensitive that a single-trial test! A difference has to so large that subjects get correct answers on about 95% of individual trials before a 14-of-16 test is as sensitive as a single-trial test. (A 12-of-16 test is less specific, but far more sensitive.) Didn't understand a word of that myself, but I await the response to it with eager anticipation! He's quite right statistically, but of course his point is irrelevant, since there's no point in high sensitivity without high confidence - just as in measurement, there's no point in high resolution without high accuracy, viz the myth that 24-bit digital has *any* higher resolution than 16-bit - if you're using analogue tape sources. -- Stewart Pinkerton | Music is Art - Audio is Engineering |
Older separates vs new system
"Jim Lesurf" wrote in message
... In article , Alan Murphy wrote: "Jim Lesurf" wrote in message [big snip of Jim's reply, quoted in full above] So: can you now say what practical test/experiment you can suggest that would be useful to test your hypothesis that the failure is due to 'masking' rather than 'removal' of the audible difference?

When I wrote the original post with the term 'masking' I was not aware that it has a particular meaning and significance in audio science, not being familiar with the literature. Having read some relevant papers I now realise that I should have replaced 'masking' with 'the test is not sufficiently sensitive to reveal differences which may be present when using some other method'. My apologies for not being sufficiently clear.
The test that I proposed above would indeed reveal whether the AB test is insensitive, and to what degree, in accordance with the scientific method. I do not have enough knowledge of the subject to propose a test that would detect low-jnd differences in complex scenarios. Incidentally, during my "googling" I did notice some suggestion that when different signals are presented simultaneously to separate ears, much smaller differences can be detected than when these signals are presented to both ears serially. Any ideas on this? Alan.