Discussion:
[codec] draft test and processing plan for the IETF Codec
Anisse Taleb
2011-04-13 07:32:00 UTC
Permalink
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete; there are still many missing things, e.g. tandeming cases, tests with delay jitter, DTX, etc. Consider it a starting point for discussion, where everyone is welcome to contribute in a constructive manner. Further updates are planned, but let's first see some initial comments.

The attachment is a PDF version; please let me know if you would like to see another format and I would be glad to oblige.

Comments and additions are welcome!

Kind regards,
/Anisse
Schulz, Edward D (Ed)
2011-04-13 13:26:59 UTC
Permalink
Please consider applying the TIA-921-A (== ITU-T G.1050) network model, or at least some specific test cases from that model, in the codec test plan. TIA-921-B is currently in ballot. The matching revision to G.1050 is in pre-publication status. As the editor, I might be able to answer questions about these standards.

Ed Schulz
Distinguished Engineer
Semiconductor Solutions Group
LSI Corporation
1110 American Parkway NE
Room 12C-265
Allentown, PA 18109-9138
TEL 610 712 2068
MOBILE 732 241 4669
***@lsi.com 
Jean-Marc Valin
2011-04-13 13:35:24 UTC
Permalink
Hi Anisse,

I don't have any comments (yet) because I just started reading, but from
just looking at some tables there are a few things I don't understand. Could
you please clarify these:

1) How much is "very low delay" and how much is "medium delay"?

2) What do "NWT" and "BT" mean?

3) What's the difference between the "Reference Codecs" column and the
"Requirements" column?

4) How do you define the rates that do not exist in some codecs? For
example, I see a reference to 12 and 16 kb/s for Speex, which has no such
rates, and to 8, 12, 16 kb/s for iLBC, which does not have these rates either.

Thanks,

Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it as a starting point for discussion where everyone is welcome to contribute in a constructive manner. Further updates are planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Anisse Taleb
2011-04-13 23:56:40 UTC
Permalink
Dear Jean-Marc,

Thank you for your valuable comments.
Post by Jean-Marc Valin
1) How much is "very low delay" and how much is "medium delay"?
This is based on section 5.1 (Operating space) of the requirements document, http://www.ietf.org/id/draft-ietf-codec-requirements-02.txt: most applications require a "medium delay" (20-30 ms), while a few require a "very low delay" (< 10 ms). Would you like to contribute some precise delay operating points for Opus?
Post by Jean-Marc Valin
2) What do "NWT" and "BT" mean?
Sorry about that; I should have defined these abbreviations:

NWT = Not Worse Than.
BT = Better Than.
Post by Jean-Marc Valin
3) What's the difference between the "Reference Codecs" column and the
"Requirements" column?
I am not entirely happy with how this table looks. Essentially, the first column is the collection of codecs, while the requirements column details the actual requirements. I will group these in the next version.
Post by Jean-Marc Valin
4) How do you define the rates that do not exist in some codecs? For
example, I see a reference to 12 and 16 kb/s for Speex, which has no such
rates, and to 8, 12, 16 kb/s for iLBC, which does not have these rates either.
Thank you for pointing this out. If the rates are not supported, I would suggest using the closest supported bitrates.
In this initial version, I didn't check whether all codecs could be operated at the proposed bitrates.
Could you suggest which rates iLBC and Speex support?

Kind regards,
/Anisse
Jean-Marc Valin
2011-04-14 02:04:08 UTC
Permalink
Hi Anisse,

Thanks for the answers. This clarifies a few things, though it makes
other aspects more puzzling.
Benjamin M. Schwartz
2011-04-13 13:38:16 UTC
Permalink
Post by Anisse Taleb
Please find attached a first draft of a test plan of the IETF codec (Opus).
Thank you for drawing up this test plan, which clearly required a great
deal of thought. The results of such testing would certainly be very
interesting to many.

However, I think the execution of such a test is clearly _not_ an
appropriate prerequisite for publishing a Proposed Standard. By my
calculations, the draft plan presently calls for over 1300 hours of
listening tests, counting only audio being played, estimating 10-second
samples and the minimum number of listeners. Even if many listeners are
listening in parallel, and overheads (such as delays between samples) are
low, conducting such a test would still take many months.
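Ben's 1300-hour figure is essentially a product of test conditions, samples, listeners, and sample length. A sketch of that arithmetic, where every count is a hypothetical placeholder chosen for illustration (the real numbers would come from the draft plan itself):

```python
# Back-of-envelope estimate of total listening time for a test plan.
# All counts below are HYPOTHETICAL placeholders, not figures taken
# from the draft test plan.

SAMPLE_SECONDS = 10          # estimated length of one audio sample
LISTENERS = 24               # minimum listeners per experiment (assumed)
CONDITIONS = 400             # codec/bitrate/impairment conditions (assumed)
SAMPLES_PER_CONDITION = 50   # talkers x items per condition (assumed)

total_plays = CONDITIONS * SAMPLES_PER_CONDITION * LISTENERS
total_hours = total_plays * SAMPLE_SECONDS / 3600.0

print(f"{total_plays} presentations, about {total_hours:.0f} hours of audio")
```

With these placeholder counts the product lands in the same region as Ben's estimate, which is why even heavy parallelism among listeners still implies months of wall-clock time.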

Such an extensive, expensive battery of tests can hardly be justified on
some arbitrary codec version still under development. It can only be
justified if the codec being tested is not going to change, so that the
sponsoring organizations can use the results to determine whether the
codec meets their performance goals.

Let's standardize, and then invite ultra-comprehensive systematic
characterization.

--Ben
Stephen Botzko
2011-04-13 14:13:51 UTC
Permalink
We are not yet at WG LC, and the RFC publication process itself takes many
months. So I am not seeing the scheduling issues as insurmountable. We
could easily require some core tests prior to WG LC, and continue testing
during the publication process.

I think the best course right now is to refine the test plan, and sort out
the process steps/scheduling when we have stronger consensus on the test
plan itself.

I agree that the tests need to be run on the version we plan to submit for
publication. Making improvements later on does not have to mean the entire
test suite needs to be re-run - subsequent testing could be targeted to test
the changes.

Stephen Botzko

On Wed, Apr 13, 2011 at 9:38 AM, Benjamin M. Schwartz <
Post by Benjamin M. Schwartz
Post by Anisse Taleb
Please find attached a first draft of a test plan of the IETF codec
(Opus).
Thank you for drawing up this test plan, which clearly required a great
deal of thought. The results of such testing would certainly be very
interesting to many.
However, I think the execution of such a test is clearly _not_ an
appropriate prerequisite for publishing a Proposed Standard. By my
calculations, the draft plan presently calls for over 1300 hours of
listening tests, counting only audio being played, estimating 10-second
samples and the minimum number of listeners. Even if many listeners are
listening in parallel, and overheads (such as delays between samples) are
low, conducting such a test would still take many months.
Such an extensive, expensive battery of tests can hardly be justified on
some arbitrary codec version still under development. It can only be
justified if the codec being tested is not going to change, so that the
sponsoring organizations can use the results to determine whether the
codec meets their performance goals.
Let's standardize, and then invite ultra-comprehensive systematic
characterization.
--Ben
_______________________________________________
codec mailing list
https://www.ietf.org/mailman/listinfo/codec
Erik Norvell
2011-04-13 14:48:45 UTC
Permalink
Hi Ben, all

If the codec is not ready for testing, then I cannot see how it could be ready for standardization. To me the steps would be:

- freeze the codec when it is stable
- test and evaluate
- check if requirements are met
a) if yes standardize
b) if not, do not standardize and rather go back and improve

Informal testing should still be done during development to eliminate the risk of b).

I also think the encumbrance of the codec is unclear at this point and I don't think rushing to finalize the standard would serve the purpose of this WG. Due to the encumbrance there may still be changes required which may affect the quality, and the final testing should begin after this has been resolved.

Best,
Erik
-----Original Message-----
On Behalf Of Benjamin M. Schwartz
Sent: den 13 april 2011 15:38
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Anisse Taleb
Please find attached a first draft of a test plan of the
IETF codec (Opus).
Thank you for drawing up this test plan, which clearly
required a great deal of thought. The results of such
testing would certainly be very interesting to many.
However, I think the execution of such a test is clearly
_not_ an appropriate prerequisite for publishing a Proposed
Standard. By my calculations, the draft plan presently calls
for over 1300 hours of listening tests, counting only audio
being played, estimating 10-second samples and the minimum
number of listeners. Even if many listeners are listening in
parallel, and overheads (such as delays between samples) are
low, conducting such a test would still take many months.
Such an extensive, expensive battery of tests can hardly be
justified on some arbitrary codec version still under
development. It can only be justified if the codec being
tested is not going to change, so that the sponsoring
organizations can use the results to determine whether the
codec meets their performance goals.
Let's standardize, and then invite ultra-comprehensive
systematic characterization.
--Ben
Paul Coverdale
2011-04-13 15:10:46 UTC
Permalink
Post by Erik Norvell
Hi Ben, all
If the codec is not ready for testing, then I cannot see how it could be
ready for standardization. To me the steps would be
- freeze the codec when it is stable
- test and evaluate
- check if requirements are met
a) if yes standardize
b) if not do not standarize and rather go back and improve
Informal testing should still be done during development to eliminate the risk of b).
I also think the encumbrance of the codec is unclear at this point and I
don't think rushing to finalize the standard would serve the purpose of
this WG. Due to the encumbrance there may still be changes required
which may affect the quality, and the final testing should begin after
this has been resolved.
Best,
Erik
I totally agree.


...Paul
Peter Saint-Andre
2011-04-13 16:45:28 UTC
Permalink
Post by Erik Norvell
Hi Ben, all
If the codec is not ready for testing, then I cannot see how it could
be ready for standardization. To me the steps would be
- freeze the codec when it is stable - test and evaluate - check if
requirements are met a) if yes standardize b) if not do not
standarize and rather go back and improve
Informal testing should still be done during development to eliminate the risk of b).
My understanding is that informal testing has already been done by quite
a few participants in this WG.
Post by Erik Norvell
I also think the encumbrance of the codec is unclear at this point
and I don't think rushing to finalize the standard would serve the
purpose of this WG. Due to the encumbrance there may still be changes
required which may affect the quality, and the final testing should
begin after this has been resolved.
We knew when we started this process that there might be encumbrances.
We even knew that there might be unreported encumbrances that would
emerge only after the codec was published as an RFC, or only after the
code was in use by companies who would be big targets for patent
lawsuits. I see no reason to delay publication until all possible
encumbrances have been resolved, whatever that means (as we all know,
patent claims are not resolved at the IETF, they are resolved in courts
of law).

Peter
--
Peter Saint-Andre
https://stpeter.im/
Stephen Botzko
2011-04-13 17:57:34 UTC
Permalink
in-line
Stephen
Post by Peter Saint-Andre
Post by Erik Norvell
Hi Ben, all
If the codec is not ready for testing, then I cannot see how it could
be ready for standardization. To me the steps would be
- freeze the codec when it is stable - test and evaluate - check if
requirements are met a) if yes standardize b) if not do not
standarize and rather go back and improve
Informal testing should still be done during development to eliminate the risk of b).
My understanding is that informal testing has already been done by quite
a few participants in this WG.
Yes, and Erik is simply suggesting that should continue while the codec
development is underway.
Post by Peter Saint-Andre
Post by Erik Norvell
I also think the encumbrance of the codec is unclear at this point
and I don't think rushing to finalize the standard would serve the
purpose of this WG. Due to the encumbrance there may still be changes
required which may affect the quality, and the final testing should
begin after this has been resolved.
We knew when we started this process that there might be encumbrances.
We even knew that there might be unreported encumbrances that would
emerge only after the codec was published as an RFC, or only after the
code was in use by companies who would be big targets for patent
lawsuits. I see no reason to delay publication until all possible
encumbrances have been resolved, whatever that means (as we all know,
patent claims are not resolved at the IETF, they are resolved in courts
of law).
I think Erik was simply referring to the ongoing work already started by the
codec developers [whatever-it-is that is being done in response to the
Qualcomm declaration].
It sounds like you are suggesting this work should be halted, and we should
simply publish the current codec version??? It would be interesting to know
who else would agree with that proposal.
Post by Peter Saint-Andre
Peter
--
Peter Saint-Andre
https://stpeter.im/
Peter Saint-Andre
2011-04-13 18:07:15 UTC
Permalink
Post by Stephen Botzko
in-line
Stephen
Post by Erik Norvell
Hi Ben, all
If the codec is not ready for testing, then I cannot see how it could
be ready for standardization. To me the steps would be
- freeze the codec when it is stable - test and evaluate - check if
requirements are met a) if yes standardize b) if not do not
standarize and rather go back and improve
Informal testing should still be done during development to eliminate
the risk of b).
My understanding is that informal testing has already been done by quite
a few participants in this WG.
Yes, and Erik is simply suggesting that should continue while the codec
development is underway.
Yep, testing is good. Let's keep doing it. :)
Post by Stephen Botzko
Post by Erik Norvell
I also think the encumbrance of the codec is unclear at this point
and I don't think rushing to finalize the standard would serve the
purpose of this WG. Due to the encumbrance there may still be changes
required which may affect the quality, and the final testing should
begin after this has been resolved.
We knew when we started this process that there might be encumbrances.
We even knew that there might be unreported encumbrances that would
emerge only after the codec was published as an RFC, or only after the
code was in use by companies who would be big targets for patent
lawsuits. I see no reason to delay publication until all possible
encumbrances have been resolved, whatever that means (as we all know,
patent claims are not resolved at the IETF, they are resolved in courts
of law).
I think Erik was simply referring to the ongoing work already started by
the codec developers [whatever-it-is that is being done in response to
the Qualcomm declaration].
It sounds like you are suggesting this work should be halted, and we
should simply publish the current codec version??? It would be
interesting to know who else would agree with that proposal.
I am suggesting no such thing. What I'm saying is that we could delay
publication *forever* if people want 100% assurance that Opus is
patent-clear. Since we know (and have always known) that we can't gain
such assurance, I'm suggesting that the WG needs to figure out how to
proceed.

My opinion is that delaying forever would be bad.

Peter
--
Peter Saint-Andre
https://stpeter.im/
Stephen Botzko
2011-04-13 18:39:26 UTC
Permalink
Post by Peter Saint-Andre
Post by Stephen Botzko
in-line
Stephen
Post by Erik Norvell
Hi Ben, all
If the codec is not ready for testing, then I cannot see how it could
be ready for standardization. To me the steps would be
- freeze the codec when it is stable - test and evaluate - check if
requirements are met a) if yes standardize b) if not do not
standardize and rather go back and improve
Informal testing should still be done during development to eliminate
the risk of b).
My understanding is that informal testing has already been done by quite
a few participants in this WG.
Yes, and Erik is simply suggesting that should continue while the codec
development is underway.
Yep, testing is good. Let's keep doing it. :)
Post by Erik Norvell
I also think the encumbrance of the codec is unclear at this point
and I don't think rushing to finalize the standard would serve the
purpose of this WG. Due to the encumbrance there may still be changes
required which may affect the quality, and the final testing should
begin after this has been resolved.
We knew when we started this process that there might be encumbrances.
We even knew that there might be unreported encumbrances that would
emerge only after the codec was published as an RFC, or only after the
code was in use by companies who would be big targets for patent
lawsuits. I see no reason to delay publication until all possible
encumbrances have been resolved, whatever that means (as we all know,
patent claims are not resolved at the IETF, they are resolved in courts
of law).
I think Erik was simply referring to the ongoing work already started by
the codec developers [whatever-it-is that is being done in response to
the Qualcomm declaration].
It sounds like you are suggesting this work should be halted, and we
should simply publish the current codec version??? It would be
interesting to know who else would agree with that proposal.
I am suggesting no such thing. What I'm saying is that we could delay
publication *forever* if people want 100% assurance that Opus is
patent-clear. Since we know (and have always known) that we can't gain
such assurance, I'm suggesting that the WG needs to figure out how to
proceed.
My opinion is that delaying forever would be bad.
I think everyone would agree.

In context, there appears to be some time to work through the formal testing
w/o significant delays in publication, since the codec isn't finished yet,
and we can potentially overlap some of the testing schedule with the
publication process time.
Make sense?
Post by Peter Saint-Andre
Peter
--
Peter Saint-Andre
https://stpeter.im/
Ron
2011-04-13 20:35:09 UTC
Permalink
Post by Stephen Botzko
Post by Peter Saint-Andre
My opinion is that delaying forever would be bad.
I think everyone would agree.
In context, there appears to be some time to work through the formal testing
w/o significant delays in publication, since the codec isn't finished yet,
and we can potentially overlap some of the testing schedule with the
publication process time.
Make sense?
You could pick some tractable number of the ones that you think will
be the most enlightening and run some quick informal tests on those.

We can then take guidance from that as to whether more detailed tests
are warranted for those things, or whether we can agree the result is
clear from even a first approximation. (and perhaps focus energy on
some different set of potentially more valuable tests instead of
chasing clear dead ends in great detail prematurely)

We are in soft-freeze of the bitstream, so finding such cases,
if they do exist, sooner rather than later would be much better.

Doing as much as we can in parallel is clearly the best option.

Cheers,
Ron
Anisse Taleb
2011-04-18 22:15:26 UTC
Permalink
Inline.
-----Original Message-----
Peter Saint-Andre
Sent: Wednesday, April 13, 2011 8:07 PM
To: Stephen Botzko
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Stephen Botzko
in-line
Stephen
On Wed, Apr 13, 2011 at 12:45 PM, Peter Saint-Andre
Post by Erik Norvell
Hi Ben, all
If the codec is not ready for testing, then I cannot see how it could
be ready for standardization. To me the steps would be
- freeze the codec when it is stable - test and evaluate - check if
requirements are met a) if yes standardize b) if not do not
standardize and rather go back and improve
Informal testing should still be done during development to eliminate
the risk of b).
My understanding is that informal testing has already been done by
quite a few participants in this WG.
Yes, and Erik is simply suggesting that should continue while the
codec development is underway.
Yep, testing is good. Let's keep doing it. :)
Post by Erik Norvell
I also think the encumbrance of the codec is unclear at this point
and I don't think rushing to finalize the standard would serve the
purpose of this WG. Due to the encumbrance there may still be changes
required which may affect the quality, and the final testing should
begin after this has been resolved.
We knew when we started this process that there might be encumbrances.
We even knew that there might be unreported encumbrances that would
emerge only after the codec was published as an RFC, or only after the
code was in use by companies who would be big targets for patent
lawsuits. I see no reason to delay publication until all possible
encumbrances have been resolved, whatever that means (as we all know,
patent claims are not resolved at the IETF, they are resolved in
courts of law).
I think Erik was simply referring to the ongoing work already started
by the codec developers [whatever-it-is that is being done in response
to the Qualcomm declaration].
It sounds like you are suggesting this work should be halted, and we
should simply publish the current codec version??? It would be
interesting to know who else would agree with that proposal.
I am suggesting no such thing. What I'm saying is that we could delay
publication *forever* if people want 100% assurance that Opus is
patent-clear. Since we know (and have always known) that we can't gain
such assurance, I'm suggesting that the WG needs to figure out how to
proceed.
My opinion is that delaying forever would be bad.
There is a difference between suspecting that there are patents and being 100% sure that there are royalty-bearing patents on the codec (see the Qualcomm IPR declaration). If the latter holds, I do not see how the resulting codec can fulfill the requirements that this WG set out to accomplish, and I see no reason to publish a codec with known encumbrances.

Kind regards,
/Anisse
Anisse Taleb
2011-04-18 22:08:29 UTC
Permalink
Post by Peter Saint-Andre
We knew when we started this process that there might be encumbrances.
We even knew that there might be unreported encumbrances that would
emerge only after the codec was published as an RFC, or only after the
code was in use by companies who would be big targets for patent
lawsuits. I see no reason to delay publication until all possible
encumbrances have been resolved, whatever that means (as we all know,
patent claims are not resolved at the IETF, they are resolved in courts
of law).
I am not sure I understand this; publishing the codec while there are known encumbrances goes against my understanding of what the WG goals are. These definitely have to be resolved first. The codec also has to be thoroughly tested and characterized, especially for Internet applications... otherwise what the codec WG would have succeeded in producing is an encumbered codec with no evidence of its suitability for Internet applications...
Anisse Taleb
2011-04-18 09:52:42 UTC
Permalink
Hi Ben,
Post by Benjamin M. Schwartz
Post by Anisse Taleb
Please find attached a first draft of a test plan of the IETF codec
(Opus).
Thank you for drawing up this test plan, which clearly required a great
deal of thought. The results of such testing would certainly be very
interesting to many.
However, I think the execution of such a test is clearly _not_ an
appropriate prerequisite for publishing a Proposed Standard. By my
calculations, the draft plan presently calls for over 1300 hours of
listening tests, counting only audio being played, estimating 10-second
samples and the minimum number of listeners. Even if many listeners are
listening in parallel, and overheads (such as delays between samples) are
low, conducting such a test would still take many months.
It is always good practice to first define a target for what should be tested, and then find ways to make the test realistic and reasonable. When it comes to the proposal itself, I think shortcuts have been taken already. I am not against discussing the size of the test; the draft proposal was made exactly to initiate such discussion...
Post by Benjamin M. Schwartz
Such an extensive, expensive battery of tests can hardly be justified on
some arbitrary codec version still under development.
I cannot agree more. Freeze a version of Opus, and let's check the quality of the codec. If it passes the quality expectations, it will become a standard.

-- But before that, clean up the code and the specification and fix the IPR issues. Right now the codec does not pass the "admin" part of requirements.

Kind regards,
/Anisse
Ron
2011-04-18 11:47:16 UTC
Permalink
Hi Anisse,
Post by Anisse Taleb
Hi Ben,
Post by Benjamin M. Schwartz
Post by Anisse Taleb
Please find attached a first draft of a test plan of the IETF codec
(Opus).
Thank you for drawing up this test plan, which clearly required a great
deal of thought. The results of such testing would certainly be very
interesting to many.
However, I think the execution of such a test is clearly _not_ an
appropriate prerequisite for publishing a Proposed Standard. By my
calculations, the draft plan presently calls for over 1300 hours of
listening tests, counting only audio being played, estimating 10-second
samples and the minimum number of listeners. Even if many listeners are
listening in parallel, and overheads (such as delays between samples) are
low, conducting such a test would still take many months.
It is always a good practice to first have a target on what would be tested
and then find ways how to make the test realistic and reasonable. When it
comes to the proposal itself, I think that shortcuts have been taken already.
I am not against discussing the size of the test, the draft proposal was
exactly made to initiate such discussion...
Post by Benjamin M. Schwartz
Such an extensive, expensive battery of tests can hardly be justified on
some arbitrary codec version still under development.
I cannot agree more. Freeze a version of Opus, and let's check the quality of
the codec. If it passes the quality expectations, it will become a standard.
-- But before that, clean up the code and the specification and fix the IPR
issues. Right now the codec does not pass the "admin" part of requirements.
I thought it was already agreed that the people acting in good faith would
endeavour to conduct as much of this in parallel as possible.

I'm sure we have plenty of time to file off the rough edges while you gather
enough people to run the two hundred and something nonillion iterations of
your test that Gregory showed would be necessary for it to approach an even
remotely significant result that wasn't entirely a function of chance.

It saddens me to see you play a cheap shot like this at Ben, when so many
people are eagerly awaiting your explanation as to whether that was simply
an error in your math, or a factor you had not considered. Or possibly an
essential ingredient in your insistence of a single do-or-die test? That
nobody could possibly afford to repeat independently ...


So please, we've mapped both extremes of what a non-test might look like now,
and we've clearly shown, with absolute certainty, that this entire group will
never, not before the heat death of the universe, agree upon and perform one
single tell-all test which satisfies them all. And that's before we consider
the users who aren't represented in this testing yet.

I think Cullen very accurately plotted where the middle ground may lie.
Let's all gravitate a little closer to that now, can we please?

Thanks Much!
Ron
Stephen Botzko
2011-04-18 13:32:23 UTC
Permalink
in-line
Post by Ron
Hi Anisse,
Post by Anisse Taleb
Hi Ben,
Post by Benjamin M. Schwartz
Post by Anisse Taleb
Please find attached a first draft of a test plan of the IETF codec
(Opus).
Thank you for drawing up this test plan, which clearly required a great
deal of thought. The results of such testing would certainly be very
interesting to many.
However, I think the execution of such a test is clearly _not_ an
appropriate prerequisite for publishing a Proposed Standard. By my
calculations, the draft plan presently calls for over 1300 hours of
listening tests, counting only audio being played, estimating 10-second
samples and the minimum number of listeners. Even if many listeners are
listening in parallel, and overheads (such as delays between samples) are
low, conducting such a test would still take many months.
It is always a good practice to first have a target on what would be
tested and then find ways how to make the test realistic and reasonable.
When it comes to the proposal itself, I think that shortcuts have been
taken already. I am not against discussing the size of the test, the
draft proposal was exactly made to initiate such discussion...
Post by Benjamin M. Schwartz
Such an extensive, expensive battery of tests can hardly be justified on
some arbitrary codec version still under development.
I cannot agree more. Freeze a version of Opus, and let's check the
quality of the codec. If it passes the quality expectations, it will
become a standard.
-- But before that, clean up the code and the specification and fix the
IPR issues. Right now the codec does not pass the "admin" part of
requirements.
I thought it was already agreed that the people acting in good faith would
endeavour to conduct as much of this in parallel as possible.
I'm sure we have plenty of time to file off the rough edges while you gather
enough people to run the two hundred and something nonillion iterations of
your test that Gregory showed would be necessary for it to approach an even
remotely significant result that wasn't entirely a function of chance.
It saddens me to see you play a cheap shot like this at Ben, when so many
people are eagerly awaiting your explanation as to whether that was simply
an error in your math, or a factor you had not considered. Or possibly an
essential ingredient in your insistence of a single do-or-die test? That
nobody could possibly afford to repeat independently ...
Maybe I am missing something, but I am not seeing any cheap shots at Ben or
anyone else in Anisse's post. The "nonillion iterations" and "heat death of
the universe" stuff in your reply are perhaps pejorative, though as we all
know it is hard to judge intentions from emails. I agree with Cullen that
we should assume good faith - that the proponents of systematic testing are
not trying to kill standardization of Opus, but instead simply believe that
such testing is an important part of codec standardization, no matter which
SDO is doing the work. As far as I know, that is the truth of it.

The answer to the statistical argument is quite simple - people run
cross-checks and follow up as needed to verify failed results. This allows
fairly efficient weeding out of false negatives (and the occasional false
positive), and is one reason why the test procedures need to be
well-documented (so they can be verified/reproduced). The math exercise
presumed that the test procedure could only be run once, and that failure to
pass a requirement or two could not be followed up on. That is not the
case: if we see a result that concerns us, we can follow up. Perhaps the
test plan should say this explicitly, or perhaps we can just agree to
discuss needed follow-ups when we see the results.
Post by Jean-Marc Valin
So please, we've mapped both extremes of what a non-test might look like now,
and we've clearly shown, with absolute certainty, that this entire group will
never, not before the heat death of the universe, agree upon and perform one
single tell-all test which satisfies them all. And that's before we consider
the users who aren't represented in this testing yet.
I think Cullen very accurately plotted where the middle ground may lie.
Let's all gravitate a little closer to that now, can we please?
Thanks Much!
Ron
_______________________________________________
codec mailing list
https://www.ietf.org/mailman/listinfo/codec
Ron
2011-04-18 14:12:00 UTC
Permalink
Post by Stephen Botzko
if we see a result that concerns us, we can follow up. Perhaps the
test plan should say this explicitly, or perhaps we can just agree to
discuss needed follow-ups when we see the results.
I believe that is exactly the solution that is being explored in the
<***@jmvalin.ca> subthread, which begins:

I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.

So let's get started on the tests that people individually think are
important so that we have their results to consider by the time we
think we have enough information to decide.

Do you see anything wrong with that solution? It looks about as fair
and thorough as we can make it to everyone to me.

Cheers,
Ron
Anisse Taleb
2011-04-19 01:54:37 UTC
Permalink
Hi Ron,

The WG would have failed its goals if the outcome of this activity is to publish an encumbered codec that has a quality which is inferior to current state of the art codecs.
At an equal encumbrance level, quality is the deciding factor.

Kind regards,
/Anisse
-----Original Message-----
Ron
Sent: Monday, April 18, 2011 4:12 PM
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Stephen Botzko
if we see a result that concerns us, we can follow up. Perhaps the
test plan should say this explicitly, or perhaps we can just agree to
discuss needed follow-ups when we see the results.
I believe that is exactly the solution that is being explored in the
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
So let's get started on the tests that people individually think are
important so that we have their results to consider by the time we
think we have enough information to decide.
Do you see anything wrong with that solution? It looks about as fair
and thorough as we can make it to everyone to me.
Cheers,
Ron
Jean-Marc Valin
2011-04-19 02:52:27 UTC
Permalink
Hi Anisse,
Post by Anisse Taleb
The WG would have failed its goals if the outcome of this activity is
to publish an encumbered codec that has a quality which is inferior
to current state of the art codecs. At an equal encumbrance level,
quality is the deciding factor.
The goal is still for unencumbered with better quality than existing
unencumbered codecs. If, at the time we decide on whether to publish
this codec, I feel like it is not safe to implement without paying
royalties, I can assure you that I will personally vote/hum/whatever
against.

Cheers,

Jean-Marc
Post by Anisse Taleb
Kind regards, /Anisse
draft test and processing plan for the IETF Codec
Post by Stephen Botzko
if we see a result that concerns us, we can follow up. Perhaps
the test plan should say this explicitly, or perhaps we can just
agree to discuss needed follow-ups when we see the results.
I believe that is exactly the solution that is being explored in
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much
on the testing, but on the conditions for publishing (large number
of BT, NWT). Considering that ultimately, the decision to publish a
spec is always based on WG consensus, then I think that problem can
be completely bypassed. Once we make it up to the individuals to
decide, then we can focus on "simply" designing a good test.
So let's get started on the tests that people individually think
are important so that we have their results to consider by the time
we think we have enough information to decide.
Do you see anything wrong with that solution? It looks about as
fair and thorough as we can make it to everyone to me.
Cheers, Ron
Anisse Taleb
2011-04-19 03:03:04 UTC
Permalink
Dear Jean-Marc,

I really appreciate your comment and I can assure you of the same from my side. I am happy that we have an agreement on this point.

It's probably too much to ask, but are you planning a new release overcoming the recently declared IPRs? Feel free to answer me off-list if you wish.

Kind regards,
/Anisse
-----Original Message-----
Sent: Tuesday, April 19, 2011 4:52 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
Post by Anisse Taleb
The WG would have failed its goals if the outcome of this activity is
to publish an encumbered codec that has a quality which is inferior
to current state of the art codecs. At an equal encumbrance level,
quality is the deciding factor.
The goal is still for unencumbered with better quality than existing
unencumbered codecs. If, at the time we decide on whether to publish
this codec, I feel like it is not safe to implement without paying
royalties, I can assure you that I will personally vote/hum/whatever
against.
Cheers,
Jean-Marc
Post by Anisse Taleb
Kind regards, /Anisse
draft test and processing plan for the IETF Codec
Post by Stephen Botzko
if we see a result that concerns us, we can follow up. Perhaps
the test plan should say this explicitly, or perhaps we can just
agree to discuss needed follow-ups when we see the results.
I believe that is exactly the solution that is being explored in
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much
on the testing, but on the conditions for publishing (large number
of BT, NWT). Considering that ultimately, the decision to publish a
spec is always based on WG consensus, then I think that problem can
be completely bypassed. Once we make it up to the individuals to
decide, then we can focus on "simply" designing a good test.
So let's get started on the tests that people individually think
are important so that we have their results to consider by the time
we think we have enough information to decide.
Do you see anything wrong with that solution? It looks about as
fair and thorough as we can make it to everyone to me.
Cheers, Ron
Anisse Taleb
2011-04-19 01:40:01 UTC
Permalink
Dear Ron.
Post by Ron
I thought it was already agreed that the people acting in good faith would
endeavour to conduct as much of this in parallel as possible.
Agreed by whom?
Post by Ron
I'm sure we have plenty of time to file off the rough edges while you gather
enough people to run the two hundred and something nonillion iterations of
your test that Gregory showed would be necessary for it to approach an even
remotely significant result that wasn't entirely a function of chance.
Please see my answers to Greg.
Post by Ron
It saddens me to see you play a cheap shot like this at Ben, when so many
people are eagerly awaiting your explanation as to whether that was simply
an error in your math, or a factor you had not considered. Or possibly an
essential ingredient in your insistence of a single do-or-die test? That
nobody could possibly afford to repeat independently ...
Cheap shot:
* an unnecessarily aggressive and unfair remark directed at a defenseless person
Which part was aggressive, which part was unfair, and why would Ben be a defenseless person?
In this case all three conditions would have to be fulfilled before I would consider it a cheap shot.
Post by Ron
I am not against discussing the size of the test, the draft proposal
was exactly made to initiate such discussion...
So please, we've mapped both extremes of what a non-test might look like now,
and we've clearly shown, with absolute certainty, that this entire group will
never, not before the heat death of the universe, agree upon and perform one
single tell-all test which satisfies them all. And that's before we consider
the users who aren't represented in this testing yet.
I don't think the proposal was a non-test or extreme in any way;
it was based on the examples Paul posted to this very mailing list.

Kind regards,
/Anisse
Jean-Marc Valin
2011-04-18 17:26:47 UTC
Permalink
Hi Anisse,
Post by Anisse Taleb
I cannot agree more. Freeze a version of Opus, and let's check the
quality of the codec. If it passes the quality expectations, it will
become a standard.
As far as the tests we're discussing are concerned, the Opus bit-stream has
been frozen for about two months now.
Post by Anisse Taleb
-- But before that, clean up the code and the specification and fix the
IPR issues. Right now the codec does not pass the "admin" part of
requirements.
I don't see how making the code/specification cleaner will affect testing,
so I think this can be done in parallel. We can't hold off for every
conceivable possibility, but at this point we don't expect that Opus will
require any quality-impacting changes.

Cheers,

Jean-Marc
Anisse Taleb
2011-04-19 00:50:16 UTC
Permalink
Dear JM
Post by Jean-Marc Valin
Post by Anisse Taleb
I cannot agree more. Freeze a version of Opus, and let's check the
quality of the codec. If it passes the quality expectations, it will
become a standard.
As far as the tests we're discussing are concerned, the Opus bit-stream has
been frozen for about two months now.
I was referring to the codec as a whole and not only the bitstream. All it takes is a statement from the Opus proponents that the code is frozen - whatever you are happy with. If people start testing the codec, which involves resources and costs to their respective organizations, I am sure they will not be happy to learn that the code changed afterwards, or while they were testing.

On the bit-stream freeze, I hear elsewhere that the intention is to publish this as fast as possible and then potentially change the bit stream in another *more mature* version. I may have misinterpreted that.
Post by Jean-Marc Valin
Post by Anisse Taleb
-- But before that, clean up the code and the specification and fix the
IPR issues. Right now the codec does not pass the "admin" part of
requirements.
I don't see how making the code/specification cleaner will affect testing,
so I think this can be done in parallel.
We can't hold off for every
conceivable possibility, but at this point we don't expect that Opus will
require any quality-impacting changes.
Personally, I do not want to test an intermediate version of Opus, or to find out that the code I tested had a bug because, during clean-up, we discover that certain variables were not initialized.

It is quite frustrating to hear that testing is holding off publication. Let's even assume the testing part is done and out of the way. What about the spec and the code? I see there are still known bugs, and I am still not happy with the state of the code or the written draft.

Several individuals will agree to publish this codec and do not seem to care about what it contains or about the state of the code. I, for one, would like to be able to understand the code and the written draft text. Ultimately I would like to be able to implement Opus from the written description alone. This is, for instance, the case for MPEG audio codecs: they are implementable from the written spec. Can we at least try to achieve that with Opus?


Kind regards,
/Anisse
Roman Shpount
2011-04-13 14:18:40 UTC
Permalink
Anisse,

Couple of comments about the plan:

1. You are specifying that Opus should be better than G.722 at 56 kb/s, even
though the plan calls for no tests against G.722 at 56 kb/s. I would suggest
removing that criterion.

2. You are also specifying that Opus should be better than G.722 for
wideband. I think this should be relaxed to "no worse than". I am not sure
expecting better quality on wideband voice samples is realistic.
_____________
Roman Shpount
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it
as a starting point for discussion where everyone is welcome to contribute
in a constructive manner. Further updates are planned, but let's see first
some initial comments.
The attachment is a pdf version, please let me know if you would like to
see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Anisse Taleb
2011-04-18 11:11:50 UTC
Permalink
Dear Roman,
1. You are specifying that Opus should be better than G.722 56Kb, even though the plan calls for no tests
against G.722 56 Kb. I would suggest removing the criteria.
2. You are also specifying that Opus should be better than G.722 for wideband. I think it should be relaxed
to "no worse than". I am not sure expecting better quality on wideband voice samples is realistic.
Your proposal sounds good to me. Expect these to change in the next version.

Thanks for spotting these.

Kind regards,
/Anisse
Stephan Wenger
2011-04-13 17:19:41 UTC
Permalink
[...]
Post by Peter Saint-Andre
Post by Erik Norvell
I also think the encumbrance of the codec is unclear at this point
and I don't think rushing to finalize the standard would serve the
purpose of this WG. Due to the encumbrance there may still be changes
required which may affect the quality, and the final testing should
begin after this has been resolved.
We knew when we started this process that there might be encumbrances.
We even knew that there might be unreported encumbrances that would
emerge only after the codec was published as an RFC, or only after the
code was in use by companies who would be big targets for patent
lawsuits. I see no reason to delay publication until all possible
encumbrances have been resolved, whatever that means (as we all know,
patent claims are not resolved at the IETF, they are resolved in courts
of law).
Yes. However, what can (and probably should) be done is to make sure that
at least those encumbrances that are known (by way of disclosure from
the rightsholders) are adequately addressed. In the WG meeting, I have
suggested a few mechanisms which may help in this regard.

It is my understanding that requirements are usually taken in a logical
AND relationship. At this point we are as sure as we will ever be that
opus v5 is encumbered by potentially royalty bearing IPR. What does that
say about the relevance of the tests performed against opus v5?

Stephan
Post by Peter Saint-Andre
Peter
--
Peter Saint-Andre
https://stpeter.im/
Benjamin M. Schwartz
2011-04-13 17:25:43 UTC
Permalink
Post by Stephan Wenger
At this point we are as sure as we will ever be that
opus v5 is encumbered by potentially royalty bearing IPR.
Speak for yourself, and in the singular.

I am familiar with the codebase and specification. I have so far
discovered exactly zero encumbrance due to potentially royalty bearing
patents.

--Ben
Cullen Jennings
2011-04-15 00:54:27 UTC
Permalink
Post by Stephan Wenger
At this point we are as sure as we will ever be that
opus v5 is encumbered by potentially royalty bearing IPR.
I somewhat doubt the above statement, but regardless, whether given patents apply to opus or not is not something this WG can take on. I'd be very happy if the people who want to discuss the opus code and specific patents took that discussion off to the other email list that Christian set up.
Anisse Taleb
2011-04-18 23:51:27 UTC
Permalink
Dear Cullen,
I understand that there is no formal process in IETF to solve the issue of encumbrance of the codec. However, a large part of the debate about the work conducted here is related either directly or indirectly to the encumbrance of the codec. Even when it comes to testing and requirements, we have heard objections to comparing to state of the art codecs because they are encumbered. I do not have any solution to this, but my feeling is that a major part of the decision to publish a codec will be based on the level of encumbrance of the codec.

Kind regards,
/Anisse
-----Original Message-----
Cullen Jennings
Sent: Friday, April 15, 2011 2:54 AM
To: Stephan Wenger
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Stephan Wenger
At this point we are as sure as we will ever be that
opus v5 is encumbered by potentially royalty bearing IPR.
I somewhat doubt the above statement but regardless, the topic of if given
patents apply to opus or not is not something this WG can take on. I'd be
very happy if people that want to go discuss the opus code and specific
patents took that discussion off to the other email list that Christian set
up.
Cullen Jennings
2011-04-19 14:51:34 UTC
Permalink
inline ...
Post by Anisse Taleb
Dear Cullen,
I understand that there is no formal process in IETF to solve the issue of encumbrance of the codec. However, a large part of the debate about the work conducted here is related either directly or indirectly to the encumbrance of the codec. Even when it comes to testing and requirements, we have heard objections to comparing to state of the art codecs because they are encumbered. I do not have any solution to this, but my feeling is that a major part of the decision to publish a codec will be based on the level of encumbrance of the codec.
For pretty much any working group, the question in the end largely comes down to "is this draft ready to publish?". With this spec, the question of whether opus is royalty free will be an important factor in many people's decisions, but it's not a topic the WG tries to deal with directly. Individuals will have to tackle that outside the IETF. Other factors that bear on the decision are how well the codec meets the requirements, how well it meets the needs of market segments that may differ from what is in the requirements, whether the draft is good enough to lead to interoperable implementations, and whether anyone wants to use it.

Many people in this WG have no desire for another codec that is not RF. These people are interested in how opus compares to RF codecs. If opus offers better performance than existing RF codecs, and the people outside the IETF WG decide they believe opus has good odds of being RF, they will probably be in favor of publishing it. If either of those two conditions is not met, I would expect them not to be in favor of publishing opus. There are probably also some people who are interested in opus regardless of whether it is RF. Those people are probably interested in how opus compares to both RF and non-RF existing codecs. (As a side note, how opus compares to any codec is always interesting and useful to know, but given the scope of the charter, I think how it compares to RF codecs is the higher-priority item for this WG.)

So I 100% agree with you that when individuals decide whether they think we should publish a given draft, the probability of it being RF is likely to be a major factor for many people. The IETF wants to clearly point people to the IPR disclosures that have been received, but the IETF does not want to be a place where we try to sort out the validity of the IPR.

Hope that helps,
Cullen

PS - Stephan is the IPR advisor for the WG and I hope he corrects me if I got any of the above wrong.
Gregory Maxwell
2011-04-14 03:04:44 UTC
Permalink
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider
it as a starting point for discussion where everyone is welcome to
contribute in a constructive manner. Further updates are planned,
but let's see first some initial comments.
I'm surprised we haven't seen a more intense reaction to this
proposal yet. Perhaps people are missing the less-than-obvious
mathematical reality of it.

If you have 10 requirements all of which must be met, where each is
90% likely to be met, the chance of meeting all of them is 34.8%
(0.9^10). The chance of meeting them all decreases exponentially with
the number of requirements.

This amplification effect is one reason why I've opposed additional
requirements, even though I was quite confident that Opus was better
than the competition. Add enough requirements and Opus is sure to fail
due to _chance_ no matter how good the codec is, even if the
requirements each sound reasonable individually.

In this case we have 162 requirements proposed. 75 "better than" (BT),
and 87 "not worse than" (NWT), once you expand out all the loss rates, bit
rates, etc.

Moreover, because of measurement noise, Opus could meet all of the
requirements and yet still fail some of the tests. Because there are
so many requirements, even a small chance of false failure becomes
significant.

I did some rough numeric simulations with the tests proposed, using
scores with a standard deviation of 1 (which is about what they were on
the HA test), N = 144 as proposed, and Opus better than the
comparison codec by 0.1. The chance of passing any single NWT
requirement is then 0.9769, and the chance of passing any single BT
requirement is 0.3802.

The chance of passing all of them is
0.9769^87 * 0.3802^75 = 4.1483e-33

Which means about a 1 in 241 nonillion chance of passing all the tests,
even assuming Opus actually met _all_ the stated requirements with a
score +0.1 over the reference.

This is so astronomically unlikely that I had to use an encyclopedia to
find the name for the number. I should have saved the time and just
left it at "a farce".

I urge the working group to keep this hazard in mind when considering
the reasonableness of parallel MUST requirements on top of listening tests.
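Gregory's arithmetic is easy to reproduce. A minimal sketch (not part of the original message; the single-test pass rates 0.9769 and 0.3802 are the figures stated above, and the tests are treated as fully independent, as in his estimate):

```python
# Sketch of the "requirement amplification" arithmetic described above.
# Per-test pass probabilities are the figures quoted in the message.

# Ten 90%-likely requirements: only ~34.8% chance of meeting all ten.
p_ten = 0.9 ** 10

# 87 "not worse than" (NWT) and 75 "better than" (BT) requirements,
# with the simulated single-test pass rates from the message.
p_nwt = 0.9769  # one NWT requirement (N=144, true gap +0.1, sd 1)
p_bt = 0.3802   # one BT requirement

p_all = p_nwt ** 87 * p_bt ** 75

print(f"0.9^10   = {p_ten:.4f}")   # ~0.3487
print(f"pass all = {p_all:.4e}")   # ~4.15e-33, i.e. 1 in ~241 nonillion
```

The point of the sketch is only the shape of the curve: multiplying many near-1 (and a few well-below-1) pass probabilities drives the joint probability toward zero, regardless of how good the codec actually is.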
Paul Coverdale
2011-04-14 12:34:11 UTC
Permalink
Post by Gregory Maxwell
I'm surprised we haven't seen a more intense reaction to this
proposal yet. Perhaps people are missing the less-than-obvious
mathematical reality of it.
If you have 10 requirements all of which must be met, where each is
90% likely to be met, the chance of meeting all of them is 34.8%
(.9^10). The chance of failure increases exponentially with
the number of requirements.
This amplification effect is one reason why I've opposed additional
requirements, even though I was quite confident that Opus was better
than the competition. Add enough requirements and Opus is sure to fail
due to _chance_ no matter how good the codec is, even if the
requirements each sound reasonable individually.
In this case we have 162 requirements proposed. 75 "better than" (BT),
and 87 "not worse than" (NWT), once you expand out all the loss rates, bit
rates, etc.
Moreover, because of measurement noise, Opus could meet all of the
requirements and yet still fail some of the tests. Because there are
so many requirements, even a small chance of false failure becomes
significant.
I did some rough numeric simulations with the tests proposed, using
scores with a standard deviation of 1 (which is about what they were on
the HA test), N = 144 as proposed, and Opus better than the
comparison codec by 0.1. The chance of passing any single NWT
requirement is then 0.9769, and the chance of passing any single BT
requirement is 0.3802.
The chance of passing all of them is
0.9769^87 * 0.3802^75 = 4.1483e-33
Which means about a 1 in 241 nonillion chance of passing all the tests,
even assuming Opus actually met _all_ the stated requirements with a
score +0.1 over the reference.
This is so astronomically unlikely that I had to use an encyclopedia to
find the name for the number. I should have saved the time and just
left it at "a farce".
I urge the working group to keep this hazard in mind when considering
the reasonableness of parallel MUST requirements on top of listening-
test.
Greg,

I don't think the situation is as dire as you make out. Your analysis
assumes that all requirements are completely independent. This is not the
case: in many instances, if you meet one requirement you are likely to meet
others of the same kind (e.g. performance as a function of bit rate).

But in any case, the statistical analysis procedure outlined in the test
plan doesn't assume that every requirement must be met with absolute
certainty, it allows for a confidence interval.

Regards,

...Paul
Jean-Marc Valin
2011-04-14 13:06:32 UTC
Permalink
Post by Paul Coverdale
I don't think the situation is as dire as you make out. Your analysis
assumes that all requirements are completely independent. This is not the
case, in many cases if you meet one requirement you are likely to meet
others of the same kind (eg performance as a function of bit rate).
But in any case, the statistical analysis procedure outlined in the test
plan doesn't assume that every requirement must be met with absolute
certainty, it allows for a confidence interval.
This is exactly what Greg is considering in his analysis. He's starting
from the assumption that the codec really meets *all* 162 requirements.
Consider just the NWT requirements: if we were truly no worse than the
reference codec, then with 87 tests against a 95% confidence interval, we
would be expected to fail about 4 tests just by random chance. Considering
both NWT and BT requirements, the odds of passing Anisse's proposed test
plan given the assumptions above are 4.1483e-33. See http://xkcd.com/882/
for a more rigorous analysis.
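The expected-failure count above is simple binomial arithmetic (an illustrative sketch, not from the original message):

```python
# With 87 independent NWT tests each checked at a 95% confidence level,
# a codec that truly ties the reference on every requirement still
# fails some tests purely by chance.
n_tests = 87
alpha = 0.05  # one test in twenty fails spuriously

expected_false_failures = n_tests * alpha            # ~4.35 tests
p_at_least_one_failure = 1 - (1 - alpha) ** n_tests  # ~0.988

print(round(expected_false_failures, 2), round(p_at_least_one_failure, 3))
```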

Cheers,

Jean-Marc
Anisse Taleb
2011-04-18 23:37:22 UTC
Permalink
JM, Greg, Paul,
[taking emails in chronological order was ill advised :-)]

I do not disagree with the statistical pitfalls you mention. As Paul stated and also what I wrote in a direct reply to this, there is no single uber-requirement to be passed by the codec, rather a vector of requirements that summarize the performance of the codec compared to other codecs. These have to be analyzed and discussed one by one.

Kind regards,
/Anisse
-----Original Message-----
Jean-Marc Valin
Sent: Thursday, April 14, 2011 3:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
I don't think the situation is as dire as you make out. Your analysis
assumes that all requirements are completely independent. This is not the
case, in many cases if you meet one requirement you are likely to meet
others of the same kind (eg performance as a function of bit rate).
But in any case, the statistical analysis procedure outlined in the test
plan doesn't assume that every requirement must be met with absolute
certainty, it allows for a confidence interval.
This is exactly what Greg is considering in his analysis. He's starting
from the assumption that the codec really meets *all* 162 requirements.
Consider just the NWT requirements: if we were truly no worse than the
reference codec, then with 87 tests against a 95% confidence interval, we
would be expected to fail about 4 tests just by random chance. Considering
both NWT and BT requirements, the odds of passing Anisse's proposed test
plan given the assumptions above are 4.1483e-33. See http://xkcd.com/882/
for a more rigorous analysis.
Cheers,
Jean-Marc
Jean-Marc Valin
2011-04-19 00:59:10 UTC
Permalink
JM, Greg, Paul, [taking emails in chronological order was ill advised
:-)]
I do not disagree with the statistical pitfalls you mention. As Paul
stated and also what I wrote in a direct reply to this, there is no
single uber-requirement to be passed by the codec, rather a vector of
requirements that summarize the performance of the codec compared to
other codecs. These have to be analyzed and discussed one by one.
Then, I guess we have no need for BTs and NWTs in the test plan. In the
end, once the results are analyzed, we'll be able to take each "codec
pair" and say either "A is better than B", "B is better than A", or "A
is tied with B" (null hypothesis). The WG members can then decide what
to conclude from those.

Cheers,

Jean-Marc
Kind regards, /Anisse
for the IETF Codec
Post by Paul Coverdale
I don't think the situation is as dire as you make out. Your
analysis assumes that all requirements are completely
independent. This is not the case, in many cases if you meet one
requirement you are likely to meet others of the same kind (eg
performance as a function of bit rate).
But in any case, the statistical analysis procedure outlined in
the test plan doesn't assume that every requirement must be met
with absolute certainty, it allows for a confidence interval.
This is exactly what Greg is considering in his analysis. He's
starting from the assumption that the codec really meets *all* 162
requirements. Consider just the NWT requirements: if we were truly
no worse than the reference codec, then with 87 tests against a 95%
confidence interval, we would be expected to fail about 4 tests
just by random chance. Considering both NWT and BT requirements,
the odds of passing Anisse's proposed test plan given the
assumptions above are 4.1483e-33. See http://xkcd.com/882/ for a
more rigorous analysis.
Cheers,
Jean-Marc
Anisse Taleb
2011-04-19 01:43:20 UTC
Permalink
Dear Jean Marc,

In any case, the results would be there and we can analyze them in whatever form we want. However, requirements still need to be set on the codec. Are you suggesting that we defer the requirements discussion until after we see the results?

Kind regards,
/Anisse
-----Original Message-----
Sent: Tuesday, April 19, 2011 2:59 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
JM, Greg, Paul, [taking emails in chronological order was ill advised
:-)]
I do not disagree with the statistical pitfalls you mention. As Paul
stated and also what I wrote in a direct reply to this, there is no
single uber-requirement to be passed by the codec, rather a vector of
requirements that summarize the performance of the codec compared to
other codecs. These have to be analyzed and discussed one by one.
Then, I guess we have no need for BTs and NWTs in the test plan. In the
end, once the results are analyzed, we'll be able to take each "codec
pair" and say either "A is better than B", "B is better than A", or "A
is tied with B" (null hypothesis). The WG members can then decide what
to conclude from those.
Cheers,
Jean-Marc
Kind regards, /Anisse
Post by Paul Coverdale
I don't think the situation is as dire as you make out. Your
analysis assumes that all requirements are completely
independent. This is not the case, in many cases if you meet one
requirement you are likely to meet others of the same kind (eg
performance as a function of bit rate).
But in any case, the statistical analysis procedure outlined in
the test plan doesn't assume that every requirement must be met
with absolute certainty, it allows for a confidence interval.
This is exactly what Greg is considering in his analysis. He's
starting from the assumption that the codec really meets *all* 162
requirements. Consider just the NWT requirements: if we were truly
no worse than the reference codec, then with 87 tests against a 95%
confidence interval, we would be expected to fail about 4 tests
just by random chance. Considering both NWT and BT requirements,
the odds of passing Anisse's proposed test plan given the
assumptions above are 4.1483e-33. See http://xkcd.com/882/ for a
more rigorous analysis.
Cheers,
Jean-Marc
Anisse Taleb
2011-04-18 22:39:35 UTC
Permalink
Dear Greg,

Thanks for raising this point, what you say is of course correct mathematically and I am very well aware of that.

I do not recall anyone suggesting that everything needs to be passed perfectly. The analysis of the results is something that is open for discussion; if the codec fails a requirement, we need to understand why and whether the failure is systematic. That does not mean that the codec is rejected.

There are many cases in which a requirement formally fails but is numerically very close to passing. Certain requirements may pass in one language while failing (very close to passing) in another, and when combined together lead to a pass. There are many examples in ITU-T where codecs have been selected and standardized while fulfilling 90% or 95% of the requirements. What matters in the end is the decision of the group, and the availability of data helps in reaching consensus.

Kind regards,
/Anisse
-----Original Message-----
Gregory Maxwell
Sent: Thursday, April 14, 2011 5:05 AM
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec
(Opus).
Post by Anisse Taleb
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider
it as a starting point for discussion where everyone is welcome to
contribute in a constructive manner. Further updates are planned,
but let's see first some initial comments.
I'm surprised we haven't seen a more intense reaction to this
proposal yet. Perhaps people are missing the less-than-obvious
mathematical reality of it.
If you have 10 requirements all of which must be met, where each is
90% likely to be met, the chance of meeting all of them is 34.8%
(0.9^10). The chance of passing all of them falls exponentially with
the number of requirements.
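The arithmetic here is easy to check. A short sketch (the 90% per-requirement figure is the email's illustrative assumption, not measured data):

```python
# Probability of clearing k independent requirements when each is
# met with probability p (illustrative figures from the text above).
def pass_all(p: float, k: int) -> float:
    return p ** k

print(round(pass_all(0.9, 10), 4))  # 10 requirements at 90% each -> 0.3487
print(round(pass_all(0.9, 50), 4))  # 50 requirements -> 0.0052
```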
This amplification effect is one reason why I've opposed additional
requirements, even though I was quite confident that Opus was better
than the competition. Add enough requirements and Opus is sure to fail
due to _chance_ no matter how good the codec is, even if the
requirements each sound reasonable individually.
In this case we have 162 requirements proposed. 75 "better than" (BT),
and 87 "not worse than" (NWT), once you expand out all the loss rates, bit
rates, etc.
Moreover, because of measurement noise, Opus could meet all of the
requirements and yet still fail some of the tests. Because there are
so many requirements, even a small chance of false failure becomes
significant.
I did some rough numeric simulations with the tests proposed, using
scores with a standard deviation of 1 (which is about what they were on
the HA test), N = 144 as proposed, and Opus better than the
comparison codec by 0.1. The chance of passing any single NWT
requirement is then 0.9769, and the chance of passing any single BT
requirement is 0.3802.
The chance of passing all of them is
0.9769^87 * 0.3802^75 = 4.1483e-33
Which means about a 1 in 241 nonillion chance of passing all the tests,
even assuming Opus actually met _all_ the stated requirements with a
score +0.1 over the reference.
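The combined odds quoted above follow directly from multiplying the per-test pass probabilities; a few lines reproduce the figure:

```python
# Combined chance of passing every test, multiplying the per-test
# pass probabilities from the simulation described above.
p_nwt, n_nwt = 0.9769, 87   # "not worse than" (NWT) tests
p_bt,  n_bt  = 0.3802, 75   # "better than" (BT) tests

p_all = p_nwt ** n_nwt * p_bt ** n_bt
print(f"{p_all:.3e}")  # prints 4.148e-33
```

Even if every one of the 162 per-test pass probabilities were 95%, the product 0.95**162 is still only about 2.5e-4; the multiplication itself dooms any long conjunction of noisy tests.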
This is so astronomically unlikely that I had to use an encyclopedia to
find the name for the number. I should have saved the time and just
left it at "a farce".
I urge the working group to keep this hazard in mind when considering
the reasonableness of parallel MUST requirements on top of listening-
test.
Koen Vos
2011-04-14 07:27:39 UTC
Permalink
This clarifies a few things, though it makes other aspects more puzzling.
Anisse, I too am confused by your proposed "test plan".

By my count, only 8 of the 32 codec/bitrate requirements have a match in the
codec requirements document. Of the other 24, many seem rather contentious.
Had you wanted to add requirements, you could have suggested and motivated
them directly to the mailing list or at the meetings a long time ago.

But my problem with your 24 "new" requirements is not just procedural.
If you read the charter of this WG, you'll see that "beating most existing
codecs" was never a goal. The strongest wording about performance is that
the codec shall enable "high-quality audio services on the Internet".
Instead of spending countless hours on testing against arbitrary
requirements, what better way to verify that we've reached that goal than
deploying it in actual Internet applications?

Early adopters will realize there is always a risk that the bitstream will
have to change, no matter how much we test. They'll also know that the
likelihood of such changes decreases fairly quickly with time and adoption.

Stephen Botzko recently pointed out that codecs implemented in hardware are
difficult to upgrade(*). While true, it's no argument against deploying
sooner rather than later, because no sensible hardware manufacturer is going
to put OPUS in hardware before it has proven itself in the market place.

The testing done to date by developers and independent parties shows we do
indeed have "running code." But the proof of the pudding is in the eating,
and voluntary deployment in real-world (software) applications is the right
next step.

best,
koen.

*) http://www.ietf.org/mail-archive/web/codec/current/msg02352.html


----- Original Message -----
From: "Jean-Marc Valin" <jean-***@octasic.com>
To: "Anisse Taleb" <***@huawei.com>
Cc: ***@ietf.org
Sent: Wednesday, April 13, 2011 7:04:08 PM
Subject: Re: [codec] draft test and processing plan for the IETF Codec

Hi Anisse,

Thanks for the answers. This clarifies a few things, though it makes
other aspects more puzzling.
Anisse Taleb
2011-04-18 23:29:35 UTC
Permalink
Dear Koen,
Answers inline.
Post by Koen Vos
But my problem with your 24 "new" requirements is not just procedural.
If you read the charter of this WG, you'll see that "beating most existing
codecs" was never a goal.
The strongest wording about performance is that
the codec shall enable "high-quality audio services on the Internet".
The goal of this working group is to ensure the existence of a single
high-quality audio codec that is optimized for use over the Internet and
that can be widely implemented and easily distributed among application
developers, service operators, and end users.

It all depends on where you put the bar for high-quality audio, and how you evaluate whether a codec is optimized for internet applications. High quality is very subjective; if I am used to the quality of "existing codecs", I would like to understand how Opus provides high-quality audio relative to these codecs.
Post by Koen Vos
Instead of spending countless hours on testing against arbitrary
requirements, what better way to verify that we've reached that goal than
deploying it in actual Internet applications?
Because in Engineering (as well as Business), goals have to be measurable and measured. If your goal is to deploy a codec for an internet application, why do you need the IETF? Go ahead: compile, link and deploy!

Most of the codecs cited in the document are "real" and widely deployed. I do apologize for the inconsistencies in some of the test points; I did not claim the test plan was perfect, especially given the time it took to produce. The goal was to initiate the discussion. If there are codecs you do not wish to see compared to Opus, feel free to voice your disagreement.
Post by Koen Vos
Early adopters will realize there is always a risk that the bitstream will
have to change, no matter how much we test. They'll also know that the
likelihood of such changes decreases fairly quickly with time and adoption.
I find it distressing that you mention that the bitstream would change after the codec has been adopted and published by the IETF. While this is something you can do with a proprietary codec, I see no point in standardizing a codec with the a priori knowledge that it will change in the future...

Please define Standard?
Post by Koen Vos
Stephen Botzko recently pointed out that codecs implemented in hardware are
difficult to upgrade(*). While true, it's no argument against deploying
sooner rather than later, because no sensible hardware manufacturer is going
to put OPUS in hardware before it has proven itself in the market place.
If that is true, then why are we here? Why not deploy the codec, gain acceptance in the marketplace, reach a stable version that is ready for hardware implementation, and then standardize it?
Post by Koen Vos
The testing done to date by developers and independent parties shows we do
indeed have "running code." But the proof of the pudding is in the eating,
and voluntary deployment in real-world (software) applications is the right
next step.
Not when we all have to eat the same pudding and we know it contains radioactive lead.

Kind regards,
/Anisse
Jean-Marc Valin
2011-04-15 03:16:31 UTC
Permalink
Hi Anisse,

I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.

Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised in
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the tests.

Sounds like a plan?

Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it as a starting point for discussion where everyone is welcome to contribute in a constructive manner. Further updates are planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Jean-Marc Valin
2011-04-15 04:49:58 UTC
Permalink
So here's some more specific comments on actual bitrates:

1) For narrowband Speex, the rates currently listed are 8, 12, 16 kb/s.
Those should be changed to 8, 11, 15 kb/s to match the actual Speex
bitrates.

2) For iLBC, the rates currently listed are 8, 12, 16 kb/s. I think we
should only use 15.2 kb/s for iLBC. There's another rate, which is 13.33
kb/s but that's for 30 ms frames so it's not very interesting.

3) For Speex wideband, the rates currently listed are 12, 24, 32 kb/s. I
think Speex wideband around 12 kb/s is just crap. Worth testing would be
20.6 and 27.8 kb/s.

4) For super-wideband Speex, I recommend just dumping that. This Speex
mode was a mistake right from the start and usually has worse quality
than wideband Speex.

Regarding super-wideband, one thing to keep in mind is that Opus defines
super-wideband as having a 12 kHz audio bandwidth (24 kHz sampling
rate). This makes comparisons with other codecs more difficult. The
rates currently listed for super-wideband are 24, 32, 64 kb/s. I
recommend running 24 kb/s in super-wideband and running 32 and 64 kb/s
in fullband mode (even if the input is a 32 kHz signal).

For the very low delay tests (10 ms frame size), I think all the listed
rates should be using fullband mode except the 32 kb/s.

That's it for now. Any thoughts?

Jean-Marc
Post by Jean-Marc Valin
Hi Anisse,
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised in
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the tests.
Sounds like a plan?
Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many
missing things, e.g. tandeming cases, tests with delay jitter, dtx
etc. Consider it as a starting point for discussion where everyone is
welcome to contribute in a constructive manner. Further updates are
planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like
to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Roman Shpount
2011-04-15 16:59:23 UTC
Permalink
Jean-Marc,

For iLBC, the 13.33 kb/s 30 ms packet mode is the default, the most
commonly used, and, as far as I know, has higher audio quality. I
understand that you prefer to test against similar packet sizes, but it
probably makes sense to test against the most common use cases for the
codec.
_____________
Roman Shpount
Post by Jean-Marc Valin
1) For narrowband Speex, the rates currently listed are 8, 12, 16 kb/s.
Those should be changed to 8, 11, 15 kb/s to match the actual Speex
bitrates.
2) For iLBC, the rates currently listed are 8, 12, 16 kb/s. I think we
should only use 15.2 kb/s for iLBC. There's another rate, which is 13.33
kb/s but that's for 30 ms frames so it's not very interesting.
3) For Speex wideband, the rates currently listed are 12, 24, 32 kb/s. I
think Speex wideband around 12 kb/s is just crap. Worth testing would be
20.6 and 27.8 kb/s.
4) For super-wideband Speex, I recommend just dumping that. This Speex mode
was a mistake right from the start and usually has worse quality than
wideband Speex.
Regarding super-wideband, one thing to keep in mind is that Opus defines
super-wideband as having a 12 kHz audio bandwidth (24 kHz sampling rate).
This makes comparisons with other codecs more difficult. The rates currently
listed for super-wideband are 24, 32, 64 kb/s. I recommend running 24 kb/s
in super-wideband and running 32 and 64 kb/s in fullband mode (even if the
input is a 32 kHz signal).
For the very low delay tests (10 ms frame size), I think all the listed
rates should be using fullband mode except the 32 kb/s.
That's it for now. Any thoughts?
Jean-Marc
Post by Jean-Marc Valin
Hi Anisse,
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised in
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the tests.
Sounds like a plan?
Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many
missing things, e.g. tandeming cases, tests with delay jitter, dtx
etc. Consider it as a starting point for discussion where everyone is
welcome to contribute in a constructive manner. Further updates are
planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like
to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
David Virette
2011-04-18 14:11:17 UTC
Permalink
Dear Jean-Marc,
Thanks for the information on available bitrates for speex and iLBC, it will
help updating the test plan.

Regarding the super-wideband definition, it is really common to have
different bandwidths in a MUSHRA test, and it is not the only factor which
guides the listeners to score the codecs. If you look at the test plan, we
proposed to test the super-wideband and full-band in the same test using
some different band-limited signals as anchors.
Jean-Marc Valin
2011-04-18 17:54:54 UTC
Permalink
Hi David,

Just to be clear, Opus *does* have support for what ITU-T codecs call
super-wideband, i.e. 14-16 kHz audio bandwidth. For Opus, this bandwidth is
covered by the full-band mode because given the Opus architecture, the
overhead of coding the 16-20 kHz band was small enough that it wasn't worth
creating another mode for cases where there is no content in that band.

That being said, a well-designed VoIP application should ideally use all
the bandwidth that's available and only low-pass/skip frequency bands when
doing so improves quality given the bit-rate. The only exception to the
principle of going with the bandwidth that maximizes quality would be
narrowband, because of the PSTN compatibility issue.

Cheers,

Jean-Marc
Post by David Virette
Dear Jean-Marc,
Thanks for the information on available bitrates for speex and iLBC, it will
help updating the test plan.
Regarding the super-wideband definition, it is really common to have
different bandwidths in a MUSHRA test, and it is not the only factor which
guides the listeners to score the codecs. If you look at the test plan, we
proposed to test the super-wideband and full-band in the same test using
some different band-limited signals as anchors.
David Virette
2011-04-18 20:56:36 UTC
Permalink
Hi Jean-Marc,
I agree that a well-designed VoIP application should ideally use the
available bandwidth and I think that in the current test plan, the maximum
available bandwidth will always be taken into account. For super-wideband
mode, the input will be 32 kHz, then low pass filtered at 14 kHz and then
probably resampled at 24 kHz for OPUS allowing it to use its maximum
bandwidth for those modes. In full band, a low pass filter with cutoff at 20
kHz will be applied on input and should not affect any codecs.
If there is an overlap between the super-wideband modes and full band modes
of OPUS in the same test (and I would recommend this overlap), this would
also give us the information at which bitrate it is better to switch from
super-wideband to full band.
Best regards,
David


-----Original Message-----
From: Jean-Marc Valin [mailto:jean-***@octasic.com]
Sent: lundi 18 avril 2011 19:55
To: David Virette
Cc: 'Jean-Marc Valin'; ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec

Hi David,

Just to be clear, Opus *does* have support for what ITU-T codecs call
super-wideband, i.e. 14-16 kHz audio bandwidth. For Opus, this bandwidth is
covered by the full-band mode because given the Opus architecture, the
overhead of coding the 16-20 kHz band was small enough that it wasn't worth
creating another mode for cases where there is no content in that band.

That being said, a well-designed VoIP application should ideally use all
the bandwidth that's available and only low-pass/skip frequency bands when
doing so improves quality given the bit-rate. The only exception to the
principle of going with the bandwidth that maximizes quality would be
narrowband, because of the PSTN compatibility issue.

Cheers,

Jean-Marc
Post by David Virette
Dear Jean-Marc,
Thanks for the information on available bitrates for speex and iLBC, it will
help updating the test plan.
Regarding the super-wideband definition, it is really common to have
different bandwidths in a MUSHRA test, and it is not the only factor which
guides the listeners to score the codecs. If you look at the test plan, we
proposed to test the super-wideband and full-band in the same test using
some different band-limited signals as anchors.
Jean-Marc Valin
2011-04-18 21:46:38 UTC
Permalink
Post by David Virette
Hi Jean-Marc,
I agree that a well-designed VoIP application should ideally use the
available bandwidth and I think that in the current test plan, the maximum
available bandwidth will always be taken into account. For super-wideband
mode, the input will be 32 kHz, then low pass filtered at 14 kHz and then
probably resampled at 24 kHz for OPUS allowing it to use its maximum
bandwidth for those modes.
Well, if you low-pass filter at 14 kHz then by definition you're not using
all the available bandwidth. If what you have is a 32 kHz input signal,
then the best thing you can do is resample it to *48* kHz (without any
low-pass filtering) and then feed it to Opus at that rate. You can consider
Opus as a 48 kHz codec that can also do lower bandwidths. In fact, the best
way to use Opus is to always run it at 48 kHz and to let it scale bandwidth
internally based on the bit-rate. A while ago I posted this sample of Opus
scaling from 8 kb/s narrowband to 60 kb/s fullband:
http://jmvalin.ca/misc_stuff/opus_sweep.wav

Cheers,

Jean-Marc
Koen Vos
2011-04-15 08:44:11 UTC
Permalink
I would also suggest replacing all BT (better than) requirements by NWT (no worse than).

My reasoning is that:
- The WG never had the goal to be better than other codecs (see charter).
- Proving to be better can be very hard, especially when several codecs are close to transparent. To show significance in that case you'd need a vast number of listeners, which makes a test more cumbersome to perform.
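The point about needing a vast number of listeners follows from a standard two-sample power calculation. A sketch, where the effect size, score standard deviation, and the one-sided 5% significance / 80% power levels are illustrative assumptions rather than values from the test plan:

```python
import math

def n_per_group(delta: float, sd: float,
                z_alpha: float = 1.645, z_power: float = 0.8416) -> int:
    """Listeners per group to detect a mean score difference `delta`
    with a one-sided two-sample test (alpha=0.05, power=0.80)."""
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)

print(n_per_group(0.5, 1.0))  # half-MOS gap: 50 listeners per group
print(n_per_group(0.1, 1.0))  # 0.1-MOS gap: 1237 listeners per group
```

Since the required sample size grows as 1/delta**2, halving the audible difference quadruples the listener count, which is why "better than" claims between nearly transparent codecs are so expensive to demonstrate.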

best,
koen.


----- Original Message -----
From: "Jean-Marc Valin" <***@jmvalin.ca>
To: "Jean-Marc Valin" <***@jmvalin.ca>
Cc: ***@ietf.org
Sent: Thursday, April 14, 2011 9:49:58 PM
Subject: Re: [codec] draft test and processing plan for the IETF Codec

So here's some more specific comments on actual bitrates:

1) For narrowband Speex, the rates currently listed are 8, 12, 16 kb/s.
Those should be changed to 8, 11, 15 kb/s to match the actual Speex
bitrates.

2) For iLBC, the rates currently listed are 8, 12, 16 kb/s. I think we
should only use 15.2 kb/s for iLBC. There's another rate, which is 13.33
kb/s but that's for 30 ms frames so it's not very interesting.

3) For Speex wideband, the rates currently listed are 12, 24, 32 kb/s. I
think Speex wideband around 12 kb/s is just crap. Worth testing would be
20.6 and 27.8 kb/s.

4) For super-wideband Speex, I recommend just dumping that. This Speex
mode was a mistake right from the start and usually has worse quality
than wideband Speex.

Regarding super-wideband, one thing to keep in mind is that Opus defines
super-wideband as having a 12 kHz audio bandwidth (24 kHz sampling
rate). This makes comparisons with other codecs more difficult. The
rates currently listed for super-wideband are 24, 32, 64 kb/s. I
recommend running 24 kb/s in super-wideband and running 32 and 64 kb/s
in fullband mode (even if the input is a 32 kHz signal).

For the very low delay tests (10 ms frame size), I think all the listed
rates should be using fullband mode except the 32 kb/s.

That's it for now. Any thoughts?

Jean-Marc
Post by Jean-Marc Valin
Hi Anisse,
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised in
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the tests.
Sounds like a plan?
Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec
(Opus).
The proposal does not claim to be complete, there are still many
missing things, e.g. tandeming cases, tests with delay jitter, dtx
etc. Consider it as a starting point for discussion where everyone is
welcome to contribute in a constructive manner. Further updates are
planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like
to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Jean-Marc Valin
2011-04-15 10:41:39 UTC
Permalink
Koen,

The point I was making in the earlier email and that Cullen has stated
earlier is that we don't even need *any* BT or NWT requirements in this
document. It will be up to the WG individual participants to decide for
themselves whether they think it's good to publish Opus, based on the
information available at the time they make the decision. So Anisse's
test plan proposal is meant to "gather useful data", not decide on
whether to publish.

Jean-Marc
Post by Koen Vos
I would also suggest replacing all BT (better than) requirements by NWT (no worse than).
- The WG never had the goal to be better than other codecs (see charter).
- Proving to be better can be very hard, especially when several codecs are close to transparent. To show significance in that case you'd need a vast number of listeners, which makes a test more cumbersome to perform.
best,
koen.
----- Original Message -----
Sent: Thursday, April 14, 2011 9:49:58 PM
Subject: Re: [codec] draft test and processing plan for the IETF Codec
1) For narrowband Speex, the rates currently listed are 8, 12, 16 kb/s.
Those should be changed to 8, 11, 15 kb/s to match the actual Speex
bitrates.
2) For iLBC, the rates currently listed are 8, 12, 16 kb/s. I think we
should only use 15.2 kb/s for iLBC. There's another rate, which is 13.33
kb/s but that's for 30 ms frames so it's not very interesting.
3) For Speex wideband, the rates currently listed are 12, 24, 32 kb/s. I
think Speex wideband around 12 kb/s is just crap. Worth testing would be
20.6 and 27.8 kb/s.
4) For super-wideband Speex, I recommend just dumping that. This Speex
mode was a mistake right from the start and usually has worse quality
than wideband Speex.
Regarding super-wideband, one thing to keep in mind is that Opus defines
super-wideband as having a 12 kHz audio bandwidth (24 kHz sampling
rate). This makes comparisons with other codecs more difficult. The
rates currently listed for super-wideband are 24, 32, 64 kb/s. I
recommend running 24 kb/s in super-wideband and running 32 and 64 kb/s
in fullband mode (even if the input is a 32 kHz signal).
For the very low delay tests (10 ms frame size), I think all the listed
rates should be using fullband mode except the 32 kb/s.
That's it for now. Any thoughts?
Jean-Marc
Post by Jean-Marc Valin
Hi Anisse,
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised in
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the tests.
Sounds like a plan?
Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec
(Opus).
The proposal does not claim to be complete, there are still many
missing things, e.g. tandeming cases, tests with delay jitter, dtx
etc. Consider it as a starting point for discussion where everyone is
welcome to contribute in a constructive manner. Further updates are
planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like
to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Cullen Jennings
2011-04-15 15:36:36 UTC
Permalink
One other little detail where I should have been, uh less fluffy, in my email is the following... It's not really the working group process here, it's just the normal IETF process. There is nothing special about the process for this WG.

One other random idea ... I wonder if it might be worth listing the tests as priority 1, 2, and 3, with the idea that priority 1 tests are ones where people are most interested in the results and 3 the least, to help decide what order we run tests in. It seems to me some tests are far more interesting than others. A test might be uninteresting because we pretty much already know what the result will be, or it might be less interesting just because it's a scenario that will not be used as much. Just a random idea I am tossing out there; no idea if this would help or not.

Cullen
Koen,
The point I was making in the earlier email, and that Cullen stated earlier, is that we don't even need *any* BT or NWT requirements in this document. It will be up to the individual WG participants to decide for themselves whether they think it's good to publish Opus, based on the information available at the time they make the decision. So Anisse's test plan proposal is meant to "gather useful data", not to decide on whether to publish.
Jean-Marc
Post by Koen Vos
I would also suggest replacing all BT (better than) requirements by NWT (no worse than).
- The WG never had the goal to be better than other codecs (see charter).
- Proving to be better can be very hard, especially when several codecs are close to transparent. To show significance in that case you'd need a vast number of listeners, which makes a test more cumbersome to perform.
best,
koen.
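Koen's point about statistical power can be illustrated with a back-of-the-envelope sample-size calculation (the numbers below are illustrative assumptions, not values from the thread): resolving a small "better than" gap takes far more votes than verifying a generous "no worse than" margin.

```python
import math

def listeners_needed(sigma, delta, z=1.96):
    """Votes needed so the 95% confidence half-width (z*sigma/sqrt(n))
    on a mean MOS difference shrinks to `delta`, i.e. the smallest
    gap the test can resolve."""
    return math.ceil((z * sigma / delta) ** 2)

# Assumed per-vote std of a MOS difference: 0.8 (hypothetical).
print(listeners_needed(0.8, 0.1))  # 246 votes to resolve a 0.1 MOS "better than" gap
print(listeners_needed(0.8, 0.3))  # 28 votes for a 0.3 MOS "no worse than" margin
```

The quadratic dependence on the gap size is why near-transparent codecs make BT claims so expensive to prove.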
----- Original Message -----
Sent: Thursday, April 14, 2011 9:49:58 PM
Subject: Re: [codec] draft test and processing plan for the IETF Codec
1) For narrowband Speex, the rates currently listed are 8, 12, 16 kb/s.
Those should be changed to 8, 11, 15 kb/s to match the actual Speex
bitrates.
2) For iLBC, the rates currently listed are 8, 12, 16 kb/s. I think we
should only use 15.2 kb/s for iLBC. There's another rate, which is 13.33
kb/s but that's for 30 ms frames so it's not very interesting.
3) For Speex wideband, the rates currently listed are 12, 24, 32 kb/s. I
think Speex wideband around 12 kb/s is just crap. Worth testing would be
20.6 and 27.8 kb/s.
4) For super-wideband Speex, I recommend just dumping that. This Speex
mode was a mistake right from the start and usually has worse quality
than wideband Speex.
Regarding super-wideband, one thing to keep in mind is that Opus defines
super-wideband as having a 12 kHz audio bandwidth (24 kHz sampling
rate). This makes comparisons with other codecs more difficult. The
rates currently listed for super-wideband are 24, 32, 64 kb/s. I
recommend running 24 kb/s in super-wideband and running 32 and 64 kb/s
in fullband mode (even if the input is a 32 kHz signal).
For the very low delay tests (10 ms frame size), I think all the listed
rates should be using fullband mode except the 32 kb/s.
That's it for now. Any thoughts?
Jean-Marc
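If it helps to keep the proposed rate corrections in one place, here they are as data (my reading of Jean-Marc's points above; a summary for discussion, not an agreed list):

```python
# Proposed test bitrates in kb/s, restricted to rates the codecs
# actually support (per the four numbered points above).
TEST_RATES = {
    "speex_nb": [8, 11, 15],    # was 8, 12, 16
    "ilbc":     [15.2],         # 20 ms frames; 13.33 kb/s (30 ms) dropped
    "speex_wb": [20.6, 27.8],   # 12 kb/s mode judged too poor to test
    "opus_swb": [24],           # 32 and 64 kb/s moved to fullband mode
    "opus_fb":  [32, 64],       # run in fullband even for 32 kHz input
}
print(TEST_RATES["speex_nb"])
```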
Post by Jean-Marc Valin
Hi Anisse,
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised in
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the tests.
Sounds like a plan?
Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec
(Opus).
The proposal does not claim to be complete, there are still many
missing things, e.g. tandeming cases, tests with delay jitter, dtx
etc. Consider it as a starting point for discussion where everyone is
welcome to contribute in a constructive manner. Further updates are
planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like
to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
(From La Jolla - San Diego).
_______________________________________________
codec mailing list
https://www.ietf.org/mailman/listinfo/codec
David Virette
2011-04-18 20:16:05 UTC
Permalink
Hi Jean-Marc,
The BT and NWT are just intended to gather the meaningful information. Of
course we can ask the listening test sites to report the raw data, and the
interested parties can compute whatever statistics they find interesting. But
I think some people will only want to see the comparison with the
reference codecs from the requirements document and with the additional
codecs (G.729, G.722, ...).
So it is good to list all the meaningful comparisons that the listening
labs should report. This list of comparisons is missing in the current version of
the document, but we will add it. Moreover, I think we should ask the labs
to report the NWT and BT results for all the comparisons; then we will have
the full picture. People will then be able to check the requirements based
on the NWT, and everybody should be even happier if it finally
turns out to be BT.
This kind of comparison for information is also common in ITU-T, as it gives
useful information on the actual codec quality on top of the results
needed to check the initial requirements.
Best regards,
David

-----Original Message-----
From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of
Jean-Marc Valin
Sent: vendredi 15 avril 2011 12:42
To: Koen Vos
Cc: ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec

Koen,

The point I was making in the earlier email and that Cullen has stated
earlier is that we don't even need *any* BT or NWT requirements in this
document. It will be up to the WG individual participants to decide for
themselves whether they think it's good to publish Opus, based on the
information available at the time they make the decision. So Anisse's
test plan proposal is meant to "gather useful data", not decide on
whether to publish.

Jean-Marc
Post by Koen Vos
I would also suggest replacing all BT (better than) requirements by NWT (no worse than).
- The WG never had the goal to be better than other codecs (see charter).
- Proving to be better can be very hard, especially when several codecs
are close to transparent. To show significance in that case you'd need a
vast number of listeners, which makes a test more cumbersome to perform.
Post by Koen Vos
best,
koen.
----- Original Message -----
Sent: Thursday, April 14, 2011 9:49:58 PM
Subject: Re: [codec] draft test and processing plan for the IETF Codec
1) For narrowband Speex, the rates currently listed are 8, 12, 16 kb/s.
Those should be changed to 8, 11, 15 kb/s to match the actual Speex
bitrates.
2) For iLBC, the rates currently listed are 8, 12, 16 kb/s. I think we
should only use 15.2 kb/s for iLBC. There's another rate, which is 13.33
kb/s but that's for 30 ms frames so it's not very interesting.
3) For Speex wideband, the rates currently listed are 12, 24, 32 kb/s. I
think Speex wideband around 12 kb/s is just crap. Worth testing would be
20.6 and 27.8 kb/s.
4) For super-wideband Speex, I recommend just dumping that. This Speex
mode was a mistake right from the start and usually has worse quality
than wideband Speex.
Regarding super-wideband, one thing to keep in mind is that Opus defines
super-wideband as having a 12 kHz audio bandwidth (24 kHz sampling
rate). This makes comparisons with other codecs more difficult. The
rates currently listed for super-wideband are 24, 32, 64 kb/s. I
recommend running 24 kb/s in super-wideband and running 32 and 64 kb/s
in fullband mode (even if the input is a 32 kHz signal).
For the very low delay tests (10 ms frame size), I think all the listed
rates should be using fullband mode except the 32 kb/s.
That's it for now. Any thoughts?
Jean-Marc
Post by Jean-Marc Valin
Hi Anisse,
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised in
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the tests.
Sounds like a plan?
Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec
(Opus).
The proposal does not claim to be complete, there are still many
missing things, e.g. tandeming cases, tests with delay jitter, dtx
etc. Consider it as a starting point for discussion where everyone is
welcome to contribute in a constructive manner. Further updates are
planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like
to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Koen Vos
2011-04-16 05:03:55 UTC
Permalink
Hi Anisse,

I noticed that your plan tests with band-limited signals: narrowband signals
are filtered to 300-4000 Hz, wideband to 50-7000 Hz, and superwideband to
50-14000 Hz.

However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.

Instead of band-pass filtering, tests on speech could use a simple high-pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.

best,
koen.
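Koen's suggested preprocessing can be sketched as a first-order 50 Hz high-pass (a minimal illustration with hypothetical parameters; real VoIP capture chains may use steeper designs):

```python
import math

def highpass_coeffs(fc, fs):
    """First-order Butterworth high-pass via the bilinear transform
    with frequency prewarping. Returns (b0, b1, a1) for
    y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    k = math.tan(math.pi * fc / fs)
    b0 = 1.0 / (1.0 + k)
    return b0, -b0, (k - 1.0) / (1.0 + k)

def highpass(x, fc, fs):
    """Apply the filter to a sequence of samples."""
    b0, b1, a1 = highpass_coeffs(fc, fs)
    y, x1, y1 = [], 0.0, 0.0
    for s in x:
        out = b0 * s + b1 * x1 - a1 * y1
        y.append(out)
        x1, y1 = s, out
    return y

# DC (0 Hz) is removed while the rest of the band passes through:
step = highpass([1.0] * 2000, 50.0, 48000.0)
print(abs(step[-1]) < 1e-3)  # True: a constant input decays toward zero
```

By construction the filter has a zero at DC and exactly unity gain at Nyquist, so it strips offsets and rumble without touching the speech band.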


----- Original Message -----
From: "Anisse Taleb" <***@huawei.com>
To: ***@ietf.org
Sent: Wednesday, April 13, 2011 12:32:00 AM
Subject: [codec] draft test and processing plan for the IETF Codec

Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it as a starting point for discussion where everyone is welcome to contribute in a constructive manner. Further updates are planned, but let's see first some initial comments.

The attachment is a pdf version, please let me know if you would like to see another format and I would be glad to oblige.

Comments and additions are welcome!

Kind regards,
/Anisse
Paul Coverdale
2011-04-16 11:42:06 UTC
Permalink
Hi Koen,

You mean that VoIP applications have no filtering at all, not even
anti-aliasing?

...Paul
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
Jean-Marc Valin
2011-04-16 13:56:07 UTC
Permalink
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
What Koen means (I assume!) is that there is no deliberate stop band
starting at 87.5% of the Nyquist rate (e.g. 7 kHz for wideband). These
days what you will see is either a sharp cutoff with "complete
attenuation" starting at 8 kHz -- just enough to avoid aliasing -- or
even using a -3 dB cutoff at 8 kHz and complete attenuation starting
just above that. While the latter introduces a tiny bit of aliasing, it
also produces the widest audio bandwidth possible and the aliasing is
likely to be lower than the coding noise at that frequency anyway.
Fortunately, I think systems with no anti-aliasing at all are pretty
rare (and they don't deserve our attention anyway!).

I have to admit that in Speex I tuned/trained everything assuming these
3.5/7 kHz cutoff frequencies and I now consider that this was an error
because very few of my users have such filtering.

Cheers,

Jean-Marc
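The cutoff conventions Jean-Marc describes can be made concrete with a quick FIR sketch (filter length and cutoffs are hypothetical, chosen only for illustration): a windowed-sinc low-pass with its edge at 87.5% of Nyquist (7 kHz for a 16 kHz wideband signal) leaves the top of the band unused, while still attenuating heavily at Nyquist.

```python
import math

def windowed_sinc_lowpass(fc, fs, ntaps=129):
    """Hamming-windowed sinc FIR low-pass; fc is roughly the -6 dB point."""
    m = ntaps - 1
    taps = []
    for n in range(ntaps):
        x = n - m / 2.0
        ideal = 2.0 * fc / fs if x == 0 else math.sin(2 * math.pi * fc / fs * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    return taps

def gain_db(taps, f, fs):
    """Magnitude response of the FIR filter at frequency f, in dB."""
    w = 2 * math.pi * f / fs
    re = sum(c * math.cos(w * n) for n, c in enumerate(taps))
    im = sum(c * math.sin(w * n) for n, c in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-300))

# "Classic" design: band edge at 87.5% of Nyquist (7 kHz for fs = 16 kHz).
classic = windowed_sinc_lowpass(7000, 16000)
print(gain_db(classic, 5000, 16000))  # in-band: close to 0 dB
print(gain_db(classic, 8000, 16000))  # at Nyquist: strongly attenuated
```

Pushing fc toward 8 kHz widens the usable band at the cost of a little aliasing near Nyquist, which is the trade-off described above.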
Post by Paul Coverdale
...Paul
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
David Virette
2011-04-18 20:41:44 UTC
Permalink
Hi Koen,
I agree that for narrowband, the 300-3400 Hz IRS filter is not used in VoIP
applications, and I think the 300-4000 Hz was a typo in the test plan; it
should have been 50-4000 Hz. This will be corrected in the next version. For
super-wideband, if I understood correctly, the input signal for Opus
will be sampled at 24 kHz, and I don't see the point of testing it against
super-wideband as defined in ITU-T. This comparison is part of the
requirements.
At the same time, as pointed out by Jean-Marc, for some bitrates it is
better to use a fullband mode rather than super-wideband. In that case, if
no fullband reference codec can operate at the same bitrate, I think that
the final comparison will be done against a super-wideband reference codec
operating at the same bitrate. As all the super-wideband and fullband codecs
will be tested in the same experiments, all these comparisons will be
possible.
Best regards,
David


-----Original Message-----
From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of
Koen Vos
Sent: samedi 16 avril 2011 07:04
To: Anisse Taleb
Cc: ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec

Hi Anisse,

I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.

However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.

Instead of band-pass filtering, tests on speech could use a simple high-pass

filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.

best,
koen.


----- Original Message -----
From: "Anisse Taleb" <***@huawei.com>
To: ***@ietf.org
Sent: Wednesday, April 13, 2011 12:32:00 AM
Subject: [codec] draft test and processing plan for the IETF Codec

Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it
as a starting point for discussion where everyone is welcome to contribute
in a constructive manner. Further updates are planned, but let's see first
some initial comments.

The attachment is a pdf version, please let me know if you would like to see
another format and I would be glad to oblige.

Comments and additions are welcome!

Kind regards,
/Anisse
Anisse Taleb
2011-04-19 00:16:49 UTC
Permalink
Dear Koen,

Let's mimic the real world as closely as possible :-).

Besides the DAC and AA filters, you cannot expect every loudspeaker, headset, earplug, or microphone out there to have a flat frequency response, regardless of whether it is a VoIP application or something else. Specifying a frequency mask helps reduce some of the variability and uncertainty due to these and other external factors.

That aside, I am not against revisiting these to arrive at something we can all agree on.

Kind regards,
/Anisse
-----Original Message-----
Sent: Saturday, April 16, 2011 7:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
----- Original Message -----
Sent: Wednesday, April 13, 2011 12:32:00 AM
Subject: [codec] draft test and processing plan for the IETF Codec
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it
as a starting point for discussion where everyone is welcome to contribute
in a constructive manner. Further updates are planned, but let's see first
some initial comments.
The attachment is a pdf version, please let me know if you would like to
see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Koen Vos
2011-04-16 20:06:47 UTC
Permalink
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.

Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.

I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).

best,
koen.


----- Original Message -----
From: "Paul Coverdale" <***@sympatico.ca>
To: "Koen Vos" <***@skype.net>, "Anisse Taleb" <***@huawei.com>
Cc: ***@ietf.org
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Hi Koen,

You mean that VoIP applications have no filtering at all, not even
anti-aliassing?

...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed
have
such a filter.
best,
koen.
Paul Coverdale
2011-04-17 01:25:04 UTC
Permalink
Hi Koen and Jean-Marc,

The filtering described in the test plan is not meant for anti-aliasing; it is there to establish a common bandwidth (and, in some cases, an equalization characteristic) for the audio chain (be it NB, WB, SWB) so that subjects can focus on comparing the distortion introduced by each of the codecs in the test, without confounding it with bandwidth effects.

Regards,

...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
Koen Vos
2011-04-17 05:44:29 UTC
Permalink
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
these reasons:

1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.

2. Band limiting the input hurts a codec's performance. In the Google
test for instance, Opus-***@20 kbps outperformed the LP7 anchor --
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.

3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.

4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.

I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.

best,
koen.


----- Original Message -----
From: "Paul Coverdale" <***@sympatico.ca>
To: "Koen Vos" <***@skype.net>
Cc: ***@ietf.org, "Anisse Taleb" <***@huawei.com>
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Hi Koen and Jean-Marc,

The filtering described in the test plan is not meant for anti-aliasing; it is there to establish a common bandwidth (and, in some cases, an equalization characteristic) for the audio chain (be it NB, WB, SWB) so that subjects can focus on comparing the distortion introduced by each of the codecs in the test, without confounding it with bandwidth effects.

Regards,

...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed
have
such a filter.
best,
koen.
Paul Coverdale
2011-04-18 00:40:33 UTC
Permalink
Hi Koen,

There's no doubt that increased audio bandwidth, other things being equal, enhances the perception of quality (well, up to the point where the input signal spectrum itself runs out of steam). I think where this discussion is going is that we need to be more precise in defining what we mean by "NB", "WB", "SWB" and "FB" if we want to make meaningful comparisons between codecs. In fact, the nominal -3 dB passband of G.722 is a minimum of 50 to 7000 Hz; you can go up to 8000 Hz and still meet the anti-aliasing requirement.

Regards,

...Paul
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-
aliassing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband
signals
Post by Paul Coverdale
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from
Post by Paul Coverdale
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
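For concreteness, here is a hypothetical sketch (my own, not part of the draft test plan) of the two pre-processing options being debated: the plan's band-limiting filters and the simple 50 Hz high-pass that Koen suggests instead. Only the cutoff frequencies and sampling rates come from the thread; the filter type and order are illustrative assumptions.

```python
# Sketch of the debated pre-processing, using Butterworth sections via scipy.
# Cutoffs are those quoted in the thread; 4th-order filters are an assumption.
import numpy as np
from scipy import signal

BANDS = {            # mode: (low cutoff Hz, high cutoff Hz, sampling rate Hz)
    "NB":  (300,  4000,  8000),
    "WB":  (50,   7000, 16000),
    "SWB": (50,  14000, 32000),
}

def band_limit(x, mode):
    """Apply the test plan's band-pass for the given mode."""
    lo, hi, fs = BANDS[mode]
    if hi >= fs / 2:
        # The NB band's 4000 Hz upper edge is Nyquist at 8 kHz, so only
        # the 300 Hz lower edge actually does any filtering.
        sos = signal.butter(4, lo, btype="highpass", fs=fs, output="sos")
    else:
        sos = signal.butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return signal.sosfilt(sos, x)

def voip_highpass(x, fs, cutoff=50.0):
    # Koen's alternative: a single ~50 Hz high-pass, as many VoIP apps use.
    sos = signal.butter(2, cutoff, btype="highpass", fs=fs, output="sos")
    return signal.sosfilt(sos, x)
```

A 1 kHz tone passes the WB band almost unchanged, while very low frequencies are strongly attenuated by the 50 Hz high-pass; this is the "bandwidth effect" Paul wants to equalize and Koen wants to keep.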
Anisse Taleb
2011-04-19 01:14:28 UTC
Permalink
Paul,

I have been involved in such a debate elsewhere, and I agree that the traditional definitions of bandwidth need some updating and dusting-off.

In 3GPP, these questions are somewhat related to the acoustic specifications of terminals. As such, operators require terminal manufacturers to report whether their terminal acoustics pass these specifications. This work is still ongoing, and there are still open discussions on these points.

There is however an industry-wide consensus about the sampling rates for the different bandwidths: NB (8 kHz), WB (16 kHz), SWB (32 kHz) and FB (48 kHz). When it comes to the exact definition of such bandwidths, the question is still debatable.

Essentially, the question boils down to fairness with respect to legacy codecs that were designed and tested under certain underlying bandwidth assumptions.

I personally have no strong objection to testing with a signal that spans the whole frequency range (starting at 20Hz for FB and 50Hz for WB, SWB).

Kind regards,
/Anisse
-----Original Message-----
Sent: Monday, April 18, 2011 2:41 AM
To: 'Koen Vos'
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
There's no doubt that increased audio bandwidth, other things being equal,
enhances the perception of quality (well, up to the point where the input
signal spectrum itself runs out of steam). I think where this discussion is
going is that we need to be more precise in defining what we mean by "NB",
"WB", "SWB" and "FB" if we want to make meaningful comparisons between
codecs. In fact, the nominal -3 dB passband bandwidth of G.722 is actually
a minimum of 50 to 7000 Hz; you can go up to 8000 Hz and still meet the
anti-aliasing requirement.
Regards,
...Paul
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-
aliassing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband
signals
Post by Paul Coverdale
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from
Post by Paul Coverdale
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
Anisse Taleb
2011-04-19 01:03:09 UTC
Permalink
Dear Koen,

Regarding point 3: these are quite interesting results. Just for my understanding, I was wondering why the confidence intervals for SILK-SWB were small in comparison with the other alternatives. My understanding of this experiment is that audio bandwidth is not the only factor affecting quality and call time. How did you isolate the other effects?

Do you have any more information about the experimental setup and the statistical analysis conducted to derive these results?

Kind regards,
/Anisse
-----Original Message-----
Sent: Sunday, April 17, 2011 7:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of http://www.ietf.org/proceedings/77/slides/codec-
3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-
aliassing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each of
the codecs in the test, without confounding it with bandwidth effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
Koen Vos
2011-04-18 07:27:55 UTC
Permalink
Hi Paul,
Post by Paul Coverdale
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.

I don't see what the name of a codec's mode has to do with meaningful
comparisons. It's the sampling rate that matters: what happens when a
VoIP application swaps one codec for another while leaving all else the
same. So where possible you want to compare codecs running at equal
sampling rates. That gives a clear grouping of codecs for 8, 16 and
48 kHz (some call these NB, WB and FB).

The open question is what to do in between 16 and 48 kHz. Opus accepts
24 kHz signals, other codecs use 32 kHz (and they all call it SWB).
Here you could either compare directly, which puts the 32 kHz codecs at
an advantage. Or you could run Opus in FB mode by upsampling the 32
kHz signal to 48 kHz, as Jean-Marc suggested for 32 and 64 kbps.

best,
koen.
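The sampling-rate comparison Koen describes (upsampling a 32 kHz SWB test signal so a 48 kHz fullband codec can process it) amounts to a rational 3/2 resampling. A hypothetical sketch, with illustrative names not taken from any test plan:

```python
# Upsample a 32 kHz super-wideband signal to 48 kHz so it can be fed to a
# fullband codec, as Jean-Marc suggested for the 32 and 64 kbps conditions.
import numpy as np
from scipy.signal import resample_poly

def to_fullband(x_32k):
    # 32 kHz -> 48 kHz is up=3, down=2; resample_poly applies its own
    # polyphase anti-imaging/anti-aliasing filter internally.
    return resample_poly(x_32k, up=3, down=2)
```

One second of 32 kHz audio (32000 samples) becomes one second of 48 kHz audio (48000 samples), with the passband preserved; the codecs under comparison then all see the same underlying audio content.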


----- Original Message -----
From: "Paul Coverdale" <***@sympatico.ca>
To: "Koen Vos" <***@skype.net>
Cc: ***@ietf.org, "Anisse Taleb" <***@huawei.com>
Sent: Sunday, April 17, 2011 5:40:33 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Hi Koen,

There's no doubt that increased audio bandwidth, other things being equal, enhances the perception of quality (well, up to the point where the input signal spectrum itself runs out of steam). I think where this discussion is going is that we need to be more precise in defining what we mean by "NB", "WB", "SWB" and "FB" if we want to make meaningful comparisons between codecs. In fact, the nominal -3 dB passband bandwidth of G.722 is actually a minimum of 50 to 7000 Hz; you can go up to 8000 Hz and still meet the anti-aliasing requirement.

Regards,

...Paul
Post by Paul Coverdale
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-
aliassing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband
signals
Post by Paul Coverdale
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from
Post by Paul Coverdale
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed
have
such a filter.
best,
koen.
Stephen Botzko
2011-04-18 11:18:05 UTC
Permalink
in-line
Stephen Botzko
Post by Koen Vos
Hi Paul,
Post by Paul Coverdale
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.
I think this might depend on what you want to learn from the test.

If you simply want to know which "sounds better" to the user, then perhaps
bandpass filtering gets in the way.

If you want to see whether there is an underlying difference in
intelligibility or user tolerance for the coding artifacts, then the
bandpass filtering might be useful, since it controls for the known
preference that users have for wider frequency response.
Post by Koen Vos
I don't see what the name of a codec's mode has to do with meaningful
comparisons. It's the sampling rate that matters: what happens when a
VoIP application swaps one codec for another while leaving all else the
same. So where possible you want to compare codecs running at equal
sampling rates. That gives a clear grouping of codecs for 8, 16 and
48 kHz (some call these NB, WB and FB).
The open question is what to do in between 16 and 48 kHz. Opus accepts
24 kHz signals, other codecs use 32 kHz (and they all call it SWB).
Here you could either compare directly, which puts the 32 kHz codecs at
an advantage. Or you could run Opus in FB mode by upsampling the 32
kHz signal to 48 kHz, as Jean-Marc suggested for 32 and 64 kbps.
best,
koen.
----- Original Message -----
Sent: Sunday, April 17, 2011 5:40:33 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
There's no doubt that increased audio bandwidth, other things being equal,
enhances the perception of quality (well, up to the point where the input
signal spectrum itself runs out of steam). I think where this discussion is
going is that we need to be more precise in defining what we mean by "NB",
"WB", "SWB" and "FB" if we want to make meaningful comparisons between
codecs. In fact, the nominal -3 dB passband bandwidth of G.722 is actually a
minimum of 50 to 7000 Hz, you can go up to 8000 Hz and still meet the
anti-aliassing requirement.
Regards,
...Paul
Post by Paul Coverdale
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-
aliassing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband
signals
Post by Paul Coverdale
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from
Post by Paul Coverdale
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
_______________________________________________
codec mailing list
https://www.ietf.org/mailman/listinfo/codec
Ron
2011-04-18 11:52:46 UTC
Permalink
Post by Stephen Botzko
in-line
Stephen Botzko
Post by Koen Vos
Hi Paul,
Post by Paul Coverdale
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.
I think this might depend on what you want to learn from the test.
If you simply want to know which "sounds better" to the user, then perhaps
bandpass filtering gets in the way.
If you want to see whether there is an underlying difference in
intelligibility or user tolerance for the coding artifacts, then the
bandpass filtering might be useful, since it controls for the known
preference that users have for wider frequency response.
I think we can safely put the latter into the "category 3. tests", of
things that would be quite interesting to know if someone has time to
collect the data, but that aren't at all required knowledge to publish
a proposed standard of what we have.

Cheers,
Ron
Stephen Botzko
2011-04-18 13:03:11 UTC
Permalink
in-line
Post by Ron
Post by Stephen Botzko
in-line
Stephen Botzko
Post by Koen Vos
Hi Paul,
Post by Paul Coverdale
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.
I think this might depend on what you want to learn from the test.
If you simply want to know which "sounds better" to the user, then
perhaps
Post by Stephen Botzko
bandpass filtering gets in the way.
If you want to see whether there is an underlying difference in
intelligibility or user tolerance for the coding artifacts, then the
bandpass filtering might be useful, since it controls for the known
preference that users have for wider frequency response.
I think we can safely put the latter into the "category 3. tests", of
things that would be quite interesting to know if someone has time to
collect the data, but that aren't at all required knowledge to publish
a proposed standard of what we have.
Generally speaking, I think it is more productive to separate the test plan
David Virette
HUAWEI TECHNOLOGIES CO., LTD.

Building C
Riesstrasse 25
80992 Munich, Germany
Tel: +49 89 158834 4148
Fax: +49 89 158834 4447
Mobile: +49 1622047469
E-mail: ***@huawei.com
www.huawei.com
----------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from
HUAWEI, which is intended only for the person or entity whose address is
listed above. Any use of the information contained herein in any way
(including, but not limited to, total or partial disclosure, reproduction,
or dissemination) by persons other than the intended recipient(s) is
prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

-----Original Message-----
From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of Jean-Marc Valin
Sent: vendredi 15 avril 2011 06:50
To: Jean-Marc Valin
Cc: ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec

So here's some more specific comments on actual bitrates:

1) For narrowband Speex, the rates currently listed are 8, 12, 16 kb/s.
Those should be changed to 8, 11, 15 kb/s to match the actual Speex
bitrates.

2) For iLBC, the rates currently listed are 8, 12, 16 kb/s. I think we
should only use 15.2 kb/s for iLBC. There's another rate, which is 13.33
kb/s, but that's for 30 ms frames so it's not very interesting.

3) For Speex wideband, the rates currently listed are 12, 24, 32 kb/s. I
think Speex wideband around 12 kb/s is just crap. Worth testing would be
20.6 and 27.8 kb/s.

4) For super-wideband Speex, I recommend just dumping that. This Speex
mode was a mistake right from the start and usually has worse quality
than wideband Speex.

Regarding super-wideband, one thing to keep in mind is that Opus defines
super-wideband as having a 12 kHz audio bandwidth (24 kHz sampling
rate). This makes comparisons with other codecs more difficult. The
rates currently listed for super-wideband are 24, 32, 64 kb/s. I
recommend running 24 kb/s in super-wideband and running 32 and 64 kb/s
in fullband mode (even if the input is a 32 kHz signal).

For the very low delay tests (10 ms frame size), I think all the listed
rates should be using fullband mode except the 32 kb/s.

That's it for now. Any thoughts?

Jean-Marc
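Jean-Marc's proposed corrections can be restated as data, so a test-condition matrix could be generated programmatically. This is a hypothetical restatement of the thread's proposal only, not the draft plan itself; the codec labels are my own shorthand.

```python
# Jean-Marc's proposed rates (kb/s) per codec mode, as data.
PROPOSED_RATES = {
    "speex-nb": [8, 11, 15],    # replaces the listed 8, 12, 16 kb/s
    "ilbc":     [15.2],         # 13.33 kb/s (30 ms frames) dropped
    "speex-wb": [20.6, 27.8],   # ~12 kb/s WB mode considered too poor to test
    # Speex SWB dropped entirely; Opus runs SWB only at 24 kb/s and
    # fullband at 32 and 64 kb/s, even for 32 kHz input.
    "opus-swb": [24],
    "opus-fb":  [32, 64],
}

def all_conditions():
    """Enumerate (codec, rate) test conditions from the rate table."""
    return [(codec, rate)
            for codec, rates in sorted(PROPOSED_RATES.items())
            for rate in rates]
```

Keeping the rates in one table makes it easy to audit that every listed rate actually exists in the corresponding codec, which is exactly the error Jean-Marc is flagging.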
Post by Jean-Marc Valin
Hi Anisse,
I gave some more thought on your proposed test plan and as Cullen
suggested, I think the main cause of disagreement is not that much on
the testing, but on the conditions for publishing (large number of BT,
NWT). Considering that ultimately, the decision to publish a spec is
always based on WG consensus, then I think that problem can be
completely bypassed. Once we make it up to the individuals to decide,
then we can focus on "simply" designing a good test.
Overall I thought the conditions you were proposing in section 2 were
pretty reasonable. There's a few details like selecting existing rates
for codecs like Speex and iLBC, but that should be easy to solve. Once
these are sorted out, interested parties (we had several hands raised =
in
Post by Jean-Marc Valin
the last meeting) can start testing and we then let each individual
decide on whether the codec is any good based on the results of the =
tests.
Post by Jean-Marc Valin
Sounds like a plan?
Jean-Marc
Post by Anisse Taleb
Hi,
Please find attached a first draft of a test plan of the IETF codec
(Opus).
The proposal does not claim to be complete, there are still many
missing things, e.g. tandeming cases, tests with delay jitter, dtx
etc. Consider it as a starting point for discussion where everyone is
welcome to contribute in a constructive manner. Further updates are
planned, but let's see first some initial comments.
The attachment is a pdf version, please let me know if you would like
to see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Koen Vos
2011-04-18 16:48:46 UTC
Permalink
Post by Stephen Botzko
If you simply want to know which "sounds better" to the user,
That's probably the best you can hope for yes.
Post by Stephen Botzko
then perhaps bandpass filtering gets in the way.
Correct.
Post by Stephen Botzko
If you want to see if there is an underlying difference in intelligibility
or user tolerance for the coding artifacts, then the bandpass filtering might be
useful, since it controls for the known preference that users have for wider
frequency response.
Sounds like an interesting academic study. You should also look into any
long-term health effects (so you can argue for a 5 year test plan!).

One thing we know for sure though: pre-distorting test signals creates a bias in the
results and thus invalidates any conclusion from the test.

best,
koen.


----- Original Message -----
From: "Stephen Botzko" <***@gmail.com>
To: "Koen Vos" <***@skype.net>
Cc: "Paul Coverdale" <***@sympatico.ca>, ***@ietf.org
Sent: Monday, April 18, 2011 4:18:05 AM
Subject: Re: [codec] draft test and processing plan for the IETF Codec

in-line
Stephen Botzko


On Mon, Apr 18, 2011 at 3:27 AM, Koen Vos < ***@skype.net > wrote:


Hi Paul,
Post by Stephen Botzko
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.


I think this might depend on what you want to learn from the test.

If you simply want to know which "sounds better" to the user, then perhaps bandpass filtering gets in the way.

If you want to see if there is an underlying difference in intelligibility or user tolerance for the coding artifacts, then the bandpass filtering might be useful, since it controls for the known preference that users have for wider frequency response.




I don't see what the name of a codec's mode has to do with meaningful
comparisons. It's the sampling rate that matters: what happens when a
VoIP application swaps one codec for another while leaving all else the
same. So where possible you want to compare codecs running at equal
sampling rates. That gives a clear grouping of codecs for 8, 16 and
48 kHz (some call these NB, WB and FB).

The open question is what to do in between 16 and 48 kHz. Opus accepts
24 kHz signals, other codecs use 32 kHz (and they all call it SWB).
Here you could either compare directly, which puts the 32 kHz codecs at
an advantage. Or you could run Opus in FB mode by upsampling the 32
kHz signal to 48 kHz, as Jean-Marc suggested for 32 and 64 kbps.
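The grouping described above can be written down as a small table (names and values are taken from the paragraph above; treat this as a sketch of the convention, not a normative definition):

```python
# Codec groupings by sampling rate, as discussed above. SWB is the
# ambiguous case: Opus accepts 24 kHz where other codecs use 32 kHz.
BAND_SAMPLE_RATES_HZ = {
    "NB": [8000],
    "WB": [16000],
    "SWB": [24000, 32000],
    "FB": [48000],
}

print(BAND_SAMPLE_RATES_HZ["SWB"])
```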


best,
koen.


----- Original Message -----
From: "Paul Coverdale" < ***@sympatico.ca >
To: "Koen Vos" < ***@skype.net >
Cc: ***@ietf.org , "Anisse Taleb" < ***@huawei.com >



Sent: Sunday, April 17, 2011 5:40:33 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Hi Koen,

There's no doubt that increased audio bandwidth, other things being equal, enhances the perception of quality (well, up to the point where the input signal spectrum itself runs out of steam). I think where this discussion is going is that we need to be more precise in defining what we mean by "NB", "WB", "SWB" and "FB" if we want to make meaningful comparisons between codecs. In fact, the nominal -3 dB passband bandwidth of G.722 is actually a minimum of 50 to 7000 Hz, you can go up to 8000 Hz and still meet the anti-aliasing requirement.

Regards,

...Paul
Post by Stephen Botzko
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf )
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for
anti-aliasing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband
signals
Post by Paul Coverdale
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from
Post by Paul Coverdale
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed
have
such a filter.
best,
koen.
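The 50 Hz high-pass suggested above could be, for example, a simple first-order filter. The sketch below is one possible, hypothetical implementation, not something taken from the test plan:

```python
import math

def highpass_50hz(x, fs=48000):
    """First-order high-pass with a ~50 Hz cutoff: removes DC and
    low-frequency rumble while leaving the rest of the band untouched."""
    rc = 1.0 / (2.0 * math.pi * 50.0)  # RC time constant for 50 Hz
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    y = [0.0] * len(x)
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y

# A constant (DC) offset in the input decays toward zero at the output.
```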
_______________________________________________
codec mailing list
***@ietf.org
https://www.ietf.org/mailman/listinfo/codec
Stephen Botzko
2011-04-18 17:34:22 UTC
Permalink
in-line
Post by Koen Vos
Post by Stephen Botzko
If you simply want to know which "sounds better" to the user,
That's probably the best you can hope for yes.
Post by Stephen Botzko
then perhaps bandpass filtering gets in the way.
Correct.
Post by Stephen Botzko
If you want to see if there is an underlying difference in
intelligibility
Post by Stephen Botzko
or user tolerance for the coding artifacts, then the bandpass filtering
might be
Post by Stephen Botzko
useful, since it controls for the known preference that users have for
wider
Post by Stephen Botzko
frequency response.
Sounds like an interesting academic study. You should also look into any
long-term health effects (so you can argue for a 5 year test plan!).
One thing we know for sure though: pre-distorting test signals creates a bias in the
results and thus invalidates any conclusion from the test.
I don't think this is particularly academic, such filtering seems to show up
in most test plans I've seen. I don't see how it "invalidates the
conclusion", as the input signal is the same for all codecs in any event.

Also, bandpass filtering is not really "pre-distorting".

Best,
Stephen Botzko
Post by Koen Vos
best,
koen.
------------------------------
*Sent: *Monday, April 18, 2011 4:18:05 AM
*Subject: *Re: [codec] draft test and processing plan for the IETF Codec
in-line
Stephen Botzko
Post by Stephen Botzko
Hi Paul,
Post by Paul Coverdale
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.
I think this might depend on what you want to learn from the test.
If you simply want to know which "sounds better" to the user, then perhaps
bandpass filtering gets in the way.
If you want to see if there is an underlying difference in
intelligibility or user tolerance for the coding artifacts, then the
bandpass filtering might be useful, since it controls for the known
preference that users have for wider frequency response.
Post by Stephen Botzko
I don't see what the name of a codec's mode has to do with meaningful
comparisons. It's the sampling rate that matters: what happens when a
VoIP application swaps one codec for another while leaving all else the
same. So where possible you want to compare codecs running at equal
sampling rates. That gives a clear grouping of codecs for 8, 16 and
48 kHz (some call these NB, WB and FB).
The open question is what to do in between 16 and 48 kHz. Opus accepts
24 kHz signals, other codecs use 32 kHz (and they all call it SWB).
Here you could either compare directly, which puts the 32 kHz codecs at
an advantage. Or you could run Opus in FB mode by upsampling the 32
kHz signal to 48 kHz, as Jean-Marc suggested for 32 and 64 kbps.
best,
koen.
----- Original Message -----
Sent: Sunday, April 17, 2011 5:40:33 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
There's no doubt that increased audio bandwidth, other things being equal,
enhances the perception of quality (well, up to the point where the input
signal spectrum itself runs out of steam). I think where this discussion is
going is that we need to be more precise in defining what we mean by "NB",
"WB", "SWB" and "FB" if we want to make meaningful comparisons between
codecs. In fact, the nominal -3 dB passband bandwidth of G.722 is actually a
minimum of 50 to 7000 Hz, you can go up to 8000 Hz and still meet the
anti-aliasing requirement.
Regards,
...Paul
Post by Paul Coverdale
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for
anti-aliasing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband
signals
Post by Paul Coverdale
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from
Post by Paul Coverdale
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
_______________________________________________
codec mailing list
https://www.ietf.org/mailman/listinfo/codec
Roman Shpount
2011-04-18 17:56:16 UTC
Permalink
Bandpass filtering skews the test results towards codecs which encode
mid-range frequencies better, versus codecs that provide good encoding across
the entire audio spectrum. If we are looking for realistic scenarios, you
would want bandpass filtering to simulate an audio signal from the PSTN,
which would be the normal signal for a gateway. If you are simulating the audio
signal in soft or IP phones, there is no reason for such a filter to be there,
since in this case you have access to audio from more or less the
complete spectrum. The input audio signal in such cases is, in fact, the
original, non-filtered audio that is typically used as test audio samples.
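For the PSTN-gateway case described above, the band limit could be approximated, purely as an illustration, by cascading first-order high- and low-pass sections around the classic 300-3400 Hz telephone band. A real gateway simulation would use standardized filter masks (such as the ITU-T modified IRS), not this sketch:

```python
import math

def bandpass_300_3400(x, fs=8000):
    """Rough PSTN-like band limit: first-order high-pass at 300 Hz
    cascaded with a first-order low-pass at 3400 Hz (illustrative only)."""
    rc_hp = 1.0 / (2.0 * math.pi * 300.0)
    rc_lp = 1.0 / (2.0 * math.pi * 3400.0)
    dt = 1.0 / fs
    a_hp = rc_hp / (rc_hp + dt)   # high-pass feedback coefficient
    a_lp = dt / (rc_lp + dt)      # low-pass smoothing coefficient
    hp = [0.0] * len(x)
    out = [0.0] * len(x)
    for n in range(1, len(x)):
        hp[n] = a_hp * (hp[n - 1] + x[n] - x[n - 1])        # reject lows
        out[n] = out[n - 1] + a_lp * (hp[n] - out[n - 1])   # reject highs
    return out
```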
_____________
Roman Shpount
Post by Stephen Botzko
in-line
Post by Koen Vos
Post by Stephen Botzko
If you simply want to know which "sounds better" to the user,
That's probably the best you can hope for yes.
Post by Stephen Botzko
then perhaps bandpass filtering gets in the way.
Correct.
Post by Stephen Botzko
If you want to see if there is an underlying difference in
intelligibility
Post by Stephen Botzko
or user tolerance for the coding artifacts, then the bandpass filtering
might be
Post by Stephen Botzko
useful, since it controls for the known preference that users have for
wider
Post by Stephen Botzko
frequency response.
Sounds like an interesting academic study. You should also look into any
long-term health effects (so you can argue for a 5 year test plan!).
One thing we know for sure though: pre-distorting test signals creates a bias in the
results and thus invalidates any conclusion from the test.
I don't think this is particularly academic, such filtering seems to show
up in most test plans I've seen. I don't see how it "invalidates the
conclusion", as the input signal is the same for all codecs in any event.
Also, bandpass filtering is not really "pre-distorting".
Best,
Stephen Botzko
Post by Koen Vos
best,
koen.
------------------------------
*Sent: *Monday, April 18, 2011 4:18:05 AM
*Subject: *Re: [codec] draft test and processing plan for the IETF Codec
in-line
Stephen Botzko
Post by Stephen Botzko
Hi Paul,
Post by Paul Coverdale
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.
I think this might depend on what you want to learn from the test.
If you simply want to know which "sounds better" to the user, then perhaps
bandpass filtering gets in the way.
If you want to see if there is an underlying difference in
intelligibility or user tolerance for the coding artifacts, then the
bandpass filtering might be useful, since it controls for the known
preference that users have for wider frequency response.
Post by Stephen Botzko
I don't see what the name of a codec's mode has to do with meaningful
comparisons. It's the sampling rate that matters: what happens when a
VoIP application swaps one codec for another while leaving all else the
same. So where possible you want to compare codecs running at equal
sampling rates. That gives a clear grouping of codecs for 8, 16 and
48 kHz (some call these NB, WB and FB).
The open question is what to do in between 16 and 48 kHz. Opus accepts
24 kHz signals, other codecs use 32 kHz (and they all call it SWB).
Here you could either compare directly, which puts the 32 kHz codecs at
an advantage. Or you could run Opus in FB mode by upsampling the 32
kHz signal to 48 kHz, as Jean-Marc suggested for 32 and 64 kbps.
best,
koen.
----- Original Message -----
Sent: Sunday, April 17, 2011 5:40:33 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
There's no doubt that increased audio bandwidth, other things being
equal, enhances the perception of quality (well, up to the point where the
input signal spectrum itself runs out of steam). I think where this
discussion is going is that we need to be more precise in defining what we
mean by "NB", "WB", "SWB" and "FB" if we want to make meaningful comparisons
between codecs. In fact, the nominal -3 dB passband bandwidth of G.722 is
actually a minimum of 50 to 7000 Hz, you can go up to 8000 Hz and still meet
the anti-aliasing requirement.
Regards,
...Paul
Post by Paul Coverdale
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for
anti-aliasing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Behalf
Post by Paul Coverdale
Post by Paul Coverdale
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband
signals
Post by Paul Coverdale
are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from
Post by Paul Coverdale
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which
degrade
Post by Paul Coverdale
quality and add complexity). So results will be more informative to
the
Post by Paul Coverdale
WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-
Post by Paul Coverdale
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
_______________________________________________
codec mailing list
https://www.ietf.org/mailman/listinfo/codec
Koen Vos
2011-04-18 18:13:47 UTC
Permalink
I don't see how it "invalidates the conclusion", as the input signal is the same
for all codecs in any event.
The input signal being the same doesn't preclude a bias. The bias comes from
the fact that the input signal is an artificial test signal designed to match the
response of ITU codecs.

As Paul said earlier: "There's no doubt that increased audio bandwidth, other
things being equal, enhances the perception of quality". Therefore, artificially
preventing some codecs to deliver the bandwidth they would in the real world
introduces a bias in the results.

And I don't see what conclusions to draw from biased results.
Also, bandpass filtering is not really "pre-distorting".
Why not? It creates spectral distortion in the signal before the encoder.
http://en.wikipedia.org/wiki/Distortion#Frequency_response_distortion

best,
koen.


----- Original Message -----
From: "Stephen Botzko" <***@gmail.com>
To: "Koen Vos" <***@skype.net>
Cc: "Paul Coverdale" <***@sympatico.ca>, ***@ietf.org
Sent: Monday, April 18, 2011 10:34:22 AM
Subject: Re: [codec] draft test and processing plan for the IETF Codec

in-line
If you simply want to know which "sounds better" to the user,
That's probably the best you can hope for yes.
then perhaps bandpass filtering gets in the way.
Correct.
If you want to see if there is an underlying difference in intelligibility
or user tolerance for the coding artifacts, then the bandpass filtering might be
useful, since it controls for the known preference that users have for wider
frequency response.
Sounds like an interesting academic study. You should also look into any
long-term health effects (so you can argue for a 5 year test plan!).

One thing we know for sure though: pre-distorting test signals creates a bias in the
results and thus invalidates any conclusion from the test.


I don't think this is particularly academic, such filtering seems to show up in most test plans I've seen. I don't see how it "invalidates the conclusion", as the input signal is the same for all codecs in any event.

Also, bandpass filtering is not really "pre-distorting".

Best,
Stephen Botzko





best,
koen.



From: "Stephen Botzko" < ***@gmail.com >

To: "Koen Vos" < ***@skype.net >
Cc: "Paul Coverdale" < ***@sympatico.ca >, ***@ietf.org
Sent: Monday, April 18, 2011 4:18:05 AM



Subject: Re: [codec] draft test and processing plan for the IETF Codec

in-line
Stephen Botzko


On Mon, Apr 18, 2011 at 3:27 AM, Koen Vos < ***@skype.net > wrote:


Hi Paul,
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.


I think this might depend on what you want to learn from the test.

If you simply want to know which "sounds better" to the user, then perhaps bandpass filtering gets in the way.

If you want to see if there is an underlying difference in intelligibility or user tolerance for the coding artifacts, then the bandpass filtering might be useful, since it controls for the known preference that users have for wider frequency response.




I don't see what the name of a codec's mode has to do with meaningful
comparisons. It's the sampling rate that matters: what happens when a
VoIP application swaps one codec for another while leaving all else the
same. So where possible you want to compare codecs running at equal
sampling rates. That gives a clear grouping of codecs for 8, 16 and
48 kHz (some call these NB, WB and FB).

The open question is what to do in between 16 and 48 kHz. Opus accepts
24 kHz signals, other codecs use 32 kHz (and they all call it SWB).
Here you could either compare directly, which puts the 32 kHz codecs at
an advantage. Or you could run Opus in FB mode by upsampling the 32
kHz signal to 48 kHz, as Jean-Marc suggested for 32 and 64 kbps.


best,
koen.


----- Original Message -----
From: "Paul Coverdale" < ***@sympatico.ca >
To: "Koen Vos" < ***@skype.net >
Cc: ***@ietf.org , "Anisse Taleb" < ***@huawei.com >



Sent: Sunday, April 17, 2011 5:40:33 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Hi Koen,

There's no doubt that increased audio bandwidth, other things being equal, enhances the perception of quality (well, up to the point where the input signal spectrum itself runs out of steam). I think where this discussion is going is that we need to be more precise in defining what we mean by "NB", "WB", "SWB" and "FB" if we want to make meaningful comparisons between codecs. In fact, the nominal -3 dB passband bandwidth of G.722 is actually a minimum of 50 to 7000 Hz, you can go up to 8000 Hz and still meet the anti-aliasing requirement.

Regards,

...Paul
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for the following reasons:
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google [
],
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf )
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-aliasing;
it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
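A high-pass of the kind suggested here is a one-liner in practice. A sketch (assuming scipy; the filter order and sampling rate are illustrative choices, not taken from the test plan):

```python
# Hypothetical ~50 Hz high-pass, used instead of a full band-pass.
# Assumes scipy; order 4 and fs = 48 kHz are illustrative.
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48000
sos = butter(4, 50, btype="highpass", fs=fs, output="sos")

t = np.arange(fs) / fs                 # one second
rumble = np.sin(2 * np.pi * 20 * t)    # below cutoff: strongly attenuated
speech = np.sin(2 * np.pi * 1000 * t)  # in-band: passes essentially intact

def rms(x):
    return np.sqrt(np.mean(x ** 2))

print(rms(sosfilt(sos, rumble)) / rms(rumble) < 0.1)   # True
print(rms(sosfilt(sos, speech)) / rms(speech) > 0.9)   # True
```

Unlike the band-pass in the plan, this leaves everything above 50 Hz untouched, so it does not cap a codec's deliverable bandwidth.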
best,
koen.
_______________________________________________
codec mailing list
***@ietf.org
https://www.ietf.org/mailman/listinfo/codec
Michael Ramalho (mramalho)
2011-04-18 19:07:24 UTC
Permalink
Post by Stephen Botzko
Also, bandpass filtering is not really "pre-distorting".
Why not? It creates spectral distortion in the signal before the encoder.
http://en.wikipedia.org/wiki/Distortion#Frequency_response_distortion

MAR: I agree that the adjective should be “BANDPASS filtering”, as amplitude attenuation (i.e., amplitude distortion) is by definition desired in the attenuation bands.



MAR: One is usually only concerned about “amplitude/frequency/phase distortion” in the passband. And for that reason it is sometimes desirable to have linear-phase filters with constant group delay.
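The linear-phase point is easy to check directly: a windowed-sinc FIR design has symmetric taps, which gives exactly constant group delay of (N-1)/2 samples. A small sketch (assuming scipy; the 300-3400 Hz band and tap count are illustrative, not from the test plan):

```python
# Hypothetical linear-phase FIR band-pass of the kind contrasted above with
# a sharp (non-linear-phase) IIR design. Assumes scipy; band edges and
# 101 taps are illustrative choices.
import numpy as np
from scipy.signal import firwin

fs = 8000                              # narrowband sampling rate
taps = firwin(101, [300, 3400], pass_zero=False, fs=fs)

# Symmetric impulse response <=> exactly linear phase, i.e. constant
# group delay of (len(taps) - 1) / 2 = 50 samples at all frequencies.
print(np.allclose(taps, taps[::-1]))   # True
```

A sharp elliptic or Chebyshev IIR of similar selectivity would be cheaper but would not have this symmetry, hence the phase distortion mentioned below.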



MAR: Granted, a reasonably sharp bandpass filter of the type likely desired for a test plan will likely not have linear phase … and thus will likely have some phase distortion.



MAR: However, the human ear is mostly insensitive to (reasonably small) phase distortion.



MAR: What type of “signal conditioning (pre) distortion” are you concerned about?



MAR: If you said the above in jest, I apologize for not seeing a smiley face.



MAR: Additionally, in practice you may not know what type of bandpass filtering is in use prior to the codec. For example, the wideband handsets for hardware IP phones may need to meet defined masks (e.g., tia810B).* Microphones also introduce non-flat passbands. By your definition, a lot of “pre-distortion” is present in the test signals as well.



Michael Ramalho



* I think Roman stated that there is no need for such filters in IP phones; I disagree with that statement as well. One usually has to employ specific filters to meet frequency-dependent input masks on such devices.



PS – I have a 24-bit recording system at home … so I don’t like distortions either.



From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of Koen Vos
Sent: Monday, April 18, 2011 2:14 PM
To: Stephen Botzko
Cc: ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Stephen Botzko
I don't see how it "invalidates the conclusion", as the input signal is the same
for all codecs in any event.
The input signal being the same doesn't preclude a bias. The bias comes from
the fact that the input signal is an artificial test signal designed to match the
response of ITU codecs.

As Paul said earlier: "There's no doubt that increased audio bandwidth, other
things being equal, enhances the perception of quality". Therefore, artificially
preventing some codecs from delivering the bandwidth they would in the real world
introduces a bias in the results.

And I don't see what conclusions to draw from biased results.
Post by Stephen Botzko
Also, bandpass filtering is not really "pre-distorting".
Why not? It creates spectral distortion in the signal before the encoder.
http://en.wikipedia.org/wiki/Distortion#Frequency_response_distortion

best,
koen.



________________________________

From: "Stephen Botzko" <***@gmail.com>
To: "Koen Vos" <***@skype.net>
Cc: "Paul Coverdale" <***@sympatico.ca>, ***@ietf.org
Sent: Monday, April 18, 2011 10:34:22 AM
Subject: Re: [codec] draft test and processing plan for the IETF Codec

in-line
Post by Stephen Botzko
If you simply want to know which "sounds better" to the user,
That's probably the best you can hope for yes.
Post by Stephen Botzko
then perhaps bandpass filtering gets in the way.
Correct.
Post by Stephen Botzko
If you want to see whether there is an underlying difference in intelligibility
or user tolerance for the coding artifacts, then the bandpass filtering might be
useful, since it controls for the known preference that users have for wider
frequency response.
Sounds like an interesting academic study. You should also look into any
long-term health effects (so you can argue for a 5 year test plan!).

One thing we know for sure though: pre-distorting test signals creates a bias in the
results and thus invalidates any conclusion from the test.


I don't think this is particularly academic, such filtering seems to show up in most test plans I've seen. I don't see how it "invalidates the conclusion", as the input signal is the same for all codecs in any event.

Also, bandpass filtering is not really "pre-distorting".

Best,
Stephen Botzko


best,
koen.




Roman Shpount
2011-04-18 19:27:34 UTC
Permalink
What I said was that in IP phones you normally deal with audio that is very
similar to the test audio samples before they have been passed through some type
of filter. There is quite a bit of filtering going on in the microphone,
DAC, and after the DAC to actually produce the audio, but all of those are
designed so that you end up with more or less the desired audio spectrum.
After this point you can either pass it through a bandpass filter if this is
required by the CODEC (and a lot of codecs have a filter as part of their
specification) or you can encode the audio directly (it is not uncommon to
have a 50-3900 Hz signal in 8 kHz PCMU audio from an IP phone). There is no
reason why an IP phone should have only 300 to 3400 Hz in the narrowband case,
or 50-7000 Hz in the wideband case. In fact, most IP phones have a
wider audio spectrum.
_____________
Roman Shpount


Koen Vos
2011-04-18 22:17:27 UTC
Permalink
Post by Michael Ramalho (mramalho)
What type of “signal conditioning (pre) distortion” are you concerned about?
Anisse's test plan would artificially reduce the bandwidth (i.e., pre-distort) of
input signals so that Opus will sound more muffled than in real-world
applications. ITU codecs always sound muffled, so there it won't make much
difference.
Post by Michael Ramalho (mramalho)
If you said the above in jest ,
No jest from my side! :-)
koen.


----- Original Message -----
From: "Michael Ramalho (mramalho)" <***@cisco.com>
To: "Koen Vos" <***@skype.net>, "Stephen Botzko" <***@gmail.com>
Cc: ***@ietf.org
Sent: Monday, April 18, 2011 12:07:24 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Post by Michael Ramalho (mramalho)
Also, bandpass filtering is not really "pre-distorting".
Why not? It creates spectral distortion in the signal before the encoder.
http://en.wikipedia.org/wiki/Distortion#Frequency_response_distortion

MAR: I agree that the adjective should be “BANDPASS filtering”; as amplitude attenuation (i.e., amplitude distortion) is by definition desired in the attenuation bands.



MAR: One is usually only concerned about “amplitude/frequency/phase distortion” in the passband. And for that reason it is sometimes desirous to have linear phase filters with constant group delay.



MAR: Granted, a reasonably sharp bandpass filter of the type likely desired for a test plan will likely not have linear phase 
 and thus will likely have some phase distortion.



MAR: However, the human ear is mostly insensitive to (reasonably small) phase distortion.



MAR: What type of “signal conditioning (pre) distortion” are you concerned about?



MAR: If you said the above in jest, I apologize for not seeing a smiley face.



MAR: Additionally, in practice you may not know what type of bandpass filtering is in use prior to the codec. For example, the wideband handsets for hardware IP phones may need to meet defined masks (e.g., tia810B).* Microphones also introduce non-flat passbands. By your definition is a lot of “pre-distortion” present in the test signals as well.



Michael Ramalho



* I think Roman stated that there is no need for such filters in IP phones, thus I disagree with that statement as well. One usually has to employ specific filters to meet frequency dependent input masks on such devices.



PS – I have a 24 bit recording system at home 
 so I don’t like distortions either.





From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of Koen Vos
Sent: Monday, April 18, 2011 2:14 PM
To: Stephen Botzko
Cc: ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Michael Ramalho (mramalho)
I don't see how it "invalidates the conclusion", as the input signal is the same
for all codecs in any event.
The input signal being the same doesn't preclude a bias. The bias comes from
the fact that the input signal is an artificial test signal designed to match the
response of ITU codecs.

As Paul said earlier: "There's no doubt that increased audio bandwidth, other
things being equal, enhances the perception of quality". Therefore, artificially
preventing some codecs to deliver the bandwidth they would in the real world
introduces a bias in the results.

And I don't see what conclusions to draw from biased results.
Post by Stephen Botzko
Also, bandpass filtering is not really "pre-distorting".
Why not? It creates spectral distortion in the signal before the encoder.
http://en.wikipedia.org/wiki/Distortion#Frequency_response_distortion

best,
koen.


----- Original Message -----


From: "Stephen Botzko" <***@gmail.com>
To: "Koen Vos" <***@skype.net>
Cc: "Paul Coverdale" <***@sympatico.ca>, ***@ietf.org
Sent: Monday, April 18, 2011 10:34:22 AM
Subject: Re: [codec] draft test and processing plan for the IETF Codec

in-line
Post by Stephen Botzko
If you simply want to know which "sounds better" to the user,
That's probably the best you can hope for yes.
Post by Stephen Botzko
then perhaps bandpass filtering gets in the way.
Correct.
Post by Stephen Botzko
If you want to see if there is an underlying difference in intelligibility
or user tolerance for the coding artifacts, then the bandpass filtering might be
useful, since it controls for the known preference that users have for wider
frequency response.
Sounds like an interesting academic study. You should also look into any
long-term health effects (so you can argue for a 5 year test plan!).

One thing we know for sure though: pre-distorting test signals creates a bias in the
results and thus invalidates any conclusion from the test.



I don't think this is particularly academic; such filtering seems to show up in most test plans I've seen. I don't see how it "invalidates the conclusion", as the input signal is the same for all codecs in any event.

Also, bandpass filtering is not really "pre-distorting".

Best,
Stephen Botzko






best,
koen.





From: "Stephen Botzko" < ***@gmail.com >



To: "Koen Vos" < ***@skype.net >

Cc: "Paul Coverdale" < ***@sympatico.ca >, ***@ietf.org
Sent: Monday, April 18, 2011 4:18:05 AM




Subject: Re: [codec] draft test and processing plan for the IETF Codec

in-line
Stephen Botzko


On Mon, Apr 18, 2011 at 3:27 AM, Koen Vos < ***@skype.net > wrote:

Hi Paul,
Post by Paul Coverdale
I think where this discussion is going is that we need to be more
precise in defining what we mean by "NB", "WB", "SWB" and "FB" if
we want to make meaningful comparisons between codecs.
The discussion so far was about whether to pre-distort test signals by
bandpass filtering.





I think this might depend on what you want to learn from the test.

If you simply want to know which "sounds better" to the user, then perhaps bandpass filtering gets in the way.

If you want to see if there is an underlying difference in intelligibility or user tolerance for the coding artifacts, then the bandpass filtering might be useful, since it controls for the known preference that users have for wider frequency response.






I don't see what the name of a codec's mode has to do with meaningful
comparisons. It's the sampling rate that matters: what happens when a
VoIP application swaps one codec for another while leaving all else the
same. So where possible you want to compare codecs running at equal
sampling rates. That gives a clear grouping of codecs for 8, 16 and
48 kHz (some call these NB, WB and FB).

The open question is what to do in between 16 and 48 kHz. Opus accepts
24 kHz signals, other codecs use 32 kHz (and they all call it SWB).
Here you could either compare directly, which puts the 32 kHz codecs at
an advantage. Or you could run Opus in FB mode by upsampling the 32
kHz signal to 48 kHz, as Jean-Marc suggested for 32 and 64 kbps.



best,
koen.
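As an aside, the 32 kHz -> 48 kHz upsampling Koen mentions is a 3/2 rate change. A toy sketch (linear interpolation only; a real test harness would use a proper polyphase resampler with anti-imaging filtering):

```python
# Toy 3/2 upsampler by linear interpolation, illustrating the
# 32 kHz -> 48 kHz conversion for running Opus in FB mode.
# Purely a sketch, NOT the filter a real harness would use.
def resample_linear(x, fs_in, fs_out):
    n_out = int(len(x) * fs_out / fs_in)
    y = []
    for i in range(n_out):
        t = i * fs_in / fs_out           # fractional position in the input
        k = int(t)
        frac = t - k
        nxt = x[k + 1] if k + 1 < len(x) else x[k]
        y.append(x[k] * (1.0 - frac) + nxt * frac)
    return y

sig = [float(i) for i in range(32)]      # a ramp, nominally at 32 kHz
out = resample_linear(sig, 32000, 48000)
print(len(out))                          # 48 samples: 3/2 the input length
```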


----- Original Message -----
From: "Paul Coverdale" < ***@sympatico.ca >
To: "Koen Vos" < ***@skype.net >
Cc: ***@ietf.org , "Anisse Taleb" < ***@huawei.com >



Sent: Sunday, April 17, 2011 5:40:33 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Hi Koen,

There's no doubt that increased audio bandwidth, other things being equal, enhances the perception of quality (well, up to the point where the input signal spectrum itself runs out of steam). I think where this discussion is going is that we need to be more precise in defining what we mean by "NB", "WB", "SWB" and "FB" if we want to make meaningful comparisons between codecs. In fact, the nominal -3 dB passband of G.722 is a minimum of 50 to 7000 Hz; you can go up to 8000 Hz and still meet the anti-aliasing requirement.

Regards,

...Paul
Post by Michael Ramalho (mramalho)
-----Original Message-----
Sent: Sunday, April 17, 2011 1:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of
http://www.ietf.org/proceedings/77/slides/codec-3.pdf )
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-aliasing; it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each
of the codecs in the test, without confounding it with bandwidth
effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals
are filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband
from 50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the
WG and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple
high-pass filter with a cutoff around 50 Hz, as many VoIP applications
do indeed have such a filter.
best,
koen.
_______________________________________________
codec mailing list
***@ietf.org
https://www.ietf.org/mailman/listinfo/codec
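For reference, a minimal sketch of the ~50 Hz high-pass Koen suggests. The one-pole topology and the 48 kHz rate are assumptions here; the thread only specifies the cutoff:

```python
from math import pi

# Minimal one-pole high-pass at ~50 Hz, sketching the filter suggested
# instead of bandpass filtering. One-pole topology and fs = 48 kHz are
# assumed; real VoIP stacks may well use higher-order IIR designs.
def highpass(x, fs=48000, fc=50.0):
    rc = 1.0 / (2.0 * pi * fc)
    a = rc / (rc + 1.0 / fs)             # pole coefficient for the cutoff
    y, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        prev_y = a * (prev_y + s - prev_x)
        prev_x = s
        y.append(prev_y)
    return y

# DC (0 Hz) is blocked: a constant input decays toward zero at the output.
out = highpass([1.0] * 4800)
```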
Paul Coverdale
2011-04-19 02:20:32 UTC
Permalink
As I mentioned earlier, the situation is not as bad as it may seem, certainly not a "1 in nonillion" chance of passing all requirements. Greg's analysis applies to flipping a coin that has a probability of .90 for "heads" and .10 for "tails." However, listener responses in a MOS test are not random - if they are, we throw that listener out of the results. The "randomness" that forms the basis of a statistical test derives from the distribution of responses ACROSS listeners rather than WITHIN listeners.

Regards,

...Paul
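Paul's ACROSS-listeners point can be made concrete with a small sketch (the scores below are made up, purely illustrative):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-listener mean MOS scores for one condition (made-up
# numbers). The randomness the statistical test rests on is the spread
# of these scores ACROSS listeners, not the variation within one listener.
listener_means = [3.8, 4.1, 3.5, 4.0, 3.9, 3.6, 4.2, 3.7, 3.9, 4.0]

m = mean(listener_means)
s = stdev(listener_means)                      # across-listener deviation
half = 1.96 * s / sqrt(len(listener_means))    # normal-approx 95% half-width
print(f"MOS = {m:.2f} +/- {half:.2f}")
```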
-----Original Message-----
Sent: Monday, April 18, 2011 6:37 PM
To: Jean-Marc Valin; Paul Coverdale
Subject: RE: [codec] draft test and processing plan for the IETF Codec
JM, Greg, Paul,
[taking emails in chronological order was ill advised :-)]
I do not disagree with the statistical pitfalls you mention. As Paul
stated and also what I wrote in a direct reply to this, there is no
single uber-requirement to be passed by the codec, rather a vector of
requirements that summarize the performance of the codec compared to
other codecs. These have to be analyzed and discussed one by one.
Kind regards,
/Anisse
-----Original Message-----
Of
Jean-Marc Valin
Sent: Thursday, April 14, 2011 3:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
I don't think the situation is as dire as you make out. Your analysis
assumes that all requirements are completely independent. This is not the
case, in many cases if you meet one requirement you are likely to meet
others of the same kind (eg performance as a function of bit rate).
But in any case, the statistical analysis procedure outlined in the test
plan doesn't assume that every requirement must be met with absolute
certainty, it allows for a confidence interval.
This is exactly what Greg is considering in his analysis. He's starting
from the assumption that the codec really meets *all* 162 requirements.
Consider just the NWT requirements: if we were truly no worse than the
reference codec, then with 87 tests against a 95% confidence interval, we
would be expected to fail about 4 tests just by random chance. Considering
both NWT and BT requirements, the odds of passing Anisse's proposed test
plan given the assumptions above are 4.1483e-33. See http://xkcd.com/882/
for a more rigorous analysis.
Cheers,
Jean-Marc
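Jean-Marc's arithmetic, under Greg's independence assumption, can be checked in a few lines:

```python
# Greg's argument in numbers, under the simplifying assumption of
# independent tests: even if the codec truly meets every NWT requirement,
# a 95% criterion fails ~5% of the time per test by chance alone. (The
# combined 4.1483e-33 figure also folds in the BT requirements, whose
# per-test pass probabilities are not given in this thread.)
n_tests = 87
p_pass = 0.95

expected_failures = n_tests * (1 - p_pass)   # about 4.35 expected failures
p_all_pass = p_pass ** n_tests               # about 0.0115: ~1 chance in 87

print(expected_failures, p_all_pass)
```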
Anisse Taleb
2011-04-19 03:38:46 UTC
Permalink
Paul,

Strictly speaking, the probability of failing at least one requirement increases (or stays constant, in dependent cases) as the number of requirements grows. Of course, as you mention, the responses of the listeners are not really random, and a heads-or-tails model of pass/fail is far too simplistic. I didn't look at the analysis in depth, nor did I verify the numbers.
I think Greg is mostly concerned with the size of the test and how the analysis of the requirements would be used to derive a conclusion about the codec.



[Off topic]
Statistics are quite fun to play with. A while ago a conspiracy theorist tried to convince me that man never landed on the moon and that it was impossible that Apollo 11 made it: the millions of components, given the technology of that time, each had a significant enough probability of failure that, in total, it was beyond doubt that such a rocket would have gone off course.

Kind regards,
/Anisse


From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of Paul Coverdale
Sent: Tuesday, April 19, 2011 4:21 AM
To: ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec

As I mentioned earlier, the situation is not as bad as it may seem, certainly not a "1 in nonillion" chance of passing all requirements. Greg's analysis applies to flipping a coin that has a probability of .90 for "heads" and .10 for "tails." However, listener responses in a MOS test are not random - if they are, we throw that listener out of the results. The "randomness" that forms the basis of a statistical test derives from the distribution of responses ACROSS listeners rather than WITHIN listeners.

Regards,

...Paul
Ron
2011-04-19 05:18:50 UTC
Permalink
Post by Anisse Taleb
Paul,
Strictly speaking, the probability of failing at least one requirement
increases (or stays constant in dependent cases) with increasing the number
of requirements. Of course as you mention, the responses of the listeners are
not really random and a heads-tail modeling of the pass-fail is way too
simplistic. I didn't look at the analysis in depth, neither did I verify the
numbers. I think Greg is mostly concerned with the size of the test and how
the analysis of the requirements would be used to derive a conclusion about
the codec.
I think he's worried, as are many others, that it's sufficiently
sophisticated as to be indistinguishable from a rigged demo.

I think we can be equally worried that you haven't done this analysis
and yet still assert that this test is somehow more valid than the ones
which have already been conducted.
Post by Anisse Taleb
[Off topic]
Statistics are quite fun to play with, a while ago a Conspiracy theorist
tried to convince me that man never landed on the moon and that it was
impossible that Apollo 11 made it. The millions of components, given the
technology of that time, had a significant probability of failure that in
total it was beyond doubt that such a rocket would have gone off course.
I guess you're also unaware of how many rockets they pranged before the
(g)odds did favour them with success, and how unrepeatable that success
subsequently proved to be. And that they also didn't have a worried
cartel shrilling OMG patent! at them at every step of the way.


So can we please return to the topic at hand?

Do you or do you not have a test plan that you wish to conduct before
we reach the WG milestone of assessing the results of the currently
frozen candidate codec? The clock is ticking, and you're wasting the
time that you do have remaining to perform that task and present its
results for the group to consider.

Cheers,
Ron
Monty Montgomery
2011-04-19 06:29:19 UTC
Permalink
Post by Paul Coverdale
As I mentioned earlier, the situation is not as bad as it may seem,
I questioned Greg on the statistical assumptions he was making, and
they were ~ sound. Conservative, in fact, along most axes.
Post by Paul Coverdale
certainly not a "1 in nonillion" chance of passing all requirements. Greg's
analysis applies to flipping a coin that has a probability of .90 for
"heads" and .10 for "tails." However, listener responses in a MOS test are
not random - if they are, we throw that listener out of the results.
...please be a great deal more descriptive/explicit on what you mean
here, and please don't dumb it down assuming a non-technical audience
without basic statistical analysis training. Because what you just
said above set off multiple alarm bells.

Probability distribution functions are probability distribution functions.

Monty
Xiph.Org
Anisse Taleb
2011-04-19 09:47:02 UTC
Permalink
Hi Monty,
Post by Monty Montgomery
Post by Paul Coverdale
As I mentioned earlier, the situation is not as bad as it may seem,
I questioned Greg on the statistical assumptions he was making, and
they were ~ sound. Conservative, in fact, along most axes.
I understand the point that Greg was making. If we want to continue discussing this, taking those arguments for granted, then please enlighten me: what are the statistical assumptions? Independence was surely assumed, otherwise the equation does not hold. And what are the axes you mention?
Post by Monty Montgomery
Post by Paul Coverdale
certainly not a "1 in nonillion" chance of passing all requirements.
Greg's
Post by Paul Coverdale
analysis applies to flipping a coin that has a probability of .90 for
"heads" and .10 for "tails." However, listener responses in a MOS test
are
Post by Paul Coverdale
not random - if they are, we throw that listener out of the results.
...please be a great deal more descriptive/explicit on what you mean
here, and please don't dumb it down assuming a non-technical audience
without basic statistical analysis training. Because what you just
said above set off multiple alarm bells.
Probability distribution functions are probability distrbution functions.
Personally I see no issue with Paul's description. My understanding here, which is also my understanding of Greg's assumptions, is that the outcome of testing a requirement is a coin flip, a.k.a. a Bernoulli random variable: a pass with probability p = 0.9 and a fail with probability 1 - p = 0.1 (I think the actual numbers were an example). Did I miss something?

Kind regards,
/Anisse
Koen Vos
2011-04-19 09:39:44 UTC
Permalink
Hi Anisse,

The reason that SILK-SWB has a much smaller confidence interval is simple: we had many more data points for that case (about a million, versus just a couple thousand for each of the other ones). This difference in numbers was caused by the fact that in the randomized testing we only allowed a small fraction of calls to switch to anything else than the default.

best,
koen.
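Koen's sample-size explanation is just the 1/sqrt(n) scaling of confidence intervals; a quick check (equal per-call variance is assumed):

```python
from math import sqrt

# Why SILK-SWB's confidence interval was so much tighter: the half-width
# of a confidence interval on a mean shrinks as 1/sqrt(n). With
# ~1,000,000 calls versus ~2,000 per alternative (and assuming comparable
# per-call variance), the interval comes out roughly 22x narrower.
n_swb, n_other = 1_000_000, 2_000
ratio = sqrt(n_swb / n_other)
print(round(ratio, 1))                    # about 22.4
```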


----- Original Message -----
From: "Anisse Taleb" <***@huawei.com>
To: "Koen Vos" <***@skype.net>, "Paul Coverdale" <***@sympatico.ca>
Cc: ***@ietf.org
Sent: Monday, April 18, 2011 6:03:09 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Dear Koen,

Regarding point 3: these are quite interesting results. Just for my understanding, I was wondering why the confidence intervals for SILK-SWB were so small in comparison with the other alternatives? My understanding of this experiment is that audio bandwidth is not the only factor affecting quality and call time. How did you isolate the other effects?

Do you have any more information about the experimental setup and the statistical analysis conducted to derive these results?

Kind regards,
/Anisse
Anisse Taleb
2011-04-19 09:58:31 UTC
Permalink
Dear Koen,
Thanks for the explanation. When it comes to the second part of my question, i.e. the interaction with other effects such as channel quality, call drops, and bandwidth, I was curious about the statistical methodology you used. Any info on that?

Kind regards,
/Anisse
-----Original Message-----
Sent: Tuesday, April 19, 2011 11:40 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
we had many more data points for that case (about a million, versus just a
couple thousand for each of the other ones). This difference in numbers
was caused by the fact that in the randomized testing we only allowed a
small fraction of calls to switch to anything else than the default.
best,
koen.
----- Original Message -----
Sent: Monday, April 18, 2011 6:03:09 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Dear Koen,
Regarding point 3. This is quite interesting results, just for my
understanding, I was wondering why the confidence intervals for SILK-SWB
were small in comparison with the other alternatives? My understanding of
this experiment is that the audio bandwidth is not the only factor
affecting quality and call time. How did you isolate the other effect?
Do you have any more information about the experimental setup and
statistical analysis conducted to derive these results.
Kind regards,
/Anisse
-----Original Message-----
Sent: Sunday, April 17, 2011 7:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of http://www.ietf.org/proceedings/77/slides/codec-
3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-
aliassing, it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each of
the codecs in the test, without confounding it with bandwidth effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliassing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
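A cutoff around 50 Hz of the kind described above can be realized with a first-order high-pass; the sketch below is a minimal illustration (the sample rate and the particular difference-equation form are assumptions, not taken from any specific VoIP implementation):

```python
import math

def highpass_50hz(samples, fs=48000.0, fc=50.0):
    """First-order (one pole, one zero) high-pass with cutoff fc:
    y[n] = a * (y[n-1] + x[n] - x[n-1]),  a = 1 / (1 + 2*pi*fc/fs).
    Removes DC and low-frequency rumble while leaving the voice band
    essentially untouched."""
    a = 1.0 / (1.0 + 2.0 * math.pi * fc / fs)
    y, y_prev, x_prev = [], 0.0, 0.0
    for x in samples:
        y_prev = a * (y_prev + x - x_prev)
        x_prev = x
        y.append(y_prev)
    return y
```

A constant (DC) input decays to zero, while a 1 kHz tone passes with near-unity gain.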
Koen Vos
2011-04-19 21:19:19 UTC
Permalink
Hi Anisse,

Yes for sure, we carefully filtered out anything that could interfere with the test. So we kept only good network channels and so on. You know, regular rigorous testing, but without the intentional biases :P.

The most amazing thing was: coinciding with the randomized testing we saw an increase in bug reports about audio quality... people complaining about muffled audio, etc.

best,
koen.


----- Original Message -----
From: "Anisse Taleb" <***@huawei.com>
To: "Koen Vos" <***@skype.net>
Cc: ***@ietf.org, "Paul Coverdale" <***@sympatico.ca>
Sent: Tuesday, April 19, 2011 11:58:31 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Dear Koen,
Thanks for the explanation. When it comes to the second part of my question, i.e. interaction with other effects, such as channel quality, call drops, bandwidth, I was curious about the statistical methodology you used. Any info on that?

Kind regards,
/Anisse
-----Original Message-----
Sent: Tuesday, April 19, 2011 11:40 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
The reason that SILK-SWB has a much smaller confidence interval is that
we had many more data points for that case (about a million, versus just a
couple thousand for each of the other ones). This difference in numbers
was caused by the fact that in the randomized testing we only allowed a
small fraction of calls to switch to anything other than the default.
best,
koen.
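The sample-size effect described above is just the 1/sqrt(n) scaling of a mean's standard error; a toy calculation (the standard deviation and sample sizes are illustrative assumptions, not the actual call data):

```python
import math

def ci_halfwidth(std_dev, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean:
    z * sigma / sqrt(n)."""
    return z * std_dev / math.sqrt(n)

# Same per-call rating spread, very different sample sizes:
default_condition = ci_halfwidth(std_dev=1.0, n=1_000_000)
other_condition = ci_halfwidth(std_dev=1.0, n=2_000)
# The million-call condition's interval is sqrt(500), i.e. about 22x, narrower.
```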
----- Original Message -----
Sent: Monday, April 18, 2011 6:03:09 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Dear Koen,
Regarding point 3. These are quite interesting results; just for my
understanding, I was wondering why the confidence intervals for SILK-SWB
were small in comparison with the other alternatives? My understanding of
this experiment is that the audio bandwidth is not the only factor
affecting quality and call time. How did you isolate the other effects?
Do you have any more information about the experimental setup and
statistical analysis conducted to derive these results?
Kind regards,
/Anisse
-----Original Message-----
Sent: Sunday, April 17, 2011 7:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-aliasing,
it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each of
the codecs in the test, without confounding it with bandwidth effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
...Paul
Post by Paul Coverdale
-----Original Message-----
Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
Anisse Taleb
2011-04-20 12:57:12 UTC
Permalink
Hi Koen,

The reason for me asking is that the ACD numbers (regardless of which codec is used) you report seem to be quite high compared to what is reported in (*). It could be, as you mention, that by filtering out the bad channel conditions the overall average goes up. Are your results published somewhere else, with the detailed experimental setup and filtering procedure?

(*) “An Experimental Study of the Skype Peer-to-Peer VoIP System,” by S. Guha, N. Daswani and R. Jain.

Kind regards,
/Anisse
-----Original Message-----
Sent: Tuesday, April 19, 2011 11:19 PM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
Yes for sure, we carefully filtered out anything that could interfere with
the test. So we kept only good network channels and so on. You know,
regular rigorous testing, but without the intentional biases :P.
The most amazing thing was: coinciding with the randomized testing we saw
an increase in bug reports about audio quality.. People complaining about
muffled audio etc.
best,
koen.
----- Original Message -----
Sent: Tuesday, April 19, 2011 11:58:31 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Dear Koen,
Thanks for the explanation. When it comes to the second part of my
question, i.e. interaction with other effects, such as channel quality,
call drops, bandwidth, I was curious about the statistical methodology you
used. Any info on that?
Kind regards,
/Anisse
-----Original Message-----
Sent: Tuesday, April 19, 2011 11:40 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
The reason that SILK-SWB has a much smaller confidence interval is that
we had many more data points for that case (about a million, versus just a
couple thousand for each of the other ones). This difference in numbers
was caused by the fact that in the randomized testing we only allowed a
small fraction of calls to switch to anything other than the default.
best,
koen.
----- Original Message -----
Sent: Monday, April 18, 2011 6:03:09 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Dear Koen,
Regarding point 3. These are quite interesting results; just for my
understanding, I was wondering why the confidence intervals for SILK-SWB
were small in comparison with the other alternatives? My understanding of
this experiment is that the audio bandwidth is not the only factor
affecting quality and call time. How did you isolate the other effects?
Do you have any more information about the experimental setup and
statistical analysis conducted to derive these results?
Kind regards,
/Anisse
-----Original Message-----
Sent: Sunday, April 17, 2011 7:44 AM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Paul,
The filtering described in the test plan [..] is there to establish
a common bandwidth (and equalization characteristic in some cases)
for the audio chain (be it NB, WB, SWB) so that subjects can focus
on comparing the distortion introduced by each of the codecs in the
test, without confounding it with bandwidth effects.
I believe it would be a mistake to test with band-limited signals, for
1. Band-limited test signals are atypical of real-world usage. People
in this WG have always emphasized that we should test with realistic
scenarios (like network traces for packet loss), and the proposal goes
against that philosophy.
2. Band limiting the input hurts a codec's performance. In the Google
surely that wouldn't happen if Opus ran on an LP7 signal. That makes
the proposed testing procedure less relevant for deciding whether this
codec will be of value on the Internet.
3. Audio bandwidth matters to end users. Real-life experiments show
that codecs with more bandwidth boost user ratings and call durations.
(E.g. see slides 2, 3 of http://www.ietf.org/proceedings/77/slides/codec-3.pdf)
So if a codec scores higher "just" because it encodes more bandwidth,
that's still a real benefit to users. And the testing procedure
proposed already reduces the impact of differing bandwidths, by using
MOS scores without pairwise comparisons.
4. Testing with band-limited signals risks perpetuating crippled codec
design. In order to do well in the tests, a codec designer would be
"wise" to downsample the input or otherwise optimize towards the
artificial test signals. This actually lowers the performance for
real-world signals, and usually adds complexity. And as long as
people design codecs with a band-limited response, they'll argue to
test with one as well. Let's break this circle.
I also found it interesting how the chosen bandwidths magically match
those of ITU standards, while potentially hurting Opus. For instance,
Opus-SWB has only 12 kHz bandwidth, but would still be tested with a
14 kHz signal.
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 6:25:04 PM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen and Jean-Marc,
The filtering described in the test plan is not meant to be for anti-aliasing,
it is there to establish a common bandwidth (and equalization
characteristic in some cases) for the audio chain (be it NB, WB, SWB) so
that subjects can focus on comparing the distortion introduced by each of
the codecs in the test, without confounding it with bandwidth effects.
Regards,
...Paul
-----Original Message-----
Sent: Saturday, April 16, 2011 4:07 PM
To: Paul Coverdale
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Post by Paul Coverdale
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
The bandpass filter in the test plan runs on the downsampled signal,
so it's not an anti-aliasing filter.
Also, the plan's bandpass for narrowband goes all the way up to Nyquist
(4000 Hz), whereas for wideband it goes only to 7000 Hz. So if the
bandpass filters were to somehow deal with aliasing, they are not being
used consistently.
I presume the resamplers in the plan use proper anti-aliasing filters
representative of those in VoIP applications (and described in
Jean-Marc's post).
best,
koen.
----- Original Message -----
Sent: Saturday, April 16, 2011 4:42:06 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Hi Koen,
You mean that VoIP applications have no filtering at all, not even
anti-aliasing?
...Paul
Post by Paul Coverdale
-----Original Message-----
On Behalf Of Koen Vos
Sent: Saturday, April 16, 2011 1:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
Koen Vos
2011-04-19 21:31:04 UTC
Permalink
Hi Anisse,
Post by Anisse Taleb
Besides the DAC and AA filters, you cannot expect every loudspeaker,
headset, earplug, or microphone out there to have a flat frequency response,
regardless of whether it is a VoIP application or something else. Specifying a
frequency mask helps in reducing some of the variability and uncertainty
due to these and other external factors.
Most of your external factors depend on the listening setup, but are otherwise
constant over all tests. So how can you possibly mitigate the "variability
and uncertainty" with frequency masks that are _constant_ over all setups, and
which (for unclear reasons) depend on the (bandwidth of the) listening
test?

best,
koen.


----- Original Message -----
From: "Anisse Taleb" <***@huawei.com>
To: "Koen Vos" <***@skype.net>
Cc: ***@ietf.org
Sent: Tuesday, April 19, 2011 2:16:49 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec

Dear Koen,

Let's mimic the real world as closely as possible :-).

Besides the DAC and AA filters, you cannot expect every loudspeaker, headset, earplug, or microphone out there to have a flat frequency response, regardless of whether it is a VoIP application or something else. Specifying a frequency mask helps in reducing some of the variability and uncertainty due to these and other external factors.

That aside, I am not against revisiting these in getting something that all agree on.

Kind regards,
/Anisse
Post by Anisse Taleb
-----Original Message-----
Sent: Saturday, April 16, 2011 7:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-
pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
----- Original Message -----
Sent: Wednesday, April 13, 2011 12:32:00 AM
Subject: [codec] draft test and processing plan for the IETF Codec
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it
as a starting point for discussion where everyone is welcome to contribute
in a constructive manner. Further updates are planned, but let's see first
some initial comments.
The attachment is a pdf version, please let me know if you would like to
see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Anisse Taleb
2011-04-20 11:11:55 UTC
Permalink
Dear Koen,

Several organizations do define send and receive frequency masks for terminals; these are requirements that have to be fulfilled by terminal manufacturers. Given these masks, one can only assume that the audio signal will be picked up and reproduced within these defined ranges.

I agree that outside what is defined and required by certain organizations, it is pretty much a wild jungle.

Kind regards,
/Anisse
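A terminal frequency mask of the kind Anisse mentions can be checked mechanically; the sketch below uses made-up placeholder limits (the numbers are invented for illustration and are not from TIA, ITU-T, or any other standard):

```python
# Placeholder mask rows: (frequency_hz, lower_limit_db, upper_limit_db).
# These numbers are invented for illustration only, not from any standard.
MASK = [(100, -10.0, 3.0), (300, -3.0, 3.0), (1000, -3.0, 3.0),
        (3000, -3.0, 3.0), (3400, -10.0, 3.0)]

def mask_violations(response_db, mask=MASK):
    """response_db maps frequency (Hz) to measured level (dB, relative to
    1 kHz). Returns the frequencies where the response falls outside the
    mask; an empty list means the terminal passes."""
    return [f for f, lo, hi in mask
            if not (lo <= response_db.get(f, float("-inf")) <= hi)]
```

A flat response passes, while a response with a 20 dB notch at 1 kHz is flagged at that frequency.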
-----Original Message-----
Sent: Tuesday, April 19, 2011 11:31 PM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
Post by Anisse Taleb
Besides the DAC and AA filters, you cannot expect every loudspeaker,
headset, earplug, or microphone out there to have a flat frequency response,
regardless of whether it is a VoIP application or something else. Specifying a
frequency mask helps in reducing some of the variability and uncertainty
due to these and other external factors.
Most of your external factors depend on the listening setup, but are otherwise
constant over all tests. So how can you possibly mitigate the "variability
and uncertainty" with frequency masks that are _constant_ over all setups, and
which -for unclear reasons- depend on the (bandwidth of the) listening
test?
best,
koen.
----- Original Message -----
Sent: Tuesday, April 19, 2011 2:16:49 AM
Subject: RE: [codec] draft test and processing plan for the IETF Codec
Dear Koen,
Let's mimic the real world as closely as possible :-).
Besides the DAC and AA filters, you cannot expect every loudspeaker,
headset, earplug, or microphone out there to have a flat frequency response,
regardless of whether it is a VoIP application or something else. Specifying a
frequency mask helps in reducing some of the variability and uncertainty
due to these and other external factors.
That aside, I am not against revisiting these in getting something that all agree on.
Kind regards,
/Anisse
Post by Anisse Taleb
-----Original Message-----
Sent: Saturday, April 16, 2011 7:04 AM
To: Anisse Taleb
Subject: Re: [codec] draft test and processing plan for the IETF Codec
Hi Anisse,
I noticed your plan tests with band-limited signals: Narrowband signals are
filtered from 300-4000 Hz, Wideband from 50-7000 Hz, Superwideband from
50-14000 Hz.
However, VoIP applications have no such band-pass filters (which degrade
quality and add complexity). So results will be more informative to the WG
and potential adopters of the codec if the testing avoids band-pass
filtering as well. We want test conditions to mimic the real world as
closely as possible.
Instead of band-pass filtering, tests on speech could use a simple high-pass
filter with a cutoff around 50 Hz, as many VoIP applications do indeed have
such a filter.
best,
koen.
----- Original Message -----
Sent: Wednesday, April 13, 2011 12:32:00 AM
Subject: [codec] draft test and processing plan for the IETF Codec
Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it
as a starting point for discussion where everyone is welcome to contribute
in a constructive manner. Further updates are planned, but let's see first
some initial comments.
The attachment is a pdf version, please let me know if you would like to
see another format and I would be glad to oblige.
Comments and additions are welcome!
Kind regards,
/Anisse
Christian Hoene
2014-03-25 09:32:00 UTC
Permalink
Hello Anisse Taleb,

thanks for the test plan. Who is willing to do those tests?
Complexity is a big issue, too.

With best regards,

Christian Hoene


-----Original Message-----
From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of
Anisse Taleb
Sent: Wednesday, 13 April 2011 09:32
To: ***@ietf.org
Subject: [codec] draft test and processing plan for the IETF Codec

Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it
as a starting point for discussion where everyone is welcome to contribute
in a constructive manner. Further updates are planned, but let's see first
some initial comments.

The attachment is a pdf version, please let me know if you would like to see
another format and I would be glad to oblige.

Comments and additions are welcome!

Kind regards,
/Anisse
Christian Hoene
2014-03-25 12:42:37 UTC
Permalink
Sorry, please ignore my email. I did not read the year.



-----Original Message-----
From: codec [mailto:codec-***@ietf.org] On Behalf Of Christian Hoene
Sent: Tuesday, 25 March 2014 10:32
To: 'Anisse Taleb'
Cc: ***@ietf.org
Subject: Re: [codec] draft test and processing plan for the IETF Codec

Hello Anisse Taleb,

thanks for the test plan. Who is willing to do those tests?
Complexity is a big issue, too.

With best regards,

Christian Hoene


-----Original Message-----
From: codec-***@ietf.org [mailto:codec-***@ietf.org] On Behalf Of
Anisse Taleb
Sent: Wednesday, 13 April 2011 09:32
To: ***@ietf.org
Subject: [codec] draft test and processing plan for the IETF Codec

Hi,
Please find attached a first draft of a test plan of the IETF codec (Opus).
The proposal does not claim to be complete, there are still many missing
things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider it
as a starting point for discussion where everyone is welcome to contribute
in a constructive manner. Further updates are planned, but let's see first
some initial comments.

The attachment is a pdf version, please let me know if you would like to see
another format and I would be glad to oblige.

Comments and additions are welcome!

Kind regards,
/Anisse