http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&feed=atom&action=history
Feature Extraction for Speech Recognition (1S) - Revision history
2024-03-28T18:55:54Z
Revision history for this page on the wiki
MediaWiki 1.28.0
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=8343&oldid=prev
Cioflanc: /* Status: Completed */
2022-11-14T09:00:51Z
<p><span dir="auto"><span class="autocomment">Status: Completed</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 09:00, 14 November 2022</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8" >Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* Professor: Prof. Dr. L. Benini</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* Professor: Prof. Dr. L. Benini</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* Supervisors:</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* Supervisors:</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>** Cristian Cioflan (IIS): [mailto:cioflanc@iis.ee.ethz.ch cioflanc@iis.ee.ethz.ch]</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>** <ins class="diffchange diffchange-inline">[[:User:Cioflanc| </ins>Cristian Cioflan<ins class="diffchange diffchange-inline">]] </ins>(IIS): [mailto:cioflanc@iis.ee.ethz.ch cioflanc@iis.ee.ethz.ch]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>** Dr. Lukas Cavigelli (Huawei Technologies): [mailto:lukas.cavigelli@huawei.com lukas.cavigelli@huawei.com]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>** Dr. Lukas Cavigelli (Huawei Technologies): [mailto:lukas.cavigelli@huawei.com lukas.cavigelli@huawei.com]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l22" >Line 22:</td>
<td colspan="2" class="diff-lineno">Line 22:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Digital]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Digital]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Cioflanc]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Cioflanc]]</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;"></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Short Description =</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Short Description =</div></td></tr>
</table>
Cioflanc
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=8233&oldid=prev
Cioflanc at 12:23, 27 October 2022
2022-10-27T12:23:20Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 12:23, 27 October 2022</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l19" >Line 19:</td>
<td colspan="2" class="diff-lineno">Line 19:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Hot]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Hot]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Deep Learning Projects]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Deep Learning Projects]]</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">[[Category:Completed]]</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Digital]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Digital]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Cioflanc]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Cioflanc]]</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">[[Category:Completed]]</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Short Description =</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Short Description =</div></td></tr>
</table>
Cioflanc
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=8230&oldid=prev
Cioflanc at 12:19, 27 October 2022
2022-10-27T12:19:19Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 12:19, 27 October 2022</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3" >Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Overview =</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Overview =</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>== Status: <del class="diffchange diffchange-inline">Available </del>==</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>== Status: <ins class="diffchange diffchange-inline">Completed </ins>==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* Type: Semester Thesis</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* Type: Semester Thesis</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l21" >Line 21:</td>
<td colspan="2" class="diff-lineno">Line 21:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Digital]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Digital]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Cioflanc]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Cioflanc]]</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[Category:<del class="diffchange diffchange-inline">Available</del>]]</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:<ins class="diffchange diffchange-inline">Completed</ins>]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Short Description =</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>= Short Description =</div></td></tr>
</table>
Cioflanc
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=7030&oldid=prev
Cioflanc at 11:30, 27 September 2021
2021-09-27T11:30:09Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:30, 27 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l31" >Line 31:</td>
<td colspan="2" class="diff-lineno">Line 31:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction represents the process of deriving essential, non-redundant information from a set of measured data. Through the selection and/or combination of input variables into features, the dimension of the given data can be largely reduced, thus also reducing the computational effort within the processing pipeline. Nonetheless, such techniques should not affect the integrity of the data; the extracted features should still accurately describe the data set. Although the two terms are often used interchangeably, we differentiate between feature (pre)processing and feature extraction, defining the former as the techniques applied to alter the data in order to emphasize or remove certain characteristics. To better understand the concept of feature extraction, we will use the Mel-Frequency Cepstral Coefficients (MFCC) as an example.   </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction represents the process of deriving essential, non-redundant information from a set of measured data. Through the selection and/or combination of input variables into features, the dimension of the given data can be largely reduced, thus also reducing the computational effort within the processing pipeline. Nonetheless, such techniques should not affect the integrity of the data; the extracted features should still accurately describe the data set. 
Although the two terms are often used interchangeably, we differentiate between feature (pre)processing and feature extraction, defining the former as the techniques applied to alter the data in order to emphasize or remove certain characteristics. To better understand the concept of feature extraction, we will use the Mel-Frequency Cepstral Coefficients (MFCC) as an example.   </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Extracted from an audio signal received as input by the system, the MFCCs are one of the current standards in KWS [[#ref-berg2021|&#91;2&#93;]] [[#ref-majumdar2020|&#91;6&#93;]]. They are a cepstral representation of the signal, but, compared to the simple cepstrum, the MFCC use equally spaced frequency bands, based on the Mel scale. This leads to a closer representation of the signal to the actual response of the human auditory system. The derivation techniques are described in detail in [[#ref-lyonmfcc|&#91;5&#93;]]; we will enumerate the main steps of <del class="diffchange diffchange-inline">PyTorch’s1 </del>MFCC computation:   </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Extracted from an audio signal received as input by the system, the MFCCs are one of the current standards in KWS [[#ref-berg2021|&#91;2&#93;]] [[#ref-majumdar2020|&#91;6&#93;]]. They are a cepstral representation of the signal, but, compared to the simple cepstrum, the MFCC use equally spaced frequency bands, based on the Mel scale. This leads to a closer representation of the signal to the actual response of the human auditory system. The derivation techniques are described in detail in [[#ref-lyonmfcc|&#91;5&#93;]]; we will enumerate the main steps of <ins class="diffchange diffchange-inline">[https://pytorch.org/audio/stable/_modules/torchaudio/transforms.html#MFCC PyTorch’s] </ins>MFCC computation:   </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><ul></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><ul></div></td></tr>
</table>
Cioflanc
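The MFCC derivation referenced in the revision above follows a standard pipeline: framing with a windowed short-time Fourier transform, projection onto a mel-scale filterbank, log compression, and a DCT keeping the first few coefficients. As an illustrative sketch only, using plain NumPy with hypothetical parameter values (not PyTorch's actual defaults), the steps might look like:

```python
import numpy as np

def mfcc_sketch(signal, sr=16000, n_fft=400, hop=160, n_mels=23, n_mfcc=13):
    """Illustrative MFCC pipeline; all parameter values are assumptions."""
    # 1. Framing + windowed STFT -> power spectrogram
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # 2. Triangular filterbank with filters equally spaced on the mel scale
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    mel_energies = power @ fbank.T

    # 3. Log compression (small epsilon avoids log(0))
    log_mel = np.log(mel_energies + 1e-10)

    # 4. DCT-II over the mel axis, keeping the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n + 0.5)[None, :]
                 * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T  # shape: (n_frames, n_mfcc)

# Example: 1 s of random audio -> one 13-coefficient vector per frame
feats = mfcc_sketch(np.random.default_rng(0).standard_normal(16000))
print(feats.shape)  # prints (98, 13)
```

Swapping step 4 for the raw log-mel output yields the (Log)MelSpectrogram features discussed in the later revisions as alternatives to the full MFCC.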
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=7029&oldid=prev
Cioflanc at 11:23, 27 September 2021
2021-09-27T11:23:52Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:23, 27 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l43" >Line 43:</td>
<td colspan="2" class="diff-lineno">Line 43:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction is often used in Machine Learning settings, and especially in the context of Deep Neural Networks (DNN), because reducing the feature dimensionality also reduces the computational load, as well as the storage requirements, of such a DNN. Therefore, by pre-determining the relevant features from the input data using human expertise, the model is relieved of the tedious task of identifying and filtering redundant information. A deep learning pipeline for KWS integrating MFCC is presented in Figure 1.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction is often used in Machine Learning settings, and especially in the context of Deep Neural Networks (DNN), because reducing the feature dimensionality also reduces the computational load, as well as the storage requirements, of such a DNN. Therefore, by pre-determining the relevant features from the input data using human expertise, the model is relieved of the tedious task of identifying and filtering redundant information. A deep learning pipeline for KWS integrating MFCC is presented in Figure 1.  </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"><div class="center"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">[[File:KWS.png|thumb|600px]]</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></div></ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>In recent years, it has been shown that state-of-the-art results can be achieved not only by using the MFCCs, but also by employing the results of an intermediate step, such as the MelSpectrogram [[#ref-vygon2021|&#91;9&#93;]] or the LogMelSpectrogram [[#ref-kim2021|&#91;4&#93;]]. Moreover, surveys  [[#ref-alim2018|&#91;1&#93;]] [[#ref-sharma2020|&#91;7&#93;]] [[#ref-choi2021|&#91;3&#93;]] [[#ref-shrawankar2013|&#91;8&#93;]] on the topic present alternative strategies for performing feature extraction in audio-based ML tasks. Nevertheless, to the best of our knowledge, no prior work presents a meta-analysis of the different extraction techniques and their impact in the scope of KWS. As Choi et al. [[#ref-choi2021|&#91;3&#93;]] have shown, the choice of the feature extraction mechanism has a significant impact in the context of music tagging, so it is only natural to assume that a similar conclusion would be drawn when performing a comparative evaluation for a keyword spotting system.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>In recent years, it has been shown that state-of-the-art results can be achieved not only by using the MFCCs, but also by employing the results of an intermediate step, such as the MelSpectrogram [[#ref-vygon2021|&#91;9&#93;]] or the LogMelSpectrogram [[#ref-kim2021|&#91;4&#93;]]. 
Moreover, surveys  [[#ref-alim2018|&#91;1&#93;]] [[#ref-sharma2020|&#91;7&#93;]] [[#ref-choi2021|&#91;3&#93;]] [[#ref-shrawankar2013|&#91;8&#93;]] on the topic present alternative strategies for performing feature extraction in audio-based ML tasks. Nevertheless, to the best of our knowledge, no prior work presents a meta-analysis of the different extraction techniques and their impact in the scope of KWS. As Choi et al. [[#ref-choi2021|&#91;3&#93;]] have shown, the choice of the feature extraction mechanism has a significant impact in the context of music tagging, so it is only natural to assume that a similar conclusion would be drawn when performing a comparative evaluation for a keyword spotting system.  </div></td></tr>
</table>
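To make the feature hierarchy mentioned above concrete (spectrogram, MelSpectrogram, LogMelSpectrogram, MFCC), here is a minimal NumPy sketch of the standard extraction pipeline. All parameter values (16 kHz sample rate, 400-sample frames, 160-sample hop, 40 mel bands, 13 coefficients) are illustrative assumptions, not the project's configuration, and a real system would typically rely on a library implementation such as PyTorch's.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters equally spaced on the Mel scale:
    # mel(f) = 2595 * log10(1 + f/700)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(x, sr=16000, n_fft=400, hop=160, n_mels=40, n_mfcc=13):
    # 1) frame and window the waveform
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)
    # 2) power spectrogram
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 3) mel-scale filterbank -> MelSpectrogram
    mels = power @ mel_filterbank(n_mels, n_fft, sr).T
    # 4) log compression -> LogMelSpectrogram
    logmel = np.log(mels + 1e-10)
    # 5) DCT-II over the mel bands -> cepstral coefficients (MFCC)
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T  # shape: (n_frames, n_mfcc)
```

Stopping the pipeline after step 3 or 4 yields the intermediate MelSpectrogram or LogMelSpectrogram features discussed in the text; step 5 decorrelates the mel bands to produce the MFCCs.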
Cioflanc
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=7028&oldid=prev
Cioflanc at 11:20, 27 September 2021
2021-09-27T11:20:46Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:20, 27 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l119" >Line 119:</td>
<td colspan="2" class="diff-lineno">Line 119:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-majumdar2020" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-majumdar2020" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<del class="diffchange diffchange-inline">2</del>&#93; </span><span class="csl-right-inline">S. Majumdar and B. Ginsburg. <span><span class="nocase">MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition. </span></span> in ''Proc. Interspeech 2020'', 2020, pp. 3356–3360.</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<ins class="diffchange diffchange-inline">6</ins>&#93; </span><span class="csl-right-inline">S. Majumdar and B. Ginsburg. <span><span class="nocase">MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition. </span></span> in ''Proc. Interspeech 2020'', 2020, pp. 3356–3360.</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-sharma2020" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-sharma2020" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<del class="diffchange diffchange-inline">4</del>&#93; </span><span class="csl-right-inline">Garima Sharma, Kartikeyan Umapathy, and Sridhar Krishnan. <span> Trends in audio signal feature extraction methods. </span>2020</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<ins class="diffchange diffchange-inline">7</ins>&#93; </span><span class="csl-right-inline">Garima Sharma, Kartikeyan Umapathy, and Sridhar Krishnan. <span> Trends in audio signal feature extraction methods. </span>2020</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-shrawankar2013" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-shrawankar2013" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<del class="diffchange diffchange-inline">4</del>&#93; </span><span class="csl-right-inline">Urmila Shrawankar and V M Thakare. <span> Techniques for feature extraction in speech recognition system: A comparative study. </span>2013</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<ins class="diffchange diffchange-inline">8</ins>&#93; </span><span class="csl-right-inline">Urmila Shrawankar and V M Thakare. <span> Techniques for feature extraction in speech recognition system: A comparative study. </span>2013</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-vygon2021" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-vygon2021" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<del class="diffchange diffchange-inline">4</del>&#93; </span><span class="csl-right-inline">Roman Vygon and Nikolay Mikhaylovskiy. <span>Learning efficient representations for keyword spotting with triplet loss. </span>2021</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<ins class="diffchange diffchange-inline">9</ins>&#93; </span><span class="csl-right-inline">Roman Vygon and Nikolay Mikhaylovskiy. <span>Learning efficient representations for keyword spotting with triplet loss. </span>2021</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
</table>
Cioflanc
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=7027&oldid=prev
Cioflanc at 11:19, 27 September 2021
2021-09-27T11:19:39Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:19, 27 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l31" >Line 31:</td>
<td colspan="2" class="diff-lineno">Line 31:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction represents the process of deriving essential, non-redundant information from a set of measured data. Through the selection and/or combination of input variables into features, the dimensionality of the given data can be largely reduced, thus also reducing the computational effort within the processing pipeline. Nonetheless, such techniques should not affect the integrity of the data; the extracted features should still accurately describe the data set. Although the two terms are often used interchangeably, we differentiate between feature (pre)processing and feature extraction, defining the former as the techniques applied to alter the data in order to emphasize or remove certain characteristics. To better understand the concept of feature extraction, we will use the Mel-Frequency Cepstral Coefficients (MFCC) as an example.   </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction represents the process of deriving essential, non-redundant information from a set of measured data. Through the selection and/or combination of input variables into features, the dimensionality of the given data can be largely reduced, thus also reducing the computational effort within the processing pipeline. Nonetheless, such techniques should not affect the integrity of the data; the extracted features should still accurately describe the data set. 
Although the two terms are often used interchangeably, we differentiate between feature (pre)processing and feature extraction, defining the former as the techniques applied to alter the data in order to emphasize or remove certain characteristics. To better understand the concept of feature extraction, we will use the Mel-Frequency Cepstral Coefficients (MFCC) as an example.   </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Extracted from an audio signal received as input by the system, the MFCCs are one of the current standards in KWS [2<del class="diffchange diffchange-inline">, </del>6]. They are a cepstral representation of the signal, but, compared to the simple cepstrum, the MFCCs use equally spaced frequency bands based on the Mel scale. This yields a representation of the signal closer to the actual response of the human auditory system. The derivation techniques are described in detail in [5]; we will enumerate the main steps of PyTorch’s MFCC computation:   </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Extracted from an audio signal received as input by the system, the MFCCs are one of the current standards in KWS [<ins class="diffchange diffchange-inline">[#ref-berg2021|&#91;</ins>2<ins class="diffchange diffchange-inline">&#93;]] [[#ref-majumdar2020|&#91;</ins>6<ins class="diffchange diffchange-inline">&#93;]</ins>]. They are a cepstral representation of the signal, but, compared to the simple cepstrum, the MFCCs use equally spaced frequency bands based on the Mel scale. This yields a representation of the signal closer to the actual response of the human auditory system. The derivation techniques are described in detail in [<ins class="diffchange diffchange-inline">[#ref-lyonmfcc|&#91;</ins>5<ins class="diffchange diffchange-inline">&#93;]</ins>]; we will enumerate the main steps of PyTorch’s MFCC computation:   </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><ul></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><ul></div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l44" >Line 44:</td>
<td colspan="2" class="diff-lineno">Line 44:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction is often used in Machine Learning settings, and especially in the context of Deep Neural Networks (DNN), because reducing the feature dimensionality further reduces the computational load, as well as the storage requirements, of such a DNN. Therefore, by pre-determining the relevant features from the input data using human expertise, the model is relieved of the tedious task of identifying and filtering redundant information. A deep learning pipeline for KWS integrating MFCC is presented in Figure 1.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>Feature extraction is often used in Machine Learning settings, and especially in the context of Deep Neural Networks (DNN), because reducing the feature dimensionality further reduces the computational load, as well as the storage requirements, of such a DNN. Therefore, by pre-determining the relevant features from the input data using human expertise, the model is relieved of the tedious task of identifying and filtering redundant information. A deep learning pipeline for KWS integrating MFCC is presented in Figure 1.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In recent years, it has been shown that state-of-the-art results can be achieved not only by using the MFCCs, but also by employing the results of an intermediate step, such as the MelSpectrogram [9] or the LogMelSpectrogram [4]. Moreover, surveys [1<del class="diffchange diffchange-inline">, </del>7<del class="diffchange diffchange-inline">, </del>3<del class="diffchange diffchange-inline">, </del>8] on the topic present alternative strategies for performing feature extraction for audio-based ML tasks. Nevertheless, to the best of our knowledge, no existing work presents a meta-analysis of the different extraction techniques and their impact in the scope of KWS. As Choi et al. [3] have shown, the choice of the feature extraction mechanism has a significant impact in the context of music tagging, so it is natural to assume that a similar conclusion would be drawn from a comparative evaluation of a keyword spotting system.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In recent years, it has been shown that state-of-the-art results can be achieved not only by using the MFCCs, but also by employing the results of an intermediate step, such as the MelSpectrogram [<ins class="diffchange diffchange-inline">[#ref-vygon2021|&#91;</ins>9<ins class="diffchange diffchange-inline">&#93;]</ins>] or the LogMelSpectrogram [<ins class="diffchange diffchange-inline">[#ref-kim2021|&#91;</ins>4<ins class="diffchange diffchange-inline">&#93;]</ins>]. 
Moreover, surveys <ins class="diffchange diffchange-inline"> [</ins>[<ins class="diffchange diffchange-inline">#ref-alim2018|&#91;</ins>1<ins class="diffchange diffchange-inline">&#93;]] [[#ref-sharma2020|&#91;</ins>7<ins class="diffchange diffchange-inline">&#93;]] [[#ref-choi2021|&#91;</ins>3<ins class="diffchange diffchange-inline">&#93;]] [[#ref-shrawankar2013|&#91;</ins>8<ins class="diffchange diffchange-inline">&#93;]</ins>] on the topic present alternative strategies for performing feature extraction for audio-based ML tasks. Nevertheless, to the best of our knowledge, no existing work presents a meta-analysis of the different extraction techniques and their impact in the scope of KWS. As Choi et al. [<ins class="diffchange diffchange-inline">[#ref-choi2021|&#91;</ins>3<ins class="diffchange diffchange-inline">&#93;]</ins>] have shown, the choice of the feature extraction mechanism has a significant impact in the context of music tagging, so it is natural to assume that a similar conclusion would be drawn from a comparative evaluation of a keyword spotting system.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l104" >Line 104:</td>
<td colspan="2" class="diff-lineno">Line 104:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="refs" class="references csl-bib-body"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="refs" class="references csl-bib-body"></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-alim2018" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-alim2018" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">Sabur Ajibola Alim and Nahrul Khair Alang Rashid<span><del class="diffchange diffchange-inline">“</del><span class="nocase">Some commonly used speech feature extraction algorithms</span><del class="diffchange diffchange-inline">,”</del></span> 2018.</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">Sabur Ajibola Alim and Nahrul Khair Alang Rashid<ins class="diffchange diffchange-inline">. </ins><span><span class="nocase">Some commonly used speech feature extraction algorithms</span><ins class="diffchange diffchange-inline">. </ins></span> 2018.</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-berg2021" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-berg2021" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">Axel Berg, Mark O’Connor, and Miguel Tairum Cruz<span><del class="diffchange diffchange-inline">“</del><span class="nocase">Keyword transformer: A self-attention model for keyword spotting</span><del class="diffchange diffchange-inline">,”</del></span> 2021.</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">Axel Berg, Mark O’Connor, and Miguel Tairum Cruz<ins class="diffchange diffchange-inline">. </ins><span><span class="nocase">Keyword transformer: A self-attention model for keyword spotting<ins class="diffchange diffchange-inline">. </ins></span></span> 2021.</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-choi2021" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-choi2021" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">Keunwoo Choi, György Fazekas, Kyunghyun Cho, and Mark Sandler.<span>A comparison of audio signal preprocessing methods for deep neural networks on music tagging</span>2021</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">Keunwoo Choi, György Fazekas, Kyunghyun Cho, and Mark Sandler. <span>A comparison of audio signal preprocessing methods for deep neural networks on music tagging<ins class="diffchange diffchange-inline">. </ins></span>2021<ins class="diffchange diffchange-inline">. </ins></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-kim2021" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-kim2021" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Byeonggeun Kim, Simyung Chang, Jinkyu Lee, and Dooyong Sung<span>Broadcasted residual learning for efficient keyword spotting</span> 2021</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Byeonggeun Kim, Simyung Chang, Jinkyu Lee, and Dooyong Sung<ins class="diffchange diffchange-inline">. </ins><span>Broadcasted residual learning for efficient keyword spotting<ins class="diffchange diffchange-inline">. </ins></span>2021<ins class="diffchange diffchange-inline">.</ins></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-lyonmfcc" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-lyonmfcc" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">James Lyon <span>“<span class="nocase">Mel Frequency Cepstral Coefficient (MFCC) tutorial.</span><del class="diffchange diffchange-inline">,”</del></span></span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">James Lyon<ins class="diffchange diffchange-inline">. </ins><span>“<span class="nocase">Mel Frequency Cepstral Coefficient (MFCC) tutorial. </span></span></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-majumdar2020" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-majumdar2020" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">S. Majumdar and B. Ginsburg<del class="diffchange diffchange-inline">, </del><span><del class="diffchange diffchange-inline">“</del><span class="nocase">MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition</span><del class="diffchange diffchange-inline">,”</del></span> in ''Proc. Interspeech 2020'', 2020, pp. 3356–3360.</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">S. Majumdar and B. Ginsburg<ins class="diffchange diffchange-inline">. </ins><span><span class="nocase">MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition<ins class="diffchange diffchange-inline">. </ins></span></span> in ''Proc. Interspeech 2020'', 2020, pp. 3356–3360.</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-sharma2020" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-sharma2020" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Garima Sharma, Kartikeyan Umapathy, and Sridhar Krishnan<span> Trends in audio signal feature extraction methods</span> 2020</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Garima Sharma, Kartikeyan Umapathy, and Sridhar Krishnan<ins class="diffchange diffchange-inline">. </ins><span> Trends in audio signal feature extraction methods<ins class="diffchange diffchange-inline">. </ins></span>2020</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-shrawankar2013" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-shrawankar2013" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Urmila Shrawankar and V M Thakare<span> Techniques for feature extraction in speech recognition system : A comparative study</span> 2013</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Urmila Shrawankar and V M Thakare<ins class="diffchange diffchange-inline">. </ins><span> Techniques for feature extraction in speech recognition system: A comparative study<ins class="diffchange diffchange-inline">. </ins></span>2013</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-vygon2021" class="csl-entry"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-vygon2021" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Roman Vygon and Nikolay Mikhaylovskiy<span>Learning efficient representations for keyword spotting with triplet loss</span> 2021</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Roman Vygon and Nikolay Mikhaylovskiy<ins class="diffchange diffchange-inline">. </ins><span>Learning efficient representations for keyword spotting with triplet loss<ins class="diffchange diffchange-inline">. </ins></span>2021</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
</table>
Cioflanc
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=7026&oldid=prev
Cioflanc at 11:08, 27 September 2021
2021-09-27T11:08:50Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:08, 27 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l103" >Line 103:</td>
<td colspan="2" class="diff-lineno">Line 103:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="refs" class="references csl-bib-body"></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><div id="refs" class="references csl-bib-body"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<del class="diffchange diffchange-inline">deAndrade2018</del>" class="csl-entry"></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<ins class="diffchange diffchange-inline">alim2018</ins>" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline"><del class="diffchange diffchange-inline">D. Coimbra de Andrade, S. Leo, M. Loesener Da Silva Viana, </del>and <del class="diffchange diffchange-inline">C. Bernkopf, </del><span>“<span class="nocase"><del class="diffchange diffchange-inline">A neural attention model for </del>speech <del class="diffchange diffchange-inline">command recognition</del></span>,”</span> <del class="diffchange diffchange-inline">''ArXiv e-prints'', Aug. </del>2018.</span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline"><ins class="diffchange diffchange-inline">Sabur Ajibola Alim </ins>and <ins class="diffchange diffchange-inline">Nahrul Khair Alang Rashid</ins><span>“<span class="nocase"><ins class="diffchange diffchange-inline">Some commonly used </ins>speech <ins class="diffchange diffchange-inline">feature extraction algorithms</ins></span>,”</span> 2018.</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<del class="diffchange diffchange-inline">Majumdar2020</del>" class="csl-entry"></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<ins class="diffchange diffchange-inline">berg2021" class="csl-entry"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">Axel Berg, Mark O’Connor, and Miguel Tairum Cruz<span>“<span class="nocase">Keyword transformer: A self-attention model for keyword spotting</span>,”</span> 2021.</span></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"></div></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><div id="ref-choi2021" class="csl-entry"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">Keunwoo Choi, György Fazekas, Kyunghyun Cho, and Mark Sandler.<span>A comparison of audio signal preprocessing methods for deep neural networks on music tagging</span>2021</span></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"></div></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><div id="ref-kim2021" class="csl-entry"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">Byeonggeun Kim, Simyung Chang, Jinkyu Lee, and Dooyong Sung<span>Broadcasted residual learning for efficient keyword spotting</span> 2021</span></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"></div></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><div id="ref-lyonmfcc" class="csl-entry"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">James Lyon <span>“<span class="nocase">Mel Frequency Cepstral Coefficient (MFCC) tutorial.</span>,”</span></span></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"></div></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><div id="ref-majumdar2020</ins>" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">S. Majumdar and B. Ginsburg, <span>“<span class="nocase">MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition</span>,”</span> in ''Proc. Interspeech 2020'', 2020, pp. 3356–3360.</span></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">S. Majumdar and B. Ginsburg, <span>“<span class="nocase">MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition</span>,”</span> in ''Proc. Interspeech 2020'', 2020, pp. 3356–3360.</span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<del class="diffchange diffchange-inline">Zeng2019</del>" class="csl-entry"></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<ins class="diffchange diffchange-inline">sharma2020</ins>" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<del class="diffchange diffchange-inline">3</del>&#93; </span><span class="csl-right-inline"><del class="diffchange diffchange-inline">M. Zeng </del>and <del class="diffchange diffchange-inline">N. Xiao, </del><span><del class="diffchange diffchange-inline">“Effective combination of DenseNet and BiLSTM for keyword spotting,”</del></span> <del class="diffchange diffchange-inline">''IEEE Access'', vol. 7, pp. 10767–10775, 2019.</del></span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<ins class="diffchange diffchange-inline">4</ins>&#93; </span><span class="csl-right-inline"><ins class="diffchange diffchange-inline">Garima Sharma, Kartikeyan Umapathy, </ins>and <ins class="diffchange diffchange-inline">Sridhar Krishnan</ins><span> <ins class="diffchange diffchange-inline">Trends in audio signal feature extraction methods</ins></span> <ins class="diffchange diffchange-inline">2020</ins></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<del class="diffchange diffchange-inline">Kim2019</del>" class="csl-entry"></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<ins class="diffchange diffchange-inline">shrawankar2013</ins>" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline"><del class="diffchange diffchange-inline">B. Kim, </del>M<del class="diffchange diffchange-inline">. Lee, J. Lee, Y. Kim, and K. Hwang, </del><span><del class="diffchange diffchange-inline">“Query-by-example on-device keyword spotting,”</del></span> <del class="diffchange diffchange-inline">in ''2019 IEEE automatic speech recognition and understanding workshop (ASRU)'', 2019, pp. 532–538.</del></span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline"><ins class="diffchange diffchange-inline">Urmila Shrawankar and V </ins>M <ins class="diffchange diffchange-inline">Thakare</ins><span> <ins class="diffchange diffchange-inline">Techniques for feature extraction in speech recognition system : A comparative study</ins></span> <ins class="diffchange diffchange-inline">2013</ins></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<del class="diffchange diffchange-inline">warden2018speech</del>" class="csl-entry"></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><div id="ref-<ins class="diffchange diffchange-inline">vygon2021</ins>" class="csl-entry"></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<del class="diffchange diffchange-inline">5</del>&#93; </span><span class="csl-right-inline"><del class="diffchange diffchange-inline">P. Warden, </del><span><del class="diffchange diffchange-inline">“<span class="nocase">Speech Commands: A Dataset </del>for <del class="diffchange diffchange-inline">Limited-Vocabulary Speech Recognition</span>,”</del></span> <del class="diffchange diffchange-inline">''ArXiv e-prints'', Apr. 2018.</del></span></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><span class="csl-left-margin">&#91;<ins class="diffchange diffchange-inline">4</ins>&#93; </span><span class="csl-right-inline"><ins class="diffchange diffchange-inline">Roman Vygon and Nikolay Mikhaylovskiy</ins><span><ins class="diffchange diffchange-inline">Learning efficient representations </ins>for <ins class="diffchange diffchange-inline">keyword spotting with triplet loss</ins></span> <ins class="diffchange diffchange-inline">2021</ins></span></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div></div></div></td></tr>
</table>
Cioflanc
http://iis-projects.ee.ethz.ch/index.php?title=Feature_Extraction_for_Speech_Recognition_(1S)&diff=7025&oldid=prev
Cioflanc: Created page with "<!-- Feature Extraction for Speech Recognition (1S) --> = Overview = == Status: Available == * Type: Semester Thesis * Professor: Prof. Dr. L. Benini * Supervisors: ** Cris..."
2021-09-27T10:53:00Z
<p>Created page with "<!-- Feature Extraction for Speech Recognition (1S) --> = Overview = == Status: Available == * Type: Semester Thesis * Professor: Prof. Dr. L. Benini * Supervisors: ** Cris..."</p>
<p><b>New page</b></p><div><!-- Feature Extraction for Speech Recognition (1S) --><br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** Cristian Cioflan (IIS): [mailto:cioflanc@iis.ee.ethz.ch cioflanc@iis.ee.ethz.ch]<br />
** Dr. Lukas Cavigelli (Huawei Technologies): [mailto:lukas.cavigelli@huawei.com lukas.cavigelli@huawei.com]<br />
<br />
<br />
<!-- TODO: ADD APPROPRIATE CATEGORIES HERE --><br />
<br />
<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:Digital]]<br />
[[Category:Cioflanc]]<br />
[[Category:Available]]<br />
<br />
= Short Description =<br />
<br />
The objective of keyword spotting (KWS) is to detect a set of predefined keywords within a stream of user utterances. In most KWS pipelines, as in other audio-based tasks, the acoustic and/or linguistic model is preceded by a feature extraction stage. Several approaches have been proposed for this step, the most notable being the Mel-Frequency Cepstral Coefficients (MFCC). As this technique represents only one of the possible approaches to feature extraction, and since it has been shown that the performance of the system can be largely influenced by this choice, the aim of this project is an in-depth evaluation of feature extraction methods for KWS. <br />
<br />
= Introduction =<br />
<br />
Feature extraction is the process of deriving essential, non-redundant information from a set of measured data. Through the selection and/or combination of input variables into features, the dimensionality of the data can be largely reduced, which in turn reduces the computational effort within the processing pipeline. Nonetheless, such techniques should not affect the integrity of the data: the extracted features should still accurately describe the data set. Although the two terms are often used interchangeably, we differentiate between feature (pre)processing and feature extraction, defining the former as the techniques applied to alter the data in order to emphasize or remove certain characteristics. To better understand the concept of feature extraction, we will use the Mel-Frequency Cepstral Coefficients (MFCC) as an example. <br />
<br />
Extracted from the audio signal received as input by the system, MFCCs are one of the current standards in KWS [2, 6]. They are a cepstral representation of the signal but, compared to the simple cepstrum, MFCCs use frequency bands equally spaced on the Mel scale. This yields a representation closer to the actual response of the human auditory system. The derivation techniques are described in detail in [5]; we enumerate here the main steps of PyTorch's MFCC computation: <br />
<br />
<ul><br />
<li><p>Windowing - applying a window function (e.g., Hamming window) to each frame, mainly to reduce the spectral leakage caused by the FFT's implicit periodicity assumption;</p></li><br />
<li><p>Fast Fourier Transform (FFT) - calculating the frequency spectrum;</p></li><br />
<li><p>Triangular Filters - applying triangular filters on a Mel-scale to extract frequency bands;</p></li><br />
<li><p>Logarithm - transforming the MelSpectrogram into LogMelSpectrogram to obtain additive behaviour;</p></li><br />
<li><p>Discrete Cosine Transform (DCT) - calculating the Cepstral coefficients from the LogMel-Spectrogram.</p></li></ul><br />
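As an illustration only (this is not the MFCC implementation provided by the supervisors), the five steps above can be sketched in NumPy/SciPy; the frame length, hop size, and filter count below are common choices, not values prescribed by the project: <br />
<br />
```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with centers equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):          # rising edge
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):         # falling edge
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=40, n_mfcc=13):
    # 1. Framing + windowing (Hamming) to reduce spectral leakage.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # 2. FFT -> power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. Triangular Mel filters -> MelSpectrogram.
    mel_spec = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # 4. Logarithm -> LogMelSpectrogram (additive behaviour).
    log_mel = np.log(mel_spec + 1e-10)
    # 5. DCT -> first n_mfcc cepstral coefficients per frame.
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]
```
<br />
With a 16 kHz one-second input and a 25 ms frame / 10 ms hop, this produces 98 frames of 13 coefficients each. <br />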
<br />
<br />
<br />
Feature extraction is often used in machine learning settings, and especially in the context of Deep Neural Networks (DNNs), because reducing the feature dimensionality reduces both the computational load and the storage requirements of the DNN. By pre-determining the relevant features of the input data using human expertise, the model is relieved of the task of identifying and filtering redundant information. A deep learning pipeline for KWS integrating MFCC is presented in Figure 1. <br />
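The dimensionality reduction can be made concrete with a back-of-the-envelope calculation (the frame and hop sizes are typical values, not project requirements): <br />
<br />
```python
# One second of speech at 16 kHz.
raw_samples = 16000
# A typical MFCC front-end: 25 ms frames, 10 ms hop, 13 coefficients.
frame_len, hop, n_mfcc = 400, 160, 13
n_frames = 1 + (raw_samples - frame_len) // hop   # number of analysis frames
feature_values = n_frames * n_mfcc                # values fed to the DNN
reduction = raw_samples / feature_values
print(f"{raw_samples} samples -> {feature_values} features "
      f"({reduction:.1f}x smaller DNN input)")
```
<br />
Under these assumptions the DNN input shrinks from 16000 samples to 98 x 13 = 1274 feature values, roughly a 12.6x reduction. <br />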
<br />
In recent years, it has been shown that state-of-the-art results can be achieved not only with MFCCs, but also by employing the result of an intermediate step, such as the MelSpectrogram [9] or the LogMelSpectrogram [4]. Moreover, surveys [1, 7, 3, 8] on the topic present alternative strategies for feature extraction in audio-based ML tasks. Nevertheless, to the best of our knowledge, no work presents a meta-analysis of the different extraction techniques and their impact in the scope of KWS. As Choi et al. [3] have shown, the choice of feature extraction mechanism has a significant impact in the context of music tagging, so it is natural to assume that a similar conclusion would be drawn from a comparative evaluation of a keyword spotting system. <br />
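Since MelSpectrogram and LogMelSpectrogram are intermediate products of the MFCC computation, a comparative evaluation can expose them behind one configurable front-end. The sketch below is purely illustrative: the function name and the arguments (a per-frame power spectrum and a Mel filterbank, assumed to come from a preceding framing/FFT step) are hypothetical, not part of any provided codebase. <br />
<br />
```python
import numpy as np
from scipy.fftpack import dct

def features(frames_power, mel_fb, kind="mfcc", n_mfcc=13):
    """Return the representation at the requested stage of the MFCC
    pipeline: 'mel' (MelSpectrogram), 'logmel' (LogMelSpectrogram),
    or 'mfcc' (cepstral coefficients)."""
    mel = frames_power @ mel_fb.T        # apply triangular Mel filters
    if kind == "mel":
        return mel
    logmel = np.log(mel + 1e-10)         # log for additive behaviour
    if kind == "logmel":
        return logmel
    return dct(logmel, type=2, axis=1, norm="ortho")[:, :n_mfcc]
```
<br />
Such a front-end lets the same KWS model be trained on each representation, isolating the effect of the feature extraction choice. <br />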
<br />
<br />
== Character ==<br />
<br />
* 20% literature research<br />
* 40% feature extraction implementation<br />
* 40% evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Must be familiar with Python.<br />
* Knowledge of deep learning basics, including some deep learning framework like PyTorch or TensorFlow from a course, project, or self-taught with some tutorials.<br />
<br />
= Project Goals =<br />
<br />
The main tasks of this project are:<br />
<br />
<ul><br />
<li><p>'''Task 1: Familiarize yourself with the project specifics (1-2 Weeks)'''</p><br />
<p> Read up on feature extraction methods mentioned in the reference materials. Learn about DNN training and PyTorch, how to visualize results with TensorBoard. Read up on DNN models aimed at time series (''e.g''. <span>TCNs, RNNs, transformer networks</span>) and the recent advances in KWS.</p></li><br />
<li><p>'''Task 2: Implement and evaluate the baseline (2-3 Weeks)'''</p><br />
<p>Select a dataset and analyse models that can serve as baselines for our work. In particular, check for publicly available code. The supervisors will provide you with the MFCC implementation that will represent the starting point for the aforementioned analysis. </p><br />
<p>If no code is available: design, implement, and train KWS models, considering the state-of-the-art architectures for time series. </p><br />
<p>Compare the model against the selected baseline and figures in the paper.</p></li><br />
<li><p>'''Task 3: Implement feature extraction techniques (4-5 Weeks)'''</p><br />
<p>Using the referenced work, implement the feature extraction methods against which MFCC will be compared. </p><br />
<p>Optimize said methods with respect to computational effort and storage requirements.</p><br />
<p>(Optional) Perform parameter tuning for the implemented techniques.</p></li><br />
<li><p>'''Task 4: Evaluate feature extraction techniques (4-5 Weeks)'''</p><br />
<p>Evaluate and analyse the accuracy of the KWS pipeline with respect to the implemented feature extraction setting. </p><br />
<p>Evaluate and analyse the accuracy of the aforementioned methods under similar operating regimes. </p><br />
<p>Evaluate and analyse the compatibility between the processing techniques and the DNN architecture. </p></li><br />
<li><p>'''Task 5 - Gather and Present Final Results (2-3 Weeks)'''</p><br />
<p>Gather final results.</p><br />
<p>Prepare presentation (10 min. + 5 min. discussion).</p><br />
<p>Write a final report. Include all major decisions taken during the design process and argue your choice. Include everything that deviates from the very standard case - show off everything that took time to figure out and all your ideas that have influenced the project.</p></li></ul><br />
<br />
= Project Organization =<br />
<br />
== Weekly Meetings ==<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up questions from either side, and for ensuring the student's progress.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif (see: http://bourbon.usc.edu:8001/tgif/index.html and http://www.dz.ee.ethz.ch/en/information/how-to/drawing-schematics.html) or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
==== Final Report ====<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute talk followed by 5 minutes of discussion in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
= References =<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<div id="ref-deAndrade2018" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">D. Coimbra de Andrade, S. Leo, M. Loesener Da Silva Viana, and C. Bernkopf, <span>“<span class="nocase">A neural attention model for speech command recognition</span>,”</span> ''ArXiv e-prints'', Aug. 2018.</span><br />
</div><br />
<div id="ref-Majumdar2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">S. Majumdar and B. Ginsburg, <span>“<span class="nocase">MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition</span>,”</span> in ''Proc. Interspeech 2020'', 2020, pp. 3356–3360.</span><br />
</div><br />
<div id="ref-Zeng2019" class="csl-entry"><br />
<span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">M. Zeng and N. Xiao, <span>“Effective combination of DenseNet and BiLSTM for keyword spotting,”</span> ''IEEE Access'', vol. 7, pp. 10767–10775, 2019.</span><br />
</div><br />
<div id="ref-Kim2019" class="csl-entry"><br />
<span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">B. Kim, M. Lee, J. Lee, Y. Kim, and K. Hwang, <span>“Query-by-example on-device keyword spotting,”</span> in ''2019 IEEE automatic speech recognition and understanding workshop (ASRU)'', 2019, pp. 532–538.</span><br />
</div><br />
<div id="ref-warden2018speech" class="csl-entry"><br />
<span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">P. Warden, <span>“<span class="nocase">Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition</span>,”</span> ''ArXiv e-prints'', Apr. 2018.</span><br />
</div><br />
</div></div>
Cioflanc