{"id":3315,"date":"2023-06-11T13:08:42","date_gmt":"2023-06-11T17:08:42","guid":{"rendered":"https:\/\/michaelrowe01.com\/?p=3315"},"modified":"2023-06-11T13:09:10","modified_gmt":"2023-06-11T17:09:10","slug":"customize-on-device-speech-recognition","status":"publish","type":"post","link":"https:\/\/michaelrowe01.com\/index.php\/blog\/customize-on-device-speech-recognition\/","title":{"rendered":"Customize on-device speech recognition\u00a0"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"660\" height=\"371\" src=\"https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Customize-on-device.png?resize=660%2C371&#038;ssl=1\" alt=\"\" class=\"wp-image-3318\" srcset=\"https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Customize-on-device.png?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Customize-on-device.png?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Customize-on-device.png?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Customize-on-device.png?resize=1536%2C864&amp;ssl=1 1536w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Customize-on-device.png?w=1920&amp;ssl=1 1920w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Customize-on-device.png?w=1320&amp;ssl=1 1320w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/figure>\n\n\n\n<p>iOS 10 introduced speech recognition.<\/p>\n\n\n\n<p>Speech recognition feeds audio to an acoustic model that produces a phonetic representation, which is then transcribed into a written form.&nbsp; Sometimes several transcriptions match the same sounds, so that alone is not enough.&nbsp; By looking at context, a language model can disambiguate between the candidates.&nbsp; This was how it was modeled in iOS 
10.<\/p>\n\n\n\n<p>In iOS 17 you can customize the language model so that recognition better fits your app's domain.\u00a0 You boost the model with phrases your app needs, and you can tune how heavily certain phrases are weighted.\u00a0 You can also use templates to generate a large number of patterns, as in this chess example.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"660\" height=\"371\" src=\"https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.01.48-2.png?resize=660%2C371&#038;ssl=1\" alt=\"\" class=\"wp-image-3317\" srcset=\"https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.01.48-2.png?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.01.48-2.png?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.01.48-2.png?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.01.48-2.png?resize=1536%2C864&amp;ssl=1 1536w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.01.48-2.png?w=1920&amp;ssl=1 1920w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.01.48-2.png?w=1320&amp;ssl=1 1320w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/figure>\n\n\n\n<p>You can also define custom spellings and pronunciations for specialized vocabulary, such as medical terms.\u00a0 Again, a chess example:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"660\" height=\"371\" src=\"https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.02.58-2.png?resize=660%2C371&#038;ssl=1\" 
alt=\"\" class=\"wp-image-3316\" srcset=\"https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.02.58-2.png?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.02.58-2.png?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.02.58-2.png?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.02.58-2.png?resize=1536%2C864&amp;ssl=1 1536w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.02.58-2.png?w=1920&amp;ssl=1 1920w, https:\/\/i0.wp.com\/michaelrowe01.com\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-11-at-13.02.58-2.png?w=1320&amp;ssl=1 1320w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/figure>\n\n\n\n<p>Training data is bound to a single locale, so to support more than one locale you will need to use standard localization methods.<\/p>\n\n\n\n<p>Loading a language model has latency, so run it on a background thread and hide the wait behind some UI, such as a loading screen.<\/p>\n\n\n\n<p>Customization data is never sent over the network, so you must force recognition to run on the device; otherwise the customized language model will not be used.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>iOS 10 introduced speech recognition. Speech recognition feeds audio to an acoustic model that produces a phonetic representation, which is then transcribed into a written form.&nbsp; Sometimes several transcriptions match the same sounds, so that alone is not enough.&nbsp; By looking at context, a language model can disambiguate between the candidates.&nbsp; This was how it was modeled in 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_wp_convertkit_post_meta":{"form":"-1","landing_page":"0","tag":"0","restrict_content":"0"},"hide_page_title":"","footnotes":""},"categories":[2,3],"tags":[752,322,753,680],"class_list":["post-3315","post","type-post","status-publish","format-standard","hentry","category-blog","category-personal-softwareandit","tag-day-7","tag-privacy","tag-speech-recognition","tag-wwdc23"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/posts\/3315","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/comments?post=3315"}],"version-history":[{"count":1,"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/posts\/3315\/revisions"}],"predecessor-version":[{"id":3319,"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/posts\/3315\/revisions\/3319"}],"wp:attachment":[{"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/media?parent=3315"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/categories?post=3315"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michaelrowe01.com\/index.php\/wp-json\/wp\/v2\/tags?post=3315"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}