iOS 10 introduced speech recognition with the Speech framework.
Speech recognition converts audio, via an acoustic model, into a phonetic representation, which is then transcribed into a written (orthographic) representation. Sometimes multiple candidate transcriptions match, so more is needed than that: a language model looks at the surrounding context to disambiguate between them. This is how recognition was modeled in iOS 10.
In iOS 17 you can customize the language model for your app to make recognition more appropriate to your use case. You can boost the model with phrases your app needs to recognize, tune it to weight certain phrases more heavily, and use templates to load a large number of patterns at once, as in chess moves.
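Roughly, building that training data uses SFCustomLanguageModelData; the identifier, counts, and export path below are placeholders, not values from the session:

```swift
import Speech

// Describe the custom language model: a boosted phrase plus a template
// that expands into many chess-move patterns.
let data = SFCustomLanguageModelData(
    locale: Locale(identifier: "en_US"),
    identifier: "com.example.ChessApp",   // placeholder app identifier
    version: "1.0"
) {
    // Boost an individual phrase the app expects to hear.
    SFCustomLanguageModelData.PhraseCount(phrase: "Play the Albin counter gambit", count: 10)

    // Generate a large set of patterns from a template, e.g. "rook to queen 4".
    SFCustomLanguageModelData.PhraseCountsFromTemplates(classes: [
        "piece": ["pawn", "rook", "knight", "bishop", "queen", "king"],
        "royal": ["queen", "king"],
        "rank": (1...8).map(String.init)
    ]) {
        SFCustomLanguageModelData.TemplatePhraseCountGenerator.Template(
            "<piece> to <royal> <piece> <rank>",
            count: 10_000
        )
    }
}

// Export the training data to a file the app can ship or prepare at runtime.
try await data.export(to: URL(filePath: "/var/tmp/ChessApp.bin"))
```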
You can also define custom spellings and pronunciations for specialized domains such as medicine. Again, a chess example:
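Those pronunciations go in the same SFCustomLanguageModelData builder as above, with phonemes written in X-SAMPA; a sketch (the phoneme string is illustrative):

```swift
// Inside the same SFCustomLanguageModelData builder:
// teach the recognizer how "Winawer" is spelled and pronounced.
SFCustomLanguageModelData.CustomPronunciation(grapheme: "Winawer", phonemes: ["w I n aU @r"])
SFCustomLanguageModelData.PhraseCount(phrase: "Play the Winawer variation", count: 10)
```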
Training data is bound to a single locale, so to support multiple languages you will need to use standard localization methods.
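One way to handle that, assuming the app bundles one exported data file per supported locale under a customlm/&lt;locale&gt; directory (that layout is an assumption of this sketch):

```swift
import Foundation

// Pick the custom language model asset matching the user's locale,
// falling back to en_US if there is no exact match.
func customLMAssetURL(for locale: Locale = .current) -> URL? {
    Bundle.main.url(forResource: "CustomLMData",
                    withExtension: "bin",
                    subdirectory: "customlm/\(locale.identifier)")
        ?? Bundle.main.url(forResource: "CustomLMData",
                           withExtension: "bin",
                           subdirectory: "customlm/en_US")
}
```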
Loading a language model adds latency, so run it on a background thread and hide the delay behind some UI, such as a loading screen.
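A sketch of deferring that work off the main thread; the asset location, output path, and client identifier are placeholders:

```swift
import Speech

// Where the prepared model will be written (cache directory is an assumption).
let cachesDir = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first!
let lmConfiguration = SFSpeechLanguageModel.Configuration(
    languageModel: cachesDir.appendingPathComponent("ChessLM")
)

// Prepare the custom language model off the main thread,
// e.g. while a loading screen is showing.
Task.detached {
    do {
        guard let assetURL = Bundle.main.url(forResource: "CustomLMData",
                                             withExtension: "bin",
                                             subdirectory: "customlm/en_US") else { return }
        try await SFSpeechLanguageModel.prepareCustomLanguageModel(
            for: assetURL,
            clientIdentifier: "com.example.ChessApp",   // placeholder identifier
            configuration: lmConfiguration
        )
    } catch {
        print("Failed to prepare custom language model: \(error)")
    }
}
```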
Customization data is never sent over the network. You should also force recognition to run on-device; otherwise the custom language model will not be loaded.
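Attaching the prepared model to a recognition request and forcing on-device recognition looks roughly like this:

```swift
import Speech

// `lmConfiguration` is the SFSpeechLanguageModel.Configuration prepared earlier.
func makeRequest(with lmConfiguration: SFSpeechLanguageModel.Configuration) -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true          // keep audio and customization on device
    request.customizedLanguageModel = lmConfiguration   // use the prepared custom language model
    return request
}
```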