01 June, 2019

Artificial Intelligence: Speech-to-Text Recognition Bot!

Speech recognition is a standard for modern applications; users expect to be able to speak, be understood, and be spoken to. The Microsoft Cognitive Services – Speech API allows you to easily add real-time speech recognition to your app, so it can recognize audio coming from multiple sources and convert it to text the app understands.

In this article, I will walk you through the steps for creating your first speech-to-text artificial intelligence bot as a simple C# console application using the Microsoft Cognitive Services Speech API.

Please note that I have purposely left out some of the more complex areas and have not gone deep into every detail, in order to keep this article an easy read and a quick guide to getting started in the Microsoft Azure Cognitive Services space.


Prerequisites:
  • Microsoft Visual Studio 2017 (you can try VS2015, but this demo uses VS2017)
  • Azure subscription (a 30-day free subscription will also do)
  • Basic C# programming knowledge

Setting up the Speech API in Azure

  • Log in to Azure (if you do not have a subscription yet, create one; otherwise log in to your existing one)
  • Click the “+ Create a resource” option
  • Search for “Speech” in the search box and select the Speech service from the results

  •  Click “Create”

  • Fill in the necessary details and click “Create”. You can choose the Name, Resource Group and Location as you prefer; note the Location, because the code needs it later as the service region.
 

  • Wait a few seconds for Azure to create the service. Once it is created, Azure takes you to its landing page (Quick start)

  • Now select the “Keys” option and copy Key 1. You can copy Key 2 instead if you wish; either one will serve the purpose. Keep it in a notepad, as we will need it later.


  •  We are done setting up the Speech API in Azure.
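If you would rather not paste the key into your source code, one option is to keep the key and the location in environment variables and read them at startup. The snippet below is only a minimal sketch of that idea: the class name SpeechSettings and the variable names SPEECH_KEY and SPEECH_REGION are my own assumptions for illustration, not something the Speech API requires.

```csharp
using System;

public static class SpeechSettings
{
    // Hypothetical variable names chosen for this sketch; any names will work.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    // Returns the trimmed value of the named environment variable,
    // or throws if the variable is missing or blank.
    public static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value.Trim();
    }
}
```

With this in place, the two values could be passed to SpeechConfig.FromSubscription as SpeechSettings.Require(SpeechSettings.KeyVariable) and SpeechSettings.Require(SpeechSettings.RegionVariable) instead of hard-coded strings.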

Visual Studio Application – To see it in action

  • Open Visual Studio 2017 and select File >> New >> Project. Select “Console App (.NET Framework)” as your project type. Please note that Visual C# is my default language selection. Choose any project name you like (I have used SpeechToText_AI)

  • Now, in order to use the Speech API, we need to reference the Microsoft.CognitiveServices.Speech NuGet library. So, right-click on the project >> Manage NuGet Packages. Browse for the library and install it

  • Open Program.cs and add the following using directives at the top (System.Threading.Tasks is needed for the Task return type)
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
  • Create a separate async method to perform the speech recognition (for example, ConvertSpeechToText)
public static async Task ConvertSpeechToText()
{
   // Replace the placeholders with your Speech resource key and its region (location)
   var config = SpeechConfig.FromSubscription("<YourSubscriptionKey>", "<YourServiceRegion>");

   // With no audio configuration given, the recognizer uses the default microphone
   using (var recognizer = new SpeechRecognizer(config))
   {
      Console.WriteLine("\nSay something in english...");

      // Listen for a single utterance and wait for its result
      var result = await recognizer.RecognizeOnceAsync();

      if (result.Reason == ResultReason.RecognizedSpeech)
      {
         Console.WriteLine($"Speech recognized: {result.Text}");
      }
      else if (result.Reason == ResultReason.NoMatch)
      {
         Console.WriteLine("NOMATCH: Speech could not be recognized.");
      }
      else if (result.Reason == ResultReason.Canceled)
      {
         // Find out why the recognition was canceled
         var cancellation = CancellationDetails.FromResult(result);
         Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

         if (cancellation.Reason == CancellationReason.Error)
         {
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine("CANCELED: Did you update the subscription info?");
         }
      }
   }
}

  • Call the method from Main. The complete Program.cs then looks like this
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace SpeechToText_AI
{
    class Program
    {
        static void Main(string[] args)
        {
            // Block until the async recognition has finished
            ConvertSpeechToText().Wait();
            Console.WriteLine("\nPlease press Enter to exit.");
            Console.ReadLine();
        }

        public static async Task ConvertSpeechToText()
        {
            // Replace the placeholders with your Speech resource key and its region (location)
            var config = SpeechConfig.FromSubscription("<YourSubscriptionKey>", "<YourServiceRegion>");

            // With no audio configuration given, the recognizer uses the default microphone
            using (var recognizer = new SpeechRecognizer(config))
            {
                Console.WriteLine("\nSay something in english...");

                // Listen for a single utterance and wait for its result
                var result = await recognizer.RecognizeOnceAsync();

                if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"Speech recognized: {result.Text}");
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine("NOMATCH: Speech could not be recognized.");
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    // Find out why the recognition was canceled
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine("CANCELED: Did you update the subscription info?");
                    }
                }
            }
        }
    }
}
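A small side note on the .Wait() call in Main: if the async method throws, Wait() hands you the exception wrapped in an AggregateException, while GetAwaiter().GetResult() rethrows the original exception. The sketch below illustrates the difference with a stand-in method; AsyncEntryPointSketch and FailingOperationAsync are hypothetical names I made up for this illustration and do not touch the Speech API at all.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncEntryPointSketch
{
    // Hypothetical stand-in for an async method such as ConvertSpeechToText()
    // that fails after a short delay.
    public static async Task FailingOperationAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("simulated failure");
    }

    // Blocks with Wait() and reports the type name of the exception seen by the caller.
    public static string ObserveWithWait()
    {
        try
        {
            FailingOperationAsync().Wait();
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    // Blocks with GetAwaiter().GetResult() and reports the exception type name.
    public static string ObserveWithGetResult()
    {
        try
        {
            FailingOperationAsync().GetAwaiter().GetResult();
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }
}
```

For this demo either way of blocking works, because the recognition method reports its own errors on the console. The difference only matters once you start catching exceptions in Main.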

We are done with the coding part as well. Now press F5 and run the program.

The command prompt will show you the message: Say something in english...

Start speaking slowly into your computer's microphone. Pronounce each word clearly and watch your few lines of artificial intelligence code respond with what you are saying in plain text. Voila!! Your speech-to-text intelligence is up and running.

Well Done! 

Do share your experience with me, and tell me what you have built on this foundation. You can take it to any level and integrate it into your own applications. I would love to hear from you.
