I usually don’t like slideware.
As an IT engineer working, for now, exclusively in France, I keep running into the PowerPoint problem:
- too many boring slides,
- too much information per slide,
- a presenter who does nothing but read them aloud.
As a result, the audience keeps checking the time while waiting for the coffee break.
I won’t repeat the introduction I gave in a previous post, but slides can, from time to time, add value to a presentation.
In this post, I go a little further by integrating speech recognition.
Speech Recognition by Google
The Google Cloud Speech API
It ain’t no secret now, I’m a big fan of machine learning. Machines learn faster than people, and they can now assist us in a lot of boring tasks.
Google provides a speech recognition API built on neural networks. It is fairly complete and multilingual.
Chrome speech recognition
Chrome supports speech recognition directly in the browser. This means that you can develop a web page that accesses your microphone, sends what you say to the Google cloud, gets the result back and processes it in your page.
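As a rough illustration of that round trip, here is a minimal sketch using the Web Speech API as Chrome exposes it (`webkitSpeechRecognition`). The function name `startListening` and the `onTranscript` callback are hypothetical names chosen for this example; the constructor is passed in as a parameter so the logic can be exercised outside a browser.

```javascript
// Minimal sketch of in-page speech recognition with the Web Speech API.
// `startListening` and `onTranscript` are hypothetical names for this example.
function startListening(RecognitionCtor, lang, onTranscript) {
  const recognition = new RecognitionCtor();
  recognition.lang = lang;            // e.g. "en-US" or "fr-FR"
  recognition.continuous = true;      // keep listening after each phrase
  recognition.interimResults = false; // only report final results

  recognition.onresult = (event) => {
    // event.results accumulates; resultIndex marks the first new entry.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const best = event.results[i][0]; // top-ranked alternative
      onTranscript(best.transcript, best.confidence);
    }
  };

  recognition.start(); // in a browser, this triggers the microphone prompt
  return recognition;
}

// Browser-only wiring; skipped when the API is not available (e.g. in Node).
if (typeof window !== "undefined") {
  const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (Ctor) {
    startListening(Ctor, "en-US", (text, confidence) => {
      console.log(text, confidence);
    });
  }
}
```

Passing the constructor in keeps the page code and the browser API loosely coupled, which also makes it easy to switch the language per presentation.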
You can see an introduction here
What can I use it for? A case study
I recently had to give a presentation about Agility and DevOps. The main idea was to give my client feedback on my experience with those principles in digital transformation.
I didn’t want to lose my audience with slides. But I wanted to keep the key concepts alive and visible.
So I made a one-slide presentation containing only the keywords I wanted to talk about.
The day before, I thought:
“How nice it would be if, as I speak, the key concepts appeared on screen…”
You may think: “that’s easy, learn your script and click at the right moment”.
Ok, but there are drawbacks:
- You have to learn the script
- You cannot be spontaneous anymore
- It’s one-shot: once a point is displayed, you cannot interact with it anymore.
What I need is “an assistant” that will listen to me and act as soon as he/she/it hears a buzzword. It’s 2016, this assistant is a machine, and I can teach it to act correctly.
Here is a little demo of the end product (don’t pay too much attention to the content, I said nonsense for the demo).
And another one in French.
For every keyword, I matched an element of my DOM and simply changed its visibility style when the keyword was heard.
And the corresponding HTML section for the communication keyword:
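A minimal sketch of that matching, under stated assumptions: the keyword list, the element ids, and the function names `idsToReveal` and `reveal` are all hypothetical, and the elements are assumed to start with `visibility: hidden` and to be switched to `visible`. The markup in the comment is an assumed shape, not the page's actual HTML.

```javascript
// Assumed markup (hypothetical), one element per keyword:
//   <section id="communication" style="visibility: hidden">Communication</section>

// Hypothetical mapping from a spoken keyword to the id of its element.
const KEYWORD_TO_ID = {
  communication: "communication",
  agility: "agility",
  devops: "devops",
};

// Return the ids of all elements whose keyword occurs in the transcript.
function idsToReveal(transcript, keywordToId) {
  const spoken = transcript.toLowerCase();
  return Object.keys(keywordToId)
    .filter((keyword) => spoken.includes(keyword))
    .map((keyword) => keywordToId[keyword]);
}

// Make the matching elements visible.
// `doc` is a Document, or any object with a compatible getElementById.
function reveal(transcript, keywordToId, doc) {
  for (const id of idsToReveal(transcript, keywordToId)) {
    const element = doc.getElementById(id);
    if (element) element.style.visibility = "visible";
  }
}
```

In a page, `reveal(transcript, KEYWORD_TO_ID, document)` would be called from the recognition result handler. Using `visibility` rather than `display` keeps the layout stable, since hidden elements still occupy their space.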
The speech recognition engine detects the sentence and gives a confidence score for its recognition.
All the potential results are stored in an array (called phrases here). I used them all, to reduce the chance of missing a word.
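Checking every alternative instead of only the best one could look like the sketch below. It assumes the Web Speech API result shape, where each result is an array-like list of alternatives carrying `transcript` and `confidence` (the number of alternatives returned is controlled by the recognizer's `maxAlternatives` property). The function names `collectPhrases` and `heardKeyword` are hypothetical.

```javascript
// Collect the transcript of every alternative of one recognition result.
// `result` is array-like: result[i] has `transcript` and `confidence`.
function collectPhrases(result) {
  const phrases = [];
  for (let i = 0; i < result.length; i++) {
    phrases.push(result[i].transcript.toLowerCase());
  }
  return phrases;
}

// True if at least one alternative contains the keyword.
function heardKeyword(phrases, keyword) {
  const wanted = keyword.toLowerCase();
  return phrases.some((phrase) => phrase.includes(wanted));
}
```

The trade-off is deliberate: ignoring the confidence score accepts some false positives in exchange for fewer missed keywords, which suits a live talk where a missing word is worse than an early one.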
Making them blink
As I was not fully confident in the solution (it was late at night and the show was the next morning), I made a fallback solution. All the words were displayed, and I used a little CSS tweak to make them blink when they were pronounced. This was done by adding and removing a CSS class on the node concerned. The logic remains the same.
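A minimal sketch of that fallback, assuming a hypothetical CSS class named `blink` that carries the animation (the CSS in the comment is an assumption as well). The `schedule` parameter defaults to `setTimeout` and only exists so the timing can be replaced in a test.

```javascript
// Assumed CSS (hypothetical):
//   .blink { animation: blink 0.5s step-end infinite; }
//   @keyframes blink { 50% { opacity: 0; } }

// Add the "blink" class to a node, then remove it after `durationMs`.
function blink(node, durationMs, schedule = setTimeout) {
  node.classList.add("blink");
  schedule(() => node.classList.remove("blink"), durationMs);
}
```

In a page, `blink(document.getElementById("communication"), 2000)` would make that keyword blink for two seconds and then return to its normal state.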
Conclusion and TODO
Anyway, as a quick and dirty solution, the goal is achieved.
Another idea would be to plug this into an NLP engine that performs stemming or lemmatization, to decode the speech better and be even more machine-learning compliant. This could be done with the help of MITIE.
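To give a feel for what stemming would bring, here is a deliberately crude sketch. It strips a few English suffixes and is only a stand-in for a real stemmer or lemmatizer; it does not use MITIE, and the names `crudeStem` and `matchesStem` are hypothetical.

```javascript
// Crude suffix stripping, a stand-in for a real stemmer or lemmatizer.
function crudeStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of ["ing", "ed", "es", "s"]) {
    // Keep at least three characters so short words are left alone.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

// True if any word of the transcript shares its crude stem with the keyword.
function matchesStem(transcript, keyword) {
  const target = crudeStem(keyword);
  return transcript
    .split(/\s+/)
    .filter(Boolean)
    .some((word) => crudeStem(word) === target);
}
```

With this, saying "deployed" or "deploying" would still trigger a "deploy" keyword. A proper NLP engine would also handle irregular forms and other languages, which this sketch does not attempt.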