Adventures with Patch #4 Moving

Compiling an Arduino Sketch on the Pi

I am using an Arduino Uno to control the two dual H-bridges that drive the motors.

The Pi and the Uno are connected using a USB cable. To upload sketches from the Pi command line (I control the Pi over SSH) I followed the tutorial https://thezanshow.com/electronics-tutorials/arduino/tutorial-11

I used sudo apt install arduino-mk to install the Arduino makefile support. However, when I came to run make as directed by the tutorial I hit an error – avrdude: can't open config file "/usr/share/arduino/hardware/tools/avr/etc/avrdude.conf": No such file or directory. After a little searching I found a fix – https://groups.google.com/g/linux.debian.bugs.dist/c/EFFx-yaMo5Y


patch@patch:/usr/share/arduino/hardware/tools/avr> sudo mkdir etc
patch@patch:/usr/share/arduino/hardware/tools/avr> cd etc
patch@patch:/usr/share/arduino/hardware/tools/avr/etc> sudo ln -s /etc/avrdude.conf

Still following the tutorial, I used sudo apt install screen so I could see the serial output from the Arduino – though screen was already installed on my Pi.

So after amending my blink.ino I can build it, upload it, start screen and clean up afterwards by using make upload monitor clean as my command.

Once I have finished with the monitor I can detach from it with Ctrl-a then Ctrl-d, check it is still there with screen -list, and then re-attach with screen -r.

In order to upload more code to the Arduino the serial connection needs to be broken – screen -X kill does the trick.

Don't forget to put the Arduino sketch into a .ino file. This file, along with a copy of the Makefile, should be in its own directory. In that directory you can safely issue the make upload monitor clean command without any conflicts with other .ino files.

patch_control.ino

This is the Arduino sketch uploaded to the Uno from the Pi. It is on GitHub – https://github.com/patchnspotnami/patch

It is designed to accept two integers – as strings; I tried but could not get raw integers to transmit nicely over serial, so I just gave up and used strings instead :-). The first integer is the heading and the second is the time to move for, in seconds. These two integers are generated by the AI in ptalk.py.
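On the Pi side, the idea is just to write the two numbers as a space-separated string ending in a newline. A minimal sketch of that (the real code lives in ptalk.py on GitHub; the port name and baud rate here are assumptions – the Uno usually shows up as /dev/ttyACM0):

import serial  # pyserial

# Assumed port and baud rate for the USB-connected Uno.
uno = serial.Serial("/dev/ttyACM0", 9600, timeout=1)

def send_move(heading_deg, seconds):
    # Two integers sent as a plain string, e.g. "90 5\n"
    uno.write(f"{heading_deg} {seconds}\n".encode("ascii"))

send_move(90, 5)   # turn to a heading of 90 degrees and move for 5 seconds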

On initialisation the sketch calculates the error in the MPU, and generates a correction factor. The robot yaw does drift slightly, about 1 degree every 3 seconds. I think this is because there is a finite time between readings of the MPU. To improve this the IMU could be read continuously, in a separate thread for example.

In loop() it checks for new serial data coming in; if there is none it carries on with the previous command.

jmoving is the function that controls the robot: if the angle is greater than 90° it spins until it is less than 90° and then moves forward. The relative PWM of each side is given by a graph. I think I made an error in the coding and -ve angles are to the right, not left as they should be. I did not bother to correct the code, merely giving it mirrored commands.

ptalk.py

Python program to enable the pi to respond to questions and control the movement of the robot. I have already outlined this in my previous post.

I recently amended ptalk.py to include, in the context sent to the AI, an indication of the format I expect any movement command to come back in. This command is picked up in jspeak_syn, the function that synthesises the speech. I had some trouble with the AI reply containing lots of spaces, so I had to strip these out. The function extract_numbers was written by Perplexity AI. The first time I have used code from an AI 🙂
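The actual extract_numbers came from Perplexity, but something like the following regex-based sketch does the same job, and it shrugs off the extra spaces:

import re

def extract_numbers(text):
    # Return all integers found in the AI reply, e.g. "MOVE  90   5" -> [90, 5].
    return [int(n) for n in re.findall(r"-?\d+", text)]

heading, seconds = extract_numbers("MOVE  90   5")[:2]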

Video

A short video of Patch responding to questions and a command to move forward – https://youtube.com/shorts/VrGBR-Skw-I?feature=share

Notice the wake word detection immediately picks up a female voice. I think this is a function of the training being on a female voice.

Adventures with Patch #3 Talking

Talking

Commercial v Open Source

There has been a long pause between posts, not because I haven't been doing anything, but because I was seduced by commercial audio libraries. Then my activation code was arbitrarily (to me) disabled! I am talking here of Picovoice.

I have got Patch talking pretty well, not as well as I would like, but time to move on.

Amplifier and Speaker

From previous experience the output from the headphone jack is intended for headphones. Duh! So to get a decent volume out of the speaker I need an amplifier. I use a cheap one I bought from Jaycar a few years ago: https://www.jaycar.com.au/9v-0-5w-mono-audio-amplifier-module/p/AA0373. This, hooked up to a small speaker I had kicking about in my box, works OK. It is designed for a 9V power input, but I only have 5V so that is what I use.

I have bought a PAM8406 5V 5W amplifier, but not yet hooked it up. Hopefully when I do, Patch will be able to fully express himself!

Picovoice

I find Picovoice's libraries work really well. However, when I went to look at my accounts, I found they had been deleted for lack of use – after only 2 months, and that over Christmas. Anyway, I signed up for a new free account and obtained a new access_key.

With the new access key I was able to develop my code using pvcheetah (speech to text), pvporcupine (wake word), pvorca (text to speech) and pveagle (speaker identification).

One day my code started misbehaving: pvcheetah was only returning text at the end of a message, i.e. only the text produced by flush(). It took me a long time to figure out it wasn't my code but a problem in pvcheetah.

I abandoned pvcheetah but carried on with the other Picovoice libraries. All was good, just about finished and then … I discovered my access key was disabled! Again no explanation. So I abandoned picovoice and have gone completely open source.

OpenWakeWord

I am currently using openWakeWord. It is not as easy to use as Picovoice – the documentation is non-existent. https://github.com/dscripka/openWakeWord

I was able to train my wake word model using Colab, with a T4 GPU selected from the Runtime > Change runtime type drop-down menu: https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb?usp=sharing#scrollTo=qgaKWIY6WlJ1 The finished model is automatically downloaded to the calling computer.

The Colab training uses a female voice automatically generated by piper. I found the detection of the wake-word is far better when the speaker is female. A male speaker is far less likely to trigger the wake word.

It took a bit of trial and error with the model settings in the third section of the Colab code. I trained three models – with a small, medium and large number of examples. I used a low false-activation penalty as I seem to be getting few false activations, but it is difficult to get activation at all – I use 0.5 as the threshold. I am having problems as Hey Patch (my wake word) ends with a 'ch' sound, which I frequently drop.

The only problem I ran into here was the path to the models: I discovered I had a space in the path string. On GitHub there is a small program, try_wake.py, that displays the prediction from a model.
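try_wake.py does essentially this; here is a cut-down sketch of the idea. The model filename is a placeholder, and the Model constructor argument name may differ between openWakeWord versions:

import numpy as np
import sounddevice as sd
from openwakeword.model import Model

# Path to the .onnx model downloaded from the Colab notebook (no stray spaces!).
oww = Model(wakeword_models=["hey_patch.onnx"], inference_framework="onnx")

def on_audio(indata, frames, time_info, status):
    # openWakeWord expects 16 kHz, 16-bit samples as a numpy array.
    scores = oww.predict(np.frombuffer(bytes(indata), dtype=np.int16))
    for name, score in scores.items():
        if score > 0.5:               # my activation threshold
            print("wake word!", name, round(score, 2))

with sd.RawInputStream(samplerate=16000, blocksize=1280, dtype="int16",
                       channels=1, callback=on_audio):
    sd.sleep(30_000)                  # listen for 30 seconds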

I spent some time trying to train verifier models for myself and two other people: https://github.com/dscripka/openWakeWord/blob/main/docs/custom_verifier_models.md

But I ran into an error that I believe was caused by the model failing to recognise the wake word in one of my three positive examples. The way out of this, I believe, is to check the score for a particular clip using model.predict_clip("path/to/wav/file"). But I got fed up and decided to move on. Maybe I will go back one day. I have uploaded record_to_wav.py, which I used to create the training files, to GitHub.

Github

The github address is https://github.com/patchnspotnami/talking

Piper

For the text-to-speech engine I use piper (https://github.com/rhasspy/piper). I find it very good. Again the documentation is not great, but I found an excellent thread https://github.com/rhasspy/piper/discussions/326 that explained everything I wanted. I use the "en_GB-northern_english_male-medium.onnx" voice that I downloaded from https://github.com/rhasspy/piper/blob/master/VOICES.md
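One simple way to drive piper from Python is to pipe text into the command-line binary; a rough sketch, assuming the flags from the piper README and with the binary path and output filename as placeholders:

import subprocess

PIPER = "./piper"                                  # path to the piper binary (assumed)
VOICE = "en_GB-northern_english_male-medium.onnx"  # voice downloaded from VOICES.md

def synthesise(text, out_path="reply.wav"):
    # Equivalent to: echo "text" | piper --model VOICE --output_file reply.wav
    subprocess.run([PIPER, "--model", VOICE, "--output_file", out_path],
                   input=text.encode("utf-8"), check=True)

synthesise("Hello, I am Patch.")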

Output Stream

I don’t use pyaudio to generate the output audio as it throws a weird error. I believe it may be a bug to do with the pi. Anyway I use sounddevice instead and it works well. https://pypi.org/project/sounddevice/
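A minimal sketch of playback with sounddevice, reading a WAV file (the filename is a placeholder for whatever piper just wrote):

import wave
import numpy as np
import sounddevice as sd

# Play a WAV file through the default output device.
with wave.open("reply.wav", "rb") as wf:
    samplerate = wf.getframerate()
    frames = wf.readframes(wf.getnframes())

audio = np.frombuffer(frames, dtype=np.int16)   # 16-bit mono samples
sd.play(audio, samplerate)
sd.wait()                                       # block until playback finishes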

Vosk

For the speech to text engine I use Vosk. https://pypi.org/project/vosk/

I use the vosk-model-small-en-us-0.15.zip model downloaded from https://alphacephei.com/vosk/models. It only appears in the pi directory as a .zip file, which is strange. But it seems happy enough.

The documentation is not great but by looking at the python examples https://github.com/alphacep/vosk-api/tree/master/python/example I was able to figure out how to use it.

Microphone

I use the Seeed 4-mic array https://www.seeedstudio.com/ReSpeaker-Mic-Array-v2-0.html. It is a little expensive, but from experience it is well worth it, as it screens out background noise and captures speech very well. It simply plugs into the USB port of the Pi. The wiki at https://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/ explains it well, though it is a little dated. I have written about it before in a previous post.

ptalk.py

The main program is https://github.com/patchnspotnami/talking/blob/main/ptalk.py. It works on 3 threads. speak_thread looks in the speak_queue and, when something appears, sends it off to the speaker; this is controlled by the function jspeak(). In jspeak() I take the opportunity, while the patch_talking flag is set, to empty the input queue. No data items are added to the queue while Patch is talking (see callback()), which meant that some audio from before Patch started talking was still stored in the queue. This can cause problems, so best to empty the queue!

callback: this is the callback function called by the microphone stream. It puts the incoming audio data into the input_queue, discarding it if patch_talking is set.

main_thread: I use start_time as a way of measuring the time since the last response. A long pause indicates the conversation is over; Patch then reverts to listening, and has to be reactivated with the wake word.

sd.RawInputStream starts the input stream. It uses a plain Python buffer object; I use this because it works, and because vosk and openWakeWord use different formats for their inputs. The stream calls the callback whenever the buffer is full.
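Stripped right down, the input side looks something like this (a simplified sketch only; the names match the ones above):

import queue
import sounddevice as sd

input_queue = queue.Queue()
patch_talking = False          # set by the speaking thread while Patch is talking

def callback(indata, frames, time_info, status):
    # Called by the input stream each time a buffer of microphone audio is ready.
    if not patch_talking:
        input_queue.put(bytes(indata))   # audio is discarded while Patch is speaking

stream = sd.RawInputStream(samplerate=16000, blocksize=8000, dtype="int16",
                           channels=1, callback=callback)
stream.start()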

I had a problem with Patch re-hearing the wake word when a conversation ended. I believe this is because, as soon as the wake word was detected, no more data was entered into the buffer. So I clean out the buffer by putting more audio in there; 10 buffer-fulls, about 1/3 of a second, seems to do the trick.

If Patch is still talking, the start time is reset, as this is not a pause.

I have some trouble understanding when vosk reaches the end of its input stream. The .AcceptWaveform(data) function seems to return True if the end is reached, and the text is then returned by .Result(). If not, the function .PartialResult() returns the partial result. I don't know how the end of the input is detected. Occasionally, if there is no sound for a time, .AcceptWaveform(data) returns True; this causes problems as a blank string is sent off to ChatGPT. I catch this by checking the length of the .Result() string. Anyway, it all seems to work pretty well.
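The recognition loop boils down to something like the sketch below (input_queue is the queue filled by the microphone callback shown earlier; the model path assumes the unpacked model directory sits next to the script):

import json
import queue
from vosk import Model, KaldiRecognizer

input_queue = queue.Queue()                     # filled by the microphone callback

model = Model("vosk-model-small-en-us-0.15")    # path to the model directory
rec = KaldiRecognizer(model, 16000)

while True:
    data = input_queue.get()
    if rec.AcceptWaveform(data):                # True when vosk decides the utterance ended
        text = json.loads(rec.Result()).get("text", "")
        if text:                                # guard against the occasional blank result
            print("heard:", text)               # in ptalk.py this is where the AI is called
    else:
        partial = json.loads(rec.PartialResult()).get("partial", "")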

Adventures with Patch #2

After constructing Patch’s chassis, time to move onto programming the ESP32-S2 Thing Plus.

This is what it looks like now.

I downloaded the Arduino IDE 2 from the official site https://docs.arduino.cc/software/ide-v2/tutorials/getting-started/ide-v2-downloading-and-installing

I also followed https://randomnerdtutorials.com/installing-esp32-arduino-ide-2-0/ and https://raspberrytips.com/install-arduino-ide-on-ubuntu/ to get the board definitions.

In File > Preferences of the Arduino IDE, in the Additional boards manager URLs field, I pasted: https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json

Then in Tools > Board > Boards Manager… I searched for esp32 and selected the esp32 by Espressif Systems

In the Arduino IDE I selected the ESP32S2 Dev Module board. All worked well 🙂

I2C on the ESP32S2

The SDA and SCL pins are GPIO 1 and GPIO 2 respectively

// Modified from https://playground.arduino.cc/Main/I2cScanner/
// --------------------------------------
#include <Wire.h>

// Set I2C bus to use: Wire, Wire1, etc.
#define WIRE Wire

// ESP32-S2 I2C pins
#define SDApin 1
#define SCLpin 2

void setup() {
  Serial.begin(9600);
  while (!Serial)
    delay(10);

  Serial.println("\nI2C Scanner");

  WIRE.begin(SDApin, SCLpin);  // SDA = GPIO 1, SCL = GPIO 2
  // (the scanning loop from the original sketch follows unchanged)
}

Calibrate the MPU 6050

I have been following the tutorial at https://howtomechatronics.com/tutorials/arduino/arduino-and-mpu6050-accelerometer-and-gyroscope-tutorial/

I have been having a lot of trouble getting sense out of the 6050; despite re-writing the example code I was not able to get any sensible results from it.

Reading the registers directly was simply not working for me. I must have an error somewhere. After a few weeks of searching I gave up.

Install Arduino 2

I followed the instructions https://raspberrytips.com/install-arduino-ide-on-ubuntu/. My paths are :

Name=Arduino IDE
Exec=/opt/arduino_ide_2.2.1_Linux_64bit/arduino-ide
Icon=/opt/arduino_ide_2.2.1_Linux_64bit/resources/app/resources/icons/512x512.png

But there was a problem installing the desktop shortcut. I copied the .desktop file I created in the tutorial to the Desktop. Then I allowed it to be executable and trusted.

Connecting to the ESP32-s2 over wifi

Following the comments here https://forum.arduino.cc/t/setup-communication-between-desktop-pc-and-esp32-over-wifi/1152476/7

That worked OK.

Don’t forget to :

add yourself to the dialout group: sudo usermod -a -G dialout $USER

install python-is-python3, pip3 and pyserial

DF Robot Dual Motor Controller
Motor Control Pins
  • E1, E2: motor enable pins (PWM control)
  • M1, M2: motor signal pins. E.g. M1 = 0, the motor rotates in the forward direction; M1 = 1, the motor rotates in the reverse direction.
  • E = LOW, M = LOW/HIGH → Stop
  • E = HIGH, M = HIGH → Back direction
  • E = HIGH, M = LOW → Forward direction
  • E = PWM, M = LOW/HIGH → Speed control

Note: LOW = 0; HIGH = 1; PWM = 0~255

Arduino Uno

I was unable to get the ESP32-S2 to talk nicely to the MPU6050. The published libraries for the ESP32 and MPU6050 don't work, as the ESP32-S2 needs the SDA and SCL pins defined differently.

https://components101.com/sensors/mpu6050-module

https://github.com/jrowberg/i2cdevlib

So I was reduced to trying to interrogate the relevant registers directly. But after a lot of difficulty I was still unable to get sensible results out. I believe I was having difficulty with my bitwise logic.

https://howtomechatronics.com/tutorials/arduino/arduino-and-mpu6050-accelerometer-and-gyroscope-tutorial/

https://cdn.sparkfun.com/datasheets/Sensors/Accelerometers/RM-MPU-6000A.pdf

I use the Adafruit MPU6050 libraries available within the Library Manager in the Arduino IDE. (n.b. for Patch I have returned to using Arduino IDE 1.8.19.)

The driver pins for the motors are:

  • e4 – 11
  • m4 – 12
  • e3 – 10
  • m3 – 9
  • e2 – 6
  • m2 – 7
  • e1 – 5
  • m1 – 4
MPU6050 with UNO

The connecting pins used are:

  • SCL – A5
  • SDA – A4
  • INT – 2

The MPU6050 is powered from the Uno's 5V pin.

The https://howtomechatronics.com/tutorials/arduino/arduino-and-mpu6050-accelerometer-and-gyroscope-tutorial/ tutorial worked immediately on the UNO 🙂

I am only interested in the Z axis, that is the yaw. Hopefully Patch will not pitch or roll! After tweaking the error (mine turned out to be 1.98), it certainly works well enough for me.

Adventures with Spot #1

Spot

I am starting with a new install of Ubuntu 22.04 Server on my Pi4. I have called my new version of Spot spot@spot.local.

I followed the post Adventures with Freenove Big Hexapod Kit for Raspberry Pi and set up the Pi.

I am interested in using the SEEED usb microphone I have so I followed the instructions at https://wiki.seeedstudio.com/ReSpeaker-USB-Mic-Array/#features

I ran into trouble as the supplied code was tested on Python 2.7, but everything is Python 3 now. It was easy enough to modify the code – put () into the print commands and replace the line in tuning.py with response = struct.unpack(b'ii', response.tobytes()). The original used response.tostring() (I think!). I can't find the forum I got that from.

I also had problems with import usb in Python. This was because I used pip install pyusb instead of sudo pip install pyusb. I did that because of the warnings you receive when you use sudo with pip. The result was that when I used sudo to run tuning.py it couldn't find usb, but when I simply ran tuning.py without sudo it found usb but didn't have permission to access the USB ports.

I got around that by allowing all users to access the USB device: https://devicetests.com/ubuntu-usb-device-access-guide. Following the instructions there, followed by a reboot, did the trick. So now I don't need sudo to run code that accesses the microphone.

Pyaudio

The instructions on the SEEED ReSpeaker web site are for Python 2.7. To install pyaudio for Python 3, use sudo apt install python3-pyaudio.

gTTS

I find espeak to be metallic so I looked at gTTS. Even though you need internet access to use it, it seems to work fine. It doesn't seem to have a male voice, only female, though with different accents: https://gtts.readthedocs.io/en/latest/module.html#.

It converts the text to speech well, but it returns it as an mp3 file (I think). To play it, the usual suggestion is to save the file and then play it from disk; it is not happy playing it without saving first. After a long search of the forums, I found a snippet of code using io.BytesIO() (https://docs.python.org/3/library/io.html#id2) as a place for gTTS to write its output, then using pygame to read it and play it. I can't reference the original post as I have lost it, but here is the code I lifted from the internet:

import io
import pygame
from gtts import gTTS

def speak(my_text):
    # Write the gTTS mp3 output into an in-memory buffer and play it with pygame
    with io.BytesIO() as f:
        gTTS(text=my_text, lang='en', tld='com.au').write_to_fp(f)
        f.seek(0)
        pygame.mixer.init()
        pygame.mixer.music.load(f)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            continue

In the call to gTTS the tld setting is used to set the accent.

All this plays on the default speaker.

OpenAI

I am using GPT-4 as the chat bot engine. So I paid US$10 and got access! The API key is stored locally as an environment variable, OPENAI_API_KEY. You can check this with echo $OPENAI_API_KEY, which will show you your key if it is there. I found the documentation a bit dense so I followed the instructions for only one project – that's all I have! I set the environment variable OPENAI_API_KEY by editing the /etc/environment file: https://mkyong.com/linux/how-to-set-environment-variable-in-ubuntu/

I added the line OPENAI_API_KEY=my-secret-key (replace my-secret-key with the key you obtained from OpenAI), following the section on setting an API key for a single project at https://platform.openai.com/docs/quickstart?context=python. After rebooting the computer the little example code there worked well.

The environment variable is interrogated when the openai is initiated:

client = openai.OpenAI()  # initialise chatgpt

I notice the syntax has changed slightly since the beginning of the year, especially for calling chat.completions and handling the results from it. The old prompt is now called messages, which takes the form of a Python list of dictionaries. To extract the content of the streamed results from chat.completions I use chunk.choices[0].delta.content, slightly different from before.

I am using stream = True in my calling parameters. This is because I don't want to wait until I get the full reply from GPT before I start speaking it. To see how to work this I used the OpenAI Cookbook entry https://cookbook.openai.com/examples/how_to_stream_completions, but that uses the old syntax.
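For reference, the streaming call in the new syntax looks roughly like this (the messages here are placeholders):

from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "system", "content": "You are a robot dog called Patch."},
              {"role": "user", "content": "Hello Patch!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:                          # the final chunk carries no content
        print(delta, end="", flush=True)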

Interrupting the GPT reply.

Even though I ask for a brief reply GPT is pretty verbose. It would be nice to be able to interrupt the reply. I have a function that checks to see if the user is shouting at it to stop!

I use cobra to detect if speech is present, then porcupine to check if the wake words are there.

interrupt_flag = 0

def interrupt_check():
    print("interrupt check")
    global interrupt_flag
    ipcm = get_next_audio_frame()  # look in audio stream
    result = cobra.process(ipcm)
    print("speech prob result...", result)
    if result > speech_prob:  # speech detected
        iresult = porcupine.process(ipcm)  # look for wake word
        if iresult >= 0:  # detected wake word!
            interrupt_flag = 1
            print("interrupted")
I use interrupt_flag to tell the function controlling GPT to break. This – I think – causes chat.completions to crash. It seems to work, but only when GPT pauses to take a breath.

Code

You can see the current (18 Nov 2023) – not polished – state of the code here : https://drive.google.com/file/d/1Uo5fiae9ZWpCTdJzf6Wcf4yiN1jXsdHN/view?usp=sharing

Adventures with Patch #1

Designing Patch

Patch is a 4-wheeled robot. I want him to behave as a chat bot but also recognise people. Spot, one of my other robots, can't do it as his camera is only 15 cm from the floor. I could try foot recognition, but I think not. So Patch's camera has to be at least a meter from the ground. I originally conceived him as a dog, but in view of his long neck he became a giraffe.

Patch

The motors will be controlled by an Arduino. Patch’s primary sensors will be an oak-d stereo camera, a time of flight camera and a SEEED microphone array. He will also carry a loud speaker. These will all be controlled by a Pi4, though I suspect he may need 2 Pi’s. One dedicated to audio and another dedicated to video. He will communicate with the rest of his family (Spot and Ami) using ROS2 Humble.

I am following the series of tutorials published by Josh Newans – https://articulatedrobotics.xyz/

I have previously followed him for help with ROS2 and Ami (my Nao), but now I have restarted the whole series as he details how to physically make the robot.

Power

I have chosen to power Patch using 3 x 18650 batteries, which will give a nominal voltage of 11.1V. I will use a 3-cell 18650 BMS module to control my 3 18650s. This has a continuous discharge of 8A, but a 10A peak discharge. The motors I have chosen have a current rating of 0.68A (but a stall current of 2A), and following from Josh's post I calculate all the electronics will come in at less than 5A. So the current draw from my battery pack should be about 8A when everything is running, just inside the limit. Though if all the motors stall at once I will be in trouble. To guard against that I will install a 10A fuse in the power circuit.

I will house the 3 batteries in individual battery holders, wired up as in the diagram. This module charges and discharges through the GND and PWR wires. To charge, it needs a continuous 12.6V. I plan to charge the batteries on the robot so I will use a Sparkfun Constant Current Power Supply – 12.6V 10A. The DC power will be supplied to that by a 12V 2A plug pack.

I will use a DFRobot DC-DC Power Module 25W to convert the 11.1V out of the battery pack to a stable 5V supply to power the Arduino, Cameras, and Pi. This has a rated power of 25W, so 5A at 5V should be OK.

Motors

I have chosen a geared 12V DC motor. It is rated at 9 kg.cm torque at 45 RPM, drawing 0.68A at 12V. Though I will be supplying 11.1V, I am hoping that will give sufficient torque. The wheels I have are 6 cm in radius. The general consensus about the torque required for a two-powered-wheel robot is that one of the motors should be able to lift the robot. I estimate Patch will weigh in at 3 kg, so according to that rule of thumb my motors should be 3 x 6 = 18 kg.cm. But I have 4 of them! So I think my 9 kg.cm of torque per motor should be fine.

The circumference of the wheels is about 38 cm, so at 45 RPM he will move at about 17 m/min (roughly 0.3 m/s), which is not very fast. I am hoping this will not be a problem, as using higher-power motors would demand a higher-rated battery and controller.
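A quick back-of-the-envelope check of those numbers, using the figures above:

import math

wheel_radius_m = 0.06      # 6 cm radius wheels
rpm = 45                   # rated motor speed
robot_mass_kg = 3          # estimated weight of Patch

circumference_m = 2 * math.pi * wheel_radius_m    # about 0.38 m
speed_m_per_min = rpm * circumference_m           # about 17 m/min (~0.28 m/s)

# Rule of thumb: one motor should be able to lift the robot at the wheel radius.
required_torque_kg_cm = robot_mass_kg * (wheel_radius_m * 100)   # about 18 kg.cm

print(round(speed_m_per_min, 1), "m/min,", required_torque_kg_cm, "kg.cm")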

The motors will be supported by the Pololu Stamped Aluminium L-Bracket Pair for 37D mm Metal Gearmotors. These should, in theory, fit the holes in the motors.

The wheels will be attached to the motor ‘D’ shaft using 1309 Series Sonic Hubs.

To control the motors I am planning to use two DFRobot 2A dual motor controllers. This is based on the L298N chip. The maximum rating is 2A per motor. So I have a little capacity to spare if the motors are not big enough.

Control

To control the motors I plan to use an Arduino Uno that I have already. The plan is for the Arduino to communicate with the Pi over a serial connection. The Arduino will receive a target heading and direction from the Pi – a target position in polar coordinates, relative to the base of the robot. The Pi will use the camera and directional audio to calculate the coordinates to send to the Arduino.

I will use an MPU-6050 Module 3 Axis Gyroscope + Accelerometer unit to provide data to the Arduino about the error from the target coordinates

Back Order

Some of the equipment is on back order, so I will have to wait about 2 weeks before I actually receive it.

Circuit Diagram

I have put together the components, as shown in the schematic.

When I first wired up the BMS to the batteries I found no voltage across P+ and P-. But after talking with Graham and looking on the blogosphere, it turns out the board needs an initial charging voltage to kick into life. I carried on and wired up the board to the constant current power supply – and it worked!

The output from the DC-DC 5V power board was actually 5.06V, so following the instructions for powering the ESP32 Thing I soldered a 10 ohm resistor into the power line.

The MPU6050 is powered directly from the 5V DC Power Board. The SCL and SDA ports are wired to the SCL and SDA ports on the Thing respectively.

To control the motor drivers the pins are assigned –

  • Motor 1
    • Enable1 – GPIO 03
    • Motor1 – GPIO 34
  • Motor 2
    • Enable2 – GPIO 33
    • Motor2 – GPIO 37
  • Motor 3
    • Enable3 – GPIO 12
    • Motor3 – GPIO 13
  • Motor 4
    • Enable4 – GPIO 10
    • Motor4 – GPIO 11

Adventures With Ami. A new Start

After a long break I managed to retrieve my Nao V5, which I call Ami. When I came back to it the battery pack would not charge, claiming it was full, and the robot would not turn on. Fortunately, after trying a number of times (4 I think), it finally slowly came to life. Now it will take a charge and it runs for about an hour on the battery – phew!

There are a few online comments about it – https://robohack.org.uk/viewtopic.php?f=5&t=69&fbclid=IwAR15LnL_Z3vzEFYQCJL4Xx8FupMXZ3JwR8EjTYxjBfzRsVnUiRC8DE1GcC8 or https://www.bartneck.de/2021/02/03/repair-of-broken-nao-robot-battery-pack/

Choregraphe

The Nao V5 is not supported by SoftBank anymore, but Choregraphe 2.1.4 can be downloaded here: https://support.unitedrobotics.group/en/support/solutions/articles/80001033994-nao-v4-v5-naoqi-2-1-4-13-

On my Ubuntu 22.04 machine I downloaded the .run file for Linux, then chmod +x file_name followed by sudo ./file_name to install it. Don't forget the licence number: 562a-750a-252e-143c-0f2a-4550-505e-4f40-5f48-504c

It didn't work on my Ubuntu 22.04 machine, but it works well on Ubuntu 16.04.

Naoqi_bridge

The Aldebaran web site still has the naoqi 2.1.4 SDKs up: https://www.aldebaran.com/en/support/nao-6/downloads-softwares/former-versions?os=49&category=76. But I am not sure how long they will maintain that. To connect to the Nao with ROS Kinetic I use naoqi_bridge – http://wiki.ros.org/nao/Tutorials/Installation

Voice Recognition

I am running Python 2.7 so I am limited in the voice recognition libraries I have available.

I started with SpeechRecognition – not the latest version but version 3.8.1, the last one to support Python 2.7: pip install SpeechRecognition==3.8.1

To fulfil the requirements I installed PyAudio, but version 0.2.11 (pip install PyAudio==0.2.11). At first that failed with a portaudio.h file not found error. I fixed that with sudo apt-get install portaudio19-dev python-pyaudio, from https://stackoverflow.com/questions/48690984/portaudio-h-no-such-file-or-directory

While installing these packages I was running into problems with pip. Fixed with pip install --upgrade "pip < 21.0".

When trying to install pocketsphinx I ran into similar problems. I had to install swig (sudo apt-get install swig) and also libpulse-dev (sudo apt-get install libpulse-dev).

ZeroMQ

I had become fixated on using ROS to communicate with the Nao, but my friend Graham suggested I use ZeroMQ, which is a lot simpler than trying to get naoqi_bridge to work.

I am running Python 2.7 ZeroMQ on my Ubuntu 16.04 machine (I call it Ami), with Choregraphe 2.1.4, and Python 3.10 ZeroMQ on my Ubuntu 22.04 machine (I call it spot_main).
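A minimal sketch of the sort of link this gives you, assuming a simple REQ/REP pair (the hostname and port are placeholders; the Python 2.7 side on Ami is just the mirror image, with a REP socket bound to the same port):

import zmq

# spot_main side (Python 3): send a command to Ami and wait for the reply.
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://ami.local:5555")   # Ami binds a REP socket on this port

socket.send_string("say Hello from spot_main")
print(socket.recv_string())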

Initially I had trouble resolving Ami (Ubuntu 16.04) from spot_main (Ubuntu 22.04). But following the post at https://askubuntu.com/questions/144636/why-do-none-of-my-local-servers-resolve I edited the /etc/nsswitch.conf file:

cat /etc/nsswitch.conf
...
#hosts:      files mdns4_minimal [NOTFOUND=return] dns mdns4
hosts:      files mdns4_minimal dns [NOTFOUND=return] mdns4

It worked!

Audio streaming from Nao

https://stackoverflow.com/questions/24243757/nao-robot-remote-audio-problems

Adventures With Ami: Using GPT-3.5 on Streaming Audio

Away from home for a week, with a little time on my hands, and after talking to Maffy (my son, a junior doctor in hospital), I thought it would be fun to write a Python script that would automate his note taking. The latest version is here: https://drive.google.com/file/d/1hhEjKgoq4BHMQuzazvEmufyp09nkvql8/view?usp=sharing

Still haven’t figured out how to use github!

The consult normally has a 6 part structure.

Picovoice

I am using Picovoice to capture the audio stream from the microphone (pvrecorder) and pvporcupine as a wake word detector.

After installing picovoice with pip I found that it had picked up an old version, and the custom wake word library downloaded from the Picovoice console was not working. So I had to run python3 -m pip install --upgrade picovoice, and then all was good.

Once the wake word is detected I use pvcheetah as a speech-to-text engine. It works well. As the consult proceeds the results from cheetah are added onto the end of a growing string (partial_transcript).

To detect the end of the consult I use an end phrase. look_for_end is a rolling 'last in, first out' field that is 5 characters longer than the end phrase. I found it best to use a three-word phrase. I use this field as the individual results from cheetah are too small to contain a 3-word phrase.
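A sketch of that loop, from memory of the Picovoice Python APIs (the access key and end phrase are placeholders):

import pvcheetah
from pvrecorder import PvRecorder

cheetah = pvcheetah.create(access_key="YOUR_ACCESS_KEY")
recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

END_PHRASE = "end of consult"      # hypothetical three-word end phrase
partial_transcript = ""
look_for_end = ""

while True:
    text, is_endpoint = cheetah.process(recorder.read())
    partial_transcript += text
    # Rolling window a little longer than the end phrase, checked every frame.
    look_for_end = (look_for_end + text)[-(len(END_PHRASE) + 5):]
    if END_PHRASE in look_for_end.lower():
        partial_transcript += cheetah.flush()
        break

recorder.stop()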

GPT-3-turbo

Once the transcript is complete I submit it to GPT-3-turbo. This is the oldest and cheapest of the OpenAI models. It seems to work in testing, but I am not sure how it will perform if ever used in real life. There are many newer, more expensive models, but I have not tried them yet.

Using 6 different prompts I submit the entire transcript 6 times. This is a lazy, expensive way to do it. There are more economical ways, which rely on embeddings and vector representations.

I ran into the model submission limit of 3 requests per minute when I tried to submit 6 queries in rapid succession. Inserting time.sleep(25) solved that problem, though it slowed down the script considerably.
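The submission loop is no more complicated than this sketch (the prompts here are invented; the real ones follow the 6-part consult structure, and the older openai syntax matches what I was using at the time):

import time
import openai

transcript = partial_transcript     # the transcript built up by the loop above

prompts = ["Summarise the presenting complaint.",      # hypothetical prompts
           "List the examination findings.",
           "List the proposed management plan."]

answers = []
for p in prompts:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": p},
                  {"role": "user", "content": transcript}],
    )
    answers.append(str(response.choices[0].message["content"]))
    time.sleep(25)   # stay under the 3 requests/minute limit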

Saving to a .docx file

As the six answers come in I assemble them into a field and then save them all as a single file. I use the python-docx library to do this.
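With python-docx the assembly is straightforward; a sketch, with made-up section headings and placeholder answers:

from docx import Document

answers = ["...", "...", "..."]     # the GPT replies gathered above

doc = Document()
doc.add_heading("Consultation notes", level=1)
for heading, answer in zip(["History", "Examination", "Plan"], answers):
    doc.add_heading(heading, level=2)   # hypothetical section headings
    doc.add_paragraph(answer)
doc.save("consult_notes.docx")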

Emailing an attachment

I send the completed .docx document to an email address. To do that I use the smtplib and email libraries, with a Gmail account to send the emails. Google have recently upgraded their security so most of the tutorials are out of date; I found I had to get an app password, which I created in the security section of my Google account. Apart from that it all works well. It does take a few minutes for the email carrying the attachment to arrive in the destination inbox.
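The sending side is all standard library; a sketch with placeholder addresses and app password:

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Consultation notes"
msg["From"] = "me@gmail.com"              # placeholder addresses
msg["To"] = "recipient@example.com"
msg.set_content("Notes attached.")

with open("consult_notes.docx", "rb") as f:
    msg.add_attachment(
        f.read(),
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.wordprocessingml.document",
        filename="consult_notes.docx")

with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login("me@gmail.com", "my-app-password")   # the Gmail app password
    server.send_message(msg)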

Converting to An Android App

That Python code worked well, but it would be far more useful as an Android app.

So, it turns out that to build an Android app you need Android Studio, and for that you need at least 8GB of memory. I installed it using the instructions here: https://www.tabnine.com/blog/install-android-studio-ubuntu-step-by-step-guide/

My machine was not able to run the emulator, but that was not serious as I was fairly easily able to connect my Android phone (running Android 11) over wifi. To do that I first had to enable developer options on my phone: https://nokiaandroidphones.com/developer-options-on-nokia-phones/.

After that it was fairly simple to pair my phone with Android Studio https://developer.android.com/studio/run/device. Open the Device Manager tab on the right hand side. Open the Physical Device tab and pair the phone. Make sure they are all on the same network. You should see the device name appear at the top of the screen. To build and deploy the current code, click on the green right arrow next to the device name.

Error – targetSdk wrong

I suddenly ran into this error whilst trying to build – it used to work, but this suddenly appeared. It said the target SDK was 33 but should be 34. To fix that, within the Gradle Scripts tab on the left hand side, in build.gradle.kts (Module app), I changed compileSdk = 34 and, within defaultConfig, targetSdk = 34. At the top it asked me if I wanted to sync the changed file. I did.

Picovoice

I am using Picovoice Cheetah to perform speech to text. I followed the instructions here: https://picovoice.ai/blog/android-speech-recognition/#cheetah-streaming-speech-to-text

They are a little out of date. They ask you to include mavenCentral() in your top-level build.gradle file (that is build.gradle.kts (Project <your project>)). Don't – that is the old way. The new way is to include it in settings.gradle.kts, but in my build it is included by default.

In the build.gradle.kts (Module app) file I included implementation("ai.picovoice:android-voice-processor:1.0.2") and implementation("ai.picovoice:cheetah-android:1.0.2").

App Inventor

After I told my friend Graham about my problems with Kotlin, he suggested I use App Inventor, a block-based language developed by MIT. It is much better, though I believe the code it produces is bigger and less efficient than coding directly in Kotlin.

Using App Inventor was intuitive and fun, once I got the hang of using interrupts to navigate. My code is https://drive.google.com/drive/folders/1WnNOx8W0OxuQpEpq9FamrWl7APZ3-dlX?usp=sharing

The major problem I had was trying to attach a file to an email. I bought(!) the Taifun mail extension for App Inventor; it worked well for files picked from the public folders. However, I failed to create and write to a file inside the app and then attach that file to the email. It kept throwing an error saying it could not find the file. I could read the file from inside the app, but not attach it to the email.

I think the problem has to do with Android security and the distinction between private, app-specific files and public ones. The blogs suggested the fix of copying the file to a public folder, like Documents for example, but I could not find the full path of the Documents folder. It is an area with much confusion. So after some time banging my head against that problem, I gave up.

Jimmy’s Piano Journey

I have been mucking about with the piano for over 10 years, and yet I am still rubbish! So I have to submit to the discipline of lessons. Being too arrogant – and poor – to pay a teacher to tell me what I am doing wrong I have started following the pianote online lessons https://www.pianote.com/

As part of that I upload videos of my piano playing so the teachers can comment. The first video is of me playing a made up backing for a song, using the root, 4th, 5th and 6th arpeggios in C – I think! https://youtu.be/7IUuaTS8xj8 I will keep you, dear reader, informed on the feedback.

Adventures with ROS2 Humble, ChatGPT & Bard

Now that I can use a wake word and then transcribe the spoken word to a ROS2 topic, I want to submit the text to the ChatGPT chatbot.

I want to write a node that subscribes to my heard_text topic, sends the text to ChatGPT and then publishes the response onto the reply_text topic. Writing a program that creates 2 nodes like this is not covered in the ROS2 Humble docs, but Automatic Addison does it well: https://automaticaddison.com/how-to-write-a-ros2-publisher-and-subscriber-python-foxy/

n.b. I am using a new install of ROS2, and I was getting the error about setuptools being deprecated. The fix is: pip install setuptools==58.2.0
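A bare-bones sketch of the relay node I mean (topic names as above; the node name is illustrative and the ChatGPT call is stubbed out):

import rclpy
from rclpy.node import Node
from std_msgs.msg import String

def ask_chatgpt(text):
    # Placeholder; the real version calls chat.completions as shown below.
    return "You said: " + text

class ChatRelay(Node):
    def __init__(self):
        super().__init__('chat_relay')
        self.pub = self.create_publisher(String, 'reply_text', 10)
        self.create_subscription(String, 'heard_text', self.on_heard, 10)

    def on_heard(self, msg):
        reply = String()
        reply.data = ask_chatgpt(msg.data)
        self.pub.publish(reply)

def main():
    rclpy.init()
    rclpy.spin(ChatRelay())

if __name__ == '__main__':
    main()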

It is easy to create an account with OpenAI

To hide my key, following the start-up example, I put it into a .env file. I then use the Python package decouple to import it. Be careful when installing that to use:

pip install python-decouple

pip install openai

The documentation is confusing, but I found https://platform.openai.com/docs/guides/chat/introduction to be most helpful. Following that, the syntax I am using is:

response = openai.ChatCompletion.create(
    model = "gpt-3.5-turbo",
    messages = context,
    temperature = jtemperature,
    max_tokens = jmax,
)
jrep = str(response.choices[0].message["content"])

Here I have the prompt stored in context. The context (a Python list of dictionaries) is created using: context = [{'role': 'system', 'content': 'Kaleen is in Canberra. You are a dog called Spot. Spot lives in Canberra'}] Though the documentation points out that gpt-3.5-turbo pays little attention to the system prompt.

The code appends the question and its response onto the context list. In that way it is able to have a conversation, rather than simply a question and answer. The longer the conversation is, the more it has to send, so it takes longer and uses more tokens. I haven't had a long enough conversation yet for this to be a problem. Also remember there is a token limit of just over 1000.
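The appending is just list manipulation, something like:

context.append({'role': 'user', 'content': jquestion})
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=context)
jrep = str(response.choices[0].message["content"])
context.append({'role': 'assistant', 'content': jrep})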

Google Bard

I thought I would try Bard, it is free – for now – and would give a different character. To get started I followed https://medium.com/@marc.bolle/access-google-bard-with-python-package-bard-api-6000251d1aa8. It all seemed to work OK. It seems to be a bit simpler than OpenAI, but not easily controllable. For example there is no temperature field. Maybe as it matures they will add more fields – and a fee!

To give my chat bot a character I simply prefix the prompt with a text field. My experimental code is:

from bardapi import Bard
import config
import os

os.environ['_BARD_API_KEY'] = config._BARD_API_KEY

jcontext = "You are a friendly personal assistant robot that gives concise answers to questions. You have the form of a small hexapod. Your name is Spot. Sometimes you are mistaken for a redback spider. Humans are scared of redback spiders so they are sometimes scared of you. You often need to reassure them."
jshort = "concisely"
while True:
    jquestion = input("Ask Bard something : ")
    jinput = jcontext + jshort + jquestion
    jreply = Bard().get_answer(jinput)['content']
    jcontext = jcontext + " " + jquestion + " " + jreply
    print(jreply)

I stopped using decouple to hide my API key as I found ROS2 stopped liking it; I am not sure why. So I turned instead to storing my keys in a file called config.py. I simply import the config file and use the _BARD_API_KEY field. My config.py file is simply:

_BARD_API_KEY = "xxxx."

I found Bard would give really long answers so I use the field jshort to tell it to give short answers. Notice I was using Bard to help me to write my launch file.

Bard tends to give long answers, even when requested to give a short answer. He has a memory limit but I don’t know what it is. It is possible to exceed it in a conversation.

Adventures with ROS2 Humble, Current State

My project is getting a little complex. I found my PC was not able to cope with processing the image stream as well as the audio stream, so I have split that processing over 2 computers. ami@jane processes the audio data and ami@spot-main processes the video data. spot@spot-hex is the Pi.

A brief diagram of the nodes and the interfaces they subscribe/publish to.

ROS2 Launch

As my project is getting more complex and I have to launch a number of different nodes for it to work I have written launch files for each computer. Instead of multiple ros2 run commands I only need one ros2 launch for each computer.

In the root directory of the package create a launch folder. For example my chat bot package is called j2chat, so the path of the launch folder is ~/jspot_ws/src/j2chat/launch. Within that folder I have a file called j2chat_launch.py. The code in that is:

import launch
import launch_ros.actions

def generate_launch_description():
    return launch.LaunchDescription([
        launch_ros.actions.Node(
            package='j2chat',
            executable='j2chat_run',
            name='subscribe_text_publish'),
    ])

Here the package is called j2chat and the command to run it, as given in setup.py, is j2chat_run. I am not sure if the name field is necessary, or what it does. At first I thought it was the namespace of the node, prefixed onto the name of the node; that would allow multiple instances of the same node to run at the same time.

I have been looking at https://docs.ros.org/en/foxy/Tutorials/Intermediate/Launch/Creating-Launch-Files.html and at https://roboticscasual.com/tutorial-ros2-launch-files-all-you-need-to-know/#important-launch-functionalities

Current state

The current state of my code is here: https://drive.google.com/drive/folders/1YEuH58S0qvp9gzY6XhFlcfTNL41zNruw?usp=sharing

Still not managed to get Github to work nicely for me!
