OpenAI Speech-To-Text 2

Status
Not open for further replies.

vernpace

We have a requirement to provide MP3toTXT transcriptions using OpenAI. We use OpenAI for a bunch of stuff, but can't seem to get this working.

Here are the curl specs from OpenAI:

Code:
curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/path/to/file/audio.mp3 \
  --form model=whisper-1

Here is what I have so far:

Code:
LOCAL lcApiKey, lcBoundary, lcFile, lcContent, lcFileName, lcName, lcName2, lcModel, lcRequest

#DEFINE CRLF CHR(13) + CHR(10)

lcApiKey = "OpenAI-Key-Here"
lcBoundary =  "--" + STRTRAN(SYS(2015), "_" ,"")
lcFile = "c:\temp\audio.mp3"
lcContent = FILETOSTR(lcFile)
lcFileName = JUSTFNAME(lcFile)
lcName = "file"
lcName2 = "model"
lcModel = "whisper-1"

lcRequest = lcBoundary + CRLF + TEXTMERGE([Content-Disposition: form-data; name="<<lcName>>"; filename="<<lcFileName>>"]) + CRLF + ;
            "Content-Type: application/octet-stream" + CRLF + CRLF + lcContent + CRLF + lcBoundary + CRLF + ;
            TEXTMERGE([Content-Disposition: form-data; name="<<lcName2>>"]) + CRLF + CRLF + lcModel + CRLF + lcBoundary + "--"

?GetOpenAIResponseSTT(lcBoundary, lcApiKey, lcRequest)

PROCEDURE GetOpenAIResponseSTT(tcBoundary AS String, tcApiKey AS String, tcRequest AS String) AS String
   LOCAL lcURL, lcResponse, loHTTP

   lcURL  = "https://api.openai.com/v1/audio/transcriptions"

   loHTTP = CREATEOBJECT("MSXML2.ServerXMLHTTP.6.0")
   loHTTP.Open("POST", lcURL, .F.)
   loHTTP.setRequestHeader("Content-Type", "multipart/form-data; boundary=" + SUBSTR(tcBoundary, 3))
   loHTTP.setRequestHeader("Authorization", "Bearer " + tcApiKey)
   loHTTP.Send(tcRequest)

   IF loHTTP.Status = 200
      lcResponse = loHTTP.responseText
   ELSE
      lcResponse = "Error: " + TRANSFORM(loHTTP.Status)
   ENDIF

   RETURN lcResponse

ENDPROC

This is returning Error code 400 (Bad request). It should be working... hmm... I actually sent the code to OpenAI for debugging and it was verified :} Going nuts now!

What am I missing?
 
Here is an example for an upload with VFP from Marco Plaza using CreateBinary() first and then sending the binary data as request body:


So in your code instead of

Code:
loHTTP.Send(tcRequest)

you can try

Code:
LOCAL qRequest
qRequest = Createbinary(m.tcRequest)
loHTTP.Send(qRequest)
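
A likely reason this helps, as far as I understand it: VFP hands a character string to a COM object as Unicode text, and the HTTP component may re-encode that text before sending, so an MP3 byte above 0x7F no longer goes out as the single byte it was. CREATEBINARY() passes a byte array instead. The sketch below is Python purely for illustration; the single-byte code page and the UTF-8 re-encoding are assumptions, not something verified against MSXML. It only shows how such a text round trip changes binary data:

```python
# Illustration only: what happens to binary data when it is treated as text.
# Assumption: the string is read in a single-byte code page (latin-1 here)
# and re-encoded as UTF-8 on the way out.

def send_as_text(payload: bytes) -> bytes:
    """Simulate a text round trip: bytes -> Unicode string -> UTF-8 bytes."""
    return payload.decode("latin-1").encode("utf-8")

def send_as_binary(payload: bytes) -> bytes:
    """Simulate passing a byte array: the payload goes out unchanged."""
    return bytes(payload)

# Sample bytes resembling the start of an MP3 frame; three are above 0x7F.
mp3_start = bytes([0xFF, 0xFB, 0x90, 0x00])

as_text = send_as_text(mp3_start)
as_binary = send_as_binary(mp3_start)

print(len(mp3_start), len(as_text), len(as_binary))   # 4 7 4
print(as_text == mp3_start, as_binary == mp3_start)   # False True
```

The text path grows from 4 to 7 bytes, so the server would receive a file part that is no longer valid audio.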
 
Vernpace,

Did you try with a + CRLF after the final "--"?
 
Have you tried using curl itself, just to find out the exact body composition it makes from the command line?

Chriss
 
It might be a detail as simple as a missing final CRLF, as atlopes said. The curl variant also sends the Authorization header before the Content-Type header.
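
For reference, the body layout under discussion can be sketched as follows. Each part opens with "--" plus the boundary, and the body closes with "--" plus the boundary plus "--" and a final CRLF. This is a Python illustration only; the function name and the sample boundary are made up, and the code just assembles bytes without calling the API.

```python
CRLF = b"\r\n"

def build_multipart(boundary: str, filename: str, content: bytes, model: str) -> bytes:
    """Assemble a multipart/form-data body with a file part and a model part."""
    delimiter = b"--" + boundary.encode("ascii")
    file_part = (
        delimiter + CRLF
        + b'Content-Disposition: form-data; name="file"; filename="'
        + filename.encode("utf-8") + b'"' + CRLF
        + b"Content-Type: application/octet-stream" + CRLF + CRLF
        + content + CRLF
    )
    model_part = (
        delimiter + CRLF
        + b'Content-Disposition: form-data; name="model"' + CRLF + CRLF
        + model.encode("ascii") + CRLF
    )
    # Closing delimiter: "--" + boundary + "--", followed by a final CRLF.
    closing = delimiter + b"--" + CRLF
    return file_part + model_part + closing

# Hypothetical boundary and a few placeholder bytes standing in for the MP3.
body = build_multipart("3VB0XYZ123", "audio.mp3", b"\xff\xfb\x90\x00", "whisper-1")
content_type = "multipart/form-data; boundary=3VB0XYZ123"
print(body)
```

The boundary in the Content-Type header is written without the leading "--", which is what SUBSTR(tcBoundary, 3) does in the VFP code.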

Chriss
 
ManniB,

CREATEBINARY(lcRequest) works! yay! :) I saw Marco Plaza's post earlier today, but forgot to try it.
Actually, the response was surprisingly fast compared to other OpenAI models we use.

Big thanks to you and everyone who responded.
 
I'm glad that it worked!

It's fascinating what OpenAI can do, I'm looking forward to more applications in the future. Unfortunately, data security/privacy is still an issue...

Out of interest: How fast can it convert an mp3 of one minute?

Regards,
Manni

 
ManniB,

OpenAI Sora (text-to-video) will be released to the public later this year.
I am not sure what you mean by data security/privacy issues. Can you be more explicit? As far as I know, in addition to AI's ability to take people's jobs, there are potential issues concerning AI training on copyrighted material - but this has nothing to do with data security/privacy.
 
Hi Vernpace,

Could you upload the final and working version of your code?

Regards, Gerrit
 
Code:
LOCAL lcApiKey, lcBoundary, lcFile, lcContent, lcFileName, lcName, lcName2, lcModel, lcRequest, lqRequest, lcResponse, lcReturn

#DEFINE CRLF CHR(13) + CHR(10)

lcApiKey = "Your OpenAI Key Goes Here"
lcBoundary =  "--" + STRTRAN(SYS(2015), "_" ,"")
lcFile = "c:\temp\speech.mp3"    && Replace with your MP3 file
lcContent = FILETOSTR(lcFile)
lcFileName = JUSTFNAME(lcFile)
lcName = "file"
lcName2 = "model"
lcModel = "whisper-1"

lcRequest = lcBoundary + CRLF + TEXTMERGE([Content-Disposition: form-data; name="<<lcName>>"; filename="<<lcFileName>>"]) + CRLF + ;
            "Content-Type: application/octet-stream" + CRLF + CRLF + lcContent + CRLF + lcBoundary + CRLF + ;
            TEXTMERGE([Content-Disposition: form-data; name="<<lcName2>>"]) + CRLF + CRLF + lcModel + CRLF + lcBoundary + "--"

lqRequest = CREATEBINARY(lcRequest)

lcResponse = GetOpenAIResponseSTT(lcBoundary, lcApiKey, lqRequest)

IF LEFT(lcResponse, 6) == "Error:"
   lcReturn = lcResponse
ELSE
   && The OpenAI response is in JSON format - this strips it out to a text string
   lcReturn = ALLTRIM(STRTRAN(STRTRAN(STRTRAN(STREXTRACT(lcResponse, "{", "}"), ["], ""), "text: ", ""), CHR(10), ""))
ENDIF

?lcReturn

PROCEDURE GetOpenAIResponseSTT(tcBoundary AS String, tcApiKey AS String, tqRequest AS String) AS String
   LOCAL lcURL, lcResponse, loHTTP

   lcURL  = "https://api.openai.com/v1/audio/transcriptions"

   loHTTP = CREATEOBJECT("MSXML2.ServerXMLHTTP.6.0")
   loHTTP.Open("POST", lcURL, .F.)
   loHTTP.setRequestHeader("Content-Type", "multipart/form-data; boundary=" + SUBSTR(tcBoundary, 3))
   loHTTP.setRequestHeader("Authorization", "Bearer " + tcApiKey)
   loHTTP.Send(tqRequest)

   IF loHTTP.Status = 200
      lcResponse = loHTTP.responseText
   ELSE
      lcResponse = "Error: " + TRANSFORM(loHTTP.Status)
   ENDIF

   RETURN lcResponse

ENDPROC

Have fun! I'm tempted to try sending a music MP3 to see if it returns lyrics, but I don't want to abuse a good thing...
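
One caveat on the STRTRAN/STREXTRACT line: it also deletes any double quotes inside the transcript and would cut the text short at the first "}" it meets. A real JSON parser avoids both. A minimal sketch in Python (for illustration only), assuming the reply is a JSON object with a single "text" field; the sample reply is made up:

```python
import json

def extract_transcript(response_body: str) -> str:
    """Return the "text" field of a transcription response."""
    data = json.loads(response_body)
    return data["text"].strip()

# Made-up reply containing escaped quotes and braces inside the transcript.
sample = '{\n  "text": "She said \\"hello\\" and left. {unclear}"\n}'
print(extract_transcript(sample))   # She said "hello" and left. {unclear}
```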
 
Vernpace, thank you for sharing your code!

When I tried it with a newly created API key, I got error 429 (Rate limit reached for requests). I have a Plus subscription and it's the first time I'm trying this. The MP3 file is only 500 KB. Why is it not working?

What I meant regarding privacy is that you can't upload sensitive data, such as recordings that may contain customer or internal data. It's not like software running on your own PC, and OpenAI doesn't guarantee that the data is safe.
EDIT: I see they now have an enterprise version with privacy, so it's moving forward.

 
Hi Vernpace,

I just tested your code and it works like a charm. It's giving amazingly good results for some sample MP3s and WAVs I tested.

Thanks again for sharing!

Regards, Gerrit
 
Gerrit, do you have a Plus Subscription? It's not working on my end.

 
Hi Manni,

I have a prepaid subscription for GPT3.5 (not 4.0). What are you using?

Regards, Gerrit
 
Hi Gerrit,

I have a Plus Subscription. When I'm logged in to ChatGPT, under My Plan it says:

ChatGPT Plus Subscription
Access to GPT-4, our most capable model
Browse, create, and use GPTs
Access to additional tools like DALL·E, Browsing, Advanced Data Analysis and more

Regards,
Manni

 
ManniB,

A ChatGPT subscription is different from an OpenAI API account - they are separate services. While OpenAI owns ChatGPT, ChatGPT uses only a subset of the models that are available through the OpenAI API.
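
On the 429 itself: GetOpenAIResponseSTT returns only the status code on failure, but the API normally sends a JSON error body as well, and its message tends to say whether the problem is request rate or missing quota. A Python sketch (illustration only) of surfacing that message, assuming the error body has the shape {"error": {"message": ...}}; the sample body is invented:

```python
import json

def describe_failure(status: int, body: str) -> str:
    """Build a readable error string from an HTTP status and response body."""
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not the assumed shape: fall back to the raw body.
        message = body.strip() or "(no response body)"
    return f"Error: {status} - {message}"

# Invented example of an error body; real wording will differ.
sample_body = '{"error": {"message": "You exceeded your current quota.", "type": "insufficient_quota"}}'
print(describe_failure(429, sample_body))
print(describe_failure(400, ""))
```

In the VFP procedure, the equivalent would be to append loHTTP.responseText to the "Error: " string instead of returning the status alone.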
 
Hi Vernpace,

thank you, so I need another account then...

Regards,
Manni

 