Do you have an audio clip that you want to turn into a shareable video? Perhaps you want to add a visual element to your podcast, voice over, or lecture. Or you want to convert your audio into video content for social media.
This guide will walk you through how to create a video from an input audio file. We'll begin by generating a transcript of the audio file, and use that transcript to create an image prompt. Then, we'll use a text-to-image API to create an image based on the prompt. And finally, we'll put the audio and image together to generate a video.
Shotstack is a cloud-based video editing platform. It's designed to make it convenient for developers to automate video editing at scale. Shotstack provides various APIs including the three we will use in this tutorial. These are the Ingest API, Create API, and Edit API.
We will use the Ingest API to generate a transcript of the audio, the Create API to create the background image with generative AI, and the Edit API to put the audio and image together into the final video.
To follow the steps outlined in the guide, you'll need the following:
A Shotstack developer account and API key
A terminal with curl installed for making the API requests
Access to ChatGPT (or the OpenAI API) for generating the image prompt
An audio file hosted at a publicly accessible URL
The following is a step-by-step process for converting an audio file into a video with a background image.
For this guide, we will be using an audio excerpt of a financial podcast:
The audio needs to be available somewhere online, so we have uploaded it to the following URL: https://shotstack-assets.s3-ap-southeast-2.amazonaws.com/audio/financial-podcast.mp3.
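If your audio is only on your local machine, you will first need to host it somewhere publicly accessible. Below is a rough sketch of one way to do that, assuming you have the AWS CLI configured and an S3 bucket with ACLs enabled that allows public reads (the bucket name is a placeholder):

# Upload a local MP3 to S3 and make it publicly readable
aws s3 cp financial-podcast.mp3 s3://your-bucket/audio/financial-podcast.mp3 --acl public-read

Any other file host that serves the file over a public URL will work just as well.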
The next step is to send a POST request to the Ingest API to generate a transcript of the audio. The request should contain a JSON payload with details like the input audio's URL and the transcription format. In this case, we're using the SRT format.
Run the curl command below in your terminal to generate an SRT file from the audio. Replace SHOTSTACK_API_KEY with your API key.
curl -X POST \
-H "Content-Type: application/json" \
-H "x-api-key: SHOTSTACK_API_KEY" \
https://api.shotstack.io/ingest/stage/sources \
-d '
{
"url": "https://shotstack-assets.s3-ap-southeast-2.amazonaws.com/audio/financial-podcast.mp3",
"outputs": {
"transcription": {
"format": "srt"
}
}
}'
If the request succeeds, you should see a response like the one below. Take note of the source id. We will use it in the next step.
{
"data": {
"type": "source",
"id": "zzy885gw-1m3y-rv30-xfcw-4e2ykd4xloct"
}
}
Wait for a few seconds for the audio transcription process to complete. Then run the command below to send a GET request to the API. This will retrieve the generated SRT file. Again, make sure to replace SHOTSTACK_API_KEY with your API key, and replace ID with the id from the previous JSON response.
curl -X GET https://api.shotstack.io/ingest/stage/sources/ID \
-H 'Accept: application/json' \
-H 'x-api-key: SHOTSTACK_API_KEY'
You should see a response with relevant details about the generated file. It's going to look similar to this:
{
"data": {
"type": "source",
"id": "zzy885gw-1m3y-rv30-xfcw-4e2ykd4xloct",
"attributes": {
"id": "zzy885gw-1m3y-rv30-xfcw-4e2ykd4xloct",
"owner": "c2jsl2d4xd",
"input": "https://shotstack-assets.s3-ap-southeast-2.amazonaws.com/audio/financial-podcast.mp3",
"source": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2d4xd/zzy885gw-1m3y-rv30-xfcw-4e2ykd4xloct/source.mp3",
"status": "ready",
"outputs": {
"transcription": {
"status": "ready",
"url": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2d4xd/zzy885gw-1m3y-rv30-xfcw-4e2ykd4xloct/transcript.srt"
}
},
"duration": 67.38,
"created": "2024-04-03T07:08:27.673Z",
"updated": "2024-04-03T07:08:48.150Z"
}
}
}
Note: You may see waiting, processing or another value for the status parameter under outputs.transcription. If that's the case, retry the same GET request until the status reads ready.
When the status is ready, the response will include a URL to the output SRT file. For our example, the URL is:
https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2d4xd/zzy885gw-1m3y-rv30-xfcw-4e2ykd4xloct/transcript.srt
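If you are scripting this step rather than running it by hand, you can poll the Ingest API until the transcription status is ready and then download the SRT file. Below is a minimal bash sketch, assuming you have jq installed and have exported SHOTSTACK_API_KEY and SOURCE_ID as environment variables (both names are our own placeholders):

#!/usr/bin/env bash
# Poll the Ingest API until the transcription is ready, then download the SRT file
while true; do
  RESPONSE=$(curl -s "https://api.shotstack.io/ingest/stage/sources/$SOURCE_ID" \
    -H 'Accept: application/json' \
    -H "x-api-key: $SHOTSTACK_API_KEY")
  STATUS=$(echo "$RESPONSE" | jq -r '.data.attributes.outputs.transcription.status')
  if [ "$STATUS" = "ready" ]; then
    # Extract the transcript URL and save the SRT file locally
    echo "$RESPONSE" | jq -r '.data.attributes.outputs.transcription.url' | xargs curl -s -o transcript.srt
    break
  fi
  echo "Transcription status: $STATUS - retrying in 5 seconds..."
  sleep 5
done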
Below is the transcript of the example podcast audio.
1
00:00:00,009 --> 00:00:02,589
Nutrition was launched just over a year ago
2
00:00:02,829 --> 00:00:05,980
um as part of Blackrock's Sustainable Thematic Suite
3
00:00:06,239 --> 00:00:09,260
and I've been named on it uh since the start of this year.
4
00:00:09,800 --> 00:00:14,989
And the fund's mandate is to invest in anything related to
5
00:00:15,140 --> 00:00:20,049
um food and beverage consumer trends. And our job is to
6
00:00:20,229 --> 00:00:25,309
uh one make sure that the fund invests in those fast and moving um rivers
7
00:00:25,415 --> 00:00:25,954
within that
8
00:00:26,094 --> 00:00:31,364
overall thematic, but also uh to abide by our sustainability mandate,
9
00:00:31,604 --> 00:00:35,244
which is to ensure that at least 70% of the fund is
10
00:00:35,255 --> 00:00:38,235
investing in companies which align with
11
00:00:38,244 --> 00:00:41,034
the United Nations Sustainability Development Goals.
12
00:00:41,689 --> 00:00:48,029
Um So it's important um and critical that the companies that we invest in broadly um
13
00:00:48,349 --> 00:00:52,470
are helping the world move towards a more sustainable food chain
14
00:00:52,680 --> 00:00:54,770
and of course, plant-based.
15
00:00:54,779 --> 00:00:58,830
And um some of the other topics that we're gonna talk about today are really
16
00:00:58,840 --> 00:01:04,500
big piece to it because the food chain is possibly one of the single most polluting
17
00:01:04,680 --> 00:01:06,589
pieces of humankind.
You've now successfully generated a transcript of the audio. Copy and paste the content of the transcript into ChatGPT. Ask ChatGPT to use the transcript to create a prompt to generate an image via an API call to a text-to-image service.
Here is an example prompt you can use:
Use the content of an SRT transcription file, pasted below, to write a prompt to generate
an image using an AI text-to-image service:
CONTENT_OF_SRT_FILE
Replace CONTENT_OF_SRT_FILE with the transcript we generated previously.
Below is the response we got from ChatGPT using the prompt above:
"A world map with highlights on areas with sustainable food production practices. Emphasize
plant-based agriculture and companies that align with the UN Sustainable Development
Goals. Include a flowing blue river representing fast-moving trends within the sustainable
food chain."
Let's use the prompt we got from ChatGPT to generate the image in the next step.
Note: If you want to automate this step, you could use the OpenAI ChatGPT API to generate the prompt. This article to video guide does something very similar.
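As a rough sketch of that automation, you could send the same prompt to the OpenAI Chat Completions API. The example below assumes you have an OpenAI API key exported as OPENAI_API_KEY and uses gpt-4o purely as an example model; as before, replace CONTENT_OF_SRT_FILE with the transcript text:

curl https://api.openai.com/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Use the content of an SRT transcription file, pasted below, to write a prompt to generate an image using an AI text-to-image service: CONTENT_OF_SRT_FILE"
      }
    ]
  }'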
The next step is to send a POST request to the Create API to generate an image based on the prompt. We will use the built-in Shotstack text-to-image service, which uses generative AI to create the image. The payload includes the image width and height, along with the type and prompt parameters.
Run the curl command below in your terminal. Make sure to replace SHOTSTACK_API_KEY with your API key.
curl -X POST \
-H 'Content-Type: application/json' \
-H 'x-api-key: SHOTSTACK_API_KEY' \
https://api.shotstack.io/create/stage/assets \
-d '
{
"provider": "shotstack",
"options": {
"type": "text-to-image",
"prompt": "A world map with highlights on areas with sustainable food production practices. Emphasize plant-based agriculture and companies that align with the UN Sustainable Development Goals. Include a flowing blue river representing fast-moving trends within the sustainable food chain.",
"width": 1024,
"height": 512
}
}'
A successful request should yield a response like the one below. Take note of the value of id; we will use it in the next step.
{
"data": {
"type": "asset",
"id": "01hth-fjy7w-jd184-znpdm-r1hrv6",
"attributes": {
"owner": "c2jsl2d4xd",
"provider": "shotstack",
"type": "text-to-image",
"status": "queued",
"created": "2024-04-03T08:00:42.295Z",
"updated": "2024-04-03T08:00:42.295Z"
}
}
}
Wait for a few seconds for the AI text-to-image generation to complete. Then run the command below to send a GET request to fetch the generated image. Replace ID with the id from the previous response, and SHOTSTACK_API_KEY with your API key.
curl -X GET https://api.shotstack.io/create/stage/assets/ID \
-H 'Accept: application/json' \
-H 'x-api-key: SHOTSTACK_API_KEY'
You should expect a response similar to this:
{
"data": {
"type": "asset",
"id": "01hth-fjy7w-jd184-znpdm-r1hrv6",
"attributes": {
"owner": "c2jsl2d4xd",
"provider": "shotstack",
"type": "text-to-image",
"url": "https://shotstack-create-api-stage-assets.s3.amazonaws.com/c2jsl2d4xd/01hth-fjy7w-jd184-znkdm-r1hrv7.png",
"status": "done",
"created": "2024-04-03T08:00:42.295Z",
"updated": "2024-04-03T08:00:52.312Z"
}
}
}
Note: If the status parameter shows anything other than done, simply retry the same GET request until its value reads done.
The response will contain a url parameter whose value is the URL to the generated image.
A sample image generated using our text prompt
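If you are scripting this step, you can pull the image URL straight out of the JSON response with jq (assuming it is installed). Replace ID and SHOTSTACK_API_KEY as before:

curl -s https://api.shotstack.io/create/stage/assets/ID \
  -H 'Accept: application/json' \
  -H 'x-api-key: SHOTSTACK_API_KEY' | jq -r '.data.attributes.url'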
Almost set! What's left to do is to put the audio and image together to make a video. We can do that using the Edit API.
Create a new file named edit.json and paste the JSON below into it. Make sure to replace the value of src for the image asset with the URL from the previous response. The audio asset includes the URL of our podcast mp3 file.
{
"timeline": {
"background": "#000000",
"tracks": [
{
"clips": [
{
"asset": {
"type": "image",
"src": "https://shotstack-create-api-stage-assets.s3.amazonaws.com/c2jsl2d4xd/01hth-fjy7w-jd184-znkdm-r1hrv7.png"
},
"start": 0,
"length": 67.38,
"effect": "zoomIn"
}
]
},
{
"clips": [
{
"asset": {
"type": "audio",
"src": "https://shotstack-assets.s3-ap-southeast-2.amazonaws.com/audio/financial-podcast.mp3",
"volume": 1
},
"start": 0,
"length": 67.38
}
]
}
]
},
"output": {
"format": "mp4",
"resolution": "sd"
}
}
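Note that the length of both clips (67.38 seconds) matches the duration reported by the Ingest API earlier, so the image stays on screen for the full length of the audio. If you are automating this, you could read the duration from the Ingest API response instead of hard-coding it; for example, assuming jq is installed and ID is the source id from before:

curl -s https://api.shotstack.io/ingest/stage/sources/ID \
  -H 'Accept: application/json' \
  -H 'x-api-key: SHOTSTACK_API_KEY' | jq -r '.data.attributes.duration'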
Then run the following command to send a POST request to the Edit API. Note that we're passing the content of the edit.json file as the payload via the curl -d @edit.json argument.
curl -X POST \
-H 'Content-Type: application/json' \
-H 'x-api-key: SHOTSTACK_API_KEY' \
-d @edit.json \
https://api.shotstack.io/edit/stage/render
Here is an example of the response you will get back from the API.
{
"success": true,
"message": "Created",
"response": {
"message": "Render Successfully Queued",
"id": "bf65d0a2-3c78-453e-851a-05565fe0ab23"
}
}
Wait for a few seconds for the video to finish rendering. Then, run the following command to get the video URL.
Replace SHOTSTACK_API_KEY with your API key and ID with the id received in the previous JSON response.
curl -X GET \
-H 'Content-Type: application/json' \
-H 'x-api-key: SHOTSTACK_API_KEY' \
https://api.shotstack.io/edit/stage/render/ID
If successful, you will receive a response similar to this:
{
"success": true,
"message": "OK",
"response": {
"id": "bf65d0a2-3c78-453e-851a-05565fe0ab23",
"owner": "c2jsl2d4xd",
"plan": "sandbox",
"status": "done",
"error": "",
"duration": 67.38,
"billable": 67.38,
"renderTime": 16690.03,
"url": "https://shotstack-api-stage-output.s3-ap-southeast-2.amazonaws.com/c2jsl2d4xd/bf65d0a2-3c78-453e-851a-05565fe0ab23.mp4",
"poster": null,
"thumbnail": null,
"created": "2024-04-03T08:07:17.382Z",
"updated": "2024-04-03T08:07:35.482Z"
}
}
You can access the final video via the URL provided in the response. Copy and paste the URL into your browser to view it or download it.
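If you are automating the whole flow, you can reuse the same polling pattern shown earlier for the Ingest API, checking .response.status until it reads done, and then download the MP4 from the command line. For example, replacing VIDEO_URL with the url value from the render response (the output filename is just an example):

curl -o final-video.mp4 "VIDEO_URL"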
Here's how the final video looks, converted from our audio file into a video with an AI-generated background image:
Now you know how you can convert audio to a video with a background image using the Shotstack APIs and generative AI. This guide only covered the very basics. You could expand on this tutorial and start adding text, multiple images, transitions and effects. There's so much more that you can achieve with our APIs. Check out our developer guides to learn how to convert YouTube videos to MP3s, generate videos from images, add AI voice overs to your videos, and more.
Every month we share articles like this one to keep you up to speed with automated video editing.