Converting MP4 to MP3 turns a video file into an audio file. Audio is commonly used for podcasts, music, news narration, and similar media. You can easily convert video files using existing conversion apps, but most don't offer a scalable solution. Better yet, what if you could build your own app, bot, or plugin that converts MP4 files to MP3?
CloudConvert, a media conversion web app, ranks number one for the phrase "convert MP4 to MP3" and gets an estimated 454,000 organic monthly users. That traffic likely earns them significant revenue from CPC banner ads and premium accounts. So why not leverage the power of programming and build your own media application to solve similar problems?
That is exactly what this tutorial aims to do. Well, not the getting-successful part. But it will teach you to programmatically convert media files using Python, which is a great starting point for building the next big media app.
This tutorial has two parts: converting a single MP4 file to MP3, and then converting a list of MP4 files to MP3 in bulk.
Shotstack offers a cloud-based video editing API. Editing and generating videos at scale requires a lot of resources and can take hours. Shotstack's rendering infrastructure makes building and scaling media applications a breeze.
This tutorial also uses the Shotstack Python SDK for video editing. Python 3 is required for the SDK.
If you want to skip ahead to the source code, we have uploaded it to the convert MP4 to MP3 using Python GitHub repository.
Let's install the Shotstack Python SDK from the command line:
pip install shotstack_sdk
Set your API key as an environment variable (Linux/Mac):
export SHOTSTACK_KEY=your_key_here
or, if using the Windows Command Prompt (this sets SHOTSTACK_KEY for the current session only):
set SHOTSTACK_KEY=your_key_here
Replace your_key_here with your sandbox API key, which is free for testing and development.
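Because os.getenv returns None when a variable is not set, a missing key would only surface later as a confusing authentication error. A minimal sketch of failing fast instead; require_api_key is a hypothetical helper, not part of the Shotstack SDK:

```python
import os


def require_api_key(var_name: str = "SHOTSTACK_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: os.getenv returns None for an unset variable,
    so we check it up front instead of sending an unauthenticated request.
    """
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running the script")
    return key
```

In the scripts below, you could then write configuration.api_key['DeveloperKey'] = require_api_key() in place of the plain os.getenv call.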
Use your favorite IDE or text editor to create a Python script. For this tutorial, we created a file called mp4-to-mp3.py. Open the file and begin editing it.
Let's import the required modules for the project. We need to import modules from the Shotstack SDK to edit and render our video, plus a couple of built-in modules:
import shotstack_sdk as shotstack
import os
import sys
from shotstack_sdk.api import edit_api
from shotstack_sdk.model.clip import Clip
from shotstack_sdk.model.track import Track
from shotstack_sdk.model.timeline import Timeline
from shotstack_sdk.model.output import Output
from shotstack_sdk.model.edit import Edit
from shotstack_sdk.model.video_asset import VideoAsset
Next, set up the client with the API URL and key. It should use the key you added to the environment variables in the previous step:
host = "https://api.shotstack.io/stage"
configuration = shotstack.Configuration(host = host)
configuration.api_key['DeveloperKey'] = os.getenv('SHOTSTACK_KEY')
with shotstack.ApiClient(configuration) as api_client:
    api_instance = edit_api.EditApi(api_client)
The Shotstack API follows many of the principles of desktop editing software, such as the use of a timeline, tracks, and clips. A timeline is a container for multiple tracks, and each track contains multiple clips that play over time.
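To make the container relationship concrete, here is a sketch of a timeline as nested Python dictionaries, plus a small hypothetical helper that works out when the last clip ends. Tracks play at the same time, so clips on different tracks can overlap. The URLs are placeholders, and the field names are assumed to match the JSON the SDK sends; check the Shotstack API reference before relying on them.

```python
# Hypothetical timeline with two tracks, written as plain dicts to show the nesting:
# timeline -> tracks -> clips -> asset. The asset fields ("type", "src") are assumed
# from the Shotstack edit JSON format.
timeline = {
    "tracks": [
        {"clips": [{"asset": {"type": "video", "src": "https://example.com/a.mp4"},
                    "start": 0.0, "length": 25.0}]},
        {"clips": [{"asset": {"type": "video", "src": "https://example.com/b.mp4"},
                    "start": 20.0, "length": 10.0}]},
    ]
}


def timeline_end(timeline: dict) -> float:
    """Return the time in seconds at which the last clip on any track finishes."""
    ends = [
        clip["start"] + clip["length"]
        for track in timeline["tracks"]
        for clip in track["clips"]
    ]
    return max(ends, default=0.0)


print(timeline_end(timeline))  # 30.0: the second track's clip runs from 20s to 30s
```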
The video should be hosted somewhere accessible via a public or signed URL. We will use the following example video from the AWS Transcription tutorial; you can replace it with your own direct video URL.
Next, add the code below to create a VideoAsset using the video URL:
video_asset = VideoAsset(
    src = "https://d1uej6xx5jo4cd.cloudfront.net/scott-ko-w-captions.mp4"
)
In Shotstack, a clip places an asset on the timeline, and we can configure attributes such as its start time and length. The video_clip variable below adds the video_asset to the timeline, starting at 0 seconds and playing for 25 seconds.
video_clip = Clip(
    asset = video_asset,
    start = 0.0,
    length = 25.0
)
Now, let's create a timeline, which is like a container for multiple video clips that play over time. Tracks on the timeline allow us to layer clips on top of each other. Let's add the video_clip to a track and then add the track to the timeline.
track = Track(clips=[video_clip])
timeline = Timeline(
    background = "#000000",
    tracks = [track]
)
Next, we need to configure the output. To convert to an MP3, let's set the output format to mp3 and the resolution to preview.
output = Output(
    format = "mp3",
    resolution = "preview"
)
edit = Edit(
    timeline = timeline,
    output = output
)
Finally, let's send the edit for processing and rendering using the API. The Shotstack SDK takes care of converting our objects to JSON, adding our key to the request header, and sending everything to the API.
try:
    api_response = api_instance.post_render(edit)
    message = api_response['response']['message']
    render_id = api_response['response']['id']  # avoid shadowing the built-in id()
    print(f"{message}\n")
    print(f">> render id: {render_id}")
except Exception as e:
    print(f"Unable to resolve API call: {e}")
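If you are curious what post_render does on the wire, the sketch below builds (without sending) a comparable HTTP request using only the standard library. The /render path and the x-api-key header name are assumptions based on the Shotstack REST API rather than anything stated in this tutorial, and build_render_request is a hypothetical name; the SDK remains the supported route.

```python
import json
import os
import urllib.request

# Assumed endpoint and header name; verify against the Shotstack API reference.
RENDER_URL = "https://api.shotstack.io/stage/render"


def build_render_request(video_url: str, length: float, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request similar to what post_render sends."""
    edit = {
        "timeline": {
            "background": "#000000",
            "tracks": [
                {"clips": [{"asset": {"type": "video", "src": video_url},
                            "start": 0.0, "length": length}]}
            ],
        },
        "output": {"format": "mp3", "resolution": "preview"},
    }
    return urllib.request.Request(
        RENDER_URL,
        data=json.dumps(edit).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_render_request(
        "https://d1uej6xx5jo4cd.cloudfront.net/scott-ko-w-captions.mp4",
        25.0,
        os.getenv("SHOTSTACK_KEY", ""),
    )
    # Sending would be urllib.request.urlopen(req); the JSON reply is assumed to
    # carry "message" and "id" under "response", as the SDK code above reads them.
    print(req.get_method(), req.full_url)
```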
Below is the completed Python script:
import shotstack_sdk as shotstack
import os
import sys
from shotstack_sdk.api import edit_api
from shotstack_sdk.model.clip import Clip
from shotstack_sdk.model.track import Track
from shotstack_sdk.model.timeline import Timeline
from shotstack_sdk.model.output import Output
from shotstack_sdk.model.edit import Edit
from shotstack_sdk.model.video_asset import VideoAsset
if __name__ == "__main__":
    host = "https://api.shotstack.io/stage"
    configuration = shotstack.Configuration(host = host)
    configuration.api_key['DeveloperKey'] = os.getenv("SHOTSTACK_KEY")
    with shotstack.ApiClient(configuration) as api_client:
        api_instance = edit_api.EditApi(api_client)
        video_asset = VideoAsset(
            src = "https://d1uej6xx5jo4cd.cloudfront.net/scott-ko-w-captions.mp4"
        )
        video_clip = Clip(
            asset = video_asset,
            start = 0.0,
            length = 25.0
        )
        track = Track(clips=[video_clip])
        timeline = Timeline(
            background = "#000000",
            tracks = [track]
        )
        # Match the output settings from the step above
        output = Output(
            format = "mp3",
            resolution = "preview"
        )
        edit = Edit(
            timeline = timeline,
            output = output
        )
        try:
            api_response = api_instance.post_render(edit)
            message = api_response['response']['message']
            render_id = api_response['response']['id']
            print(f"{message}\n")
            print(f">> render id: {render_id}")
        except Exception as e:
            print(f"Unable to resolve API call: {e}")
Run the script using Python:
python mp4-to-mp3.py
You may need to use python3 instead of python, depending on your configuration.
The API will return the render id if the render request is successful. We need the render id to retrieve the render status.
The render process takes place in the background and may take several seconds. We need another short script that will check the render status endpoint.
Create a file called status.py and paste the following:
import sys
import os
import shotstack_sdk as shotstack
from shotstack_sdk.api import edit_api
if __name__ == "__main__":
    host = "https://api.shotstack.io/stage"
    configuration = shotstack.Configuration(host = host)
    configuration.api_key['DeveloperKey'] = os.getenv("SHOTSTACK_KEY")
    with shotstack.ApiClient(configuration) as api_client:
        api_instance = edit_api.EditApi(api_client)
        # The render id is passed as the first command-line argument
        api_response = api_instance.get_render(sys.argv[1], data=False, merged=True)
        status = api_response['response']['status']
        print(f"Status: {status}")
        if status == "done":
            url = api_response['response']['url']
            print(f">> Asset URL: {url}")
Then run the script using Python:
python status.py {renderId}
Replace {renderId} with the render id returned by the mp4-to-mp3.py script.
Re-run the status.py script every 4-5 seconds until the status is done and a URL is returned. If something goes wrong, the status will be failed.
If everything ran successfully, you should now have the URL of the final audio file.
The rendered MP3 is ready to be hosted or transferred to your application.
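Re-running status.py by hand gets tedious, so the check can be automated with a simple polling loop. A minimal sketch: wait_for_render is a hypothetical helper that takes a fetch_status function, so it does not depend on how the status is fetched. With the SDK, that function could return api_instance.get_render(render_id, data=False, merged=True)['response']['status'], as in status.py. Any status other than done or failed is treated as still in progress.

```python
import time
from typing import Callable

# Final states, per this tutorial: "done" on success, "failed" on error.
TERMINAL_STATUSES = {"done", "failed"}


def wait_for_render(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status every `interval` seconds until a final status or timeout."""
    waited = 0.0
    while True:
        status = fetch_status()
        print(f"Status: {status}")
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"render not finished after {timeout} seconds")
        sleep(interval)
        waited += interval
```

The sleep parameter defaults to time.sleep; it is injectable only so the loop can be tested without actually waiting.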
You can also view your rendered media files inside the Shotstack dashboard under Renders. Media files are deleted after 24 hours, so you need to transfer them to your own storage provider. However, all files are also copied to Shotstack hosting, and you can configure other destinations, including AWS S3 and Mux.
As you can see, generating audio from a video is easy. The big advantage of the Shotstack API is how seamlessly it scales this process without you having to worry about rendering infrastructure.
To demonstrate that scalability, we will convert the following list of MP4 files to MP3. Create a new CSV file called mp4.csv in the current working folder. Then paste each video URL under the url column and each video's length, in seconds, under the length column. The length is required for each video because every video has a different duration.
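Since rendered files are removed after 24 hours, you may want to copy each one to storage you control. A minimal sketch using only the standard library, assuming the URL printed by status.py can be downloaded directly; download_render and any destination folder you pass are hypothetical. Uploading to S3 or another provider would be a separate step.

```python
import shutil
import urllib.request
from pathlib import Path


def download_render(url: str, dest: Path) -> Path:
    """Stream the file at `url` to `dest`, creating parent folders as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of reading it all into memory
    return dest
```

For example, download_render(asset_url, Path("renders") / f"{render_id}.mp3") would save one MP3 per render id.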
url,length
https://d1uej6xx5jo4cd.cloudfront.net/slideshow-with-audio.mp4,35.0
https://cdn.shotstack.io/au/v1/msgtwx8iw6/d724e03c-1c4f-4ffa-805a-a47aab70a28f.mp4,13.0
https://cdn.shotstack.io/au/v1/msgtwx8iw6/b03c7b50-07f3-4463-992b-f5241ea15c18.mp4,36.0
https://cdn.shotstack.io/au/stage/c9npc4w5c4/d2552fc9-f05a-4e89-9749-a87d9a1ae9aa.mp4,12.0
https://cdn.shotstack.io/au/v1/msgtwx8iw6/c900a02f-e008-4c37-969f-7c9578279100.mp4,29.0
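Reading the CSV by column name rather than by position makes a bulk script less fragile if the columns are ever reordered. A minimal sketch using csv.DictReader; load_jobs is a hypothetical helper, and the positive-length check is an extra safeguard that the script below does not perform.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def load_jobs(path: Path) -> List[Tuple[str, float]]:
    """Read (url, length) pairs from a CSV file with a url,length header row."""
    jobs = []
    with open(path, newline="") as file:
        # start=2 because line 1 of the file is the header row
        for line_no, row in enumerate(csv.DictReader(file), start=2):
            length = float(row["length"])
            if length <= 0:
                raise ValueError(f"line {line_no}: length must be positive, got {length}")
            jobs.append((row["url"].strip(), length))
    return jobs
```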
You can also inspect media using the probe endpoint to retrieve each video's metadata. The response includes width, height, duration, frame rate, and more, so you could write a script to fetch each video's length automatically. To keep this tutorial simple, we have added the lengths to the CSV manually.
The script uses Python's built-in csv module, so it needs the following import (no installation required):
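If you want to fill in the length column automatically, the sketch below shows two hypothetical helpers: one builds a probe request URL and one pulls a duration out of the probe response. Both the /probe/{url} path and the response shape (ffprobe-style metadata under response.metadata, with a format.duration string) are assumptions, so check the Shotstack API reference; the network call itself is left out.

```python
import urllib.parse

# Assumed probe endpoint; the video URL goes in the path, percent-encoded.
PROBE_BASE = "https://api.shotstack.io/stage/probe/"


def probe_url(video_url: str) -> str:
    """Build the probe URL with the video URL encoded as a single path segment."""
    return PROBE_BASE + urllib.parse.quote(video_url, safe="")


def duration_from_probe(probe_json: dict) -> float:
    """Extract a duration in seconds from ffprobe-style metadata (assumed shape)."""
    metadata = probe_json["response"]["metadata"]
    fmt_duration = metadata.get("format", {}).get("duration")
    if fmt_duration is not None:
        return float(fmt_duration)
    # Fall back to the longest stream if the container reports no duration.
    durations = [float(s["duration"]) for s in metadata.get("streams", []) if "duration" in s]
    if not durations:
        raise ValueError("no duration found in probe metadata")
    return max(durations)
```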
import csv
Next, create a new file called mp4-to-mp3-list.py, paste the following script, and save it.
import shotstack_sdk as shotstack
import os
import csv
from shotstack_sdk.api import edit_api
from shotstack_sdk.model.clip import Clip
from shotstack_sdk.model.track import Track
from shotstack_sdk.model.timeline import Timeline
from shotstack_sdk.model.output import Output
from shotstack_sdk.model.edit import Edit
from shotstack_sdk.model.video_asset import VideoAsset
if __name__ == "__main__":
    host = "https://api.shotstack.io/stage"
    configuration = shotstack.Configuration(host = host)
    configuration.api_key['DeveloperKey'] = os.getenv("SHOTSTACK_KEY")
    with shotstack.ApiClient(configuration) as api_client:
        # One API instance is enough for every request in the loop
        api_instance = edit_api.EditApi(api_client)
        with open("mp4.csv", "r", newline="") as file:
            csvreader = csv.reader(file)
            next(csvreader)  # skip the url,length header row
            for row in csvreader:
                length = float(row[1])
                video_asset = VideoAsset(
                    src = row[0]
                )
                video_clip = Clip(
                    asset = video_asset,
                    start = 0.0,
                    length = length
                )
                track = Track(clips=[video_clip])
                timeline = Timeline(
                    background = "#000000",
                    tracks = [track]
                )
                output = Output(
                    format = "mp3",
                    resolution = "preview"
                )
                edit = Edit(
                    timeline = timeline,
                    output = output
                )
                try:
                    api_response = api_instance.post_render(edit)
                    message = api_response['response']['message']
                    render_id = api_response['response']['id']
                    print(f"{message}\n")
                    print(f">> render id: {render_id}")
                except Exception as e:
                    print(f"Unable to resolve API call: {e}")
Then use the python command to run the script.
python mp4-to-mp3-list.py
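The script above submits one render at a time. Because the rendering itself happens in Shotstack's cloud, submitting requests concurrently could shorten the script's own run time for long lists. A minimal sketch with concurrent.futures: submit_all is hypothetical, and submit stands for any function you write that wraps post_render and returns api_response['response']['id']. Whether one SDK ApiClient can safely be shared across threads is not covered by this tutorial, so creating a client per call is the cautious option, and the API may enforce rate limits, so keep max_workers modest.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def submit_all(
    jobs: List[Tuple[str, float]],
    submit: Callable[[str, float], str],
    max_workers: int = 4,
) -> List[Tuple[str, str]]:
    """Submit each (url, length) job concurrently; return (url, render id or error) in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(url, pool.submit(submit, url, length)) for url, length in jobs]
        results = []
        for url, future in futures:
            try:
                results.append((url, future.result()))
            except Exception as exc:  # one failed submission should not stop the rest
                results.append((url, f"error: {exc}"))
    return results
```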
To check the render status, run the status.py file we created in the first part from the command line:
python status.py {renderId}
Replace {renderId} with each of the render IDs returned by mp4-to-mp3-list.py.
This tutorial should have given you a basic understanding of how to programmatically convert MP4 videos to MP3 using Python and the Shotstack video editing API. As a next step, you could learn how to add other assets, such as text and images, to build a media application.
This is just an introductory tutorial on programmatically working with media, but you can do much more. Different use cases like