Note
It uses free APIs, so it may run out of free quota quickly.
For small media files (about 10 minutes or less), you can use https://mediasleuth360.infinityfreeapp.com/. For larger files, follow the setup instructions below and run it locally with your own API keys.
- User Uploads Media: The user uploads an audio or video file through the Streamlit interface.
- Processing:
  - The main.py script calls functions from raw_data.py to extract raw data from the media.
  - transcribe.py is used to generate transcripts and subtitle files.
  - summary.py provides a summary of the media content.
  - response.py generates responses to user queries about the media.
- User Interaction: The application displays the media player, transcript, and summary. Users can ask questions about the media content, and the application responds using the generate_chat_response function.
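The processing step above produces subtitle files from the transcript. As a minimal sketch of what that step can look like, the following renders timed transcript segments as SRT text. SRT as the output format, the `(start, end, text)` segment shape, and the function names are all assumptions for illustration; they are not taken from transcribe.py.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def format_srt_time(seconds: float) -> str:
    """Convert a time in seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render timed segments as the text of an SRT subtitle file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each SRT cue is: a 1-based counter, a time range, then the text.
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 2.5, "Welcome to the demo."), (2.5, 5.0, "Let's begin.")]
    print(segments_to_srt(demo))
```

Speech-to-text services typically return segments with start and end times, so a converter like this is usually the only glue needed between the transcript and a subtitle track.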
- Media Support: Supports almost any video or audio file format.
- Time Stamping: Automatically generates a timestamped transcript for the media.
- Subtitles Generation: Automatically adds subtitles to the video.
- Summarization: Provides a concise, structured summary of the media content, with timestamps, to give the broader picture.
- Interactive Chatbot: An interactive chatbox allows users to engage with the AI to query specific topics within the media content.
- Media Navigation: Users can ask the AI to search for specific keywords, topics, or moments within the media.
- Multilingual Media Analysis: Supports analysis of media content in multiple languages, enhancing accessibility and usability.
- Caching: Smart caching of media content makes processing much faster when a user uploads the same file again.
- CI/CD: The application is developed in Python with Streamlit, which by design enables CI and CD during development and testing.
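To illustrate the caching feature listed above, here is a minimal sketch of content-based caching: results are keyed by a hash of the uploaded bytes, so re-uploading the same file skips the expensive processing. The names `file_digest`, `process_with_cache`, and the in-memory dictionary are hypothetical; the project's actual caching mechanism is not described here (Streamlit also offers its own `st.cache_data` decorator, which may or may not be what the application uses).

```python
import hashlib
from typing import Callable, Dict

# In-memory cache keyed by the SHA-256 digest of the file's bytes.
# Hypothetical storage; a real app might persist results or use st.cache_data.
_cache: Dict[str, dict] = {}


def file_digest(data: bytes) -> str:
    """Return a stable identifier for a file based on its content."""
    return hashlib.sha256(data).hexdigest()


def process_with_cache(data: bytes, process: Callable[[bytes], dict]) -> dict:
    """Run `process` only if this exact file content has not been seen before."""
    key = file_digest(data)
    if key not in _cache:
        _cache[key] = process(data)
    return _cache[key]


if __name__ == "__main__":
    def slow_analysis(data: bytes) -> dict:
        # Stand-in for transcription and summarization.
        print("processing...")
        return {"size": len(data)}

    media = b"example media bytes"
    process_with_cache(media, slow_analysis)  # runs the analysis
    process_with_cache(media, slow_analysis)  # served from the cache
```

Keying on content rather than filename means a renamed copy of the same file still hits the cache, while a different file with the same name does not.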
Software Dependencies
Python Dependencies:
AI Models:
- llama3-8b-8192 via Groq API
- whisper-large-v3 via Groq API
- gemini-1.5-flash via Google API
git clone https://github.com/krishirajsinh-p/MediaSleuth360.git
cd MediaSleuth360
bash install.sh
Place your Groq and Google API keys in ./.streamlit/secrets.toml
bash run.sh
Note
The port number might be different, but you will find it in the terminal when you run the command above.
Open http://localhost:8501 in your browser.
- Reliance on Speech Quality: The effectiveness of the system heavily depends on the quality of the input speech.
- File Size: There is a 200MB media file size limit due to Streamlit constraints.
- Language Support: It may not work as effectively for languages other than English.
- No Memory: The AI assistant doesn't have conversational memory because of the limited context window of the LLM (and also to minimize security concerns and resource usage).
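Given the file size limit noted above, it can help to check a file before uploading it. The following is a small sketch under stated assumptions: the helper names are hypothetical, and it assumes 1 MB is counted as 1024 × 1024 bytes.

```python
import os

MAX_UPLOAD_MB = 200  # the limit stated in the list above


def exceeds_upload_limit(size_bytes: int, limit_mb: int = MAX_UPLOAD_MB) -> bool:
    """Return True if a file of this size is over the upload limit."""
    # Assumption: 1 MB = 1024 * 1024 bytes.
    return size_bytes > limit_mb * 1024 * 1024


def check_file(path: str) -> bool:
    """Return True if the file at `path` is small enough to upload."""
    return not exceeds_upload_limit(os.path.getsize(path))
```

When running locally, Streamlit's `server.maxUploadSize` configuration option controls this limit, so it can be raised if larger files are needed.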
MediaSleuth360's code is released under the MIT License. See LICENSE for further details.
