📊 "GitHub InFocus" speech data analysis with videogrep 🎞️

adriens - Jun 28 '22 - - Dev Community

🙋 About

GitHub recently published "Propelling your DevOps to new heights | GitHub InFocus", an exciting piece of DevOps-related content:

Also, around the same time, I watched an episode of "The Download" series (hosted by @film_girl):

The Download: Maintainer Month, .NET MAUI Goes GA, Flight Simulator: Top Gun, and more - YouTube

On this episode of The Download, Christina is on location at RenderATL, but is still here to offer the latest developer news, including:0:00 Intro0:59 Mainta...


This episode introduced videogrep:

antiboredom / videogrep

automatic video supercuts with python

Videogrep

Videogrep is a command line tool that searches through dialog in video files and makes supercuts based on what it finds. It will recognize .srt or .vtt subtitle tracks, or transcriptions that can be generated with vosk, pocketsphinx, and other tools.

Examples

Tutorial

See my blog for a short tutorial on videogrep and yt-dlp, and part 2, on videogrep and natural language processing.


Installation

Videogrep is compatible with Python versions 3.6 to 3.10.

To install:

pip install videogrep

If you want to transcribe videos, you also need to install vosk:

pip install vosk

Note: the previous version of videogrep supported pocketsphinx for speech-to-text. Vosk seems much better so I've added…

Then came the idea:

What if I analyzed "GitHub InFocus" with videogrep?

This short post walks through my first trial of videogrep: what I was able to produce, what I discovered... and the fun I had along the way.

☝️ Note that I followed an excellent tutorial to carry out this experiment.

📥 Get the video with yt-dlp

First, I want to download the YouTube video https://youtu.be/awQ7LFxfXWE locally. yt-dlp offers many formats (list them with the -F option), so you can pick the one that best fits your needs, but in our case we'll take the default one:

yt-dlp https://youtu.be/awQ7LFxfXWE -o propelling_your_devops.mp4 --write-auto-sub

Then you are ready for the next step: using videogrep.

📊 Text analysis with ngrams

videogrep makes it possible (and super easy) to analyze the text of the subtitles (the downloaded .vtt files).
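To get a feel for what sits inside such a .vtt file, here is a minimal sketch of a cue parser in plain Python. It is not videogrep's own parser: the function names and the sample captions are made up for illustration, and it assumes timestamps always use the full HH:MM:SS.mmm form.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# Assumption: timestamps always carry the hours part.
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3})\s+-->\s+(\d{2}:\d{2}:\d{2}\.\d{3})"
)
# Inline tags such as <c> or <00:00:01.500>, common in auto-generated captions.
TAG = re.compile(r"<[^>]+>")


def to_seconds(stamp):
    """Convert "HH:MM:SS.mmm" to seconds as a float."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_vtt(text):
    """Return a list of cues: dicts with start, end (seconds) and content."""
    cues = []
    current = None
    for line in text.splitlines():
        match = TIMING.search(line)
        if match:
            current = {
                "start": to_seconds(match.group(1)),
                "end": to_seconds(match.group(2)),
                "content": "",
            }
            cues.append(current)
        elif current is not None:
            if not line.strip():
                current = None  # a blank line ends the cue
            else:
                cleaned = TAG.sub("", line).strip()
                current["content"] = (current["content"] + " " + cleaned).strip()
    return cues


# Hypothetical captions, made up for illustration.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:03.500
we want to make sure that

00:00:03.500 --> 00:00:06.000
you can<00:00:04.200><c> ship</c><00:00:04.800><c> code</c>
"""

for cue in parse_vtt(SAMPLE):
    print(cue["start"], cue["end"], cue["content"])
```

Each cue ends up as a start time, an end time and a line of text, which is all a text search or a word count needs.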

So, what are the most frequent groups of words (called n-grams) in the video? Let's find out!

The single-word analysis is not really interesting:

❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 1 | head -10
to 449
and 352
that 347
you 323
the 322
we 306
a 255
of 251
so 167
is 157

2-grams reveal much more about the underlying intent of the video:

❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 2 | head -7
want to 97
that we 61
you can 55
you know 54
going to 51
we have 45
we can 45

... soon confirmed by the 3-grams:

❯ videogrep --input propelling_your_devops.mp4.webm --ngrams 3 | head -9
we want to 30
you want to 20
a lot of 19
want to make 19
make sure that 18
i'm going to 17
to make sure 17
we have a 16
i want to 13
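For the curious, the idea behind these counts fits in a few lines of Python. This is only a sketch of n-gram counting in general, not videogrep's implementation: the function names and the sample lines are made up for illustration, and it counts within each subtitle line, so n-grams that span two lines are ignored.

```python
from collections import Counter


def ngrams(words, n):
    """Yield every run of n consecutive words as a tuple."""
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


def count_ngrams(lines, n):
    """Count n-grams over lowercased, whitespace-split subtitle lines."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(line.lower().split(), n))
    return counts


# Hypothetical subtitle lines, made up for illustration.
lines = [
    "we want to make sure that",
    "you want to ship code",
    "we want to help",
]

for gram, count in count_ngrams(lines, 2).most_common(2):
    print(" ".join(gram), count)
```

On these three sample lines, "want to" comes first with 3 occurrences and "we want" second with 2, the same kind of ranking as the real output above.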

🔬 Short analysis

With the help of n-grams, in less than a second we discover, just by grepping the text of the video, that:

"GitHub focuses its attention on what they want... and also on what you want to achieve... and make"

👉 That first fact already tells us a lot.

☝️ It also highlights

"the inclusive approach, with a lot of "I" and "we""

... which is also a pretty engaging way to onboard us onto the product they are showcasing ❣️

✂️🎞️ Cut & get shorts

Now, the fun part.

You have done a text analysis but... wouldn't it be fun to watch the movie made of these grepped terms?

⚠️ Spoiler alert: yes it is ❕ (and it's easy) 🤣

These are called fragments. Let's get some of them.

🎯 The "Want" movie

Let's get all the sentences containing "want":

videogrep --input propelling_your_devops.mp4.webm --search 'want' --resyncsubs 0.1 --output want_sentence.mp4

🤪 Also, "we want" to get the "want" movie 🤣:

videogrep --input propelling_your_devops.mp4.webm --search 'we want to' --search-type fragment --resyncsubs 0.1 --output want.mp4
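The two commands above differ in what gets cut: the first one keeps the whole subtitle line containing the term, while --search-type fragment keeps only the matched words. Here is a small sketch of that difference in Python. It is not videogrep's code: the function names and the word timings are made up for illustration, and it uses plain string matching where the real tool is more flexible.

```python
def find_sentences(cues, phrase):
    """Return (start, end) of every cue whose text contains the phrase."""
    return [
        (cue["start"], cue["end"])
        for cue in cues
        if phrase in " ".join(w["word"] for w in cue["words"])
    ]


def find_fragments(cues, phrase):
    """Return (start, end) spanning only the words that match the phrase."""
    target = phrase.split()
    if not target:
        return []
    spans = []
    for cue in cues:
        words = cue["words"]
        for i in range(len(words) - len(target) + 1):
            window = words[i:i + len(target)]
            if [w["word"] for w in window] == target:
                spans.append((window[0]["start"], window[-1]["end"]))
    return spans


# Hypothetical cue with word-level timings, made up for illustration.
cues = [
    {
        "start": 1.0,
        "end": 4.0,
        "words": [
            {"word": "so", "start": 1.0, "end": 1.4},
            {"word": "we", "start": 1.4, "end": 1.8},
            {"word": "want", "start": 1.8, "end": 2.3},
            {"word": "to", "start": 2.3, "end": 2.6},
            {"word": "ship", "start": 2.6, "end": 3.2},
            {"word": "code", "start": 3.2, "end": 4.0},
        ],
    }
]

print(find_sentences(cues, "we want to"))  # the whole cue
print(find_fragments(cues, "we want to"))  # only the three matched words
```

The sentence search returns the full cue (1.0 to 4.0 seconds) while the fragment search returns the tighter span (1.4 to 2.6 seconds), which is why fragment supercuts feel so punchy.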

🤓 GH talking about code

What we think of most when we think about GitHub services is the "code".

Let's make them talk about "code":

videogrep --input propelling_your_devops.mp4.webm --search 'code' --search-type fragment --resyncsubs 0.1 --output code.mp4

➰ Github about GitHub 😹

Last but not least, I'd love to

see how GitHub talks about GitHub

videogrep --input propelling_your_devops.mp4.webm --search 'github' --search-type fragment --resyncsubs 0.1 --output github.mp4
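Since the fragment commands in this post only differ by the search term and the output file, they could be generated in a loop. The sketch below builds the same command lines in Python using only the flags shown in this post. The build_command helper is hypothetical, shlex.join needs Python 3.8 or later, and the actual call is left commented out because it requires videogrep and the downloaded video.

```python
import shlex

VIDEO = "propelling_your_devops.mp4.webm"


def build_command(term, output, video=VIDEO, resync=0.1):
    """Build the videogrep argument list for a fragment search."""
    return [
        "videogrep",
        "--input", video,
        "--search", term,
        "--search-type", "fragment",
        "--resyncsubs", str(resync),
        "--output", output,
    ]


# Search term -> output file, as used in this post.
searches = {
    "we want to": "want.mp4",
    "code": "code.mp4",
    "github": "github.mp4",
}

for term, output in searches.items():
    command = build_command(term, output)
    print(shlex.join(command))
    # To actually render the clip (needs videogrep and the video file):
    # import subprocess; subprocess.run(command, check=True)
```

Adding a new supercut then comes down to adding one entry to the searches dictionary.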

🧑‍🎨 Conclusion

These tools open up a very wide area for speech and video analysis... making it possible to highlight patterns and intentions, or simply to have fun.

Also, since yt-dlp makes it possible to download complete channels, playlists or search query results, the possibilities are endless.

🔖 Resources

🗞️ News

In its 2.1.1 release, videogrep adds some really cool features, including (but not limited to):

  • Finding "non-english vtt subtitle files"
  • "Examples that integrate with spaCy"