2019-07-24 01:12:15.531117
Day 1. I wanted to go wild and out. I've done 100 days before. It was like... hundreds of days ago.
So yeah, I know I've got some skills. I wanted to mess around with some JavaScript and make a graph, but I can't figure out the whole .min.js-file-in-the-folder thing, so maybe someone who knows about those things can tell me.
So I wanted to publicly start a project. I want to catalog a bunch of videos I want to watch on YouTube and see if I can't make a language corpus for the channel as well. What will these things help me with? Not much useful stuff, but it's challenging, and for now that's reason enough.
The channel I've picked is Siraj Raval's channel. He's super energetic, fun to watch, and talks about awesome things, so maybe I can meta-learn from doing this to his channel. Going to start with the smallest slice: getting the transcript.
from youtube_transcript_api import YouTubeTranscriptApi
Install it if you don't have it. I'm your friend.
Someone already made something dope and easy to use so just use that.
All you need is the part of the video link that comes after the equal sign in the address bar.
Here's the URL: https://www.youtube.com/watch?v=9rDhY1P3YLA
I could just cut it off and copy paste away but I am trying to build something I can use in the future.
video_url = "https://www.youtube.com/watch?v=9rDhY1P3YLA"
video_splice = video_url.split('=')
video_splice
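The split('=') trick works for this exact link, but it grabs extra junk once the URL carries more parameters, like a timestamp. Here's a minimal sketch that uses Python's standard urllib.parse instead. The function name extract_video_id is made up for illustration, and it only handles the watch?v= style of link, not youtu.be short links.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the value of the 'v' query parameter, or None if there isn't one."""
    query = parse_qs(urlparse(url).query)  # e.g. {'v': ['9rDhY1P3YLA'], 't': ['30s']}
    values = query.get("v")
    return values[0] if values else None

print(extract_video_id("https://www.youtube.com/watch?v=9rDhY1P3YLA"))
# same video with a timestamp tacked on; split('=')[1] would return '9rDhY1P3YLA&t' here
print(extract_video_id("https://www.youtube.com/watch?v=9rDhY1P3YLA&t=30s"))
```

Both calls should print the same ID, which is the whole point: the timestamp no longer leaks into the result.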
video_transcript = YouTubeTranscriptApi.get_transcript(video_splice[1])
video_transcript[0:7]
video_transcript[0]['text'] # how to separate one 'text' from the pack
video_transcript_list = []
for entry in video_transcript:
    # each entry is a dict; keep only its 'text' value
    video_transcript_list.append(entry['text'])
video_transcript_list[:10]
Alrighty, now for the finale. I'm going to turn it into a string.
s = ' '
transcript_string = s.join(video_transcript_list)
transcript_string[:500]
Boom da boom. That's not all the coding I've done today, but I don't always want to work on just my site. I also did some Dataquest lessons. Let me know what you think.
Peace.
!date # begin of EDIT
okay, spoiler alert: i didn't end up doing what i said i was going to do in the first part of this post. why? i'm not sure. i haven't read day two yet, and i said i was going to go wild up there, so who knows WHAT was going to happen. and in the very last cell, where i said i'm going to turn it into a string, i don't even test the data type.. i just print it. so silly.
the reason i was interested in this in the first place was to build up a kind of search bank for conversational knowledge on certain educational youtube videos. i think that was the intention. i'm not sure what i intended to build tho..
okay, now for some vocab words from up there. and maybe down here too.
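for what it's worth, the check i skipped is tiny. here's a rough sketch of it. sample_transcript is a made-up stand-in for the real video_transcript (same shape: a list of dicts with a 'text' key), so it runs without hitting youtube.

```python
# made-up stand-in for video_transcript: a list of dicts with a 'text' key
sample_transcript = [
    {'text': 'hello world', 'start': 0.0, 'duration': 1.5},
    {'text': 'welcome back', 'start': 1.5, 'duration': 2.0},
]

# pull out just the text of each entry, then join with single spaces
sample_string = ' '.join(entry['text'] for entry in sample_transcript)

print(type(sample_string))             # the data type check
print(isinstance(sample_string, str))  # same check, as a True/False
print(sample_string)
```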
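one guess at what the smallest version of a search bank could look like: look for a keyword in every transcript entry and give back where in the video it was said. this is only a sketch. search_transcript and sample_entries are names invented here, and the entries are fake data shaped like the transcript (a 'text' plus a 'start' time in seconds).

```python
def search_transcript(transcript, keyword):
    """Return (start_seconds, text) for every entry whose text contains keyword."""
    keyword = keyword.lower()
    return [(entry['start'], entry['text'])
            for entry in transcript
            if keyword in entry['text'].lower()]

# made-up entries standing in for a real transcript
sample_entries = [
    {'text': 'today we talk about neural networks', 'start': 0.0, 'duration': 3.0},
    {'text': 'first some math', 'start': 3.0, 'duration': 2.0},
    {'text': 'Neural nets learn from data', 'start': 5.0, 'duration': 3.0},
]

# the match ignores upper/lower case, so both 'neural' lines come back
for start, text in search_transcript(sample_entries, 'neural'):
    print(start, text)
```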
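min.js: i still had no clue, so my wild guess was something you need in your local directory to run javascript. it's actually a minified javascript file: the same code with whitespace and comments stripped out so it downloads faster.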
corpus: a collection of words, or really of texts. honestly that's the simplest version of it. the name is latin for "body", as in a body of text, and a big use for one is teaching machines how to read things within the context of the words that corpus provides. a corpus is also kind of a data structure, one you can ask for information about those words.
data structure: tbh, data structures and algorithms was my last computer science course in school, and it was hard. but simply: a data structure is for storing data, and an algorithm is a pattern for taking that data, moving it over here, and doing something fancy to it. in code, a data structure can look like these:
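to make "getting information about the words" concrete, here's a tiny sketch that counts words with the standard library's Counter. sample_text is made up and stands in for something like transcript_string, and splitting on whitespace is a crude way to find words (no punctuation handling), so treat it as a toy.

```python
from collections import Counter

# made-up text standing in for transcript_string
sample_text = "the data is the data and the model is the model"

words = sample_text.lower().split()  # crude tokenizing: split on whitespace
word_counts = Counter(words)         # maps each word to how often it appears

print(len(words))                  # total number of words
print(len(word_counts))            # number of distinct words
print(word_counts.most_common(1))  # the single most frequent word and its count
```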
some_data = ['look at all the data', 1, 3]  # a list: ordered, read by position
some_other_data = {'this one': 'has key value pairs',  # a dict: read by key
                   'key': 'value',
                   'do': 'do'}
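and to show the two halves side by side, a small sketch: the first part just reads from the structures, the second part is about the simplest algorithm i can think of (walk the list, add up the numbers). the variable total is only there for this example.

```python
some_data = ['look at all the data', 1, 3]
some_other_data = {'this one': 'has key value pairs', 'key': 'value', 'do': 'do'}

# data structure part: a list is read by position, a dict by key
print(some_data[0])
print(some_other_data['key'])

# algorithm part: walk the list and add up only the numbers
total = 0
for item in some_data:
    if isinstance(item, int):
        total += item
print(total)
```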
with all that contextual knowledge, it's probably good to know that my indecision about ever completing this project came from trying to decide whether to write code or to find another youtuber. or day two came and i was hurrying to put code together.