day 1 ..

2019-07-24 01:12:15.531117

Day 1 I wanted to go wild and out. I've done 100 days before. It was like..hundreds of days ago.

So yeah, I know I've got some skills. I wanted to mess around with some JavaScript and make a graph, but I can't figure out the whole .min.js file in the folder thing, so maybe someone who knows about those things can tell me.

So I wanted to publicly start a project. I want to catalog a bunch of videos I want to watch on YouTube and see if I can't make a language corpus for the channel as well. What will these things help me with? Not much that's useful, but it's challenging and for now that's reason enough.

The channel I've picked is Siraj Raval's channel. He's super energetic, fun to watch, and talks about awesome things, so maybe I can meta-learn from doing this to his channel. I'm going to start with the smallest slice: getting the transcript.

In [2]:
from youtube_transcript_api import YouTubeTranscriptApi

Install it if you don't have it (pip install youtube-transcript-api). I'm your friend.

Someone already made something dope and easy to use so just use that.

All you need is the part of the video link that comes after the equals sign in the address bar.


Here's the URL: https://www.youtube.com/watch?v=9rDhY1P3YLA

I could just cut it off and copy-paste away, but I am trying to build something I can use in the future.

In [12]:
video_url = "https://www.youtube.com/watch?v=9rDhY1P3YLA"
video_splice = video_url.split('=')
video_splice
Out[12]:
['https://www.youtube.com/watch?v', '9rDhY1P3YLA']
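Splitting on '=' works for this link, but it would grab the wrong piece if the URL carried extra parameters (like &t=30s). Here's a minimal sketch of a sturdier version using Python's standard urllib.parse. The helper name get_video_id is made up for this example; it is not part of the transcript library.

```python
from urllib.parse import urlparse, parse_qs

def get_video_id(url):
    """Return the value of the 'v' query parameter, or None if it is missing."""
    query = urlparse(url).query      # the part after '?', e.g. 'v=9rDhY1P3YLA&t=30s'
    params = parse_qs(query)         # maps each parameter name to a list of values
    values = params.get('v')
    return values[0] if values else None

video_url = "https://www.youtube.com/watch?v=9rDhY1P3YLA"
print(get_video_id(video_url))              # 9rDhY1P3YLA
print(get_video_id(video_url + "&t=30s"))   # 9rDhY1P3YLA
```

The result can be passed along in place of video_splice[1].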
In [23]:
video_transcript = YouTubeTranscriptApi.get_transcript(video_splice[1])
video_transcript[0:7]
Out[23]:
[{'text': 'come for the data science stay for the',
  'start': 0.0,
  'duration': 5.19},
 {'text': "memes hello world it's Suraj and data",
  'start': 1.89,
  'duration': 5.31},
 {'text': 'science is the hottest career to get',
  'start': 5.19,
  'duration': 4.32},
 {'text': 'into this year every industry is', 'start': 7.2, 'duration': 4.59},
 {'text': 'collecting customer data and using it to',
  'start': 9.51,
  'duration': 4.799},
 {'text': 'make smarter decisions which leads to',
  'start': 11.79,
  'duration': 5.069},
 {'text': 'higher profits the demand to fill data',
  'start': 14.309,
  'duration': 4.831}]
In [15]:
video_transcript[0]['text'] # how to separate one 'text' from the pack
Out[15]:
'come for the data science stay for the'
In [25]:
# pull just the 'text' field out of every transcript entry
video_transcript_list = [entry['text'] for entry in video_transcript]
    
video_transcript_list[:10]
Out[25]:
['come for the data science stay for the',
 "memes hello world it's Suraj and data",
 'science is the hottest career to get',
 'into this year every industry is',
 'collecting customer data and using it to',
 'make smarter decisions which leads to',
 'higher profits the demand to fill data',
 'science positions is through the roof',
 'globally and forecasts reveal that this',
 'demand will only increase in the coming']

Alrighty, now for the finale. I'm going to turn it into a string.

In [28]:
s = ' '
transcript_string = s.join(video_transcript_list)

transcript_string[:500]
Out[28]:
"come for the data science stay for the memes hello world it's Suraj and data science is the hottest career to get into this year every industry is collecting customer data and using it to make smarter decisions which leads to higher profits the demand to fill data science positions is through the roof globally and forecasts reveal that this demand will only increase in the coming years so to help you take part in this rapidly growing field I've created a three-month curriculum to take you from a"

Boom da boom. That's not all the coding I've done today, but I don't always want to work on just my site. I also did some Dataquest lessons. Let me know what you think.

Peace.


In [2]:
!date # begin of EDIT
Sat 07 Aug 2021 08:19:04 PM EDT

EDIT

okay, spoiler alert: i didn't end up doing what i said i was going to do in the first part of this post. why? i'm not sure. i haven't read day two yet, and i said i was going to go wild up there, so who knows WHAT was going to happen. and in the very last cell, where i said i'm going to turn it into a string, i don't even test the data type.. i just print it. so silly.
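for the record, checking the type would have been one extra line. a small sketch, using a stand-in list (the first three lines of the transcript output above) instead of the real API call:

```python
# stand-in for video_transcript_list, copied from the output earlier in the post
video_transcript_list = [
    'come for the data science stay for the',
    "memes hello world it's Suraj and data",
    'science is the hottest career to get',
]

# join the pieces with a single space between them
transcript_string = ' '.join(video_transcript_list)

print(type(transcript_string))              # <class 'str'>
print(isinstance(transcript_string, str))   # True
```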

the reason i was interested in this in the first place was to build up a kind of search bank for conversational knowledge on certain educational youtube videos. i think that was the intention. i'm not sure what i intended to build tho..
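one guess at what a "search bank" could have looked like: keep the timestamps instead of throwing them away, and look up which lines mention a word. this is only a sketch. search_transcript is a made-up helper, and the sample entries are copied from the transcript output above.

```python
# a few entries in the same shape the transcript API returned above
video_transcript = [
    {'text': 'come for the data science stay for the', 'start': 0.0, 'duration': 5.19},
    {'text': "memes hello world it's Suraj and data", 'start': 1.89, 'duration': 5.31},
    {'text': 'science is the hottest career to get', 'start': 5.19, 'duration': 4.32},
    {'text': 'into this year every industry is', 'start': 7.2, 'duration': 4.59},
]

def search_transcript(entries, word):
    """Return (start_time, text) for every entry that has `word` as a whole word."""
    word = word.lower()
    return [
        (entry['start'], entry['text'])
        for entry in entries
        if word in entry['text'].lower().split()
    ]

# print the start time (in seconds) and text of each matching line
for start, text in search_transcript(video_transcript, 'science'):
    print(start, text)
```

the start times are what would let you jump to the right spot in the video later.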

okay, now for some vocab words from up there. and maybe down here too.

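  • min.js: back then i had no clue (my wild guess was something you need in your local directory to run javascript). it's actually just a minified javascript file: the same code with the whitespace and comments stripped out so it downloads faster, and you load it like any other .js file.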

  • corpus: a collection of text. honestly that's the simplest version of it. the word is latin for "body", as in a body of text, and a corpus is often used to teach machines how to read things within the context of the words in that corpus. a corpus is also kind of a data structure, since you can get information about those words from it too.

  • data structure: tbh, data structures and algorithms was my last computer science course in school, and it was hard. but simply: a data structure is for storing data, and an algorithm is a set of steps that takes that data and does something fancy to it. in code, data structures can look like these:

some_data = ['look at all the data', 1, 3]             # a list
some_other_data = {'this one': 'has key value pairs',  # a dict
                   'key': 'value',
                   'do': 'do'}
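to make the corpus idea concrete: a tiny sketch that treats a chunk of the transcript as the corpus and counts its words with collections.Counter from the standard library. a real corpus would be much bigger and cleaned up more carefully; the sample text is just the start of the transcript string from earlier in the post.

```python
from collections import Counter

# the first part of the transcript string from the post above
corpus_text = ("come for the data science stay for the memes hello world "
               "it's Suraj and data science is the hottest career to get into this year")

words = corpus_text.lower().split()   # lowercase, then split on whitespace
word_counts = Counter(words)          # dict-like: word -> how many times it shows up

print(len(words))                     # total number of words in the sample
print(word_counts['the'])             # how often 'the' appears
print(word_counts.most_common(3))     # the three most frequent words with counts
```

so the list is the data structure holding the words, and the counting is the (very small) algorithm.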

with all that context, it's probably good to know that my indecision about ever completing this project probably came from trying to decide whether to write code or to find another youtuber. or day two came and i was hurrying to put code together.
