Day 1

Day 1, and I wanted to go wild. I've done 100 days before. It was, like, hundreds of days ago.

So yeah, I know I've got some skills. I wanted to mess around with some JavaScript and make a graph, but I can't figure out the whole .min.js-file-in-the-folder thing, so maybe someone who knows about those things can tell me.

So I wanted to start a project publicly. I want to catalog a bunch of videos I want to watch on YouTube and see if I can make a language corpus for the channel as well. What will these things help me with? Not much that's useful, but it's challenging, and for now that's reason enough.

The channel I've picked is Siraj Raval's. He's super energetic, fun to watch, and talks about awesome things, so maybe I can meta-learn something from doing this to his channel. I'm going to start with the smallest slice: getting the transcript.

In [2]:
from youtube_transcript_api import YouTubeTranscriptApi

Install it if you don't have it (pip install youtube-transcript-api). I'm your friend.

Someone already made something dope and easy to use, so just use that.

All you need is the part of the video link that comes after the equals sign in the address bar.

Here's the URL: https://www.youtube.com/watch?v=9rDhY1P3YLA

I could just cut it off and copy-paste away, but I am trying to build something I can use in the future.

In [12]:
video_url = "https://www.youtube.com/watch?v=9rDhY1P3YLA"
video_splice = video_url.split('=')
video_splice
Out[12]:
['https://www.youtube.com/watch?v', '9rDhY1P3YLA']
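Splitting on the equals sign works for this link, but it would pick up extra junk if the URL carried more parameters (a timestamp like &t=42s, for example). Here's a minimal sketch of a sturdier version using the standard library's urllib.parse. The function name extract_video_id is something I made up for illustration, and it assumes the link is the usual watch?v=... form.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the value of the 'v' query parameter, or None if it is missing."""
    # parse_qs turns 'v=abc&t=42s' into {'v': ['abc'], 't': ['42s']}
    query = parse_qs(urlparse(url).query)
    values = query.get('v')
    return values[0] if values else None

video_id = extract_video_id("https://www.youtube.com/watch?v=9rDhY1P3YLA")
print(video_id)
```

The upside is that any other parameters in the link get ignored instead of ending up glued to the video id.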
In [11]:
video_transcript = YouTubeTranscriptApi.get_transcript(video_splice[1])
video_transcript
Out[11]:
[{'text': 'come for the data science stay for the',
  'start': 0.0,
  'duration': 5.19},
 {'text': "memes hello world it's Suraj and data",
  'start': 1.89,
  'duration': 5.31},
 {'text': 'science is the hottest career to get',
  'start': 5.19,
  'duration': 4.32},
 {'text': 'into this year every industry is', 'start': 7.2, 'duration': 4.59},
 {'text': 'collecting customer data and using it to',
  'start': 9.51,
  'duration': 4.799},
 {'text': 'make smarter decisions which leads to',
  'start': 11.79,
  'duration': 5.069},
 {'text': 'higher profits the demand to fill data',
  'start': 14.309,
  'duration': 4.831},
 {'text': 'science positions is through the roof',
  'start': 16.859,
  'duration': 4.741},
 {'text': 'globally and forecasts reveal that this',
  'start': 19.14,
  'duration': 4.92},
 {'text': 'demand will only increase in the coming',
  'start': 21.6,
  'duration': 3.0},
 {'text': 'years', 'start': 24.06, 'duration': 3.12},
 {'text': 'so to help you take part in this rapidly',
  'start': 24.6,
  'duration': 5.519},
 {'text': "growing field I've created a three-month",
  'start': 27.18,
  'duration': 5.399},
 {'text': 'curriculum to take you from absolute',
  'start': 30.119,
  'duration': 4.771},
 {'text': 'beginner to proficient in the art of',
  'start': 32.579,
  'duration': 5.16},
 {'text': 'data science this open source curriculum',
  'start': 34.89,
  'duration': 5.939},
 {'text': 'consists of purely free resources that',
  'start': 37.739,
  'duration': 5.521},
 {'text': "I've compiled from across the web and",
  'start': 40.829,
  'duration': 4.89},
 {'text': "has no prerequisites you don't even have",
  'start': 43.26,
  'duration': 4.83},
 {'text': "to have coded before I've designed it",
  'start': 45.719,
  'duration': 4.561},
 {'text': 'for anyone who wants to improve their',
  'start': 48.09,
  'duration': 5.52},
 {'text': 'skills and find paid work ASAP either',
  'start': 50.28,
  'duration': 5.88},
 {'text': 'through a full-time position or contract',
  'start': 53.61,
  'duration': 4.83},
 {'text': "work you'll be learning a host of tools",
  'start': 56.16,
  'duration': 5.579},
 {'text': 'like sequel Python Hadoop and even data',
  'start': 58.44,
  'duration': 5.22},
 {'text': 'storytelling all of which make up the',
  'start': 61.739,
  'duration': 4.321},
 {'text': 'complete data science pipeline data',
  'start': 63.66,
  'duration': 4.77},
 {'text': 'science is the area of study involving',
  'start': 66.06,
  'duration': 4.95},
 {'text': 'extracting insights from data and a data',
  'start': 68.43,
  'duration': 5.04},
 {'text': 'scientist sits at the intersection of',
  'start': 71.01,
  'duration': 4.92},
 {'text': 'math software engineering and data',
  'start': 73.47,
  'duration': 4.89},
 {'text': 'communication or the ability to', 'start': 75.93, 'duration': 5.28},
 {'text': 'communicate insights from data there are',
  'start': 78.36,
  'duration': 5.189},
 {'text': 'a lot of related positions in the field',
  'start': 81.21,
  'duration': 4.769},
 {'text': 'ranging from machine learning engineer',
  'start': 83.549,
  'duration': 5.011},
 {'text': 'to data analyst to business analytics',
  'start': 85.979,
  'duration': 5.671},
 {'text': 'specialist usually a data scientist is',
  'start': 88.56,
  'duration': 5.669},
 {'text': 'expected to formulate the questions that',
  'start': 91.65,
  'duration': 4.59},
 {'text': 'will help a business and then proceeds',
  'start': 94.229,
  'duration': 4.921},
 {'text': 'to solve them while a data analyst is',
  'start': 96.24,
  'duration': 5.43},
 {'text': 'given questions by the business team and',
  'start': 99.15,
  'duration': 5.46},
 {'text': 'pursues a solution with that guidance on',
  'start': 101.67,
  'duration': 5.4},
 {'text': 'the other hand a machine learning',
  'start': 104.61,
  'duration': 4.91},
 {'text': 'engineers goal is to build and optimize',
  'start': 107.07,
  'duration': 5.369},
 {'text': "predictive models there's a lots of",
  'start': 109.52,
  'duration': 5.169},
 {'text': 'intersection between data science roles',
  'start': 112.439,
  'duration': 4.651},
 {'text': 'but the data scientist is usually the',
  'start': 114.689,
  'duration': 5.07},
 {'text': 'most senior role for example if we look',
  'start': 117.09,
  'duration': 5.55},
 {'text': 'at a data scientist job position hiring',
  'start': 119.759,
  'duration': 4.771},
 {'text': 'page at one of the big four tech',
  'start': 122.64,
  'duration': 4.32},
 {'text': 'companies like Google or Facebook will',
  'start': 124.53,
  'duration': 5.099},
 {'text': 'see that they expect several years of',
  'start': 126.96,
  'duration': 6.22},
 {'text': 'experience and irrelevant undergraduate',
  'start': 129.629,
  'duration': 6.041},
 {'text': "even graduate level degree that's",
  'start': 133.18,
  'duration': 4.699},
 {'text': 'because they can afford to do that',
  'start': 135.67,
  'duration': 4.409},
 {'text': 'everyone wants to work there and they',
  'start': 137.879,
  'duration': 4.961},
 {'text': 'have more data than anyone else on the',
  'start': 140.079,
  'duration': 5.671},
 {'text': 'planet so they set the bar very high but',
  'start': 142.84,
  'duration': 5.58},
 {'text': "don't get discouraged by that if you're",
  'start': 145.75,
  'duration': 5.609},
 {'text': 'applying as a first time data scientist',
  'start': 148.42,
  'duration': 5.219},
 {'text': "it's best to avoid applying there and",
  'start': 151.359,
  'duration': 4.981},
 {'text': 'instead applying to a lesser demanding',
  'start': 153.639,
  'duration': 5.52},
 {'text': 'role like a data analyst data science',
  'start': 156.34,
  'duration': 5.129},
 {'text': 'jobs at smaller companies are much more',
  'start': 159.159,
  'duration': 4.651},
 {'text': 'forgiving and you can make up for both a',
  'start': 161.469,
  'duration': 4.981},
 {'text': 'lack of experience and any gaps in',
  'start': 163.81,
  'duration': 5.459},
 {'text': 'formal education by showcasing the depth',
  'start': 166.45,
  'duration': 5.31},
 {'text': 'of your skills if you start your career',
  'start': 169.269,
  'duration': 4.92},
 {'text': 'there you can work your way up to one of',
  'start': 171.76,
  'duration': 4.559},
 {'text': 'the bigger companies or of course start',
  'start': 174.189,
  'duration': 4.17},
 {'text': "your own data science business I've",
  'start': 176.319,
  'duration': 4.35},
 {'text': 'divided this curriculum up into three',
  'start': 178.359,
  'duration': 4.53},
 {'text': 'months the first month focuses on data',
  'start': 180.669,
  'duration': 4.861},
 {'text': 'analysis month 2 is all about machine',
  'start': 182.889,
  'duration': 4.951},
 {'text': 'learning and the last month will have us',
  'start': 185.53,
  'duration': 5.34},
 {'text': 'learn production grade tools like spark',
  'start': 187.84,
  'duration': 5.94},
 {'text': 'and Hadoop that data scientists use in',
  'start': 190.87,
  'duration': 5.399},
 {'text': 'the real world before I start describing',
  'start': 193.78,
  'duration': 4.769},
 {'text': 'the curriculum keep in mind that we are',
  'start': 196.269,
  'duration': 5.761},
 {'text': 'practicing accelerated learning yes each',
  'start': 198.549,
  'duration': 6.241},
 {'text': 'week of my curriculum consists of a full',
  'start': 202.03,
  'duration': 4.769},
 {'text': "online course that's supposed to take",
  'start': 204.79,
  'duration': 4.319},
 {'text': "several weeks but we're concerned with",
  'start': 206.799,
  'duration': 4.59},
 {'text': 'efficiently downloading as much',
  'start': 209.109,
  'duration': 4.74},
 {'text': 'knowledge into our brains as fast as',
  'start': 211.389,
  'duration': 5.19},
 {'text': 'possible to do this watch course videos',
  'start': 213.849,
  'duration': 5.161},
 {'text': 'at 2x or 3x speed using a browser',
  'start': 216.579,
  'duration': 5.37},
 {'text': 'extension dedicate 2 or 3 hours every',
  'start': 219.01,
  'duration': 5.759},
 {'text': 'day to studying handwrite notes as you',
  'start': 221.949,
  'duration': 4.65},
 {'text': 'watch for increased memory retention',
  'start': 224.769,
  'duration': 4.261},
 {'text': 'which has been proven and complete just',
  'start': 226.599,
  'duration': 4.59},
 {'text': 'one of the projects of your choice from',
  'start': 229.03,
  'duration': 4.709},
 {'text': 'each course at the end of the week to',
  'start': 231.189,
  'duration': 5.121},
 {'text': "help synthesize the ideas you've learned",
  'start': 233.739,
  'duration': 5.31},
 {'text': "also while you're learning immerse",
  'start': 236.31,
  'duration': 5.139},
 {'text': 'yourself in the community by following',
  'start': 239.049,
  'duration': 4.981},
 {'text': 'this great list of data scientists for',
  'start': 241.449,
  'duration': 4.26},
 {'text': 'the first week will want to learn Python',
  'start': 244.03,
  'duration': 4.379},
 {'text': 'perhaps the most important tool in the',
  'start': 245.709,
  'duration': 5.131},
 {'text': "data science pipeline it's a highly",
  'start': 248.409,
  'duration': 4.23},
 {'text': "versatile programming language that's",
  'start': 250.84,
  'duration': 4.28},
 {'text': 'used across many different industries',
  'start': 252.639,
  'duration': 5.761},
 {'text': 'EDX has developed a great course made',
  'start': 255.12,
  'duration': 5.729},
 {'text': 'for absolute beginners to learn Python',
  'start': 258.4,
  'duration': 5.4},
 {'text': 'specifically for data science it takes',
  'start': 260.849,
  'duration': 5.011},
 {'text': 'us from Python language fundamental',
  'start': 263.8,
  'duration': 4.839},
 {'text': 'up to creating plots using real data',
  'start': 265.86,
  'duration': 5.399},
 {'text': "additionally I've developed a fun learn",
  'start': 268.639,
  'duration': 5.141},
 {'text': 'Python for data science playlist so',
  'start': 271.259,
  'duration': 5.071},
 {'text': 'definitely check that out once we have a',
  'start': 273.78,
  'duration': 5.4},
 {'text': 'basic grasp of Python in the second week',
  'start': 276.33,
  'duration': 4.739},
 {'text': "we'll want to take the statistics and",
  'start': 279.18,
  'duration': 4.949},
 {'text': "probability course at Khan Academy it's",
  'start': 281.069,
  'duration': 5.13},
 {'text': "actually really fun Khan Academy's",
  'start': 284.129,
  'duration': 4.741},
 {'text': 'website has gotten better every year the',
  'start': 286.199,
  'duration': 5.071},
 {'text': 'course has interactive content and they',
  'start': 288.87,
  'duration': 3.69},
 {'text': "make it feel like you're playing a game",
  'start': 291.27,
  'duration': 4.139},
 {'text': 'due to the mastery points system it',
  'start': 292.56,
  'duration': 4.59},
 {'text': 'covers topics like probability',
  'start': 295.409,
  'duration': 4.141},
 {'text': 'distributions random variables and',
  'start': 297.15,
  'duration': 4.949},
 {'text': 'hypothesis testing all of which are',
  'start': 299.55,
  'duration': 4.859},
 {'text': 'supremely useful in the data science',
  'start': 302.099,
  'duration': 5.391},
 {'text': 'pipeline after we have a bit more of a',
  'start': 304.409,
  'duration': 5.79},
 {'text': 'mathematical foundation we can start',
  'start': 307.49,
  'duration': 4.66},
 {'text': 'learning how to perform all sorts of',
  'start': 310.199,
  'duration': 4.52},
 {'text': 'exploratory data analysis techniques',
  'start': 312.15,
  'duration': 5.04},
 {'text': 'some of which use probability and',
  'start': 314.719,
  'duration': 4.57},
 {'text': 'statistics this is the process of',
  'start': 317.19,
  'duration': 4.349},
 {'text': 'summarizing the main characteristics of',
  'start': 319.289,
  'duration': 4.74},
 {'text': 'a data set Georgia Tech released a',
  'start': 321.539,
  'duration': 4.891},
 {'text': 'course called introduction to computing',
  'start': 324.029,
  'duration': 5.341},
 {'text': 'for data analysis that demonstrates how',
  'start': 326.43,
  'duration': 6.359},
 {'text': 'to pre-process analyze and visualize a',
  'start': 329.37,
  'duration': 5.4},
 {'text': 'data set the important thing about this',
  'start': 332.789,
  'duration': 4.111},
 {'text': 'course is that most of the focus is on',
  'start': 334.77,
  'duration': 4.47},
 {'text': 'data cleaning and in the real world data',
  'start': 336.9,
  'duration': 4.079},
 {'text': 'scientists will be quick to tell you',
  'start': 339.24,
  'duration': 3.87},
 {'text': 'that most of their time is spent',
  'start': 340.979,
  'duration': 4.921},
 {'text': 'cleaning data real world data is messy',
  'start': 343.11,
  'duration': 4.739},
 {'text': "it's not like kaggle where we get neatly",
  'start': 345.9,
  'duration': 4.769},
 {'text': "packaged data sets its unlabeled it's",
  'start': 347.849,
  'duration': 4.68},
 {'text': 'got missing values irrelevant features',
  'start': 350.669,
  'duration': 5.011},
 {'text': 'so learning how to carefully sculpt a',
  'start': 352.529,
  'duration': 4.89},
 {'text': "data set so that it's ready for further",
  'start': 355.68,
  'duration': 3.479},
 {'text': 'analysis is crucial', 'start': 357.419, 'duration': 3.72},
 {'text': 'speaking of Kaggle the website has',
  'start': 359.159,
  'duration': 4.68},
 {'text': 'become a phenomenal resource for data',
  'start': 361.139,
  'duration': 5.7},
 {'text': "science enthusiasts it's become not only",
  'start': 363.839,
  'duration': 5.67},
 {'text': 'a place for data scientists to compete',
  'start': 366.839,
  'duration': 5.431},
 {'text': 'for prize money by solving problems for',
  'start': 369.509,
  'duration': 4.62},
 {'text': 'companies but an incredible learning',
  'start': 372.27,
  'duration': 4.709},
 {'text': 'resource in fact Kaggle has a learn',
  'start': 374.129,
  'duration': 5.22},
 {'text': 'section that contains courses on a',
  'start': 376.979,
  'duration': 4.53},
 {'text': "series of tools you'll need to",
  'start': 379.349,
  'duration': 4.861},
 {'text': 'understand data science each course is a',
  'start': 381.509,
  'duration': 5.431},
 {'text': 'series of well-documented cago kernels',
  'start': 384.21,
  'duration': 5.1},
 {'text': 'which are their version of jupiter',
  'start': 386.94,
  'duration': 5.069},
 {'text': "notebooks my only gripe is that there's",
  'start': 389.31,
  'duration': 5.699},
 {'text': 'no video content or assignments but an',
  'start': 392.009,
  'duration': 6.241},
 {'text': 'awesome resource nonetheless definitely',
  'start': 395.009,
  'duration': 4.571},
 {'text': 'something to browse', 'start': 398.25, 'duration': 3.49},
 {'text': 'for week four spend the week solving a',
  'start': 399.58,
  'duration': 4.559},
 {'text': 'Kaggle competition that you personally',
  'start': 401.74,
  'duration': 5.34},
 {'text': "find interesting that's the best way to",
  'start': 404.139,
  'duration': 5.34},
 {'text': 'stay motivated pick a completed',
  'start': 407.08,
  'duration': 4.889},
 {'text': 'competition and briefly view one of the',
  'start': 409.479,
  'duration': 4.291},
 {'text': "Colonel's to get some sense of what",
  'start': 411.969,
  'duration': 4.801},
 {'text': 'people have done before then create your',
  'start': 413.77,
  'duration': 6.329},
 {'text': 'own repository and get to work document',
  'start': 416.77,
  'duration': 5.519},
 {'text': 'the project very well on your github',
  'start': 420.099,
  'duration': 4.771},
 {'text': 'profile so that anyone who views it can',
  'start': 422.289,
  'duration': 4.861},
 {'text': 'run the code if they follow your',
  'start': 424.87,
  'duration': 5.13},
 {'text': 'instructions including any future',
  'start': 427.15,
  'duration': 5.669},
 {'text': 'employers remember github is the new',
  'start': 430.0,
  'duration': 5.55},
 {'text': 'resume now that we know how to clean a',
  'start': 432.819,
  'duration': 5.16},
 {'text': 'data set and explore its different',
  'start': 435.55,
  'duration': 4.799},
 {'text': 'features and relationships we can start',
  'start': 437.979,
  'duration': 4.91},
 {'text': 'diving into the art of machine learning',
  'start': 440.349,
  'duration': 5.04},
 {'text': 'machine learning models help us derive',
  'start': 442.889,
  'duration': 5.58},
 {'text': 'insights from data sets correlations',
  'start': 445.389,
  'duration': 5.49},
 {'text': "classifications clustering there's a lot",
  'start': 448.469,
  'duration': 4.66},
 {'text': 'of possibilities there are several',
  'start': 450.879,
  'duration': 5.371},
 {'text': 'mathematical disciplines that make up ml',
  'start': 453.129,
  'duration': 5.4},
 {'text': "and I've got a cheat sheet for each of",
  'start': 456.25,
  'duration': 4.77},
 {'text': 'them that lists the most relevant parts',
  'start': 458.529,
  'duration': 4.471},
 {'text': "you'll need to know in the video",
  'start': 461.02,
  'duration': 4.259},
 {'text': 'description Columbia has an excellent',
  'start': 463.0,
  'duration': 4.259},
 {'text': 'course called machine learning for data',
  'start': 465.279,
  'duration': 5.04},
 {'text': 'science and analytics on EDX it starts',
  'start': 467.259,
  'duration': 5.31},
 {'text': 'with concepts like search trees and',
  'start': 470.319,
  'duration': 4.25},
 {'text': 'linear programming applied to a',
  'start': 472.569,
  'duration': 5.25},
 {'text': 'real-world personal genomic data set to',
  'start': 474.569,
  'duration': 5.65},
 {'text': 'give us an algorithmic foundation then',
  'start': 477.819,
  'duration': 4.71},
 {'text': 'it moves into popular machine learning',
  'start': 480.219,
  'duration': 5.1},
 {'text': 'techniques except for deep learning deep',
  'start': 482.529,
  'duration': 4.831},
 {'text': 'learning is the subset of machine',
  'start': 485.319,
  'duration': 4.71},
 {'text': 'learning focus on just one type of model',
  'start': 487.36,
  'duration': 5.489},
 {'text': 'neural networks the online deep learning',
  'start': 490.029,
  'duration': 5.67},
 {'text': 'book specifically parts 1 & 2 will get',
  'start': 492.849,
  'duration': 4.861},
 {'text': 'you up to speed on deep learning very',
  'start': 495.699,
  'duration': 4.731},
 {'text': 'fast so spend week 7 reading that',
  'start': 497.71,
  'duration': 4.62},
 {'text': "additionally I've got a deep learning",
  'start': 500.43,
  'duration': 3.549},
 {'text': "playlist on YouTube that's very",
  'start': 502.33,
  'duration': 4.559},
 {'text': "extensive for week 8 it's time for",
  'start': 503.979,
  'duration': 5.34},
 {'text': 'Kaggle project number 2 this time with',
  'start': 506.889,
  'duration': 4.14},
 {'text': 'the focus on different ways of using',
  'start': 509.319,
  'duration': 3.84},
 {'text': 'either machine learning or deep learning',
  'start': 511.029,
  'duration': 5.31},
 {'text': "to solve a problem or last month we'll",
  'start': 513.159,
  'duration': 5.581},
 {'text': 'focus on learning how the modern data',
  'start': 516.339,
  'duration': 5.521},
 {'text': 'science pipeline works data sets usually',
  'start': 518.74,
  'duration': 5.58},
 {'text': 'live in data bases so learning how to',
  'start': 521.86,
  'duration': 4.91},
 {'text': 'work with data bases is important',
  'start': 524.32,
  'duration': 5.34},
 {'text': 'Udacity is intro to relational databases',
  'start': 526.77,
  'duration': 5.71},
 {'text': 'is a relatively short but detailed',
  'start': 529.66,
  'duration': 3.56},
 {'text': 'introduction', 'start': 532.48, 'duration': 2.839},
 {'text': 'to the basics of structured query',
  'start': 533.22,
  'duration': 4.71},
 {'text': 'language or sequel and database design',
  'start': 535.319,
  'duration': 5.341},
 {'text': 'as well as the Python API for connecting',
  'start': 537.93,
  'duration': 5.699},
 {'text': 'Python code to a database will also fit',
  'start': 540.66,
  'duration': 5.25},
 {'text': 'in another short course into this week',
  'start': 543.629,
  'duration': 4.861},
 {'text': 'on the other type of database no sequel',
  'start': 545.91,
  'duration': 5.279},
 {'text': 'the intro to no sequel data solutions',
  'start': 548.49,
  'duration': 5.25},
 {'text': 'course by Microsoft on EDX is perfect',
  'start': 551.189,
  'duration': 4.621},
 {'text': 'for this it leads us through the three',
  'start': 553.74,
  'duration': 4.8},
 {'text': "V's of no sequel variety volume and",
  'start': 555.81,
  'duration': 4.889},
 {'text': 'velocity by demonstrating popular',
  'start': 558.54,
  'duration': 5.19},
 {'text': "examples like MongoDB for week 10 we'll",
  'start': 560.699,
  'duration': 5.281},
 {'text': 'move on to Hadoop and MapReduce as',
  'start': 563.73,
  'duration': 4.83},
 {'text': 'Google grew it had to index more and',
  'start': 565.98,
  'duration': 4.74},
 {'text': 'more data over a billion pages of',
  'start': 568.56,
  'duration': 4.29},
 {'text': 'content and in order to cope they',
  'start': 570.72,
  'duration': 4.5},
 {'text': 'invented a new style of data processing',
  'start': 572.85,
  'duration': 5.52},
 {'text': 'known as MapReduce Hadoop was created to',
  'start': 575.22,
  'duration': 5.52},
 {'text': 'apply these concepts to an open source',
  'start': 578.37,
  'duration': 4.98},
 {'text': 'framework that anyone could use data',
  'start': 580.74,
  'duration': 5.219},
 {'text': 'scientists use MapReduce to process data',
  'start': 583.35,
  'duration': 4.979},
 {'text': 'frequently and the intro to Hadoop and',
  'start': 585.959,
  'duration': 4.92},
 {'text': 'MapReduce by cloud era course on Udacity',
  'start': 588.329,
  'duration': 4.62},
 {'text': 'is the perfect way to get familiar with',
  'start': 590.879,
  'duration': 3.931},
 {'text': "these concepts there's also another",
  'start': 592.949,
  'duration': 4.351},
 {'text': 'framework called spark that is newer',
  'start': 594.81,
  'duration': 4.23},
 {'text': 'than Hadoop and is getting a lot of',
  'start': 597.3,
  'duration': 4.019},
 {'text': "attention because it's useful in",
  'start': 599.04,
  'duration': 3.779},
 {'text': 'different ways think of it like an',
  'start': 601.319,
  'duration': 4.2},
 {'text': 'extension of Hadoop Stanford has a one',
  'start': 602.819,
  'duration': 5.19},
 {'text': 'day workshop on spark and we can use the',
  'start': 605.519,
  'duration': 5.04},
 {'text': 'associated slide deck tutorial to learn',
  'start': 608.009,
  'duration': 5.07},
 {'text': "more when you're working on a team as a",
  'start': 610.559,
  'duration': 5.101},
 {'text': "data scientist often you're tasked with",
  'start': 613.079,
  'duration': 5.37},
 {'text': 'communicating your results to people in',
  'start': 615.66,
  'duration': 5.369},
 {'text': 'different teams so important business',
  'start': 618.449,
  'duration': 5.461},
 {'text': 'decisions can be made Microsoft has a',
  'start': 621.029,
  'duration': 5.31},
 {'text': 'course on EDX called analytics',
  'start': 623.91,
  'duration': 4.979},
 {'text': 'storytelling for impact that perfectly',
  'start': 626.339,
  'duration': 4.951},
 {'text': 'fits this use case and for the last week',
  'start': 628.889,
  'duration': 4.771},
 {'text': 'complete one more Kaggle project so you',
  'start': 631.29,
  'duration': 4.979},
 {'text': 'have three great demos to show the world',
  'start': 633.66,
  'duration': 4.919},
 {'text': 'once you finish this course you can',
  'start': 636.269,
  'duration': 5.161},
 {'text': 'start applying for jobs doing contract',
  'start': 638.579,
  'duration': 5.041},
 {'text': 'work start your own data science',
  'start': 641.43,
  'duration': 4.379},
 {'text': 'consulting group or just keep on',
  'start': 643.62,
  'duration': 4.589},
 {'text': 'learning remember to believe in your',
  'start': 645.809,
  'duration': 5.311},
 {'text': 'ability to learn you can learn data',
  'start': 648.209,
  'duration': 5.55},
 {'text': 'science you will learn data science and',
  'start': 651.12,
  'duration': 4.62},
 {'text': 'if you stick to it eventually', 'start': 653.759, 'duration': 5.25},
 {'text': 'you will master it oh and find a study',
  'start': 655.74,
  'duration': 5.519},
 {'text': "buddy to keep you motivated I've created",
  'start': 659.009,
  'duration': 4.5},
 {'text': 'a data science in three months channel',
  'start': 661.259,
  'duration': 4.231},
 {'text': 'in our slack group to help you find one',
  'start': 663.509,
  'duration': 2.541},
 {'text': 'good', 'start': 665.49, 'duration': 2.959},
 {'text': "I'm rooting for you please subscribe for",
  'start': 666.05,
  'duration': 4.259},
 {'text': "more programming videos and for now I've",
  'start': 668.449,
  'duration': 4.08},
 {'text': 'got to clean my data so thanks for',
  'start': 670.309,
  'duration': 4.58},
 {'text': 'watching', 'start': 672.529, 'duration': 2.36}]
In [15]:
video_transcript[0]['text'] # how to separate one 'text' from the pack
Out[15]:
'come for the data science stay for the'
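Since every entry is a dict with a 'text' key, the same lookup works across the whole list, which is what the corpus idea needs. A rough sketch of where this could go, using three segments copied from the output above: join the text, split it into words, and count them. The names sample_segments, full_text, and word_counts are made up here, and splitting on whitespace is just the simplest possible tokenizer, not a real one.

```python
from collections import Counter

# Three segments copied from the transcript output above (sample name is hypothetical).
sample_segments = [
    {'text': 'come for the data science stay for the', 'start': 0.0, 'duration': 5.19},
    {'text': "memes hello world it's Suraj and data", 'start': 1.89, 'duration': 5.31},
    {'text': 'science is the hottest career to get', 'start': 5.19, 'duration': 4.32},
]

# Join the 'text' fields into one string, then lowercase and split on whitespace.
full_text = " ".join(segment['text'] for segment in sample_segments)
words = full_text.lower().split()

# Count how often each word appears.
word_counts = Counter(words)
print(word_counts.most_common(3))
```

Joining before splitting matters because the caption segments cut sentences mid-phrase, so counting per segment would not mean much.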
In [20]:
# Collect just the 'text' field from every transcript segment
video_transcript_list = [segment['text'] for segment in video_transcript]

video_transcript_list
Out[20]:
['come for the data science stay for the',
 "memes hello world it's Suraj and data",
 'science is the hottest career to get',
 'into this year every industry is',
 'collecting customer data and using it to',
 'make smarter decisions which leads to',
 'higher profits the demand to fill data',
 'science positions is through the roof',
 'globally and forecasts reveal that this',
 'demand will only increase in the coming',
 'years',
 'so to help you take part in this rapidly',
 "growing field I've created a three-month",
 'curriculum to take you from absolute',
 'beginner to proficient in the art of',
 'data science this open source curriculum',
 'consists of purely free resources that',
 "I've compiled from across the web and",
 "has no prerequisites you don't even have",
 "to have coded before I've designed it",
 'for anyone who wants to improve their',
 'skills and find paid work ASAP either',
 'through a full-time position or contract',
 "work you'll be learning a host of tools",
 'like sequel Python Hadoop and even data',
 'storytelling all of which make up the',
 'complete data science pipeline data',
 'science is the area of study involving',
 'extracting insights from data and a data',
 'scientist sits at the intersection of',
 'math software engineering and data',
 'communication or the ability to',
 'communicate insights from data there are',
 'a lot of related positions in the field',
 'ranging from machine learning engineer',
 'to data analyst to business analytics',
 'specialist usually a data scientist is',
 'expected to formulate the questions that',
 'will help a business and then proceeds',
 'to solve them while a data analyst is',
 'given questions by the business team and',
 'pursues a solution with that guidance on',
 'the other hand a machine learning',
 'engineers goal is to build and optimize',
 "predictive models there's a lots of",
 'intersection between data science roles',
 'but the data scientist is usually the',
 'most senior role for example if we look',
 'at a data scientist job position hiring',
 'page at one of the big four tech',
 'companies like Google or Facebook will',
 'see that they expect several years of',
 'experience and irrelevant undergraduate',
 "even graduate level degree that's",
 'because they can afford to do that',
 'everyone wants to work there and they',
 'have more data than anyone else on the',
 'planet so they set the bar very high but',
 "don't get discouraged by that if you're",
 'applying as a first time data scientist',
 "it's best to avoid applying there and",
 'instead applying to a lesser demanding',
 'role like a data analyst data science',
 'jobs at smaller companies are much more',
 'forgiving and you can make up for both a',
 'lack of experience and any gaps in',
 'formal education by showcasing the depth',
 'of your skills if you start your career',
 'there you can work your way up to one of',
 'the bigger companies or of course start',
 "your own data science business I've",
 'divided this curriculum up into three',
 'months the first month focuses on data',
 'analysis month 2 is all about machine',
 'learning and the last month will have us',
 'learn production grade tools like spark',
 'and Hadoop that data scientists use in',
 'the real world before I start describing',
 'the curriculum keep in mind that we are',
 'practicing accelerated learning yes each',
 'week of my curriculum consists of a full',
 "online course that's supposed to take",
 "several weeks but we're concerned with",
 'efficiently downloading as much',
 'knowledge into our brains as fast as',
 'possible to do this watch course videos',
 'at 2x or 3x speed using a browser',
 'extension dedicate 2 or 3 hours every',
 'day to studying handwrite notes as you',
 'watch for increased memory retention',
 'which has been proven and complete just',
 'one of the projects of your choice from',
 'each course at the end of the week to',
 "help synthesize the ideas you've learned",
 "also while you're learning immerse",
 'yourself in the community by following',
 'this great list of data scientists for',
 'the first week will want to learn Python',
 'perhaps the most important tool in the',
 "data science pipeline it's a highly",
 "versatile programming language that's",
 'used across many different industries',
 'EDX has developed a great course made',
 'for absolute beginners to learn Python',
 'specifically for data science it takes',
 'us from Python language fundamental',
 'up to creating plots using real data',
 "additionally I've developed a fun learn",
 'Python for data science playlist so',
 'definitely check that out once we have a',
 'basic grasp of Python in the second week',
 "we'll want to take the statistics and",
 "probability course at Khan Academy it's",
 "actually really fun Khan Academy's",
 'website has gotten better every year the',
 'course has interactive content and they',
 "make it feel like you're playing a game",
 'due to the mastery points system it',
 'covers topics like probability',
 'distributions random variables and',
 'hypothesis testing all of which are',
 'supremely useful in the data science',
 'pipeline after we have a bit more of a',
 'mathematical foundation we can start',
 'learning how to perform all sorts of',
 'exploratory data analysis techniques',
 'some of which use probability and',
 'statistics this is the process of',
 'summarizing the main characteristics of',
 'a data set Georgia Tech released a',
 'course called introduction to computing',
 'for data analysis that demonstrates how',
 'to pre-process analyze and visualize a',
 'data set the important thing about this',
 'course is that most of the focus is on',
 'data cleaning and in the real world data',
 'scientists will be quick to tell you',
 'that most of their time is spent',
 'cleaning data real world data is messy',
 "it's not like kaggle where we get neatly",
 "packaged data sets its unlabeled it's",
 'got missing values irrelevant features',
 'so learning how to carefully sculpt a',
 "data set so that it's ready for further",
 'analysis is crucial',
 'speaking of Kaggle the website has',
 'become a phenomenal resource for data',
 "science enthusiasts it's become not only",
 'a place for data scientists to compete',
 'for prize money by solving problems for',
 'companies but an incredible learning',
 'resource in fact Kaggle has a learn',
 'section that contains courses on a',
 "series of tools you'll need to",
 'understand data science each course is a',
 'series of well-documented cago kernels',
 'which are their version of jupiter',
 "notebooks my only gripe is that there's",
 'no video content or assignments but an',
 'awesome resource nonetheless definitely',
 'something to browse',
 'for week four spend the week solving a',
 'Kaggle competition that you personally',
 "find interesting that's the best way to",
 'stay motivated pick a completed',
 'competition and briefly view one of the',
 "Colonel's to get some sense of what",
 'people have done before then create your',
 'own repository and get to work document',
 'the project very well on your github',
 'profile so that anyone who views it can',
 'run the code if they follow your',
 'instructions including any future',
 'employers remember github is the new',
 'resume now that we know how to clean a',
 'data set and explore its different',
 'features and relationships we can start',
 'diving into the art of machine learning',
 'machine learning models help us derive',
 'insights from data sets correlations',
 "classifications clustering there's a lot",
 'of possibilities there are several',
 'mathematical disciplines that make up ml',
 "and I've got a cheat sheet for each of",
 'them that lists the most relevant parts',
 "you'll need to know in the video",
 'description Columbia has an excellent',
 'course called machine learning for data',
 'science and analytics on EDX it starts',
 'with concepts like search trees and',
 'linear programming applied to a',
 'real-world personal genomic data set to',
 'give us an algorithmic foundation then',
 'it moves into popular machine learning',
 'techniques except for deep learning deep',
 'learning is the subset of machine',
 'learning focus on just one type of model',
 'neural networks the online deep learning',
 'book specifically parts 1 & 2 will get',
 'you up to speed on deep learning very',
 'fast so spend week 7 reading that',
 "additionally I've got a deep learning",
 "playlist on YouTube that's very",
 "extensive for week 8 it's time for",
 'Kaggle project number 2 this time with',
 'the focus on different ways of using',
 'either machine learning or deep learning',
 "to solve a problem or last month we'll",
 'focus on learning how the modern data',
 'science pipeline works data sets usually',
 'live in data bases so learning how to',
 'work with data bases is important',
 'Udacity is intro to relational databases',
 'is a relatively short but detailed',
 'introduction',
 'to the basics of structured query',
 'language or sequel and database design',
 'as well as the Python API for connecting',
 'Python code to a database will also fit',
 'in another short course into this week',
 'on the other type of database no sequel',
 'the intro to no sequel data solutions',
 'course by Microsoft on EDX is perfect',
 'for this it leads us through the three',
 "V's of no sequel variety volume and",
 'velocity by demonstrating popular',
 "examples like MongoDB for week 10 we'll",
 'move on to Hadoop and MapReduce as',
 'Google grew it had to index more and',
 'more data over a billion pages of',
 'content and in order to cope they',
 'invented a new style of data processing',
 'known as MapReduce Hadoop was created to',
 'apply these concepts to an open source',
 'framework that anyone could use data',
 'scientists use MapReduce to process data',
 'frequently and the intro to Hadoop and',
 'MapReduce by cloud era course on Udacity',
 'is the perfect way to get familiar with',
 "these concepts there's also another",
 'framework called spark that is newer',
 'than Hadoop and is getting a lot of',
 "attention because it's useful in",
 'different ways think of it like an',
 'extension of Hadoop Stanford has a one',
 'day workshop on spark and we can use the',
 'associated slide deck tutorial to learn',
 "more when you're working on a team as a",
 "data scientist often you're tasked with",
 'communicating your results to people in',
 'different teams so important business',
 'decisions can be made Microsoft has a',
 'course on EDX called analytics',
 'storytelling for impact that perfectly',
 'fits this use case and for the last week',
 'complete one more Kaggle project so you',
 'have three great demos to show the world',
 'once you finish this course you can',
 'start applying for jobs doing contract',
 'work start your own data science',
 'consulting group or just keep on',
 'learning remember to believe in your',
 'ability to learn you can learn data',
 'science you will learn data science and',
 'if you stick to it eventually',
 'you will master it oh and find a study',
 "buddy to keep you motivated I've created",
 'a data science in three months channel',
 'in our slack group to help you find one',
 'good',
 "I'm rooting for you please subscribe for",
 "more programming videos and for now I've",
 'got to clean my data so thanks for',
 'watching']

Alrighty, now for the finale. I'm going to join the list of transcript lines into one big string.

In [21]:
# Join every transcript line into one string, separated by single spaces
transcript_string = ' '.join(video_transcript_list)

transcript_string
Out[21]:
"come for the data science stay for the memes hello world it's Suraj and data science is the hottest career to get into this year every industry is collecting customer data and using it to make smarter decisions which leads to higher profits the demand to fill data science positions is through the roof globally and forecasts reveal that this demand will only increase in the coming years so to help you take part in this rapidly growing field I've created a three-month curriculum to take you from absolute beginner to proficient in the art of data science this open source curriculum consists of purely free resources that I've compiled from across the web and has no prerequisites you don't even have to have coded before I've designed it for anyone who wants to improve their skills and find paid work ASAP either through a full-time position or contract work you'll be learning a host of tools like sequel Python Hadoop and even data storytelling all of which make up the complete data science pipeline data science is the area of study involving extracting insights from data and a data scientist sits at the intersection of math software engineering and data communication or the ability to communicate insights from data there are a lot of related positions in the field ranging from machine learning engineer to data analyst to business analytics specialist usually a data scientist is expected to formulate the questions that will help a business and then proceeds to solve them while a data analyst is given questions by the business team and pursues a solution with that guidance on the other hand a machine learning engineers goal is to build and optimize predictive models there's a lots of intersection between data science roles but the data scientist is usually the most senior role for example if we look at a data scientist job position hiring page at one of the big four tech companies like Google or Facebook will see that they expect several years of experience and irrelevant 
undergraduate even graduate level degree that's because they can afford to do that everyone wants to work there and they have more data than anyone else on the planet so they set the bar very high but don't get discouraged by that if you're applying as a first time data scientist it's best to avoid applying there and instead applying to a lesser demanding role like a data analyst data science jobs at smaller companies are much more forgiving and you can make up for both a lack of experience and any gaps in formal education by showcasing the depth of your skills if you start your career there you can work your way up to one of the bigger companies or of course start your own data science business I've divided this curriculum up into three months the first month focuses on data analysis month 2 is all about machine learning and the last month will have us learn production grade tools like spark and Hadoop that data scientists use in the real world before I start describing the curriculum keep in mind that we are practicing accelerated learning yes each week of my curriculum consists of a full online course that's supposed to take several weeks but we're concerned with efficiently downloading as much knowledge into our brains as fast as possible to do this watch course videos at 2x or 3x speed using a browser extension dedicate 2 or 3 hours every day to studying handwrite notes as you watch for increased memory retention which has been proven and complete just one of the projects of your choice from each course at the end of the week to help synthesize the ideas you've learned also while you're learning immerse yourself in the community by following this great list of data scientists for the first week will want to learn Python perhaps the most important tool in the data science pipeline it's a highly versatile programming language that's used across many different industries EDX has developed a great course made for absolute beginners to learn Python specifically for 
data science it takes us from Python language fundamental up to creating plots using real data additionally I've developed a fun learn Python for data science playlist so definitely check that out once we have a basic grasp of Python in the second week we'll want to take the statistics and probability course at Khan Academy it's actually really fun Khan Academy's website has gotten better every year the course has interactive content and they make it feel like you're playing a game due to the mastery points system it covers topics like probability distributions random variables and hypothesis testing all of which are supremely useful in the data science pipeline after we have a bit more of a mathematical foundation we can start learning how to perform all sorts of exploratory data analysis techniques some of which use probability and statistics this is the process of summarizing the main characteristics of a data set Georgia Tech released a course called introduction to computing for data analysis that demonstrates how to pre-process analyze and visualize a data set the important thing about this course is that most of the focus is on data cleaning and in the real world data scientists will be quick to tell you that most of their time is spent cleaning data real world data is messy it's not like kaggle where we get neatly packaged data sets its unlabeled it's got missing values irrelevant features so learning how to carefully sculpt a data set so that it's ready for further analysis is crucial speaking of Kaggle the website has become a phenomenal resource for data science enthusiasts it's become not only a place for data scientists to compete for prize money by solving problems for companies but an incredible learning resource in fact Kaggle has a learn section that contains courses on a series of tools you'll need to understand data science each course is a series of well-documented cago kernels which are their version of jupiter notebooks my only gripe is that 
there's no video content or assignments but an awesome resource nonetheless definitely something to browse for week four spend the week solving a Kaggle competition that you personally find interesting that's the best way to stay motivated pick a completed competition and briefly view one of the Colonel's to get some sense of what people have done before then create your own repository and get to work document the project very well on your github profile so that anyone who views it can run the code if they follow your instructions including any future employers remember github is the new resume now that we know how to clean a data set and explore its different features and relationships we can start diving into the art of machine learning machine learning models help us derive insights from data sets correlations classifications clustering there's a lot of possibilities there are several mathematical disciplines that make up ml and I've got a cheat sheet for each of them that lists the most relevant parts you'll need to know in the video description Columbia has an excellent course called machine learning for data science and analytics on EDX it starts with concepts like search trees and linear programming applied to a real-world personal genomic data set to give us an algorithmic foundation then it moves into popular machine learning techniques except for deep learning deep learning is the subset of machine learning focus on just one type of model neural networks the online deep learning book specifically parts 1 & 2 will get you up to speed on deep learning very fast so spend week 7 reading that additionally I've got a deep learning playlist on YouTube that's very extensive for week 8 it's time for Kaggle project number 2 this time with the focus on different ways of using either machine learning or deep learning to solve a problem or last month we'll focus on learning how the modern data science pipeline works data sets usually live in data bases so learning how 
to work with data bases is important Udacity is intro to relational databases is a relatively short but detailed introduction to the basics of structured query language or sequel and database design as well as the Python API for connecting Python code to a database will also fit in another short course into this week on the other type of database no sequel the intro to no sequel data solutions course by Microsoft on EDX is perfect for this it leads us through the three V's of no sequel variety volume and velocity by demonstrating popular examples like MongoDB for week 10 we'll move on to Hadoop and MapReduce as Google grew it had to index more and more data over a billion pages of content and in order to cope they invented a new style of data processing known as MapReduce Hadoop was created to apply these concepts to an open source framework that anyone could use data scientists use MapReduce to process data frequently and the intro to Hadoop and MapReduce by cloud era course on Udacity is the perfect way to get familiar with these concepts there's also another framework called spark that is newer than Hadoop and is getting a lot of attention because it's useful in different ways think of it like an extension of Hadoop Stanford has a one day workshop on spark and we can use the associated slide deck tutorial to learn more when you're working on a team as a data scientist often you're tasked with communicating your results to people in different teams so important business decisions can be made Microsoft has a course on EDX called analytics storytelling for impact that perfectly fits this use case and for the last week complete one more Kaggle project so you have three great demos to show the world once you finish this course you can start applying for jobs doing contract work start your own data science consulting group or just keep on learning remember to believe in your ability to learn you can learn data science you will learn data science and if you stick to it 
eventually you will master it oh and find a study buddy to keep you motivated I've created a data science in three months channel in our slack group to help you find one good I'm rooting for you please subscribe for more programming videos and for now I've got to clean my data so thanks for watching"
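
Since I want something I can reuse later, here's a rough sketch of how the URL and joining steps could be wrapped into two small helpers. The names `video_id_from_url` and `transcript_to_string` are ones I made up for this post (they are not part of youtube_transcript_api), and the sketch assumes each segment is a dict with a 'text' key, like the output earlier. Splitting on '=' would break on a link with extra parameters such as `&t=10s`, so this version leans on `urllib.parse` from the standard library instead.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the value of the `v` query parameter from a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        raise ValueError(f"no video id found in {url!r}")
    return query["v"][0]


def transcript_to_string(segments):
    """Join the 'text' field of each caption segment with single spaces."""
    return " ".join(segment["text"] for segment in segments)


# The first two segments from the transcript output earlier in this post,
# used here so the sketch runs without hitting the network.
sample_segments = [
    {"text": "come for the data science stay for the", "start": 0.0, "duration": 5.19},
    {"text": "memes hello world it's Suraj and data", "start": 1.89, "duration": 5.31},
]

video_id = video_id_from_url("https://www.youtube.com/watch?v=9rDhY1P3YLA")
sample_string = transcript_to_string(sample_segments)
```

Feeding the list returned by `YouTubeTranscriptApi.get_transcript(video_id)` into `transcript_to_string` should give the same kind of string as `transcript_string`, assuming the segments keep the shape shown in the output.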

Boom da boom. That's not all the coding I did today, but I don't always want to work on just my site, so I also did some Dataquest lessons. Let me know what you think.

Peace.