In [1]:
!date # not of day ten but the day I remembered to date this post..
Mon Sep  9 11:04:47 CDT 2019
In [7]:
import scrapy

lego day 10

double digits. (breathes in long breath of air) feels good to be here.

check me out.

butimnotarapper gif

That literally took me all morning to do. I just got done doing a bunch of warm up tutorials on using Scrapy again. I'm not sure where Scrapy is in the Scraper world as a viable tool but it sure is fun.

Alright.

here's the goal if I didn't make it clear last night

See this page. The page you're looking at?

I want to change the color of it. but I want to be able to run a command at the command line to do it, right after I run jupyter nbconvert this_file.ipynb. I want it to look like the rest of my site.

(have you literally not seen it? go delete part of the url or whatever, I dunno. It's all interactive yeah?)
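Something like this is what I'm imagining running right after nbconvert: a tiny script that jams a `<link>` to my site's stylesheet into the HTML that nbconvert spits out, just before `</head>` so it can override the defaults. (This is just a sketch — the `/static/site.css` path is made up; point it at wherever the real stylesheet lives.)

```python
def inject_stylesheet(html, css_href="/static/site.css"):
    """Return notebook HTML with a site stylesheet linked into <head>.

    css_href is a made-up example path -- swap in the real one.
    """
    link = f'<link rel="stylesheet" href="{css_href}">'
    # drop the link in right before </head> so it loads after
    # nbconvert's built-in styles and can override them
    return html.replace("</head>", link + "\n</head>", 1)
```

Then it's just read the file `jupyter nbconvert` produced, run it through that, and write it back out.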

I have pretty colors for the other parts of my site, at the cost of very meticulously finding the tags in this extension I use for my browser.

(huge breath).

Here's something I built to learn Scrapy awhile ago.

In [8]:
import scrapy

class BrickSetSpider(scrapy.Spider):
    name = "brickset_spider"
    start_urls = ['http://brickset.com/sets/year-2016']

    def parse(self, response):
        SET_SELECTOR = '.set'
        for brickset in response.css(SET_SELECTOR):

            NAME_SELECTOR = 'h1 a ::text'
            PIECES_SELECTOR = './/dl[dt/text() = "Pieces"]/dd/a/text()'
            # relative XPath (leading ".") so we search within this set,
            # not the whole page
            MINIFIGS_SELECTOR = './/dl[dt/text() = "Minifigs"]/dd[2]/a/text()'
            IMAGE_SELECTOR = 'img ::attr(src)'
            yield {
                'name': brickset.css(NAME_SELECTOR).extract_first(),
                'pieces': brickset.xpath(PIECES_SELECTOR).extract_first(),
                'minifigs': brickset.xpath(MINIFIGS_SELECTOR).extract_first(),
                'image': brickset.css(IMAGE_SELECTOR).extract_first(),
            }

        NEXT_PAGE_SELECTOR = '.next a ::attr(href)'
        next_page = response.css(NEXT_PAGE_SELECTOR).extract_first()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse)

that ol' script right there gets ya some names of Lego sets. this was a demo put out by DigitalOcean a year or a few back.
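The pagination bit at the end is less magic than it looks, by the way — `response.urljoin(next_page)` is basically the standard library's `urljoin` resolving the `.next a` href against the current page's URL:

```python
from urllib.parse import urljoin

# scrapy's response.urljoin(next_page) boils down to this:
# resolve the (possibly relative) href against the current page's URL
current = 'http://brickset.com/sets/year-2016'
print(urljoin(current, '/sets/year-2016/page-2'))
# → http://brickset.com/sets/year-2016/page-2
```

So whether the site gives you a full URL, an absolute path, or a relative one, the next request still points at the right place.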

man when you dig into a tool but you come back to it and you're smarter and you know about programming

it's a little different than first learning it.

I just keep reading documentation. Who knows what the right way to code is but it's crazy what this thing can do.

So I took that above scraper and tried to make my own scraper after I did the last #100DaysOfCode last year or something. Maybe a year ago. Who knows.

That's a whole project on its own. I think I have a github repo for it somewhere.


In all fairness, I keep coming back to these posts like days later and push them really quick to github.

I am actually coding quite a bit I just keep forgetting the part where I have to write about it.

These really were supposed to mostly be about things I'm working on anyway.

That code up there is cool. Copy-paste it and go run it if you're feeling confident. If you're not, try it anyway.

What good stuff is happening is that I'm digging through old github projects.

That feels cool.

Also this is Saturday's post. I think.