In [29]:
import requests as r
from bs4 import BeautifulSoup as bs

In [36]:
res = r.get('https://mctopherganesh.com')
a = bs(res.text, 'html.parser')

a.find_all('a')

Out[36]:
[<a class="blurb-links" href="https://github.com/mctopherganesh/email_addy_factory#email-factory" target="_blank">creating email addresses</a>,
<a class="blurb-links" href="https://github.com/mctopherganesh/bs-project-stage#blood-sugar-tracking-project" target="_blank">data collection and reporting bot</a>,
<a class="blurb-links" href="./new_blog.html" target="_blank">newly unfinished blog roll</a>,
<a href="https://github.com/mctopherganesh" target="_blank"><img src="./img/Selection_797.png"/></a>,
<a href="mailto:mctopherganesh@gmail.com" target="_blank"><img class="gmail-picture" src="./img/gmail.png"/></a>]
In [39]:
a.find_all('a')[0].get('href')

Out[39]:
'https://github.com/mctopherganesh/email_addy_factory#email-factory'
In [47]:
known_list = [link.get('href') for link in a.find_all('a')]

In [48]:
for i in range(3):
    print(r.get(known_list[i]))

<Response [200]>
<Response [200]>
<Response [200]>

In [50]:
res.headers

Out[50]:
{'Connection': 'keep-alive', 'Content-Length': '2188', 'Server': 'GitHub.com', 'Content-Type': 'text/html; charset=utf-8', 'Last-Modified': 'Sun, 24 Oct 2021 05:15:31 GMT', 'Access-Control-Allow-Origin': '*', 'ETag': 'W/"6174ebf3-15be"', 'expires': 'Sun, 24 Oct 2021 05:28:26 GMT', 'Cache-Control': 'max-age=600', 'Content-Encoding': 'gzip', 'x-proxy-cache': 'MISS', 'X-GitHub-Request-Id': '8E8E:6B21:6DA137:1C25C8F:6174ECA2', 'Accept-Ranges': 'bytes', 'Date': 'Sun, 24 Oct 2021 05:19:33 GMT', 'Via': '1.1 varnish', 'Age': '0', 'X-Served-By': 'cache-iad-kiad7000076-IAD', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1635052774.828719,VS0,VE8', 'Vary': 'Accept-Encoding', 'X-Fastly-Request-ID': 'ea97477bd8d741434fbcdf0b062e92d7f788e3fa'}

This is something I was thinking about for work, but it's definitely something I need to do for my own website, as you can see here.

All that jargon up top is a list of links from my website. I wanted to see if they all worked, and going through and clicking on each one is far too much work if you ask me. I just want to make sure they go somewhere.
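A raw `href` list like the one above has two snags before you can check every link: relative paths like `./new_blog.html` (which `requests` can't fetch on their own) and `mailto:` links (which aren't HTTP at all). Here's a rough sketch of how I might handle both; the `normalize` and `check_links` names are just mine, not anything from a library.

```python
from urllib.parse import urljoin

BASE = 'https://mctopherganesh.com'  # the page the links came from

def normalize(base, href):
    """Return an absolute http(s) URL, or None if the link isn't checkable."""
    if href is None or href.startswith('mailto:'):
        return None  # nothing to GET for a mailto link
    return urljoin(base, href)  # turns './new_blog.html' into a full URL

def check_links(hrefs):
    """Map each checkable href to the status code it returns."""
    import requests  # imported here so the URL helpers above work without it
    results = {}
    for href in hrefs:
        url = normalize(BASE, href)
        if url is not None:
            results[href] = requests.get(url).status_code
    return results
```

With the `known_list` from above, `check_links(known_list)` would skip the Gmail link and fetch everything else as a full URL.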

Also, getting those janky-looking links into this post is another story, but let's continue.

I've been itching to program lately, and this is something I've been putting off for too long. This project is far from over, but it's been started.

What's happening up there, and what's with 200 appearing everywhere? 200 is an HTTP status code. What's HTTP?

why do you have so many questions

HTTP (Hypertext Transfer Protocol) is the way your computer sends and receives information over the internet, and status codes are its shorthand for how a request went. Another code you might be familiar with is 404. Or maybe you're not familiar with them at all and just concerned about your internet working.

Then you, my friend, like your 200s.

And as a website cobbler and QA enthusiast, I like my 200s as well. A 200 means everything is working, or, more precisely, that a link actually takes whoever clicks on it somewhere.
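Status codes group by their first digit (2xx is success, 4xx is a client error like 404, and so on), so you don't have to memorize every number. A quick sketch of labeling them with only the standard library; the `describe` helper is just my own name for it:

```python
from http import HTTPStatus

def describe(code):
    """Label a status code with its standard phrase and its family."""
    family = {1: 'informational', 2: 'success', 3: 'redirect',
              4: 'client error', 5: 'server error'}[code // 100]
    return f'{code} {HTTPStatus(code).phrase} ({family})'
```

For example, `describe(200)` gives `'200 OK (success)'`, and `describe(404)` gives `'404 Not Found (client error)'`.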

If you're still here and you're wondering what res.headers means: headers are part of the information in the response (res) being sent back by the site. Granted, it's basically all jargon, but headers are important for things like controlling whether a website may use your phone's microphone and camera (the Permissions-Policy header, for example), which is part of why your phone prompts you before new websites get camera/audio/microphone access.
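One handy detail: `res.headers` behaves like a dict, but lookups are case-insensitive, so `'content-type'` and `'Content-Type'` both work. A small sketch using the same structure `requests` returns, with made-up header values so no network is needed:

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for res.headers, with values like the real response above.
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=utf-8',
                               'Cache-Control': 'max-age=600'})

assert headers['content-type'] == headers['Content-Type']  # case doesn't matter
charset = headers['Content-Type'].split('charset=')[-1]    # pull out 'utf-8'
```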

Why is the code useful for this so far? Because otherwise you have to read those headers in a browser with developer tools open (press F12 and go crazy). Here we can start to build something that will just read the page for me and tell me what's there.

Alright that's it. Thanks.

In [ ]: