Scraping Tumblr Photos in Under 50 Lines of Python Code

Python Zac 2 years ago (2018-02-11) 1811 views 0 comments

Description:

I will show you how to crawl Tumblr photos and save them to local storage in under 50 lines of Python code.

I hope this article is useful for anyone who is new to web crawling with Python. You will learn how to use the bs4 (BeautifulSoup) and requests modules to build a web crawler.

If you like my articles, you can click the “Like” button to support me. Thank you!

Crawling Target: http://cosplay-girls.tumblr.com/

 

Result

Scraping results


 

Preparatory work

  1. A Python IDE. (I recommend PyCharm.)
  2. Python 3.x. Please make sure it is already installed.
  3. The Python modules bs4 (BeautifulSoup) and requests. Both can be installed with pip, as beautifulsoup4 and requests.

 

 

Target Analysis

1. Press F12 to open the browser's developer tools and inspect the page.

2. Notice that all the photos we need to save are inside <div> tags whose class name is “post photo reblog” or “post photo set reblog”. Single photos are inside “post photo reblog”, and photo sets are inside “post photo set reblog”.

target analysis

 

3. Scroll to the bottom.

scraping target analysis

 

 

4. Click “Next Page” and watch how the URL changes.

target URL scraping


Notice that the URLs of all the pages are similar; only the number at the end changes. So we can use a “for loop” that changes this number to traverse every page URL of this Tumblr blog.
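As a small illustration of this idea, here is a minimal sketch that builds the page URLs in a loop. The "/page/N" pattern is an assumption based on how Tumblr blogs usually paginate; please check it against the URL you see after clicking “Next Page”. The function name page_url is only for illustration.

```python
BLOG_URL = "http://cosplay-girls.tumblr.com"  # crawling target of this article


def page_url(page_number, blog_url=BLOG_URL):
    """Build the URL of one blog page.

    Assumption: pages follow the pattern <blog>/page/<number>.
    """
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    return f"{blog_url.rstrip('/')}/page/{page_number}"


# Only the last number changes from page to page.
for number in range(1, 4):
    print(page_url(number))
```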

 

 

Steps of crawling

The First Part:

  • 1. Request the home page.
  • 2. Inside “post photo reblog” and “post photo set reblog”, find the URLs of all single photos and set photos in the “img” and “iframe” tags.

 

The Second Part:

  • 3. Collect the URLs of all single photos and set photos, then save the photos. (The ‘set photos’ part is special.)

 

The Third Part:

  • 4. Create a “for loop” to traverse all the page URLs of the blog.
  • 5. Repeat steps 2 and 3 for each page.

 

 

The First Part: Request the home page and find all photo tags.

photo URL analysis

IMAGE.01

 

photo URL analysis

IMAGE.02
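The first part could look roughly like the sketch below. It is not necessarily the exact code from the screenshots; the function names fetch_html and find_photo_posts are my own, and the class names “post”, “photo” and “set” come from the target analysis above, so they may differ on a blog with another theme.

```python
import requests
from bs4 import BeautifulSoup

HOME_URL = "http://cosplay-girls.tumblr.com/"  # crawling target of this article


def fetch_html(url):
    """Download a page and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def find_photo_posts(html):
    """Split the photo posts of a page into single-photo and photo-set <div> tags."""
    soup = BeautifulSoup(html, "html.parser")
    singles, sets = [], []
    # class_="photo" matches every <div> whose class list contains "photo".
    for div in soup.find_all("div", class_="photo"):
        classes = div.get("class", [])
        if "post" not in classes:
            continue
        if "set" in classes:
            sets.append(div)      # "post photo set reblog"
        else:
            singles.append(div)   # "post photo reblog"
    return singles, sets
```

To try it on the real blog, you can call `singles, sets = find_photo_posts(fetch_html(HOME_URL))` and print the length of both lists.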

 

 

 

The Second Part: Find all single photos and set photos, then save them.

Single photo part:
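A minimal sketch of the single photo part, assuming each single photo is an “img” tag inside a “post photo reblog” div as described in the target analysis. The helper names and the “photos” folder are hypothetical choices for this example. Depending on the blog theme, a post may also contain other images (for example avatars), so you may need an extra filter.

```python
import os
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup


def single_photo_urls(html):
    """Return the src of every <img> inside single-photo posts."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", class_="photo"):
        classes = div.get("class", [])
        # Skip photo sets here; they are handled separately.
        if "post" in classes and "set" not in classes:
            urls.extend(img["src"] for img in div.find_all("img", src=True))
    return urls


def filename_from_url(url):
    """Use the last part of the URL path as the local file name."""
    return os.path.basename(urlparse(url).path)


def save_photo(url, folder="photos"):
    """Download one photo into the folder and return the saved path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_from_url(url))
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open(path, "wb") as photo_file:
        photo_file.write(response.content)
    return path
```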

 

Set photos part:

scraping photo

IMAGE.03
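The set photos part is special because the photos are not in the blog page itself. The sketch below assumes that each “post photo set reblog” div holds an “iframe” whose src points to a second page, and that this second page contains the photos as “img” tags. Please verify this with F12 on the blog you crawl. All function names are hypothetical, and the fetch parameter exists only so the parsing can be tried without a network connection.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def photoset_iframe_urls(html, page_url):
    """Return the absolute URL of every <iframe> inside photo-set posts."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", class_="set"):
        classes = div.get("class", [])
        if "post" in classes and "photo" in classes:
            for iframe in div.find_all("iframe", src=True):
                # The src may be relative, so resolve it against the page URL.
                urls.append(urljoin(page_url, iframe["src"]))
    return urls


def set_photo_urls(html, page_url, fetch=fetch_html):
    """Open every photo-set iframe and collect the <img> URLs found inside."""
    photos = []
    for iframe_url in photoset_iframe_urls(html, page_url):
        frame = BeautifulSoup(fetch(iframe_url), "html.parser")
        photos.extend(img["src"] for img in frame.find_all("img", src=True))
    return photos
```

The collected URLs can then be saved in the same way as the single photos.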

 

 

 

The Third Part: Create a ‘for loop’ to crawl the remaining pages.
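One possible shape for this loop is sketched below. It starts at page 2 because the home page was already handled in the first part. The "/page/N" URL pattern and the names crawl_remaining_pages and process_page are assumptions for this example; process_page stands for whatever you do with one page (steps 2 and 3).

```python
import requests


def crawl_remaining_pages(blog_url, last_page, process_page, first_page=2):
    """Call process_page(url) for every page from first_page to last_page.

    Returns the list of page URLs that were processed without a request error.
    """
    visited = []
    for number in range(first_page, last_page + 1):
        url = f"{blog_url.rstrip('/')}/page/{number}"
        try:
            process_page(url)
        except requests.RequestException as error:
            # One broken page should not stop the whole crawl.
            print(f"Skipping {url}: {error}")
            continue
        visited.append(url)
    return visited
```

For a quick dry run you can pass print as process_page, for example `crawl_remaining_pages("http://cosplay-girls.tumblr.com", 3, print)`, which only prints the URLs of pages 2 and 3.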

 

 

The whole code
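Putting the three parts together, a compact version could look like the sketch below. Please treat it as a reconstruction under assumptions, not as the exact original script: the "/page/N" URL pattern, the iframe handling for photo sets, the SAVE_DIR folder and the LAST_PAGE value are all my own choices for this example.

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

BLOG_URL = "http://cosplay-girls.tumblr.com"  # crawling target of this article
SAVE_DIR = "photos"  # assumption: local folder for the downloaded photos
LAST_PAGE = 5        # assumption: set this to the last page you want to crawl


def get(url):
    """Request a URL and return the response, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response


def photo_urls(page_url, fetch=get):
    """Collect single and set photo URLs from one blog page."""
    soup = BeautifulSoup(fetch(page_url).text, "html.parser")
    urls = []
    for div in soup.find_all("div", class_="photo"):
        if "post" not in div.get("class", []):
            continue
        # Single photos: <img> tags directly in the post.
        urls += [img["src"] for img in div.find_all("img", src=True)]
        # Photo sets: the photos live in a separate iframe page.
        for iframe in div.find_all("iframe", src=True):
            frame_url = urljoin(page_url, iframe["src"])
            frame = BeautifulSoup(fetch(frame_url).text, "html.parser")
            urls += [img["src"] for img in frame.find_all("img", src=True)]
    return urls


def save(url, fetch=get):
    """Download one photo into SAVE_DIR and return the saved path."""
    os.makedirs(SAVE_DIR, exist_ok=True)
    path = os.path.join(SAVE_DIR, os.path.basename(urlparse(url).path))
    with open(path, "wb") as photo_file:
        photo_file.write(fetch(url).content)
    return path


def main():
    for number in range(1, LAST_PAGE + 1):
        for url in photo_urls(f"{BLOG_URL}/page/{number}"):
            print("saved", save(url))
```

Nothing is downloaded until you call `main()`, so you can first test photo_urls on a single page and check the printed URLs.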

 

 

Run the program to see the result.

scraping results

Not bad, right?

 

 

Say something…

Thank you for reading this article. If you have any questions about it, you can leave a comment at the bottom.

I am sorry that my English is not very good, and I hope you can understand what I wrote.

If you like this article, you can click the “Like” button to support me. I will appreciate it, thank you!

 
