Usually, when you start developing a scraper to scrape loads of records, your first step is usually to go to the page where all listings are available. You go to the page by page, fetch individual URLs, store in DB or in a file and then start parsing. Nothing wrong with it. The only issue is the wastage of resources. Say there are 100 records in a certain category. Each page has 10 records. Ideally, you will write a scraper that will go page by page and fetch all links. Then you will switch to the next category and repeat…
This post is part of the FastAPI series.
This is another post related to FastAPI(indirectly) in which I am going to discuss how to use GraphQL based APIs to access and manipulate data. I already have discussed how you can make Rest API in the FastAPI framework in the previous post. We will be learning some basics of GraphQL and how graphene helps us to use Python for writing a GraphQL based application. So, let’s get started!
From the official website:
GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL…
This post is part of the FastAPI series.
In the first post, I introduced you to FastAPI and how you can create high-performance Python-based applications in it. In this post, we are going to work on Rest APIs that interact with a MySQL DB. We will also be looking at how we can organize routers and models in multiple files to make them maintainable and easier to read.
In this post, I am going to introduce FastAPI: A Python-based framework to create Rest APIs. I will briefly introduce you to some basic features of this framework and then we will create a simple set of APIs for a contact management system. Knowledge of Python is very necessary to use this framework.
Before we discuss the FastAPI framework, let’s talk a bit about REST itself.
Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style, called…
In this post of ScrapingTheFamous, I am going o write a scraper that will scrape data from eBay. eBay is an online auction site where people put their listing up for selling stuff based on an auction.
Like before, we will be writing the two scripts, one to fetch listing URs and store in a text file and the other to parse those links. The data will be stored in JSON format for further processing.
I will be using Scraper API service for parsing purposes which makes me free from all worries blocking and rendering dynamic sites since it takes…
In this post of ScrapingTheFamous, I am going o write a scraper that will scrape data from Amazon. I do not need to tell you what is Amazon. You are here because you already know about it 🙂
So, we are going to write two different scripts: one would be
fetch.py that would be fetching URLs of individual listings and save in a text file. Later another script,
parse.py that will have a function taking an individual listing URL, scrape data, and save in JSON format.
I will be using Scraper API service for parsing purposes which makes me free…
In this post, I am going to talk about Apache Avro, an open-source data serialization system that is being used by tools like Spark, Kafka, and others for big data processing.
According to Wikipedia:
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the…
In this post, I am going to talk about Django Rest Framework or DRF. DRF is used to create RESTful APIs in Django which later could be consumed by various apps; mobile, web, desktop, etc. We will be discussing how to install DRF on your machine and then will be writing our APIs for a system.
Before we discuss DRF, let’s talk a bit about REST itself.
Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style…
I have been covering web scraping for a long time on this blog for a long time but they were mostly in Python; be it
requests, Selenium or Scrapy framework, all were based on Python language but scraping is not limited to a specific language. Any language that provides APIs or libraries for an Http client and HTML parser is able to provide you web scraping facility. Go also provides you the ability to write web scrapers. Go is a compiled and static type language and could be very beneficial to write efficient and quick and scaleable web scrapers. …
Today I present you another library I made in Go language, called, Fehrist
From the Github README:
Fehrist is a pure Go library for indexing different types of documents. Currently, it supports only CSV and JSON but flexible architecture gives you the liberty to add more documents. Fehrist(فہرست) is an Urdu word for Index. Similar terminologies used in Arabic(فھرس) and Farsi(فہرست) as well.
Fehrist is based on an Inverted Index data structure for indexing purposes.