DataScraperX 1000: Simplified Data Extraction

DataScraperX 1000: Appwrite Hashnode Hackathon
Team Details
- Anuj Singh - ani chan
Description:
DataScraperX 1000 is a data extraction tool built specifically for image retrieval. Its key strength is a simple user interface that lets users move through the extraction process with ease. (This user-friendly design is a direct result of my lack of design and frontend creativity.)

Here's how the process typically works with DataScraperX 1000:
Topic Entry: Users start by entering the specific topic or keyword they want to retrieve images for. This could be anything from "beach landscapes" to "modern architecture" or "hot dog."
Source Selection: DataScraperX 1000 provides a list of available sources from which images can be extracted. These sources may include popular search engines, social media platforms, image-hosting websites, or specific image databases. Users can select the desired sources based on their preferences and requirements.
Initiation of Extraction: Users then start the image extraction. DataScraperX 1000 uses its "advanced search algorithms" (a.k.a. the APIs of the different sources plus web scraping) to crawl the selected sources for images relevant to the specified topic...
Export and Usage: Once the relevant images have been found, users can export them for further use by downloading them.
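
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the source names, the `fake_*` fetchers, and `extract_images` are all hypothetical, and the fetchers return made-up example.com URLs instead of calling a real API or scraping a page.

```python
from typing import Callable, Dict, List


# Hypothetical fetchers: each takes a topic and returns image URLs.
# A real fetcher would call the source's API or scrape its pages.
def fake_search_engine(topic: str) -> List[str]:
    slug = topic.replace(" ", "-")
    return [f"https://example.com/search/{slug}/{i}.jpg" for i in range(3)]


def fake_image_host(topic: str) -> List[str]:
    slug = topic.replace(" ", "-")
    return [f"https://example.com/host/{slug}/{i}.jpg" for i in range(2)]


# Hypothetical registry of the sources a user can pick from.
SOURCES: Dict[str, Callable[[str], List[str]]] = {
    "search_engine": fake_search_engine,
    "image_host": fake_image_host,
}


def extract_images(topic: str, selected_sources: List[str]) -> List[str]:
    """Collect image URLs for a topic from the selected sources, without duplicates."""
    seen = set()
    urls: List[str] = []
    for name in selected_sources:
        fetch = SOURCES[name]  # raises KeyError for an unknown source
        for url in fetch(topic):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__":
    for url in extract_images("hot dog", ["search_engine", "image_host"]):
        print(url)
```

The returned list is what the export step would then download; keeping each source behind the same "topic in, URLs out" signature is one way to make adding or removing a source cheap.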


The motivation for this project was my interest in trying out machine learning for image classification. As I went through several online resources, I noticed that they consistently emphasized how important data is for ML image classification. So I developed this project to simplify the process of obtaining relevant images.
Tech Stack
Technologies:
Appwrite | Backend that takes care of functions, databases, and file storage
Next.js | React framework for rendering and routing
TypeScript | Programming language to keep the website strongly typed
Python | Programming language chosen for its simplicity, readability, and ecosystem
Chakra UI | UI component library that gives you the building blocks for nice looks
Auth UI | An authentication wrapper for Appwrite
Production:
Appwrite Cloud | Hosted solution of Appwrite
Vercel | Website hosting provider
Challenges I Faced
For now, at least for me, Auth UI works only in the Chrome browser.
Adding images to the bucket in parallel.
Relative module imports not working properly.
Subreddits went down on 12 June because of API changes, and I had started this project a week before that without knowing the details.
Cloud error logs not showing a failed function.
Downloading files in parallel from the bucket to the client.
In my code, I implemented parallel execution of different functions to reduce processing time. However, I was uncertain about how the Python interpreter handles concurrent execution and in which order the functions would run.
Not being able to use middleware in Next.js for protected routes; I am just using context for now.
Getting the states (isLoading, error, onSuccess, etc.) for client-side Appwrite CRUD operations.
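
On the question of ordering when functions run in parallel, here is a small sketch using Python's standard `concurrent.futures` module. It is not the code from this project: `fetch_source`, the job names, and the delays are made up, and `time.sleep` stands in for a network call. With a thread pool, the order in which tasks finish is not guaranteed, but `Executor.map` returns results in input order, while `as_completed` yields futures as they finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple


def fetch_source(name: str, delay: float) -> str:
    """Hypothetical task: the sleep stands in for an I/O-bound API or scraping call."""
    time.sleep(delay)
    return f"{name} done"


# Made-up jobs: (source name, simulated latency in seconds).
JOBS: List[Tuple[str, float]] = [("source_a", 0.3), ("source_b", 0.1), ("source_c", 0.2)]


def run_in_input_order(jobs: List[Tuple[str, float]]) -> List[str]:
    """Executor.map returns results in the order of the inputs, whatever finishes first."""
    names = [name for name, _ in jobs]
    delays = [delay for _, delay in jobs]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(fetch_source, names, delays))


def run_in_completion_order(jobs: List[Tuple[str, float]]) -> List[str]:
    """as_completed yields each future as soon as it finishes, so order depends on timing."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(fetch_source, name, delay) for name, delay in jobs]
        return [future.result() for future in as_completed(futures)]


if __name__ == "__main__":
    print(run_in_input_order(JOBS))
    print(run_in_completion_order(JOBS))
```

Threads suit this kind of work because the tasks mostly wait on the network; if the order of results matters, `map` (or indexing the futures yourself) avoids depending on which task happens to finish first.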
Public Code Repo
GitHub: AnujSsStw/data_scrap
Demo Link
Live Demo: DataScraperX 1000
#appwrite #AppwriteHackathon
Note: The generation step will take some time if the topic is new.
