this post was submitted on 18 Nov 2024

I have been trying for hours to figure this out. From a building tutorial to just trying to find prebuilt ones, I can't seem to make it click.

For context I am trying to scrape books myself that I can't seem to find elsewhere so I can use and post them for others.

The scraper tutorial

HackerNoon tutorial by Ethan Jarrell

I initially tried to follow this, but I kept getting a "couldn't find module" error. Since I had never touched Python before this, I don't know how to fix it, and the help links are not exactly helpful. If someone could guide me through this tutorial, that would be great.
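A "couldn't find module" error usually means the package is not installed for the interpreter that is running the script. Here is a minimal sketch for checking that with only the standard library. The helper name `is_installed` is made up for illustration, and the package names in the loop are guesses at what a scraping tutorial might import, not something confirmed by the tutorial.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The interpreter running this script; pip has to install into the same one.
    print("Interpreter:", sys.executable)
    # Assumed package names, swap in whatever the tutorial actually imports.
    for name in ("requests", "bs4", "selenium"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

Anything reported as MISSING would need to be installed with pip for that same interpreter.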

Selenium

Selenium Homepage

I don't really get what this is, but I think it's some sort of Python package. It tells me to install it using the pip command, but that doesn't seem to work (syntax error). I don't know how to add it manually because, again, I have little idea of what I'm doing.
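A syntax error on a pip command usually means it was typed at the Python prompt instead of the system command line. This small sketch shows why: the install command is not valid Python, so the interpreter rejects it. The helper name `is_valid_python` is just for illustration.

```python
import subprocess
import sys


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python code."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    # A shell command is not Python, so the Python prompt raises SyntaxError.
    print(is_valid_python("pip install selenium"))  # False
    # pip is meant to be run as a program. This asks the current interpreter
    # to run its own pip module and print the version (assumes pip is installed).
    subprocess.run([sys.executable, "-m", "pip", "--version"], check=False)
```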

Scrapy

Scrapy Homepage

This one seemed like it'd be an out-of-the-box deal, but not only does it need the pip command to install, it also has about 5 other dependencies it needs to function, which complicates it more for me.
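For what it's worth, pip downloads a package's dependencies automatically, so they normally do not need to be installed one by one. As a rough sketch, the standard library can list what an already-installed package declares as requirements. The helper name `declared_dependencies` is hypothetical, and the example assumes Scrapy has already been installed.

```python
from importlib import metadata


def declared_dependencies(package: str) -> list:
    """Return the requirement strings a package declares.

    Returns an empty list if the package declares none or is not installed.
    """
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    # Prints nothing if Scrapy is not installed in this interpreter.
    for requirement in declared_dependencies("Scrapy"):
        print(requirement)
```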

I am not criticizing these wares; I am just asking for help. If someone could help simplify it all, or maybe even point me to an easier method, that would be amazing!


Updates

  • Figured out that I am supposed to run the pip command in the command prompt on my computer, not in the Python runner: py -m followed by the pip request (for example, py -m pip install selenium)

  • Got the Ethan Jarrell tutorial to work and managed to add in Selenium, which made me realize that Selenium isn't really helpful for this project. rip xP

  • Spent a bunch of time trying to rework the basic scraper to work with dynamic sites, without success

  • Online self-help doesn't go into as much depth as I would like, probably due to the legal grey area
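One likely reason a basic scraper comes up empty on a dynamic site is that the text is added by JavaScript after the page loads, so it is not in the raw HTML the scraper downloads. Here is a minimal sketch of that effect using only the standard library. The two sample pages and the names `TextCollector` and `visible_text` are invented for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> list:
    """Return the text a static parser can see in the raw HTML."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up sample pages: one with text in the HTML, one filled in by JavaScript.
STATIC_PAGE = "<html><body><p>Chapter one</p></body></html>"
DYNAMIC_PAGE = (
    "<html><body><div id='app'></div>"
    "<script>document.getElementById('app').textContent = 'Chapter one';</script>"
    "</body></html>"
)

if __name__ == "__main__":
    print(visible_text(STATIC_PAGE))   # ['Chapter one']
    print(visible_text(DYNAMIC_PAGE))  # []
```

The second page only gets its text once a browser runs the script, which is why a parser working on the downloaded HTML alone finds nothing.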


[–] SpaceBishop@lemmy.zip 3 points 3 days ago* (last edited 3 days ago) (1 children)

I am no expert, but I have used Python in a professional environment and helped onboard a Python newbie to build out his first project.

It would be helpful to know what your environment looks like (what OS you are running, your Python version, your terminal interface -- are you running cmd, PowerShell, or Terminal?) and which steps produce the reported error messages.
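If it helps with gathering those details, here is a small sketch that prints them using only the standard library. The function name `environment_report` is made up for this example, and it cannot tell which terminal you launched it from.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect basic details worth pasting into a help request."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "python_version": platform.python_version(),
        "interpreter": sys.executable,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```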

Starting from a first-time Python setup on a Windows computer, the first steps should be:

Launch PowerShell as admin and type in the following commands:

set-executionpolicy remotesigned

winget install python

mkdir python

cd python

python -m venv scraper

.\scraper\Scripts\activate

After that, you should be able to use pip to install more modules or packages. I have Visual Studio Code as my IDE, which means I can also run the code command from there to open the text editor and write whatever code I intend to run. Be sure to save it to C:\Users\youruseraccount\python

If your scripts are saved to that folder, you can run them from PowerShell by typing python followed by their filename. Any time you want to run scripts, open PowerShell, type cd python, and then .\scraper\Scripts\activate. Hit enter, then type python and the name of the script you want to run.
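To confirm the activate step worked, a script can check whether it is running inside a virtual environment. This is a minimal sketch that relies on the standard `sys.prefix` and `sys.base_prefix` values, which differ inside an environment created with python -m venv. The function name `in_virtualenv` is just for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment made by `python -m venv`."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("Not in a virtual environment, using:", sys.executable)
```

If it reports that no environment is active, packages installed with pip inside the scraper environment will not be visible to the script.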

This information dump is not the most detailed, but it should get you to the point that you can run your scripts.

I am having to Frankenstein scripts together because the resources I find don't really give out code for my needs. I am using the command prompt from Windows PowerShell and testing with Python IDLE.