Cake day: June 18th, 2023

  • Depending on what you want to scrape, that’s a lot of overkill and overcomplication. You may not need a full website testing framework just to scrape, nor Python with its tooling and package management.

    I’ve recently extracted and downloaded stuff via Nushell.

    1. Requirement: Knowledge of CSS Selectors
    2. Inspect the website DOM in your web browser’s developer tools
      1. Identify the structure
      2. Identify adequate selectors; you can test them in the dev tools console with document.querySelectorAll()
    3. Get and query data

    My command-line shell and scripting language of choice is Nushell:

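    # Fetch the page once and reuse the HTML for both queries
    let html = http get 'https://example.org/'
    # Build a record from the first two results (title first, tags second)
    let meta = $html | query web --query '#infobox .title, #infobox .tags' | { title: $in.0.0 tags: $in.1.0 }
    # Collect the data-src attribute of every image inside main
    let content = $html | query web --query 'main img' --attribute data-src
    $meta | save meta.json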
    

    or

    1..30 | each {|x| http get $'https://example.org/img/($x).jpg' | save $'($x).jpg'; sleep 100ms }
    

    Depending on the tools you use, it’ll be quite similar or very different.

    Selenium is a full web browser driver, so it does a lot more and has a more extensive interface as a result. You can talk to it through different interfaces and languages.

  • Streaming can provide decent quality, but not high quality. That’s simply too costly at scale.

    Bit rate alone doesn’t necessarily tell you quality either.

    I suggest you look for downloads and pay attention to:

    1. Release groups that match your intentions (once you’ve found favorites, you may want to stick to them)
    2. Screenshots on releases/info pages
    3. Encoding information

    To assess encoding information, look at the file type (container), the video codec, and the bit depth.

    From high to low compatibility, and low to high compression ratio:

    1. mp4 file, AVC/x264/h.264
    2. mkv file, HEVC/x265/h.265
    3. mkv file, HEVC, 10-bit
    4. mkv file, AV1 [10-bit]

    In release names, you can treat each codec triplet as different names for the same thing. Strictly speaking, x264 and x265 are encoders for the H.264 (AVC) and H.265 (HEVC) standards.

    You’ll be able to play all file and codec types on a PC, but not necessarily on other devices. If you’re streaming from PC to something else, that’s fine too.


    I’m usually looking for 10-bit HEVC releases because they give vastly better quality for their size. If that’s not available, HEVC or AVC. In most cases, it doesn’t matter too much to me.

    A video with a lot of movement or visual detail needs more bitrate, and so a bigger file, to reach the same quality.


    If you compare the bitrate of an AVC release with that of an HEVC 10-bit release, they are vastly different. You can get the same quality for a fraction of the file size and bitrate. More bitrate is often a waste of bandwidth and storage space.
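
    To make the bitrate and file size relationship concrete, here is a rough sketch of the arithmetic: file size is approximately average bitrate times duration, divided by 8 to convert bits to bytes. The two bitrates and the running time below are made-up numbers for illustration, not measurements of any real release, and the helper name `file_size_gb` is hypothetical. Whether two such encodes actually look the same depends on the source and the encoder settings.

```python
def file_size_gb(bitrate_mbps: float, duration_min: float) -> float:
    """Approximate file size in GB (10^9 bytes) for an average bitrate in Mbit/s."""
    total_bits = bitrate_mbps * 1_000_000 * duration_min * 60
    return total_bits / 8 / 1_000_000_000


# Hypothetical values for illustration only, not measured from real releases.
DURATION_MIN = 120        # assumed running time in minutes
AVC_MBPS = 8.0            # assumed average bitrate of an AVC encode
HEVC_10BIT_MBPS = 3.0     # assumed average bitrate of an HEVC 10-bit encode

avc_size = file_size_gb(AVC_MBPS, DURATION_MIN)
hevc_size = file_size_gb(HEVC_10BIT_MBPS, DURATION_MIN)

print(f"AVC at {AVC_MBPS} Mbit/s: {avc_size:.1f} GB")
print(f"HEVC 10-bit at {HEVC_10BIT_MBPS} Mbit/s: {hevc_size:.1f} GB")
```

    Audio tracks and container overhead add to this, so treat the result as a lower bound for the video stream alone.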