1
0
Fork 0
mirror of https://github.com/yt-dlp/yt-dlp.git synced 2025-03-09 12:50:23 -05:00

Merge remote-tracking branch 'origin/master' into pr/10649

This commit is contained in:
Simon Sawicki 2025-02-20 19:59:50 +01:00
commit c387c4815b
No known key found for this signature in database
95 changed files with 3380 additions and 2464 deletions

View file

@ -2,13 +2,11 @@ name: Broken site support
description: Report issue with yt-dlp on a supported site
labels: [triage, site-bug]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -24,9 +22,7 @@ body:
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input
@ -47,6 +43,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -78,11 +76,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View file

@ -2,13 +2,11 @@ name: Site support request
description: Request support for a new site
labels: [triage, site-request]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -24,9 +22,7 @@ body:
required: true
- label: I've checked that none of provided URLs [violate any copyrights](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#is-the-website-primarily-used-for-piracy) or contain any [DRM](https://en.wikipedia.org/wiki/Digital_rights_management) to the best of my knowledge
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and am willing to share it if required
- type: input
@ -59,6 +55,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -90,11 +88,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View file

@ -1,14 +1,12 @@
name: Site feature request
description: Request a new functionality for a supported site
description: Request new functionality for a site supported by yt-dlp
labels: [triage, site-enhancement]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -22,9 +20,7 @@ body:
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input
@ -55,6 +51,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -86,11 +84,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View file

@ -2,13 +2,11 @@ name: Core bug report
description: Report a bug unrelated to any particular site or extractor
labels: [triage, bug]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -20,13 +18,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description
@ -40,6 +32,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -71,11 +65,3 @@ body:
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View file

@ -1,14 +1,12 @@
name: Feature request
description: Request a new functionality unrelated to any particular site or extractor
description: Request a new feature unrelated to any particular site or extractor
labels: [triage, enhancement]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: checkboxes
id: checklist
attributes:
@ -22,9 +20,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description
@ -38,6 +34,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
- label: "If using API, add `'verbose': True` to `YoutubeDL` params instead"
@ -65,11 +63,3 @@ body:
[youtube] Extracting URL: https://www.youtube.com/watch?v=BaW_jenozKc
<more lines>
render: shell
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View file

@ -1,14 +1,12 @@
name: Ask question
description: Ask yt-dlp related question
description: Ask a question about using yt-dlp
labels: [question]
body:
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
- type: markdown
attributes:
value: |
@ -28,9 +26,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%3Aissue%20-label%3Aspam%20%20) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: question
@ -44,6 +40,8 @@ body:
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
- label: "If using API, add `'verbose': True` to `YoutubeDL` params instead"
@ -71,11 +69,3 @@ body:
[youtube] Extracting URL: https://www.youtube.com/watch?v=BaW_jenozKc
<more lines>
render: shell
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.

View file

@ -1,8 +1,5 @@
blank_issues_enabled: false
contact_links:
- name: Get help from the community on Discord
- name: Get help on Discord
url: https://discord.gg/H5MNcFW63r
about: Join the yt-dlp Discord for community-powered support!
- name: Matrix Bridge to the Discord server
url: https://matrix.to/#/#yt-dlp:matrix.org
about: For those who do not want to use Discord
about: Join the yt-dlp Discord server for support and discussion

View file

@ -18,9 +18,7 @@ body:
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input

View file

@ -18,9 +18,7 @@ body:
required: true
- label: I've checked that none of provided URLs [violate any copyrights](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#is-the-website-primarily-used-for-piracy) or contain any [DRM](https://en.wikipedia.org/wiki/Digital_rights_management) to the best of my knowledge
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and am willing to share it if required
- type: input

View file

@ -1,5 +1,5 @@
name: Site feature request
description: Request a new functionality for a supported site
description: Request new functionality for a site supported by yt-dlp
labels: [triage, site-enhancement]
body:
%(no_skip)s
@ -16,9 +16,7 @@ body:
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- label: I've read about [sharing account credentials](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#are-you-willing-to-share-account-details-if-needed) and I'm willing to share it if required
- type: input

View file

@ -14,13 +14,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
- label: I've checked that all URLs and arguments with special characters are [properly quoted or escaped](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#video-url-contains-an-ampersand--and-im-getting-some-strange-output-1-2839-or-v-is-not-recognized-as-an-internal-or-external-command)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description

View file

@ -1,5 +1,5 @@
name: Feature request
description: Request a new functionality unrelated to any particular site or extractor
description: Request a new feature unrelated to any particular site or extractor
labels: [triage, enhancement]
body:
%(no_skip)s
@ -16,9 +16,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar requests **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: description

View file

@ -1,5 +1,5 @@
name: Ask question
description: Ask yt-dlp related question
description: Ask a question about using yt-dlp
labels: [question]
body:
%(no_skip)s
@ -22,9 +22,7 @@ body:
required: true
- label: I've verified that I have **updated yt-dlp to nightly or master** ([update instructions](https://github.com/yt-dlp/yt-dlp#update-channels))
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766), [the FAQ](https://github.com/yt-dlp/yt-dlp/wiki/FAQ), and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=is%%3Aissue%%20-label%%3Aspam%%20%%20) for similar questions **including closed ones**. DO NOT post duplicates
required: true
- type: textarea
id: question

View file

@ -1,14 +1,17 @@
**IMPORTANT**: PRs without the template will be CLOSED
<!--
**IMPORTANT**: PRs without the template will be CLOSED
Due to the high volume of pull requests, it may be a while before your PR is reviewed.
Please try to keep your pull request focused on a single bugfix or new feature.
Pull requests with a vast scope and/or very large diff will take much longer to review.
It is recommended for new contributors to stick to smaller pull requests, so you can receive much more immediate feedback as you familiarize yourself with the codebase.
PLEASE AVOID FORCE-PUSHING after opening a PR, as it makes reviewing more difficult.
-->
### Description of your *pull request* and other information
<!--
Explanation of your *pull request* in arbitrary form goes here. Please **make sure the description explains the purpose and effect** of your *pull request* and is worded well enough to be understood. Provide as much **context and examples** as possible
-->
ADD DESCRIPTION HERE
ADD DETAILED DESCRIPTION HERE
Fixes #
@ -16,24 +19,22 @@ ### Description of your *pull request* and other information
<details open><summary>Template</summary> <!-- OPEN is intentional -->
<!--
# PLEASE FOLLOW THE GUIDE BELOW
# PLEASE FOLLOW THE GUIDE BELOW
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes `[ ]` relevant to your *pull request* (like [x])
- Use *Preview* tab to see how your *pull request* will actually look like
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes `[ ]` relevant to your *pull request* (like [x])
- Use *Preview* tab to see what your *pull request* will actually look like
-->
### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [contributing guidelines](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#developer-instructions) including [yt-dlp coding conventions](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#yt-dlp-coding-conventions)
- [ ] [Searched](https://github.com/yt-dlp/yt-dlp/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check all of the following options that apply:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check those that apply and remove the others:
- [ ] I am the original author of the code in this PR, and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of the code in this PR, but it is in the public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### What is the purpose of your *pull request*?
### What is the purpose of your *pull request*? Check those that apply and remove the others:
- [ ] Fix or improvement to an extractor (Make sure to add/update tests)
- [ ] New extractor ([Piracy websites will not be accepted](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#is-the-website-primarily-used-for-piracy))
- [ ] Core bug fix/improvement

View file

@ -33,7 +33,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@ -47,7 +47,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@v3
# Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
@ -60,6 +60,6 @@ jobs:
# ./location_of_script_within_repo/buildscript.sh
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3
with:
category: "/language:${{matrix.language}}"

1
.gitignore vendored
View file

@ -92,6 +92,7 @@ updates_key.pem
*.class
*.isorted
*.stackdump
uv.lock
# Generated
AUTHORS

View file

@ -710,3 +710,35 @@ subrat-lima
gitninja1234
jkruse
xiaomac
wesson09
Crypto90
MutantPiggieGolem1
Sanceilaks
Strkmn
0x9fff00
4ft35t
7x11x13
b5i
cotko
d3d9
Dioarya
finch71
hexahigh
InvalidUsernameException
jixunmoe
knackku
krandor
kvk-2015
lonble
msm595
n10dollar
NecroRomnt
pjrobertson
subsense
test20140
arantius
entourage8
lfavole
mp3butcher
slipinthedove
YoshiTabletopGamer

View file

@ -4,6 +4,170 @@ # Changelog
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2025.02.19
#### Core changes
- **jsinterp**
- [Add `js_number_to_string`](https://github.com/yt-dlp/yt-dlp/commit/0d9f061d38c3a4da61972e2adad317079f2f1c84) ([#12110](https://github.com/yt-dlp/yt-dlp/issues/12110)) by [Grub4K](https://github.com/Grub4K)
- [Improve zeroise](https://github.com/yt-dlp/yt-dlp/commit/4ca8c44a073d5aa3a3e3112c35b2b23d6ce25ac6) ([#12313](https://github.com/yt-dlp/yt-dlp/issues/12313)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **acast**: [Support shows.acast.com URLs](https://github.com/yt-dlp/yt-dlp/commit/57c717fee4bfbc9309845bbb48901b72e4b69304) ([#12223](https://github.com/yt-dlp/yt-dlp/issues/12223)) by [barsnick](https://github.com/barsnick)
- **cwtv**
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/18a28514e306e822eab4f3a79c76d515bf076406) ([#12207](https://github.com/yt-dlp/yt-dlp/issues/12207)) by [arantius](https://github.com/arantius)
- movie: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/03c3d705778c07739e0034b51490877cffdc0983) ([#12227](https://github.com/yt-dlp/yt-dlp/issues/12227)) by [bashonly](https://github.com/bashonly)
- **digiview**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/f53553087d3fde9dcd61d6e9f98caf09db1d8ef2) ([#9902](https://github.com/yt-dlp/yt-dlp/issues/9902)) by [lfavole](https://github.com/lfavole)
- **dropbox**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/861aeec449c8f3c062d962945b234ff0341f61f3) ([#12228](https://github.com/yt-dlp/yt-dlp/issues/12228)) by [bashonly](https://github.com/bashonly)
- **francetv**
- site
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/817483ccc68aed6049ed9c4a2ffae44ca82d2b1c) ([#12236](https://github.com/yt-dlp/yt-dlp/issues/12236)) by [bashonly](https://github.com/bashonly)
- [Fix livestream extraction](https://github.com/yt-dlp/yt-dlp/commit/1295bbedd45fa8d9bc3f7a194864ae280297848e) ([#12316](https://github.com/yt-dlp/yt-dlp/issues/12316)) by [bashonly](https://github.com/bashonly)
- **francetvinfo.fr**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/5c4c2ddfaa47988b4d50c1ad4988badc0b4f30c2) ([#12402](https://github.com/yt-dlp/yt-dlp/issues/12402)) by [bashonly](https://github.com/bashonly)
- **gem.cbc.ca**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/5271ef48c6f61c145e03e18e960995d2e651d205) ([#12404](https://github.com/yt-dlp/yt-dlp/issues/12404)) by [bashonly](https://github.com/bashonly), [dirkf](https://github.com/dirkf)
- **generic**: [Extract `live_status` for DASH manifest URLs](https://github.com/yt-dlp/yt-dlp/commit/19edaa44fcd375f54e63d6227b092f5252d3e889) ([#12256](https://github.com/yt-dlp/yt-dlp/issues/12256)) by [mp3butcher](https://github.com/mp3butcher)
- **globo**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f8d0161455f00add65585ca1a476a7b5d56f5f96) ([#11795](https://github.com/yt-dlp/yt-dlp/issues/11795)) by [slipinthedove](https://github.com/slipinthedove), [YoshiTabletopGamer](https://github.com/YoshiTabletopGamer)
- **goplay**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/d59f14a0a7a8b55e6bf468237def62b73ab4a517) ([#12237](https://github.com/yt-dlp/yt-dlp/issues/12237)) by [alard](https://github.com/alard)
- **pbs**: [Support www.thirteen.org URLs](https://github.com/yt-dlp/yt-dlp/commit/9fb8ab2ff67fb699f60cce09163a580976e90c0e) ([#11191](https://github.com/yt-dlp/yt-dlp/issues/11191)) by [rohieb](https://github.com/rohieb)
- **reddit**: [Bypass gated subreddit warning](https://github.com/yt-dlp/yt-dlp/commit/6ca23ffaa4663cb552f937f0b1e9769b66db11bd) ([#12335](https://github.com/yt-dlp/yt-dlp/issues/12335)) by [bashonly](https://github.com/bashonly)
- **twitter**: [Fix syndication token generation](https://github.com/yt-dlp/yt-dlp/commit/14cd7f3443c6da4d49edaefcc12da9dee86e243e) ([#12107](https://github.com/yt-dlp/yt-dlp/issues/12107)) by [Grub4K](https://github.com/Grub4K), [pjrobertson](https://github.com/pjrobertson)
- **youtube**
- [Retry on more critical requests](https://github.com/yt-dlp/yt-dlp/commit/d48e612609d012abbea3785be4d26d78a014abb2) ([#12339](https://github.com/yt-dlp/yt-dlp/issues/12339)) by [coletdjnz](https://github.com/coletdjnz)
- [nsig workaround for `tce` player JS](https://github.com/yt-dlp/yt-dlp/commit/ec17fb16e8d69d4e3e10fb73bf3221be8570dfee) ([#12401](https://github.com/yt-dlp/yt-dlp/issues/12401)) by [bashonly](https://github.com/bashonly)
- **zdf**: [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/241ace4f104d50fdf7638f9203927aefcf57a1f7) ([#9565](https://github.com/yt-dlp/yt-dlp/issues/9565)) by [StefanLobbenmeier](https://github.com/StefanLobbenmeier) (With fixes in [e7882b6](https://github.com/yt-dlp/yt-dlp/commit/e7882b682b959e476d8454911655b3e9b14c79b2) by [bashonly](https://github.com/bashonly))
#### Downloader changes
- **hls**
- [Fix `BYTERANGE` logic](https://github.com/yt-dlp/yt-dlp/commit/10b7ff68e98f17655e31952f6e17120b2d7dda96) ([#11972](https://github.com/yt-dlp/yt-dlp/issues/11972)) by [entourage8](https://github.com/entourage8)
- [Support `--write-pages` for m3u8 media playlists](https://github.com/yt-dlp/yt-dlp/commit/be69468752ff598cacee57bb80533deab2367a5d) ([#12333](https://github.com/yt-dlp/yt-dlp/issues/12333)) by [bashonly](https://github.com/bashonly)
- [Support `hls_media_playlist_data` format field](https://github.com/yt-dlp/yt-dlp/commit/c987be0acb6872c6561f28aa28171e803393d851) ([#12322](https://github.com/yt-dlp/yt-dlp/issues/12322)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- [Improve Issue/PR templates](https://github.com/yt-dlp/yt-dlp/commit/517ddf3c3f12560ab93e3d36244dc82db9f97818) ([#11499](https://github.com/yt-dlp/yt-dlp/issues/11499)) by [seproDev](https://github.com/seproDev) (With fixes in [4ecb833](https://github.com/yt-dlp/yt-dlp/commit/4ecb833472c90e078567b561fb7c089f1aa9587b) by [bashonly](https://github.com/bashonly))
- **cleanup**: Miscellaneous: [4985a40](https://github.com/yt-dlp/yt-dlp/commit/4985a4041770eaa0016271809a1fd950dc809a55) by [dirkf](https://github.com/dirkf), [Grub4K](https://github.com/Grub4K), [StefanLobbenmeier](https://github.com/StefanLobbenmeier)
- **docs**: [Add note to `supportedsites.md`](https://github.com/yt-dlp/yt-dlp/commit/01a63629a21781458dcbd38779898e117678f5ff) ([#12382](https://github.com/yt-dlp/yt-dlp/issues/12382)) by [seproDev](https://github.com/seproDev)
- **test**: download: [Validate and sort info dict fields](https://github.com/yt-dlp/yt-dlp/commit/208163447408c78673b08c172beafe5c310fb167) ([#12299](https://github.com/yt-dlp/yt-dlp/issues/12299)) by [bashonly](https://github.com/bashonly), [pzhlkj6612](https://github.com/pzhlkj6612)
### 2025.01.26
#### Core changes
- [Fix float comparison values in format filters](https://github.com/yt-dlp/yt-dlp/commit/f7d071e8aa3bf67ed7e0f881e749ca9ab50b3f8f) ([#11880](https://github.com/yt-dlp/yt-dlp/issues/11880)) by [bashonly](https://github.com/bashonly), [Dioarya](https://github.com/Dioarya)
- **utils**: `sanitize_path`: [Fix some incorrect behavior](https://github.com/yt-dlp/yt-dlp/commit/fc12e724a3b4988cfc467d2981887dde48c26b69) ([#11923](https://github.com/yt-dlp/yt-dlp/issues/11923)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **1tv**: [Support sport1tv.ru domain](https://github.com/yt-dlp/yt-dlp/commit/61ae5dc34ac775d6c122575e21ef2153b1273a2b) ([#11889](https://github.com/yt-dlp/yt-dlp/issues/11889)) by [kvk-2015](https://github.com/kvk-2015)
- **abematv**: [Support season extraction](https://github.com/yt-dlp/yt-dlp/commit/c709cc41cbc16edc846e0a431cfa8508396d4cb6) ([#11771](https://github.com/yt-dlp/yt-dlp/issues/11771)) by [middlingphys](https://github.com/middlingphys)
- **bilibili**
- [Support space `/lists/` URLs](https://github.com/yt-dlp/yt-dlp/commit/465167910407449354eb48e9861efd0819f53eb5) ([#11964](https://github.com/yt-dlp/yt-dlp/issues/11964)) by [c-basalt](https://github.com/c-basalt)
- [Support space video list extraction without login](https://github.com/yt-dlp/yt-dlp/commit/78912ed9c81f109169b828c397294a6cf8eacf41) ([#12089](https://github.com/yt-dlp/yt-dlp/issues/12089)) by [grqz](https://github.com/grqz)
- **bilibilidynamic**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/9676b05715b61c8c5dd5598871e60d8807fb1a86) ([#11838](https://github.com/yt-dlp/yt-dlp/issues/11838)) by [finch71](https://github.com/finch71), [grqz](https://github.com/grqz)
- **bluesky**: [Prefer source format](https://github.com/yt-dlp/yt-dlp/commit/ccda63934df7de2823f0834218c4254c7c4d2e4c) ([#12154](https://github.com/yt-dlp/yt-dlp/issues/12154)) by [0x9fff00](https://github.com/0x9fff00)
- **crunchyroll**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/ff44ed53061e065804da6275d182d7928cc03a5e) ([#12195](https://github.com/yt-dlp/yt-dlp/issues/12195)) by [seproDev](https://github.com/seproDev)
- **dropout**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/164368610456e2d96b279f8b120dea08f7b1d74f) ([#12102](https://github.com/yt-dlp/yt-dlp/issues/12102)) by [bashonly](https://github.com/bashonly)
- **eggs**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/20c765d02385a105c8ef13b6f7a737491d29c19a) ([#11904](https://github.com/yt-dlp/yt-dlp/issues/11904)) by [seproDev](https://github.com/seproDev), [subsense](https://github.com/subsense)
- **funimation**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/cdcf1e86726b8fa44f7e7126bbf1c18e1798d25c) ([#12167](https://github.com/yt-dlp/yt-dlp/issues/12167)) by [doe1080](https://github.com/doe1080)
- **goodgame**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/e7cc02b14d8d323f805d14325a9c95593a170d28) ([#12173](https://github.com/yt-dlp/yt-dlp/issues/12173)) by [NecroRomnt](https://github.com/NecroRomnt)
- **lbry**: [Support signed URLs](https://github.com/yt-dlp/yt-dlp/commit/de30f652ffb7623500215f5906844f2ae0d92c7b) ([#12138](https://github.com/yt-dlp/yt-dlp/issues/12138)) by [seproDev](https://github.com/seproDev)
- **naver**: [Fix m3u8 formats extraction](https://github.com/yt-dlp/yt-dlp/commit/b3007c44cdac38187fc6600de76959a7079a44d1) ([#12037](https://github.com/yt-dlp/yt-dlp/issues/12037)) by [kclauhk](https://github.com/kclauhk)
- **nest**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/1ef3ee7500c4ab8c26f7fdc5b0ad1da4d16eec8e) ([#11747](https://github.com/yt-dlp/yt-dlp/issues/11747)) by [pabs3](https://github.com/pabs3), [seproDev](https://github.com/seproDev)
- **niconico**: series: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/bc88b904cd02314da41ce1b2fdf046d0680fe965) ([#11822](https://github.com/yt-dlp/yt-dlp/issues/11822)) by [test20140](https://github.com/test20140)
- **nrk**
- [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/89198bb23b4d03e0473ac408bfb50d67c2f71165) ([#12069](https://github.com/yt-dlp/yt-dlp/issues/12069)) by [hexahigh](https://github.com/hexahigh)
- [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/45732e2590a1bd0bc9608f5eb68c59341ca84f02) ([#12193](https://github.com/yt-dlp/yt-dlp/issues/12193)) by [hexahigh](https://github.com/hexahigh)
- **patreon**: [Extract attachment filename as `alt_title`](https://github.com/yt-dlp/yt-dlp/commit/e2e73b5c65593ec0a5e685663e6ec0f4aaffc1f1) ([#12000](https://github.com/yt-dlp/yt-dlp/issues/12000)) by [msm595](https://github.com/msm595)
- **pbs**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/13825ab77815ee6e1603abbecbb9f3795057b93c) ([#12024](https://github.com/yt-dlp/yt-dlp/issues/12024)) by [dirkf](https://github.com/dirkf), [krandor](https://github.com/krandor), [n10dollar](https://github.com/n10dollar)
- **piramidetv**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/af2c821d74049b519895288aca23cee81fc4b049) ([#10777](https://github.com/yt-dlp/yt-dlp/issues/10777)) by [HobbyistDev](https://github.com/HobbyistDev), [kclauhk](https://github.com/kclauhk), [seproDev](https://github.com/seproDev)
- **redgifs**: [Support `/ifr/` URLs](https://github.com/yt-dlp/yt-dlp/commit/4850ce91d163579fa615c3c0d44c9bd64682c22b) ([#11805](https://github.com/yt-dlp/yt-dlp/issues/11805)) by [invertico](https://github.com/invertico)
- **rtvslo.si**: show: [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/3fc46086562857d5493cbcff687f76e4e4ed303f) ([#12136](https://github.com/yt-dlp/yt-dlp/issues/12136)) by [cotko](https://github.com/cotko)
- **senategov**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/68221ecc87c6a3f3515757bac2a0f9674a38e3f2) ([#9361](https://github.com/yt-dlp/yt-dlp/issues/9361)) by [Grabien](https://github.com/Grabien), [seproDev](https://github.com/seproDev)
- **soundcloud**
- [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/6d304133ab32bcd1eb78ff1467f1a41dd9b66c33) ([#11945](https://github.com/yt-dlp/yt-dlp/issues/11945)) by [7x11x13](https://github.com/7x11x13)
- user: [Add `/comments` page support](https://github.com/yt-dlp/yt-dlp/commit/7bfb4f72e490310d2681c7f4815218a2ebbc73ee) ([#11999](https://github.com/yt-dlp/yt-dlp/issues/11999)) by [7x11x13](https://github.com/7x11x13)
- **subsplash**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/5d904b077d2f58ae44bdf208d2dcfcc3ff8347f5) ([#11054](https://github.com/yt-dlp/yt-dlp/issues/11054)) by [seproDev](https://github.com/seproDev), [subrat-lima](https://github.com/subrat-lima)
- **theatercomplextownppv**: [Support `live` URLs](https://github.com/yt-dlp/yt-dlp/commit/797d2472a299692e01ad1500e8c3b7bc1daa7fe4) ([#11720](https://github.com/yt-dlp/yt-dlp/issues/11720)) by [bashonly](https://github.com/bashonly)
- **vimeo**: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/9ff330948c92f6b2e1d9c928787362ab19cd6c62) ([#12142](https://github.com/yt-dlp/yt-dlp/issues/12142)) by [jixunmoe](https://github.com/jixunmoe)
- **vimp**: Playlist: [Add support for tags](https://github.com/yt-dlp/yt-dlp/commit/d4f5be1735c8feaeb3308666e0b878e9782f529d) ([#11688](https://github.com/yt-dlp/yt-dlp/issues/11688)) by [FestplattenSchnitzel](https://github.com/FestplattenSchnitzel)
- **weibo**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/a567f97b62ae9f6d6f5a9376c361512ab8dceda2) ([#12088](https://github.com/yt-dlp/yt-dlp/issues/12088)) by [4ft35t](https://github.com/4ft35t)
- **xhamster**: [Various improvements](https://github.com/yt-dlp/yt-dlp/commit/3b99a0f0e07f0120ab416f34a8f5ab75d4fdf1d1) ([#11738](https://github.com/yt-dlp/yt-dlp/issues/11738)) by [knackku](https://github.com/knackku)
- **xiaohongshu**: [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/f9f24ae376a9eaca777816479a4a29f6f0ce7681) ([#12147](https://github.com/yt-dlp/yt-dlp/issues/12147)) by [seproDev](https://github.com/seproDev)
- **youtube**
- [Download `tv` client Innertube config](https://github.com/yt-dlp/yt-dlp/commit/326fb1ffaf4e8349f1fe8ba2a81839652e044bff) ([#12168](https://github.com/yt-dlp/yt-dlp/issues/12168)) by [coletdjnz](https://github.com/coletdjnz)
- [Extract `media_type` for livestreams](https://github.com/yt-dlp/yt-dlp/commit/421bc72103d1faed473a451299cd17d6abb433bb) ([#11605](https://github.com/yt-dlp/yt-dlp/issues/11605)) by [nosoop](https://github.com/nosoop)
- [Restore convenience workarounds](https://github.com/yt-dlp/yt-dlp/commit/f0d4b8a5d6354b294bc9631cf15a7160b7bad5de) ([#12181](https://github.com/yt-dlp/yt-dlp/issues/12181)) by [bashonly](https://github.com/bashonly)
- [Update `ios` player client](https://github.com/yt-dlp/yt-dlp/commit/de82acf8769282ce321a86737ecc1d4bef0e82a7) ([#12155](https://github.com/yt-dlp/yt-dlp/issues/12155)) by [b5i](https://github.com/b5i)
- [Use different PO token for GVS and Player](https://github.com/yt-dlp/yt-dlp/commit/6b91d232e316efa406035915532eb126fbaeea38) ([#12090](https://github.com/yt-dlp/yt-dlp/issues/12090)) by [coletdjnz](https://github.com/coletdjnz)
- tab: [Improve shorts title extraction](https://github.com/yt-dlp/yt-dlp/commit/76ac023ff02f06e8c003d104f02a03deeddebdcd) ([#11997](https://github.com/yt-dlp/yt-dlp/issues/11997)) by [bashonly](https://github.com/bashonly), [d3d9](https://github.com/d3d9)
- **zdf**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/bb69f5dab79fb32c4ec0d50e05f7fa26d05d54ba) ([#11041](https://github.com/yt-dlp/yt-dlp/issues/11041)) by [InvalidUsernameException](https://github.com/InvalidUsernameException)
#### Misc. changes
- **cleanup**: Miscellaneous: [3b45319](https://github.com/yt-dlp/yt-dlp/commit/3b4531934465580be22937fecbb6e1a3a9e2334f) by [bashonly](https://github.com/bashonly), [lonble](https://github.com/lonble), [pjrobertson](https://github.com/pjrobertson), [seproDev](https://github.com/seproDev)
### 2025.01.15
#### Extractor changes
- **youtube**: [Do not use `web_creator` as a default client](https://github.com/yt-dlp/yt-dlp/commit/c8541f8b13e743fcfa06667530d13fee8686e22a) ([#12087](https://github.com/yt-dlp/yt-dlp/issues/12087)) by [bashonly](https://github.com/bashonly)
### 2025.01.12
#### Core changes
- [Fix filename sanitization with `--no-windows-filenames`](https://github.com/yt-dlp/yt-dlp/commit/8346b549150003df988538e54c9d8bc4de568979) ([#11988](https://github.com/yt-dlp/yt-dlp/issues/11988)) by [bashonly](https://github.com/bashonly)
- [Validate retries values are non-negative](https://github.com/yt-dlp/yt-dlp/commit/1f4e1e85a27c5b43e34d7706cfd88ffce1b56a4a) ([#11927](https://github.com/yt-dlp/yt-dlp/issues/11927)) by [Strkmn](https://github.com/Strkmn)
#### Extractor changes
- **drtalks**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/1f489f4a45691cac3f9e787d22a3a8a086229ba6) ([#10831](https://github.com/yt-dlp/yt-dlp/issues/10831)) by [pzhlkj6612](https://github.com/pzhlkj6612), [seproDev](https://github.com/seproDev)
- **plvideo**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3c14e9191f3035b9a729d1d87bc0381f42de57cf) ([#10657](https://github.com/yt-dlp/yt-dlp/issues/10657)) by [Sanceilaks](https://github.com/Sanceilaks), [seproDev](https://github.com/seproDev)
- **vine**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/e2ef4fece6c9742d1733e3bae408c4787765f78c) ([#11700](https://github.com/yt-dlp/yt-dlp/issues/11700)) by [allendema](https://github.com/allendema)
- **xiaohongshu**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/763ed06ee69f13949397897bd42ff2ec3dc3d384) ([#11806](https://github.com/yt-dlp/yt-dlp/issues/11806)) by [HobbyistDev](https://github.com/HobbyistDev)
- **youtube**
- [Fix DASH formats incorrectly skipped in some situations](https://github.com/yt-dlp/yt-dlp/commit/0b6b7742c2e7f2a1fcb0b54ef3dd484bab404b3f) ([#11910](https://github.com/yt-dlp/yt-dlp/issues/11910)) by [coletdjnz](https://github.com/coletdjnz)
- [Refactor cookie auth](https://github.com/yt-dlp/yt-dlp/commit/75079f4e3f7dce49b61ef01da7adcd9876a0ca3b) ([#11989](https://github.com/yt-dlp/yt-dlp/issues/11989)) by [coletdjnz](https://github.com/coletdjnz)
- [Use `tv` instead of `mweb` client by default](https://github.com/yt-dlp/yt-dlp/commit/712d2abb32f59b2d246be2901255f84f1a4c30b3) ([#12059](https://github.com/yt-dlp/yt-dlp/issues/12059)) by [coletdjnz](https://github.com/coletdjnz)
#### Misc. changes
- **cleanup**: Miscellaneous: [dade5e3](https://github.com/yt-dlp/yt-dlp/commit/dade5e35c89adaad04408bfef766820dbca06ebe) by [grqz](https://github.com/grqz), [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
### 2024.12.23
#### Core changes
- [Don't sanitize filename on Unix when `--no-windows-filenames`](https://github.com/yt-dlp/yt-dlp/commit/6fc85f617a5850307fd5b258477070e6ee177796) ([#9591](https://github.com/yt-dlp/yt-dlp/issues/9591)) by [pukkandan](https://github.com/pukkandan)
- **update**
- [Check 64-bitness when upgrading ARM builds](https://github.com/yt-dlp/yt-dlp/commit/b91c3925c2059970daa801cb131c0c2f4f302e72) ([#11819](https://github.com/yt-dlp/yt-dlp/issues/11819)) by [bashonly](https://github.com/bashonly)
- [Fix endless update loop for `linux_exe` builds](https://github.com/yt-dlp/yt-dlp/commit/3d3ee458c1fe49dd5ebd7651a092119d23eb7000) ([#11827](https://github.com/yt-dlp/yt-dlp/issues/11827)) by [bashonly](https://github.com/bashonly)
#### Extractor changes
- **soundcloud**: [Various fixes](https://github.com/yt-dlp/yt-dlp/commit/d298693b1b266d198e8eeecb90ea17c4a031268f) ([#11820](https://github.com/yt-dlp/yt-dlp/issues/11820)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Add age-gate workaround for some embeddable videos](https://github.com/yt-dlp/yt-dlp/commit/09a6c687126f04e243fcb105a828787efddd1030) ([#11821](https://github.com/yt-dlp/yt-dlp/issues/11821)) by [bashonly](https://github.com/bashonly)
- [Fix `uploader_id` extraction](https://github.com/yt-dlp/yt-dlp/commit/1a8851b689763e5173b96f70f8a71df0e4a44b66) ([#11818](https://github.com/yt-dlp/yt-dlp/issues/11818)) by [bashonly](https://github.com/bashonly)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/65cf46cddd873fd229dbb0fc0689bca4c201c6b6) ([#11893](https://github.com/yt-dlp/yt-dlp/issues/11893)) by [bashonly](https://github.com/bashonly)
- [Skip iOS formats that require PO Token](https://github.com/yt-dlp/yt-dlp/commit/9f42e68a74f3f00b0253fe70763abd57cac4237b) ([#11890](https://github.com/yt-dlp/yt-dlp/issues/11890)) by [coletdjnz](https://github.com/coletdjnz)
### 2024.12.13
#### Extractor changes
- **patreon**: campaign: [Support /c/ URLs](https://github.com/yt-dlp/yt-dlp/commit/bc262bcad4d3683ceadf61a7eb87e233e72adef3) ([#11756](https://github.com/yt-dlp/yt-dlp/issues/11756)) by [bashonly](https://github.com/bashonly)
- **soundcloud**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/f4d3e9e6dc25077b79849a31a2f67f93fdc01e62) ([#11777](https://github.com/yt-dlp/yt-dlp/issues/11777)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Fix `release_date` extraction](https://github.com/yt-dlp/yt-dlp/commit/d5e2a379f2adcb28bc48c7d9e90716d7278f89d2) ([#11759](https://github.com/yt-dlp/yt-dlp/issues/11759)) by [MutantPiggieGolem1](https://github.com/MutantPiggieGolem1)
- [Fix signature function extraction for `2f1832d2`](https://github.com/yt-dlp/yt-dlp/commit/5460cd91891bf613a2065e2fc278d9903c37a127) ([#11801](https://github.com/yt-dlp/yt-dlp/issues/11801)) by [bashonly](https://github.com/bashonly)
- [Prioritize original language over auto-dubbed audio](https://github.com/yt-dlp/yt-dlp/commit/dc3c4fddcc653989dae71fc563d82a308fc898cc) ([#11803](https://github.com/yt-dlp/yt-dlp/issues/11803)) by [bashonly](https://github.com/bashonly)
- search_url: [Fix playlist searches](https://github.com/yt-dlp/yt-dlp/commit/f6c73aad5f1a67544bea137ebd9d1e22e0e56567) ([#11782](https://github.com/yt-dlp/yt-dlp/issues/11782)) by [Crypto90](https://github.com/Crypto90)
#### Misc. changes
- **cleanup**: [Make more playlist entries lazy](https://github.com/yt-dlp/yt-dlp/commit/54216696261bc07cacd9a837c501d9e0b7fed09e) ([#11763](https://github.com/yt-dlp/yt-dlp/issues/11763)) by [seproDev](https://github.com/seproDev)
### 2024.12.06
#### Core changes
- **cookies**: [Add `--cookies-from-browser` support for MS Store Firefox](https://github.com/yt-dlp/yt-dlp/commit/354cb4026cf2191e1a130ec2a627b95cabfbc60a) ([#11731](https://github.com/yt-dlp/yt-dlp/issues/11731)) by [wesson09](https://github.com/wesson09)
#### Extractor changes
- **bilibili**: [Fix HD formats extraction](https://github.com/yt-dlp/yt-dlp/commit/fca3eb5f8be08d5fab2e18b45b7281a12e566725) ([#11734](https://github.com/yt-dlp/yt-dlp/issues/11734)) by [grqz](https://github.com/grqz)
- **soundcloud**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/2feb28028ee48f2185d2d95076e62accb09b9e2e) ([#11742](https://github.com/yt-dlp/yt-dlp/issues/11742)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Fix `n` sig extraction for player `3bb1f723`](https://github.com/yt-dlp/yt-dlp/commit/a95ee6d8803fca9157adecf63732ab58bf87fd88) ([#11750](https://github.com/yt-dlp/yt-dlp/issues/11750)) by [bashonly](https://github.com/bashonly) (With fixes in [4bd2655](https://github.com/yt-dlp/yt-dlp/commit/4bd2655398aed450456197a6767639114a24eac2))
- [Fix signature function extraction](https://github.com/yt-dlp/yt-dlp/commit/4c85ccd1366c88cf93982f8350f58eed17355981) ([#11751](https://github.com/yt-dlp/yt-dlp/issues/11751)) by [bashonly](https://github.com/bashonly)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/2e49c789d3eebc39af8910705d65a98bca0e4c4f) ([#11724](https://github.com/yt-dlp/yt-dlp/issues/11724)) by [bashonly](https://github.com/bashonly)
### 2024.12.03
#### Core changes

View file

@ -6,7 +6,6 @@
[![Release version](https://img.shields.io/github/v/release/yt-dlp/yt-dlp?color=brightgreen&label=Download&style=for-the-badge)](#installation "Installation")
[![PyPI](https://img.shields.io/badge/-PyPI-blue.svg?logo=pypi&labelColor=555555&style=for-the-badge)](https://pypi.org/project/yt-dlp "PyPI")
[![Donate](https://img.shields.io/badge/_-Donate-red.svg?logo=githubsponsors&labelColor=555555&style=for-the-badge)](Collaborators.md#collaborators "Donate")
[![Matrix](https://img.shields.io/matrix/yt-dlp:matrix.org?color=brightgreen&labelColor=555555&label=&logo=element&style=for-the-badge)](https://matrix.to/#/#yt-dlp:matrix.org "Matrix")
[![Discord](https://img.shields.io/discord/807245652072857610?color=blue&labelColor=555555&label=&logo=discord&style=for-the-badge)](https://discord.gg/H5MNcFW63r "Discord")
[![Supported Sites](https://img.shields.io/badge/-Supported_Sites-brightgreen.svg?style=for-the-badge)](supportedsites.md "Supported Sites")
[![License: Unlicense](https://img.shields.io/badge/-Unlicense-blue.svg?style=for-the-badge)](LICENSE "License")
@ -613,8 +612,7 @@ ## Filesystem Options:
--no-restrict-filenames Allow Unicode characters, "&" and spaces in
filenames (default)
--windows-filenames Force filenames to be Windows-compatible
--no-windows-filenames Make filenames Windows-compatible only if
using Windows (default)
--no-windows-filenames Sanitize filenames only minimally
--trim-filenames LENGTH Limit the filename length (excluding
extension) to the specified number of
characters
@ -1527,7 +1525,7 @@ ## Sorting Formats
- `hasvid`: Gives priority to formats that have a video stream
- `hasaud`: Gives priority to formats that have an audio stream
- `ie_pref`: The format preference
- `lang`: The language preference
- `lang`: The language preference as determined by the extractor (e.g. original language preferred over audio description)
- `quality`: The quality of the format
- `source`: The preference of the source
- `proto`: Protocol used for download (`https`/`ftps` > `http`/`ftp` > `m3u8_native`/`m3u8` > `http_dash_segments`> `websocket_frag` > `mms`/`rtsp` > `f4f`/`f4m`)
@ -1761,7 +1759,7 @@ # Replace all spaces and "_" in title and uploader with a `-`
# EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=tv,mweb;formats=incomplete" --extractor-args "funimation:version=uncut"`
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=tv,mweb;formats=incomplete" --extractor-args "twitter:api=syndication"`
Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"`
@ -1770,19 +1768,19 @@ # EXTRACTOR ARGUMENTS
#### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `ios,mweb` is used, or `web_creator,mweb` is used when authenticating with cookies. The `_music` variants are added for `music.youtube.com` URLs. Some clients, such as `web` and `android`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `all` to use all the clients, and `default` for the default clients. You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=all,-web`
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8)
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `po_token`: Proof of Origin (PO) Token(s) to use for requesting video playback. Comma seperated list of PO Tokens in the format `CLIENT+PO_TOKEN`, e.g. `youtube:po_token=web+XXX,android+YYY`
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
#### youtubetab (YouTube playlists, channels, feeds, etc.)
* `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details)
@ -1796,13 +1794,6 @@ #### generic
* `is_live`: Bypass live HLS detection and manually set `live_status` - a value of `false` will set `not_live`, any other value (or no value) will set `is_live`
* `impersonate`: Target(s) to try and impersonate with the initial webpage request; e.g. `generic:impersonate=safari,chrome-110`. Use `generic:impersonate` to impersonate any available target, and use `generic:impersonate=false` to disable impersonation (default)
#### funimation
* `language`: Audio languages to extract, e.g. `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`
#### crunchyrollbeta (Crunchyroll)
* `hardsub`: One or more hardsub versions to extract (in order of preference), or `all` (default: `None` = no hardsubs will be extracted), e.g. `crunchyrollbeta:hardsub=en-US,de-DE`
#### vikichannel
* `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers`
@ -1860,7 +1851,7 @@ #### afreecatvlive
* `cdn`: One or more CDN IDs to use with the API call for stream URLs, e.g. `gcp_cdn`, `gs_cdn_pc_app`, `gs_cdn_mobile_web`, `gs_cdn_pc_web`
#### soundcloud
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{extension}` (omitting the bitrate), e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can be passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known extensions include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{codec}`, e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can be passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known codecs include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
#### orfon (orf:on)
* `prefer_segments_playlist`: Prefer a playlist of program segments instead of a single complete video when available. If individual segments are desired, use `--concat-playlist never --extractor-args "orfon:prefer_segments_playlist"`

View file

@ -239,5 +239,11 @@
"action": "add",
"when": "52c0ffe40ad6e8404d93296f575007b05b04c686",
"short": "[priority] **Login with OAuth is no longer supported for YouTube**\nDue to a change made by the site, yt-dlp is no longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)"
},
{
"action": "change",
"when": "76ac023ff02f06e8c003d104f02a03deeddebdcd",
"short": "[ie/youtube:tab] Improve shorts title extraction (#11997)",
"authors": ["bashonly", "d3d9"]
}
]

View file

@ -11,11 +11,13 @@
from devscripts.utils import get_filename_args, read_file, write_file
VERBOSE_TMPL = '''
VERBOSE = '''
- type: checkboxes
id: verbose
attributes:
label: Provide verbose output that clearly demonstrates the problem
description: |
This is mandatory unless absolutely impossible to provide. If you are unable to provide the output, please explain why.
options:
- label: Run **your** yt-dlp command with **-vU** flag added (`yt-dlp -vU <your command line>`)
required: true
@ -47,31 +49,23 @@
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
> [!CAUTION]
> ### GitHub is experiencing a high volume of malicious spam comments.
> ### If you receive any replies asking you download a file, do NOT follow the download links!
>
> Note that this issue may be temporarily locked as an anti-spam measure after it is opened.
'''.strip()
NO_SKIP = '''
- type: checkboxes
- type: markdown
attributes:
label: DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
description: Fill all fields even if you think it is irrelevant for the issue
options:
- label: I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\\* field
required: true
value: |
> [!IMPORTANT]
> Not providing the required (*) information or removing the template will result in your issue being closed and ignored.
'''.strip()
def main():
fields = {'no_skip': NO_SKIP}
fields['verbose'] = VERBOSE_TMPL % fields
fields['verbose_optional'] = re.sub(r'(\n\s+validations:)?\n\s+required: true', '', fields['verbose'])
fields = {
'no_skip': NO_SKIP,
'verbose': VERBOSE,
'verbose_optional': re.sub(r'(\n\s+validations:)?\n\s+required: true', '', VERBOSE),
}
infile, outfile = get_filename_args(has_infile=True)
write_file(outfile, read_file(infile) % fields)

View file

@ -10,10 +10,21 @@
from devscripts.utils import get_filename_args, write_file
from yt_dlp.extractor import list_extractor_classes
TEMPLATE = '''\
# Supported sites
Below is a list of all extractors that are currently included with yt-dlp.
If a site is not listed here, it might still be supported by yt-dlp's embed extraction or generic extractor.
Not all sites listed here are guaranteed to work; websites are constantly changing and sometimes this breaks yt-dlp's support for them.
The only reliable way to check if a site is supported is to try it.
{ie_list}
'''
def main():
out = '\n'.join(ie.description() for ie in list_extractor_classes() if ie.IE_DESC is not False)
write_file(get_filename_args(), f'# Supported sites\n{out}\n')
write_file(get_filename_args(), TEMPLATE.format(ie_list=out))
if __name__ == '__main__':

View file

@ -25,7 +25,8 @@ def parse_args():
def run_tests(*tests, pattern=None, ci=False):
run_core = 'core' in tests or (not pattern and not tests)
# XXX: hatch uses `tests` if no arguments are passed
run_core = 'core' in tests or 'tests' in tests or (not pattern and not tests)
run_download = 'download' in tests
pytest_args = args.pytest_args or os.getenv('HATCH_TEST_ARGS', '')

View file

@ -76,7 +76,7 @@ dev = [
]
static-analysis = [
"autopep8~=2.0",
"ruff~=0.8.0",
"ruff~=0.9.0",
]
test = [
"pytest~=8.1",
@ -195,6 +195,7 @@ ignore = [
"B023", # function-uses-loop-variable (false positives)
"B028", # no-explicit-stacklevel
"B904", # raise-without-from-inside-except
"A005", # stdlib-module-shadowing
"C401", # unnecessary-generator-set
"C402", # unnecessary-generator-dict
"PIE790", # unnecessary-placeholder

View file

@ -1,4 +1,10 @@
# Supported sites
Below is a list of all extractors that are currently included with yt-dlp.
If a site is not listed here, it might still be supported by yt-dlp's embed extraction or generic extractor.
Not all sites listed here are guaranteed to work; websites are constantly changing and sometimes this breaks yt-dlp's support for them.
The only reliable way to check if a site is supported is to try it.
- **17live**
- **17live:clip**
- **1News**: 1news.co.nz article videos
@ -171,6 +177,7 @@ # Supported sites
- **BilibiliCheese**
- **BilibiliCheeseSeason**
- **BilibiliCollectionList**
- **BiliBiliDynamic**
- **BilibiliFavoritesList**
- **BiliBiliPlayer**
- **BilibiliPlaylist**
@ -303,10 +310,6 @@ # Supported sites
- **CrowdBunker**
- **CrowdBunkerChannel**
- **Crtvg**
- **crunchyroll**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:artist**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:music**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:playlist**: [*crunchyroll*](## "netrc machine")
- **CSpan**: C-SPAN
- **CSpanCongress**
- **CtsNews**: 華視新聞
@ -317,7 +320,8 @@ # Supported sites
- **curiositystream**: [*curiositystream*](## "netrc machine")
- **curiositystream:collections**: [*curiositystream*](## "netrc machine")
- **curiositystream:series**: [*curiositystream*](## "netrc machine")
- **CWTV**
- **cwtv**
- **cwtv:movie**
- **Cybrary**: [*cybrary*](## "netrc machine")
- **CybraryCourse**: [*cybrary*](## "netrc machine")
- **DacastPlaylist**
@ -352,6 +356,7 @@ # Supported sites
- **DigitalConcertHall**: [*digitalconcerthall*](## "netrc machine") DigitalConcertHall extractor
- **DigitallySpeaking**
- **Digiteka**
- **Digiview**
- **DiscogsReleasePlaylist**
- **DiscoveryLife**
- **DiscoveryNetworksDe**
@ -374,6 +379,7 @@ # Supported sites
- **Dropbox**
- **Dropout**: [*dropout*](## "netrc machine")
- **DropoutSeason**
- **DrTalks**
- **DrTuber**
- **drtv**
- **drtv:live**
@ -392,6 +398,8 @@ # Supported sites
- **Ebay**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
- **eggs:artist**
- **eggs:single**
- **EinsUndEinsTV**: [*1und1tv*](## "netrc machine")
- **EinsUndEinsTVLive**: [*1und1tv*](## "netrc machine")
- **EinsUndEinsTVRecordings**: [*1und1tv*](## "netrc machine")
@ -465,9 +473,9 @@ # Supported sites
- **fptplay**: fptplay.vn
- **FranceCulture**
- **FranceInter**
- **FranceTV**
- **francetv**
- **francetv:site**
- **francetvinfo.fr**
- **FranceTVSite**
- **Freesound**
- **freespeech.org**
- **freetv:series**
@ -476,9 +484,6 @@ # Supported sites
- **FrontendMastersCourse**: [*frontendmasters*](## "netrc machine")
- **FrontendMastersLesson**: [*frontendmasters*](## "netrc machine")
- **FujiTVFODPlus7**
- **Funimation**: [*funimation*](## "netrc machine")
- **funimation:page**: [*funimation*](## "netrc machine")
- **funimation:show**: [*funimation*](## "netrc machine")
- **Funk**
- **Funker530**
- **Fux**
@ -502,7 +507,7 @@ # Supported sites
- **GediDigital**
- **gem.cbc.ca**: [*cbcgem*](## "netrc machine")
- **gem.cbc.ca:live**
- **gem.cbc.ca:playlist**
- **gem.cbc.ca:playlist**: [*cbcgem*](## "netrc machine")
- **Genius**
- **GeniusLyrics**
- **Germanupa**: germanupa.de
@ -891,6 +896,8 @@ # Supported sites
- **nebula:video**: [*watchnebula*](## "netrc machine")
- **NekoHacker**
- **NerdCubedFeed**
- **Nest**
- **NestClip**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
@ -1070,6 +1077,8 @@ # Supported sites
- **Pinkbike**
- **Pinterest**
- **PinterestCollection**
- **PiramideTV**
- **PiramideTVChannel**
- **pixiv:sketch**
- **pixiv:sketch:user**
- **Pladform**
@ -1086,6 +1095,7 @@ # Supported sites
- **pluralsight**: [*pluralsight*](## "netrc machine")
- **pluralsight:course**
- **PlutoTV**: (**Currently broken**)
- **PlVideo**: Платформа
- **PodbayFM**
- **PodbayFMChannel**
- **Podchaser**
@ -1394,6 +1404,8 @@ # Supported sites
- **StretchInternet**
- **Stripchat**
- **stv:player**
- **Subsplash**
- **subsplash:playlist**
- **Substack**
- **SunPorno**
- **sverigesradio:episode**
@ -1641,8 +1653,6 @@ # Supported sites
- **Vimm:stream**
- **ViMP**
- **ViMP:Playlist**
- **Vine**
- **vine:user**
- **Viously**
- **Viqeo**: (**Currently broken**)
- **Viu**

View file

@ -237,6 +237,20 @@ def sanitize(key, value):
def expect_info_dict(self, got_dict, expected_dict):
ALLOWED_KEYS_SORT_ORDER = (
# NB: Keep in sync with the docstring of extractor/common.py
'id', 'ext', 'direct', 'display_id', 'title', 'alt_title', 'description', 'media_type',
'uploader', 'uploader_id', 'uploader_url', 'channel', 'channel_id', 'channel_url', 'channel_is_verified',
'channel_follower_count', 'comment_count', 'view_count', 'concurrent_view_count',
'like_count', 'dislike_count', 'repost_count', 'average_rating', 'age_limit', 'duration', 'thumbnail', 'heatmap',
'chapters', 'chapter', 'chapter_number', 'chapter_id', 'start_time', 'end_time', 'section_start', 'section_end',
'categories', 'tags', 'cast', 'composers', 'artists', 'album_artists', 'creators', 'genres',
'track', 'track_number', 'track_id', 'album', 'album_type', 'disc_number',
'series', 'series_id', 'season', 'season_number', 'season_id', 'episode', 'episode_number', 'episode_id',
'timestamp', 'upload_date', 'release_timestamp', 'release_date', 'release_year', 'modified_timestamp', 'modified_date',
'playable_in_embed', 'availability', 'live_status', 'location', 'license', '_old_archive_ids',
)
expect_dict(self, got_dict, expected_dict)
# Check for the presence of mandatory fields
if got_dict.get('_type') not in ('playlist', 'multi_video'):
@ -252,7 +266,13 @@ def expect_info_dict(self, got_dict, expected_dict):
test_info_dict = sanitize_got_info_dict(got_dict)
missing_keys = set(test_info_dict.keys()) - set(expected_dict.keys())
# Check for invalid/misspelled field names being returned by the extractor
invalid_keys = sorted(test_info_dict.keys() - ALLOWED_KEYS_SORT_ORDER)
self.assertFalse(invalid_keys, f'Invalid fields returned by the extractor: {", ".join(invalid_keys)}')
missing_keys = sorted(
test_info_dict.keys() - expected_dict.keys(),
key=lambda x: ALLOWED_KEYS_SORT_ORDER.index(x))
if missing_keys:
def _repr(v):
if isinstance(v, str):

View file

@ -486,11 +486,11 @@ def assert_syntax_error(format_spec):
def test_format_filtering(self):
formats = [
{'format_id': 'A', 'filesize': 500, 'width': 1000},
{'format_id': 'B', 'filesize': 1000, 'width': 500},
{'format_id': 'C', 'filesize': 1000, 'width': 400},
{'format_id': 'D', 'filesize': 2000, 'width': 600},
{'format_id': 'E', 'filesize': 3000},
{'format_id': 'A', 'filesize': 500, 'width': 1000, 'aspect_ratio': 1.0},
{'format_id': 'B', 'filesize': 1000, 'width': 500, 'aspect_ratio': 1.33},
{'format_id': 'C', 'filesize': 1000, 'width': 400, 'aspect_ratio': 1.5},
{'format_id': 'D', 'filesize': 2000, 'width': 600, 'aspect_ratio': 1.78},
{'format_id': 'E', 'filesize': 3000, 'aspect_ratio': 0.56},
{'format_id': 'F'},
{'format_id': 'G', 'filesize': 1000000},
]
@ -549,6 +549,31 @@ def test_format_filtering(self):
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts, [])
ydl = YDL({'format': 'best[aspect_ratio=1]'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'A')
ydl = YDL({'format': 'all[aspect_ratio > 1.00]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['D', 'C', 'B'])
ydl = YDL({'format': 'all[aspect_ratio < 1.00]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['E'])
ydl = YDL({'format': 'best[aspect_ratio=1.5]'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'C')
ydl = YDL({'format': 'all[aspect_ratio!=1]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['E', 'D', 'C', 'B'])
@patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', False)
def test_default_format_spec_without_ffmpeg(self):
ydl = YDL({})
@ -761,6 +786,13 @@ def test(tmpl, expected, *, info=None, **params):
test('%(width)06d.%%(ext)s', 'NA.%(ext)s')
test('%%(width)06d.%(ext)s', '%(width)06d.mp4')
# Sanitization options
test('%(title3)s', (None, 'foobartest'))
test('%(title5)s', (None, 'aei_A'), restrictfilenames=True)
test('%(title3)s', (None, 'foo_bar_test'), windowsfilenames=False, restrictfilenames=True)
if sys.platform != 'win32':
test('%(title3)s', (None, 'foobar\\test'), windowsfilenames=False)
# ID sanitization
test('%(id)s', '_abcd', info={'id': '_abcd'})
test('%(some_id)s', '_abcd', info={'some_id': '_abcd'})

View file

@ -9,7 +9,7 @@
import math
from yt_dlp.jsinterp import JS_Undefined, JSInterpreter
from yt_dlp.jsinterp import JS_Undefined, JSInterpreter, js_number_to_string
class NaN:
@ -93,6 +93,16 @@ def test_operators(self):
self._test('function f(){return 0 ?? 42;}', 0)
self._test('function f(){return "life, the universe and everything" < 42;}', False)
self._test('function f(){return 0 - 7 * - 6;}', 42)
self._test('function f(){return true << "5";}', 32)
self._test('function f(){return true << true;}', 2)
self._test('function f(){return "19" & "21.9";}', 17)
self._test('function f(){return "19" & false;}', 0)
self._test('function f(){return "11.0" >> "2.1";}', 2)
self._test('function f(){return 5 ^ 9;}', 12)
self._test('function f(){return 0.0 << NaN}', 0)
self._test('function f(){return null << undefined}', 0)
# TODO: Does not work due to number too large
# self._test('function f(){return 21 << 4294967297}', 42)
def test_array_access(self):
self._test('function f(){var x = [1,2,3]; x[0] = 4; x[0] = 5; x[2.0] = 7; return x;}', [5, 2, 7])
@ -431,6 +441,27 @@ def test_slice(self):
self._test('function f(){return "012345678".slice(-1, 1)}', '')
self._test('function f(){return "012345678".slice(-3, -1)}', '67')
def test_js_number_to_string(self):
for test, radix, expected in [
(0, None, '0'),
(-0, None, '0'),
(0.0, None, '0'),
(-0.0, None, '0'),
(math.nan, None, 'NaN'),
(-math.nan, None, 'NaN'),
(math.inf, None, 'Infinity'),
(-math.inf, None, '-Infinity'),
(10 ** 21.5, 8, '526665530627250154000000'),
(6, 2, '110'),
(254, 16, 'fe'),
(-10, 2, '-1010'),
(-0xff, 2, '-11111111'),
(0.1 + 0.2, 16, '0.4cccccccccccd'),
(1234.1234, 10, '1234.1234'),
# (1000000000000000128, 10, '1000000000000000100')
]:
assert js_number_to_string(test, radix) == expected
if __name__ == '__main__':
unittest.main()

View file

@ -249,17 +249,36 @@ def _test_sanitize_path(self):
self.assertEqual(sanitize_path('abc/def...'), 'abc\\def..#')
self.assertEqual(sanitize_path('abc.../def'), 'abc..#\\def')
self.assertEqual(sanitize_path('abc.../def...'), 'abc..#\\def..#')
self.assertEqual(sanitize_path('../abc'), '..\\abc')
self.assertEqual(sanitize_path('../../abc'), '..\\..\\abc')
self.assertEqual(sanitize_path('./abc'), 'abc')
self.assertEqual(sanitize_path('./../abc'), '..\\abc')
self.assertEqual(sanitize_path('\\abc'), '\\abc')
self.assertEqual(sanitize_path('C:abc'), 'C:abc')
self.assertEqual(sanitize_path('C:abc\\..\\'), 'C:..')
self.assertEqual(sanitize_path('C:\\abc:%(title)s.%(ext)s'), 'C:\\abc#%(title)s.%(ext)s')
# Check with nt._path_normpath if available
try:
import nt
nt_path_normpath = getattr(nt, '_path_normpath', None)
except Exception:
nt_path_normpath = None
for test, expected in [
('C:\\', 'C:\\'),
('../abc', '..\\abc'),
('../../abc', '..\\..\\abc'),
('./abc', 'abc'),
('./../abc', '..\\abc'),
('\\abc', '\\abc'),
('C:abc', 'C:abc'),
('C:abc\\..\\', 'C:'),
('C:abc\\..\\def\\..\\..\\', 'C:..'),
('C:\\abc\\xyz///..\\def\\', 'C:\\abc\\def'),
('abc/../', '.'),
('./abc/../', '.'),
]:
result = sanitize_path(test)
assert result == expected, f'{test} was incorrectly resolved'
assert result == sanitize_path(result), f'{test} changed after sanitizing again'
if nt_path_normpath:
assert result == nt_path_normpath(test), f'{test} does not match nt._path_normpath'
def test_sanitize_url(self):
self.assertEqual(sanitize_url('//foo.bar'), 'http://foo.bar')
self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')

View file

@ -68,6 +68,16 @@
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'AOq0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL2QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/3bb1f723/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'MyOSJXtKI3m-uME_jv7-pT12gOFC02RFkGoqWpzE0Cs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
),
(
'https://www.youtube.com/s/player/2f1832d2/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xxAj7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJ2OySqa0q',
),
]
_NSIG_TESTS = [
@ -183,6 +193,18 @@
'https://www.youtube.com/s/player/b12cc44b/player_ias.vflset/en_US/base.js',
'keLa5R2U00sR9SQK', 'N1OGyujjEwMnLw',
),
(
'https://www.youtube.com/s/player/3bb1f723/player_ias.vflset/en_US/base.js',
'gK15nzVyaXE9RsMP3z', 'ZFFWFLPWx9DEgQ',
),
(
'https://www.youtube.com/s/player/2f1832d2/player_ias.vflset/en_US/base.js',
'YWt1qdbe8SAfkoPHW5d', 'RrRjWQOJmBiP',
),
(
'https://www.youtube.com/s/player/9c6dfc4a/player_ias.vflset/en_US/base.js',
'jbu7ylIosQHyJyJV', 'uwI0ESiynAmhNg',
),
]
@ -254,8 +276,11 @@ def signature(jscode, sig_input):
def n_sig(jscode, sig_input):
funcname = YoutubeIE(FakeYDL())._extract_n_function_name(jscode)
return JSInterpreter(jscode).call_function(funcname, sig_input)
ie = YoutubeIE(FakeYDL())
funcname = ie._extract_n_function_name(jscode)
jsi = JSInterpreter(jscode)
func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname)))
return func([sig_input])
make_sig_test = t_factory(

View file

@ -266,7 +266,9 @@ class YoutubeDL:
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names
trim_file_name: Limit length of filename (extension excluded)
windowsfilenames: Force the filenames to be windows compatible
windowsfilenames: True: Force filenames to be Windows compatible
False: Sanitize filenames only minimally
This option has no effect when running on Windows
ignoreerrors: Do not stop on download/postprocessing errors.
Can be 'only_download' to ignore only download errors.
Default is 'only_download' for CLI, but False for API
@ -281,7 +283,10 @@ class YoutubeDL:
lazy_playlist: Process playlist entries as they are received.
matchtitle: Download only matching titles.
rejecttitle: Reject downloads for matching titles.
logger: Log messages to a logging.Logger instance.
logger: A class having a `debug`, `warning` and `error` function where
each has a single string parameter, the message to be logged.
For compatibility reasons, both debug and info messages are passed to `debug`.
A debug message will have a prefix of `[debug] ` to discern it from info messages.
logtostderr: Print everything to stderr instead of stdout.
consoletitle: Display progress in the console window's titlebar.
writedescription: Write the video description to a .description file
@ -593,7 +598,7 @@ class YoutubeDL:
# NB: Keep in sync with the docstring of extractor/common.py
'url', 'manifest_url', 'manifest_stream_number', 'ext', 'format', 'format_id', 'format_note',
'width', 'height', 'aspect_ratio', 'resolution', 'dynamic_range', 'tbr', 'abr', 'acodec', 'asr', 'audio_channels',
'vbr', 'fps', 'vcodec', 'container', 'filesize', 'filesize_approx', 'rows', 'columns',
'vbr', 'fps', 'vcodec', 'container', 'filesize', 'filesize_approx', 'rows', 'columns', 'hls_media_playlist_data',
'player_url', 'protocol', 'fragment_base_url', 'fragments', 'is_from_start', 'is_dash_periods', 'request_data',
'preference', 'language', 'language_preference', 'quality', 'source_preference', 'cookies',
'http_headers', 'stretched_ratio', 'no_resume', 'has_drm', 'extra_param_to_segment_url', 'extra_param_to_key_url',
@ -1194,8 +1199,7 @@ def _copy_infodict(info_dict):
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=False):
""" Make the outtmpl and info_dict suitable for substitution: ydl.escape_outtmpl(outtmpl) % info_dict
@param sanitize Whether to sanitize the output as a filename.
For backward compatibility, a function can also be passed
@param sanitize Whether to sanitize the output as a filename
"""
info_dict.setdefault('epoch', int(time.time())) # keep epoch consistent once set
@ -1311,14 +1315,23 @@ def get_value(mdict):
na = self.params.get('outtmpl_na_placeholder', 'NA')
def filename_sanitizer(key, value, restricted=self.params.get('restrictfilenames')):
def filename_sanitizer(key, value, restricted):
return sanitize_filename(str(value), restricted=restricted, is_id=(
bool(re.search(r'(^|[_.])id(\.|$)', key))
if 'filename-sanitization' in self.params['compat_opts']
else NO_DEFAULT))
sanitizer = sanitize if callable(sanitize) else filename_sanitizer
sanitize = bool(sanitize)
if callable(sanitize):
self.deprecation_warning('Passing a callable "sanitize" to YoutubeDL.prepare_outtmpl is deprecated')
elif not sanitize:
pass
elif (sys.platform != 'win32' and not self.params.get('restrictfilenames')
and self.params.get('windowsfilenames') is False):
def sanitize(key, value):
return str(value).replace('/', '\u29F8').replace('\0', '')
else:
def sanitize(key, value):
return filename_sanitizer(key, value, restricted=self.params.get('restrictfilenames'))
def _dumpjson_default(obj):
if isinstance(obj, (set, LazyList)):
@ -1401,13 +1414,13 @@ def create_key(outer_mobj):
if sanitize:
# If value is an object, sanitize might convert it to a string
# So we convert it to repr first
# So we manually convert it before sanitizing
if fmt[-1] == 'r':
value, fmt = repr(value), str_fmt
elif fmt[-1] == 'a':
value, fmt = ascii(value), str_fmt
if fmt[-1] in 'csra':
value = sanitizer(last_field, value)
value = sanitize(last_field, value)
key = '{}\0{}'.format(key.replace('%', '%\0'), outer_mobj.group('format'))
TMPL_DICT[key] = value
@ -2110,7 +2123,7 @@ def _build_format_filter(self, filter_spec):
m = operator_rex.fullmatch(filter_spec)
if m:
try:
comparison_value = int(m.group('value'))
comparison_value = float(m.group('value'))
except ValueError:
comparison_value = parse_filesize(m.group('value'))
if comparison_value is None:

View file

@ -261,9 +261,11 @@ def parse_retries(name, value):
elif value in ('inf', 'infinite'):
return float('inf')
try:
return int(value)
int_value = int(value)
except (TypeError, ValueError):
validate(False, f'{name} retry count', value)
validate_positive(f'{name} retry count', int_value)
return int_value
opts.retries = parse_retries('download', opts.retries)
opts.fragment_retries = parse_retries('fragment', opts.fragment_retries)
@ -293,18 +295,20 @@ def parse_sleep_func(expr):
raise ValueError(f'invalid {key} retry sleep expression {expr!r}')
# Bytes
def validate_bytes(name, value):
def validate_bytes(name, value, strict_positive=False):
if value is None:
return None
numeric_limit = parse_bytes(value)
validate(numeric_limit is not None, 'rate limit', value)
validate(numeric_limit is not None, name, value)
if strict_positive:
validate_positive(name, numeric_limit, True)
return numeric_limit
opts.ratelimit = validate_bytes('rate limit', opts.ratelimit)
opts.ratelimit = validate_bytes('rate limit', opts.ratelimit, True)
opts.throttledratelimit = validate_bytes('throttled rate limit', opts.throttledratelimit)
opts.min_filesize = validate_bytes('min filesize', opts.min_filesize)
opts.max_filesize = validate_bytes('max filesize', opts.max_filesize)
opts.buffersize = validate_bytes('buffer size', opts.buffersize)
opts.buffersize = validate_bytes('buffer size', opts.buffersize, True)
opts.http_chunk_size = validate_bytes('http chunk size', opts.http_chunk_size)
# Output templates

View file

@ -195,7 +195,10 @@ def _extract_firefox_cookies(profile, container, logger):
def _firefox_browser_dirs():
if sys.platform in ('cygwin', 'win32'):
yield os.path.expandvars(R'%APPDATA%\Mozilla\Firefox\Profiles')
yield from map(os.path.expandvars, (
R'%APPDATA%\Mozilla\Firefox\Profiles',
R'%LOCALAPPDATA%\Packages\Mozilla.Firefox_n80bbvh6b1yt2\LocalCache\Roaming\Mozilla\Firefox\Profiles',
))
elif sys.platform == 'darwin':
yield os.path.expanduser('~/Library/Application Support/Firefox/Profiles')

View file

@ -16,6 +16,7 @@
update_url_query,
urljoin,
)
from ..utils._utils import _request_dump_filename
class HlsFD(FragmentFD):
@ -72,11 +73,23 @@ def check_results():
def real_download(self, filename, info_dict):
man_url = info_dict['url']
self.to_screen(f'[{self.FD_NAME}] Downloading m3u8 manifest')
urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.url
s = urlh.read().decode('utf-8', 'ignore')
s = info_dict.get('hls_media_playlist_data')
if s:
self.to_screen(f'[{self.FD_NAME}] Using m3u8 manifest from extracted info')
else:
self.to_screen(f'[{self.FD_NAME}] Downloading m3u8 manifest')
urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.url
s_bytes = urlh.read()
if self.params.get('write_pages'):
dump_filename = _request_dump_filename(
man_url, info_dict['id'], None,
trim_length=self.params.get('trim_file_name'))
self.to_screen(f'[{self.FD_NAME}] Saving request to {dump_filename}')
with open(dump_filename, 'wb') as outf:
outf.write(s_bytes)
s = s_bytes.decode('utf-8', 'ignore')
can_download, message = self.can_download(s, info_dict, self.params.get('allow_unplayable_formats')), None
if can_download:
@ -177,6 +190,7 @@ def is_ad_fragment_end(s):
if external_aes_iv:
external_aes_iv = binascii.unhexlify(remove_start(external_aes_iv, '0x').zfill(32))
byte_range = {}
byte_range_offset = 0
discontinuity_count = 0
frag_index = 0
ad_frag_next = False
@ -204,6 +218,11 @@ def is_ad_fragment_end(s):
})
media_sequence += 1
# If the byte_range is truthy, reset it after appending a fragment that uses it
if byte_range:
byte_range_offset = byte_range['end']
byte_range = {}
elif line.startswith('#EXT-X-MAP'):
if format_index and discontinuity_count != format_index:
continue
@ -217,10 +236,12 @@ def is_ad_fragment_end(s):
if extra_segment_query:
frag_url = update_url_query(frag_url, extra_segment_query)
map_byte_range = {}
if map_info.get('BYTERANGE'):
splitted_byte_range = map_info.get('BYTERANGE').split('@')
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else byte_range['end']
byte_range = {
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else 0
map_byte_range = {
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),
}
@ -229,7 +250,7 @@ def is_ad_fragment_end(s):
'frag_index': frag_index,
'url': frag_url,
'decrypt_info': decrypt_info,
'byte_range': byte_range,
'byte_range': map_byte_range,
'media_sequence': media_sequence,
})
media_sequence += 1
@ -257,7 +278,7 @@ def is_ad_fragment_end(s):
media_sequence = int(line[22:])
elif line.startswith('#EXT-X-BYTERANGE'):
splitted_byte_range = line[17:].split('@')
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else byte_range['end']
sub_range_start = int(splitted_byte_range[1]) if len(splitted_byte_range) == 2 else byte_range_offset
byte_range = {
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),

View file

@ -256,6 +256,7 @@
BilibiliCheeseIE,
BilibiliCheeseSeasonIE,
BilibiliCollectionListIE,
BiliBiliDynamicIE,
BilibiliFavoritesListIE,
BiliBiliIE,
BiliBiliPlayerIE,
@ -440,12 +441,6 @@
CrowdBunkerIE,
)
from .crtvg import CrtvgIE
from .crunchyroll import (
CrunchyrollArtistIE,
CrunchyrollBetaIE,
CrunchyrollBetaShowIE,
CrunchyrollMusicIE,
)
from .cspan import (
CSpanCongressIE,
CSpanIE,
@ -459,7 +454,10 @@
CuriosityStreamIE,
CuriosityStreamSeriesIE,
)
from .cwtv import CWTVIE
from .cwtv import (
CWTVIE,
CWTVMovieIE,
)
from .cybrary import (
CybraryCourseIE,
CybraryIE,
@ -510,6 +508,7 @@
from .dhm import DHMIE
from .digitalconcerthall import DigitalConcertHallIE
from .digiteka import DigitekaIE
from .digiview import DigiviewIE
from .discogs import DiscogsReleasePlaylistIE
from .disney import DisneyIE
from .dispeak import DigitallySpeakingIE
@ -555,6 +554,7 @@
DropoutIE,
DropoutSeasonIE,
)
from .drtalks import DrTalksIE
from .drtuber import DrTuberIE
from .drtv import (
DRTVIE,
@ -584,6 +584,10 @@
EggheadCourseIE,
EggheadLessonIE,
)
from .eggs import (
EggsArtistIE,
EggsIE,
)
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .elementorembed import ElementorEmbedIE
@ -699,11 +703,6 @@
FrontendMastersLessonIE,
)
from .fujitv import FujiTVFODPlus7IE
from .funimation import (
FunimationIE,
FunimationPageIE,
FunimationShowIE,
)
from .funk import FunkIE
from .funker530 import Funker530IE
from .fuyintv import FuyinTVIE
@ -1278,6 +1277,10 @@
)
from .nekohacker import NekoHackerIE
from .nerdcubed import NerdCubedFeedIE
from .nest import (
NestClipIE,
NestIE,
)
from .neteasemusic import (
NetEaseMusicAlbumIE,
NetEaseMusicDjRadioIE,
@ -1532,6 +1535,10 @@
PinterestCollectionIE,
PinterestIE,
)
from .piramidetv import (
PiramideTVChannelIE,
PiramideTVIE,
)
from .pixivsketch import (
PixivSketchIE,
PixivSketchUserIE,
@ -1551,6 +1558,7 @@
PluralsightIE,
)
from .plutotv import PlutoTVIE
from .plvideo import PlVideoIE
from .podbayfm import (
PodbayFMChannelIE,
PodbayFMIE,
@ -1981,6 +1989,10 @@
from .stretchinternet import StretchInternetIE
from .stripchat import StripchatIE
from .stv import STVPlayerIE
from .subsplash import (
SubsplashIE,
SubsplashPlaylistIE,
)
from .substack import SubstackIE
from .sunporno import SunPornoIE
from .sverigesradio import (
@ -2354,10 +2366,6 @@
VimmIE,
VimmRecordingIE,
)
from .vine import (
VineIE,
VineUserIE,
)
from .viously import ViouslyIE
from .viqeo import ViqeoIE
from .viu import (

View file

@ -421,14 +421,15 @@ def _real_extract(self, url):
class AbemaTVTitleIE(AbemaTVBaseIE):
_VALID_URL = r'https?://abema\.tv/video/title/(?P<id>[^?/]+)'
_VALID_URL = r'https?://abema\.tv/video/title/(?P<id>[^?/#]+)/?(?:\?(?:[^#]+&)?s=(?P<season>[^&#]+))?'
_PAGE_SIZE = 25
_TESTS = [{
'url': 'https://abema.tv/video/title/90-1597',
'url': 'https://abema.tv/video/title/90-1887',
'info_dict': {
'id': '90-1597',
'id': '90-1887',
'title': 'シャッフルアイランド',
'description': 'md5:61b2425308f41a5282a926edda66f178',
},
'playlist_mincount': 2,
}, {
@ -436,41 +437,54 @@ class AbemaTVTitleIE(AbemaTVBaseIE):
'info_dict': {
'id': '193-132',
'title': '真心が届く~僕とスターのオフィス・ラブ!?~',
'description': 'md5:9b59493d1f3a792bafbc7319258e7af8',
},
'playlist_mincount': 16,
}, {
'url': 'https://abema.tv/video/title/25-102',
'url': 'https://abema.tv/video/title/25-1nzan-whrxe',
'info_dict': {
'id': '25-102',
'title': 'ソードアート・オンライン アリシゼーション',
'id': '25-1nzan-whrxe',
'title': 'ソードアート・オンライン',
'description': 'md5:c094904052322e6978495532bdbf06e6',
},
'playlist_mincount': 24,
'playlist_mincount': 25,
}, {
'url': 'https://abema.tv/video/title/26-2mzbynr-cph?s=26-2mzbynr-cph_s40',
'info_dict': {
'title': '〈物語〉シリーズ',
'id': '26-2mzbynr-cph',
'description': 'md5:e67873de1c88f360af1f0a4b84847a52',
},
'playlist_count': 59,
}]
def _fetch_page(self, playlist_id, series_version, page):
def _fetch_page(self, playlist_id, series_version, season_id, page):
query = {
'seriesVersion': series_version,
'offset': str(page * self._PAGE_SIZE),
'order': 'seq',
'limit': str(self._PAGE_SIZE),
}
if season_id:
query['seasonId'] = season_id
programs = self._call_api(
f'v1/video/series/{playlist_id}/programs', playlist_id,
note=f'Downloading page {page + 1}',
query={
'seriesVersion': series_version,
'offset': str(page * self._PAGE_SIZE),
'order': 'seq',
'limit': str(self._PAGE_SIZE),
})
query=query)
yield from (
self.url_result(f'https://abema.tv/video/episode/{x}')
for x in traverse_obj(programs, ('programs', ..., 'id')))
def _entries(self, playlist_id, series_version):
def _entries(self, playlist_id, series_version, season_id):
return OnDemandPagedList(
functools.partial(self._fetch_page, playlist_id, series_version),
functools.partial(self._fetch_page, playlist_id, series_version, season_id),
self._PAGE_SIZE)
def _real_extract(self, url):
playlist_id = self._match_id(url)
playlist_id, season_id = self._match_valid_url(url).group('id', 'season')
series_info = self._call_api(f'v1/video/series/{playlist_id}', playlist_id)
return self.playlist_result(
self._entries(playlist_id, series_info['version']), playlist_id=playlist_id,
self._entries(playlist_id, series_info['version'], season_id), playlist_id=playlist_id,
playlist_title=series_info.get('title'),
playlist_description=series_info.get('content'))

View file

@ -43,14 +43,14 @@ class ACastIE(ACastBaseIE):
_VALID_URL = r'''(?x:
https?://
(?:
(?:(?:embed|www)\.)?acast\.com/|
(?:(?:embed|www|shows)\.)?acast\.com/|
play\.acast\.com/s/
)
(?P<channel>[^/]+)/(?P<id>[^/#?"]+)
(?P<channel>[^/?#]+)/(?:episodes/)?(?P<id>[^/#?"]+)
)'''
_EMBED_REGEX = [rf'(?x)<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'url': 'https://shows.acast.com/sparpodcast/episodes/2.raggarmordet-rosterurdetforflutna',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
@ -59,7 +59,7 @@ class ACastIE(ACastBaseIE):
'timestamp': 1477346700,
'upload_date': '20161024',
'duration': 2766,
'creator': 'Third Ear Studio',
'creators': ['Third Ear Studio'],
'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna',
'thumbnail': 'https://assets.pippa.io/shows/616ebe1886d7b1398620b943/616ebe33c7e6e70013cae7da.jpg',
@ -74,6 +74,9 @@ class ACastIE(ACastBaseIE):
}, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2',
'only_matching': True,
}, {
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
'only_matching': True,
@ -110,7 +113,7 @@ class ACastChannelIE(ACastBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?acast\.com/|
(?:(?:www|shows)\.)?acast\.com/|
play\.acast\.com/s/
)
(?P<id>[^/#?]+)
@ -120,12 +123,15 @@ class ACastChannelIE(ACastBaseIE):
'info_dict': {
'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus',
'description': 'md5:c09ce28c91002ce4ffce71d6504abaae',
'description': 'md5:feca253de9947634605080cd9eeea2bf',
},
'playlist_mincount': 200,
}, {
'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True,
}, {
'url': 'https://shows.acast.com/sparpodcast',
'only_matching': True,
}]
@classmethod

View file

@ -4,7 +4,9 @@
import itertools
import json
import math
import random
import re
import string
import time
import urllib.parse
import uuid
@ -681,12 +683,6 @@ def _real_extract(self, url):
old_video_id = format_field(aid, None, f'%s_part{part_id or 1}')
cid = traverse_obj(video_data, ('pages', part_id - 1, 'cid')) if part_id else video_data.get('cid')
play_info = (
traverse_obj(
self._search_json(r'window\.__playinfo__\s*=', webpage, 'play info', video_id, default=None),
('data', {dict}))
or self._download_playinfo(video_id, cid, headers=headers, query={'try_look': 1}))
festival_info = {}
if is_festival:
festival_info = traverse_obj(initial_state, {
@ -724,6 +720,13 @@ def _real_extract(self, url):
duration=traverse_obj(initial_state, ('videoData', 'duration', {int_or_none})),
__post_extractor=self.extract_comments(aid))
play_info = None
if self.is_logged_in:
play_info = traverse_obj(
self._search_json(r'window\.__playinfo__\s*=', webpage, 'play info', video_id, default=None),
('data', {dict}))
if not play_info:
play_info = self._download_playinfo(video_id, cid, headers=headers, query={'try_look': 1})
formats = self.extract_formats(play_info)
if video_data.get('is_upower_exclusive'):
@ -1176,28 +1179,26 @@ def _extract_playlist(self, fetch_page, get_metadata, get_entries):
class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)(?P<video>/video)?/?(?:[?#]|$)'
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)(?P<video>(?:/upload)?/video)?/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://space.bilibili.com/3985676/video',
'info_dict': {
'id': '3985676',
},
'playlist_mincount': 178,
'skip': 'login required',
}, {
'url': 'https://space.bilibili.com/313580179/video',
'info_dict': {
'id': '313580179',
},
'playlist_mincount': 92,
'skip': 'login required',
}]
def _real_extract(self, url):
playlist_id, is_video_url = self._match_valid_url(url).group('id', 'video')
if not is_video_url:
self.to_screen('A channel URL was given. Only the channel\'s videos will be downloaded. '
'To download audios, add a "/audio" to the URL')
'To download audios, add a "/upload/audio" to the URL')
def fetch_page(page_idx):
query = {
@ -1210,6 +1211,12 @@ def fetch_page(page_idx):
'ps': 30,
'tid': 0,
'web_location': 1550101,
'dm_img_list': '[]',
'dm_img_str': base64.b64encode(
''.join(random.choices(string.printable, k=random.randint(16, 64))).encode())[:-2].decode(),
'dm_cover_img_str': base64.b64encode(
''.join(random.choices(string.printable, k=random.randint(32, 128))).encode())[:-2].decode(),
'dm_img_inter': '{"ds":[],"wh":[6093,6631,31],"of":[430,760,380]}',
}
try:
@ -1220,14 +1227,14 @@ def fetch_page(page_idx):
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 412:
raise ExtractorError(
'Request is blocked by server (412), please add cookies, wait and try later.', expected=True)
'Request is blocked by server (412), please wait and try later.', expected=True)
raise
status_code = response['code']
if status_code == -401:
raise ExtractorError(
'Request is blocked by server (401), please add cookies, wait and try later.', expected=True)
elif status_code == -352 and not self.is_logged_in:
self.raise_login_required('Request is rejected, you need to login to access playlist')
'Request is blocked by server (401), please wait and try later.', expected=True)
elif status_code == -352:
raise ExtractorError('Request is rejected by server (352)', expected=True)
elif status_code != 0:
raise ExtractorError(f'Request failed ({status_code}): {response.get("message") or "Unknown error"}')
return response['data']
@ -1249,9 +1256,9 @@ def get_entries(page_data):
class BilibiliSpaceAudioIE(BilibiliSpaceBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)/audio'
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)/(?:upload/)?audio'
_TESTS = [{
'url': 'https://space.bilibili.com/313580179/audio',
'url': 'https://space.bilibili.com/313580179/upload/audio',
'info_dict': {
'id': '313580179',
},
@ -1274,7 +1281,8 @@ def get_metadata(page_data):
}
def get_entries(page_data):
for entry in page_data.get('data', []):
# data is None when the playlist is empty
for entry in page_data.get('data') or []:
yield self.url_result(f'https://www.bilibili.com/audio/au{entry["id"]}', BilibiliAudioIE, entry['id'])
metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries)
@ -1298,30 +1306,43 @@ def _extract_playlist(self, fetch_page, get_metadata, get_entries):
class BilibiliCollectionListIE(BilibiliSpaceListBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/collectiondetail/?\?sid=(?P<sid>\d+)'
_VALID_URL = [
r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/collectiondetail/?\?sid=(?P<sid>\d+)',
r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)',
]
_TESTS = [{
'url': 'https://space.bilibili.com/2142762/channel/collectiondetail?sid=57445',
'url': 'https://space.bilibili.com/2142762/lists/3662502?type=season',
'info_dict': {
'id': '2142762_57445',
'title': '【完结】《底特律 变人》全结局流程解说',
'description': '',
'id': '2142762_3662502',
'title': '合集·《黑神话悟空》流程解说',
'description': '黑神话悟空 相关节目',
'uploader': '老戴在此',
'uploader_id': '2142762',
'timestamp': int,
'upload_date': str,
'thumbnail': 'https://archive.biliimg.com/bfs/archive/e0e543ae35ad3df863ea7dea526bc32e70f4c091.jpg',
'thumbnail': 'https://archive.biliimg.com/bfs/archive/22302e17dc849dd4533606d71bc89df162c3a9bf.jpg',
},
'playlist_mincount': 31,
'playlist_mincount': 62,
}, {
'url': 'https://space.bilibili.com/2142762/lists/3662502',
'only_matching': True,
}, {
'url': 'https://space.bilibili.com/2142762/channel/collectiondetail?sid=57445',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if BilibiliSeriesListIE.suitable(url) else super().suitable(url)
def _real_extract(self, url):
mid, sid = self._match_valid_url(url).group('mid', 'sid')
playlist_id = f'{mid}_{sid}'
def fetch_page(page_idx):
return self._download_json(
'https://api.bilibili.com/x/polymer/space/seasons_archives_list',
playlist_id, note=f'Downloading page {page_idx}',
'https://api.bilibili.com/x/polymer/web-space/seasons_archives_list',
playlist_id, note=f'Downloading page {page_idx}', headers={'Referer': url},
query={'mid': mid, 'season_id': sid, 'page_num': page_idx + 1, 'page_size': 30})['data']
def get_metadata(page_data):
@ -1348,9 +1369,12 @@ def get_entries(page_data):
class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/seriesdetail/?\?\bsid=(?P<sid>\d+)'
_VALID_URL = [
r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/seriesdetail/?\?\bsid=(?P<sid>\d+)',
r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)/?\?(?:[^#]+&)?type=series(?:[&#]|$)',
]
_TESTS = [{
'url': 'https://space.bilibili.com/1958703906/channel/seriesdetail?sid=547718&ctype=0',
'url': 'https://space.bilibili.com/1958703906/lists/547718?type=series',
'info_dict': {
'id': '1958703906_547718',
'title': '直播回放',
@ -1363,6 +1387,9 @@ class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
'modified_date': str,
},
'playlist_mincount': 513,
}, {
'url': 'https://space.bilibili.com/1958703906/channel/seriesdetail?sid=547718&ctype=0',
'only_matching': True,
}]
def _real_extract(self, url):
@ -1381,7 +1408,7 @@ def _real_extract(self, url):
def fetch_page(page_idx):
return self._download_json(
'https://api.bilibili.com/x/series/archives',
playlist_id, note=f'Downloading page {page_idx}',
playlist_id, note=f'Downloading page {page_idx}', headers={'Referer': url},
query={'mid': mid, 'series_id': sid, 'pn': page_idx + 1, 'ps': 30})['data']
def get_metadata(page_data):
@ -1860,6 +1887,47 @@ def _real_extract(self, url):
ie=BiliBiliIE.ie_key(), video_id=video_id)
class BiliBiliDynamicIE(InfoExtractor):
_VALID_URL = r'https?://(?:t\.bilibili\.com|(?:www\.)?bilibili\.com/opus)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://t.bilibili.com/998134289197432852',
'info_dict': {
'id': 'BV1TAmBYVEJr',
'ext': 'mp4',
'uploader_id': '1192648858',
'comment_count': int,
'_old_archive_ids': ['bilibili 113457567568273_part1'],
'thumbnail': 'http://i2.hdslb.com/bfs/archive/50091efd965d9f13ff6814f7ad374f90ab21e77d.jpg',
'duration': 929.238,
'upload_date': '20241110',
'uploader': '何同学工作室',
'like_count': int,
'view_count': int,
'title': '美国小朋友就玩这个何同学工作室11月开箱',
'description': '本期产品信息:\n机器狗\n气味模拟器\nCloudboom Strike LS\n无弦吉他\n蓝牙磁带音箱\n神奇画板',
'timestamp': 1731232800,
'tags': list,
'chapters': list,
},
}]
def _real_extract(self, url):
post_id = self._match_id(url)
# Without the newer chrome UA, the API will return an error (-352)
post_data = self._download_json(
'https://api.bilibili.com/x/polymer/web-dynamic/v1/detail', post_id,
query={'id': post_id}, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
})
video_url = traverse_obj(post_data, (
'data', 'item', (None, 'orig'), 'modules', 'module_dynamic',
(('major', ('archive', 'pgc')), ('additional', ('reserve', 'common'))),
'jump_url', {url_or_none}, any, {self._proto_relative_url}))
if not video_url or (self.suitable(video_url) and post_id == self._match_id(video_url)):
raise ExtractorError('No valid video URL found', expected=True)
return self.url_result(video_url)
class BiliIntlBaseIE(InfoExtractor):
_API_URL = 'https://api.bilibili.tv/intl/gateway'
_NETRC_MACHINE = 'biliintl'

View file

@ -88,7 +88,7 @@ class BlueskyIE(InfoExtractor):
},
}, {
'url': 'https://bsky.app/profile/de1.pds.tentacle.expert/post/3l3w4tnezek2e',
'md5': '1af9c7fda061cf7593bbffca89e43d1c',
'md5': 'cc0110ed1f6b0247caac8234cc1e861d',
'info_dict': {
'id': '3l3w4tnezek2e',
'ext': 'mp4',
@ -133,6 +133,8 @@ class BlueskyIE(InfoExtractor):
'channel_follower_count': int,
'categories': ['Entertainment'],
'tags': [],
'chapters': list,
'heatmap': 'count:100',
},
'add_ie': ['Youtube'],
}, {
@ -184,14 +186,14 @@ class BlueskyIE(InfoExtractor):
},
},
}, {
'url': 'https://bsky.app/profile/alt.bun.how/post/3l7rdfxhyds2f',
'url': 'https://bsky.app/profile/cinny.bun.how/post/3l7rdfxhyds2f',
'md5': '8775118b235cf9fa6b5ad30f95cda75c',
'info_dict': {
'id': '3l7rdfxhyds2f',
'ext': 'mp4',
'uploader': 'cinnamon',
'uploader_id': 'alt.bun.how',
'uploader_url': 'https://bsky.app/profile/alt.bun.how',
'uploader_id': 'cinny.bun.how',
'uploader_url': 'https://bsky.app/profile/cinny.bun.how',
'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
'channel_url': 'https://bsky.app/profile/did:plc:7x6rtuenkuvxq3zsvffp2ide',
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
@ -284,17 +286,19 @@ def _get_service_endpoint(self, did, video_id):
services, ('service', lambda _, x: x['type'] == 'AtprotoPersonalDataServer',
'serviceEndpoint', {url_or_none}, any)) or 'https://bsky.social'
def _real_extract(self, url):
handle, video_id = self._match_valid_url(url).group('handle', 'id')
post = self._download_json(
def _extract_post(self, handle, post_id):
return self._download_json(
'https://public.api.bsky.app/xrpc/app.bsky.feed.getPostThread',
video_id, query={
'uri': f'at://{handle}/app.bsky.feed.post/{video_id}',
post_id, query={
'uri': f'at://{handle}/app.bsky.feed.post/{post_id}',
'depth': 0,
'parentHeight': 0,
})['thread']['post']
def _real_extract(self, url):
handle, video_id = self._match_valid_url(url).group('handle', 'id')
post = self._extract_post(handle, video_id)
entries = []
# app.bsky.embed.video.view/app.bsky.embed.external.view
entries.extend(self._extract_videos(post, video_id))
@ -341,6 +345,7 @@ def _extract_videos(self, root, video_id, embed_path='embed', record_path='recor
formats.append({
'format_id': 'blob',
'quality': 1,
'url': update_url_query(
self._BLOB_URL_TMPL.format(endpoint), {'did': did, 'cid': video_cid}),
**traverse_obj(root, (*embed_path, 'aspectRatio', {

View file

@ -31,6 +31,7 @@
update_url_query,
url_or_none,
)
from ..utils.traversal import traverse_obj
class BrightcoveLegacyIE(InfoExtractor):
@ -935,8 +936,8 @@ def extract_policy_key():
if content_type == 'playlist':
return self.playlist_result(
[self._parse_brightcove_metadata(vid, vid.get('id'), headers)
for vid in json_data.get('videos', []) if vid.get('id')],
(self._parse_brightcove_metadata(vid, vid['id'], headers)
for vid in traverse_obj(json_data, ('videos', lambda _, v: v['id']))),
json_data.get('id'), json_data.get('name'),
json_data.get('description'))

View file

@ -14,16 +14,18 @@
js_to_json,
mimetype2ext,
orderedSet,
parse_age_limit,
parse_iso8601,
replace_extension,
smuggle_url,
strip_or_none,
traverse_obj,
try_get,
unified_timestamp,
update_url,
url_basename,
url_or_none,
)
from ..utils.traversal import require, traverse_obj, trim_str
class CBCIE(InfoExtractor):
@ -516,9 +518,43 @@ def entries():
return self.playlist_result(entries(), playlist_id)
class CBCGemIE(InfoExtractor):
class CBCGemBaseIE(InfoExtractor):
_NETRC_MACHINE = 'cbcgem'
_GEO_COUNTRIES = ['CA']
def _call_show_api(self, item_id, display_id=None):
return self._download_json(
f'https://services.radio-canada.ca/ott/catalog/v2/gem/show/{item_id}',
display_id or item_id, query={'device': 'web'})
def _extract_item_info(self, item_info):
episode_number = None
title = traverse_obj(item_info, ('title', {str}))
if title and (mobj := re.match(r'(?P<episode>\d+)\. (?P<title>.+)', title)):
episode_number = int_or_none(mobj.group('episode'))
title = mobj.group('title')
return {
'episode_number': episode_number,
**traverse_obj(item_info, {
'id': ('url', {str}),
'episode_id': ('url', {str}),
'description': ('description', {str}),
'thumbnail': ('images', 'card', 'url', {url_or_none}, {update_url(query=None)}),
'episode_number': ('episodeNumber', {int_or_none}),
'duration': ('metadata', 'duration', {int_or_none}),
'release_timestamp': ('metadata', 'airDate', {unified_timestamp}),
'timestamp': ('metadata', 'availabilityDate', {unified_timestamp}),
'age_limit': ('metadata', 'rating', {trim_str(start='C')}, {parse_age_limit}),
}),
'episode': title,
'title': title,
}
class CBCGemIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s[0-9]+[a-z][0-9]+)'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s(?P<season>[0-9]+)[a-z][0-9]+)'
_TESTS = [{
# This is a normal, public, TV show video
'url': 'https://gem.cbc.ca/media/schitts-creek/s06e01',
@ -529,7 +565,7 @@ class CBCGemIE(InfoExtractor):
'description': 'md5:929868d20021c924020641769eb3e7f1',
'thumbnail': r're:https://images\.radio-canada\.ca/[^#?]+/cbc_schitts_creek_season_06e01_thumbnail_v01\.jpg',
'duration': 1324,
'categories': ['comedy'],
'genres': ['Comédie et humour'],
'series': 'Schitt\'s Creek',
'season': 'Season 6',
'season_number': 6,
@ -537,9 +573,10 @@ class CBCGemIE(InfoExtractor):
'episode_number': 1,
'episode_id': 'schitts-creek/s06e01',
'upload_date': '20210618',
'timestamp': 1623988800,
'timestamp': 1623974400,
'release_date': '20200107',
'release_timestamp': 1578427200,
'release_timestamp': 1578355200,
'age_limit': 14,
},
'params': {'format': 'bv'},
}, {
@ -557,12 +594,13 @@ class CBCGemIE(InfoExtractor):
'episode_number': 1,
'episode': 'The Cup Runneth Over',
'episode_id': 'schitts-creek/s01e01',
'duration': 1309,
'categories': ['comedy'],
'duration': 1308,
'genres': ['Comédie et humour'],
'upload_date': '20210617',
'timestamp': 1623902400,
'release_date': '20151124',
'release_timestamp': 1448323200,
'timestamp': 1623888000,
'release_date': '20151123',
'release_timestamp': 1448236800,
'age_limit': 14,
},
'params': {'format': 'bv'},
}, {
@ -570,9 +608,7 @@ class CBCGemIE(InfoExtractor):
'only_matching': True,
}]
_GEO_COUNTRIES = ['CA']
_TOKEN_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
_NETRC_MACHINE = 'cbcgem'
_claims_token = None
def _new_claims_token(self, email, password):
@ -634,10 +670,12 @@ def _real_initialize(self):
self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._download_json(
f'https://services.radio-canada.ca/ott/cbc-api/v2/assets/{video_id}',
video_id, expected_status=426)
video_id, season_number = self._match_valid_url(url).group('id', 'season')
video_info = self._call_show_api(video_id)
item_info = traverse_obj(video_info, (
'content', ..., 'lineups', ..., 'items',
lambda _, v: v['url'] == video_id, any, {require('item info')}))
media_id = item_info['idMedia']
email, password = self._get_login_info()
if email and password:
@ -645,7 +683,20 @@ def _real_extract(self, url):
headers = {'x-claims-token': claims_token}
else:
headers = {}
m3u8_info = self._download_json(video_info['playSession']['url'], video_id, headers=headers)
m3u8_info = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/',
video_id, headers=headers, query={
'appCode': 'gem',
'connectionType': 'hd',
'deviceType': 'ipad',
'multibitrate': 'true',
'output': 'json',
'tech': 'hls',
'manifestVersion': '2',
'manifestType': 'desktop',
'idMedia': media_id,
})
if m3u8_info.get('errorCode') == 1:
self.raise_geo_restricted(countries=['CA'])
@ -671,26 +722,20 @@ def _real_extract(self, url):
fmt['preference'] = -2
return {
'season_number': int_or_none(season_number),
**traverse_obj(video_info, {
'series': ('title', {str}),
'season_number': ('structuredMetadata', 'partofSeason', 'seasonNumber', {int_or_none}),
'genres': ('structuredMetadata', 'genre', ..., {str}),
}),
**self._extract_item_info(item_info),
'id': video_id,
'episode_id': video_id,
'formats': formats,
**traverse_obj(video_info, {
'title': ('title', {str}),
'episode': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('image', {url_or_none}),
'series': ('series', {str}),
'season_number': ('season', {int_or_none}),
'episode_number': ('episode', {int_or_none}),
'duration': ('duration', {int_or_none}),
'categories': ('category', {str}, all),
'release_timestamp': ('airDate', {int_or_none(scale=1000)}),
'timestamp': ('availableDate', {int_or_none(scale=1000)}),
}),
}
class CBCGemPlaylistIE(InfoExtractor):
class CBCGemPlaylistIE(CBCGemBaseIE):
IE_NAME = 'gem.cbc.ca:playlist'
_VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>(?P<show>[0-9a-z-]+)/s(?P<season>[0-9]+))/?(?:[?#]|$)'
_TESTS = [{
@ -700,70 +745,35 @@ class CBCGemPlaylistIE(InfoExtractor):
'info_dict': {
'id': 'schitts-creek/s06',
'title': 'Season 6',
'description': 'md5:6a92104a56cbeb5818cc47884d4326a2',
'series': 'Schitt\'s Creek',
'season_number': 6,
'season': 'Season 6',
'thumbnail': 'https://images.radio-canada.ca/v1/synps-cbc/season/perso/cbc_schitts_creek_season_06_carousel_v03.jpg?impolicy=ott&im=Resize=(_Size_)&quality=75',
},
}, {
'url': 'https://gem.cbc.ca/schitts-creek/s06',
'only_matching': True,
}]
_API_BASE = 'https://services.radio-canada.ca/ott/cbc-api/v2/shows/'
def _entries(self, season_info):
for episode in traverse_obj(season_info, ('items', lambda _, v: v['url'])):
yield self.url_result(
f'https://gem.cbc.ca/media/{episode["url"]}', CBCGemIE,
**self._extract_item_info(episode))
def _real_extract(self, url):
match = self._match_valid_url(url)
season_id = match.group('id')
show = match.group('show')
show_info = self._download_json(self._API_BASE + show, season_id, expected_status=426)
season = int(match.group('season'))
season_id, show, season = self._match_valid_url(url).group('id', 'show', 'season')
show_info = self._call_show_api(show, display_id=season_id)
season_info = traverse_obj(show_info, (
'content', ..., 'lineups',
lambda _, v: v['seasonNumber'] == int(season), any, {require('season info')}))
season_info = next((s for s in show_info['seasons'] if s.get('season') == season), None)
if season_info is None:
raise ExtractorError(f'Couldn\'t find season {season} of {show}')
episodes = []
for episode in season_info['assets']:
episodes.append({
'_type': 'url_transparent',
'ie_key': 'CBCGem',
'url': 'https://gem.cbc.ca/media/' + episode['id'],
'id': episode['id'],
'title': episode.get('title'),
'description': episode.get('description'),
'thumbnail': episode.get('image'),
'series': episode.get('series'),
'season_number': episode.get('season'),
'season': season_info['title'],
'season_id': season_info.get('id'),
'episode_number': episode.get('episode'),
'episode': episode.get('title'),
'episode_id': episode['id'],
'duration': episode.get('duration'),
'categories': [episode.get('category')],
})
thumbnail = None
tn_uri = season_info.get('image')
# the-national was observed to use a "data:image/png;base64"
# URI for their 'image' value. The image was 1x1, and is
# probably just a placeholder, so it is ignored.
if tn_uri is not None and not tn_uri.startswith('data:'):
thumbnail = tn_uri
return {
'_type': 'playlist',
'entries': episodes,
'id': season_id,
'title': season_info['title'],
'description': season_info.get('description'),
'thumbnail': thumbnail,
'series': show_info.get('title'),
'season_number': season_info.get('season'),
'season': season_info['title'],
}
return self.playlist_result(
self._entries(season_info), season_id,
**traverse_obj(season_info, {
'title': ('title', {str}),
'season': ('title', {str}),
'season_number': ('seasonNumber', {int_or_none}),
}), series=traverse_obj(show_info, ('title', {str})))
class CBCGemLiveIE(InfoExtractor):

View file

@ -2,7 +2,6 @@
import collections
import functools
import getpass
import hashlib
import http.client
import http.cookiejar
import http.cookies
@ -78,7 +77,6 @@
parse_iso8601,
parse_m3u8_attributes,
parse_resolution,
sanitize_filename,
sanitize_url,
smuggle_url,
str_or_none,
@ -100,6 +98,7 @@
xpath_text,
xpath_with_ns,
)
from ..utils._utils import _request_dump_filename
class InfoExtractor:
@ -201,6 +200,11 @@ class InfoExtractor:
fragment_base_url
* "duration" (optional, int or float)
* "filesize" (optional, int)
* hls_media_playlist_data
The M3U8 media playlist data as a string.
Only use if the data must be modified during extraction and
the native HLS downloader should bypass requesting the URL.
Does not apply if ffmpeg is used as external downloader
* is_from_start Is a live format that can be downloaded
from the start. Boolean
* preference Order number of this format. If this field is
@ -1017,23 +1021,6 @@ def __check_blocked(self, content):
'Visit http://blocklist.rkn.gov.ru/ for a block reason.',
expected=True)
def _request_dump_filename(self, url, video_id, data=None):
if data is not None:
data = hashlib.md5(data).hexdigest()
basen = join_nonempty(video_id, data, url, delim='_')
trim_length = self.get_param('trim_file_name') or 240
if len(basen) > trim_length:
h = '___' + hashlib.md5(basen.encode()).hexdigest()
basen = basen[:trim_length - len(h)] + h
filename = sanitize_filename(f'{basen}.dump', restricted=True)
# Working around MAX_PATH limitation on Windows (see
# http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx)
if os.name == 'nt':
absfilepath = os.path.abspath(filename)
if len(absfilepath) > 259:
filename = fR'\\?\{absfilepath}'
return filename
def __decode_webpage(self, webpage_bytes, encoding, headers):
if not encoding:
encoding = self._guess_encoding_from_content(headers.get('Content-Type', ''), webpage_bytes)
@ -1062,7 +1049,9 @@ def _webpage_read_content(self, urlh, url_or_request, video_id, note=None, errno
if self.get_param('write_pages'):
if isinstance(url_or_request, Request):
data = self._create_request(url_or_request, data).data
filename = self._request_dump_filename(urlh.url, video_id, data)
filename = _request_dump_filename(
urlh.url, video_id, data,
trim_length=self.get_param('trim_file_name'))
self.to_screen(f'Saving request to {filename}')
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
@ -1123,7 +1112,9 @@ def download_content(self, url_or_request, video_id, note=note, errnote=errnote,
impersonate=None, require_impersonation=False):
if self.get_param('load_pages'):
url_or_request = self._create_request(url_or_request, data, headers, query)
filename = self._request_dump_filename(url_or_request.url, video_id, url_or_request.data)
filename = _request_dump_filename(
url_or_request.url, video_id, url_or_request.data,
trim_length=self.get_param('trim_file_name'))
self.to_screen(f'Loading request from {filename}')
try:
with open(filename, 'rb') as dumpf:

View file

@ -1,692 +0,0 @@
import base64
import uuid
from .common import InfoExtractor
from ..networking import Request
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
float_or_none,
format_field,
int_or_none,
jwt_decode_hs256,
parse_age_limit,
parse_count,
parse_iso8601,
qualities,
time_seconds,
traverse_obj,
url_or_none,
urlencode_postdata,
)
class CrunchyrollBaseIE(InfoExtractor):
_BASE_URL = 'https://www.crunchyroll.com'
_API_BASE = 'https://api.crunchyroll.com'
_NETRC_MACHINE = 'crunchyroll'
_SWITCH_USER_AGENT = 'Crunchyroll/1.8.0 Nintendo Switch/12.3.12.0 UE4/4.27'
_REFRESH_TOKEN = None
_AUTH_HEADERS = None
_AUTH_EXPIRY = None
_API_ENDPOINT = None
_BASIC_AUTH = 'Basic ' + base64.b64encode(':'.join((
't-kdgp2h8c3jub8fn0fq',
'yfLDfMfrYvKXh4JXS1LEI2cCqu1v5Wan',
)).encode()).decode()
_IS_PREMIUM = None
_LOCALE_LOOKUP = {
'ar': 'ar-SA',
'de': 'de-DE',
'': 'en-US',
'es': 'es-419',
'es-es': 'es-ES',
'fr': 'fr-FR',
'it': 'it-IT',
'pt-br': 'pt-BR',
'pt-pt': 'pt-PT',
'ru': 'ru-RU',
'hi': 'hi-IN',
}
def _set_auth_info(self, response):
CrunchyrollBaseIE._IS_PREMIUM = 'cr_premium' in traverse_obj(response, ('access_token', {jwt_decode_hs256}, 'benefits', ...))
CrunchyrollBaseIE._AUTH_HEADERS = {'Authorization': response['token_type'] + ' ' + response['access_token']}
CrunchyrollBaseIE._AUTH_EXPIRY = time_seconds(seconds=traverse_obj(response, ('expires_in', {float_or_none}), default=300) - 10)
def _request_token(self, headers, data, note='Requesting token', errnote='Failed to request token'):
try:
return self._download_json(
f'{self._BASE_URL}/auth/v1/token', None, note=note, errnote=errnote,
headers=headers, data=urlencode_postdata(data), impersonate=True)
except ExtractorError as error:
if not isinstance(error.cause, HTTPError) or error.cause.status != 403:
raise
if target := error.cause.response.extensions.get('impersonate'):
raise ExtractorError(f'Got HTTP Error 403 when using impersonate target "{target}"')
raise ExtractorError(
'Request blocked by Cloudflare. '
'Install the required impersonation dependency if possible, '
'or else navigate to Crunchyroll in your browser, '
'then pass the fresh cookies (with --cookies-from-browser or --cookies) '
'and your browser\'s User-Agent (with --user-agent)', expected=True)
def _perform_login(self, username, password):
if not CrunchyrollBaseIE._REFRESH_TOKEN:
CrunchyrollBaseIE._REFRESH_TOKEN = self.cache.load(self._NETRC_MACHINE, username)
if CrunchyrollBaseIE._REFRESH_TOKEN:
return
try:
login_response = self._request_token(
headers={'Authorization': self._BASIC_AUTH}, data={
'username': username,
'password': password,
'grant_type': 'password',
'scope': 'offline_access',
}, note='Logging in', errnote='Failed to log in')
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 401:
raise ExtractorError('Invalid username and/or password', expected=True)
raise
CrunchyrollBaseIE._REFRESH_TOKEN = login_response['refresh_token']
self.cache.store(self._NETRC_MACHINE, username, CrunchyrollBaseIE._REFRESH_TOKEN)
self._set_auth_info(login_response)
def _update_auth(self):
if CrunchyrollBaseIE._AUTH_HEADERS and CrunchyrollBaseIE._AUTH_EXPIRY > time_seconds():
return
auth_headers = {'Authorization': self._BASIC_AUTH}
if CrunchyrollBaseIE._REFRESH_TOKEN:
data = {
'refresh_token': CrunchyrollBaseIE._REFRESH_TOKEN,
'grant_type': 'refresh_token',
'scope': 'offline_access',
}
else:
data = {'grant_type': 'client_id'}
auth_headers['ETP-Anonymous-ID'] = uuid.uuid4()
try:
auth_response = self._request_token(auth_headers, data)
except ExtractorError as error:
username, password = self._get_login_info()
if not username or not isinstance(error.cause, HTTPError) or error.cause.status != 400:
raise
self.to_screen('Refresh token has expired. Re-logging in')
CrunchyrollBaseIE._REFRESH_TOKEN = None
self.cache.store(self._NETRC_MACHINE, username, None)
self._perform_login(username, password)
return
self._set_auth_info(auth_response)
def _locale_from_language(self, language):
config_locale = self._configuration_arg('metadata', ie_key=CrunchyrollBetaIE, casesense=True)
return config_locale[0] if config_locale else self._LOCALE_LOOKUP.get(language)
def _call_base_api(self, endpoint, internal_id, lang, note=None, query={}):
self._update_auth()
if not endpoint.startswith('/'):
endpoint = f'/{endpoint}'
query = query.copy()
locale = self._locale_from_language(lang)
if locale:
query['locale'] = locale
return self._download_json(
f'{self._BASE_URL}{endpoint}', internal_id, note or f'Calling API: {endpoint}',
headers=CrunchyrollBaseIE._AUTH_HEADERS, query=query)
def _call_api(self, path, internal_id, lang, note='api', query={}):
if not path.startswith(f'/content/v2/{self._API_ENDPOINT}/'):
path = f'/content/v2/{self._API_ENDPOINT}/{path}'
try:
result = self._call_base_api(
path, internal_id, lang, f'Downloading {note} JSON ({self._API_ENDPOINT})', query=query)
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 404:
return None
raise
if not result:
raise ExtractorError(f'Unexpected response when downloading {note} JSON')
return result
def _extract_chapters(self, internal_id):
# if no skip events are available, a 403 xml error is returned
skip_events = self._download_json(
f'https://static.crunchyroll.com/skip-events/production/{internal_id}.json',
internal_id, note='Downloading chapter info', fatal=False, errnote=False)
if not skip_events:
return None
chapters = []
for event in ('recap', 'intro', 'credits', 'preview'):
start = traverse_obj(skip_events, (event, 'start', {float_or_none}))
end = traverse_obj(skip_events, (event, 'end', {float_or_none}))
# some chapters have no start and/or ending time, they will just be ignored
if start is None or end is None:
continue
chapters.append({'title': event.capitalize(), 'start_time': start, 'end_time': end})
return chapters
def _extract_stream(self, identifier, display_id=None):
if not display_id:
display_id = identifier
self._update_auth()
headers = {**CrunchyrollBaseIE._AUTH_HEADERS, 'User-Agent': self._SWITCH_USER_AGENT}
try:
stream_response = self._download_json(
f'https://cr-play-service.prd.crunchyrollsvc.com/v1/{identifier}/console/switch/play',
display_id, note='Downloading stream info', errnote='Failed to download stream info', headers=headers)
except ExtractorError as error:
if self.get_param('ignore_no_formats_error'):
self.report_warning(error.orig_msg)
return [], {}
elif isinstance(error.cause, HTTPError) and error.cause.status == 420:
raise ExtractorError(
'You have reached the rate-limit for active streams; try again later', expected=True)
raise
available_formats = {'': ('', '', stream_response['url'])}
for hardsub_lang, stream in traverse_obj(stream_response, ('hardSubs', {dict.items}, lambda _, v: v[1]['url'])):
available_formats[hardsub_lang] = (f'hardsub-{hardsub_lang}', hardsub_lang, stream['url'])
requested_hardsubs = [('' if val == 'none' else val) for val in (self._configuration_arg('hardsub') or ['none'])]
hardsub_langs = [lang for lang in available_formats if lang]
if hardsub_langs and 'all' not in requested_hardsubs:
full_format_langs = set(requested_hardsubs)
self.to_screen(f'Available hardsub languages: {", ".join(hardsub_langs)}')
self.to_screen(
'To extract formats of a hardsub language, use '
'"--extractor-args crunchyrollbeta:hardsub=<language_code or all>". '
'See https://github.com/yt-dlp/yt-dlp#crunchyrollbeta-crunchyroll for more info',
only_once=True)
else:
full_format_langs = set(map(str.lower, available_formats))
audio_locale = traverse_obj(stream_response, ('audioLocale', {str}))
hardsub_preference = qualities(requested_hardsubs[::-1])
formats, subtitles = [], {}
for format_id, hardsub_lang, stream_url in available_formats.values():
if hardsub_lang.lower() in full_format_langs:
adaptive_formats, dash_subs = self._extract_mpd_formats_and_subtitles(
stream_url, display_id, mpd_id=format_id, headers=CrunchyrollBaseIE._AUTH_HEADERS,
fatal=False, note=f'Downloading {f"{format_id} " if hardsub_lang else ""}MPD manifest')
self._merge_subtitles(dash_subs, target=subtitles)
else:
continue # XXX: Update this if meta mpd formats work; will be tricky with token invalidation
for f in adaptive_formats:
if f.get('acodec') != 'none':
f['language'] = audio_locale
f['quality'] = hardsub_preference(hardsub_lang.lower())
formats.extend(adaptive_formats)
for locale, subtitle in traverse_obj(stream_response, (('subtitles', 'captions'), {dict.items}, ...)):
subtitles.setdefault(locale, []).append(traverse_obj(subtitle, {'url': 'url', 'ext': 'format'}))
# Invalidate stream token to avoid rate-limit
error_msg = 'Unable to invalidate stream token; you may experience rate-limiting'
if stream_token := stream_response.get('token'):
self._request_webpage(Request(
f'https://cr-play-service.prd.crunchyrollsvc.com/v1/token/{identifier}/{stream_token}/inactive',
headers=headers, method='PATCH'), display_id, 'Invalidating stream token', error_msg, fatal=False)
else:
self.report_warning(error_msg)
return formats, subtitles
class CrunchyrollCmsBaseIE(CrunchyrollBaseIE):
_API_ENDPOINT = 'cms'
_CMS_EXPIRY = None
def _call_cms_api_signed(self, path, internal_id, lang, note='api'):
if not CrunchyrollCmsBaseIE._CMS_EXPIRY or CrunchyrollCmsBaseIE._CMS_EXPIRY <= time_seconds():
response = self._call_base_api('index/v2', None, lang, 'Retrieving signed policy')['cms_web']
CrunchyrollCmsBaseIE._CMS_QUERY = {
'Policy': response['policy'],
'Signature': response['signature'],
'Key-Pair-Id': response['key_pair_id'],
}
CrunchyrollCmsBaseIE._CMS_BUCKET = response['bucket']
CrunchyrollCmsBaseIE._CMS_EXPIRY = parse_iso8601(response['expires']) - 10
if not path.startswith('/cms/v2'):
path = f'/cms/v2{CrunchyrollCmsBaseIE._CMS_BUCKET}/{path}'
return self._call_base_api(
path, internal_id, lang, f'Downloading {note} JSON (signed cms)', query=CrunchyrollCmsBaseIE._CMS_QUERY)
class CrunchyrollBetaIE(CrunchyrollCmsBaseIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'''(?x)
https?://(?:beta\.|www\.)?crunchyroll\.com/
(?:(?P<lang>\w{2}(?:-\w{2})?)/)?
watch/(?!concert|musicvideo)(?P<id>\w+)'''
_TESTS = [{
# Premium only
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'info_dict': {
'id': 'GY2P1Q98Y',
'ext': 'mp4',
'duration': 1380.241,
'timestamp': 1459632600,
'description': 'md5:a022fbec4fbb023d43631032c91ed64b',
'title': 'World Trigger Episode 73 To the Future',
'upload_date': '20160402',
'series': 'World Trigger',
'series_id': 'GR757DMKY',
'season': 'World Trigger',
'season_id': 'GR9P39NJ6',
'season_number': 1,
'episode': 'To the Future',
'episode_number': 73,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'chapters': 'count:2',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {
'skip_download': 'm3u8',
'extractor_args': {'crunchyrollbeta': {'hardsub': ['de-DE']}},
'format': 'bv[format_id~=hardsub]',
},
}, {
# Premium only
'url': 'https://www.crunchyroll.com/watch/GYE5WKQGR',
'info_dict': {
'id': 'GYE5WKQGR',
'ext': 'mp4',
'duration': 366.459,
'timestamp': 1476788400,
'description': 'md5:74b67283ffddd75f6e224ca7dc031e76',
'title': 'SHELTER Porter Robinson presents Shelter the Animation',
'upload_date': '20161018',
'series': 'SHELTER',
'series_id': 'GYGG09WWY',
'season': 'SHELTER',
'season_id': 'GR09MGK4R',
'season_number': 1,
'episode': 'Porter Robinson presents Shelter the Animation',
'episode_number': 0,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {'skip_download': True},
}, {
'url': 'https://www.crunchyroll.com/watch/GJWU2VKK3/cherry-blossom-meeting-and-a-coming-blizzard',
'info_dict': {
'id': 'GJWU2VKK3',
'ext': 'mp4',
'duration': 1420.054,
'description': 'md5:2d1c67c0ec6ae514d9c30b0b99a625cd',
'title': 'The Ice Guy and His Cool Female Colleague Episode 1 Cherry Blossom Meeting and a Coming Blizzard',
'series': 'The Ice Guy and His Cool Female Colleague',
'series_id': 'GW4HM75NP',
'season': 'The Ice Guy and His Cool Female Colleague',
'season_id': 'GY9PC21VE',
'season_number': 1,
'episode': 'Cherry Blossom Meeting and a Coming Blizzard',
'episode_number': 1,
'chapters': 'count:2',
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'timestamp': 1672839000,
'upload_date': '20230104',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/GM8F313NQ',
'info_dict': {
'id': 'GM8F313NQ',
'ext': 'mp4',
'title': 'Garakowa -Restore the World-',
'description': 'md5:8d2f8b6b9dd77d87810882e7d2ee5608',
'duration': 3996.104,
'age_limit': 13,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/G62PEZ2E6',
'info_dict': {
'id': 'G62PEZ2E6',
'description': 'md5:8d2f8b6b9dd77d87810882e7d2ee5608',
'age_limit': 13,
'duration': 65.138,
'title': 'Garakowa -Restore the World-',
},
'playlist_mincount': 5,
}, {
'url': 'https://www.crunchyroll.com/de/watch/GY2P1Q98Y',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
'only_matching': True,
}]
# We want to support lazy playlist filtering and movie listings cannot be inside a playlist
_RETURN_TYPE = 'video'
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
# We need to use unsigned API call to allow ratings query string
response = traverse_obj(self._call_api(
f'objects/{internal_id}', internal_id, lang, 'object info', {'ratings': 'true'}), ('data', 0, {dict}))
if not response:
raise ExtractorError(f'No video with id {internal_id} could be found (possibly region locked?)', expected=True)
object_type = response.get('type')
if object_type == 'episode':
result = self._transform_episode_response(response)
elif object_type == 'movie':
result = self._transform_movie_response(response)
elif object_type == 'movie_listing':
first_movie_id = traverse_obj(response, ('movie_listing_metadata', 'first_movie_id'))
if not self._yes_playlist(internal_id, first_movie_id):
return self.url_result(f'{self._BASE_URL}/{lang}watch/{first_movie_id}', CrunchyrollBetaIE, first_movie_id)
def entries():
movies = self._call_api(f'movie_listings/{internal_id}/movies', internal_id, lang, 'movie list')
for movie_response in traverse_obj(movies, ('data', ...)):
yield self.url_result(
f'{self._BASE_URL}/{lang}watch/{movie_response["id"]}',
CrunchyrollBetaIE, **self._transform_movie_response(movie_response))
return self.playlist_result(entries(), **self._transform_movie_response(response))
else:
raise ExtractorError(f'Unknown object type {object_type}')
if not self._IS_PREMIUM and traverse_obj(response, (f'{object_type}_metadata', 'is_premium_only')):
message = f'This {object_type} is for premium members only'
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], result['subtitles'] = self._extract_stream(internal_id)
result['chapters'] = self._extract_chapters(internal_id)
def calculate_count(item):
return parse_count(''.join((item['displayed'], item.get('unit') or '')))
result.update(traverse_obj(response, ('rating', {
'like_count': ('up', {calculate_count}),
'dislike_count': ('down', {calculate_count}),
})))
return result
@staticmethod
def _transform_episode_response(data):
metadata = traverse_obj(data, (('episode_metadata', None), {dict}), get_all=False) or {}
return {
'id': data['id'],
'title': ' \u2013 '.join((
('{}{}'.format(
format_field(metadata, 'season_title'),
format_field(metadata, 'episode', ' Episode %s'))),
format_field(data, 'title'))),
**traverse_obj(data, {
'episode': ('title', {str}),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', 'thumbnail', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
**traverse_obj(metadata, {
'duration': ('duration_ms', {float_or_none(scale=1000)}),
'timestamp': ('upload_date', {parse_iso8601}),
'series': ('series_title', {str}),
'series_id': ('series_id', {str}),
'season': ('season_title', {str}),
'season_id': ('season_id', {str}),
'season_number': ('season_number', ({int}, {float_or_none})),
'episode_number': ('sequence_number', ({int}, {float_or_none})),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
'language': ('audio_locale', {str}),
}, get_all=False),
}
@staticmethod
def _transform_movie_response(data):
metadata = traverse_obj(data, (('movie_metadata', 'movie_listing_metadata', None), {dict}), get_all=False) or {}
return {
'id': data['id'],
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', 'thumbnail', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
**traverse_obj(metadata, {
'duration': ('duration_ms', {float_or_none(scale=1000)}),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
}),
}
class CrunchyrollBetaShowIE(CrunchyrollCmsBaseIE):
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'''(?x)
https?://(?:beta\.|www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
series/(?P<id>\w+)'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
'info_dict': {
'id': 'GY19NQ2QR',
'title': 'Girl Friend BETA',
'description': 'md5:99c1b22ee30a74b536a8277ced8eb750',
# XXX: `thumbnail` does not get set from `thumbnails` in playlist
# 'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'age_limit': 14,
},
'playlist_mincount': 10,
}, {
'url': 'https://beta.crunchyroll.com/it/series/GY19NQ2QR',
'only_matching': True,
}]
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
def entries():
seasons_response = self._call_cms_api_signed(f'seasons?series_id={internal_id}', internal_id, lang, 'seasons')
for season in traverse_obj(seasons_response, ('items', ..., {dict})):
episodes_response = self._call_cms_api_signed(
f'episodes?season_id={season["id"]}', season['id'], lang, 'episode list')
for episode_response in traverse_obj(episodes_response, ('items', ..., {dict})):
yield self.url_result(
f'{self._BASE_URL}/{lang}watch/{episode_response["id"]}',
CrunchyrollBetaIE, **CrunchyrollBetaIE._transform_episode_response(episode_response))
return self.playlist_result(
entries(), internal_id,
**traverse_obj(self._call_api(f'series/{internal_id}', internal_id, lang, 'series'), ('data', 0, {
'title': ('title', {str}),
'description': ('description', {lambda x: x.replace(r'\r\n', '\n')}),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
'thumbnails': ('images', ..., ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
})))
class CrunchyrollMusicIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:music'
_VALID_URL = r'''(?x)
https?://(?:www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
watch/(?P<type>concert|musicvideo)/(?P<id>\w+)'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79',
'info_dict': {
'ext': 'mp4',
'id': 'MV5B02C79',
'display_id': 'egaono-hana',
'title': 'Egaono Hana',
'track': 'Egaono Hana',
'artists': ['Goose house'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['J-Pop'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C',
'info_dict': {
'ext': 'mp4',
'id': 'MV88BB7F2C',
'display_id': 'crossing-field',
'title': 'Crossing Field',
'track': 'Crossing Field',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['Anime'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135',
'info_dict': {
'ext': 'mp4',
'id': 'MC2E2AC135',
'display_id': 'live-is-smile-always-364joker-at-yokohama-arena',
'title': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'track': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'description': 'md5:747444e7e6300907b7a43f0a0503072e',
'genres': ['J-Pop'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79/egaono-hana',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135/live-is-smile-always-364joker-at-yokohama-arena',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C/crossing-field',
'only_matching': True,
}]
_API_ENDPOINT = 'music'
def _real_extract(self, url):
lang, internal_id, object_type = self._match_valid_url(url).group('lang', 'id', 'type')
path, name = {
'concert': ('concerts', 'concert info'),
'musicvideo': ('music_videos', 'music video info'),
}[object_type]
response = traverse_obj(self._call_api(f'{path}/{internal_id}', internal_id, lang, name), ('data', 0, {dict}))
if not response:
raise ExtractorError(f'No video with id {internal_id} could be found (possibly region locked?)', expected=True)
result = self._transform_music_response(response)
if not self._IS_PREMIUM and response.get('isPremiumOnly'):
message = f'This {response.get("type") or "media"} is for premium members only'
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], _ = self._extract_stream(f'music/{internal_id}', internal_id)
return result
@staticmethod
def _transform_music_response(data):
return {
'id': data['id'],
**traverse_obj(data, {
'display_id': 'slug',
'title': 'title',
'track': 'title',
'artists': ('artist', 'name', all),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n') or None}),
'thumbnails': ('images', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
'genres': ('genres', ..., 'displayValue'),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
}),
}
class CrunchyrollArtistIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:artist'
_VALID_URL = r'''(?x)
https?://(?:www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
artist/(?P<id>\w{10})'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/artist/MA179CB50D',
'info_dict': {
'id': 'MA179CB50D',
'title': 'LiSA',
'genres': ['Anime', 'J-Pop', 'Rock'],
'description': 'md5:16d87de61a55c3f7d6c454b73285938e',
},
'playlist_mincount': 83,
}, {
'url': 'https://www.crunchyroll.com/artist/MA179CB50D/lisa',
'only_matching': True,
}]
_API_ENDPOINT = 'music'
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
response = traverse_obj(self._call_api(
f'artists/{internal_id}', internal_id, lang, 'artist info'), ('data', 0))
def entries():
for attribute, path in [('concerts', 'concert'), ('videos', 'musicvideo')]:
for internal_id in traverse_obj(response, (attribute, ...)):
yield self.url_result(f'{self._BASE_URL}/watch/{path}/{internal_id}', CrunchyrollMusicIE, internal_id)
return self.playlist_result(entries(), **self._transform_artist_response(response))
@staticmethod
def _transform_artist_response(data):
return {
'id': data['id'],
**traverse_obj(data, {
'title': 'name',
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
'genres': ('genres', ..., 'displayValue'),
}),
}

View file

@ -1,35 +1,40 @@
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
parse_qs,
smuggle_url,
str_or_none,
update_url_query,
)
from ..utils.traversal import traverse_obj
class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
IE_NAME = 'cwtv'
_VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch|guid)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{
'url': 'https://www.cwtv.com/shows/all-american-homecoming/ready-or-not/?play=d848488f-f62a-40fd-af1f-6440b1821aab',
'url': 'https://www.cwtv.com/shows/continuum/a-stitch-in-time/?play=9149a1e1-4cb2-46d7-81b2-47d35bbd332b',
'info_dict': {
'id': 'd848488f-f62a-40fd-af1f-6440b1821aab',
'id': '9149a1e1-4cb2-46d7-81b2-47d35bbd332b',
'ext': 'mp4',
'title': 'Ready Or Not',
'description': 'Simone is concerned about changes taking place at Bringston; JR makes a decision about his future.',
'thumbnail': r're:^https?://.*\.jpe?g$',
'duration': 2547,
'timestamp': 1720519200,
'title': 'A Stitch in Time',
'description': r're:(?s)City Protective Services officer Kiera Cameron is transported from 2077.+',
'thumbnail': r're:https?://.+\.jpe?g',
'duration': 2632,
'timestamp': 1736928000,
'uploader': 'CWTV',
'chapters': 'count:6',
'series': 'All American: Homecoming',
'season_number': 3,
'chapters': 'count:5',
'series': 'Continuum',
'season_number': 1,
'episode_number': 1,
'age_limit': 0,
'upload_date': '20240709',
'season': 'Season 3',
'age_limit': 14,
'upload_date': '20250115',
'season': 'Season 1',
'episode': 'Episode 1',
},
'params': {
@ -42,7 +47,7 @@ class CWTVIE(InfoExtractor):
'id': '6b15e985-9345-4f60-baf8-56e96be57c63',
'ext': 'mp4',
'title': 'Legends of Yesterday',
'description': 'Oliver and Barry Allen take Kendra Saunders and Carter Hall to a remote location to keep them hidden from Vandal Savage while they figure out how to defeat him.',
'description': r're:(?s)Oliver and Barry Allen take Kendra Saunders and Carter Hall to a remote.+',
'duration': 2665,
'series': 'Arrow',
'season_number': 4,
@ -71,7 +76,7 @@ class CWTVIE(InfoExtractor):
'timestamp': 1444107300,
'age_limit': 14,
'uploader': 'CWTV',
'thumbnail': r're:^https?://.*\.jpe?g$',
'thumbnail': r're:https?://.+\.jpe?g',
'chapters': 'count:4',
'episode': 'Episode 20',
'season': 'Season 11',
@ -89,14 +94,17 @@ class CWTVIE(InfoExtractor):
}, {
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?watch=6b15e985-9345-4f60-baf8-56e96be57c63',
'only_matching': True,
}, {
'url': 'http://www.cwtv.com/movies/play/?guid=0a8e8b5b-1356-41d5-9a6a-4eda1a6feb6c',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
f'https://images.cwtv.com/feed/mobileapp/video-meta/apiversion_12/guid_{video_id}', video_id)
if data.get('result') != 'ok':
raise ExtractorError(data['msg'], expected=True)
f'https://images.cwtv.com/feed/app-2/video-meta/apiversion_22/device_android/guid_{video_id}', video_id)
if traverse_obj(data, 'result') != 'ok':
raise ExtractorError(traverse_obj(data, (('error_msg', 'msg'), {str}, any)), expected=True)
video_data = data['video']
title = video_data['title']
mpx_url = update_url_query(
@ -123,3 +131,50 @@ def _real_extract(self, url):
'ie_key': 'ThePlatform',
'thumbnail': video_data.get('large_thumbnail'),
}
class CWTVMovieIE(InfoExtractor):
IE_NAME = 'cwtv:movie'
_VALID_URL = r'https?://(?:www\.)?cwtv\.com/shows/(?P<id>[\w-]+)/?\?(?:[^#]+&)?viewContext=Movies'
_TESTS = [{
'url': 'https://www.cwtv.com/shows/the-crush/?viewContext=Movies+Swimlane',
'info_dict': {
'id': '0a8e8b5b-1356-41d5-9a6a-4eda1a6feb6c',
'ext': 'mp4',
'title': 'The Crush',
'upload_date': '20241112',
'description': 'md5:1549acd90dff4a8273acd7284458363e',
'chapters': 'count:9',
'timestamp': 1731398400,
'age_limit': 16,
'duration': 5337,
'series': 'The Crush',
'season': 'Season 1',
'uploader': 'CWTV',
'season_number': 1,
'episode': 'Episode 1',
'episode_number': 1,
'thumbnail': r're:https?://.+\.jpe?g',
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
_UUID_RE = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
app_url = (
self._html_search_meta('al:ios:url', webpage, default=None)
or self._html_search_meta('al:android:url', webpage, default=None))
video_id = (
traverse_obj(parse_qs(app_url), ('video_id', 0, {lambda x: re.fullmatch(self._UUID_RE, x)}, 0))
or self._search_regex([
rf'CWTV\.Site\.curPlayingGUID\s*=\s*["\']({self._UUID_RE})',
rf'CWTV\.Site\.viewInAppURL\s*=\s*["\']/shows/[\w-]+/watch-in-app/\?play=({self._UUID_RE})',
], webpage, 'video ID'))
return self.url_result(
f'https://www.cwtv.com/shows/{display_id}/{display_id}/?play={video_id}', CWTVIE, video_id)

View file

@ -0,0 +1,130 @@
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import clean_html, int_or_none, traverse_obj, url_or_none, urlencode_postdata
class DigiviewIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ladigitale\.dev/digiview/#/v/(?P<id>[0-9a-f]+)'
_TESTS = [{
# normal video
'url': 'https://ladigitale.dev/digiview/#/v/67a8e50aee2ec',
'info_dict': {
'id': '67a8e50aee2ec',
'ext': 'mp4',
'title': 'Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film',
'thumbnail': 'https://i.ytimg.com/vi/aqz-KE-bpKQ/hqdefault.jpg',
'upload_date': '20141110',
'playable_in_embed': True,
'duration': 635,
'view_count': int,
'comment_count': int,
'channel': 'Blender',
'license': 'Creative Commons Attribution license (reuse allowed)',
'like_count': int,
'tags': 'count:8',
'live_status': 'not_live',
'channel_id': 'UCSMOQeBJ2RAnuFungnQOxLg',
'channel_follower_count': int,
'channel_url': 'https://www.youtube.com/channel/UCSMOQeBJ2RAnuFungnQOxLg',
'uploader_id': '@BlenderOfficial',
'description': 'md5:8f3ed18a53a1bb36cbb3b70a15782fd0',
'categories': ['Film & Animation'],
'channel_is_verified': True,
'heatmap': 'count:100',
'section_end': 635,
'uploader': 'Blender',
'timestamp': 1415628355,
'uploader_url': 'https://www.youtube.com/@BlenderOfficial',
'age_limit': 0,
'section_start': 0,
'availability': 'public',
},
}, {
# cut video
'url': 'https://ladigitale.dev/digiview/#/v/67a8e51d0dd58',
'info_dict': {
'id': '67a8e51d0dd58',
'ext': 'mp4',
'title': 'Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film',
'thumbnail': 'https://i.ytimg.com/vi/aqz-KE-bpKQ/hqdefault.jpg',
'upload_date': '20141110',
'playable_in_embed': True,
'duration': 5,
'view_count': int,
'comment_count': int,
'channel': 'Blender',
'license': 'Creative Commons Attribution license (reuse allowed)',
'like_count': int,
'tags': 'count:8',
'live_status': 'not_live',
'channel_id': 'UCSMOQeBJ2RAnuFungnQOxLg',
'channel_follower_count': int,
'channel_url': 'https://www.youtube.com/channel/UCSMOQeBJ2RAnuFungnQOxLg',
'uploader_id': '@BlenderOfficial',
'description': 'md5:8f3ed18a53a1bb36cbb3b70a15782fd0',
'categories': ['Film & Animation'],
'channel_is_verified': True,
'heatmap': 'count:100',
'section_end': 10,
'uploader': 'Blender',
'timestamp': 1415628355,
'uploader_url': 'https://www.youtube.com/@BlenderOfficial',
'age_limit': 0,
'section_start': 5,
'availability': 'public',
},
}, {
# changed title
'url': 'https://ladigitale.dev/digiview/#/v/67a8ea5644d7a',
'info_dict': {
'id': '67a8ea5644d7a',
'ext': 'mp4',
'title': 'Big Buck Bunny (with title changed)',
'thumbnail': 'https://i.ytimg.com/vi/aqz-KE-bpKQ/hqdefault.jpg',
'upload_date': '20141110',
'playable_in_embed': True,
'duration': 5,
'view_count': int,
'comment_count': int,
'channel': 'Blender',
'license': 'Creative Commons Attribution license (reuse allowed)',
'like_count': int,
'tags': 'count:8',
'live_status': 'not_live',
'channel_id': 'UCSMOQeBJ2RAnuFungnQOxLg',
'channel_follower_count': int,
'channel_url': 'https://www.youtube.com/channel/UCSMOQeBJ2RAnuFungnQOxLg',
'uploader_id': '@BlenderOfficial',
'description': 'md5:8f3ed18a53a1bb36cbb3b70a15782fd0',
'categories': ['Film & Animation'],
'channel_is_verified': True,
'heatmap': 'count:100',
'section_end': 15,
'uploader': 'Blender',
'timestamp': 1415628355,
'uploader_url': 'https://www.youtube.com/@BlenderOfficial',
'age_limit': 0,
'section_start': 10,
'availability': 'public',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'https://ladigitale.dev/digiview/inc/recuperer_video.php', video_id,
data=urlencode_postdata({'id': video_id}))
clip_id = video_data['videoId']
return self.url_result(
f'https://www.youtube.com/watch?v={clip_id}',
YoutubeIE, video_id, url_transparent=True,
**traverse_obj(video_data, {
'section_start': ('debut', {int_or_none}),
'section_end': ('fin', {int_or_none}),
'description': ('description', {clean_html}, filter),
'title': ('titre', {str}),
'thumbnail': ('vignette', {url_or_none}),
'view_count': ('vues', {int_or_none}),
}),
)

View file

@ -1,10 +1,24 @@
from .zdf import ZDFIE
from .zdf import ZDFBaseIE
class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
class DreiSatIE(ZDFBaseIE):
IE_NAME = '3sat'
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
_TESTS = [{
'url': 'https://www.3sat.de/dokumentation/reise/traumziele-suedostasiens-die-philippinen-und-vietnam-102.html',
'info_dict': {
'id': '231124_traumziele_philippinen_und_vietnam_dokreise',
'ext': 'mp4',
'title': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam',
'description': 'md5:26329ce5197775b596773b939354079d',
'duration': 2625.0,
'thumbnail': 'https://www.3sat.de/assets/traumziele-suedostasiens-die-philippinen-und-vietnam-100~2400x1350?cb=1699870351148',
'episode': 'Traumziele Südostasiens (1/2): Die Philippinen und Vietnam',
'episode_id': 'POS_cc7ff51c-98cf-4d12-b99d-f7a551de1c95',
'timestamp': 1738593000,
'upload_date': '20250203',
},
}, {
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html',
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
@ -17,6 +31,7 @@ class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
'timestamp': 1608604200,
'upload_date': '20201222',
},
'skip': '410 Gone',
}, {
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html',
'info_dict': {
@ -30,6 +45,7 @@ class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
'params': {
'skip_download': True,
},
'skip': '404 Not Found',
}, {
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
@ -39,3 +55,14 @@ class DreiSatIE(ZDFIE): # XXX: Do not subclass from concrete IE
'url': 'https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, fatal=False)
if webpage:
player = self._extract_player(webpage, url, fatal=False)
if player:
return self._extract_regular(url, player, video_id)
return self._extract_mobile(video_id)

View file

@ -82,7 +82,7 @@ def _real_extract(self, url):
has_anonymous_download = self._search_regex(
r'(anonymous:\tanonymous)', part, 'anonymous', default=False)
transcode_url = self._search_regex(
r'\n.(https://[^\x03\x08\x12\n]+\.m3u8)', part, 'transcode url', default=None)
r'\n.?(https://[^\x03\x08\x12\n]+\.m3u8)', part, 'transcode url', default=None)
if not transcode_url:
continue
formats, subtitles = self._extract_m3u8_formats_and_subtitles(transcode_url, video_id, 'mp4')

View file

@ -135,7 +135,7 @@ def _real_extract(self, url):
self.raise_login_required(method='any')
raise ExtractorError(login_err, expected=True)
embed_url = self._search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
embed_url = self._html_search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
thumbnail = self._og_search_thumbnail(webpage)
watch_info = get_element_by_id('watch-info', webpage) or ''

View file

@ -0,0 +1,51 @@
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import url_or_none
from ..utils.traversal import traverse_obj
class DrTalksIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?drtalks\.com/videos/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://drtalks.com/videos/six-pillars-of-resilience-tools-for-managing-stress-and-flourishing/',
'info_dict': {
'id': '6366193757112',
'ext': 'mp4',
'uploader_id': '6314452011001',
'tags': ['resilience'],
'description': 'md5:9c6805aee237ee6de8052461855b9dda',
'timestamp': 1734546659,
'thumbnail': 'https://drtalks.com/wp-content/uploads/2024/12/Episode-82-Eva-Selhub-DrTalks-Thumbs.jpg',
'title': 'Six Pillars of Resilience: Tools for Managing Stress and Flourishing',
'duration': 2800.682,
'upload_date': '20241218',
},
}, {
'url': 'https://drtalks.com/videos/the-pcos-puzzle-mastering-metabolic-health-with-marcelle-pick/',
'info_dict': {
'id': '6364699891112',
'ext': 'mp4',
'title': 'The PCOS Puzzle: Mastering Metabolic Health with Marcelle Pick',
'description': 'md5:e87cbe00ca50135d5702787fc4043aaa',
'thumbnail': 'https://drtalks.com/wp-content/uploads/2024/11/Episode-34-Marcelle-Pick-OBGYN-NP-DrTalks.jpg',
'duration': 3515.2,
'tags': ['pcos'],
'upload_date': '20241114',
'timestamp': 1731592119,
'uploader_id': '6314452011001',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
next_data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['data']['video']
return self.url_result(
next_data['videos']['brightcoveVideoLink'], BrightcoveNewIE, video_id,
url_transparent=True,
**traverse_obj(next_data, {
'title': ('title', {str}),
'description': ('videos', 'summury', {str}),
'thumbnail': ('featuredImage', 'node', 'sourceUrl', {url_or_none}),
}))

View file

@ -162,7 +162,7 @@ def _real_extract(self, url):
items = re.findall(r'(?s)playlist\.push\(({.+?})\);', webpage)
if items:
return self.playlist_result(
[self._parse_video_metadata(i, video_id, timestamp) for i in items],
(self._parse_video_metadata(i, video_id, timestamp) for i in items),
video_id, self._html_search_meta('twitter:title', webpage))
item = self._search_regex(

155
yt_dlp/extractor/eggs.py Normal file
View file

@ -0,0 +1,155 @@
import secrets
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
int_or_none,
parse_iso8601,
str_or_none,
url_or_none,
)
from ..utils.traversal import traverse_obj
class EggsBaseIE(InfoExtractor):
_API_HEADERS = {
'Accept': '*/*',
'apVersion': '8.2.00',
'deviceName': 'Android',
}
def _real_initialize(self):
self._API_HEADERS['deviceId'] = secrets.token_hex(8)
def _call_api(self, endpoint, video_id):
return self._download_json(
f'https://app-front-api.eggs.mu/v1/{endpoint}', video_id,
headers=self._API_HEADERS)
def _extract_music_info(self, data):
if yt_url := traverse_obj(data, ('youtubeUrl', {url_or_none})):
return self.url_result(yt_url, ie=YoutubeIE)
artist_name = traverse_obj(data, ('artist', 'artistName', {str_or_none}))
music_id = traverse_obj(data, ('musicId', {str_or_none}))
webpage_url = None
if artist_name and music_id:
webpage_url = f'https://eggs.mu/artist/{artist_name}/song/{music_id}'
return {
'id': music_id,
'vcodec': 'none',
'webpage_url': webpage_url,
'extractor_key': EggsIE.ie_key(),
'extractor': EggsIE.IE_NAME,
**traverse_obj(data, {
'title': ('musicTitle', {str}),
'url': ('musicDataPath', {url_or_none}),
'uploader': ('artist', 'displayName', {str}),
'uploader_id': ('artist', 'artistId', {str_or_none}),
'thumbnail': ('imageDataPath', {url_or_none}),
'view_count': ('numberOfMusicPlays', {int_or_none}),
'like_count': ('numberOfLikes', {int_or_none}),
'comment_count': ('numberOfComments', {int_or_none}),
'composers': ('composer', {str}, all),
'tags': ('tags', ..., {str}),
'timestamp': ('releaseDate', {parse_iso8601}),
'artist': ('artist', 'displayName', {str}),
})}
class EggsIE(EggsBaseIE):
IE_NAME = 'eggs:single'
_VALID_URL = r'https?://eggs\.mu/artist/[^/?#]+/song/(?P<id>[\da-f-]+)'
_TESTS = [{
'url': 'https://eggs.mu/artist/32_sunny_girl/song/0e95fd1d-4d61-4d5b-8b18-6092c551da90',
'info_dict': {
'id': '0e95fd1d-4d61-4d5b-8b18-6092c551da90',
'ext': 'm4a',
'title': 'シネマと信号',
'uploader': 'Sunny Girl',
'thumbnail': r're:https?://.*\.jpg(?:\?.*)?$',
'uploader_id': '1607',
'like_count': int,
'timestamp': 1731327327,
'composers': ['橘高連太郎'],
'view_count': int,
'comment_count': int,
'artists': ['Sunny Girl'],
'upload_date': '20241111',
'tags': ['SunnyGirl', 'シネマと信号'],
},
}, {
'url': 'https://eggs.mu/artist/KAMO_3pband/song/1d4bc45f-1af6-47a9-8b30-a70cae350b4f',
'info_dict': {
'id': '80cLKA2wnoA',
'ext': 'mp4',
'title': 'KAMO「いい女だから」Audio',
'uploader': 'KAMO',
'live_status': 'not_live',
'channel_id': 'UCsHLBw2__5Q9y55skXPotOg',
'channel_follower_count': int,
'description': 'md5:d260da711ecbec3e720293dc11401b87',
'availability': 'public',
'uploader_id': '@KAMO_band',
'upload_date': '20240925',
'thumbnail': 'https://i.ytimg.com/vi/80cLKA2wnoA/maxresdefault.jpg',
'comment_count': int,
'channel_url': 'https://www.youtube.com/channel/UCsHLBw2__5Q9y55skXPotOg',
'view_count': int,
'duration': 151,
'like_count': int,
'channel': 'KAMO',
'playable_in_embed': True,
'uploader_url': 'https://www.youtube.com/@KAMO_band',
'tags': [],
'timestamp': 1727271121,
'age_limit': 0,
'categories': ['People & Blogs'],
},
'add_ie': ['Youtube'],
'params': {'skip_download': 'Youtube'},
}]
def _real_extract(self, url):
song_id = self._match_id(url)
json_data = self._call_api(f'musics/{song_id}', song_id)
return self._extract_music_info(json_data)
class EggsArtistIE(EggsBaseIE):
IE_NAME = 'eggs:artist'
_VALID_URL = r'https?://eggs\.mu/artist/(?P<id>\w+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://eggs.mu/artist/32_sunny_girl',
'info_dict': {
'id': '32_sunny_girl',
'thumbnail': 'https://image-pro.eggs.mu/profile/1607.jpeg?updated_at=2024-04-03T20%3A06%3A00%2B09%3A00',
'description': 'Muddy Mine / 東京高田馬場CLUB PHASE / Gt.Vo 橘高 連太郎 / Ba.Cho 小野 ゆうき / Dr 大森 りゅうひこ',
'title': 'Sunny Girl',
},
'playlist_mincount': 18,
}, {
'url': 'https://eggs.mu/artist/KAMO_3pband',
'info_dict': {
'id': 'KAMO_3pband',
'description': '川崎発3ピースバンド',
'thumbnail': 'https://image-pro.eggs.mu/profile/35217.jpeg?updated_at=2024-11-27T16%3A31%3A50%2B09%3A00',
'title': 'KAMO',
},
'playlist_mincount': 2,
}]
def _real_extract(self, url):
artist_id = self._match_id(url)
artist_data = self._call_api(f'artists/{artist_id}', artist_id)
song_data = self._call_api(f'artists/{artist_id}/musics', artist_id)
return self.playlist_result(
traverse_obj(song_data, ('data', ..., {dict}, {self._extract_music_info})),
playlist_id=artist_id, **traverse_obj(artist_data, {
'title': ('displayName', {str}),
'description': ('profile', {str}),
'thumbnail': ('imageDataPath', {url_or_none}),
}))

View file

@ -12,7 +12,7 @@
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?:sport)?1tv\.ru/(?:[^/?#]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# single format
@ -52,6 +52,9 @@ class FirstTVIE(InfoExtractor):
}, {
'url': 'http://www.1tv.ru/shows/tochvtoch-supersezon/vystupleniya/evgeniy-dyatlov-vladimir-vysockiy-koni-priveredlivye-toch-v-toch-supersezon-fragment-vypuska-ot-06-11-2016',
'only_matching': True,
}, {
'url': 'https://www.sport1tv.ru/sport/chempionat-rossii-po-figurnomu-kataniyu-2025',
'only_matching': True,
}]
def _real_extract(self, url):

View file

@ -1,3 +1,4 @@
import json
import re
import urllib.parse
@ -5,8 +6,10 @@
from .dailymotion import DailymotionIE
from ..networking import HEADRequest
from ..utils import (
ExtractorError,
clean_html,
determine_ext,
extract_attributes,
filter_dict,
format_field,
int_or_none,
@ -16,7 +19,7 @@
unsmuggle_url,
url_or_none,
)
from ..utils.traversal import traverse_obj
from ..utils.traversal import find_element, traverse_obj
class FranceTVBaseInfoExtractor(InfoExtractor):
@ -29,6 +32,7 @@ def _make_url_result(self, video_id, url=None):
class FranceTVIE(InfoExtractor):
IE_NAME = 'francetv'
_VALID_URL = r'francetv:(?P<id>[^@#]+)'
_GEO_COUNTRIES = ['FR']
_GEO_BYPASS = False
@ -248,18 +252,19 @@ def _real_extract(self, url):
class FranceTVSiteIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetv:site'
_VALID_URL = r'https?://(?:(?:www\.)?france\.tv|mobile\.france\.tv)/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'https://www.france.tv/france-2/13h15-le-dimanche/140921-les-mysteres-de-jesus.html',
'info_dict': {
'id': 'c5bda21d-2c6f-4470-8849-3d8327adb2ba',
'id': 'ec217ecc-0733-48cf-ac06-af1347b849d1', # old: c5bda21d-2c6f-4470-8849-3d8327adb2ba'
'ext': 'mp4',
'title': '13h15, le dimanche... - Les mystères de Jésus',
'timestamp': 1514118300,
'duration': 2880,
'timestamp': 1502623500,
'duration': 2580,
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20171224',
'upload_date': '20170813',
},
'params': {
'skip_download': True,
@ -282,6 +287,7 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1441,
},
'skip': 'No longer available',
}, {
# geo-restricted livestream (workflow == 'token-akamai')
'url': 'https://www.france.tv/france-4/direct.html',
@ -336,19 +342,33 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
'only_matching': True,
}]
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.goplay
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?:data-main-video\s*=|videoId["\']?\s*[:=])\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'video id', default=None, group='id')
nextjs_data = traverse_obj(
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {json.loads}, ..., {self._find_json}, ..., 'children', ..., ..., 'children', ..., ..., 'children'))
if traverse_obj(nextjs_data, (..., ..., 'children', ..., 'isLive', {bool}, any)):
# For livestreams we need the id of the stream instead of the currently airing episode id
video_id = traverse_obj(nextjs_data, (
..., ..., 'children', ..., 'children', ..., 'children', ..., 'children', ..., ...,
'children', ..., ..., 'children', ..., ..., 'children', (..., (..., ...)),
'options', 'id', {str}, any))
else:
video_id = traverse_obj(nextjs_data, (
..., ..., ..., 'children',
lambda _, v: v['video']['url'] == urllib.parse.urlparse(url).path,
'video', ('playerReplayId', 'siId'), {str}, any))
if not video_id:
video_id = self._html_search_regex(
r'(?:href=|player\.setVideo\(\s*)"http://videos?\.francetv\.fr/video/([^@"]+@[^"]+)"',
webpage, 'video ID')
raise ExtractorError('Unable to extract video ID')
return self._make_url_result(video_id, url=url)
@ -441,11 +461,16 @@ def _real_extract(self, url):
self.url_result(dailymotion_url, DailymotionIE.ie_key())
for dailymotion_url in dailymotion_urls])
video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id')
video_id = (
traverse_obj(webpage, (
{find_element(tag='button', attr='data-cy', value='francetv-player-wrapper', html=True)},
{extract_attributes}, 'id'))
or self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id')
)
return self._make_url_result(video_id, url=url)

View file

@ -1,349 +0,0 @@
import random
import re
import string
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
join_nonempty,
js_to_json,
make_archive_id,
orderedSet,
qualities,
str_or_none,
traverse_obj,
try_get,
urlencode_postdata,
)
class FunimationBaseIE(InfoExtractor):
_NETRC_MACHINE = 'funimation'
_REGION = None
_TOKEN = None
def _get_region(self):
region_cookie = self._get_cookies('https://www.funimation.com').get('region')
region = region_cookie.value if region_cookie else self.get_param('geo_bypass_country')
return region or traverse_obj(
self._download_json(
'https://geo-service.prd.funimationsvc.com/geo/v1/region/check', None, fatal=False,
note='Checking geo-location', errnote='Unable to fetch geo-location information'),
'region') or 'US'
def _perform_login(self, username, password):
if self._TOKEN:
return
try:
data = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/auth/login/',
None, 'Logging in', data=urlencode_postdata({
'username': username,
'password': password,
}))
FunimationBaseIE._TOKEN = data['token']
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 401:
error = self._parse_json(e.cause.response.read().decode(), None)['error']
raise ExtractorError(error, expected=True)
raise
class FunimationPageIE(FunimationBaseIE):
IE_NAME = 'funimation:page'
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:(?P<lang>[^/]+)/)?(?:shows|v)/(?P<show>[^/]+)/(?P<episode>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funimation.com/shows/attack-on-titan-junior-high/broadcast-dub-preview/',
'info_dict': {
'id': '210050',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
# Other metadata is tested in FunimationIE
},
'params': {
'skip_download': 'm3u8',
},
'add_ie': ['Funimation'],
}, {
# Not available in US
'url': 'https://www.funimation.com/shows/hacksign/role-play/',
'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}, {
'url': 'https://www.funimation.com/v/a-certain-scientific-railgun/super-powered-level-5',
'only_matching': True,
}]
def _real_initialize(self):
if not self._REGION:
FunimationBaseIE._REGION = self._get_region()
def _real_extract(self, url):
locale, show, episode = self._match_valid_url(url).group('lang', 'show', 'episode')
video_id = traverse_obj(self._download_json(
f'https://title-api.prd.funimationsvc.com/v1/shows/{show}/episodes/{episode}',
f'{show}_{episode}', query={
'deviceType': 'web',
'region': self._REGION,
'locale': locale or 'en',
}), ('videoList', ..., 'id'), get_all=False)
return self.url_result(f'https://www.funimation.com/player/{video_id}', FunimationIE.ie_key(), video_id)
class FunimationIE(FunimationBaseIE):
_VALID_URL = r'https?://(?:www\.)?funimation\.com/player/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210050',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 155,
},
'params': {
'skip_download': 'm3u8',
},
}, {
'note': 'player_id should be extracted with the relevent compat-opt',
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210051',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 155,
},
'params': {
'skip_download': 'm3u8',
'compat_opts': ['seperate-video-versions'],
},
}]
@staticmethod
def _get_experiences(episode):
for lang, lang_data in episode.get('languages', {}).items():
for video_data in lang_data.values():
for version, f in video_data.items():
yield lang, version.title(), f
def _get_episode(self, webpage, experience_id=None, episode_id=None, fatal=True):
""" Extract the episode, season and show objects given either episode/experience id """
show = self._parse_json(
self._search_regex(
r'show\s*=\s*({.+?})\s*;', webpage, 'show data', fatal=fatal),
experience_id, transform_source=js_to_json, fatal=fatal) or []
for season in show.get('seasons', []):
for episode in season.get('episodes', []):
if episode_id is not None:
if str(episode.get('episodePk')) == episode_id:
return episode, season, show
continue
for _, _, f in self._get_experiences(episode):
if f.get('experienceId') == experience_id:
return episode, season, show
if fatal:
raise ExtractorError('Unable to find episode information')
else:
self.report_warning('Unable to find episode information')
return {}, {}, {}
def _real_extract(self, url):
initial_experience_id = self._match_id(url)
webpage = self._download_webpage(
url, initial_experience_id, note=f'Downloading player webpage for {initial_experience_id}')
episode, season, show = self._get_episode(webpage, experience_id=int(initial_experience_id))
episode_id = str(episode['episodePk'])
display_id = episode.get('slug') or episode_id
formats, subtitles, thumbnails, duration = [], {}, [], 0
requested_languages, requested_versions = self._configuration_arg('language'), self._configuration_arg('version')
language_preference = qualities((requested_languages or [''])[::-1])
source_preference = qualities((requested_versions or ['uncut', 'simulcast'])[::-1])
only_initial_experience = 'seperate-video-versions' in self.get_param('compat_opts', [])
for lang, version, fmt in self._get_experiences(episode):
experience_id = str(fmt['experienceId'])
if ((only_initial_experience and experience_id != initial_experience_id)
or (requested_languages and lang.lower() not in requested_languages)
or (requested_versions and version.lower() not in requested_versions)):
continue
thumbnails.append({'url': fmt.get('poster')})
duration = max(duration, fmt.get('duration', 0))
format_name = f'{version} {lang} ({experience_id})'
self.extract_subtitles(
subtitles, experience_id, display_id=display_id, format_name=format_name,
episode=episode if experience_id == initial_experience_id else episode_id)
headers = {}
if self._TOKEN:
headers['Authorization'] = f'Token {self._TOKEN}'
page = self._download_json(
f'https://www.funimation.com/api/showexperience/{experience_id}/',
display_id, headers=headers, expected_status=403, query={
'pinst_id': ''.join(random.choices(string.digits + string.ascii_letters, k=8)),
}, note=f'Downloading {format_name} JSON')
sources = page.get('items') or []
if not sources:
error = try_get(page, lambda x: x['errors'][0], dict)
if error:
self.report_warning('{} said: Error {} - {}'.format(
self.IE_NAME, error.get('code'), error.get('detail') or error.get('title')))
else:
self.report_warning('No sources found for format')
current_formats = []
for source in sources:
source_url = source.get('src')
source_type = source.get('videoType') or determine_ext(source_url)
if source_type == 'm3u8':
current_formats.extend(self._extract_m3u8_formats(
source_url, display_id, 'mp4', m3u8_id='{}-{}'.format(experience_id, 'hls'), fatal=False,
note=f'Downloading {format_name} m3u8 information'))
else:
current_formats.append({
'format_id': f'{experience_id}-{source_type}',
'url': source_url,
})
for f in current_formats:
# TODO: Convert language to code
f.update({
'language': lang,
'format_note': version,
'source_preference': source_preference(version.lower()),
'language_preference': language_preference(lang.lower()),
})
formats.extend(current_formats)
if not formats and (requested_languages or requested_versions):
self.raise_no_formats(
'There are no video formats matching the requested languages/versions', expected=True, video_id=display_id)
self._remove_duplicate_formats(formats)
return {
'id': episode_id,
'_old_archive_ids': [make_archive_id(self, initial_experience_id)],
'display_id': display_id,
'duration': duration,
'title': episode['episodeTitle'],
'description': episode.get('episodeSummary'),
'episode': episode.get('episodeTitle'),
'episode_number': int_or_none(episode.get('episodeId')),
'episode_id': episode_id,
'season': season.get('seasonTitle'),
'season_number': int_or_none(season.get('seasonId')),
'season_id': str_or_none(season.get('seasonPk')),
'series': show.get('showTitle'),
'formats': formats,
'thumbnails': thumbnails,
'subtitles': subtitles,
'_format_sort_fields': ('lang', 'source'),
}
def _get_subtitles(self, subtitles, experience_id, episode, display_id, format_name):
if isinstance(episode, str):
webpage = self._download_webpage(
f'https://www.funimation.com/player/{experience_id}/', display_id,
fatal=False, note=f'Downloading player webpage for {format_name}')
episode, _, _ = self._get_episode(webpage, episode_id=episode, fatal=False)
for _, version, f in self._get_experiences(episode):
for source in f.get('sources'):
for text_track in source.get('textTracks'):
if not text_track.get('src'):
continue
sub_type = text_track.get('type').upper()
sub_type = sub_type if sub_type != 'FULL' else None
current_sub = {
'url': text_track['src'],
'name': join_nonempty(version, text_track.get('label'), sub_type, delim=' '),
}
lang = join_nonempty(text_track.get('language', 'und'),
version if version != 'Simulcast' else None,
sub_type, delim='_')
if current_sub not in subtitles.get(lang, []):
subtitles.setdefault(lang, []).append(current_sub)
return subtitles
class FunimationShowIE(FunimationBaseIE):
IE_NAME = 'funimation:show'
_VALID_URL = r'(?P<url>https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?P<locale>[^/]+)?/?shows/(?P<id>[^/?#&]+))/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://www.funimation.com/en/shows/sk8-the-infinity',
'info_dict': {
'id': '1315000',
'title': 'SK8 the Infinity',
},
'playlist_count': 13,
'params': {
'skip_download': True,
},
}, {
# without lang code
'url': 'https://www.funimation.com/shows/ouran-high-school-host-club/',
'info_dict': {
'id': '39643',
'title': 'Ouran High School Host Club',
},
'playlist_count': 26,
'params': {
'skip_download': True,
},
}]
def _real_initialize(self):
if not self._REGION:
FunimationBaseIE._REGION = self._get_region()
def _real_extract(self, url):
base_url, locale, display_id = self._match_valid_url(url).groups()
show_info = self._download_json(
'https://title-api.prd.funimationsvc.com/v2/shows/{}?region={}&deviceType=web&locale={}'.format(
display_id, self._REGION, locale or 'en'), display_id)
items_info = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/funimation/episodes/?limit=99999&title_id={}'.format(
show_info.get('id')), display_id)
vod_items = traverse_obj(items_info, ('items', ..., lambda k, _: re.match(r'(?i)mostRecent[AS]vod', k), 'item'))
return {
'_type': 'playlist',
'id': str_or_none(show_info['id']),
'title': show_info['name'],
'entries': orderedSet(
self.url_result(
'{}/{}'.format(base_url, vod_item.get('episodeSlug')), FunimationPageIE.ie_key(),
vod_item.get('episodeId'), vod_item.get('episodeName'))
for vod_item in sorted(vod_items, key=lambda x: x.get('episodeOrder', -1))),
}

View file

@ -293,6 +293,19 @@ class GenericIE(InfoExtractor):
'timestamp': 1378272859.0,
},
},
# Live DASH MPD
{
'url': 'https://livesim2.dashif.org/livesim2/ato_10/testpic_2s/Manifest.mpd',
'info_dict': {
'id': 'Manifest',
'ext': 'mp4',
'title': r're:Manifest \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'live_status': 'is_live',
},
'params': {
'skip_download': 'livestream',
},
},
# m3u8 served with Content-Type: audio/x-mpegURL; charset=utf-8
{
'url': 'http://once.unicornmedia.com/now/master/playlist/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/77a785f3-5188-4806-b788-0893a61634ed/93677179-2d99-4ef4-9e17-fe70d49abfbf/content.m3u8',
@ -2436,10 +2449,9 @@ def _real_extract(self, url):
subtitles = {}
if format_id.endswith('mpegurl') or ext == 'm3u8':
formats, subtitles = self._extract_m3u8_formats_and_subtitles(url, video_id, 'mp4', headers=headers)
elif format_id.endswith(('mpd', 'dash+xml')) or ext == 'mpd':
formats, subtitles = self._extract_mpd_formats_and_subtitles(url, video_id, headers=headers)
elif format_id == 'f4m' or ext == 'f4m':
formats = self._extract_f4m_formats(url, video_id, headers=headers)
# Don't check for DASH/mpd here, do it later w/ first_bytes. Same number of requests either way
else:
formats = [{
'format_id': format_id,
@ -2521,6 +2533,7 @@ def _real_extract(self, url):
doc,
mpd_base_url=full_response.url.rpartition('/')[0],
mpd_url=url)
info_dict['live_status'] = 'is_live' if doc.get('type') == 'dynamic' else None
self._extra_manifest_info(info_dict, url)
self.report_detected('DASH manifest')
return info_dict

View file

@ -1,32 +1,48 @@
import base64
import hashlib
import json
import random
import re
import uuid
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
ExtractorError,
determine_ext,
filter_dict,
float_or_none,
int_or_none,
orderedSet,
str_or_none,
try_get,
url_or_none,
)
from ..utils.traversal import subs_list_to_dict, traverse_obj
class GloboIE(InfoExtractor):
_VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_VALID_URL = r'(?:globo:|https?://[^/?#]+?\.globo\.com/(?:[^/?#]+/))(?P<id>\d{7,})'
_NETRC_MACHINE = 'globo'
_VIDEO_VIEW = '''
query getVideoView($videoId: ID!) {
video(id: $videoId) {
duration
description
relatedEpisodeNumber
relatedSeasonNumber
headline
title {
originProgramId
headline
}
}
}
'''
_TESTS = [{
'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
'url': 'https://globoplay.globo.com/v/3607726/',
'info_dict': {
'id': '3607726',
'ext': 'mp4',
'title': 'Mercedes-Benz GLA passa por teste de colisão na Europa',
'duration': 103.204,
'uploader': 'G1',
'uploader_id': '2015',
'uploader': 'G1 ao vivo',
'uploader_id': '4209',
},
'params': {
'skip_download': True,
@ -38,39 +54,36 @@ class GloboIE(InfoExtractor):
'ext': 'mp4',
'title': 'Acidentes de trânsito estão entre as maiores causas de queda de energia em SP',
'duration': 137.973,
'uploader': 'Rede Globo',
'uploader_id': '196',
'uploader': 'Bom Dia Brasil',
'uploader_id': '810',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://canalbrasil.globo.com/programas/sangue-latino/videos/3928201.html',
'only_matching': True,
}, {
'url': 'http://globosatplay.globo.com/globonews/v/4472924/',
'only_matching': True,
}, {
'url': 'http://globotv.globo.com/t/programa/v/clipe-sexo-e-as-negas-adeus/3836166/',
'only_matching': True,
}, {
'url': 'http://globotv.globo.com/canal-brasil/sangue-latino/t/todos-os-videos/v/ator-e-diretor-argentino-ricado-darin-fala-sobre-utopias-e-suas-perdas/3928201/',
'only_matching': True,
}, {
'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
'only_matching': True,
}, {
'url': 'globo:3607726',
'only_matching': True,
}, {
'url': 'https://globoplay.globo.com/v/10248083/',
},
{
'url': 'globo:8013907', # needs subscription to globoplay
'info_dict': {
'id': '10248083',
'id': '8013907',
'ext': 'mp4',
'title': 'Melhores momentos: Equador 1 x 1 Brasil pelas Eliminatórias da Copa do Mundo 2022',
'duration': 530.964,
'uploader': 'SporTV',
'uploader_id': '698',
'title': 'Capítulo de 14081989',
'episode_number': 1,
},
'params': {
'skip_download': True,
},
},
{
'url': 'globo:12824146',
'info_dict': {
'id': '12824146',
'ext': 'mp4',
'title': 'Acordo de damas',
'episode_number': 1,
'season_number': 2,
},
'params': {
'skip_download': True,
@ -80,98 +93,70 @@ class GloboIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
self._request_webpage(
HEADRequest('https://globo-ab.globo.com/v2/selected-alternatives?experiments=player-isolated-experiment-02&skipImpressions=true'),
video_id, 'Getting cookies')
video = self._download_json(
f'http://api.globovideos.com/videos/{video_id}/playlist',
video_id)['videos'][0]
if not self.get_param('allow_unplayable_formats') and video.get('encrypted') is True:
self.report_drm(video_id)
title = video['title']
info = self._download_json(
'https://cloud-jarvis.globo.com/graphql', video_id,
query={
'operationName': 'getVideoView',
'variables': json.dumps({'videoId': video_id}),
'query': self._VIDEO_VIEW,
}, headers={
'content-type': 'application/json',
'x-platform-id': 'web',
'x-device-id': 'desktop',
'x-client-version': '2024.12-5',
})['data']['video']
formats = []
security = self._download_json(
'https://playback.video.globo.com/v2/video-session', video_id, f'Downloading security hash for {video_id}',
headers={'content-type': 'application/json'}, data=json.dumps({
'player_type': 'desktop',
video = self._download_json(
'https://playback.video.globo.com/v4/video-session', video_id,
f'Downloading resource info for {video_id}',
headers={'Content-Type': 'application/json'},
data=json.dumps(filter_dict({
'player_type': 'mirakulo_8k_hdr',
'video_id': video_id,
'quality': 'max',
'content_protection': 'widevine',
'vsid': '581b986b-4c40-71f0-5a58-803e579d5fa2',
'tz': '-3.0:00',
}).encode())
'vsid': f'{uuid.uuid4()}',
'consumption': 'streaming',
'capabilities': {'low_latency': True},
'tz': '-03:00',
'Authorization': try_get(self._get_cookies('https://globo.com'),
lambda x: f'Bearer {x["GLBID"].value}'),
'version': 1,
})).encode())
self._request_webpage(HEADRequest(security['sources'][0]['url_template']), video_id, 'Getting locksession cookie')
if traverse_obj(video, ('resource', 'drm_protection_enabled', {bool})):
self.report_drm(video_id)
security_hash = security['sources'][0]['token']
if not security_hash:
message = security.get('message')
if message:
raise ExtractorError(
f'{self.IE_NAME} returned error: {message}', expected=True)
main_source = video['sources'][0]
hash_code = security_hash[:2]
padding = '%010d' % random.randint(1, 10000000000)
if hash_code in ('04', '14'):
received_time = security_hash[3:13]
received_md5 = security_hash[24:]
hash_prefix = security_hash[:23]
elif hash_code in ('02', '12', '03', '13'):
received_time = security_hash[2:12]
received_md5 = security_hash[22:]
padding += '1'
hash_prefix = '05' + security_hash[:22]
padded_sign_time = str(int(received_time) + 86400) + padding
md5_data = (received_md5 + padded_sign_time + '0xAC10FD').encode()
signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
signed_hash = hash_prefix + padded_sign_time + signed_md5
source = security['sources'][0]['url_parts']
resource_url = source['scheme'] + '://' + source['domain'] + source['path']
signed_url = '{}?h={}&k=html5&a={}'.format(resource_url, signed_hash, 'F' if video.get('subscriber_only') else 'A')
fmts, subtitles = self._extract_m3u8_formats_and_subtitles(
signed_url, video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
formats.extend(fmts)
for resource in video['resources']:
if resource.get('type') == 'subtitle':
subtitles.setdefault(resource.get('language') or 'por', []).append({
'url': resource.get('url'),
})
subs = try_get(security, lambda x: x['source']['subtitles'], expected_type=dict) or {}
for sub_lang, sub_url in subs.items():
if sub_url:
subtitles.setdefault(sub_lang or 'por', []).append({
'url': sub_url,
})
subs = try_get(security, lambda x: x['source']['subtitles_webvtt'], expected_type=dict) or {}
for sub_lang, sub_url in subs.items():
if sub_url:
subtitles.setdefault(sub_lang or 'por', []).append({
'url': sub_url,
})
duration = float_or_none(video.get('duration'), 1000)
uploader = video.get('channel')
uploader_id = str_or_none(video.get('channel_id'))
# 4k streams are exclusively outputted in dash, so we need to filter these out
if determine_ext(main_source['url']) == 'mpd':
formats, subtitles = self._extract_mpd_formats_and_subtitles(main_source['url'], video_id, mpd_id='dash')
else:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
main_source['url'], video_id, 'mp4', m3u8_id='hls')
self._merge_subtitles(traverse_obj(main_source, ('text', ..., {
'url': ('subtitle', 'srt', 'url', {url_or_none}),
}, all, {subs_list_to_dict(lang='en')})), target=subtitles)
return {
'id': video_id,
'title': title,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
**traverse_obj(info, {
'title': ('headline', {str}),
'duration': ('duration', {float_or_none(scale=1000)}),
'uploader': ('title', 'headline', {str}),
'uploader_id': ('title', 'originProgramId', {str_or_none}),
'episode_number': ('relatedEpisodeNumber', {int_or_none}),
'season_number': ('relatedSeasonNumber', {int_or_none}),
}),
'formats': formats,
'subtitles': subtitles,
}
class GloboArticleIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/.]+)(?:\.html)?'
_VALID_URL = r'https?://(?!globoplay).+?\.globo\.com/(?:[^/?#]+/)*(?P<id>[^/?#.]+)(?:\.html)?'
_VIDEOID_REGEXES = [
r'\bdata-video-id=["\'](\d{7,})["\']',

View file

@ -1,40 +1,48 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
str_or_none,
traverse_obj,
url_or_none,
)
class GoodGameIE(InfoExtractor):
IE_NAME = 'goodgame:stream'
_VALID_URL = r'https?://goodgame\.ru/channel/(?P<id>\w+)'
_VALID_URL = r'https?://goodgame\.ru/(?!channel/)(?P<id>[\w.*-]+)'
_TESTS = [{
'url': 'https://goodgame.ru/channel/Pomi/#autoplay',
'url': 'https://goodgame.ru/TGW#autoplay',
'info_dict': {
'id': 'pomi',
'id': '7998',
'ext': 'mp4',
'title': r're:Reynor vs Special \(1/2,bo3\) Wardi Spring EU \- playoff \(финальный день\) \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'channel_id': '1644',
'channel': 'Pomi',
'channel_url': 'https://goodgame.ru/channel/Pomi/',
'description': 'md5:4a87b775ee7b2b57bdccebe285bbe171',
'thumbnail': r're:^https?://.*\.jpg$',
'channel_id': '7998',
'title': r're:шоуматч Happy \(NE\) vs Fortitude \(UD\), потом ладдер и дс \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'channel_url': 'https://goodgame.ru/TGW',
'thumbnail': 'https://hls.goodgame.ru/previews/7998_240.jpg',
'uploader': 'TGW',
'channel': 'JosephStalin',
'live_status': 'is_live',
'view_count': int,
'age_limit': 18,
'channel_follower_count': int,
'uploader_id': '2899',
'concurrent_view_count': int,
},
'params': {'skip_download': 'm3u8'},
'skip': 'May not be online',
}, {
'url': 'https://goodgame.ru/Mr.Gray',
'only_matching': True,
}, {
'url': 'https://goodgame.ru/HeDoPa3yMeHue*',
'only_matching': True,
}]
def _real_extract(self, url):
channel_name = self._match_id(url)
response = self._download_json(f'https://api2.goodgame.ru/v2/streams/{channel_name}', channel_name)
player_id = response['channel']['gg_player_src']
response = self._download_json(f'https://goodgame.ru/api/4/users/{channel_name}/stream', channel_name)
player_id = response['streamkey']
formats, subtitles = [], {}
if response.get('status') == 'Live':
if response.get('status'):
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://hls.goodgame.ru/manifest/{player_id}_master.m3u8',
channel_name, 'mp4', live=True)
@ -45,13 +53,17 @@ def _real_extract(self, url):
'id': player_id,
'formats': formats,
'subtitles': subtitles,
'title': traverse_obj(response, ('channel', 'title')),
'channel': channel_name,
'channel_id': str_or_none(traverse_obj(response, ('channel', 'id'))),
'channel_url': response.get('url'),
'description': clean_html(traverse_obj(response, ('channel', 'description'))),
'thumbnail': traverse_obj(response, ('channel', 'thumb')),
'is_live': bool(formats),
'view_count': int_or_none(response.get('viewers')),
'age_limit': 18 if traverse_obj(response, ('channel', 'adult')) else None,
**traverse_obj(response, {
'title': ('title', {str}),
'channel': ('channelkey', {str}),
'channel_id': ('id', {str_or_none}),
'channel_url': ('link', {url_or_none}),
'uploader': ('streamer', 'username', {str}),
'uploader_id': ('streamer', 'id', {str_or_none}),
'thumbnail': ('preview', {url_or_none}, {self._proto_relative_url}),
'concurrent_view_count': ('viewers', {int_or_none}),
'channel_follower_count': ('followers', {int_or_none}),
'age_limit': ('adult', {bool}, {lambda x: 18 if x else None}),
}),
}

View file

@ -12,7 +12,6 @@
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
remove_end,
traverse_obj,
)
@ -76,6 +75,7 @@ def _real_initialize(self):
if not self._id_token:
raise self.raise_login_required(method='password')
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.francetv
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
@ -86,9 +86,10 @@ def _real_extract(self, url):
nextjs_data = traverse_obj(
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {js_to_json}, {json.loads}, ..., {self._find_json}, ...))
(..., {json.loads}, ..., {self._find_json}, ...))
meta = traverse_obj(nextjs_data, (
..., lambda _, v: v['meta']['path'] == urllib.parse.urlparse(url).path, 'meta', any))
..., ..., 'children', ..., ..., 'children',
lambda _, v: v['video']['path'] == urllib.parse.urlparse(url).path, 'video', any))
video_id = meta['uuid']
info_dict = traverse_obj(meta, {

View file

@ -39,7 +39,7 @@ def _parse_episode(self, episode):
'description': ('body', {clean_html}),
'thumbnail': ('largeThumbnail', {url_or_none}),
'duration': ('length', {int_or_none}),
'date': ('dateSegments', 'published', {unified_strdate}),
'upload_date': ('dateSegments', 'published', {unified_strdate}),
}))
@ -54,7 +54,7 @@ class LaracastsIE(LaracastsBaseIE):
'title': 'Hello, Laravel',
'ext': 'mp4',
'duration': 519,
'date': '20240312',
'upload_date': '20240312',
'thumbnail': 'https://laracasts.s3.amazonaws.com/videos/thumbnails/youtube/30-days-to-learn-laravel-11-1.png',
'description': 'md5:ddd658bb241975871d236555657e1dd1',
'season_number': 1,

View file

@ -310,7 +310,13 @@ def _real_extract(self, url):
if stream_type in self._SUPPORTED_STREAM_TYPES:
claim_id, is_live = result['claim_id'], False
streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
'get', claim_id, {
'uri': uri,
**traverse_obj(parse_qs(url), {
'signature': ('signature', 0),
'signature_ts': ('signature_ts', 0),
}),
}, 'streaming url')['streaming_url']
# GET request to v3 API returns original video/audio file if available
direct_url = re.sub(r'/api/v\d+/', '/api/v3/', streaming_url)

View file

@ -72,6 +72,7 @@ def extract_formats(streams, stream_type, query={}):
'abr': int_or_none(bitrate.get('audio')),
'filesize': int_or_none(stream.get('size')),
'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
'extra_param_to_segment_url': urllib.parse.urlencode(query, doseq=True) if stream_type == 'HLS' else None,
})
extract_formats(get_list('video'), 'H264')
@ -168,6 +169,26 @@ class NaverIE(NaverBaseIE):
'duration': 277,
'thumbnail': r're:^https?://.*\.jpg',
},
}, {
'url': 'https://tv.naver.com/v/67838091',
'md5': '126ea384ab033bca59672c12cca7a6be',
'info_dict': {
'id': '67838091',
'ext': 'mp4',
'title': '[라인W 날씨] 내일 아침 서울 체감 -19도…호남·충남 대설',
'description': 'md5:fe026e25634c85845698aed4b59db5a7',
'timestamp': 1736347853,
'upload_date': '20250108',
'uploader': 'KBS뉴스',
'uploader_id': 'kbsnews',
'uploader_url': 'https://tv.naver.com/kbsnews',
'view_count': int,
'like_count': int,
'comment_count': int,
'duration': 69,
'thumbnail': r're:^https?://.*\.jpg',
},
'params': {'format': 'HLS_144P'},
}, {
'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True,

117
yt_dlp/extractor/nest.py Normal file
View file

@ -0,0 +1,117 @@
from .common import InfoExtractor
from ..utils import ExtractorError, float_or_none, update_url_query, url_or_none
from ..utils.traversal import traverse_obj
class NestIE(InfoExtractor):
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?live/(?P<id>\w+)'
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://video.nest.com/embedded/live/4fvYdSo8AX?autoplay=0',
'info_dict': {
'id': '4fvYdSo8AX',
'ext': 'mp4',
'title': 'startswith:Outside ',
'alt_title': 'Outside',
'description': '<null>',
'location': 'Los Angeles',
'availability': 'public',
'thumbnail': r're:https?://',
'live_status': 'is_live',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://video.nest.com/live/4fvYdSo8AX',
'only_matching': True,
}]
_WEBPAGE_TESTS = [{
'url': 'https://www.pacificblue.biz/noyo-harbor-webcam/',
'info_dict': {
'id': '4fvYdSo8AX',
'ext': 'mp4',
'title': 'startswith:Outside ',
'alt_title': 'Outside',
'description': '<null>',
'location': 'Los Angeles',
'availability': 'public',
'thumbnail': r're:https?://',
'live_status': 'is_live',
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
item = self._download_json(
'https://video.nest.com/api/dropcam/cameras.get_by_public_token',
video_id, query={'token': video_id})['items'][0]
uuid = item.get('uuid')
stream_domain = item.get('live_stream_host')
if not stream_domain or not uuid:
raise ExtractorError('Unable to construct playlist URL')
thumb_domain = item.get('nexus_api_nest_domain_host')
return {
'id': video_id,
**traverse_obj(item, {
'description': ('description', {str}),
'title': (('title', 'name', 'where'), {str}, filter, any),
'alt_title': ('name', {str}),
'location': ((('timezone', {lambda x: x.split('/')[1].replace('_', ' ')}), 'where'), {str}, filter, any),
}),
'thumbnail': update_url_query(
f'https://{thumb_domain}/get_image',
{'uuid': uuid, 'public': video_id}) if thumb_domain else None,
'availability': self._availability(is_private=item.get('is_public') is False),
'formats': self._extract_m3u8_formats(
f'https://{stream_domain}/nexus_aac/{uuid}/playlist.m3u8',
video_id, 'mp4', live=True, query={'public': video_id}),
'is_live': True,
}
class NestClipIE(InfoExtractor):
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?clip/(?P<id>\w+)'
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://video.nest.com/clip/f34c9dd237a44eca9a0001af685e3dff',
'info_dict': {
'id': 'f34c9dd237a44eca9a0001af685e3dff',
'ext': 'mp4',
'title': 'NestClip video #f34c9dd237a44eca9a0001af685e3dff',
'thumbnail': 'https://clips.dropcam.com/f34c9dd237a44eca9a0001af685e3dff.jpg',
'timestamp': 1735413474.468,
'upload_date': '20241228',
},
}, {
'url': 'https://video.nest.com/embedded/clip/34e0432adc3c46a98529443d8ad5aa76',
'info_dict': {
'id': '34e0432adc3c46a98529443d8ad5aa76',
'ext': 'mp4',
'title': 'Shootout at Veterans Boulevard at Fleur De Lis Drive',
'thumbnail': 'https://clips.dropcam.com/34e0432adc3c46a98529443d8ad5aa76.jpg',
'upload_date': '20230817',
'timestamp': 1692262897.191,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://video.nest.com/api/dropcam/videos.get_by_filename', video_id,
query={'filename': f'{video_id}.mp4'})
return {
'id': video_id,
**traverse_obj(data, ('items', 0, {
'title': ('title', {str}),
'thumbnail': ('thumbnail_url', {url_or_none}),
'url': ('download_url', {url_or_none}),
'timestamp': ('start_time', {float_or_none}),
})),
}

View file

@ -13,11 +13,13 @@
ExtractorError,
OnDemandPagedList,
clean_html,
determine_ext,
float_or_none,
int_or_none,
join_nonempty,
parse_duration,
parse_iso8601,
parse_qs,
parse_resolution,
qualities,
remove_start,
@ -592,8 +594,8 @@ def _call_api(self, list_id, resource, query):
@staticmethod
def _parse_owner(item):
return {
'uploader': traverse_obj(item, ('owner', 'name')),
'uploader_id': traverse_obj(item, ('owner', 'id')),
'uploader': traverse_obj(item, ('owner', ('name', ('user', 'nickname')), {str}, any)),
'uploader_id': traverse_obj(item, ('owner', 'id', {str})),
}
def _fetch_page(self, list_id, page):
@ -666,7 +668,7 @@ def _real_extract(self, url):
mylist.get('name'), mylist.get('description'), **self._parse_owner(mylist))
class NiconicoSeriesIE(InfoExtractor):
class NiconicoSeriesIE(NiconicoPlaylistBaseIE):
IE_NAME = 'niconico:series'
_VALID_URL = r'https?://(?:(?:www\.|sp\.)?nicovideo\.jp(?:/user/\d+)?|nico\.ms)/series/(?P<id>\d+)'
@ -675,6 +677,9 @@ class NiconicoSeriesIE(InfoExtractor):
'info_dict': {
'id': '110226',
'title': 'ご立派ァ!のシリーズ',
'description': '楽しそうな外人の吹き替えをさせたら終身名誉ホモガキの右に出る人はいませんね…',
'uploader': 'アルファるふぁ',
'uploader_id': '44113208',
},
'playlist_mincount': 10,
}, {
@ -682,6 +687,9 @@ class NiconicoSeriesIE(InfoExtractor):
'info_dict': {
'id': '12312',
'title': 'バトルスピリッツ お勧めカード紹介(調整中)',
'description': '',
'uploader': '野鳥',
'uploader_id': '2275360',
},
'playlist_mincount': 103,
}, {
@ -689,19 +697,21 @@ class NiconicoSeriesIE(InfoExtractor):
'only_matching': True,
}]
def _call_api(self, list_id, resource, query):
return self._download_json(
f'https://nvapi.nicovideo.jp/v2/series/{list_id}', list_id,
f'Downloading {resource}', query=query,
headers=self._API_HEADERS)['data']
def _real_extract(self, url):
list_id = self._match_id(url)
webpage = self._download_webpage(url, list_id)
series = self._call_api(list_id, 'list', {
'pageSize': 1,
})['detail']
title = self._search_regex(
(r'<title>「(.+)(全',
r'<div class="TwitterShareButton"\s+data-text="(.+)\s+https:'),
webpage, 'title', fatal=False)
if title:
title = unescapeHTML(title)
json_data = next(self._yield_json_ld(webpage, None, fatal=False))
return self.playlist_from_matches(
traverse_obj(json_data, ('itemListElement', ..., 'url')), list_id, title, ie=NiconicoIE)
return self.playlist_result(
self._entries(list_id), list_id,
series.get('title'), series.get('description'), **self._parse_owner(series))
class NiconicoHistoryIE(NiconicoPlaylistBaseIE):
@ -1025,6 +1035,7 @@ def _real_extract(self, url):
thumbnails.append({
'id': f'{name}_{width}x{height}',
'url': img_url,
'ext': traverse_obj(parse_qs(img_url), ('image', 0, {determine_ext(default_ext='jpg')})),
**res,
})

View file

@ -12,6 +12,7 @@
parse_iso8601,
str_or_none,
try_get,
update_url_query,
url_or_none,
urljoin,
)
@ -27,6 +28,12 @@ class NRKBaseIE(InfoExtractor):
)/'''
def _extract_nrk_formats(self, asset_url, video_id):
asset_url = update_url_query(asset_url, {
# Remove 'adap' to return all streams (known values are: small, large, small_h265, large_h265)
'adap': [],
# Disable subtitles since they are fetched separately
's': 0,
})
if re.match(r'https?://[^/]+\.akamaihd\.net/i/', asset_url):
return self._extract_akamai_formats(asset_url, video_id)
asset_url = re.sub(r'(?:bw_(?:low|high)=\d+|no_audio_only)&?', '', asset_url)
@ -58,7 +65,10 @@ def _call_api(self, path, video_id, item=None, note=None, fatal=True, query=None
return self._download_json(
urljoin('https://psapi.nrk.no/', path),
video_id, note or f'Downloading {item} JSON',
fatal=fatal, query=query)
fatal=fatal, query=query, headers={
# Needed for working stream URLs, see https://github.com/yt-dlp/yt-dlp/issues/12192
'Accept': 'application/vnd.nrk.psapi+json; version=9; player=tv-player; device=player-core',
})
class NRKIE(NRKBaseIE):
@ -77,13 +87,17 @@ class NRKIE(NRKBaseIE):
_TESTS = [{
# video
'url': 'http://www.nrk.no/video/PS*150533',
'md5': 'f46be075326e23ad0e524edfcb06aeb6',
'md5': '2b88a652ad2e275591e61cf550887eec',
'info_dict': {
'id': '150533',
'ext': 'mp4',
'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
'duration': 262,
'upload_date': '20140325',
'thumbnail': r're:^https?://gfx\.nrk\.no/.*$',
'timestamp': 1395751833,
'alt_title': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
},
}, {
# audio
@ -95,6 +109,10 @@ class NRKIE(NRKBaseIE):
'title': 'Slik høres internett ut når du er blind',
'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
'duration': 20,
'timestamp': 1398429565,
'alt_title': 'Cathrine Lie Wathne er blind, og bruker hurtigtaster for å navigere seg rundt på ulike nettsider.',
'thumbnail': 'https://gfx.nrk.no/urxQMSXF-WnbfjBH5ke2igLGyN27EdJVWZ6FOsEAclhA',
'upload_date': '20140425',
},
}, {
'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
@ -152,7 +170,7 @@ def call_playback_api(item, query=None):
return self._call_api(f'playback/{item}/{video_id}', video_id, item, query=query)
raise
# known values for preferredCdn: akamai, iponly, minicdn and telenor
# known values for preferredCdn: akamai, globalconnect and telenor
manifest = call_playback_api('manifest', {'preferredCdn': 'akamai'})
video_id = try_get(manifest, lambda x: x['id'], str) or video_id
@ -307,6 +325,13 @@ class NRKTVIE(InfoExtractor):
'ext': 'vtt',
}],
},
'upload_date': '20170627',
'timestamp': 1498591822,
'thumbnail': 'https://gfx.nrk.no/myRSc4vuFlahB60P3n6swwRTQUZI1LqJZl9B7icZFgzA',
'alt_title': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
@ -321,6 +346,13 @@ class NRKTVIE(InfoExtractor):
'series': '20 spørsmål',
'episode': '23. mai 2014',
'age_limit': 0,
'timestamp': 1584593700,
'thumbnail': 'https://gfx.nrk.no/u7uCe79SEfPVGRAGVp2_uAZnNc4mfz_kjXg6Bgek8lMQ',
'season_id': '126936',
'upload_date': '20200319',
'season': 'Season 2014',
'season_number': 2014,
'episode_number': 3,
},
}, {
'url': 'https://tv.nrk.no/program/mdfp15000514',

View file

@ -343,7 +343,7 @@ def _real_extract(self, url):
if media_ids:
media_ids.append(lead_video_id)
return self.playlist_result(
[self._extract_video(media_id) for media_id in media_ids], page_id, title, description)
map(self._extract_video, media_ids), page_id, title, description)
return {
**self._extract_video(lead_video_id),

View file

@ -63,6 +63,7 @@ class PatreonIE(PatreonBaseIE):
'info_dict': {
'id': '743933',
'ext': 'mp3',
'alt_title': 'cd166.mp3',
'title': 'Episode 166: David Smalley of Dogma Debate',
'description': 'md5:34d207dd29aa90e24f1b3f58841b81c7',
'uploader': 'Cognitive Dissonance Podcast',
@ -280,7 +281,7 @@ def _real_extract(self, url):
video_id = self._match_id(url)
post = self._call_api(
f'posts/{video_id}', video_id, query={
'fields[media]': 'download_url,mimetype,size_bytes',
'fields[media]': 'download_url,mimetype,size_bytes,file_name',
'fields[post]': 'comment_count,content,embed,image,like_count,post_file,published_at,title,current_user_can_view',
'fields[user]': 'full_name,url',
'fields[post_tag]': 'value',
@ -317,6 +318,7 @@ def _real_extract(self, url):
'ext': ext,
'filesize': size_bytes,
'url': download_url,
'alt_title': traverse_obj(media_attributes, ('file_name', {str})),
})
elif include_type == 'user':
@ -457,7 +459,7 @@ class PatreonCampaignIE(PatreonBaseIE):
_VALID_URL = r'''(?x)
https?://(?:www\.)?patreon\.com/(?:
(?:m|api/campaigns)/(?P<campaign_id>\d+)|
(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
(?:c/)?(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
)(?:/posts)?/?(?:$|[?#])'''
_TESTS = [{
'url': 'https://www.patreon.com/dissonancepod/',
@ -509,6 +511,26 @@ class PatreonCampaignIE(PatreonBaseIE):
'thumbnail': r're:^https?://.*$',
},
'playlist_mincount': 201,
}, {
'url': 'https://www.patreon.com/c/OgSog',
'info_dict': {
'id': '8504388',
'title': 'OGSoG',
'description': r're:(?s)Hello and welcome to our Patreon page. We are Mari, Lasercorn, .+',
'channel': 'OGSoG',
'channel_id': '8504388',
'channel_url': 'https://www.patreon.com/OgSog',
'uploader_url': 'https://www.patreon.com/OgSog',
'uploader_id': '72323575',
'uploader': 'David Moss',
'thumbnail': r're:https?://.+/.+',
'channel_follower_count': int,
'age_limit': 0,
},
'playlist_mincount': 331,
}, {
'url': 'https://www.patreon.com/c/OgSog/posts',
'only_matching': True,
}, {
'url': 'https://www.patreon.com/dissonancepod/posts',
'only_matching': True,

View file

@ -47,7 +47,7 @@ class PBSIE(InfoExtractor):
(r'video\.kpbs\.org', 'KPBS San Diego (KPBS)'), # http://www.kpbs.org/
(r'video\.kqed\.org', 'KQED (KQED)'), # http://www.kqed.org
(r'vids\.kvie\.org', 'KVIE Public Television (KVIE)'), # http://www.kvie.org
(r'video\.pbssocal\.org', 'PBS SoCal/KOCE (KOCE)'), # http://www.pbssocal.org/
(r'(?:video\.|www\.)pbssocal\.org', 'PBS SoCal/KOCE (KOCE)'), # http://www.pbssocal.org/
(r'video\.valleypbs\.org', 'ValleyPBS (KVPT)'), # http://www.valleypbs.org/
(r'video\.cptv\.org', 'CONNECTICUT PUBLIC TELEVISION (WEDH)'), # http://cptv.org
(r'watch\.knpb\.org', 'KNPB Channel 5 (KNPB)'), # http://www.knpb.org/
@ -61,7 +61,7 @@ class PBSIE(InfoExtractor):
(r'video\.wyomingpbs\.org', 'Wyoming PBS (KCWC)'), # http://www.wyomingpbs.org
(r'video\.cpt12\.org', 'Colorado Public Television / KBDI 12 (KBDI)'), # http://www.cpt12.org/
(r'video\.kbyueleven\.org', 'KBYU-TV (KBYU)'), # http://www.kbyutv.org/
(r'video\.thirteen\.org', 'Thirteen/WNET New York (WNET)'), # http://www.thirteen.org
(r'(?:video\.|www\.)thirteen\.org', 'Thirteen/WNET New York (WNET)'), # http://www.thirteen.org
(r'video\.wgbh\.org', 'WGBH/Channel 2 (WGBH)'), # http://wgbh.org
(r'video\.wgby\.org', 'WGBY (WGBY)'), # http://www.wgby.org
(r'watch\.njtvonline\.org', 'NJTV Public Media NJ (WNJT)'), # http://www.njtvonline.org/
@ -185,12 +185,13 @@ class PBSIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://
(?:
# Direct video URL
(?:{})/(?:(?:vir|port)alplayer|video)/(?P<id>[0-9]+)(?:[?/]|$) |
# Article with embedded player (or direct video)
(?:www\.)?pbs\.org/(?:[^/]+/){{1,5}}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
# Player
(?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)
# Player
(?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/?#]+) |
# Direct video URL, or article with embedded player
(?:{})/(?:
(?:(?:vir|port)alplayer|video)/(?P<id>[0-9]+)(?:[?/#]|$) |
(?:[^/?#]+/){{1,5}}(?P<presumptive_id>[^/?#]+?)(?:\.html)?/?(?:$|[?#])
)
)
'''.format('|'.join(next(zip(*_STATIONS))))
@ -207,16 +208,40 @@ class PBSIE(InfoExtractor):
'description': 'md5:31b664af3c65fd07fa460d306b837d00',
'duration': 3190,
},
'skip': 'dead URL',
},
{
'url': 'https://www.thirteen.org/programs/the-woodwrights-shop/carving-away-with-mary-may-tioglz/',
'info_dict': {
'id': '3004803331',
'ext': 'mp4',
'title': "The Woodwright's Shop - Carving Away with Mary May",
'description': 'md5:7cbaaaa8b9bcc78bd8f0e31911644e28',
'duration': 1606,
'display_id': 'carving-away-with-mary-may-tioglz',
'chapters': [],
'thumbnail': 'https://image.pbs.org/video-assets/NcnTxNl-asset-mezzanine-16x9-K0Keoyv.jpg',
},
},
{
'url': 'http://www.pbs.org/wgbh/pages/frontline/losing-iraq/',
'md5': '6f722cb3c3982186d34b0f13374499c7',
'md5': '372b12b670070de39438b946474df92f',
'info_dict': {
'id': '2365297690',
'ext': 'mp4',
'title': 'FRONTLINE - Losing Iraq',
'description': 'md5:5979a4d069b157f622d02bff62fbe654',
'duration': 5050,
'chapters': [
{'start_time': 0.0, 'end_time': 1234.0, 'title': 'After Saddam, Chaos'},
{'start_time': 1233.0, 'end_time': 1719.0, 'title': 'The Insurgency Takes Root'},
{'start_time': 1718.0, 'end_time': 2461.0, 'title': 'A Light Footprint'},
{'start_time': 2460.0, 'end_time': 3589.0, 'title': 'The Surge '},
{'start_time': 3588.0, 'end_time': 4355.0, 'title': 'The Withdrawal '},
{'start_time': 4354.0, 'end_time': 5051.0, 'title': 'ISIS on the March '},
],
'display_id': 'losing-iraq',
'thumbnail': 'https://image.pbs.org/video-assets/pbs/frontline/138098/images/mezzanine_401.jpg',
},
},
{
@ -403,6 +428,19 @@ class PBSIE(InfoExtractor):
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
},
{
'url': 'https://www.pbssocal.org/shows/newshour/clip/capehart-johnson-1715984001',
'info_dict': {
'id': '3091549094',
'ext': 'mp4',
'title': 'PBS NewsHour - Capehart and Johnson on the unusual Biden-Trump debate plans',
'description': 'Capehart and Johnson on how the Biden-Trump debates could shape the campaign season',
'display_id': 'capehart-johnson-1715984001',
'duration': 593,
'thumbnail': 'https://image.pbs.org/video-assets/mF3oSVn-asset-mezzanine-16x9-QeXjXPy.jpg',
'chapters': [],
},
},
{
'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
'only_matching': True,
@ -463,10 +501,12 @@ def _extract_webpage(self, url):
r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'", # frontline video embed
r'class="coveplayerid">([^<]+)<', # coveplayer
r'<section[^>]+data-coveid="(\d+)"', # coveplayer from http://www.pbs.org/wgbh/frontline/film/real-csi/
r'\sclass="passportcoveplayer"[^>]*\sdata-media="(\d+)', # https://www.thirteen.org/programs/the-woodwrights-shop/who-wrote-the-book-of-sloyd-fggvvq/
r'<input type="hidden" id="pbs_video_id_[0-9]+" value="([0-9]+)"/>', # jwplayer
r"(?s)window\.PBS\.playerConfig\s*=\s*{.*?id\s*:\s*'([0-9]+)',",
r'<div[^>]+\bdata-cove-id=["\'](\d+)"', # http://www.pbs.org/wgbh/roadshow/watch/episode/2105-indianapolis-hour-2/
r'<iframe[^>]+\bsrc=["\'](?:https?:)?//video\.pbs\.org/widget/partnerplayer/(\d+)', # https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/
r'\bhttps?://player\.pbs\.org/[\w-]+player/(\d+)', # last pattern to avoid false positives
]
media_id = self._search_regex(

View file

@ -0,0 +1,99 @@
from .common import InfoExtractor
from ..utils import parse_iso8601, smuggle_url, unsmuggle_url, url_or_none
from ..utils.traversal import traverse_obj
class PiramideTVIE(InfoExtractor):
_VALID_URL = r'https?://piramide\.tv/video/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://piramide.tv/video/wWtBAORdJUTh',
'info_dict': {
'id': 'wWtBAORdJUTh',
'ext': 'mp4',
'title': 'md5:79f9c8183ea6a35c836923142cf0abcc',
'description': '',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/W86PgQDn/thumbnails/B9gpIxkH.jpg',
'channel': 'León Picarón',
'channel_id': 'leonpicaron',
'timestamp': 1696460362,
'upload_date': '20231004',
},
}, {
'url': 'https://piramide.tv/video/wcYn6li79NgN',
'info_dict': {
'id': 'wcYn6li79NgN',
'ext': 'mp4',
'title': 'ACEPTO TENER UN BEBE CON MI NOVIA\u2026? | Parte 1',
'description': '',
'channel': 'ARTA GAME',
'channel_id': 'arta_game',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/cnEdGp5X/thumbnails/rHAaWfP7.jpg',
'timestamp': 1703434976,
'upload_date': '20231224',
},
}]
def _extract_video(self, video_id):
video_data = self._download_json(
f'https://hermes.piramide.tv/video/data/{video_id}', video_id, fatal=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://cdn.piramide.tv/video/{video_id}/manifest.m3u8', video_id, fatal=False)
next_video = traverse_obj(video_data, ('video', 'next_video', 'id', {str}))
return next_video, {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(video_data, ('video', {
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('media', 'thumbnail', {url_or_none}),
'channel': ('channel', 'name', {str}),
'channel_id': ('channel', 'id', {str}),
'timestamp': ('date', {parse_iso8601}),
})),
}
def _entries(self, video_id):
visited = set()
while True:
visited.add(video_id)
next_video, info = self._extract_video(video_id)
yield info
if not next_video or next_video in visited:
break
video_id = next_video
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
if self._yes_playlist(video_id, video_id, smuggled_data):
return self.playlist_result(self._entries(video_id), video_id)
return self._extract_video(video_id)[1]
class PiramideTVChannelIE(InfoExtractor):
_VALID_URL = r'https?://piramide\.tv/channel/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://piramide.tv/channel/thekalo',
'playlist_mincount': 10,
'info_dict': {
'id': 'thekalo',
},
}]
def _entries(self, channel_name):
videos = self._download_json(
f'https://hermes.piramide.tv/channel/list/{channel_name}/date/100000', channel_name)
for video in traverse_obj(videos, ('videos', lambda _, v: v['id'])):
yield self.url_result(smuggle_url(
f'https://piramide.tv/video/{video["id"]}', {'force_noplaylist': True}),
**traverse_obj(video, {
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
}))
def _real_extract(self, url):
channel_name = self._match_id(url)
return self.playlist_result(self._entries(channel_name), channel_name)

130
yt_dlp/extractor/plvideo.py Normal file
View file

@ -0,0 +1,130 @@
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
parse_resolution,
url_or_none,
)
from ..utils.traversal import traverse_obj
class PlVideoIE(InfoExtractor):
IE_DESC = 'Платформа'
_VALID_URL = r'https?://(?:www\.)?plvideo\.ru/(?:watch\?(?:[^#]+&)?v=|shorts/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://plvideo.ru/watch?v=Y5JzUzkcQTMK',
'md5': 'fe8e18aca892b3b31f3bf492169f8a26',
'info_dict': {
'id': 'Y5JzUzkcQTMK',
'ext': 'mp4',
'thumbnail': 'https://img.plvideo.ru/images/fp-2024-images/v/cover/37/dd/37dd00a4c96c77436ab737e85947abd7/original663a4a3bb713e5.33151959.jpg',
'title': 'Presidente de Cuba llega a Moscú en una visita de trabajo',
'channel': 'RT en Español',
'channel_id': 'ZH4EKqunVDvo',
'media_type': 'video',
'comment_count': int,
'tags': ['rusia', 'cuba', 'russia', 'miguel díaz-canel'],
'description': 'md5:a1a395d900d77a86542a91ee0826c115',
'release_timestamp': 1715096124,
'channel_is_verified': True,
'like_count': int,
'timestamp': 1715095911,
'duration': 44320,
'view_count': int,
'dislike_count': int,
'upload_date': '20240507',
'modified_date': '20240701',
'channel_follower_count': int,
'modified_timestamp': 1719824073,
},
}, {
'url': 'https://plvideo.ru/shorts/S3Uo9c-VLwFX',
'md5': '7d8fa2279406c69d2fd2a6fc548a9805',
'info_dict': {
'id': 'S3Uo9c-VLwFX',
'ext': 'mp4',
'channel': 'Romaatom',
'tags': 'count:22',
'dislike_count': int,
'upload_date': '20241130',
'description': 'md5:452e6de219bf2f32bb95806c51c3b364',
'duration': 58433,
'modified_date': '20241130',
'thumbnail': 'https://img.plvideo.ru/images/fp-2024-11-cover/S3Uo9c-VLwFX/f9318999-a941-482b-b700-2102a7049366.jpg',
'media_type': 'shorts',
'like_count': int,
'modified_timestamp': 1732961458,
'channel_is_verified': True,
'channel_id': 'erJyyTIbmUd1',
'timestamp': 1732961355,
'comment_count': int,
'title': 'Белоусов отменил приказы о кадровом резерве на гражданской службе',
'channel_follower_count': int,
'view_count': int,
'release_timestamp': 1732961458,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
f'https://api.g1.plvideo.ru/v1/videos/{video_id}?Aud=18', video_id)
is_live = False
formats = []
subtitles = {}
automatic_captions = {}
for quality, data in traverse_obj(video_data, ('item', 'profiles', {dict.items}, lambda _, v: url_or_none(v[1]['hls']))):
formats.append({
'format_id': quality,
'ext': 'mp4',
'protocol': 'm3u8_native',
**traverse_obj(data, {
'url': 'hls',
'fps': ('fps', {float_or_none}),
'aspect_ratio': ('aspectRatio', {float_or_none}),
}),
**parse_resolution(quality),
})
if livestream_url := traverse_obj(video_data, ('item', 'livestream', 'url', {url_or_none})):
is_live = True
formats.extend(self._extract_m3u8_formats(livestream_url, video_id, 'mp4', live=True))
for lang, url in traverse_obj(video_data, ('item', 'subtitles', {dict.items}, lambda _, v: url_or_none(v[1]))):
if lang.endswith('-auto'):
automatic_captions.setdefault(lang[:-5], []).append({
'url': url,
})
else:
subtitles.setdefault(lang, []).append({
'url': url,
})
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'automatic_captions': automatic_captions,
'is_live': is_live,
**traverse_obj(video_data, ('item', {
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('cover', 'paths', 'original', 'src', {url_or_none}),
'duration': ('uploadFile', 'videoDuration', {int_or_none}),
'channel': ('channel', 'name', {str}),
'channel_id': ('channel', 'id', {str}),
'channel_follower_count': ('channel', 'stats', 'subscribers', {int_or_none}),
'channel_is_verified': ('channel', 'verified', {bool}),
'tags': ('tags', ..., {str}),
'timestamp': ('createdAt', {parse_iso8601}),
'release_timestamp': ('publishedAt', {parse_iso8601}),
'modified_timestamp': ('updatedAt', {parse_iso8601}),
'view_count': ('stats', 'viewTotalCount', {int_or_none}),
'like_count': ('stats', 'likeCount', {int_or_none}),
'dislike_count': ('stats', 'dislikeCount', {int_or_none}),
'comment_count': ('stats', 'commentCount', {int_or_none}),
'media_type': ('type', {str}),
})),
}

View file

@ -198,6 +198,25 @@ class RedditIE(InfoExtractor):
'skip_download': True,
'writesubtitles': True,
},
}, {
# "gated" subreddit post
'url': 'https://old.reddit.com/r/ketamine/comments/degtjo/when_the_k_hits/',
'info_dict': {
'id': 'gqsbxts133r31',
'ext': 'mp4',
'display_id': 'degtjo',
'title': 'When the K hits',
'uploader': '[deleted]',
'channel_id': 'ketamine',
'comment_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 18,
'duration': 34,
'thumbnail': r're:https?://.+/.+\.(?:jpg|png)',
'timestamp': 1570438713.0,
'upload_date': '20191007',
},
}, {
'url': 'https://www.reddit.com/r/videos/comments/6rrwyj',
'only_matching': True,
@ -245,6 +264,15 @@ def _perform_login(self, username, password):
elif not traverse_obj(login, ('json', 'data', 'cookie', {str})):
raise ExtractorError('Unable to login, no cookie was returned')
def _real_initialize(self):
# Set cookie to opt-in to age-restricted subreddits
self._set_cookie('reddit.com', 'over18', '1')
# Set cookie to opt-in to "gated" subreddits
options = traverse_obj(self._get_cookies('https://www.reddit.com/'), (
'_options', 'value', {urllib.parse.unquote}, {json.loads}, {dict})) or {}
options['pref_gated_sr_optin'] = True
self._set_cookie('reddit.com', '_options', urllib.parse.quote(json.dumps(options)))
def _get_subtitles(self, video_id):
# Fallback if there were no subtitles provided by DASH or HLS manifests
caption_url = f'https://v.redd.it/{video_id}/wh_ben_en.vtt'

View file

@ -114,7 +114,7 @@ def _paged_entries(self, ep, item_id, query, fields):
class RedGifsIE(RedGifsBaseInfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/watch/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)'
_VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/(?:watch|ifr)/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)'
_TESTS = [{
'url': 'https://www.redgifs.com/watch/squeakyhelplesswisent',
'info_dict': {
@ -147,6 +147,22 @@ class RedGifsIE(RedGifsBaseInfoExtractor):
'age_limit': 18,
'tags': list,
},
}, {
'url': 'https://www.redgifs.com/ifr/squeakyhelplesswisent',
'info_dict': {
'id': 'squeakyhelplesswisent',
'ext': 'mp4',
'title': 'Hotwife Legs Thick',
'timestamp': 1636287915,
'upload_date': '20211107',
'uploader': 'ignored52',
'duration': 16,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
'tags': list,
},
}]
def _real_extract(self, url):

View file

@ -176,6 +176,8 @@ class RTVSLOShowIE(InfoExtractor):
'info_dict': {
'id': '173250997',
'title': 'Ekipa Bled',
'description': 'md5:c88471e27a1268c448747a5325319ab7',
'thumbnail': 'https://img.rtvcdn.si/_up/ava/ava_misc/show_logos/173250997/logo_wide1.jpg',
},
'playlist_count': 18,
}]
@ -187,4 +189,7 @@ def _real_extract(self, url):
return self.playlist_from_matches(
re.findall(r'<a [^>]*\bhref="(/arhiv/[^"]+)"', webpage),
playlist_id, self._html_extract_title(webpage),
getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE)
getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE,
description=self._og_search_description(webpage),
thumbnail=self._og_search_thumbnail(webpage),
)

View file

@ -4,43 +4,12 @@
from .common import InfoExtractor
from ..utils import (
ExtractorError,
parse_qs,
unsmuggle_url,
UnsupportedError,
make_archive_id,
remove_end,
url_or_none,
)
_COMMITTEES = {
'ag': ('76440', 'http://ag-f.akamaihd.net'),
'aging': ('76442', 'http://aging-f.akamaihd.net'),
'approps': ('76441', 'http://approps-f.akamaihd.net'),
'arch': ('', 'http://ussenate-f.akamaihd.net'),
'armed': ('76445', 'http://armed-f.akamaihd.net'),
'banking': ('76446', 'http://banking-f.akamaihd.net'),
'budget': ('76447', 'http://budget-f.akamaihd.net'),
'cecc': ('76486', 'http://srs-f.akamaihd.net'),
'commerce': ('80177', 'http://commerce1-f.akamaihd.net'),
'csce': ('75229', 'http://srs-f.akamaihd.net'),
'dpc': ('76590', 'http://dpc-f.akamaihd.net'),
'energy': ('76448', 'http://energy-f.akamaihd.net'),
'epw': ('76478', 'http://epw-f.akamaihd.net'),
'ethics': ('76449', 'http://ethics-f.akamaihd.net'),
'finance': ('76450', 'http://finance-f.akamaihd.net'),
'foreign': ('76451', 'http://foreign-f.akamaihd.net'),
'govtaff': ('76453', 'http://govtaff-f.akamaihd.net'),
'help': ('76452', 'http://help-f.akamaihd.net'),
'indian': ('76455', 'http://indian-f.akamaihd.net'),
'intel': ('76456', 'http://intel-f.akamaihd.net'),
'intlnarc': ('76457', 'http://intlnarc-f.akamaihd.net'),
'jccic': ('85180', 'http://jccic-f.akamaihd.net'),
'jec': ('76458', 'http://jec-f.akamaihd.net'),
'judiciary': ('76459', 'http://judiciary-f.akamaihd.net'),
'rpc': ('76591', 'http://rpc-f.akamaihd.net'),
'rules': ('76460', 'http://rules-f.akamaihd.net'),
'saa': ('76489', 'http://srs-f.akamaihd.net'),
'smbiz': ('76461', 'http://smbiz-f.akamaihd.net'),
'srs': ('75229', 'http://srs-f.akamaihd.net'),
'uscc': ('76487', 'http://srs-f.akamaihd.net'),
'vetaff': ('76462', 'http://vetaff-f.akamaihd.net'),
}
from ..utils.traversal import traverse_obj
class SenateISVPIE(InfoExtractor):
@ -53,31 +22,46 @@ class SenateISVPIE(InfoExtractor):
'info_dict': {
'id': 'judiciary031715',
'ext': 'mp4',
'title': 'Integrated Senate Video Player',
'title': 'ISVP',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'_old_archive_ids': ['senategov judiciary031715'],
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'http://www.senate.gov/isvp/?type=live&comm=commerce&filename=commerce011514.mp4&auto_play=false',
'info_dict': {
'id': 'commerce011514',
'ext': 'mp4',
'title': 'Integrated Senate Video Player',
'_old_archive_ids': ['senategov commerce011514'],
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'This video is not available.',
}, {
'url': 'http://www.senate.gov/isvp/?type=arch&comm=intel&filename=intel090613&hc_location=ufi',
# checksum differs each time
'info_dict': {
'id': 'intel090613',
'ext': 'mp4',
'title': 'Integrated Senate Video Player',
'title': 'ISVP',
'_old_archive_ids': ['senategov intel090613'],
},
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'https://www.senate.gov/isvp/?auto_play=false&comm=help&filename=help090920&poster=https://www.help.senate.gov/assets/images/video-poster.png&stt=950',
'info_dict': {
'id': 'help090920',
'ext': 'mp4',
'title': 'ISVP',
'thumbnail': 'https://www.help.senate.gov/assets/images/video-poster.png',
'_old_archive_ids': ['senategov help090920'],
},
}, {
# From http://www.c-span.org/video/?96791-1
@ -85,60 +69,81 @@ class SenateISVPIE(InfoExtractor):
'only_matching': True,
}]
_COMMITTEES = {
'ag': ('76440', 'https://ag-f.akamaihd.net', '2036803', 'agriculture'),
'aging': ('76442', 'https://aging-f.akamaihd.net', '2036801', 'aging'),
'approps': ('76441', 'https://approps-f.akamaihd.net', '2036802', 'appropriations'),
'arch': ('', 'https://ussenate-f.akamaihd.net', '', 'arch'),
'armed': ('76445', 'https://armed-f.akamaihd.net', '2036800', 'armedservices'),
'banking': ('76446', 'https://banking-f.akamaihd.net', '2036799', 'banking'),
'budget': ('76447', 'https://budget-f.akamaihd.net', '2036798', 'budget'),
'cecc': ('76486', 'https://srs-f.akamaihd.net', '2036782', 'srs_cecc'),
'commerce': ('80177', 'https://commerce1-f.akamaihd.net', '2036779', 'commerce'),
'csce': ('75229', 'https://srs-f.akamaihd.net', '2036777', 'srs_srs'),
'dpc': ('76590', 'https://dpc-f.akamaihd.net', '', 'dpc'),
'energy': ('76448', 'https://energy-f.akamaihd.net', '2036797', 'energy'),
'epw': ('76478', 'https://epw-f.akamaihd.net', '2036783', 'environment'),
'ethics': ('76449', 'https://ethics-f.akamaihd.net', '2036796', 'ethics'),
'finance': ('76450', 'https://finance-f.akamaihd.net', '2036795', 'finance_finance'),
'foreign': ('76451', 'https://foreign-f.akamaihd.net', '2036794', 'foreignrelations'),
'govtaff': ('76453', 'https://govtaff-f.akamaihd.net', '2036792', 'hsgac'),
'help': ('76452', 'https://help-f.akamaihd.net', '2036793', 'help'),
'indian': ('76455', 'https://indian-f.akamaihd.net', '2036791', 'indianaffairs'),
'intel': ('76456', 'https://intel-f.akamaihd.net', '2036790', 'intelligence'),
'intlnarc': ('76457', 'https://intlnarc-f.akamaihd.net', '', 'internationalnarcoticscaucus'),
'jccic': ('85180', 'https://jccic-f.akamaihd.net', '2036778', 'jccic'),
'jec': ('76458', 'https://jec-f.akamaihd.net', '2036789', 'jointeconomic'),
'judiciary': ('76459', 'https://judiciary-f.akamaihd.net', '2036788', 'judiciary'),
'rpc': ('76591', 'https://rpc-f.akamaihd.net', '', 'rpc'),
'rules': ('76460', 'https://rules-f.akamaihd.net', '2036787', 'rules'),
'saa': ('76489', 'https://srs-f.akamaihd.net', '2036780', 'srs_saa'),
'smbiz': ('76461', 'https://smbiz-f.akamaihd.net', '2036786', 'smallbusiness'),
'srs': ('75229', 'https://srs-f.akamaihd.net', '2031966', 'srs_srs'),
'uscc': ('76487', 'https://srs-f.akamaihd.net', '2036781', 'srs_uscc'),
'vetaff': ('76462', 'https://vetaff-f.akamaihd.net', '2036785', 'veteransaffairs'),
}
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
qs = urllib.parse.parse_qs(self._match_valid_url(url).group('qs'))
if not qs.get('filename') or not qs.get('type') or not qs.get('comm'):
if not qs.get('filename') or not qs.get('comm'):
raise ExtractorError('Invalid URL', expected=True)
video_id = re.sub(r'.mp4$', '', qs['filename'][0])
filename = qs['filename'][0]
video_id = remove_end(filename, '.mp4')
webpage = self._download_webpage(url, video_id)
committee = qs['comm'][0]
if smuggled_data.get('force_title'):
title = smuggled_data['force_title']
else:
title = self._html_extract_title(webpage)
poster = qs.get('poster')
thumbnail = poster[0] if poster else None
video_type = qs['type'][0]
committee = video_type if video_type == 'arch' else qs['comm'][0]
stream_num, domain = _COMMITTEES[committee]
stream_num, stream_domain, stream_id, msl3 = self._COMMITTEES[committee]
urls_alternatives = [f'https://www-senate-gov-media-srs.akamaized.net/hls/live/{stream_id}/{committee}/{filename}/master.m3u8',
f'https://www-senate-gov-msl3archive.akamaized.net/{msl3}/{filename}_1/master.m3u8',
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
f'{stream_domain}/i/{filename}.mp4/master.m3u8']
formats = []
if video_type == 'arch':
filename = video_id if '.' in video_id else video_id + '.mp4'
m3u8_url = urllib.parse.urljoin(domain, 'i/' + filename + '/master.m3u8')
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8')
else:
hdcore_sign = 'hdcore=3.1.0'
url_params = (domain, video_id, stream_num)
f4m_url = f'%s/z/%s_1@%s/manifest.f4m?{hdcore_sign}' % url_params
m3u8_url = '{}/i/{}_1@{}/master.m3u8'.format(*url_params)
for entry in self._extract_f4m_formats(f4m_url, video_id, f4m_id='f4m'):
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.append(entry)
for entry in self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8'):
mobj = re.search(r'(?P<tag>(?:-p|-b)).m3u8', entry['url'])
if mobj:
entry['format_id'] += mobj.group('tag')
formats.append(entry)
subtitles = {}
for video_url in urls_alternatives:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(video_url, video_id, ext='mp4', fatal=False)
if formats:
break
return {
'id': video_id,
'title': title,
'title': self._html_extract_title(webpage),
'formats': formats,
'thumbnail': thumbnail,
'subtitles': subtitles,
'thumbnail': traverse_obj(qs, ('poster', 0, {url_or_none})),
'_old_archive_ids': [make_archive_id(SenateGovIE, video_id)],
}
class SenateGovIE(InfoExtractor):
_IE_NAME = 'senate.gov'
_VALID_URL = r'https?:\/\/(?:www\.)?(help|appropriations|judiciary|banking|armed-services|finance)\.senate\.gov'
_SUBDOMAIN_RE = '|'.join(map(re.escape, (
'agriculture', 'aging', 'appropriations', 'armed-services', 'banking',
'budget', 'commerce', 'energy', 'epw', 'finance', 'foreign', 'help',
'intelligence', 'inaugural', 'judiciary', 'rules', 'sbc', 'veterans',
)))
_VALID_URL = rf'https?://(?:www\.)?(?:{_SUBDOMAIN_RE})\.senate\.gov'
_TESTS = [{
'url': 'https://www.help.senate.gov/hearings/vaccines-saving-lives-ensuring-confidence-and-protecting-public-health',
'info_dict': {
@ -147,6 +152,9 @@ class SenateGovIE(InfoExtractor):
'title': 'Vaccines: Saving Lives, Ensuring Confidence, and Protecting Public Health',
'description': 'The U.S. Senate Committee on Health, Education, Labor & Pensions',
'ext': 'mp4',
'age_limit': 0,
'thumbnail': 'https://www.help.senate.gov/assets/images/sharelogo.jpg',
'_old_archive_ids': ['senategov help090920'],
},
'params': {'skip_download': 'm3u8'},
}, {
@ -156,8 +164,12 @@ class SenateGovIE(InfoExtractor):
'display_id': 'watch?hearingid=B8A25434-5056-A066-6020-1F68CB75F0CD',
'title': 'Review of the FY2019 Budget Request for the U.S. Army',
'ext': 'mp4',
'age_limit': 0,
'thumbnail': 'https://www.appropriations.senate.gov/themes/appropriations/images/video-poster-flash-fit.png',
'_old_archive_ids': ['senategov appropsA051518'],
},
'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'https://www.banking.senate.gov/hearings/21st-century-communities-public-transportation-infrastructure-investment-and-fast-act-reauthorization',
'info_dict': {
@ -166,32 +178,65 @@ class SenateGovIE(InfoExtractor):
'title': '21st Century Communities: Public Transportation Infrastructure Investment and FAST Act Reauthorization',
'description': 'The Official website of The United States Committee on Banking, Housing, and Urban Affairs',
'ext': 'mp4',
'thumbnail': 'https://www.banking.senate.gov/themes/banking/images/sharelogo.jpg',
'age_limit': 0,
'_old_archive_ids': ['senategov banking041521'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.agriculture.senate.gov/hearings/hemp-production-and-the-2018-farm-bill',
'only_matching': True,
}, {
'url': 'https://www.aging.senate.gov/hearings/the-older-americans-act-the-local-impact-of-the-law-and-the-upcoming-reauthorization',
'only_matching': True,
}, {
'url': 'https://www.budget.senate.gov/hearings/improving-care-lowering-costs-achieving-health-care-efficiency',
'only_matching': True,
}, {
'url': 'https://www.commerce.senate.gov/2024/12/communications-networks-safety-and-security',
'only_matching': True,
}, {
'url': 'https://www.energy.senate.gov/hearings/2024/2/full-committee-hearing-to-examine',
'only_matching': True,
}, {
'url': 'https://www.epw.senate.gov/public/index.cfm/hearings?ID=F63083EA-2C13-498C-B548-341BED68C209',
'only_matching': True,
}, {
'url': 'https://www.foreign.senate.gov/hearings/american-diplomacy-and-global-leadership-review-of-the-fy25-state-department-budget-request',
'only_matching': True,
}, {
'url': 'https://www.intelligence.senate.gov/hearings/foreign-threats-elections-2024-%E2%80%93-roles-and-responsibilities-us-tech-providers',
'only_matching': True,
}, {
'url': 'https://www.inaugural.senate.gov/52nd-inaugural-ceremonies/',
'only_matching': True,
}, {
'url': 'https://www.rules.senate.gov/hearings/02/07/2023/business-meeting',
'only_matching': True,
}, {
'url': 'https://www.sbc.senate.gov/public/index.cfm/hearings?ID=5B13AA6B-8279-45AF-B54B-94156DC7A2AB',
'only_matching': True,
}, {
'url': 'https://www.veterans.senate.gov/2024/5/frontier-health-care-ensuring-veterans-access-no-matter-where-they-live',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._generic_id(url)
webpage = self._download_webpage(url, display_id)
parse_info = parse_qs(self._search_regex(
r'<iframe class="[^>"]*streaminghearing[^>"]*"\s[^>]*\bsrc="([^">]*)', webpage, 'hearing URL'))
stream_num, stream_domain = _COMMITTEES[parse_info['comm'][-1]]
filename = parse_info['filename'][-1]
formats = self._extract_m3u8_formats(
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
display_id, ext='mp4')
url_info = next(SenateISVPIE.extract_from_webpage(self._downloader, url, webpage), None)
if not url_info:
raise UnsupportedError(url)
title = self._html_search_regex(
(*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title')
(*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title', fatal=False)
return {
'id': re.sub(r'.mp4$', '', filename),
**url_info,
'_type': 'url_transparent',
'display_id': display_id,
'title': re.sub(r'\s+', ' ', title.split('|')[0]).strip(),
'description': self._og_search_description(webpage, default=None),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'age_limit': self._rta_search(webpage),
'formats': formats,
}

View file

@ -7,7 +7,6 @@
from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..utils import (
KNOWN_EXTENSIONS,
ExtractorError,
float_or_none,
int_or_none,
@ -54,6 +53,7 @@ class SoundcloudBaseIE(InfoExtractor):
_HEADERS = {}
_IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
_TAGS_RE = re.compile(r'"([^"]+)"|([^ ]+)')
_ARTWORK_MAP = {
'mini': 16,
@ -211,6 +211,7 @@ def _extract_info_dict(self, info, full_title=None, secret_token=None, extract_f
format_urls = set()
formats = []
has_drm = False
query = {'client_id': self._CLIENT_ID}
if secret_token:
query['secret_token'] = secret_token
@ -246,55 +247,24 @@ def _extract_info_dict(self, info, full_title=None, secret_token=None, extract_f
'url': format_url,
'quality': 10,
'format_note': 'Original',
'vcodec': 'none',
})
def invalid_url(url):
return not url or url in format_urls
def add_format(f, protocol, is_preview=False):
mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
if mobj:
for k, v in mobj.groupdict().items():
if not f.get(k):
f[k] = v
format_id_list = []
if protocol:
format_id_list.append(protocol)
ext = f.get('ext')
if ext == 'aac':
f.update({
'abr': 256,
'quality': 5,
'format_note': 'Premium',
})
for k in ('ext', 'abr'):
v = str_or_none(f.get(k))
if v:
format_id_list.append(v)
preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
if preview:
format_id_list.append('preview')
abr = f.get('abr')
if abr:
f['abr'] = int(abr)
if protocol in ('hls', 'hls-aes'):
protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
else:
protocol = 'http'
f.update({
'format_id': '_'.join(format_id_list),
'protocol': protocol,
'preference': -10 if preview else None,
})
formats.append(f)
# New API
for t in traverse_obj(info, ('media', 'transcodings', lambda _, v: url_or_none(v['url']))):
for t in traverse_obj(info, ('media', 'transcodings', lambda _, v: url_or_none(v['url']) and v['preset'])):
if extract_flat:
break
format_url = t['url']
preset = t['preset']
preset_base = preset.partition('_')[0]
protocol = traverse_obj(t, ('format', 'protocol', {str}))
protocol = traverse_obj(t, ('format', 'protocol', {str})) or 'http'
if protocol.startswith(('ctr-', 'cbc-')):
has_drm = True
continue
if protocol == 'progressive':
protocol = 'http'
if protocol != 'hls' and '/hls' in format_url:
@ -302,35 +272,60 @@ def add_format(f, protocol, is_preview=False):
if protocol == 'encrypted-hls' or '/encrypted-hls' in format_url:
protocol = 'hls-aes'
ext = None
if preset := traverse_obj(t, ('preset', {str_or_none})):
ext = preset.split('_')[0]
if ext not in KNOWN_EXTENSIONS:
ext = mimetype2ext(traverse_obj(t, ('format', 'mime_type', {str})))
identifier = join_nonempty(protocol, ext, delim='_')
if not self._is_requested(identifier):
self.write_debug(f'"{identifier}" is not a requested format, skipping')
short_identifier = f'{protocol}_{preset_base}'
if preset_base == 'abr':
self.write_debug(f'Skipping broken "{short_identifier}" format')
continue
if not self._is_requested(short_identifier):
self.write_debug(f'"{short_identifier}" is not a requested format, skipping')
continue
# XXX: if not extract_flat, 429 error must be caught where _extract_info_dict is called
stream_url = traverse_obj(self._call_api(
format_url, track_id, f'Downloading {identifier} format info JSON',
format_url, track_id, f'Downloading {short_identifier} format info JSON',
query=query, headers=self._HEADERS), ('url', {url_or_none}))
if invalid_url(stream_url):
continue
format_urls.add(stream_url)
add_format({
mime_type = traverse_obj(t, ('format', 'mime_type', {str}))
codec = self._search_regex(r'codecs="([^"]+)"', mime_type, 'codec', default=None)
ext = {
'mp4a': 'm4a',
'opus': 'opus',
}.get(codec[:4] if codec else None) or mimetype2ext(mime_type, default=None)
if not ext or ext == 'm3u8':
ext = preset_base
is_premium = t.get('quality') == 'hq'
abr = int_or_none(
self._search_regex(r'(\d+)k$', preset, 'abr', default=None)
or self._search_regex(r'\.(\d+)\.(?:opus|mp3)[/?]', stream_url, 'abr', default=None)
or (256 if (is_premium and 'aac' in preset) else None))
is_preview = (t.get('snipped')
or '/preview/' in format_url
or re.search(r'/(?:preview|playlist)/0/30/', stream_url))
formats.append({
'format_id': join_nonempty(protocol, preset, is_preview and 'preview', delim='_'),
'url': stream_url,
'ext': ext,
}, protocol, t.get('snipped') or '/preview/' in format_url)
'acodec': codec,
'vcodec': 'none',
'abr': abr,
'protocol': 'm3u8_native' if protocol in ('hls', 'hls-aes') else 'http',
'container': 'm4a_dash' if ext == 'm4a' else None,
'quality': 5 if is_premium else 0 if (abr and abr >= 160) else -1,
'format_note': 'Premium' if is_premium else None,
'preference': -10 if is_preview else None,
})
for f in formats:
f['vcodec'] = 'none'
if not formats and info.get('policy') == 'BLOCK':
self.raise_geo_restricted(metadata_available=True)
if not formats:
if has_drm:
self.report_drm(track_id)
if info.get('policy') == 'BLOCK':
self.raise_geo_restricted(metadata_available=True)
user = info.get('user') or {}
@ -367,6 +362,7 @@ def extract_count(key):
'uploader_url': user.get('permalink_url'),
'timestamp': unified_timestamp(info.get('created_at')),
'title': info.get('title'),
'track': info.get('title'),
'description': info.get('description'),
'thumbnails': thumbnails,
'duration': float_or_none(info.get('duration'), 1000),
@ -377,6 +373,7 @@ def extract_count(key):
'comment_count': extract_count('comment'),
'repost_count': extract_count('reposts'),
'genres': traverse_obj(info, ('genre', {str}, filter, all, filter)),
'tags': traverse_obj(info, ('tag_list', {self._TAGS_RE.findall}, ..., ..., filter)),
'artists': traverse_obj(info, ('publisher_metadata', 'artist', {str}, filter, all, filter)),
'formats': formats if not extract_flat else None,
}
@ -399,7 +396,7 @@ class SoundcloudIE(SoundcloudBaseIE):
(?:(?:(?:www\.|m\.)?soundcloud\.com/
(?!stations/track)
(?P<uploader>[\w\d-]+)/
(?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#]))
(?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight|comments)/?(?:$|[?#]))
(?P<title>[\w\d-]+)
(?:/(?P<token>(?!(?:albums|sets|recommended))[^?]+?))?
(?:[?].*)?$)
@ -416,6 +413,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '62986583',
'ext': 'opus',
'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'track': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'description': 'No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o\'d',
'uploader': 'E.T. ExTerrestrial Music',
'uploader_id': '1571244',
@ -429,6 +427,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'thumbnail': 'https://i1.sndcdn.com/artworks-000031955188-rwb18x-original.jpg',
'uploader_url': 'https://soundcloud.com/ethmusic',
'tags': 'count:14',
},
},
# geo-restricted
@ -438,12 +437,13 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '47127627',
'ext': 'opus',
'title': 'Goldrushed',
'track': 'Goldrushed',
'description': 'From Stockholm Sweden\r\nPovel / Magnus / Filip / David\r\nwww.theroyalconcept.com',
'uploader': 'The Royal Concept',
'uploader_id': '9615865',
'timestamp': 1337635207,
'upload_date': '20120521',
'duration': 227.155,
'duration': 227.103,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
@ -453,6 +453,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'thumbnail': 'https://i1.sndcdn.com/artworks-v8bFHhXm7Au6-0-original.jpg',
'genres': ['Alternative'],
'artists': ['The Royal Concept'],
'tags': [],
},
},
# private link
@ -463,6 +464,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '123998367',
'ext': 'mp3',
'title': 'Youtube - Dl Test Video \'\' Ä↭',
'track': 'Youtube - Dl Test Video \'\' Ä↭',
'description': 'test chars: "\'/\\ä↭',
'uploader': 'jaimeMF',
'uploader_id': '69767071',
@ -477,6 +479,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
'tags': [],
},
},
# private link (alt format)
@ -487,6 +490,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '123998367',
'ext': 'mp3',
'title': 'Youtube - Dl Test Video \'\' Ä↭',
'track': 'Youtube - Dl Test Video \'\' Ä↭',
'description': 'test chars: "\'/\\ä↭',
'uploader': 'jaimeMF',
'uploader_id': '69767071',
@ -501,16 +505,18 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
'tags': [],
},
},
# downloadable song
{
'url': 'https://soundcloud.com/the80m/the-following',
'md5': '9ffcddb08c87d74fb5808a3c183a1d04',
'md5': 'ecb87d7705d5f53e6c02a63760573c75', # wav: '9ffcddb08c87d74fb5808a3c183a1d04'
'info_dict': {
'id': '343609555',
'ext': 'wav',
'ext': 'opus', # wav original available with auth
'title': 'The Following',
'track': 'The Following',
'description': '',
'uploader': '80M',
'uploader_id': '312384765',
@ -526,16 +532,20 @@ class SoundcloudIE(SoundcloudBaseIE):
'view_count': int,
'genres': ['Dance & EDM'],
'artists': ['80M'],
'tags': ['80M', 'EDM', 'Dance', 'Music'],
},
'expected_warnings': ['Original download format is only available for registered users'],
},
# private link, downloadable format
# tags with spaces (e.g. "Uplifting Trance", "Ori Uplift")
{
'url': 'https://soundcloud.com/oriuplift/uponly-238-no-talking-wav/s-AyZUd',
'md5': '64a60b16e617d41d0bef032b7f55441e',
'md5': '2e1530d0e9986a833a67cb34fc90ece0', # wav: '64a60b16e617d41d0bef032b7f55441e'
'info_dict': {
'id': '340344461',
'ext': 'wav',
'ext': 'opus', # wav original available with auth
'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'track': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366',
'uploader': 'Ori Uplift Music',
'uploader_id': '12563093',
@ -551,7 +561,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/oriuplift',
'genres': ['Trance'],
'artists': ['Ori Uplift'],
'tags': ['Orchestral', 'Emotional', 'Uplifting Trance', 'Trance', 'Ori Uplift', 'UpOnly'],
},
'expected_warnings': ['Original download format is only available for registered users'],
},
# no album art, use avatar pic for thumbnail
{
@ -561,6 +573,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '309699954',
'ext': 'mp3',
'title': 'Sideways (Prod. Mad Real)',
'track': 'Sideways (Prod. Mad Real)',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'uploader': 'garyvee',
'uploader_id': '2366352',
@ -575,6 +588,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'uploader_url': 'https://soundcloud.com/garyvee',
'artists': ['MadReal'],
'tags': [],
},
'params': {
'skip_download': True,
@ -587,6 +601,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '583011102',
'ext': 'opus',
'title': 'Mezzo Valzer',
'track': 'Mezzo Valzer',
'description': 'md5:f4d5f39d52e0ccc2b4f665326428901a',
'uploader': 'Giovanni Sarani',
'uploader_id': '3352531',
@ -601,6 +616,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'genres': ['Piano'],
'uploader_url': 'https://soundcloud.com/giovannisarani',
'tags': 'count:10',
},
},
{
@ -662,6 +678,11 @@ def _extract_set(self, playlist, token=None):
'playlistId': playlist_id,
'playlistSecretToken': token,
}, headers=self._HEADERS)
album_info = traverse_obj(playlist, {
'album': ('title', {str}),
'album_artist': ('user', 'username', {str}),
'album_type': ('set_type', {str}, {lambda x: x or 'playlist'}),
})
entries = []
for track in tracks:
track_id = str_or_none(track.get('id'))
@ -673,11 +694,17 @@ def _extract_set(self, playlist, token=None):
if token:
url += '?secret_token=' + token
entries.append(self.url_result(
url, SoundcloudIE.ie_key(), track_id))
url, SoundcloudIE.ie_key(), track_id, url_transparent=True, **album_info))
return self.playlist_result(
entries, playlist_id,
playlist.get('title'),
playlist.get('description'))
playlist.get('description'),
**album_info,
**traverse_obj(playlist, {
'uploader': ('user', 'username', {str}),
'uploader_id': ('user', 'id', {str_or_none}),
}),
)
class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@ -689,6 +716,11 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
'id': '2284613',
'title': 'The Royal Concept EP',
'description': 'md5:71d07087c7a449e8941a70a29e34671e',
'uploader': 'The Royal Concept',
'uploader_id': '9615865',
'album': 'The Royal Concept EP',
'album_artists': ['The Royal Concept'],
'album_type': 'ep',
},
'playlist_mincount': 5,
}, {
@ -782,7 +814,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
(?:(?:www|m)\.)?soundcloud\.com/
(?P<user>[^/]+)
(?:/
(?P<rsrc>tracks|albums|sets|reposts|likes|spotlight)
(?P<rsrc>tracks|albums|sets|reposts|likes|spotlight|comments)
)?
/?(?:[?#].*)?$
'''
@ -836,6 +868,13 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
'title': 'Grynpyret (Spotlight)',
},
'playlist_mincount': 1,
}, {
'url': 'https://soundcloud.com/one-thousand-and-one/comments',
'info_dict': {
'id': '992430331',
'title': '7x11x13-testing (Comments)',
},
'playlist_mincount': 1,
}]
_BASE_URL_MAP = {
@ -846,6 +885,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
'reposts': 'stream/users/%s/reposts',
'likes': 'users/%s/likes',
'spotlight': 'users/%s/spotlight',
'comments': 'users/%s/comments',
}
def _real_extract(self, url):
@ -966,6 +1006,11 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
'id': '4110309',
'title': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',
'description': 're:.*?TILT Brass - Bowery Poetry Club',
'uploader': 'Non-Site Records',
'uploader_id': '33660914',
'album_artists': ['Non-Site Records'],
'album_type': 'playlist',
'album': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',
},
'playlist_count': 6,
}]

View file

@ -207,7 +207,7 @@ def _real_extract(self, url):
class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?ppv/(?P<id>\w+)'
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?(?:ppv|live)/(?P<id>\w+)'
IE_NAME = 'theatercomplextown:ppv'
_TESTS = [{
'url': 'https://www.theater-complex.town/ppv/wytW3X7khrjJBUpKuV3jen',
@ -229,6 +229,9 @@ class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
}, {
'url': 'https://www.theater-complex.town/ja/ppv/qwUVmLmGEiZ3ZW6it9uGys',
'only_matching': True,
}, {
'url': 'https://www.theater-complex.town/en/live/79akNM7bJeD5Fi9EP39aDp',
'only_matching': True,
}]
_API_PATH = 'events'

View file

@ -0,0 +1,199 @@
import functools
import math
from .common import InfoExtractor
from ..utils import (
InAdvancePagedList,
int_or_none,
parse_iso8601,
try_call,
url_or_none,
)
from ..utils.traversal import traverse_obj
class SubsplashBaseIE(InfoExtractor):
def _get_headers(self, url, display_id):
token = try_call(lambda: self._get_cookies(url)['ss-token-guest'].value)
if not token:
webpage, urlh = self._download_webpage_handle(url, display_id)
token = (
try_call(lambda: self._get_cookies(url)['ss-token-guest'].value)
or urlh.get_header('x-api-token')
or self._search_json(
r'<script[^>]+\bid="shoebox-tokens"[^>]*>', webpage, 'shoebox tokens',
display_id, default={}).get('apiToken')
or self._search_regex(r'\\"tokens\\":{\\"guest\\":\\"([A-Za-z0-9._-]+)\\"', webpage, 'token', default=None))
if not token:
self.report_warning('Unable to extract auth token')
return None
return {'Authorization': f'Bearer {token}'}
def _extract_video(self, data, video_id):
formats = []
video_data = traverse_obj(data, ('_embedded', 'video', '_embedded', {dict}))
m3u8_url = traverse_obj(video_data, ('playlists', 0, '_links', 'related', 'href', {url_or_none}))
if m3u8_url:
formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
mp4_entry = traverse_obj(video_data, ('video-outputs', lambda _, v: url_or_none(v['_links']['related']['href']), any))
if mp4_entry:
formats.append({
'url': mp4_entry['_links']['related']['href'],
'format_id': 'direct',
'quality': 1,
**traverse_obj(mp4_entry, {
'height': ('height', {int_or_none}),
'width': ('width', {int_or_none}),
'filesize': ('file_size', {int_or_none}),
}),
})
return {
'id': video_id,
'formats': formats,
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('summary_text', {str}),
'thumbnail': ('_embedded', 'images', 0, '_links', 'related', 'href', {url_or_none}),
'duration': ('_embedded', 'video', 'duration', {int_or_none(scale=1000)}),
'timestamp': ('date', {parse_iso8601}),
'release_timestamp': ('published_at', {parse_iso8601}),
'modified_timestamp': ('updated_at', {parse_iso8601}),
}),
}
class SubsplashIE(SubsplashBaseIE):
_VALID_URL = [
r'https?://(?:www\.)?subsplash\.com/(?:u/)?[^/?#]+/[^/?#]+/(?:d/|mi/\+)(?P<id>\w+)',
r'https?://(?:\w+\.)?subspla\.sh/(?P<id>\w+)',
]
_TESTS = [{
'url': 'https://subsplash.com/u/skywatchtv/media/d/5whnx5s-the-grand-delusion-taking-place-right-now',
'md5': 'd468729814e533cec86f1da505dec82d',
'info_dict': {
'id': '5whnx5s',
'ext': 'mp4',
'title': 'THE GRAND DELUSION TAKING PLACE RIGHT NOW!',
'description': 'md5:220a630865c3697b0ec9dcb3a70cbc33',
'upload_date': '20240901',
'duration': 1710,
'thumbnail': r're:https?://.*\.(?:jpg|png)$',
'modified_date': '20240901',
'release_date': '20240901',
'release_timestamp': 1725195600,
'timestamp': 1725148800,
'modified_timestamp': 1725195657,
},
}, {
'url': 'https://subsplash.com/u/prophecywatchers/media/d/n4dr8b2-the-transhumanist-plan-for-humanity-billy-crone',
'md5': '01982d58021af81c969958459bd81f13',
'info_dict': {
'id': 'n4dr8b2',
'ext': 'mp4',
'title': 'The Transhumanist Plan for Humanity | Billy Crone',
'upload_date': '20240903',
'duration': 1709,
'thumbnail': r're:https?://.*\.(?:jpg|png)$',
'timestamp': 1725321600,
'modified_date': '20241010',
'release_date': '20240903',
'release_timestamp': 1725379200,
'modified_timestamp': 1728577804,
},
}, {
'url': 'https://subsplash.com/laiglesiadelcentro/vid/mi/+ecb6a6b?autoplay=true',
'md5': '013c9b1e391dd4b34d8612439445deef',
'info_dict': {
'id': 'ecb6a6b',
'ext': 'mp4',
'thumbnail': r're:https?://.*\.(?:jpg|png)$',
'release_timestamp': 1477095852,
'title': 'En el Principio Era el Verbo | EVANGELIO DE JUAN | Ps. Gadiel Ríos',
'timestamp': 1425772800,
'upload_date': '20150308',
'description': 'md5:f368221de93176654989ba66bb564798',
'modified_timestamp': 1730258864,
'modified_date': '20241030',
'release_date': '20161022',
},
}, {
'url': 'https://prophecywatchers.subspla.sh/8gps8cx',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://core.subsplash.com/media/v1/media-items',
video_id, headers=self._get_headers(url, video_id),
query={
'filter[short_code]': video_id,
'include': 'images,audio.audio-outputs,audio.video,video.video-outputs,video.playlists,document,broadcast',
})
return self._extract_video(traverse_obj(data, ('_embedded', 'media-items', 0)), video_id)
class SubsplashPlaylistIE(SubsplashBaseIE):
IE_NAME = 'subsplash:playlist'
_VALID_URL = r'https?://(?:www\.)?subsplash\.com/[^/?#]+/(?:our-videos|media)/ms/\+(?P<id>\w+)'
_PAGE_SIZE = 15
_TESTS = [{
'url': 'https://subsplash.com/skywatchtv/our-videos/ms/+dbyjzp8',
'info_dict': {
'id': 'dbyjzp8',
'title': 'Five in Ten',
},
'playlist_mincount': 11,
}, {
'url': 'https://subsplash.com/prophecywatchers/media/ms/+n42mr48',
'info_dict': {
'id': 'n42mr48',
'title': 'Road to Zion Series',
},
'playlist_mincount': 13,
}, {
'url': 'https://subsplash.com/prophecywatchers/media/ms/+918b9f6',
'only_matching': True,
}]
def _entries(self, series_id, headers, page):
data = self._download_json(
'https://core.subsplash.com/media/v1/media-items', series_id, headers=headers,
query={
'filter[broadcast.status|broadcast.status]': 'null|on-demand',
'filter[media_series]': series_id,
'filter[status]': 'published',
'include': 'images,audio.audio-outputs,audio.video,video.video-outputs,video.playlists,document',
'page[number]': page + 1,
'page[size]': self._PAGE_SIZE,
'sort': '-position',
}, note=f'Downloading page {page + 1}')
for entry in traverse_obj(data, ('_embedded', 'media-items', lambda _, v: v['short_code'])):
entry_id = entry['short_code']
info = self._extract_video(entry, entry_id)
yield {
**info,
'webpage_url': f'https://subspla.sh/{entry_id}',
'extractor_key': SubsplashIE.ie_key(),
'extractor': SubsplashIE.IE_NAME,
}
def _real_extract(self, url):
display_id = self._match_id(url)
headers = self._get_headers(url, display_id)
data = self._download_json(
'https://core.subsplash.com/media/v1/media-series', display_id, headers=headers,
query={'filter[short_code]': display_id})
series_data = traverse_obj(data, ('_embedded', 'media-series', 0, {
'id': ('id', {str}),
'title': ('title', {str}),
'count': ('media_items_count', {int}),
}))
total_pages = math.ceil(series_data['count'] / self._PAGE_SIZE)
return self.playlist_result(
InAdvancePagedList(functools.partial(self._entries, series_data['id'], headers), total_pages, self._PAGE_SIZE),
display_id, series_data['title'])

View file

@ -118,8 +118,9 @@ def extract_site_specific_field(field):
'categories', lambda _, v: v.get('label') in ('category', None), 'name', {str})) or None,
'tags': traverse_obj(info, ('keywords', {lambda x: re.split(r'[;,]\s?', x) if x else None})),
'location': extract_site_specific_field('region'),
'series': extract_site_specific_field('show'),
'series': extract_site_specific_field('show') or extract_site_specific_field('seriesTitle'),
'season_number': int_or_none(extract_site_specific_field('seasonNumber')),
'episode_number': int_or_none(extract_site_specific_field('episodeNumber')),
'media_type': extract_site_specific_field('programmingType') or extract_site_specific_field('type'),
}

View file

@ -189,26 +189,6 @@ class TumblrIE(InfoExtractor):
'release_date': '20140227',
},
'add_ie': ['Vimeo'],
}, {
'url': 'http://sutiblr.tumblr.com/post/139638707273',
'md5': '2dd184b3669e049ba40563a7d423f95c',
'info_dict': {
'id': 'ir7qBEIKqvq',
'ext': 'mp4',
'title': 'Vine by sutiblr',
'alt_title': 'Vine by sutiblr',
'uploader': 'sutiblr',
'uploader_id': '1198993975374495744',
'upload_date': '20160220',
'like_count': int,
'comment_count': int,
'repost_count': int,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1455940159,
'view_count': int,
},
'add_ie': ['Vine'],
'skip': 'Vine is unavailable',
}, {
'url': 'https://silami.tumblr.com/post/84250043974/my-bad-river-flows-in-you-impression-on-maschine',
'md5': '3c92d7c3d867f14ccbeefa2119022277',
@ -366,7 +346,6 @@ class TumblrIE(InfoExtractor):
_providers = {
'instagram': 'Instagram',
'vimeo': 'Vimeo',
'vine': 'Vine',
'youtube': 'Youtube',
'dailymotion': 'Dailymotion',
'tiktok': 'TikTok',

View file

@ -24,8 +24,6 @@ class TVerIE(InfoExtractor):
'channel': 'テレビ朝日',
'id': 'ep83nf3w4p',
'ext': 'mp4',
'onair_label': '5月3日(火)放送分',
'ext_title': '家事ヤロウ!!! 売り場席巻のチーズSP財前直見×森泉親子の脱東京暮らし密着 テレビ朝日 5月3日(火)放送分',
},
'add_ie': ['BrightcoveNew'],
}, {

View file

@ -1,11 +1,12 @@
import functools
import json
import random
import math
import re
import urllib.parse
from .common import InfoExtractor
from .periscope import PeriscopeBaseIE, PeriscopeIE
from ..jsinterp import js_number_to_string
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
@ -409,26 +410,6 @@ class TwitterCardIE(InfoExtractor):
},
'add_ie': ['Youtube'],
},
{
'url': 'https://twitter.com/i/cards/tfw/v1/665289828897005568',
'info_dict': {
'id': 'iBb2x00UVlv',
'ext': 'mp4',
'upload_date': '20151113',
'uploader_id': '1189339351084113920',
'uploader': 'ArsenalTerje',
'title': 'Vine by ArsenalTerje',
'timestamp': 1447451307,
'alt_title': 'Vine by ArsenalTerje',
'comment_count': int,
'like_count': int,
'thumbnail': r're:^https?://[^?#]+\.jpg',
'view_count': int,
'repost_count': int,
},
'add_ie': ['Vine'],
'params': {'skip_download': 'm3u8'},
},
{
'url': 'https://twitter.com/i/videos/tweet/705235433198714880',
'md5': '884812a2adc8aaf6fe52b15ccbfa3b88',
@ -567,25 +548,6 @@ class TwitterIE(TwitterBaseIE):
'age_limit': 0,
'_old_archive_ids': ['twitter 700207533655363584'],
},
}, {
'url': 'https://twitter.com/Filmdrunk/status/713801302971588609',
'md5': '89a15ed345d13b86e9a5a5e051fa308a',
'info_dict': {
'id': 'MIOxnrUteUd',
'ext': 'mp4',
'title': 'Dr.Pepperの飲み方 #japanese #バカ #ドクペ #電動ガン',
'uploader': 'TAKUMA',
'uploader_id': '1004126642786242560',
'timestamp': 1402826626,
'upload_date': '20140615',
'thumbnail': r're:^https?://.*\.jpg',
'alt_title': 'Vine by TAKUMA',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
},
'add_ie': ['Vine'],
}, {
'url': 'https://twitter.com/captainamerica/status/719944021058060289',
'info_dict': {
@ -1369,6 +1331,11 @@ def _build_graphql_query(self, media_id):
},
}
def _generate_syndication_token(self, twid):
# ((Number(twid) / 1e15) * Math.PI).toString(36).replace(/(0+|\.)/g, '')
translation = str.maketrans(dict.fromkeys('0.'))
return js_number_to_string((int(twid) / 1e15) * math.PI, 36).translate(translation)
def _call_syndication_api(self, twid):
self.report_warning(
'Not all metadata or media is available via syndication endpoint', twid, only_once=True)
@ -1376,8 +1343,7 @@ def _call_syndication_api(self, twid):
'https://cdn.syndication.twimg.com/tweet-result', twid, 'Downloading syndication JSON',
headers={'User-Agent': 'Googlebot'}, query={
'id': twid,
# TODO: token = ((Number(twid) / 1e15) * Math.PI).toString(36).replace(/(0+|\.)/g, '')
'token': ''.join(random.choices('123456789abcdefghijklmnopqrstuvwxyz', k=10)),
'token': self._generate_syndication_token(twid),
})
if not status:
raise ExtractorError('Syndication endpoint returned empty JSON response')

View file

@ -50,6 +50,7 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'music\.amazon\.(?:\w{2}\.)?\w+',
r'(?:watch|front)\.njpwworld\.com',
r'qub\.ca/vrai',
r'(?:beta\.)?crunchyroll\.com',
)
_TESTS = [{
@ -153,6 +154,12 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, {
'url': 'https://www.qub.ca/vrai/l-effet-bocuse-d-or/saison-1/l-effet-bocuse-d-or-saison-1-bande-annonce-1098225063',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
'only_matching': True,
}]
def _real_extract(self, url):

View file

@ -14,59 +14,69 @@ class VideocampusSachsenIE(InfoExtractor):
'corporate.demo.vimp.com',
'dancehalldatabase.com',
'drehzahl.tv',
'educhannel.hs-gesundheit.de',
'educhannel.hs-gesundheit.de', # Hochschule für Gesundheit NRW
'emedia.ls.haw-hamburg.de',
'globale-evolution.net',
'hohu.tv',
'htvideos.hightechhigh.org',
'k210039.vimp.mivitec.net',
'media.cmslegal.com',
'media.hs-furtwangen.de',
'media.hwr-berlin.de',
'media.fh-swf.de', # Fachhochschule Südwestfalen
'media.hs-furtwangen.de', # Hochschule Furtwangen
'media.hwr-berlin.de', # Hochschule für Wirtschaft und Recht Berlin
'mediathek.dkfz.de',
'mediathek.htw-berlin.de',
'mediathek.htw-berlin.de', # Hochschule für Technik und Wirtschaft Berlin
'mediathek.polizei-bw.de',
'medien.hs-merseburg.de',
'mportal.europa-uni.de',
'medien.hs-merseburg.de', # Hochschule Merseburg
'mitmedia.manukau.ac.nz', # Manukau Institute of Technology Auckland (NZ)
'mportal.europa-uni.de', # Europa-Universität Viadrina
'pacific.demo.vimp.com',
'slctv.com',
'streaming.prairiesouth.ca',
'tube.isbonline.cn',
'univideo.uni-kassel.de',
'univideo.uni-kassel.de', # Universität Kassel
'ursula2.genetics.emory.edu',
'ursulablicklevideoarchiv.com',
'v.agrarumweltpaedagogik.at',
'video.eplay-tv.de',
'video.fh-dortmund.de',
'video.hs-offenburg.de',
'video.hs-pforzheim.de',
'video.hspv.nrw.de',
'video.fh-dortmund.de', # Fachhochschule Dortmund
'video.hs-nb.de', # Hochschule Neubrandenburg
'video.hs-offenburg.de', # Hochschule Offenburg
'video.hs-pforzheim.de', # Hochschule Pforzheim
'video.hspv.nrw.de', # Hochschule für Polizei und öffentliche Verwaltung NRW
'video.irtshdf.fr',
'video.pareygo.de',
'video.tu-freiberg.de',
'videocampus.sachsen.de',
'videoportal.uni-freiburg.de',
'videoportal.vm.uni-freiburg.de',
'video.tu-dortmund.de', # Technische Universität Dortmund
'video.tu-freiberg.de', # Technische Universität Bergakademie Freiberg
'videocampus.sachsen.de', # Video Campus Sachsen (gemeinsame Videoplattform sächsischer Universitäten, Hochschulen und der Berufsakademie Sachsen)
'videoportal.uni-freiburg.de', # Albert-Ludwigs-Universität Freiburg
'videoportal.vm.uni-freiburg.de', # Albert-Ludwigs-Universität Freiburg
'videos.duoc.cl',
'videos.uni-paderborn.de',
'videos.uni-paderborn.de', # Universität Paderborn
'vimp-bemus.udk-berlin.de',
'vimp.aekwl.de',
'vimp.hs-mittweida.de',
'vimp.oth-regensburg.de',
'vimp.ph-heidelberg.de',
'vimp.landesfilmdienste.de',
'vimp.oth-regensburg.de', # Ostbayerische Technische Hochschule Regensburg
'vimp.ph-heidelberg.de', # Pädagogische Hochschule Heidelberg
'vimp.sma-events.com',
'vimp.weka-fachmedien.de',
'vimpdesk.com',
'webtv.univ-montp3.fr',
'www.b-tu.de/media',
'www.b-tu.de/media', # Brandenburgische Technische Universität Cottbus-Senftenberg
'www.bergauf.tv',
'www.bigcitytv.de',
'www.cad-videos.de',
'www.drehzahl.tv',
'www.fh-bielefeld.de/medienportal',
'www.hohu.tv',
'www.hsbi.de/medienportal', # Hochschule Bielefeld
'www.logistic.tv',
'www.orvovideo.com',
'www.printtube.co.uk',
'www.rwe.tv',
'www.salzi.tv',
'www.signtube.co.uk',
'www.twb-power.com',
'www.wenglor-media.com',
'www2.univ-sba.dz',
)
@ -188,22 +198,23 @@ def _real_extract(self, url):
class ViMPPlaylistIE(InfoExtractor):
IE_NAME = 'ViMP:Playlist'
_VALID_URL = r'''(?x)(?P<host>https?://(?:{}))/(?:
album/view/aid/(?P<album_id>[0-9]+)|
(?P<mode>category|channel)/(?P<name>[\w-]+)/(?P<id>[0-9]+)
(?P<mode1>album)/view/aid/(?P<album_id>[0-9]+)|
(?P<mode2>category|channel)/(?P<name>[\w-]+)/(?P<channel_id>[0-9]+)|
(?P<mode3>tag)/(?P<tag_id>[0-9]+)
)'''.format('|'.join(map(re.escape, VideocampusSachsenIE._INSTANCES)))
_TESTS = [{
'url': 'https://vimp.oth-regensburg.de/channel/Designtheorie-1-SoSe-2020/3',
'info_dict': {
'id': 'channel-3',
'title': 'Designtheorie 1 SoSe 2020 :: Channels :: ViMP OTH Regensburg',
'title': 'Designtheorie 1 SoSe 2020 - Channels - ViMP OTH Regensburg',
},
'playlist_mincount': 9,
}, {
'url': 'https://www.fh-bielefeld.de/medienportal/album/view/aid/208',
'url': 'https://www.hsbi.de/medienportal/album/view/aid/208',
'info_dict': {
'id': 'album-208',
'title': 'KG Praktikum ABT/MEC :: Playlists :: FH-Medienportal',
'title': 'KG Praktikum ABT/MEC - Playlists - HSBI-Medienportal',
},
'playlist_mincount': 4,
}, {
@ -213,6 +224,13 @@ class ViMPPlaylistIE(InfoExtractor):
'title': 'Online-Seminare ONYX - BPS - Bildungseinrichtungen - VCS',
},
'playlist_mincount': 7,
}, {
'url': 'https://videocampus.sachsen.de/tag/26902',
'info_dict': {
'id': 'tag-26902',
'title': 'advanced mobile and v2x communication - Tags - VCS',
},
'playlist_mincount': 6,
}]
_PAGE_SIZE = 10
@ -220,34 +238,37 @@ def _fetch_page(self, host, url_part, playlist_id, data, page):
webpage = self._download_webpage(
f'{host}/media/ajax/component/boxList/{url_part}', playlist_id,
query={'page': page, 'page_only': 1}, data=urlencode_postdata(data))
urls = re.findall(r'"([^"]+/video/[^"]+)"', webpage)
urls = re.findall(r'"([^"]*/video/[^"]+)"', webpage)
for url in urls:
yield self.url_result(host + url, VideocampusSachsenIE)
def _real_extract(self, url):
host, album_id, mode, name, playlist_id = self._match_valid_url(url).group(
'host', 'album_id', 'mode', 'name', 'id')
host, album_id, name, channel_id, tag_id, mode1, mode2, mode3 = self._match_valid_url(url).group(
'host', 'album_id', 'name', 'channel_id', 'tag_id', 'mode1', 'mode2', 'mode3')
webpage = self._download_webpage(url, album_id or playlist_id, fatal=False) or ''
mode = mode1 or mode2 or mode3
playlist_id = album_id or channel_id or tag_id
webpage = self._download_webpage(url, playlist_id, fatal=False) or ''
title = (self._html_search_meta('title', webpage, fatal=False)
or self._html_extract_title(webpage))
url_part = (f'aid/{album_id}' if album_id
else f'category/{name}/category_id/{playlist_id}' if mode == 'category'
else f'title/{name}/channel/{playlist_id}')
else f'category/{name}/category_id/{channel_id}' if mode == 'category'
else f'title/{name}/channel/{channel_id}' if mode == 'channel'
else f'tag/{tag_id}')
mode = mode or 'album'
data = {
'vars[mode]': mode,
f'vars[{mode}]': album_id or playlist_id,
'vars[context]': '4' if album_id else '1' if mode == 'category' else '3',
'vars[context_id]': album_id or playlist_id,
f'vars[{mode}]': playlist_id,
'vars[context]': '4' if album_id else '1' if mode == 'category' else '3' if mode == 'album' else '0',
'vars[context_id]': playlist_id,
'vars[layout]': 'thumb',
'vars[per_page][thumb]': str(self._PAGE_SIZE),
}
return self.playlist_result(
OnDemandPagedList(functools.partial(
self._fetch_page, host, url_part, album_id or playlist_id, data), self._PAGE_SIZE),
playlist_title=title, id=f'{mode}-{album_id or playlist_id}')
self._fetch_page, host, url_part, playlist_id, data), self._PAGE_SIZE),
playlist_title=title, id=f'{mode}-{playlist_id}')

View file

@ -421,5 +421,5 @@ def _real_extract(self, url):
return self._process_video_json(video_json['chapters'][0], video_id)
return self.playlist_result(
[self._process_video_json(chapter, video_id) for chapter in video_json['chapters']],
(self._process_video_json(chapter, video_id) for chapter in video_json['chapters']),
str(video_json['playerUuid']), video_json.get('name'))

View file

@ -28,6 +28,7 @@
try_get,
unified_timestamp,
unsmuggle_url,
url_or_none,
urlencode_postdata,
urlhandle_detect_ext,
urljoin,
@ -211,11 +212,7 @@ def _parse_config(self, config, video_id):
'width': int_or_none(key),
'url': thumb,
})
thumbnail = video_data.get('thumbnail')
if thumbnail:
thumbnails.append({
'url': thumbnail,
})
thumbnails.extend(traverse_obj(video_data, (('thumbnail', 'thumbnail_url'), {'url': {url_or_none}})))
owner = video_data.get('owner') or {}
video_uploader_url = owner.get('url')
@ -388,7 +385,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
'uploader_id': 'businessofsoftware',
'duration': 3610,
'thumbnail': 'https://i.vimeocdn.com/video/376682406-f34043e7b766af6bef2af81366eacd6724f3fc3173179a11a97a1e26587c9529-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/376682406-f34043e7b766af6bef2af81366eacd6724f3fc3173179a11a97a1e26587c9529-d',
},
'params': {
'format': 'best[protocol=https]',
@ -413,7 +410,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 10,
'comment_count': int,
'like_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d',
},
'params': {
'format': 'best[protocol=https]',
@ -437,7 +434,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'timestamp': 1380339469,
'upload_date': '20130928',
'duration': 187,
'thumbnail': 'https://i.vimeocdn.com/video/450239872-a05512d9b1e55d707a7c04365c10980f327b06d966351bc403a5d5d65c95e572-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/450239872-a05512d9b1e55d707a7c04365c10980f327b06d966351bc403a5d5d65c95e572-d',
'view_count': int,
'comment_count': int,
'like_count': int,
@ -463,7 +460,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 62,
'comment_count': int,
'like_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/452001751-8216e0571c251a09d7a8387550942d89f7f86f6398f8ed886e639b0dd50d3c90-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/452001751-8216e0571c251a09d7a8387550942d89f7f86f6398f8ed886e639b0dd50d3c90-d',
'subtitles': {
'de': 'count:3',
'en': 'count:3',
@ -488,7 +485,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user28849593',
'uploader_id': 'user28849593',
'duration': 118,
'thumbnail': 'https://i.vimeocdn.com/video/478636036-c18440305ef3df9decfb6bf207a61fe39d2d17fa462a96f6f2d93d30492b037d-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/478636036-c18440305ef3df9decfb6bf207a61fe39d2d17fa462a96f6f2d93d30492b037d-d',
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
},
@ -509,7 +506,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 60,
'comment_count': int,
'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d',
'like_count': int,
'tags': 'count:11',
},
@ -531,7 +528,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': 'md5:f2edc61af3ea7a5592681ddbb683db73',
'upload_date': '20200225',
'duration': 176,
'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d',
'uploader_url': 'https://vimeo.com/frameworkla',
},
# 'params': {'format': 'source'},
@ -556,7 +553,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 321,
'comment_count': int,
'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/22728298-bfc22146f930de7cf497821c7b0b9f168099201ecca39b00b6bd31fcedfca7a6-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/22728298-bfc22146f930de7cf497821c7b0b9f168099201ecca39b00b6bd31fcedfca7a6-d',
'like_count': int,
'tags': ['[the shining', 'vimeohq', 'cv', 'vimeo tribute]'],
},
@ -596,7 +593,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'user18948128',
'uploader': 'Jaime Marquínez Ferrándiz',
'duration': 10,
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d',
},
'params': {
'format': 'best[protocol=https]',
@ -633,7 +630,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': str, # FIXME: Dynamic SEO spam description
'upload_date': '20150209',
'timestamp': 1423518307,
'thumbnail': 'https://i.vimeocdn.com/video/default_1280',
'thumbnail': 'https://i.vimeocdn.com/video/default',
'duration': 10,
'like_count': int,
'uploader_url': 'https://vimeo.com/user20132939',
@ -666,7 +663,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'license': 'by-nc',
'duration': 159,
'comment_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/562802436-585eeb13b5020c6ac0f171a2234067938098f84737787df05ff0d767f6d54ee9-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/562802436-585eeb13b5020c6ac0f171a2234067938098f84737787df05ff0d767f6d54ee9-d',
'like_count': int,
'uploader_url': 'https://vimeo.com/aliniamedia',
'release_date': '20160329',
@ -686,7 +683,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'Firework Champions',
'upload_date': '20150910',
'timestamp': 1441901895,
'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d',
'uploader_url': 'https://vimeo.com/fireworkchampions',
'tags': 'count:6',
'duration': 229,
@ -715,7 +712,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 336,
'comment_count': int,
'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/541243181-b593db36a16db2f0096f655da3f5a4dc46b8766d77b0f440df937ecb0c418347-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/541243181-b593db36a16db2f0096f655da3f5a4dc46b8766d77b0f440df937ecb0c418347-d',
'like_count': int,
'uploader_url': 'https://vimeo.com/karimhd',
'channel_url': 'https://vimeo.com/channels/staffpicks',
@ -740,7 +737,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'release_timestamp': 1627621014,
'duration': 976,
'comment_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/1202249320-4ddb2c30398c0dc0ee059172d1bd5ea481ad12f0e0e3ad01d2266f56c744b015-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/1202249320-4ddb2c30398c0dc0ee059172d1bd5ea481ad12f0e0e3ad01d2266f56c744b015-d',
'like_count': int,
'uploader_url': 'https://vimeo.com/txwestcapital',
'release_date': '20210730',
@ -764,7 +761,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'Alex Howard',
'uploader_id': 'user54729178',
'uploader_url': 'https://vimeo.com/user54729178',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d_1280',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d',
'duration': 2636,
'chapters': [
{'start_time': 0, 'end_time': 10, 'title': '<Untitled Chapter 1>'},
@ -807,7 +804,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'like_count': int,
'view_count': int,
'comment_count': int,
'thumbnail': r're:https://i\.vimeocdn\.com/video/1018638656-[\da-f]+-d_1280',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1018638656-[\da-f]+-d',
},
# 'params': {'format': 'Original'},
'expected_warnings': ['Failed to parse XML: not well-formed'],
@ -824,7 +821,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'rajavirdi',
'uploader_url': 'https://vimeo.com/rajavirdi',
'duration': 309,
'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d_1280',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d',
},
# 'params': {'format': 'source'},
'expected_warnings': ['Failed to parse XML: not well-formed'],

View file

@ -1,150 +0,0 @@
from .common import InfoExtractor
from ..utils import (
determine_ext,
format_field,
int_or_none,
unified_timestamp,
)
class VineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vine\.co/(?:v|oembed)/(?P<id>\w+)'
_EMBED_REGEX = [r'<iframe[^>]+src=[\'"](?P<url>(?:https?:)?//(?:www\.)?vine\.co/v/[^/]+/embed/(?:simple|postcard))']
_TESTS = [{
'url': 'https://vine.co/v/b9KOOWX7HUx',
'md5': '2f36fed6235b16da96ce9b4dc890940d',
'info_dict': {
'id': 'b9KOOWX7HUx',
'ext': 'mp4',
'title': 'Chicken.',
'alt_title': 'Vine by Jack',
'timestamp': 1368997951,
'upload_date': '20130519',
'uploader': 'Jack',
'uploader_id': '76',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
},
}, {
'url': 'https://vine.co/v/e192BnZnZ9V',
'info_dict': {
'id': 'e192BnZnZ9V',
'ext': 'mp4',
'title': 'ยิ้ม~ เขิน~ อาย~ น่าร้ากอ้ะ >//< @n_whitewo @orlameena #lovesicktheseries #lovesickseason2',
'alt_title': 'Vine by Pimry_zaa',
'timestamp': 1436057405,
'upload_date': '20150705',
'uploader': 'Pimry_zaa',
'uploader_id': '1135760698325307392',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://vine.co/v/MYxVapFvz2z',
'only_matching': True,
}, {
'url': 'https://vine.co/v/bxVjBbZlPUH',
'only_matching': True,
}, {
'url': 'https://vine.co/oembed/MYxVapFvz2z.json',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
f'https://archive.vine.co/posts/{video_id}.json', video_id)
def video_url(kind):
for url_suffix in ('Url', 'URL'):
format_url = data.get(f'video{kind}{url_suffix}')
if format_url:
return format_url
formats = []
for quality, format_id in enumerate(('low', '', 'dash')):
format_url = video_url(format_id.capitalize())
if not format_url:
continue
# DASH link returns plain mp4
if format_id == 'dash' and determine_ext(format_url) == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id or 'standard',
'quality': quality,
})
self._check_formats(formats, video_id)
username = data.get('username')
alt_title = format_field(username, None, 'Vine by %s')
return {
'id': video_id,
'title': data.get('description') or alt_title or 'Vine video',
'alt_title': alt_title,
'thumbnail': data.get('thumbnailUrl'),
'timestamp': unified_timestamp(data.get('created')),
'uploader': username,
'uploader_id': data.get('userIdStr'),
'view_count': int_or_none(data.get('loops')),
'like_count': int_or_none(data.get('likes')),
'comment_count': int_or_none(data.get('comments')),
'repost_count': int_or_none(data.get('reposts')),
'formats': formats,
}
class VineUserIE(InfoExtractor):
IE_NAME = 'vine:user'
_VALID_URL = r'https?://vine\.co/(?P<u>u/)?(?P<user>[^/]+)'
_VINE_BASE_URL = 'https://vine.co/'
_TESTS = [{
'url': 'https://vine.co/itsruthb',
'info_dict': {
'id': 'itsruthb',
'title': 'Ruth B',
'description': '| Instagram/Twitter: itsruthb | still a lost boy from neverland',
},
'playlist_mincount': 611,
}, {
'url': 'https://vine.co/u/942914934646415360',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if VineIE.suitable(url) else super().suitable(url)
def _real_extract(self, url):
mobj = self._match_valid_url(url)
user = mobj.group('user')
u = mobj.group('u')
profile_url = '{}api/users/profiles/{}{}'.format(
self._VINE_BASE_URL, 'vanity/' if not u else '', user)
profile_data = self._download_json(
profile_url, user, note='Downloading user profile data')
data = profile_data['data']
user_id = data.get('userId') or data['userIdStr']
profile = self._download_json(
f'https://archive.vine.co/profiles/{user_id}.json', user_id)
entries = [
self.url_result(
f'https://vine.co/v/{post_id}', ie='Vine', video_id=post_id)
for post_id in profile['posts']
if post_id and isinstance(post_id, str)]
return self.playlist_result(
entries, user, profile.get('username'), profile.get('description'))

View file

@ -124,7 +124,7 @@ def _parse_video_info(self, video_info, video_id=None):
class WeiboIE(WeiboBaseIE):
_VALID_URL = r'https?://(?:m\.weibo\.cn/status|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:m\.weibo\.cn/(?:status|detail)|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://weibo.com/7827771738/N4xlMvjhI',
'info_dict': {
@ -164,6 +164,25 @@ class WeiboIE(WeiboBaseIE):
'like_count': int,
'repost_count': int,
},
}, {
'url': 'https://m.weibo.cn/detail/4189191225395228',
'info_dict': {
'id': '4189191225395228',
'ext': 'mp4',
'display_id': 'FBqgOmDxO',
'title': '柴犬柴犬的秒拍视频',
'description': '午睡当然是要甜甜蜜蜜的啦![坏笑] Instagramshibainu.gaku http://t.cn/RHbmjzW ',
'duration': 53,
'timestamp': 1514264429,
'upload_date': '20171226',
'thumbnail': r're:https://.*\.jpg',
'uploader': '柴犬柴犬',
'uploader_id': '5926682210',
'uploader_url': 'https://weibo.com/u/5926682210',
'view_count': int,
'like_count': int,
'repost_count': int,
},
}, {
'url': 'https://weibo.com/0/4224132150961381',
'note': 'no playback_list example',

View file

@ -20,7 +20,7 @@
class XHamsterIE(InfoExtractor):
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com|xhday\.com|xhvid\.com)'
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.(?:com|desi)|xhday\.com|xhvid\.com)'
_VALID_URL = rf'''(?x)
https?://
(?:[^/?#]+\.)?{_DOMAINS}/
@ -31,7 +31,7 @@ class XHamsterIE(InfoExtractor):
'''
_TESTS = [{
'url': 'https://xhamster.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'md5': '34e1ab926db5dc2750fed9e1f34304bb',
'md5': 'e009ea6b849b129e3bebaeb9cf0dee51',
'info_dict': {
'id': '1509445',
'display_id': 'femaleagent-shy-beauty-takes-the-bait',
@ -43,6 +43,11 @@ class XHamsterIE(InfoExtractor):
'uploader_id': 'ruseful2011',
'duration': 893,
'age_limit': 18,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/u3Vr5F2vvcU3yK59_jJqVA/001/509/445/1280x720.8.jpg',
'uploader_url': 'https://xhamster.com/users/ruseful2011',
'description': '',
'view_count': int,
'comment_count': int,
},
}, {
'url': 'https://xhamster.com/videos/britney-spears-sexy-booty-2221348?hd=',
@ -56,6 +61,10 @@ class XHamsterIE(InfoExtractor):
'uploader': 'jojo747400',
'duration': 200,
'age_limit': 18,
'description': '',
'view_count': int,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/kk5nio_iR-h4Z3frfVtoDw/002/221/348/1280x720.4.jpg',
'comment_count': int,
},
'params': {
'skip_download': True,
@ -73,6 +82,11 @@ class XHamsterIE(InfoExtractor):
'uploader_id': 'parejafree',
'duration': 72,
'age_limit': 18,
'comment_count': int,
'uploader_url': 'https://xhamster.com/users/parejafree',
'description': '',
'view_count': int,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/xc8MSwVKcsQeRRiTT-saMQ/005/667/973/1280x720.2.jpg',
},
'params': {
'skip_download': True,
@ -122,6 +136,9 @@ class XHamsterIE(InfoExtractor):
}, {
'url': 'https://xhvid.com/videos/lk-mm-xhc6wn6',
'only_matching': True,
}, {
'url': 'https://xhamster20.desi/videos/my-verification-video-scottishmistress23-11937369',
'only_matching': True,
}]
def _real_extract(self, url):
@ -267,7 +284,7 @@ def get_height(s):
video, lambda x: x['rating']['likes'], int)),
'dislike_count': int_or_none(try_get(
video, lambda x: x['rating']['dislikes'], int)),
'comment_count': int_or_none(video.get('views')),
'comment_count': int_or_none(video.get('comments')),
'age_limit': age_limit if age_limit is not None else 18,
'categories': categories,
'formats': formats,

View file

@ -5,12 +5,13 @@
int_or_none,
js_to_json,
url_or_none,
urlhandle_detect_ext,
)
from ..utils.traversal import traverse_obj
class XiaoHongShuIE(InfoExtractor):
_VALID_URL = r'https?://www\.xiaohongshu\.com/explore/(?P<id>[\da-f]+)'
_VALID_URL = r'https?://www\.xiaohongshu\.com/(?:explore|discovery/item)/(?P<id>[\da-f]+)'
IE_DESC = '小红书'
_TESTS = [{
'url': 'https://www.xiaohongshu.com/explore/6411cf99000000001300b6d9',
@ -25,6 +26,18 @@ class XiaoHongShuIE(InfoExtractor):
'duration': 101.726,
'thumbnail': r're:https?://sns-webpic-qc\.xhscdn\.com/\d+/[a-z0-9]+/[\w]+',
},
}, {
'url': 'https://www.xiaohongshu.com/discovery/item/674051740000000007027a15?xsec_token=CBgeL8Dxd1ZWBhwqRd568gAZ_iwG-9JIf9tnApNmteU2E=',
'info_dict': {
'id': '674051740000000007027a15',
'ext': 'mp4',
'title': '相互喜欢就可以了',
'uploader_id': '63439913000000001901f49a',
'duration': 28.073,
'description': '#广州[话题]# #深圳[话题]# #香港[话题]# #街头采访[话题]# #是你喜欢的类型[话题]#',
'thumbnail': r're:https?://sns-webpic-qc\.xhscdn\.com/\d+/[\da-f]+/[^/]+',
'tags': ['广州', '深圳', '香港', '街头采访', '是你喜欢的类型'],
},
}]
def _real_extract(self, url):
@ -34,7 +47,7 @@ def _real_extract(self, url):
r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', display_id, transform_source=js_to_json)
note_info = traverse_obj(initial_state, ('note', 'noteDetailMap', display_id, 'note'))
video_info = traverse_obj(note_info, ('video', 'media', 'stream', ('h264', 'av1', 'h265'), ...))
video_info = traverse_obj(note_info, ('video', 'media', 'stream', ..., ...))
formats = []
for info in video_info:
@ -44,18 +57,32 @@ def _real_extract(self, url):
'height': ('height', {int_or_none}),
'vcodec': ('videoCodec', {str}),
'acodec': ('audioCodec', {str}),
'abr': ('audioBitrate', {int_or_none}),
'vbr': ('videoBitrate', {int_or_none}),
'abr': ('audioBitrate', {int_or_none(scale=1000)}),
'vbr': ('videoBitrate', {int_or_none(scale=1000)}),
'audio_channels': ('audioChannels', {int_or_none}),
'tbr': ('avgBitrate', {int_or_none}),
'tbr': ('avgBitrate', {int_or_none(scale=1000)}),
'format': ('qualityType', {str}),
'filesize': ('size', {int_or_none}),
'duration': ('duration', {float_or_none(scale=1000)}),
})
formats.extend(traverse_obj(info, (('mediaUrl', ('backupUrls', ...)), {
formats.extend(traverse_obj(info, (('masterUrl', ('backupUrls', ...)), {
lambda u: url_or_none(u) and {'url': u, **format_info}})))
if origin_key := traverse_obj(note_info, ('video', 'consumer', 'originVideoKey', {str})):
# Not using a head request because of false negatives
urlh = self._request_webpage(
f'https://sns-video-bd.xhscdn.com/{origin_key}', display_id,
'Checking original video availability', 'Original video is not available', fatal=False)
if urlh:
formats.append({
'format_id': 'direct',
'ext': urlhandle_detect_ext(urlh, default='mp4'),
'filesize': int_or_none(urlh.get_header('Content-Length')),
'url': urlh.url,
'quality': 1,
})
thumbnails = []
for image_info in traverse_obj(note_info, ('imageList', ...)):
thumbnail_info = traverse_obj(image_info, {

File diff suppressed because it is too large Load diff

View file

@ -5,7 +5,6 @@
NO_DEFAULT,
ExtractorError,
determine_ext,
extract_attributes,
float_or_none,
int_or_none,
join_nonempty,
@ -25,6 +24,11 @@ class ZDFBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd', 'fhd', 'uhd')
def _download_v2_doc(self, document_id):
return self._download_json(
f'https://zdf-prod-futura.zdf.de/mediathekV2/document/{document_id}',
document_id)
def _call_api(self, url, video_id, item, api_token=None, referrer=None):
headers = {}
if api_token:
@ -133,6 +137,116 @@ def _extract_player(self, webpage, video_id, fatal=True):
group='json'),
video_id)
def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target']
ptmd_path = traverse_obj(t, (
(('streams', 'default'), None),
('http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template'),
), get_all=False)
if not ptmd_path:
raise ExtractorError('Could not extract ptmd_path')
info = self._extract_ptmd(
urljoin(url, ptmd_path.replace('{playerId}', 'android_native_5')), video_id, player['apiToken'], url)
thumbnails = []
layouts = try_get(
content, lambda x: x['teaserImageRef']['layouts'], dict)
if layouts:
for layout_key, layout_url in layouts.items():
layout_url = url_or_none(layout_url)
if not layout_url:
continue
thumbnail = {
'url': layout_url,
'format_id': layout_key,
}
mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key)
if mobj:
thumbnail.update({
'width': int(mobj.group('width')),
'height': int(mobj.group('height')),
})
thumbnails.append(thumbnail)
chapter_marks = t.get('streamAnchorTag') or []
chapter_marks.append({'anchorOffset': int_or_none(t.get('duration'))})
chapters = [{
'start_time': chap.get('anchorOffset'),
'end_time': next_chap.get('anchorOffset'),
'title': chap.get('anchorLabel'),
} for chap, next_chap in zip(chapter_marks, chapter_marks[1:])]
return merge_dicts(info, {
'title': title,
'description': content.get('leadParagraph') or content.get('teasertext'),
'duration': int_or_none(t.get('duration')),
'timestamp': unified_timestamp(content.get('editorialDate')),
'thumbnails': thumbnails,
'chapters': chapters or None,
'episode': title,
**traverse_obj(content, ('programmeItem', 0, 'http://zdf.de/rels/target', {
'series_id': ('http://zdf.de/rels/cmdm/series', 'seriesUuid', {str}),
'series': ('http://zdf.de/rels/cmdm/series', 'seriesTitle', {str}),
'season': ('http://zdf.de/rels/cmdm/season', 'seasonTitle', {str}),
'season_number': ('http://zdf.de/rels/cmdm/season', 'seasonNumber', {int_or_none}),
'season_id': ('http://zdf.de/rels/cmdm/season', 'seasonUuid', {str}),
'episode_number': ('episodeNumber', {int_or_none}),
'episode_id': ('contentId', {str}),
})),
})
def _extract_regular(self, url, player, video_id, query=None):
player_url = player['content']
content = self._call_api(
update_url_query(player_url, query),
video_id, 'content', player['apiToken'], url)
return self._extract_entry(player_url, player, content, video_id)
def _extract_mobile(self, video_id):
video = self._download_v2_doc(video_id)
formats = []
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
document = formitaeten and video['document']
if formitaeten:
title = document['titel']
content_id = document['basename']
format_urls = set()
for f in formitaeten or []:
self._extract_format(content_id, formats, format_urls, f)
thumbnails = []
teaser_bild = document.get('teaserBild')
if isinstance(teaser_bild, dict):
for thumbnail_key, thumbnail in teaser_bild.items():
thumbnail_url = try_get(
thumbnail, lambda x: x['url'], str)
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'id': thumbnail_key,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'id': content_id,
'title': title,
'description': document.get('beschreibung'),
'duration': int_or_none(document.get('length')),
'timestamp': unified_timestamp(document.get('date')) or unified_timestamp(
try_get(video, lambda x: x['meta']['editorialDate'], str)),
'thumbnails': thumbnails,
'subtitles': self._extract_subtitles(document),
'formats': formats,
}
class ZDFIE(ZDFBaseIE):
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
@ -183,12 +297,20 @@ class ZDFIE(ZDFBaseIE):
'info_dict': {
'id': '151025_magie_farben2_tex',
'ext': 'mp4',
'duration': 2615.0,
'title': 'Die Magie der Farben (2/2)',
'description': 'md5:a89da10c928c6235401066b60a6d5c1a',
'duration': 2615,
'timestamp': 1465021200,
'upload_date': '20160604',
'thumbnail': 'https://www.zdf.de/assets/mauve-im-labor-100~768x432?cb=1464909117806',
'upload_date': '20160604',
'episode': 'Die Magie der Farben (2/2)',
'episode_id': 'POS_954f4170-36a5-4a41-a6cf-78f1f3b1f127',
'season': 'Staffel 1',
'series': 'Die Magie der Farben',
'season_number': 1,
'series_id': 'a39900dd-cdbd-4a6a-a413-44e8c6ae18bc',
'season_id': '5a92e619-8a0f-4410-a3d5-19c76fbebb37',
'episode_number': 2,
},
}, {
'url': 'https://www.zdf.de/funk/druck-11790/funk-alles-ist-verzaubert-102.html',
@ -196,12 +318,13 @@ class ZDFIE(ZDFBaseIE):
'info_dict': {
'ext': 'mp4',
'id': 'video_funk_1770473',
'duration': 1278,
'description': 'Die Neue an der Schule verdreht Ismail den Kopf.',
'duration': 1278.0,
'title': 'Alles ist verzaubert',
'description': 'Die Neue an der Schule verdreht Ismail den Kopf.',
'timestamp': 1635520560,
'upload_date': '20211029',
'thumbnail': 'https://www.zdf.de/assets/teaser-funk-alles-ist-verzaubert-102~1920x1080?cb=1663848412907',
'upload_date': '20211029',
'episode': 'Alles ist verzaubert',
},
}, {
# Same as https://www.phoenix.de/sendungen/dokumentationen/gesten-der-maechtigen-i-a-89468.html?ref=suche
@ -244,123 +367,55 @@ class ZDFIE(ZDFBaseIE):
'title': 'Das Geld anderer Leute',
'description': 'md5:cb6f660850dc5eb7d1ab776ea094959d',
'duration': 2581.0,
'timestamp': 1675160100,
'upload_date': '20230131',
'timestamp': 1728983700,
'upload_date': '20241015',
'thumbnail': 'https://epg-image.zdf.de/fotobase-webdelivery/images/e2d7e55a-09f0-424e-ac73-6cac4dd65f35?layout=2400x1350',
'series': 'SOKO Stuttgart',
'series_id': 'f862ce9a-6dd1-4388-a698-22b36ac4c9e9',
'season': 'Staffel 11',
'season_number': 11,
'season_id': 'ae1b4990-6d87-4970-a571-caccf1ba2879',
'episode': 'Das Geld anderer Leute',
'episode_number': 10,
'episode_id': 'POS_7f367934-f2f0-45cb-9081-736781ff2d23',
},
}, {
'url': 'https://www.zdf.de/dokumentation/terra-x/unser-gruener-planet-wuesten-doku-100.html',
'info_dict': {
'id': '220605_dk_gruener_planet_wuesten_tex',
'id': '220525_green_planet_makingof_1_tropen_tex',
'ext': 'mp4',
'title': 'Unser grüner Planet - Wüsten',
'description': 'md5:4fc647b6f9c3796eea66f4a0baea2862',
'duration': 2613.0,
'timestamp': 1654450200,
'upload_date': '20220605',
'format_note': 'uhd, main',
'thumbnail': 'https://www.zdf.de/assets/saguaro-kakteen-102~3840x2160?cb=1655910690796',
'title': 'Making-of Unser grüner Planet - Tropen',
'description': 'md5:d7c6949dc7c75c73c4ad51c785fb0b79',
'duration': 435.0,
'timestamp': 1653811200,
'upload_date': '20220529',
'format_note': 'hd, main',
'thumbnail': 'https://www.zdf.de/assets/unser-gruener-planet-making-of-1-tropen-100~3840x2160?cb=1653493335577',
'episode': 'Making-of Unser grüner Planet - Tropen',
},
'skip': 'No longer available: "Leider kein Video verfügbar"',
}, {
'url': 'https://www.zdf.de/serien/northern-lights/begegnung-auf-der-bruecke-100.html',
'info_dict': {
'id': '240319_2310_sendung_not',
'ext': 'mp4',
'title': 'Begegnung auf der Brücke',
'description': 'md5:e53a555da87447f7f1207f10353f8e45',
'thumbnail': 'https://epg-image.zdf.de/fotobase-webdelivery/images/c5ff1d1f-f5c8-4468-86ac-1b2f1dbecc76?layout=2400x1350',
'upload_date': '20250203',
'duration': 3083.0,
'timestamp': 1738546500,
'series_id': '1d7a1879-01ee-4468-8237-c6b4ecd633c7',
'series': 'Northern Lights',
'season': 'Staffel 1',
'season_number': 1,
'season_id': '22ac26a2-4ea2-4055-ac0b-98b755cdf718',
'episode': 'Begegnung auf der Brücke',
'episode_number': 1,
'episode_id': 'POS_71049438-024b-471f-b472-4fe2e490d1fb',
},
}]
def _extract_entry(self, url, player, content, video_id):
title = content.get('title') or content['teaserHeadline']
t = content['mainVideoContent']['http://zdf.de/rels/target']
ptmd_path = traverse_obj(t, (
(('streams', 'default'), None),
('http://zdf.de/rels/streams/ptmd', 'http://zdf.de/rels/streams/ptmd-template'),
), get_all=False)
if not ptmd_path:
raise ExtractorError('Could not extract ptmd_path')
info = self._extract_ptmd(
urljoin(url, ptmd_path.replace('{playerId}', 'android_native_5')), video_id, player['apiToken'], url)
thumbnails = []
layouts = try_get(
content, lambda x: x['teaserImageRef']['layouts'], dict)
if layouts:
for layout_key, layout_url in layouts.items():
layout_url = url_or_none(layout_url)
if not layout_url:
continue
thumbnail = {
'url': layout_url,
'format_id': layout_key,
}
mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key)
if mobj:
thumbnail.update({
'width': int(mobj.group('width')),
'height': int(mobj.group('height')),
})
thumbnails.append(thumbnail)
chapter_marks = t.get('streamAnchorTag') or []
chapter_marks.append({'anchorOffset': int_or_none(t.get('duration'))})
chapters = [{
'start_time': chap.get('anchorOffset'),
'end_time': next_chap.get('anchorOffset'),
'title': chap.get('anchorLabel'),
} for chap, next_chap in zip(chapter_marks, chapter_marks[1:])]
return merge_dicts(info, {
'title': title,
'description': content.get('leadParagraph') or content.get('teasertext'),
'duration': int_or_none(t.get('duration')),
'timestamp': unified_timestamp(content.get('editorialDate')),
'thumbnails': thumbnails,
'chapters': chapters or None,
})
def _extract_regular(self, url, player, video_id):
content = self._call_api(
player['content'], video_id, 'content', player['apiToken'], url)
return self._extract_entry(player['content'], player, content, video_id)
def _extract_mobile(self, video_id):
video = self._download_json(
f'https://zdf-cdn.live.cellular.de/mediathekV2/document/{video_id}',
video_id)
formats = []
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
document = formitaeten and video['document']
if formitaeten:
title = document['titel']
content_id = document['basename']
format_urls = set()
for f in formitaeten or []:
self._extract_format(content_id, formats, format_urls, f)
thumbnails = []
teaser_bild = document.get('teaserBild')
if isinstance(teaser_bild, dict):
for thumbnail_key, thumbnail in teaser_bild.items():
thumbnail_url = try_get(
thumbnail, lambda x: x['url'], str)
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'id': thumbnail_key,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'id': content_id,
'title': title,
'description': document.get('beschreibung'),
'duration': int_or_none(document.get('length')),
'timestamp': unified_timestamp(document.get('date')) or unified_timestamp(
try_get(video, lambda x: x['meta']['editorialDate'], str)),
'thumbnails': thumbnails,
'subtitles': self._extract_subtitles(document),
'formats': formats,
}
def _real_extract(self, url):
video_id = self._match_id(url)
@ -368,13 +423,13 @@ def _real_extract(self, url):
if webpage:
player = self._extract_player(webpage, url, fatal=False)
if player:
return self._extract_regular(url, player, video_id)
return self._extract_regular(url, player, video_id, query={'profile': 'player-3'})
return self._extract_mobile(video_id)
class ZDFChannelIE(ZDFBaseIE):
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/?#]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio',
'info_dict': {
@ -387,18 +442,19 @@ class ZDFChannelIE(ZDFBaseIE):
'info_dict': {
'id': 'planet-e',
'title': 'planet e.',
'description': 'md5:87e3b9c66a63cf1407ee443d2c4eb88e',
},
'playlist_mincount': 50,
}, {
'url': 'https://www.zdf.de/gesellschaft/aktenzeichen-xy-ungeloest',
'info_dict': {
'id': 'aktenzeichen-xy-ungeloest',
'title': 'Aktenzeichen XY... ungelöst',
'entries': "lambda x: not any('xy580-fall1-kindermoerder-gesucht-100' in e['url'] for e in x)",
'title': 'Aktenzeichen XY... Ungelöst',
'description': 'md5:623ede5819c400c6d04943fa8100e6e7',
},
'playlist_mincount': 2,
}, {
'url': 'https://www.zdf.de/filme/taunuskrimi/',
'url': 'https://www.zdf.de/serien/taunuskrimi/',
'only_matching': True,
}]
@ -406,36 +462,42 @@ class ZDFChannelIE(ZDFBaseIE):
def suitable(cls, url):
return False if ZDFIE.suitable(url) else super().suitable(url)
def _og_search_title(self, webpage, fatal=False):
title = super()._og_search_title(webpage, fatal=fatal)
return re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', title or '')[0] or None
def _extract_entry(self, entry):
return self.url_result(
entry['sharingUrl'], ZDFIE, **traverse_obj(entry, {
'id': ('basename', {str}),
'title': ('titel', {str}),
'description': ('beschreibung', {str}),
'duration': ('length', {float_or_none}),
'season_number': ('seasonNumber', {int_or_none}),
'episode_number': ('episodeNumber', {int_or_none}),
}))
def _entries(self, data, document_id):
for entry in traverse_obj(data, (
'cluster', lambda _, v: v['type'] == 'teaser',
# If 'brandId' differs, it is a 'You might also like' video. Filter these out
'teaser', lambda _, v: v['type'] == 'video' and v['brandId'] == document_id and v['sharingUrl'],
)):
yield self._extract_entry(entry)
def _real_extract(self, url):
channel_id = self._match_id(url)
webpage = self._download_webpage(url, channel_id)
document_id = self._search_regex(
r'docId\s*:\s*(["\'])(?P<doc_id>(?:(?!\1).)+)\1', webpage, 'document id', group='doc_id')
data = self._download_v2_doc(document_id)
matches = re.finditer(
rf'''<div\b[^>]*?\sdata-plusbar-id\s*=\s*(["'])(?P<p_id>[\w-]+)\1[^>]*?\sdata-plusbar-url=\1(?P<url>{ZDFIE._VALID_URL})\1''',
webpage)
main_video = traverse_obj(data, (
'cluster', lambda _, v: v['type'] == 'teaserContent',
'teaser', lambda _, v: v['type'] == 'video' and v['basename'] and v['sharingUrl'], any)) or {}
if self._downloader.params.get('noplaylist', False):
entry = next(
(self.url_result(m.group('url'), ie=ZDFIE.ie_key()) for m in matches),
None)
self.to_screen('Downloading just the main video because of --no-playlist')
if entry:
return entry
else:
self.to_screen(f'Downloading playlist {channel_id} - add --no-playlist to download just the main video')
if not self._yes_playlist(channel_id, main_video.get('basename')):
return self._extract_entry(main_video)
def check_video(m):
v_ref = self._search_regex(
r'''(<a\b[^>]*?\shref\s*=[^>]+?\sdata-target-id\s*=\s*(["']){}\2[^>]*>)'''.format(m.group('p_id')),
webpage, 'check id', default='')
v_ref = extract_attributes(v_ref)
return v_ref.get('data-target-video-type') != 'novideo'
return self.playlist_from_matches(
(m.group('url') for m in matches if check_video(m)),
channel_id, self._og_search_title(webpage, fatal=False))
return self.playlist_result(
self._entries(data, document_id), channel_id,
re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', self._og_search_title(webpage) or '')[0] or None,
join_nonempty(
'headline', 'text', delim='\n\n',
from_dict=traverse_obj(data, ('shortText', {dict}), default={})) or None)

View file

@ -25,7 +25,7 @@ def zeroise(x):
with contextlib.suppress(TypeError):
if math.isnan(x): # NB: NaN cannot be checked by membership
return 0
return x
return int(float(x))
def wrapped(a, b):
return op(zeroise(a), zeroise(b)) & 0xffffffff
@ -95,6 +95,61 @@ def _js_ternary(cndn, if_true=True, if_false=False):
return if_true
# Ref: https://es5.github.io/#x9.8.1
def js_number_to_string(val: float, radix: int = 10):
if radix in (JS_Undefined, None):
radix = 10
assert radix in range(2, 37), 'radix must be an integer at least 2 and no greater than 36'
if math.isnan(val):
return 'NaN'
if val == 0:
return '0'
if math.isinf(val):
return '-Infinity' if val < 0 else 'Infinity'
if radix == 10:
# TODO: implement special cases
...
ALPHABET = b'0123456789abcdefghijklmnopqrstuvwxyz.-'
result = collections.deque()
sign = val < 0
val = abs(val)
fraction, integer = math.modf(val)
delta = max(math.nextafter(.0, math.inf), math.ulp(val) / 2)
if fraction >= delta:
result.append(-2) # `.`
while fraction >= delta:
delta *= radix
fraction, digit = math.modf(fraction * radix)
result.append(int(digit))
# if we need to round, propagate potential carry through fractional part
needs_rounding = fraction > 0.5 or (fraction == 0.5 and int(digit) & 1)
if needs_rounding and fraction + delta > 1:
for index in reversed(range(1, len(result))):
if result[index] + 1 < radix:
result[index] += 1
break
result.pop()
else:
integer += 1
break
integer, digit = divmod(int(integer), radix)
result.appendleft(digit)
while integer > 0:
integer, digit = divmod(integer, radix)
result.appendleft(digit)
if sign:
result.appendleft(-1) # `-`
return bytes(ALPHABET[digit] for digit in result).decode('ascii')
# Ref: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Operator_Precedence
_OPERATORS = { # None => Defined in JSInterpreter._operator
'?': None,

View file

@ -1370,12 +1370,12 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
help='Allow Unicode characters, "&" and spaces in filenames (default)')
filesystem.add_option(
'--windows-filenames',
action='store_true', dest='windowsfilenames', default=False,
action='store_true', dest='windowsfilenames', default=None,
help='Force filenames to be Windows-compatible')
filesystem.add_option(
'--no-windows-filenames',
action='store_false', dest='windowsfilenames',
help='Make filenames Windows-compatible only if using Windows (default)')
help='Sanitize filenames only minimally')
filesystem.add_option(
'--trim-filenames', '--trim-file-names', metavar='LENGTH',
dest='trim_file_name', default=0, type=int,

View file

@ -65,9 +65,14 @@ def _get_variant_and_executable_path():
machine = '_legacy' if version_tuple(platform.mac_ver()[0]) < (10, 15) else ''
else:
machine = f'_{platform.machine().lower()}'
is_64bits = sys.maxsize > 2**32
# Ref: https://en.wikipedia.org/wiki/Uname#Examples
if machine[1:] in ('x86', 'x86_64', 'amd64', 'i386', 'i686'):
machine = '_x86' if platform.architecture()[0][:2] == '32' else ''
machine = '_x86' if not is_64bits else ''
# platform.machine() on 32-bit raspbian OS may return 'aarch64', so check "64-bitness"
# See: https://github.com/yt-dlp/yt-dlp/issues/11813
elif machine[1:] == 'aarch64' and not is_64bits:
machine = '_armv7l'
# sys.executable returns a /tmp/ path for staticx builds (linux_static)
# Ref: https://staticx.readthedocs.io/en/latest/usage.html#run-time-information
if static_exe_path := os.getenv('STATICX_PROG_PATH'):
@ -525,11 +530,16 @@ def filename(self):
@functools.cached_property
def cmd(self):
"""The command-line to run the executable, if known"""
argv = None
# There is no sys.orig_argv in py < 3.10. Also, it can be [] when frozen
if getattr(sys, 'orig_argv', None):
return sys.orig_argv
argv = sys.orig_argv
elif getattr(sys, 'frozen', False):
return sys.argv
argv = sys.argv
# linux_static exe's argv[0] will be /tmp/staticx-NNNN/yt-dlp_linux if we don't fixup here
if argv and os.getenv('STATICX_PROG_PATH'):
argv = [self.filename, *argv[1:]]
return argv
def restart(self):
"""Restart the executable"""

View file

@ -686,7 +686,8 @@ def _sanitize_path_parts(parts):
elif part == '..':
if sanitized_parts and sanitized_parts[-1] != '..':
sanitized_parts.pop()
sanitized_parts.append('..')
else:
sanitized_parts.append('..')
continue
# Replace invalid segments with `#`
# - trailing dots and spaces (`asdf...` => `asdf..#`)
@ -703,7 +704,8 @@ def sanitize_path(s, force=False):
if not force:
return s
root = '/' if s.startswith('/') else ''
return root + '/'.join(_sanitize_path_parts(s.split('/')))
path = '/'.join(_sanitize_path_parts(s.split('/')))
return root + path if root or path else '.'
normed = s.replace('/', '\\')
@ -722,7 +724,8 @@ def sanitize_path(s, force=False):
root = '\\' if normed[:1] == '\\' else ''
parts = normed.split('\\')
return root + '\\'.join(_sanitize_path_parts(parts))
path = '\\'.join(_sanitize_path_parts(parts))
return root + path if root or path else '.'
def sanitize_url(url, *, scheme='http'):
@ -5331,7 +5334,7 @@ class FormatSorter:
settings = {
'vcodec': {'type': 'ordered', 'regex': True,
'order': ['av0?1', 'vp0?9.0?2', 'vp0?9', '[hx]265|he?vc?', '[hx]264|avc', 'vp0?8', 'mp4v|h263', 'theora', '', None, 'none']},
'order': ['av0?1', r'vp0?9\.0?2', 'vp0?9', '[hx]265|he?vc?', '[hx]264|avc', 'vp0?8', 'mp4v|h263', 'theora', '', None, 'none']},
'acodec': {'type': 'ordered', 'regex': True,
'order': ['[af]lac', 'wav|aiff', 'opus', 'vorbis|ogg', 'aac', 'mp?4a?', 'mp3', 'ac-?4', 'e-?a?c-?3', 'ac-?3', 'dts', '', None, 'none']},
'hdr': {'type': 'ordered', 'regex': True, 'field': 'dynamic_range',
@ -5629,6 +5632,24 @@ def filesize_from_tbr(tbr, duration):
return int(duration * tbr * (1000 / 8))
def _request_dump_filename(url, video_id, data=None, trim_length=None):
if data is not None:
data = hashlib.md5(data).hexdigest()
basen = join_nonempty(video_id, data, url, delim='_')
trim_length = trim_length or 240
if len(basen) > trim_length:
h = '___' + hashlib.md5(basen.encode()).hexdigest()
basen = basen[:trim_length - len(h)] + h
filename = sanitize_filename(f'{basen}.dump', restricted=True)
# Working around MAX_PATH limitation on Windows (see
# http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx)
if os.name == 'nt':
absfilepath = os.path.abspath(filename)
if len(absfilepath) > 259:
filename = fR'\\?\{absfilepath}'
return filename
# XXX: Temporary
class _YDLLogger:
def __init__(self, ydl=None):

View file

@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py
__version__ = '2024.12.03'
__version__ = '2025.02.19'
RELEASE_GIT_HEAD = '2b67ac300ac8b44368fb121637d1743cea8c5b6b'
RELEASE_GIT_HEAD = '4985a4041770eaa0016271809a1fd950dc809a55'
VARIANT = None
@ -12,4 +12,4 @@
ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2024.12.03'
_pkg_version = '2025.02.19'