1959 Commits

Author SHA1 Message Date
Yoshiko2
997390d383 Merge pull request #1103 from yoshiko2/dependabot/pip/urllib3-1.26.18
Bump urllib3 from 1.26.5 to 1.26.18
2023-10-19 11:44:44 +08:00
Yoshiko2
bcd892ee1c Merge pull request #1095 from yoshiko2/dependabot/pip/pillow-10.0.1
Bump pillow from 9.5.0 to 10.0.1
2023-10-19 11:44:17 +08:00
Yoshiko2
19977d177e Add files via upload 2023-10-19 11:43:54 +08:00
dependabot[bot]
21b02da32e Bump urllib3 from 1.26.5 to 1.26.18
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.5 to 1.26.18.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.26.5...1.26.18)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-18 02:04:56 +00:00
Yoshiko2
aa893a6d2b Merge pull request #1088 from todoXu/master
Fix duplicated -UC-UC in nfo naming
2023-10-16 07:33:54 +08:00
dependabot[bot]
cbdb7fa492 Bump pillow from 9.5.0 to 10.0.1
Bumps [pillow](https://github.com/python-pillow/Pillow) from 9.5.0 to 10.0.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/9.5.0...10.0.1)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-04 01:50:18 +00:00
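The two dependabot bumps above pin urllib3 1.26.18 and pillow 10.0.1 as direct production dependencies. A quick sanity-check sketch that an environment matches those pins (illustrative, not part of the project):

```python
# Verify the installed versions match the dependabot pins above.
import urllib3
import PIL

print(urllib3.__version__)  # expected: 1.26.18
print(PIL.__version__)      # expected: 10.0.1
```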
todoXu
925ae9842c Fix duplicated -UC-UC in nfo naming 2023-09-25 17:00:43 +08:00
yoshiko2
4b58a24592 Update config.ini 2023-09-17 23:35:43 +08:00
Yoshiko2
ef0957f62f Merge pull request #1083 from realize096/oscs_fix_ck257t0au51vtpiu90n0
fix(sec): upgrade urllib3 to 1.26.5
2023-09-17 22:10:53 +08:00
Yoshiko2
363f35eeb6 Merge pull request #1082 from biaji/patch-6
Fix airav intermittently failing to fetch the cover
2023-09-17 22:10:44 +08:00
Yoshiko2
2e26559e2d Merge pull request #1079 from popjdh/patch-1
Update dlsite.py
2023-09-17 22:10:28 +08:00
Yoshiko2
773690d2d6 Merge pull request #1076 from todoXu/master
Add a config option for whether `direct` is written to the nfo file; support -U to match hacked and -UC to match hacked + subtitles
2023-09-17 22:10:18 +08:00
realize096
04926005ff update urllib3 1.25.11 to 1.26.5 2023-09-15 20:54:18 +08:00
biaji
333ac49f95 Fix airav intermittently failing to fetch the cover
When javbus cannot fetch the cover, use airav's img_url as the cover
2023-09-15 19:42:03 +08:00
Trance233
cbe6f915fe Update dlsite.py
Fix the XPath used to fetch the series name
2023-09-08 18:50:11 +08:00
todoXu
ee4ea11706 Support -U to match hacked and -UC to match hacked + subtitles 2023-09-07 00:24:56 +08:00
todoXu
fbc741521d Add a config option for whether `direct` is written to the nfo file 2023-09-02 03:24:19 +08:00
Yoshiko2
bea12db8bb Update README.md 2023-08-16 16:37:53 +08:00
Yoshiko2
6da886e046 Update README.md 2023-08-12 23:21:09 +08:00
Yoshiko2
07b6f6ef73 Update README.md 2023-08-12 23:19:19 +08:00
yoshiko2
5684f70895 Update sites #4 2023-08-07 04:30:54 +08:00
yoshiko2
692ceeff7b Update sites #3 2023-08-07 03:49:21 +08:00
Yoshiko2
9020aa0530 Update 6.6.7 2023-08-07 03:22:05 +08:00
Yoshiko2
a63e1d4403 Update main.yml 2023-08-07 03:21:07 +08:00
Yoshiko2
40482ce04b Merge pull request #1059 from TachibanaKimika/master
Add custom regex & uppercase number conversion config
2023-08-07 03:11:56 +08:00
TachibanaKimika
65e0ff665d fix: unify data format & handle invalid regex 2023-08-06 16:14:50 +08:00
TachibanaKimika
35baf17160 doc: comments 2023-08-06 16:01:17 +08:00
TachibanaKimika
b00c9a2587 fix: write case conversion into json_data 2023-08-06 16:00:06 +08:00
TachibanaKimika
e23a25b9b7 feat: add custom regex & uppercase number conversion config 2023-08-06 15:45:12 +08:00
Yoshiko2
46c0cd3030 Delete index.htm 2023-08-06 04:43:04 +08:00
Yoshiko2
c1ef42d61f Create index.htm 2023-08-06 04:39:51 +08:00
Yoshiko2
35dc7ecf25 Update main.yml 2023-08-06 04:19:51 +08:00
Yoshiko2
2012d0d8eb Update main.yml 2023-08-06 04:18:47 +08:00
Yoshiko2
194b7b3870 Update main.yml 2023-08-06 03:54:43 +08:00
Yoshiko2
08c49836da Update main.yml 2023-08-06 03:43:15 +08:00
Yoshiko2
7581d73b8c Update main.yml 2023-08-06 03:39:13 +08:00
Yoshiko2
3270604b95 Update main.yml 2023-08-06 03:37:27 +08:00
Yoshiko2
c944b5e454 Update main.yml 2023-08-06 03:29:07 +08:00
Yoshiko2
2517cfa3f8 Update main.yml 2023-08-06 03:19:05 +08:00
Yoshiko2
659d835617 Update main.yml 2023-08-06 03:15:42 +08:00
Yoshiko2
2379b9dd26 Update main.yml 2023-08-06 02:57:20 +08:00
Yoshiko2
38d23a93cb Update main.yml 2023-08-06 02:52:59 +08:00
Yoshiko2
abfcd969a7 Update main.yml 2023-08-06 02:42:42 +08:00
Yoshiko2
0414445a85 Update main.yml 2023-08-06 02:39:57 +08:00
Yoshiko2
934ba528e6 Update README.md 2023-07-31 12:08:24 +08:00
Yoshiko2
08605f254a Merge pull request #1052 from popjdh/master
Use the explicit encoding "EUC-JIS-2004" (a superset of "EUC-JP") for GETCHU
2023-07-21 03:09:46 +08:00
popjdh
613288d07b Use the explicit encoding "EUC-JIS-2004" (a superset of "EUC-JP") for GETCHU 2023-07-20 16:46:08 +08:00
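The three GETCHU encoding commits in this log converge on decoding GETCHU pages with EUC-JIS-2004 rather than the EUC-JP declared in the response header. A minimal sketch of that idea (the function name and use of requests are assumptions, not the project's actual code):

```python
import requests

def get_getchu_html(url: str) -> str:
    # GETCHU's response header declares EUC-JP, but some pages use
    # characters only valid in EUC-JIS-2004, a superset of EUC-JP,
    # so decode the raw bytes with the superset codec instead.
    resp = requests.get(url, timeout=10)
    return resp.content.decode("euc_jis_2004", errors="replace")
```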
Yoshiko2
5029a78c9b Merge pull request #1047 from popjdh/master
Set an explicit encoding (euc-jp) for GETCHU
2023-07-09 23:28:15 +08:00
Trance233
6175fc8e05 Merge pull request #1 from yoshiko2/master
Set an explicit encoding (euc-jp) for GETCHU
2023-07-09 17:04:19 +08:00
popjdh
fb21b07dcf Use the explicit encoding "EUC-JP" for GETCHU (the encoding declared in GETCHU's response header) 2023-07-09 16:46:25 +08:00
Yoshiko2
33b4a04aa3 Merge pull request #1045 from summershrimp/master
Only process files with a suffix, not directories.
2023-07-09 00:43:22 +08:00
yoshiko2
26b82b1725 Update sites #2 2023-07-09 00:42:29 +08:00
Hakusai Zhang
0a263f665c only process files with a suffix, not directories. 2022-07-08 18:20:30 +08:00
yoshiko2
3597a9590d Add site pxolle 2023-07-08 16:53:01 +08:00
yoshiko2
c3e5fdb09f Update 6.6.6 2023-07-08 04:59:27 +08:00
yoshiko2
9b5af4bedd Update 6.6.6 2023-07-08 04:57:57 +08:00
yoshiko2
43e9d7727e Update 6.6.6 2023-07-08 04:55:31 +08:00
yoshiko2
47a271f938 Update sites 2023-07-08 04:52:23 +08:00
yoshiko2
78619f5909 Set lower pillow version 2023-07-08 04:51:30 +08:00
yoshiko2
b7b94b8f28 Fix sources select func 2023-07-05 03:42:19 +08:00
Yoshiko2
a1408cefc5 Merge pull request #1040 from wsndshx/master
Update api.py #1033
2023-07-04 05:18:26 +08:00
zzlwd
e56dbafc37 Update api.py #1033
Update the target-language check in the "fill in 佚名 (Anonymous) when there is no actor" feature: add DeepLX's target-language code "ZH", fixing the original code filling in the English "Anonymous" when translating to Chinese via DeepLX
2023-07-03 13:34:22 +08:00
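The api.py fix above adjusts the target-language check so DeepLX's "ZH" code counts as Chinese. A hedged sketch of the described logic (names and the set of language codes are illustrative, not the project's actual identifiers):

```python
def default_actor_name(target_language: str) -> str:
    # Treat DeepLX's "ZH" code as Chinese too, so a movie with no
    # actor is filled with 佚名 rather than the English "Anonymous".
    if target_language.upper() in ("ZH", "ZH_CN", "ZH_TW"):
        return "佚名"
    return "Anonymous"
```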
yoshiko2
835767b556 Add site msin #4 2023-07-02 00:24:44 +08:00
yoshiko2
5cefc85462 Add site msin #3 2023-07-01 02:20:51 +08:00
yoshiko2
b69467f288 Update 6.6.5 2023-06-30 20:30:55 +08:00
yoshiko2
abd6503ae4 Add site msin #2 2023-06-30 20:30:46 +08:00
yoshiko2
3a4fe159db Fix multiple watermark 2023-06-29 03:31:00 +08:00
yoshiko2
185589582f Fix sites order for special sites 2023-06-29 03:30:25 +08:00
yoshiko2
99279afddd Add site msin 2023-06-29 03:29:48 +08:00
Yoshiko2
a2da2f5d90 Merge pull request #1034 from wsndshx/master
Update ADC_function.py
2023-06-20 02:24:22 +08:00
zzlwd
264c208697 Update ADC_function.py
Make the program read the user-configured translation engine from the config file
2023-06-16 18:02:07 +08:00
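The ADC_function.py change above makes the translation engine come from the user's config file. A minimal configparser sketch of that behaviour (the section and key names are assumptions):

```python
import configparser

def get_translate_engine(path: str = "config.ini") -> str:
    # Read the user-configured translation engine instead of using
    # a hard-coded one; fall back to a default when unset.
    config = configparser.ConfigParser()
    config.read(path, encoding="utf-8")
    return config.get("translate", "engine", fallback="google-free")
```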
yoshiko2
449174f0a0 Add ISO mark 2023-06-10 00:40:01 +08:00
yoshiko2
13af581e94 Add ISO mark 2023-06-10 00:34:21 +08:00
Yoshiko2
e10cf02347 Update config.ini 2023-05-29 18:04:51 +08:00
yoshiko2
7a65844448 Update to 6.6.4 2023-05-25 23:39:44 +08:00
yoshiko2
b5adda52dd Add source pcolle 2023-05-25 23:39:35 +08:00
Yoshiko2
35143774ba Merge pull request #1023 from andyubird/Fixed-stuck-in-javdb-cover-while-loop
Fixed javdb while loop
2023-05-24 12:13:15 +08:00
Andy Fang
14bc31d93a Fixed javdb while loop 2023-05-22 00:39:02 +08:00
Yoshiko2
0e7f7f497e Merge pull request #1020 from ramondsq/master
Add support for scraping from Caribpr
2023-05-19 22:10:25 +08:00
ramond
4383491dec Add support for scraping from Caribpr 2023-05-19 02:25:23 +08:00
yoshiko2
7dcc4c218f Add mapping table loading exception handling 2023-05-09 01:23:26 +08:00
yoshiko2
f9440cf1f1 Fix auto exit in Windows 2023-05-08 06:24:31 +08:00
yoshiko2
a5f7a1afde Update mapping table when github connection is successful #2 2023-05-08 05:53:29 +08:00
yoshiko2
04feaf6d20 Update mapping table when github connection is successful 2023-05-08 05:50:12 +08:00
yoshiko2
0e0dedbfc1 Fix face recognition in compile #3 2023-05-08 05:26:56 +08:00
yoshiko2
fbc0aceb5e Fix face recognition in compile #3 2023-05-08 05:26:21 +08:00
yoshiko2
d24ae9798d Fix face recognition in compile #2 2023-05-08 05:23:55 +08:00
yoshiko2
cb920d7163 Fix face recognition in compile 2023-05-08 05:20:26 +08:00
yoshiko2
0bec305d59 Fix face recognition in one scraper 2023-05-08 05:17:32 +08:00
yoshiko2
4942550e22 Update CI test file 2023-05-08 05:15:11 +08:00
yoshiko2
7ca6f96cbc Update 6.6.3 2023-05-07 23:08:58 +08:00
yoshiko2
351a9c5afa Update CI 2023-05-07 22:54:51 +08:00
yoshiko2
18f0298a48 Update CI 2023-05-07 22:32:11 +08:00
yoshiko2
e8c3696786 Update CI 2023-05-07 22:25:00 +08:00
yoshiko2
e686c44a76 Change urllib3, pysocks version 2023-05-07 22:12:19 +08:00
yoshiko2
961b9544d0 Change error output in Debug mode 2023-05-07 22:11:52 +08:00
yoshiko2
0fe08d0173 Add argument for search number 2023-05-07 22:11:43 +08:00
yoshiko2
dc9e885848 Fix translate list #2 2023-05-05 04:51:12 +08:00
yoshiko2
96f1f3d3c5 Fix translate list 2023-05-05 04:49:21 +08:00
yoshiko2
76a80ab488 Update mapping table 2023-05-05 04:35:40 +08:00
yoshiko2
9eb12859a2 Change google translate website 2023-05-05 03:08:36 +08:00
yoshiko2
ed00a602d3 Test importlib #5 2023-05-05 02:51:43 +08:00
yoshiko2
19515b2769 Test importlib #4 2023-05-05 02:45:13 +08:00
yoshiko2
fb56fc9deb Test importlib #3 2023-05-05 02:42:53 +08:00
yoshiko2
6e2bb9712f Test importlib #2 2023-05-05 02:39:01 +08:00
yoshiko2
785fb7db34 Test importlib 2023-05-05 02:32:55 +08:00
yoshiko2
b1028b8b79 Change CI trigger #2 2023-05-05 02:23:46 +08:00
yoshiko2
3d2eae7734 Change CI trigger 2023-05-05 02:22:25 +08:00
yoshiko2
33eb64e162 Add run test in CI 2023-05-05 02:08:06 +08:00
Yoshiko2
3ac5c6f971 Merge pull request #1008 from yoshiko2/master
1
2023-05-05 01:19:39 +08:00
yoshiko2
3e4a59d4c2 Update 6.6.2 2023-05-05 01:15:40 +08:00
yoshiko2
38ee988007 Remove importlib #2 2023-05-05 01:05:29 +08:00
yoshiko2
53e7ac42a7 Remove importlib 2023-05-05 01:02:03 +08:00
yoshiko2
9000aab763 Update requirements.txt #3 2023-05-05 00:26:02 +08:00
Yoshiko2
781537cec8 Update README.md 2023-05-04 22:58:13 +08:00
Yoshiko2
0b283ecf20 Update README_ZH.md 2023-05-04 22:56:49 +08:00
yoshiko2
19f52cd165 Update requirements.txt #2 2023-05-04 17:56:12 +08:00
yoshiko2
1865d09118 Update requirements.txt 2023-05-04 17:50:30 +08:00
yoshiko2
4ca25868c3 Faster compile 2023-05-04 17:45:30 +08:00
yoshiko2
3aa959e5e2 Add dlib binary in requirements.txt 2023-05-04 17:36:34 +08:00
yoshiko2
5ebd432483 Update action file #2 2023-05-04 17:22:02 +08:00
yoshiko2
99b2c4c6d4 Update action file 2023-05-04 16:56:05 +08:00
yoshiko2
6083856137 Add UPX 2023-05-04 16:46:55 +08:00
yoshiko2
fbc0bf93e1 Add importlib 2023-05-04 16:18:18 +08:00
yoshiko2
c6bc0867fa Update to 6.6.1 2023-05-04 04:48:14 +08:00
yoshiko2
c2e00e752f Fix: "Perhaps http" in output message 2023-05-04 04:47:50 +08:00
yoshiko2
a3e4103336 Add: If the actor is anonymous, fill in "Anonymous" 2023-05-04 04:46:20 +08:00
yoshiko2
62977aa2c8 Add: Translate to target language in config.ini 2023-05-04 04:42:08 +08:00
Yoshiko2
06d5c15c9c Update to 6.5.3 2023-05-04 03:48:21 +08:00
Yoshiko2
ac16929c92 Merge pull request #1001 from vandoi/dev
Fix hack tag
2023-04-28 03:59:23 +08:00
Yoshiko2
ef7df11fc4 Merge pull request #999 from biaji/patch-5
Update avsox.py
2023-04-28 03:59:14 +08:00
Yoshiko2
271abeaa0f Merge pull request #998 from biaji/patch-4
Should return 404 for api to handle
2023-04-28 03:58:41 +08:00
Yoshiko2
e5da2cf044 Merge pull request #997 from biaji/patch-3
Remove scraper that does not exist.
2023-04-28 03:58:22 +08:00
vandoi
87923a4267 Fix hack tag 2023-04-26 22:03:43 +08:00
biaji
ecaa9565cf Update avsox.py
avsox always returns a star list when there is no result.
2023-04-22 20:03:16 +08:00
biaji
2152dd99e4 Should return 404 for api to handle
None can't be handled correctly by api.py
2023-04-22 19:56:01 +08:00
biaji
cb8b4a8cf3 Remove module that does not exist.
The mv91 scraper is no longer there
2023-04-22 19:14:17 +08:00
Yoshiko2
181672324e Merge pull request #993 from hejianjun/feature/自定义爬虫番号处理
Feature: custom crawler number handling
2023-03-30 20:02:20 +08:00
hejianjun
1d46a70eed 1. Dynamically load crawlers
2. Fix pyinstaller sub-package path lookup
3. Move madou number handling into the crawler
4. Filter redundant tags in javday
2023-03-27 15:37:00 +08:00
Yoshiko2
24e8b75dab Merge pull request #991 from hejianjun/master
Add new scraping site Javmenu
2023-03-24 05:07:04 +08:00
Yoshiko2
0edc5a2b03 Merge pull request #990 from Rhythmicc/master
Add deeplx support
2023-03-23 21:29:11 +08:00
hejianjun
09dbc9c9c4 Merge branch 'master' of github.com:yoshiko2/Movie_Data_Capture 2023-03-22 11:31:57 +08:00
hejianjun
e2669169ea Add Javmenu 2023-03-22 11:31:44 +08:00
Rhythmicc
468d1ab671 Add deeplx support 2023-03-21 21:10:42 +08:00
yoshiko2
916dbd6abc Update to 6.5.2 2023-03-05 02:53:22 +08:00
yoshiko2
6cd4027265 Fix: Organize mode (main mode 2) 2023-03-05 02:10:27 +08:00
Yoshiko2
64de6c5ed9 Merge pull request #979 from WarpTraveller/master
Fix a type error and optimize the method.
2023-03-05 00:21:11 +08:00
Wayne Lui
b937e9b21a Fix a type error and optimize the method. 2023-02-27 17:29:12 +08:00
Yoshiko2
d052cb5ca7 Merge pull request #975 from Csrayz/#912
Fix: #912(PATCH)
2023-02-26 01:57:50 +08:00
Yoshiko2
5a9e86c53c Merge pull request #970 from LSD08KM/main
Two parameter errors
2023-02-26 01:57:18 +08:00
Csrayz
363b149d05 Fix: #912 (part 1)
Fix the bug where setting a translated title caused the <originaltitle> item in the .NFO to be translated as well.
2023-02-19 12:54:35 +08:00
Yoshiko2
e8505a89f6 Update README.md 2023-02-17 19:22:59 +08:00
Yoshiko2
45a46eb13b Update README.md 2023-02-17 19:09:07 +08:00
Yoshiko2
f01536b40d Update README.md 2023-02-17 19:05:46 +08:00
LSD08KM
d559d4a9aa Merge pull request #1 from LSD08KM/main_edit
Two parameter errors
2023-02-10 17:45:50 +08:00
LSD08KM
05b5904578 Wrong function argument order 2023-02-10 17:44:16 +08:00
LSD08KM
ea8328eb37 Wrong parameter when reading the config file 2023-02-10 17:40:42 +08:00
Yoshiko2
fee7873b68 Change build artifact filename 2023-02-04 00:18:59 +08:00
yoshiko2
f9a613e222 Fix: source not found in scraper.py 2023-02-03 21:16:55 +08:00
yoshiko2
db35986f87 Fix: Data analysis in API 2023-02-03 06:13:57 +08:00
yoshiko2
3b42a17dbf Fix: cn_sub data type 2023-02-03 05:06:19 +08:00
yoshiko2
275a47448b Merge remote-tracking branch 'origin/master' 2023-02-03 04:58:16 +08:00
yoshiko2
85dbbd4448 Fix: 4k tag 2023-02-03 04:57:57 +08:00
Yoshiko2
4227d67b53 Update config.ini 2023-02-03 03:58:04 +08:00
yoshiko2
6f6d3adab2 Update to 6.5.1 2023-02-03 03:41:53 +08:00
yoshiko2
351ff9863f Fix: Delete studio from tags in javbus 2023-02-03 03:41:25 +08:00
yoshiko2
979c63ae58 Fix: func getActors returns a list in pissplay 2023-02-03 03:40:48 +08:00
Yoshiko2
bfcc3d5b23 Merge pull request #948 from jasonpeng2014/master
Bypass the dmm region check, which is not a 404 event.
2023-01-10 17:15:44 +08:00
Yoshiko2
68a9cc3369 Merge pull request #946 from mark5231/add_jellyfin
Add some jellyfin-specific settings
2023-01-10 17:15:30 +08:00
Jason-YS Peng
b3e1448db6 Bypass the dmm region check, which is not a 404 event. 2023-01-02 15:56:29 +08:00
Marks
eaed709aa2 change format 2022-12-24 19:17:31 -08:00
Marks
02c84f5d41 modify doc 2022-12-24 12:47:07 -08:00
Marks
1a0164c5cf add jellyfin setting 2022-12-24 12:41:39 -08:00
yoshiko2
91371e8eb6 Fix: Replace `\` with `/` on Windows for drive mounting 2022-12-18 04:32:35 +08:00
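The commit above normalizes Windows paths produced by drive mounting. A one-line sketch of the replacement it describes (the sample path is illustrative):

```python
# Drive mounting on Windows can yield backslash-separated paths;
# normalize them to forward slashes as the commit describes.
path = r"Z:\movies\ABC-123.mp4".replace("\\", "/")
print(path)  # Z:/movies/ABC-123.mp4
```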
yoshiko2
b26471ae3a Update Donate 2022-12-18 04:28:42 +08:00
Yoshiko2
376b724447 Merge pull request #941 from mark5231/dev-test
add pissplay
2022-12-06 09:26:59 +08:00
Marks
bb37d6ad09 add pissplay 2022-12-04 21:01:19 -08:00
Yoshiko2
39b88090a0 Merge pull request #933 from mark5231/hotfix-cover
Replace javdb cover
2022-11-29 01:32:54 +08:00
Yoshiko2
e4a30865bd Merge pull request #934 from mark5231/hotfix-tag
Remove the 无码破解 (uncensored-hacked) tag
2022-11-29 01:22:53 +08:00
Marks
b81558ac71 Remove the 无码破解 (uncensored-hacked) tag 2022-11-25 20:56:39 -08:00
Marks
e73eb6ae89 when no source 2022-11-25 20:41:02 -08:00
Marks
ab7eea1e02 Replace cover 2022-11-25 20:16:57 -08:00
yoshiko2
f5db0bc2f0 Fix: If no cover is found in another source, skip the replacement and keep the javdb cover #2 2022-11-26 05:52:15 +08:00
yoshiko2
02dd74dd08 Fix: Remove unused modules 2022-11-26 05:40:53 +08:00
yoshiko2
86d5c7501a Fix: Remove videoprops module because it is not stable 2022-11-26 05:37:38 +08:00
Yoshiko2
ded46466d8 Merge pull request #932 from mark5231/hotfix
fix 4k tag
2022-11-26 05:20:13 +08:00
yoshiko2
44cf5b8ff7 Merge remote-tracking branch 'origin/master' 2022-11-26 05:16:09 +08:00
yoshiko2
95a5c69fec Fix: If no cover is found in another source, skip the replacement and keep the javdb cover 2022-11-26 05:11:39 +08:00
Marks
321e3bb275 hot fix 2022-11-25 11:25:17 -08:00
Marks
4fd6e116c8 fix 4k tag 2022-11-25 11:16:05 -08:00
Yoshiko2
f23b9dd04f Update README.md 2022-11-26 01:12:32 +08:00
Yoshiko2
05376c1863 Update to Python 3.10 #2 2022-11-24 20:48:50 +08:00
Yoshiko2
2138f0fba1 Update to Python 3.10 2022-11-24 20:46:09 +08:00
Yoshiko2
e3cf6a24d5 Modify: Reformat code in `config.ini` 2022-11-24 20:38:28 +08:00
Yoshiko2
94318cc157 Merge pull request #928 from mark5231/mapping-ifo
Remove the 薄马赛克 (thin mosaic) tag
2022-11-24 01:39:41 +08:00
Yoshiko2
cd07f799a3 Merge pull request #929 from mark5231/fix-javdb-cove
Replace javdb's cover
2022-11-24 01:39:31 +08:00
Yoshiko2
b14bbb0272 Merge pull request #930 from mark5231/plot-dev
Fix the synopsis
2022-11-24 01:39:24 +08:00
Yoshiko2
55019d0e21 Merge pull request #927 from mark5231/fix-tag
Fix tags
2022-11-24 01:38:12 +08:00
yoshiko2
beab742992 Add: Move subtitles when 字幕 is in the tags 2022-11-24 01:37:40 +08:00
yoshiko2
a4baef392f Add: 4K watermark 2022-11-24 01:36:28 +08:00
yoshiko2
d724b9379b Add: Config option for tags to contain only actors 2022-11-24 01:35:36 +08:00
yoshiko2
9db0ba27ef Add: image naming with number 2022-11-24 01:31:17 +08:00
yoshiko2
65e3cf98b0 Modify: Simplify Storyline config 2022-11-24 01:29:38 +08:00
yoshiko2
92142f8f7a Simplify message output outside debug mode 2022-11-24 01:25:52 +08:00
Marks
8ec6018eb1 fix white space 2022-11-22 16:09:37 -08:00
Marks
54b647f534 Fix the synopsis 2022-11-22 16:07:31 -08:00
Marks
5a909e20a9 Replace javdb's cover 2022-11-22 14:32:04 -08:00
Marks
3aedd2dcba Remove the 薄马赛克 (thin mosaic) tag 2022-11-22 11:22:55 -08:00
yoshiko2
bef1087fd9 Add 4K image 2022-11-23 02:51:39 +08:00
Marks
eef5610ef0 Fix tags 2022-11-22 10:28:08 -08:00
Yoshiko2
35d4f91676 Merge pull request #925 from Feng4/master
Update airav.py
2022-11-22 20:05:04 +08:00
Yoshiko2
d740934533 Merge pull request #918 from mark5231/skip_tags2
In jellyfin, tags and genres are duplicated, so only genres need to be saved to the nfo
2022-11-22 20:04:44 +08:00
Yoshiko2
8b63ef00d9 Merge branch 'master' into skip_tags2 2022-11-22 20:02:15 +08:00
Yoshiko2
73d73b91fd Merge pull request #926 from WarpTraveller/master
Add get-video-properties to requirements.txt & fix some code
2022-11-22 19:45:42 +08:00
Wayne Lui
e84d75a50a Add get-video-properties to requirements.txt & fix some code 2022-11-22 17:14:57 +08:00
Feng4
564c173428 Update config.ini
Add an uncensored number
2022-11-22 11:01:39 +08:00
Feng4
b9b57c5dfa Update airav.py
Fetching stills was broken; fix it
2022-11-22 09:53:48 +08:00
Feng4
e40d3105a6 Update airav.py
The site's response changed; update so it works again
2022-11-20 23:14:22 +08:00
Yoshiko2
031294ac05 Merge pull request #917 from mark5231/test11
Automatically detect whether a video is 4K
2022-11-20 00:26:53 +08:00
Yoshiko2
09cd3b0a18 Merge pull request #915 from mark5231/447c035
Modified some mapping_info entries
2022-11-20 00:16:52 +08:00
Yoshiko2
f1d693d827 Merge pull request #914 from mark5231/master
Change 流出 tag to 无码流出
2022-11-19 23:38:09 +08:00
Yoshiko2
75373dd513 Merge pull request #913 from hejianjun/master
Deprecate 91mv and switch to the new site javday
2022-11-17 15:07:50 +08:00
Yoshiko2
e064c1a7f7 Update README.md 2022-11-15 23:55:44 +08:00
Marks
a86cfc71d7 skip tags 2022-11-07 18:55:32 -08:00
Marks
e03fcd8c47 Automatically detect whether a video is 4K 2022-11-07 18:07:31 -08:00
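The "automatically detect 4K" commit above (and the related 4K tag and watermark fixes) hinge on a resolution check. A hedged sketch, assuming 4K means a frame at least 3840 pixels wide; the project's exact threshold may differ:

```python
def is_4k(width: int, height: int) -> bool:
    # Assumed threshold: a UHD-width frame (3840 pixels) counts as 4K.
    return width >= 3840
```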
Marks
d6d8bd6ebf Modify some mapping_info entries 2022-11-06 13:49:12 -08:00
chaow
df47ded859 Change 流出 tag to 无码流出 2022-11-06 12:01:56 -08:00
hejianjun
b906be8099 Deprecate 91mv and switch to the new site javday 2022-11-06 20:27:55 +08:00
yoshiko2
447c035a55 Simplify message output 2022-11-06 05:32:02 +08:00
yoshiko2
7fce762cf4 Update to 6.4.1 2022-11-06 04:57:33 +08:00
yoshiko2
ac3001b14c Modify: Remove useless output in debug mode 2022-11-06 04:57:14 +08:00
Yoshiko2
c1370e96d8 Merge pull request #910 from Bluefissure/patch-2
fix: fast return if url is None
2022-11-02 01:51:24 +08:00
Yoshiko2
80e53529a8 Merge pull request #908 from Bluefissure/patch-1
feat: default javbus to actual javbus
2022-11-02 01:51:13 +08:00
Yoshiko2
07651ad259 Merge pull request #903 from hejianjun/master
Fix some script execution exceptions and add rules for mdbk and mdtm to avoid contamination by Madou numbers
2022-11-02 01:48:12 +08:00
Bluefissure
4d5e816dd4 fix: fast return if url is None 2022-10-31 22:30:41 -05:00
Bluefissure
1e88d06248 default javbus to actual javbus
why do we use unstable mirrors first?
2022-10-30 22:25:04 -05:00
hejianjun
2aaea3446c Fix some script execution exceptions and add rules for mdbk and mdtm to avoid contamination by Madou numbers 2022-10-26 23:07:23 +08:00
Yoshiko2
2815762a8a Merge pull request #883 from naughtyGitCat/master
fix var name consistency error
2022-10-09 00:14:00 +08:00
Yoshiko2
2bdd872943 Merge branch 'master' into master 2022-10-09 00:13:28 +08:00
Yoshiko2
9c11008512 Merge pull request #882 from WarpTraveller/master
fix a missing
2022-10-09 00:11:14 +08:00
Name
36e092c9c9 Merge branch 'yoshiko2:master' into master 2022-09-30 16:44:03 +08:00
naughtyGitCat
78298495ac fix var name consistency error
change var names to be human friendly
add type hinting
PEP8 formatting
2022-09-30 16:43:15 +08:00
Wayne.Lui
cde709fc82 fix a missing 2022-09-28 13:04:08 +08:00
Yoshiko2
d85879c66c Merge pull request #876 from naughtyGitCat/master
formatting code under PEP8 and some basic guidelines
2022-09-27 22:56:37 +08:00
naughtyGitCat
50ef41ee50 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	scraper.py
#	scrapinglib/api.py
2022-09-16 18:26:43 +08:00
naughtyGitCat
f56400a56b add type hinting
PEP8 formatting
2022-09-16 18:23:20 +08:00
Name
2674ab7e46 Merge branch 'yoshiko2:master' into master 2022-09-16 17:20:44 +08:00
naughtyGitCat
daedd3071c PEP8 blank-line cleanup
PEP8 variable-name cleanup
add __main__ comment
isolate global and local vars
2022-09-16 17:18:17 +08:00
Yoshiko2
89c2626810 Merge pull request #875 from Suwmlee/master
update scrapinglib
2022-09-16 16:36:11 +08:00
Mathhew
8db24498d0 minor fixes 2022-09-16 15:35:47 +08:00
Mathhew
890b934fe9 add debug in scrapinglib 2022-09-16 15:30:01 +08:00
yoshiko2
8446489b68 Fix sources select func 2022-09-14 15:39:09 +08:00
yoshiko2
5f5a5a4a56 Fix getchu #2 2022-09-14 15:38:33 +08:00
yoshiko2
1be712767f Update Wrapper scripts 2022-09-14 15:07:57 +08:00
yoshiko2
1ec596a7fa Update 6.3.2 2022-09-14 14:55:17 +08:00
yoshiko2
ba340caa5f Fix getchu cover 2022-09-14 14:55:04 +08:00
yoshiko2
c30b235fe3 Add output select source in debug mode 2022-09-14 14:54:24 +08:00
Yoshiko2
d1fbdb3612 Merge pull request #870 from GigCloud/master
add -u option
2022-09-12 02:26:07 +08:00
yoshiko2
fcfe586f7b Merge remote-tracking branch 'origin/master' 2022-09-09 01:39:06 +08:00
Yoshiko2
5296a62c45 Merge pull request #871 from WarpTraveller/master
Update carib.py to download trailer
2022-09-09 01:38:25 +08:00
yoshiko2
6f5c67d8f9 Fix uncensored image processing 2022-09-09 01:38:02 +08:00
Yoshiko2
a650e54831 Merge pull request #869 from aedvoan/patch-1
Update javbus.py
2022-09-08 21:50:59 +08:00
Wayne.Lui
2a62b59346 Update carib.py to download trailer 2022-09-06 12:46:03 +08:00
Alex Zhao
236d2bf78f add -u option 2022-09-05 14:32:31 +08:00
aedvoan
6cb4be22ae Update javbus.py
fanbus.us is already defunct; removed to prevent pointless retries.
2022-09-04 23:14:38 +08:00
Yoshiko2
2bb50f4a47 Merge pull request #864 from WarpTraveller/master
Update fanza.py
2022-09-02 20:33:09 +08:00
Wayne.S.Lui
4502167216 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	scrapinglib/fanza.py
2022-08-31 14:17:52 +08:00
Wayne.S.Lui
593fb99723 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	scrapinglib/fanza.py
2022-08-31 04:40:08 +08:00
Wayne.S.Lui
f848a4ec24 Update fanza.py 2022-08-31 04:36:50 +08:00
Wayne.S.Lui
7af659bda7 Update fanza.py 2022-08-30 22:56:54 +08:00
Wayne.S.Lui
75aedf8601 Update fanza.py
Update to get trailer, extrafanart and cover, using the method from the older version
2022-08-30 22:46:40 +08:00
yoshiko2
c5c55be846 Update MappingTable files 2022-08-30 21:19:35 +08:00
yoshiko2
54440de9b1 Remove useless error output 2022-08-26 02:43:50 +08:00
yoshiko2
4b9b2675e2 Remove aaaaa='' 2022-08-26 02:42:58 +08:00
Yoshiko2
db9eaa7ec8 Merge pull request #859 from KyoMiko/master
Update getchu.py
2022-08-25 23:29:27 +08:00
Yoshiko2
a283da77a1 Merge pull request #858 from Suwmlee/master
fixes
2022-08-25 23:28:51 +08:00
Mathhew
793ef89f22 update init 2022-08-22 10:39:41 +08:00
KyoMiko
3318c3cf01 Update getchu.py
Fix getchu matching irrelevant results
2022-08-16 23:06:16 +08:00
Mathhew
153cdcde00 fixes
- Improve avsox scraping of FC2
- Fix specifiedUrl for javdb and library
- Other fixes
2022-08-16 09:24:16 +08:00
Yoshiko2
bb3688e67c Delete baidu.py 2022-08-11 01:30:41 +08:00
yoshiko2
a89180a386 Update 6.3.1 2022-07-31 04:38:15 +08:00
Yoshiko2
4f81f46b0b Merge pull request #849 from Suwmlee/master
support specifiedUrl & javlibrary
2022-07-31 01:41:21 +08:00
Mathhew
6de2e8f60f fix storyline 2022-07-28 23:07:51 +08:00
Mathhew
669b11b313 support specifiedUrl when scraping a single movie 2022-07-28 18:47:41 +08:00
Mathhew
ce388edce8 update scrapinglib
- support specifiedUrl when scraping a single movie
- support javlibrary and rating
2022-07-28 18:45:54 +08:00
Mathhew
ee1306fb3b fix(madou): split tags 2022-07-28 17:58:17 +08:00
Mathhew
17d0c638dc fix(carib): morestoryline 2022-07-28 17:56:22 +08:00
Mathhew
4b83f39241 fix user rating 2022-07-28 17:51:45 +08:00
Yoshiko2
f21c028908 Update README.md 2022-07-25 19:30:44 +08:00
Yoshiko2
64872fcaee Merge pull request #846 from raizapw19/fix-README_EN.md
Update README_EN.md
2022-07-25 19:30:14 +08:00
rmp9
76d4feec1a Update README_EN.md
- Added ':' and '.'
2022-07-23 23:30:54 -07:00
Yoshiko2
6aeff6767c Merge pull request #845 from chasedream1129/patch-1
Fix gcolle failing to scrape
2022-07-21 02:15:59 +08:00
ChaseDream
baf915508f Fix gcolle failing to scrape
A `self` was missing before `number`, so the request URL became 'https://gcolle.net/product_info.php/products_id/gcolle-xxxxx' and scraping failed
2022-07-17 20:07:50 +08:00
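The gcolle fix above is the classic missing-`self.` bug: inside a method, the bare name `number` does not resolve to the instance attribute, so the product id was dropped from the URL. A minimal reconstruction (class and method names are illustrative, not the project's actual code):

```python
class GcolleScraper:
    # Illustrative reconstruction of the bug and its fix.
    def __init__(self, number: str):
        self.number = number

    def product_url(self) -> str:
        # Broken: '.../products_id/' + number  (bare `number` is not
        # the instance attribute). Fixed: qualify with self. so the
        # real product id is appended.
        return "https://gcolle.net/product_info.php/products_id/" + self.number

print(GcolleScraper("12345").product_url())
```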
Yoshiko2
e39823f616 Update README.md 2022-06-29 19:59:20 +08:00
Yoshiko2
a198271deb Update README.md 2022-06-19 03:21:49 +08:00
Yoshiko2
c9fa31ca49 Merge pull request #832 from Suwmlee/master
update scrapinglib
2022-06-16 18:40:03 +08:00
Mathhew
0dda035057 update scrapinglib
- Improve extraction of extrafanart, trailer, etc.: use XPath expressions directly instead of regex matching
- Improve the getchu cover method: use the og tag info directly
- Improve www.getchu recognition of getchu-id resources
- Unify the tag-fetching method; return a list
2022-06-15 14:23:49 +08:00
Yoshiko2
eed33408a8 Update README.md 2022-06-14 21:37:57 +08:00
Yoshiko2
9c309563de Merge pull request #817 from Suwmlee/dev
refactor webcrawler
2022-06-14 20:55:51 +08:00
Mathhew
efb805a987 update lib
fix(airav): tags & extrafanart
fix(mgstage): clean
fix(fanza): outline
2022-06-13 18:24:10 +08:00
Mathhew
cd01de1344 minor fixes
- fix dlsite discount
- fix fanza: cover from og:image and extrafanart xpath expr
2022-06-13 15:15:32 +08:00
Mathhew
e7315e3ffa fix getchu
headers extrafanart
2022-06-13 10:01:03 +08:00
Mathhew
4074dcd366 update scrapinglib 2022-06-13 10:00:41 +08:00
Mathhew
8348fa167b Merge branch 'master' into dev 2022-06-13 09:04:26 +08:00
Mathhew
f11378186d update lib 2022-06-13 09:02:05 +08:00
Yoshiko2
8a342af11b Merge pull request #826 from hejianjun/feature/麻豆番号处理
Madou number handling
2022-06-11 19:32:33 +08:00
hejianjun
4637f2b6e8 Fix encoding issue 2022-06-10 12:23:25 +08:00
hejianjun
2b9e9a23d3 Match word boundaries 2022-06-08 21:07:59 +08:00
hejianjun
9e7f819eda Madou number handling 2022-06-08 20:35:27 +08:00
yoshiko2
de67c5d4cd Remove series from <genre> and <tag> in the nfo file (series goes in <set>) 2022-06-05 21:23:37 +08:00
yoshiko2
0a5685435f Add: if series and label are "----", return "" 2022-06-05 02:48:27 +08:00
yoshiko2
fc2e62c7c9 Merge remote-tracking branch 'origin/master' 2022-06-01 22:21:35 +08:00
yoshiko2
346163906b Add output series in <set> in nfo 2022-06-01 22:21:16 +08:00
yoshiko2
5b550dccfd Fix single actor in fanza 2022-06-01 22:20:28 +08:00
Mathhew
e665bceb5b update scrapinglib 2022-05-30 15:05:08 +08:00
Yoshiko2
a6aaca984f Merge pull request #821 from VergilGao/master
fix download error
2022-05-28 16:19:17 +08:00
Yoshiko2
a3b36eec44 Merge branch 'master' into master 2022-05-28 16:18:54 +08:00
yoshiko2
99cc99bb51 Remove shit code #2 2022-05-28 16:12:09 +08:00
yoshiko2
e61c72b597 Remove shit code 2022-05-28 16:10:22 +08:00
羽先生
9f6322494f fix download error 2022-05-28 13:38:46 +08:00
yoshiko2
6403fb0679 Fix end in jav321 2022-05-28 01:05:29 +08:00
yoshiko2
3c256d17e8 Update 6.2.2 2022-05-27 23:16:24 +08:00
yoshiko2
99bc50bbba Fix extrafanart download failure 2022-05-27 23:14:43 +08:00
yoshiko2
5ab121f996 Fix image download failure 2022-05-27 23:03:20 +08:00
Mathhew
feccd67115 remove webcrawler 2022-05-27 17:06:31 +08:00
Mathhew
c1fd755ccb fix sources 2022-05-27 16:50:26 +08:00
Mathhew
3014e5da96 fix parameter & clean scraper 2022-05-27 15:42:45 +08:00
Mathhew
8871355787 fix scrape parameters 2022-05-27 15:25:08 +08:00
Mathhew
9898f2918f update scrapinglib 2022-05-27 15:24:29 +08:00
yoshiko2
bb6ff56ce5 Fix mapping table download failure 2022-05-26 23:43:11 +08:00
Mathhew
d6d0a1687b move to scrapinglib 2022-05-26 14:10:26 +08:00
Mathhew
b7ecb66210 add scrapinglib 2022-05-26 14:03:58 +08:00
yoshiko2
529aeaddd2 Update README 2022-05-25 23:03:18 +08:00
yoshiko2
c319d78888 Update FreeBSD wrapper 2022-05-25 13:11:26 +08:00
yoshiko2
1c3fe73285 Update c_number.json 2022-05-25 13:11:10 +08:00
yoshiko2
8ee23f2bd9 Update 6.2.1 2022-05-24 23:19:34 +08:00
yoshiko2
f4322cf427 Fix source getchu 2022-05-24 22:34:34 +08:00
yoshiko2
26528b7e19 Fix crawler sort 2022-05-24 22:34:17 +08:00
yoshiko2
0fdb79b8cd Remove unhelpful requests error output 2022-05-23 20:24:27 +08:00
yoshiko2
a3f3613742 Remove unhelpful requests error output 2022-05-23 20:24:17 +08:00
yoshiko2
3cc7badbad Add dual search func in getchu 2022-05-23 00:38:21 +08:00
yoshiko2
c628c25009 Add more timeout for crawler request 2022-05-22 23:16:46 +08:00
yoshiko2
5aedfb0baa Fix Japanese number parsing 2022-05-22 23:14:47 +08:00
yoshiko2
93e5fd2a35 Add euc_jp encoding in source getchu #2 2022-05-22 02:19:26 +08:00
yoshiko2
37533e5552 Add euc_jp encoding in source getchu 2022-05-22 01:28:21 +08:00
yoshiko2
daf431b9f5 Add custom headers in Image Download func 2022-05-21 23:42:30 +08:00
yoshiko2
47110c567c Add search func in getchu 2022-05-21 23:39:19 +08:00
yoshiko2
2fcd6d0401 Add anime support #4 2022-05-21 23:38:31 +08:00
yoshiko2
6f0ea5b76c Add Japanese title parser 2022-05-21 23:38:01 +08:00
yoshiko2
8ed0aa68d7 Add anime support #3 2022-05-21 03:23:52 +08:00
yoshiko2
26aeb5bab8 Add anime support #2 2022-05-20 21:21:47 +08:00
yoshiko2
5a7ddf30d4 Remove file size screening 2022-05-20 21:21:08 +08:00
yoshiko2
f82d6a5317 Fix search func in source dlsite 2022-05-20 20:07:10 +08:00
yoshiko2
da76d33b8b Remove unhelpful text in source dlsite 2022-05-20 20:06:13 +08:00
yoshiko2
964518572c cn_sub check 2022-05-20 20:05:32 +08:00
yoshiko2
417ff137b1 Remove unhelpful output 2022-05-20 19:03:57 +08:00
yoshiko2
94ee34046a Remove source fc2club in config.ini 2022-05-20 17:45:38 +08:00
yoshiko2
ece9967d3c Add scrape advanced and simple sleep 2022-05-20 17:26:49 +08:00
yoshiko2
7227ba9d0b Test imagecut 2022-05-20 01:14:10 +08:00
yoshiko2
d1d3036a2b Add anime support 2022-05-20 01:12:23 +08:00
yoshiko2
4f03da9814 Merge remote-tracking branch 'origin/master' 2022-05-19 09:29:45 +08:00
yoshiko2
53bf39c0d1 Remove Protagonist 2022-05-19 09:29:30 +08:00
Yoshiko2
c1d1e6a53e Update Python 3.9 2022-05-16 00:51:22 +08:00
yoshiko2
f0c776e254 Update 6.1.4 2022-05-13 02:04:12 +08:00
yoshiko2
bc6118bbf8 Merge remote-tracking branch 'origin/master' 2022-05-13 01:59:09 +08:00
yoshiko2
d09254ae87 Remove unused code 2022-05-13 01:58:53 +08:00
yoshiko2
fca743aa86 Fix NFO file actor output error 2022-05-13 01:58:14 +08:00
Yoshiko2
c47e5b9646 Merge pull request #805 from oOtroyOo/zz
"配信開始日:" was being skipped
2022-05-12 21:24:24 +08:00
oOtroyOo
8daafff1a0 "配信開始日:" was being skipped 2022-05-12 20:56:46 +08:00
Yoshiko2
6a51d7f543 Update config.ini 2022-05-12 01:20:14 +08:00
yoshiko2
cff10438dc Update modules version 2022-05-12 00:41:08 +08:00
yoshiko2
f5cd1e38d0 Clean out unused modules #2 2022-05-12 00:40:53 +08:00
yoshiko2
9722f43f17 Update config.py 2022-05-12 00:10:09 +08:00
yoshiko2
2096c0908c Clean out unused modules 2022-05-11 22:51:10 +08:00
yoshiko2
b3058a6f1f Update README 2022-05-11 22:13:38 +08:00
yoshiko2
6d64076505 Update README 2022-05-11 22:10:09 +08:00
Yoshiko2
fc08b283a1 Update README.md 2022-05-08 20:44:26 +08:00
yoshiko2
8475d0cbf1 Update readme 2022-05-08 20:43:40 +08:00
yoshiko2
c0fab96191 Update c_number.json 2022-05-08 20:05:23 +08:00
Yoshiko2
f935a024f9 Update 6.1.3 2022-05-08 14:50:59 +08:00
yoshiko2
d991fecb73 fix order of sources 2022-05-08 03:26:48 +08:00
yoshiko2
e3029ef8cd Add source getchu 2022-05-08 03:25:53 +08:00
Yoshiko2
89e651b279 Fix actor_photo None 2022-05-08 02:51:24 +08:00
Yoshiko2
f7a047594c Merge pull request #796 from mpmpmp42/master
Fix the PLEX actor avatar issue (still needs XBMCnfoMoviesImporter.bundle)
2022-05-08 02:50:13 +08:00
Yoshiko2
bdf7f69390 Merge pull request #794 from 553531284/master
fix javdb xpath
2022-05-08 02:49:43 +08:00
Yoshiko2
664e3feedb Update feature-request------.md 2022-05-07 20:38:01 +08:00
mpmpmp42
f7fd62225e 1 2022-05-06 21:28:22 +08:00
DZV5
47a3560132 Merge branch 'yoshiko2:master' into master 2022-05-05 22:12:37 +08:00
Deng Zhou
7656c63afe javdb xpath 2022-05-05 22:04:51 +08:00
Yoshiko2
92de4d4bbe Update Movie_Data_Capture.py 2022-05-05 21:02:26 +08:00
Yoshiko2
bc19e74411 Update feature-request------.md 2022-05-05 20:59:33 +08:00
Yoshiko2
d3879f6979 Delete bug_report.md 2022-05-05 20:57:39 +08:00
Yoshiko2
c544134556 Update issue templates 2022-05-05 20:56:21 +08:00
Yoshiko2
7168d2cc3b Update README_TC.md 2022-05-05 03:31:09 +08:00
Yoshiko2
ba2eaa8949 Update and rename README_ZH.md to README_SC.md 2022-05-05 03:30:56 +08:00
Yoshiko2
9063c91610 Update README.md 2022-05-05 03:30:32 +08:00
Yoshiko2
978eefaaf3 Update README.md 2022-05-05 03:30:09 +08:00
Yoshiko2
96c6b6ea96 Update README.md 2022-05-04 04:51:55 +08:00
Yoshiko2
32764be888 Update README.md 2022-05-04 04:45:36 +08:00
Yoshiko2
db4c461a7d Delete readme_tc.md 2022-05-04 04:44:03 +08:00
Yoshiko2
0e741851d4 Update README.md 2022-05-04 04:40:40 +08:00
Yoshiko2
e552065031 Update README.md 2022-05-04 04:10:11 +08:00
Yoshiko2
e0a2d487ce Update README.md 2022-05-04 04:05:35 +08:00
Yoshiko2
1a120d7abf Update README.md 2022-05-04 04:00:10 +08:00
Yoshiko2
d90a63e175 Update README.md 2022-05-04 03:55:30 +08:00
Yoshiko2
d5f5c12d05 Update README.md 2022-05-04 03:52:50 +08:00
Yoshiko2
8e7361ae41 Update README.md 2022-05-04 03:29:54 +08:00
yoshiko2
7dd86f4aa4 Update README 2022-05-03 19:24:35 +00:00
yoshiko2
05f28f6966 Merge remote-tracking branch 'origin/master' 2022-05-03 19:21:48 +00:00
yoshiko2
d128ba5efd Update README 2022-05-03 19:21:36 +00:00
Yoshiko2
96bc7ea64f Update README.md 2022-05-04 03:20:14 +08:00
yoshiko2
f20ff7a201 Merge remote-tracking branch 'origin/master' 2022-05-03 19:19:10 +00:00
yoshiko2
04fcf93b9a Update README 2022-05-03 19:13:48 +00:00
Yoshiko2
4d25f53f6b Create LICENSE 2022-05-04 02:54:24 +08:00
Yoshiko2
5f98ead97e Create LICENSE 2022-05-04 02:50:52 +08:00
Yoshiko2
4c96347af1 Update config.ini 2022-04-30 20:11:34 +08:00
Yoshiko2
107d5857dd Update 6.1.2 2022-04-30 20:09:15 +08:00
Yoshiko2
e6af7c0520 Merge pull request #787 from 553531284/master
Fix javdb and fanza stills
2022-04-30 19:47:23 +08:00
Yoshiko2
7efb3aeba7 Merge branch 'master' into master 2022-04-30 19:46:46 +08:00
Yoshiko2
2eb50e9b8d Merge pull request #780 from Hittlert/master
Fix tag translation bug
2022-04-30 19:07:30 +08:00
Yoshiko2
86367cf7a5 Merge pull request #779 from code-review-doctor/fix-probably-meant-fstring
Missing `f` prefix on f-strings fix
2022-04-30 19:06:28 +08:00
Yoshiko2
7bbf9f6e7e Merge pull request #773 from lededev/un-i
Add an option compatible with Jellyfin's cover image filename rules
2022-04-30 19:06:02 +08:00
Yoshiko2
6bfee6acf1 Merge branch 'master' into un-i 2022-04-30 19:05:50 +08:00
Deng Zhou
df9dae1fc2 imagecut 2022-04-30 00:54:04 +08:00
Deng Zhou
3813545e7e temp 2022-04-30 00:15:16 +08:00
Deng Zhou
dd4afaf881 fix bug 2022-04-30 00:03:16 +08:00
Deng Zhou
f063383bb7 fix bug 2022-04-29 23:59:15 +08:00
Deng Zhou
5e42eb8236 Merge branch 'upstream'
# Conflicts:
#	WebCrawler/fanza.py
2022-04-29 23:53:21 +08:00
lededev
2fd0a7a02b javdb.py:sync website 2022-04-29 22:45:46 +08:00
lededev
20dbe31b49 update UserAgent 2022-04-29 22:45:11 +08:00
jop6__
ae15e0815e Fix tag translation bug
'''mapping_data.xpath('a[contains(@Keyword, $name)]/@' + language, name=i)[0]''' uses a contains() match, so an original tag such as "内S" wrongly hits the tag "体内SJ", since they too form a containment relation; adding commas on both sides of name in the xpath match fixes this.
2022-04-25 13:19:22 +08:00
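The tag-translation fix above is worth spelling out: `contains()` does raw substring matching, so the commit wraps `$name` in commas before matching against the comma-delimited keyword list. A runnable lxml sketch of the before/after (the mapping XML here is a made-up example, not the project's actual mapping file):

```python
from lxml import etree

# Made-up mapping entry whose keyword list contains only 体内SJ.
mapping_data = etree.XML('<map><a keyword=",体内SJ," zh_cn="tag"/></map>')

# Before: raw contains() lets 内S match inside 体内SJ -> ['tag'].
print(mapping_data.xpath('a[contains(@keyword, $name)]/@zh_cn', name="内S"))

# After: commas on both sides of name force a whole-item match -> [].
print(mapping_data.xpath('a[contains(@keyword, $name)]/@zh_cn', name=",内S,"))
```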
lededev
2a3c50a2dd Download actor avatars to the .actors directory for KODI; offline Jellyfin cover image filenames 2022-04-24 19:50:29 +08:00
lededev
42d9986c16 Support multiple -C parameters on the command line, executed in sequence 2022-04-24 19:46:22 +08:00
code-review-doctor
736e249ad5 Fix issue probably-meant-fstring found at https://codereview.doctor 2022-04-23 23:57:04 +01:00
lededev
41d214f391 Add an option compatible with Jellyfin's cover image filename rules 2022-04-24 00:54:05 +08:00
yoshiko2
1655d5ff3e Merge remote-tracking branch 'origin/master' 2022-04-23 21:33:03 +08:00
yoshiko2
0590166439 Add source jav321 2022-04-23 21:32:38 +08:00
yoshiko2
634e32e654 Fix sources sort 2022-04-23 21:28:09 +08:00
yoshiko2
cab81ce2b1 Fix is_uncensored() returning None 2022-04-23 21:27:20 +08:00
lededev
6a45b6057a resolve issue #772 2022-04-22 12:02:56 +08:00
Yoshiko2
be82758d56 Merge pull request #770 from lededev/cnn-2
fix issue #769 No Module named 'ImageProcessing.cnn'
2022-04-21 00:38:50 +08:00
lededev
5fe424abae fix issue #769 No Module named 'ImageProcessing.cnn' 2022-04-20 22:20:11 +08:00
yoshiko2
ee4af3fb6c Add mapping table exception handling #2 2022-04-20 17:19:32 +08:00
Yoshiko2
0854571eae Update gcolle.py 2022-04-20 17:10:02 +08:00
Yoshiko2
5e5feb370b Merge pull request #767 from lededev/clu
clean up
2022-04-20 17:02:48 +08:00
lededev
95464f29ba gcolle.py: keep the session alive automatically; repeat calls need only one http request 2022-04-20 13:50:15 +08:00
lededev
f7186aa347 gcolle.py:Add try block 2022-04-20 13:03:17 +08:00
lededev
0dff1a72c0 clean up 2022-04-20 12:48:38 +08:00
yoshiko2
5da99067c8 Merge remote-tracking branch 'origin/master' 2022-04-20 03:32:05 +08:00
yoshiko2
a433eb07a3 Add mapping table exception handling 2022-04-20 03:31:55 +08:00
Yoshiko2
9199cae91a Update 6.1.1 2022-04-20 03:10:45 +08:00
yoshiko2
1d1648fe1f Object-oriented crawler refactor #4 2022-04-20 01:17:01 +08:00
yoshiko2
2d3dee065d Object-oriented crawler refactor #3 2022-04-20 01:15:33 +08:00
yoshiko2
27cad6eca3 Object-oriented crawler refactor #2 2022-04-20 00:54:51 +08:00
yoshiko2
87972b0335 Add crawler named gcolle.py #2 2022-04-19 21:27:09 +08:00
yoshiko2
7b0e5db6ba Merge remote-tracking branch 'origin/master' 2022-04-19 19:57:37 +08:00
yoshiko2
203ff08ac5 Initial object-oriented crawler refactor 2022-04-19 19:57:18 +08:00
Yoshiko2
d5615fb6c5 Update README.md 2022-04-19 00:15:20 +08:00
yoshiko2
510dfa94f5 When `imagecut` = 4, face recognition crops the cover even for censored movies. 2022-04-19 00:04:52 +08:00
yoshiko2
7155656d65 Modify: `failed_move` disabled by default in `config.ini` 2022-04-19 00:03:03 +08:00
yoshiko2
9be375de9e The crawler result can be a dict or a str loaded via JSON 2022-04-19 00:00:28 +08:00
yoshiko2
022c0d30eb Add crawler named gcolle.py 2022-04-19 00:00:07 +08:00
yoshiko2
82547ca276 If the current running system is not Windows, the program will exit automatically. 2022-04-18 23:38:16 +08:00
yoshiko2
d38b49e5fb Add func get_html() for session #3 2022-04-18 23:32:50 +08:00
yoshiko2
a6652e6636 Add func get_html() for session #2 2022-04-18 19:57:04 +08:00
yoshiko2
fbc2f3e2a4 Add func get_html() for session 2022-04-18 19:55:05 +08:00
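The get_html()-for-session commits above (and the later gcolle.py session-keeping commit) share one idea: reuse a requests.Session so cookies persist across calls. A hedged sketch, with the function name mirroring the commit message and the parameters assumed:

```python
import requests

_session = requests.Session()  # cookies persist across calls

def get_html_session(url: str, encoding: str | None = None) -> str:
    # Reusing one Session keeps login/session cookies alive, so a
    # second request to the same site needs only one HTTP round-trip.
    resp = _session.get(url, timeout=30)
    if encoding:
        resp.encoding = encoding
    return resp.text
```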
Yoshiko2
c27a142938 Merge pull request #764 from lededev/unce-1
Fix cnn face detection
2022-04-18 18:45:30 +08:00
Yoshiko2
0e3ea062b6 Update README.md 2022-04-18 18:37:15 +08:00
lededev
6794677006 Fix cnn face detection; more accurate but slower than hog 2022-04-18 04:23:12 +08:00
lededev
de58cc89d5 config.py: remove process-pool mode [storyline]run_mode=2 2022-04-18 01:42:06 +08:00
lededev
3dda5a94cf Ctrl+C during scraping now exits immediately instead of adding the movie to the failure list and moving to the next one 2022-04-18 01:27:46 +08:00
lededev
3224f8c1ab Remove the storyline process-pool mode to improve compatibility 2022-04-18 01:07:58 +08:00
lededev
5d00dd29e4 Minor -C tweak 2022-04-17 23:37:33 +08:00
lededev
c94ef3cf4a More precise censored/uncensored handling 2022-04-17 23:36:41 +08:00
Yoshiko2
236892c370 Merge pull request #761 from lededev/simp-3
Simplify, and re-support _CD1 underscore separators
2022-04-17 02:18:18 +08:00
lededev
0e0b92a9fa Add -C option to override any config file parameter 2022-04-16 19:31:24 +08:00
lededev
f59d3505a8 Add -D --download-images switch to always download images 2022-04-14 04:50:20 +08:00
lededev
2bd294f1bd Add -w --website parameter; when used it overrides [priority]website= in the config file 2022-04-14 04:20:57 +08:00
lededev
fa9c690e60 If the old .nfo contains a rating and votes but the new metadata does not, carry over the old rating and votes 2022-04-14 04:18:21 +08:00
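The rating carry-over commit above reads the old .nfo before overwriting it. A minimal ElementTree sketch of the described behaviour (the <rating>/<votes> tag names follow KODI nfo conventions and are assumptions here):

```python
import xml.etree.ElementTree as ET

def carry_over_rating(old_nfo_path: str, new_meta: dict) -> dict:
    # If the old .nfo has a rating and votes and the freshly scraped
    # metadata does not, keep the old values.
    root = ET.parse(old_nfo_path).getroot()
    for key in ("rating", "votes"):
        if not new_meta.get(key):
            old = root.findtext(key)
            if old:
                new_meta[key] = old
    return new_meta
```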
lededev
8b769d73b8 bug fix 2022-04-14 02:47:20 +08:00
lededev
3e1cb92001 Re-support _CD1 2022-04-14 01:30:53 +08:00
lededev
ae6d27a454 In debug mode, mode 3 adds a warning when the .nfo does not exist 2022-04-14 01:29:00 +08:00
lededev
499baf51fb Simplify 2022-04-14 01:21:02 +08:00
Yoshiko2
828635e421 Merge pull request #759 number extraction update
Some changes
2022-04-13 20:03:35 +08:00
wqzz123
fee5c4a54f Make uncensored matching case-insensitive 2022-04-13 17:09:14 +08:00
wqzz123
4ad24deb9c Only copy subtitles when the subtitle and video filenames match 2022-04-13 16:25:46 +08:00
wqzz123
384015e648 Support _C and _CD suffixes 2022-04-13 16:17:53 +08:00
wqzz123
66ddc0bf22 Filter x264 and x265 suffixes 2022-04-13 16:15:31 +08:00
Yoshiko2
3bcaf8d318 Merge pull request #746 from lededev/num-4k
New feature: full support for processing in small batches over multiple runs; add a rerun delay
2022-04-12 18:59:07 +08:00
lededev
475f02fbe6 Handle format() space alignment when content contains Chinese; align DEBUG INFO output 2022-04-12 06:34:08 +08:00
lededev
9a3b48140d number_parser.py: to match core.py, _CD1 underscores are no longer supported; only -CD1 hyphens are 2022-04-12 02:28:31 +08:00
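Several of the number_parser.py commits in this run revolve around the part-suffix rule (-CD1 supported; _CD1 dropped here and re-added later). A small regex sketch of the hyphen-only variant this commit describes (the pattern is illustrative, not the project's actual one):

```python
import re

def strip_part_suffix(basename: str) -> str:
    # Hyphen-only part marker, per this commit: "-CD1", "-CD2", ...
    # ("_CD1" underscores were dropped here and re-supported later).
    return re.sub(r"-CD\d+$", "", basename, flags=re.I)

print(strip_part_suffix("n1012-CD1"))  # n1012
```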
lededev
f342d42b86 Besides -C, filename rules now also support ch for hard subtitles 2022-04-12 02:13:36 +08:00
lededev
cdbccb3b14 number_parser.py test cases: Chinese output not supported yet due to host encoding settings 2022-04-11 17:45:55 +08:00
lededev
cc3e4d1edd str.find() returns -1 on failure 2022-04-11 17:32:01 +08:00
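The str.find() commit above guards against a silent pitfall: find() returns -1 on failure instead of raising, and -1 is also a valid (last-element) index. A short illustration:

```python
filename = "ABC-123.wmv"
pos = filename.find("-CD")  # -1 here: no "-CD" marker present
# Unguarded, filename[pos:] with pos == -1 silently yields just the
# last character, so the result must be checked before slicing:
part = filename[pos:] if pos != -1 else ""
```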
lededev
db23ffae54 number_parser.py:fix n1012-CD1.wmv return null number 2022-04-11 17:13:23 +08:00
lededev
e50e14764f fanza.py:fix [-]Movie number has changed! [ATOM-067]->[atom067so] 2022-04-11 13:46:57 +08:00
lededev
a813bf462f fanza.py:fix [-]Movie number has changed! [RED-164]->[red00164] 2022-04-11 13:37:50 +08:00
lededev
dfcc012201 Do not perform -N when the .nfo file does not exist 2022-04-11 06:05:11 +08:00
lededev
9c1baef0b7 Prefer the website's result whether censored or uncensored 2022-04-11 00:18:40 +08:00
lededev
e5bc900b40 Classify all Western content as uncensored 2022-04-10 15:29:46 +08:00
lededev
02692becfe More reliable uncensored detection 2022-04-10 14:48:25 +08:00
lededev
8add9fe424 Make the cover crop aspect ratio configurable 2022-04-10 13:38:48 +08:00
lededev
c54817aa01 Add offline batch cover cropping (face recognition) and watermarking 2022-04-10 13:04:08 +08:00
lededev
44dc26d13e Improve the cover crop aspect ratio 2022-04-10 10:09:40 +08:00
lededev
09c81d7f59 Add options: 1. keep mode 3 from skipping face recognition; 2. skip face recognition on censored covers 2022-04-10 07:14:55 +08:00
lededev
e951429ec0 Reduce file-path spam in success messages; show the full path only when writing the .nfo 2022-04-10 05:39:57 +08:00
lededev
3e3ff3cfb3 Improve subtitle detection 2022-04-10 03:12:03 +08:00
lededev
109cc3717b strictly restrict to .nfo in order to exclude .nfo\w+ 2022-04-10 01:26:47 +08:00
lededev
9e9b799441 Add subtitle support for multi-part movies 2022-04-09 21:35:06 +08:00
lededev
8ee1f212d2 site 38,39 2022-04-09 20:50:44 +08:00
lededev
69f52798c6 Remove .txt from subtitle suffixes to avoid copying ads 2022-04-09 20:50:09 +08:00
lededev
de58647402 Mode 2 supports hard links; improve subtitle copying to handle more filename combinations 2022-04-09 01:35:49 +08:00
lededev
b048c04310 Fix -CD1/-CD2 and imagecut==3 failing after downloading the small cover because the watermark file does not exist 2022-04-08 16:11:48 +08:00
lededev
c20bf4cf57 fanza.py:resolve some [-]Movie number has changed 2022-04-08 14:17:50 +08:00
lededev
cbde2e4a81 Preserve the existing user-defined <userrating /> rating tag when updating the .nfo 2022-04-08 12:57:16 +08:00
lededev
f728f33363 UserAgent update to Chrome 100.0 2022-04-07 12:38:27 +08:00
lededev
6df4d8ff76 change description 'total time:' to 'Elapsed time ' 2022-04-07 05:47:08 +08:00
lededev
580139c626 try fix issue #751 2022-04-06 12:08:51 +08:00
lededev
b251a127c8 storyline:remove _inner(), expand args directly 2022-04-06 04:35:11 +08:00
lededev
a840f52908 storyline.py:sync current amazon website 2022-04-06 04:18:50 +08:00
lededev
7ff701b5d7 show app total run time 2022-04-06 01:39:36 +08:00
lededev
ef82e73fac Add --rerun-delay -R option, rerun after delay 2022-04-03 02:27:29 +08:00
lededev
a3655e99c3 number_parser.py:more domain suffixes 2022-04-02 00:28:00 +08:00
lededev
96fd8d7682 number_parser.py:add ^4K_ and ^4K- filter 2022-04-01 05:07:10 +08:00
Yoshiko2
38b18efdb2 Merge pull request #743 from lededev/rating-2
Show the vote count in the KODI rating
2022-03-31 18:15:28 +08:00
lededev
f83e756581 madou.py:remove debug print 2022-03-29 23:46:40 +08:00
lededev
99c068604a madou.py:getTitle() bug fix 2022-03-29 23:39:52 +08:00
lededev
761fc762f2 Show the vote count in the KODI rating 2022-03-27 20:38:36 +08:00
Yoshiko2
8d3ee88cde Merge pull request #740 from lededev/minfix
Minfix
2022-03-27 19:15:51 +08:00
Yoshiko2
42114259f9 Merge pull request #741 from yoshiko2/revert-739-master
Revert "JavDB: add User Rating for Emby&Jellyfin"
2022-03-27 19:15:11 +08:00
Yoshiko2
9611171542 Revert "JavDB: add User Rating for Emby&Jellyfin" 2022-03-27 19:14:54 +08:00
Yoshiko2
c40f2965dd Merge pull request #739 from OrangeTien/master
JavDB: add User Rating for Emby&Jellyfin
2022-03-27 19:11:31 +08:00
lededev
375c6822f1 Simplify rating code 2022-03-27 19:04:59 +08:00
lededev
063d58edc3 Adjust rating precision 2022-03-27 18:57:11 +08:00
lededev
3c78069f88 User Rating for KODI Emby Jellyfin 2022-03-27 17:38:53 +08:00
OrangeTien
1f5444e86b JavDB: add User Rating for Emby&Jellyfin 2022-03-27 17:15:11 +08:00
lededev
5af8060176 link_mode small fix 2022-03-27 17:08:11 +08:00
lededev
12e8c88a07 javdb site 37,38 2022-03-27 17:07:37 +08:00
lededev
85ecff451c download_only_missing_images=1 do not cut exist image again 2022-03-27 17:06:46 +08:00
Yoshiko2
0407f31717 Update 6.0.3 2022-03-27 04:11:54 +08:00
yoshiko2
a421da41cc Update Mapping Table files 2022-03-26 21:21:13 +08:00
Yoshiko2
02a199bb50 Adjust the priority of javdb source to the lowest 2022-03-26 21:00:25 +08:00
Yoshiko2
7c931c011a Merge pull request #736 from lededev/user-rating
JavDB: add User Rating to .NFO
2022-03-26 20:42:30 +08:00
Yoshiko2
1b489ffc95 Merge pull request #735 from lededev/fdls
dlsite.py: update to current website
2022-03-26 20:42:24 +08:00
Yoshiko2
3eacc03095 Merge pull request #734 from lededev/link-mode-1
Add hard-link support and a command-line switch
2022-03-26 20:42:13 +08:00
lededev
fdc988a481 javdb.py:simplify again 2022-03-26 15:58:22 +08:00
lededev
b585c0fbbd javdb.py:check empty str 2022-03-26 15:55:53 +08:00
lededev
00be6b86d5 JavDB: add User Rating to .NFO 2022-03-26 15:41:29 +08:00
lededev
30064745c6 dlsite.py: update to current website 2022-03-26 14:51:29 +08:00
lededev
de7b982783 core.py:Copy subtitle files in link mode 2022-03-26 12:44:03 +08:00
lededev
068ab15f90 add hard link target option and command line switch 2022-03-26 11:39:43 +08:00
Yoshiko2
218cdfe816 Merge pull request #733 from lededev/atsite
number_parser.py:remove *.com@ and *.cc@
2022-03-25 02:29:22 +08:00
lededev
5c6e011489 number_parser.py:remove *.com@ and *.cc@ 2022-03-24 12:58:57 +08:00
Yoshiko2
bea2fd6899 Merge pull request #730 from Suwmlee/master
fix: madou title
2022-03-21 18:44:46 +08:00
Mathhew
0bb2b2da3b fix: madou title 2022-03-21 11:44:33 +08:00
yoshiko2
02b52df234 Add update check exception handling 2022-03-19 17:56:54 +08:00
Yoshiko2
d161890994 Merge pull request #715 from HappyQuQu/master
add hard_link option
2022-03-19 16:45:24 +08:00
Yoshiko2
5843a62d5e Update Makefile 2022-03-17 02:11:51 +08:00
Yoshiko2
0b7d3ed0a5 Update README.md 2022-03-17 02:10:41 +08:00
Yoshiko2
1cb4cd37a2 Merge pull request #720 from lededev/md-1
madou priority against javdb
2022-03-15 16:55:53 +08:00
Yoshiko2
c45037e20c Merge pull request #717 from lededev/vsc-dbg
vscode debug AttributeError: 'NoneType' object has no attribute
2022-03-15 16:55:03 +08:00
Deng Zhou
edfddc18d8 Merge branch 'yoshiko2_master'
# Conflicts:
#	WebCrawler/javdb.py
2022-03-10 21:51:52 +08:00
Qu
bd51898ae2 [update]: rename the hard_link option to scan_hardlink 2022-03-07 21:33:40 +08:00
lededev
b6786ef9d7 fc2.py:fix some pages can not auto detect UTF-8 encoding 2022-03-06 21:20:20 +08:00
lededev
6b7e518fbe madou.py:fix get tags 2022-03-06 21:03:00 +08:00
lededev
8ad4997342 madou.py:simp by regex 2022-03-06 20:39:59 +08:00
lededev
3117b3a18d madou.py:fix get title for MAD039 2022-03-06 17:36:22 +08:00
lededev
31da166931 madou.py:fix get title for MD0140-2 2022-03-06 17:29:15 +08:00
lededev
788fc4a97c madou numbers MD0140-2 MD0165-8 2022-03-06 16:22:11 +08:00
lededev
1cecf66a84 support more madou numbers 2022-03-06 16:08:37 +08:00
lededev
48d14e19ae add madou and mv91 to website list 2022-03-06 01:37:09 +08:00
lededev
54d8f3af87 madou priority against javdb 2022-03-06 01:31:43 +08:00
lededev
6ffdf08bc8 vscode debug AttributeError: 'NoneType' object has no attribute 2022-03-03 06:44:38 +08:00
EvanQu
d1f40c453c Merge branch 'master' of https://github.com/HappyQuQu/Movie_Data_Capture 2022-03-01 22:08:26 +08:00
EvanQu
35340eb1e9 default false 2022-03-01 22:08:22 +08:00
EvanQu
7f2cef7bae default false 2022-03-01 22:07:03 +08:00
EvanQu
7ea3dd23d0 add hard_link option 2022-03-01 22:01:09 +08:00
Yoshiko2
9e332b0d02 Update main.yml 2022-02-28 03:52:10 +08:00
yoshiko2
64e7a3a016 Fix cc_convert convert str to list 2022-02-28 03:16:36 +08:00
Yoshiko2
3955483811 Merge pull request #712 from lededev/2t
fix OpenCC not work
2022-02-26 22:27:43 +08:00
lededev
c354518c57 vscode debug AttributeError: 'NoneType' object has no attribute 'flush' or 'fileno' 2022-02-26 15:37:05 +08:00
lededev
a3c8398f29 fix OpenCC not work 2022-02-26 15:36:05 +08:00
Yoshiko2
8e7cbd8fa6 Update main.yml 2022-02-26 05:30:16 +08:00
Yoshiko2
2b4e445f3b Fix Image Processing module wrapper 2022-02-26 05:14:28 +08:00
Yoshiko2
ccfb96be10 Update main.yml 2022-02-26 05:13:02 +08:00
yoshiko2
25272729f3 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	.github/workflows/main.yml
2022-02-26 05:12:04 +08:00
yoshiko2
02580c70a9 Fix Image Processing module wrapper 2022-02-26 05:06:47 +08:00
Yoshiko2
d4197a9d16 Update main.yml 2022-02-26 04:24:00 +08:00
Yoshiko2
a4758a3669 Update main.yml 2022-02-26 04:21:22 +08:00
Yoshiko2
2ac6b1cab8 Merge pull request #707 from naughtyGitCat/master
SOME PEP8 STYLE BLANK LINES, SOME TYPING ANNOTATION, FUNCTION COMMENT
2022-02-23 23:45:39 +08:00
FatalFurY
0bd34f907e Merge branch 'master' of https://github.com/naughtyGitCat/Movie_Data_Capture 2022-02-23 22:15:04 +08:00
FatalFurY
377a9f308b PEP8 PREFIX, AND SOME TYPING ANNOTATION, FUNCTION COMMENT 2022-02-23 22:11:45 +08:00
Yoshiko2
64c129267c Merge pull request #706 from sastar/master
fix #685
2022-02-23 05:29:42 +08:00
Yoshiko2
377b0d7707 Merge pull request #699 from lededev/mi1
Minor changes
2022-02-23 05:29:12 +08:00
Yoshiko2
bb0d49ee4e Merge pull request #698 from naughtyGitCat/master
typo: transalte → translate, and some blank lines
2022-02-23 05:28:59 +08:00
sastar
2c6c79ac4f fix #685 2022-02-23 00:01:36 +08:00
FatalFurY
950a4dce13 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	Movie_Data_Capture.py
2022-02-18 00:05:24 +08:00
FatalFurY
c1568cd64a PEP8 PREFIX, AND SOME TYPING ANNOTATION 2022-02-18 00:01:21 +08:00
lededev
592005be01 No longer use string length as a limiting condition 2022-02-17 17:19:47 +08:00
lededev
ec28814449 update User-Agent 2022-02-17 17:18:41 +08:00
Name
e24535418c Merge branch 'yoshiko2:master' into master 2022-02-16 23:38:56 +08:00
FatalFurY
123a2a0c73 typo: transalte → translate, and some blank lines 2022-02-16 23:37:31 +08:00
Yoshiko2
aa0f72edd8 Update 6.0.2 2022-02-11 03:23:32 +08:00
Yoshiko2
602d89cfe3 Merge pull request #692 from hejianjun/feature/国产化代码
Fix cropping of images with an aspect ratio greater than 3/2
2022-02-11 03:19:26 +08:00
Yoshiko2
bfb0d6277e Merge pull request #688 from lededev/4k
filter 4K
2022-02-11 03:19:17 +08:00
hejianjun
3814695d88 Fix cropping of images with an aspect ratio greater than 3/2 2022-02-10 19:45:48 +08:00
lededev
8bdc4fac46 filter 4K 2022-02-06 21:28:49 +08:00
Yoshiko2
ece2ee451c Merge pull request #683 from hejianjun/feature/国产化代码
Support 91制片室 and Madou; improve image cropping by adding a face recognition module
2022-02-05 03:08:46 +08:00
hejianjun
ea3e1870e7 Align image to the right 2022-02-04 03:49:27 +08:00
Yoshiko2
ca527a81a4 Merge pull request #682 from godvmxi/master
Declare the foloder_path var in global scope to avoid an unassigned-variable error
2022-01-31 01:45:43 +08:00
hejianjun
e1b5d17b05 Modularize cropping 2022-01-30 22:26:52 +08:00
hejianjun
a84452ba1c Support 91制片室 and Madou; improve image cropping by adding a face recognition module 2022-01-30 03:37:08 +08:00
Deng Zhou
61de7863ed Disable javdb
fix bug
2022-01-29 21:36:16 +08:00
godvmxi
e226aa255e Declare the foloder_path var in global scope to avoid an unassigned-variable error on linux 2022-01-29 09:42:28 +00:00
Deng Zhou
2ade44cd32 fix fanza stills 2022-01-22 23:47:09 +08:00
Yoshiko2
9a9d36672f Merge pull request #670 from 553531284/master
fix
2022-01-06 23:33:37 +08:00
Deng Zhou
1195949854 fix 2022-01-06 22:01:01 +08:00
Deng Zhou
2483a9fc2f fix 2022-01-06 01:28:53 +08:00
Yoshiko2
2b7b61bd6d Update README.md 2021-12-28 23:30:14 +08:00
Yoshiko2
9acc691ca9 Create LICENSE 2021-12-27 21:51:13 +08:00
Yoshiko2
21bd3c60b7 Fix javdb Trailer url null 2021-12-19 17:45:26 +08:00
Yoshiko2
138808fb67 Update README.md 2021-12-19 17:14:57 +08:00
yoshiko2
4704ed98dd Update error output in the welcome message 2021-12-18 21:25:03 +08:00
yoshiko2
cd06a603d3 Merge remote-tracking branch 'origin/master' 2021-12-18 00:12:59 +08:00
yoshiko2
495ecdae15 Update to 6.0.1 #3 2021-12-18 00:12:36 +08:00
Yoshiko2
c7b6f286a5 Update README.md 2021-12-17 23:41:07 +08:00
yoshiko2
17d304920e Merge remote-tracking branch 'origin/master' 2021-12-17 23:39:29 +08:00
yoshiko2
0b0d0fcafc Update to 6.0.1 #2 2021-12-17 23:39:02 +08:00
Yoshiko2
27f8f53350 Update README.md 2021-12-17 23:32:22 +08:00
yoshiko2
d44166d9ac Update workflows 2021-12-17 23:29:47 +08:00
yoshiko2
26313bc550 Update to 6.0.1 2021-12-17 23:28:06 +08:00
yoshiko2
80e3c8d9a7 Add error output message for reading `config.ini` 2021-12-10 13:24:53 +08:00
yoshiko2
8000bb701b Update config.py 2021-12-07 13:21:07 +08:00
yoshiko2
cc5accc51e Update javdb.py 2021-12-03 01:09:56 +08:00
yoshiko2
13438bf854 Add mapping table file validity period 2021-12-02 14:48:07 +08:00
yoshiko2
629f0f050c Add mapping table file validity period 2021-12-02 14:41:47 +08:00
yoshiko2
6655d41491 Optimize error output & adjust code sequence #2 2021-12-02 14:01:35 +08:00
yoshiko2
1604f0567f Optimize error output & adjust code sequence 2021-12-02 02:00:54 +08:00
yoshiko2
5525aacae3 Add javdb sites in config 2021-12-01 23:03:21 +08:00
Yoshiko2
38e0e772b2 Update to 5.0.6 2021-12-01 22:09:11 +08:00
Yoshiko2
80621b95e2 Update mapping_info.xml 2021-12-01 22:08:23 +08:00
Yoshiko2
3df90d902e Update README.md 2021-12-01 22:02:13 +08:00
Yoshiko2
bc3f1d5bc7 Tweak for vscode
Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture
2021-11-30 16:08:32 +08:00
yjlmiss
2f169bcf70 Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-11-27 21:19:14 +08:00
yoshiko2
ecbb85b481 Update mapping info 2021-11-27 03:08:15 +08:00
yoshiko2
17e45276cf Fix mapping table convert func 2021-11-26 01:09:12 +08:00
Yoshiko2
8d95adc409 Merge pull request #649 from Suwmlee/master
Fix: raise exception in get html
2021-11-25 12:41:09 +08:00
Mathhew
0fcb11be35 Fix: raise exception in get html
catching exceptions won't work in methods, e.g. javbus
2021-11-25 09:26:15 +08:00
yoshiko2
433eafb6c8 Update welcome message 2021-11-24 20:47:16 +08:00
yoshiko2
1acf8b5096 Fix web request encoding #2 2021-11-23 22:13:35 +08:00
unknown
fc933615f0 Fix web request encoding 2021-11-23 00:20:15 +08:00
unknown
a8f39201b9 Add 破解 (hacked) watermark & bigger watermark image 2021-11-22 21:59:28 +08:00
unknown
c552ab850b Update watermark image 2021-11-22 21:58:15 +08:00
Yoshiko2
df8f4ae7f0 Update to 5.0.5 2021-11-21 23:40:06 +08:00
unknown
7d0bdb810d Fix: change mapping table null values to the 1st xpath value 2021-11-21 00:59:38 +08:00
Yoshiko2
e2600f75cd Remove @YOU in mapping table file 2021-11-20 23:04:17 +08:00
unknown
df2fc10c09 Merge branch 'master' of https://github.com/yoshiko2/av_data_capture 2021-11-20 04:47:12 +08:00
unknown
56e6031f60 Add platform message 2021-11-20 04:46:54 +08:00
Yoshiko2
b244122892 Del actor, director, label from the default cc_convert var 2021-11-20 04:04:19 +08:00
Yoshiko2
713ce3294d Update to 5.0.4 2021-11-20 04:02:14 +08:00
Yoshiko2
015367acee Update FreeBSD.sh 2021-11-17 01:25:00 +08:00
Yoshiko2
bc92afa9b0 Update FUNDING.yml 2021-11-14 20:17:15 +08:00
Yoshiko2
a356b2531e Merge pull request #636 from lededev/double-exception
bugfix
2021-11-14 20:16:57 +08:00
Yoshiko2
5cf120587d Update FUNDING.yml 2021-11-14 20:01:43 +08:00
lededev
8e9ea6d852 simp2 2021-11-14 09:24:49 +08:00
lededev
b64dc83e2e simp 2021-11-14 08:58:27 +08:00
lededev
701cc954cb bugfix 2021-11-14 08:52:45 +08:00
Yoshiko2
4d7aad19d0 Update FUNDING.yml 2021-11-14 00:22:07 +08:00
Yoshiko2
df5d1fbb15 Update FUNDING.yml 2021-11-14 00:21:48 +08:00
Yoshiko2
e79fad6661 Update README.md 2021-11-13 22:35:04 +08:00
unknown
33c699b8b6 Change OpenCC version 2021-11-11 09:12:42 +08:00
unknown
96046ff3e1 Merge branch 'master' of https://github.com/yoshiko2/av_data_capture 2021-11-10 19:56:14 +08:00
unknown
1a301d3c24 Change OpenCC version to 1.1.0 2021-11-10 19:54:17 +08:00
Yoshiko2
aaa21c6cb8 Update main.yml 2021-11-10 19:47:37 +08:00
Yoshiko2
0a78cb7d06 Update main.yml 2021-11-10 19:40:25 +08:00
unknown
b41aff2600 Merge branch 'master' of https://github.com/yoshiko2/av_data_capture 2021-11-10 18:15:45 +08:00
unknown
6841853ee1 Update to 5.0.3 2021-11-10 18:15:29 +08:00
Yoshiko2
e8132c8d43 Merge pull request #633 from lededev/storyline-4
Fix the erroneous debug message when the selected site's word count is 0
2021-11-10 13:41:19 +08:00
unknown
b403f147ea Merge branch 'master' of https://github.com/yoshiko2/av_data_capture 2021-11-10 13:40:11 +08:00
unknown
cb66431143 Add .chs.* .cht.* sub name support 2021-11-10 13:40:01 +08:00
lededev
e025d9d99a Fix the erroneous debug message when the selected site's word count is 0 2021-11-10 01:22:48 +08:00
yoshiko2
699f3932e1 Delete studio replacement 2021-11-09 18:43:59 +08:00
yoshiko2
1bf204ac82 Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-11-09 13:39:29 +08:00
yoshiko2
963320a14e Update translate func for title 2021-11-09 13:35:58 +08:00
Yoshiko2
78bc3f7601 Add files via upload 2021-11-09 13:24:17 +08:00
yoshiko2
0aa8bc3a90 Update nfo output func 2021-11-09 07:56:59 +08:00
yoshiko2
988b22dcb7 Update debug mode 2021-11-08 22:14:12 +08:00
yoshiko2
74390a2579 Update Mapping Table 2021-11-08 21:47:20 +08:00
yoshiko2
4a58a3fda1 Add translate to japanese 2021-11-07 16:28:37 +08:00
Yoshiko2
e825ad99a3 Merge pull request #630 from lededev/storyline-3
storyline: still need improve
2021-11-06 23:20:58 +08:00
Yoshiko2
59b554ee80 Merge pull request #631 from lededev/parallel-mapdown
parallel map download
2021-11-06 23:18:55 +08:00
Yoshiko2
06297949da Merge branch 'master' into parallel-mapdown 2021-11-06 23:17:52 +08:00
Yoshiko2
b2177ce256 Merge pull request #632 from Suwmlee/master
fix: parse title with _c _leak tags
2021-11-06 23:14:50 +08:00
unknown
af8d8e881e Add Mapping Table translate 2021-11-06 22:49:58 +08:00
unknown
41f4743149 Delete all translate func in all WebCrawlers 2021-11-06 22:49:19 +08:00
unknown
0c4df0130b Delete large translate 2021-11-06 22:45:44 +08:00
Mathhew
ccb4e764ae fix: parse title with _c _leak tags 2021-11-05 12:15:46 +08:00
lededev
722c96b29a fix conflicts 2021-11-05 08:12:57 +08:00
yoshiko2
7863302db7 Change Mapping Table Path 2021-11-05 07:45:46 +08:00
lededev
a9c4b0e417 parallel map download 2021-11-05 07:42:58 +08:00
lededev
e8af0e5a53 Merge branch 'master' into storyline-3 2021-11-05 03:29:56 +08:00
lededev
744178d288 storyline:Japanese results should not be the highest priority 2021-11-05 03:19:26 +08:00
lededev
92effe53b7 storyline: still need improve 2021-11-05 02:59:48 +08:00
yoshiko2
50c3975b75 Add Download Mapping Table function 2021-11-05 00:49:19 +08:00
Yoshiko2
a5e8687639 Create mapping_info.xml 2021-11-04 23:48:10 +08:00
Yoshiko2
0988039a34 Create mapping_actor.xml 2021-11-04 23:47:46 +08:00
Yoshiko2
8f362470b9 Merge pull request #629 from lededev/enumerate-1
storyline:add data source airavwiki
2021-11-04 22:18:35 +08:00
lededev
0622b8bda3 map function call local function 2021-11-03 20:14:06 +08:00
lededev
1863ccb1e2 fix when cookies path is None 2021-11-03 18:47:33 +08:00
lededev
fffa78a2c4 code refactoring: replace some enumerate() with zip() 2021-11-02 07:18:07 +08:00
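The refactor above swaps index bookkeeping for direct pairing. A minimal sketch (the lists are placeholders):

```python
# Before: enumerate() used only to index a parallel list
urls = ["https://example.com/a.jpg", "https://example.com/b.jpg"]
paths = ["a.jpg", "b.jpg"]

for i, url in enumerate(urls):
    print(url, "->", paths[i])

# After: zip() pairs the sequences directly, no index bookkeeping
for url, path in zip(urls, paths):
    print(url, "->", path)
```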
lededev
e564629f16 deal with websites behind Clo*dfl**e 2021-11-02 03:51:31 +08:00
lededev
3786f58bb6 opencc need in pyinstaller --add-data 2021-11-01 05:07:53 +08:00
lededev
3b498d32ca replace browser by session in some places 2021-11-01 03:49:35 +08:00
lededev
0fe1b2fcac Add optional Traditional/Simplified Chinese output 2021-10-31 17:12:17 +08:00
lededev
cc0d89805a Storyline: add an enable switch; fetch the Traditional Chinese synopsis from airavwiki for its quality 2021-10-31 15:40:39 +08:00
lededev
05a0838d86 UA update to chrome v95 2021-10-31 03:21:39 +08:00
lededev
b67acd256b data from json 2021-10-31 02:51:30 +08:00
lededev
a50af88409 storyline:add data source airavwiki 2021-10-31 00:09:05 +08:00
lededev
935d12f4dc refactor with enumerate 2021-10-30 13:03:36 +08:00
Yoshiko2
0548e31875 Merge pull request #626 from lededev/storyline-2
remove semicolon
2021-10-29 00:37:24 +08:00
lededev
b96b3c6481 remove semicolon 2021-10-24 18:43:04 +08:00
Yoshiko2
c59967e0bd Remove PRC National Day celebrating 2021-10-24 17:52:56 +08:00
Yoshiko2
7b2dda9fa6 Update README.md 2021-10-24 17:39:57 +08:00
Yoshiko2
f5d47d9170 Merge pull request #624 from lededev/mt-ex-down
extrafanart download speed 6x up by thread pool
2021-10-24 03:48:58 +08:00
Yoshiko2
b22f95ed64 Merge pull request #625 from lededev/uncensored-1
number_parser.py: uncensored number add heyzo
2021-10-24 03:48:37 +08:00
Yoshiko2
a6e7a7ecc4 Merge pull request #623 from lededev/browser-t-r
ADC_function.py: add browser retry and timeout
2021-10-24 03:43:29 +08:00
Yoshiko2
dc00531239 Merge pull request #622 from lededev/title-j
original_title (original title) field
2021-10-24 03:40:29 +08:00
Yoshiko2
4517efc502 Merge pull request #621 from lededev/storyline-1
Keep the three calling interfaces consistent
2021-10-24 03:40:18 +08:00
lededev
2f523ea540 Stills download: show each error individually; if all succeed, show only a one-line summary, greatly trimming the log 2021-10-24 01:31:24 +08:00
lededev
1cf1782a22 bug fix using pathlib mkdir -p 2021-10-24 01:04:23 +08:00
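The fix above leans on pathlib's equivalent of `mkdir -p`. A minimal sketch, with a hypothetical path:

```python
from pathlib import Path

# parents=True creates intermediate directories and exist_ok=True
# suppresses the error when the directory already exists --
# the pathlib equivalent of `mkdir -p`.
Path("extrafanart/sub/dir").mkdir(parents=True, exist_ok=True)
```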
lededev
7bc667e3b9 avsox.py: call translateTag_to_sc() 2021-10-24 00:42:57 +08:00
lededev
90593b6d7a number_parser.py: uncensored number add heyzo 2021-10-24 00:08:32 +08:00
lededev
e8eb3ff192 change download threads number configable default parallel_download=5 2021-10-23 21:05:49 +08:00
lededev
9c5db258ec clean up unused param 2021-10-23 20:42:27 +08:00
lededev
0758104e7a output error message also 2021-10-23 19:42:36 +08:00
lededev
db1ad1d582 make download_one_file() more general 2021-10-23 19:28:20 +08:00
lededev
03954c3a35 map return result keeping the same order with args, no need sorted() 2021-10-23 19:19:38 +08:00
lededev
4e96d57204 extrafanart download speed 6x up by thread pool 2021-10-23 18:14:25 +08:00
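A sketch of the thread-pool download the commits above describe, reusing the download_one_file() name from the related commit; the URLs, file names, and the pool size of 5 (later made configurable as parallel_download) are assumptions. Executor.map() yields results in the same order as its arguments, which is why sorted() is unnecessary.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def download_one_file(args):
    url, dest = args
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    with open(dest, "wb") as f:
        f.write(resp.content)
    return dest

# Hypothetical job list: (url, destination) pairs for one movie.
jobs = [(f"https://example.com/extrafanart/{i}.jpg", f"extrafanart-{i}.jpg")
        for i in range(1, 7)]

# map() yields results in the same order as the jobs, so no sorted() needed.
with ThreadPoolExecutor(max_workers=5) as pool:
    for saved in pool.map(download_one_file, jobs):
        print("saved", saved)
```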
lededev
dbbfa72268 vscode debug program/cwd/args demo settings 2021-10-23 06:15:24 +08:00
lededev
eaa9d51d00 clean up 2021-10-23 05:51:43 +08:00
lededev
921f926056 ADC_function.py: add browser retry and timeout 2021-10-23 05:21:08 +08:00
lededev
24ac95f365 storyline.py: fix debug output 2021-10-23 05:04:15 +08:00
lededev
414151b139 add original_title for config.ini [Name_Rule]location_rule and naming_rule 2021-10-22 18:44:14 +08:00
lededev
d3eef993de storyline.py: prefix storyline site names with an index; lower numbers take priority 2021-10-22 17:19:33 +08:00
lededev
99ae5bf996 Keep the three calling interfaces consistent 2021-10-22 16:50:36 +08:00
Yoshiko2
1aa60ccfbd Update config.ini 2021-10-22 01:22:50 +08:00
Yoshiko2
4a09981dd8 Update to 5.0.2 2021-10-22 01:19:18 +08:00
Yoshiko2
7b4d246237 Merge pull request #607 from lededev/log-3
Continue polishing last month's new features
2021-10-22 00:30:38 +08:00
lededev
850679705e Storyline: add uncensored metadata sites; split the config site lists into general, censored, and uncensored 2021-10-21 20:02:07 +08:00
lededev
1f9bf6b4c2 Log merging: merge logs older than three days into one file per day, to curb the flood of small files from incremental runs 2021-10-21 19:57:09 +08:00
lededev
c440315488 Detect the language before translating: skip text already in Chinese, translate only Japanese 2021-10-20 23:07:37 +08:00
lededev
cb83e4246d Move uncensored detection into number_parser.py and broaden its recognition 2021-10-20 03:34:44 +08:00
lededev
b025c51852 xcity.py: try to fetch the Chinese storyline, falling back to the original; fix the wrong tag count and the missing runtime 2021-10-19 18:40:57 +08:00
lededev
c3e9ab7957 avsox.py: optimization: finish streamlining 2021-10-19 17:08:00 +08:00
lededev
8559eea296 avsox.py: add storyline to the metadata; optimization: cut down on expensive etree.fromstring calls 2021-10-19 15:18:39 +08:00
lededev
daf7f5e0a0 carib.py: try to fetch the Chinese storyline 2021-10-19 15:14:15 +08:00
lededev
aae4df73fa javbus.py: clean up stale code 2021-10-19 01:00:50 +08:00
lededev
249884a27e javbus.py: optimize for speed 2021-10-19 00:58:28 +08:00
lededev
5da134986a storyline.py: bug fix 2021-10-19 00:17:45 +08:00
lededev
d80b2eeb7d javbus.py: optimize; repair the director, series, and similar fields for uncensored titles 2021-10-19 00:14:26 +08:00
lededev
dd106453f7 Clean up tags marked for deletion 2021-10-19 00:03:51 +08:00
lededev
4428971135 javdb.py: optimize; repair getActorPhoto() 2021-10-18 19:52:42 +08:00
lededev
5ef16e3a6d Storyline: add a run_mode setting (0: sequential, 1: thread pool, 2: process pool) 2021-10-18 18:09:36 +08:00
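A hedged sketch of the run_mode dispatch described above; the worker and site names are placeholders:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def query_site(site):
    # placeholder for one site's storyline query
    return f"{site}: ..."

def get_storyline(sites, run_mode=1):
    if run_mode == 0:              # 0: sequential
        return [query_site(s) for s in sites]
    pool_cls = ThreadPoolExecutor if run_mode == 1 else ProcessPoolExecutor
    with pool_cls() as pool:       # 1: thread pool, 2: process pool
        return list(pool.map(query_site, sites))

if __name__ == "__main__":         # required for the process-pool case
    print(get_storyline(["airavwiki", "xcity"], run_mode=1))
```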
lededev
f553927913 Speed-up: temporarily disable the unimplemented actor-photo feature in javdb and javbus 2021-10-18 17:58:21 +08:00
lededev
24b4f9f5e2 Log each metadata item's source site for later evaluation 2021-10-18 10:51:32 +08:00
lededev
c9b96f65ab one line file copy 2021-10-18 08:47:11 +08:00
lededev
56bbfe6f24 storyline.py: skip SequenceMatcher when number match 2021-10-17 23:25:19 +08:00
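A sketch of the short-circuit above: when the number matches exactly, the comparatively expensive SequenceMatcher ratio can be skipped. Names are illustrative:

```python
from difflib import SequenceMatcher

def similarity(number, candidate):
    # Exact number match: skip the costly ratio computation entirely.
    if number.upper() == candidate.upper():
        return 1.0
    return SequenceMatcher(None, number.upper(), candidate.upper()).ratio()

print(similarity("ABP-321", "abp-321"))  # 1.0 without running SequenceMatcher
print(similarity("ABP-321", "ABP-123"))  # fuzzy ratio
```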
lededev
3420f918f5 fix ratio.txt log lost newline 2021-10-17 22:53:53 +08:00
lededev
6624ed7224 clean up 2021-10-17 22:47:49 +08:00
lededev
bc3cda953d fix 2021-10-17 22:29:57 +08:00
lededev
a546c4e83e Parallel query on storyline data 2021-10-17 21:59:08 +08:00
lededev
b006aee34d failed_list.txt keep order remove duplication 2021-10-17 21:21:12 +08:00
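A sketch of order-preserving deduplication for failed_list.txt, as the commit above describes; the read/write framing is an assumption, but dict.fromkeys() is the standard way to drop duplicates while keeping first-seen order:

```python
from pathlib import Path

failed = Path("failed_list.txt")
lines = failed.read_text(encoding="utf-8").splitlines() if failed.exists() else []

# dict.fromkeys() drops duplicates while preserving first-seen order.
unique = list(dict.fromkeys(lines))

failed.write_text("\n".join(unique) + "\n", encoding="utf-8")
```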
lededev
189f4db616 javdb:get faster benefit from http keep-alive 2021-10-15 21:16:48 +08:00
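The speedup above comes from HTTP keep-alive: a single requests.Session reuses the TCP/TLS connection across requests to the same host, unlike one-off requests.get() calls. A minimal sketch with a placeholder host:

```python
import requests

session = requests.Session()  # connection pooling + HTTP keep-alive

for path in ("/v/abc", "/v/def"):
    # Subsequent requests to the same host reuse the open connection,
    # skipping repeated TCP/TLS handshakes.
    resp = session.get("https://example.com" + path, timeout=10)
    print(resp.status_code)
```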
lededev
7f8d500b13 Correct the calling method for a MechanicalSoup browser with cookies 2021-10-15 21:00:32 +08:00
lededev
416e8be351 merge PR#612 2021-10-15 10:07:53 +08:00
lededev
317449c568 try fix issue 616: onedrive OSError input/output 2021-10-15 09:11:40 +08:00
Yoshiko2
380b220df1 Update README.md 2021-10-13 13:22:15 +08:00
lededev
f26987ddf9 move into try block 2021-10-12 11:42:30 +08:00
lededev
c0a4ce638c call moveFailedFolder when empty number on debug branch 2021-10-12 11:29:53 +08:00
lededev
f8dc05a38b improve javbus and javdb outline source 2021-10-12 11:28:17 +08:00
lededev
678a8f9bc8 Add signal handler 2021-10-11 10:24:46 +08:00
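A minimal sketch of a signal handler like the one added above, so Ctrl-C exits cleanly instead of dumping a traceback; the exact cleanup performed is an assumption:

```python
import signal
import sys

def on_interrupt(signum, frame):
    print("\ninterrupted, exiting cleanly")
    sys.exit(130)  # conventional exit status for SIGINT

signal.signal(signal.SIGINT, on_interrupt)
# ... the main scraping loop would run here ...
```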
Yoshiko2
e72521951c Update README.md 2021-10-10 23:59:35 +08:00
lededev
e5abac9138 add download_only_missing_image config item 2021-10-10 18:02:53 +08:00
lededev
0933e87944 fix outline of javbus and javdb which caused by airav down 2021-10-10 17:41:33 +08:00
lededev
b0959d1b18 javdb: pick a random site when no cookies file is still valid 2021-10-09 20:29:17 +08:00
lededev
d010ea6d51 Clean out all conf shuttle parameters 2021-10-09 19:42:11 +08:00
lededev
3873d1aa4c update user agent 2021-10-09 19:37:40 +08:00
lededev
bd3504f3b5 javdb:only accept one login site after javdb site update 2021-10-09 19:32:00 +08:00
lededev
f601669229 javdb:change to site 31 and 32 2021-10-09 12:23:00 +08:00
lededev
890452bffd Add the config portion of the packaging script that was missed; my WinMerge filter rules had hidden it 2021-10-09 09:07:38 +08:00
lededev
288acfb264 Not a bug, but better changed anyway 2021-10-09 05:28:44 +08:00
lededev
35c4bf85ae argparse:need str as default value type 2021-10-08 16:01:31 +08:00
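The fix above reflects an argparse quirk: type conversion is applied to command-line input but not to a non-string default, so a str default keeps downstream handling uniform. A sketch with a hypothetical flag:

```python
import argparse

parser = argparse.ArgumentParser()
# argparse runs `type` on command-line input but not on a non-string
# default, so default=1 would reach the program as an int while user
# input arrives as str -- keep the default a str for uniform handling.
parser.add_argument("-m", "--mode", type=str, default="1")

args = parser.parse_args([])
print(repr(args.mode))  # '1' -- always a str
```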
yjlmiss
5d4f76a11f 1 2021-10-08 13:10:31 +08:00
lededev
8ab736e4fa AV_Data_Capture.py:command params new add -m -d -c -i -g -z 2021-10-08 13:02:52 +08:00
lededev
b87206870b core.py:enhancement 2021-10-08 12:29:46 +08:00
lededev
40d25d23f5 ADC_function.py: switch to getInstance(); load_cookies() now uses pathlib 2021-10-08 12:17:12 +08:00
lededev
a405c5c41b WebCrawler: switch everything to getInstance(); untangle the love-hate relationship among airav.py, javbus.py, and javdb.py 2021-10-08 11:46:35 +08:00
lededev
cf072e79d1 Output layout tweaks: pin the number to a fixed position on the left, with a blank line above for quick scanning 2021-10-08 11:29:47 +08:00
lededev
8cb57673b0 log auto merge 2021-10-08 11:15:30 +08:00
lededev
39ad025760 config.py:override config settings by cmd params, pyinstaller add config.ini 2021-10-08 10:22:05 +08:00
lededev
3183d284b7 number_parser.py:add more studio, unit test, full disk search as unit test 2021-10-08 08:33:03 +08:00
lededev
5df0339279 Use normpath() to keep the original case; normcase() lowercases everything 2021-10-04 23:57:16 +08:00
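A quick demo of the difference the commit above relies on, using the Windows path flavor so it runs anywhere; the path is hypothetical:

```python
import ntpath  # the Windows flavor, so the demo runs on any platform

p = r"D:\Movies\ABP-321\.\cover.jpg"
print(ntpath.normpath(p))  # D:\Movies\ABP-321\cover.jpg   (case preserved)
print(ntpath.normcase(p))  # d:\movies\abp-321\.\cover.jpg (all lowercased)
```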
lededev
952e2c9a30 Handle every makedirs() failure the same way 2021-10-03 10:59:25 +08:00
lededev
6d1e99d8ab fix issue 603 2021-10-03 10:44:57 +08:00
lededev
8ef87c285f Fix the remaining makedirs() calls too; drop the doubly wrong advice to elevate to admin 2021-10-03 10:43:54 +08:00
lededev
f52db0011c optimize if logic 2021-10-03 10:21:47 +08:00
lededev
f21fdcb5f5 log dir adapts to makedirs(), fix CmdLine output 2021-10-03 09:53:09 +08:00
yoshiko2
f0c37ccf4c PRC National Day 2021-10-01 15:39:23 +08:00
yoshiko2
a4aa9ec762 Repair a bug 2021-10-01 13:30:34 +08:00
yoshiko2
0453132656 Repair escape folder does not take effect 2021-10-01 03:05:00 +08:00
Yoshiko2
1d0a55b260 Merge pull request #602 from lededev/javdbjson-fix
javdb.py:javdbx.json bugfix, find path before check days
2021-09-30 23:20:37 +08:00
lededev
ccf187245c soft_link mode .nfo check loop fix 2021-09-30 08:31:01 +08:00
lededev
0aa4c7d76c javdb.py:javdbx.json bugfix, find path before check days 2021-09-30 06:29:29 +08:00
Yoshiko2
08470a4cba Merge pull request #601 from lededev/codeclean-1
A few remaining small bugs, plus code cleanup
2021-09-29 19:06:05 +08:00
lededev
531840c3fb A few remaining small bugs, plus code cleanup 2021-09-29 06:37:45 +08:00
Yoshiko2
2c22d70078 Merge pull request #600 from lededev/log-2
Last small changes before National Day
2021-09-29 00:35:44 +08:00
lededev
3e1d951af8 Drop tags whose value comes back empty 2021-09-28 18:36:20 +08:00
lededev
b5b2e7f0d8 Temporarily disable the actor-photo feature, which is not yet implemented, to improve speed 2021-09-28 18:31:20 +08:00
lededev
b5d6c7fe4f Add full date-time stamps at the start and end of each log to ease merging 2021-09-28 18:30:50 +08:00
lededev
ef1c816483 Show the run timer as hours:minutes:seconds for readability, keeping three decimal places to pinpoint the second 2021-09-28 17:55:06 +08:00
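A sketch of the timer formatting described above: hours, minutes, and seconds with three decimal places:

```python
def format_elapsed(seconds: float) -> str:
    m, s = divmod(seconds, 60)
    h, m = divmod(int(m), 60)
    # three decimal places pinpoint the moment to the second and below
    return f"{h:d}:{m:02d}:{s:06.3f}"

print(format_elapsed(3723.4567))  # 1:02:03.457
```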
lededev
a20bfc08b0 small optimization of log feature 2021-09-28 17:32:31 +08:00
Yoshiko2
256c711c6d Merge pull request #599 from lededev/ggt-conf
google translate site configurable
2021-09-28 13:58:08 +08:00
Yoshiko2
8345bc1064 Merge pull request #598 from lededev/javdbjson-mpath
javdb*.json path search order
2021-09-28 13:57:59 +08:00
lededev
536ee3f6d8 google translate site configurable 2021-09-28 00:28:31 +08:00
lededev
b1d302a042 log fullpath 2021-09-28 00:26:01 +08:00
Yoshiko2
255b5e40ce Use latest version of requests 2021-09-28 00:00:47 +08:00
lededev
75b71888d9 javdb*.json path search order 2021-09-27 22:20:37 +08:00
Yoshiko2
07893de121 Update 5.0.1 2021-09-27 22:07:25 +08:00
Yoshiko2
8f99f4b939 Merge pull request #597 from lededev/fc2-m
fc2.py: update
2021-09-27 22:02:07 +08:00
Yoshiko2
23ae8f9666 Change version of requests to 2.20 in requirements 2021-09-27 22:01:44 +08:00
Yoshiko2
30bc6a59c6 Merge pull request #591 from lededev/xcity-f1
xcity.py: get detail page by form query
2021-09-27 22:00:43 +08:00
Yoshiko2
875c5dc3a1 Merge pull request #596 from lededev/m3newfeture
New features: 1. incremental organizing 2. save run and error logs to a directory
2021-09-27 21:43:23 +08:00
lededev
afc0118830 Hook print() instead of calling flush() 2021-09-27 06:39:47 +08:00
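A sketch of hooking the builtin print() so every call flushes, rather than sprinkling flush() calls, as described above:

```python
import builtins
import functools

# Hook the builtin print() once so every call flushes immediately,
# instead of adding flush() after each write site. The partial binds
# the original print before the name is reassigned.
builtins.print = functools.partial(print, flush=True)

print("visible immediately, even when stdout is piped to a log")
```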
lededev
d7aea9a09d Double exceptions sometimes write two identical records; deduplicate with a set on write-back and use the set to speed up lookups 2021-09-27 04:34:41 +08:00
lededev
9304aab940 Have the outer layer call moveFailedFolder() as well 2021-09-27 03:56:59 +08:00
lededev
3ef0db1890 Flush the output after each movie finishes, so the log is visible before the program ends 2021-09-27 03:23:33 +08:00
lededev
cc36562c17 add config path search order 2021-09-27 01:05:24 +08:00
lededev
d7197d8b8c Include the command-line arguments in the log 2021-09-26 19:49:24 +08:00
lededev
620affc987 add param -q regex query filepath 2021-09-26 19:20:20 +08:00
lededev
161f4063b9 fc2.py: update 2021-09-26 11:38:48 +08:00
lededev
948a3d20b0 Make the output a bit more detailed 2021-09-26 09:49:28 +08:00
lededev
6a4739c035 fix file close, add moving list 2021-09-26 09:07:19 +08:00
lededev
ecd5c7de1c Small cleanup; tidy the code layout 2021-09-26 07:43:57 +08:00
lededev
bf5bd5d444 Make incremental organizing apply to all modes 2021-09-26 06:35:35 +08:00
lededev
c6efec91dd Add a failed-file list to avoid re-scraping; applies to mode 3 and soft-link mode 2021-09-26 04:25:25 +08:00
lededev
e87f5b7a1f Path().parent is a better way in here 2021-09-26 02:07:33 +08:00
lededev
911f43cd5c Keep path-separator direction consistent for tidier output 2021-09-26 01:23:44 +08:00
lededev
16078b95b1 Fix an infinite-loop bug when the video filename lacks the '-流出' (leaked) suffix but the .nfo filename has it 2021-09-26 00:35:00 +08:00
lededev
aae5b0a8d3 using makedirs() 2021-09-25 23:45:55 +08:00
lededev
6834b7d215 help output format 2021-09-25 23:14:57 +08:00
lededev
184230f934 supplementary explanation in --help 2021-09-25 22:57:48 +08:00
lededev
4fbb6454c0 show default --log-dir for current user in --help 2021-09-25 22:41:12 +08:00
lededev
bdbd97149f turn on log by default, to turn off by '--log-dir=' 2021-09-25 22:26:21 +08:00
lededev
4ffc34a5cf xcity on top when number similar ABP321 2021-09-25 20:54:07 +08:00
lededev
0ca57fa8c1 add Google-free translate failed warning 2021-09-25 19:31:53 +08:00
lededev
37869f58e7 New features: 1. incremental organizing for mode 3 2. save run and error logs to a directory 2021-09-25 18:56:18 +08:00
Yoshiko2
8cfefc60ef Merge pull request #594 from lededev/schar-2
replace special characters after translation; do not write back empty strings
2021-09-25 16:43:41 +08:00
Yoshiko2
cc48c26558 Merge pull request #593 from lededev/ua-up
update user agent
2021-09-25 16:42:28 +08:00
Yoshiko2
868aeb9572 Merge pull request #590 from lededev/cfg-dump-all
config.py: unit test dump all items
2021-09-25 16:35:25 +08:00
Yoshiko2
d02563c446 Merge pull request #589 from lededev/fmovacvol
fix across volume moving
2021-09-25 16:34:47 +08:00
Yoshiko2
fe39c7f114 Merge pull request #584 from lededev/s-ch-rep2
replace special characters in outline json node
2021-09-25 16:33:17 +08:00
Yoshiko2
6bd3f9b8ed Merge pull request #583 from lededev/smdir
improve the same-directory detection method
2021-09-25 16:32:38 +08:00
lededev
43bb64d7d0 xcity.py: Strictly limit the number 2021-09-25 06:53:40 +08:00
lededev
6c990e8482 xcity.py: Mode 3 requires the file name to remain unchanged 2021-09-25 06:45:08 +08:00
lededev
c41df40e9f replace special characters after translation; do not write back empty strings 2021-09-24 01:11:25 +08:00
lededev
c7b83bfc0e update user agent 2021-09-23 22:20:25 +08:00
lededev
50574a705b carib.py: add outline/series/actor_photo 2021-09-23 15:45:00 +08:00
lededev
5e0e8b9cea WebCrawler site list in default config.ini larger than 60 2021-09-23 15:43:00 +08:00
lededev
54ed626294 remove abs_url(), just urljoin() is enough 2021-09-23 08:21:01 +08:00
lededev
c599463409 rewrite getActorPhoto() to get real photo 2021-09-23 07:58:53 +08:00
lededev
c32a4a12ac speed up by reusing stateful browser 2021-09-23 07:01:24 +08:00
lededev
367d53b09b add param allow return stateful browser for follow_link() 2021-09-23 05:10:43 +08:00
lededev
d6677b717d check add cookies 2021-09-23 03:56:55 +08:00
lededev
446eb166a3 requests-2.26.0 need by MechanicalSoup-1.1.0 2021-09-22 06:13:20 +08:00
lededev
b59b4938d6 xcity.py: get detail page by form query 2021-09-22 06:03:58 +08:00
lededev
efc94917a6 shorter str formatting 2021-09-22 00:47:15 +08:00
lededev
12159abba7 reduce 2 lines 2021-09-22 00:25:07 +08:00
lededev
21a3c9fd02 config.py: unit test dump all items 2021-09-21 23:45:25 +08:00
lededev
3bbcc52aee fix across volume moving 2021-09-21 19:44:06 +08:00
lededev
d368d061f2 replace special characters in outline json node 2021-09-19 17:53:18 +08:00
lededev
d409e5006b improve the same-directory detection method 2021-09-19 03:11:46 +08:00
Yoshiko2
ffd80ba0e4 Merge pull request #582 from lededev/priority-1
javbus on top
2021-09-18 13:43:40 +08:00
Yoshiko2
0591400bb3 Merge pull request #580 from lededev/s-ch-rep
special characters replacement in all json text nodes
2021-09-18 13:39:28 +08:00
lededev
9d1c6e9ca0 javbus on top 2021-09-17 22:46:49 +08:00
lededev
d4f6abe1be special characters replacement in all json text nodes 2021-09-15 17:29:05 +08:00
Yoshiko2
05e3bbff2f Merge pull request #578 from lededev/conf-utopt
better config unit test output
2021-09-13 23:33:59 +08:00
lededev
1b93fcfecc as local function 2021-09-12 23:51:58 +08:00
lededev
4d86003c13 better config unit test output 2021-09-12 23:43:21 +08:00
Yoshiko2
1913d82b40 Update to 4.7.2 2021-09-06 00:44:39 +08:00
Yoshiko2
7716af2be0 Merge pull request #575 from lededev/py3.7
CLI version depends on python 3.7
2021-09-05 22:31:41 +08:00
lededev
c40be3e5a9 CLI version depends on python 3.7 2021-09-04 10:29:55 +08:00
Yoshiko2
1e4a10bbb0 Merge pull request #574 from lededev/ign-trailer
ignore -trailer video files
2021-09-04 04:11:41 +08:00
Yoshiko2
7e15800cbe Merge pull request #568 from lededev/rmdir
rewrite rm_empty_folder, add option in config.ini
2021-09-04 02:29:38 +08:00
lededev
631dee6d9a change to global variables to avoid repeated compiling in recursion 2021-09-03 19:47:12 +08:00
lededev
757e930a50 precompile and search 2021-09-03 19:30:52 +08:00
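A sketch of the precompile pattern from the two commits above: compiling once at module level avoids rebuilding the regex on every recursive call. The pattern shown is the -trailer filter mentioned nearby, purely as an illustration:

```python
import re

# Compiled once at module load -- not rebuilt on every recursive call.
TRAILER_RE = re.compile(r"-trailer\.", re.IGNORECASE)

def is_trailer(filename: str) -> bool:
    return TRAILER_RE.search(filename) is not None

print(is_trailer("ABP-321-trailer.mp4"))  # True
print(is_trailer("ABP-321.mp4"))          # False
```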
lededev
f9433e589f mode 3 also need check this match pattern 2021-09-03 19:10:49 +08:00
lededev
a98ba439bc ignore -trailer video files 2021-09-03 18:49:52 +08:00
lededev
033b1c498e rewrite rm_empty_folder, add option in config.ini 2021-08-16 00:21:18 +08:00
Yoshiko2
83b704017c Merge pull request #566 from lededev/m4v
Add media format suffixes .mpg .m4v support
2021-08-15 03:42:26 +08:00
Yoshiko2
6b5331b8a7 Merge pull request #565 from lededev/mode3
Make mode 3 run through
2021-08-15 03:41:35 +08:00
lededev
50c4be28c2 Add media format suffixes .mpg .m4v support 2021-08-13 04:45:06 +08:00
lededev
f8cf813d83 Make mode 3 run through 2021-08-13 03:12:07 +08:00
Yoshiko2
a1ca4b9cad Merge pull request #560 from Suwmlee/master
Check sources
2021-08-10 02:48:27 +08:00
Mathhew
1204076366 Check sources 2021-08-09 11:09:42 +08:00
yoshiko2
e572a55907 Update to 4.7.1 2021-08-03 03:30:16 +08:00
yoshiko2
82fc31c4e0 Revert some code styles to adapt to Python 3.7 2021-08-03 03:15:07 +08:00
yoshiko2
91c7b016a2 Adjust order for sources 2021-08-03 02:52:38 +08:00
Yoshiko2
15e8166610 Merge pull request #555 from Suwmlee/master
Move get_data_from_json to WebCrawler
2021-08-03 02:31:27 +08:00
Mathhew
08df7383a5 Make webcrawler clear 2021-07-29 10:28:25 +08:00
Mathhew
2c41487a4e Move get_data_from_json to WebCrawler 2021-07-28 16:12:08 +08:00
yoshiko2
ae2c2bcf23 Add warning text 2021-07-26 16:13:53 +08:00
Yoshiko2
8a614c45d9 Merge pull request #550 from itojito/master
add fc2club support and fix fc2 exception
2021-07-24 14:54:05 +08:00
tojito
1cb54fb956 fix fc2 exception of getTrail() 2021-07-20 23:54:36 +08:00
tojito
9ed239105a add fc2club 2021-07-19 01:04:50 +08:00
Yoshiko2
99e9cfc5bf Merge pull request #549 from loveritsu929/master
Cut mgstage fanart
2021-07-18 21:22:03 +08:00
loveritsu929@home
9f337effbf cut mgstage poster 2021-07-17 01:07:40 +08:00
Yoshiko2
1809f5a32d Merge pull request #548 from lededev/x-art
number_parser:add x-art support
2021-07-16 17:48:13 +08:00
Yoshiko2
e83be4736a Merge pull request #547 from lededev/cookies-over18
javdb:always include over18 cookies
2021-07-16 17:47:34 +08:00
lededev
27e47334ea minor correction 2021-07-16 13:54:24 +08:00
Yoshiko2
4b6da439e7 Merge pull request #546 from lededev/ugua
all http requests using the same user-agent
2021-07-16 13:37:36 +08:00
Yoshiko2
ed6ea7e63b Merge pull request #545 from loveritsu929/master
Download extrafanart and trailer only once for multi-part movies
2021-07-16 13:34:58 +08:00
lededev
1c759e713e number_parser:add x-art support 2021-07-16 13:23:29 +08:00
lededev
f51d91b227 try another domain name 2021-07-15 12:34:21 +08:00
lededev
d771b4e985 javdb:always include over18 cookies 2021-07-15 12:18:55 +08:00
lededev
7048a3254c all http requests using the same user-agent 2021-07-15 10:39:41 +08:00
loveritsu929@home
aabaf491b9 trailer is named with '-cd1' tag for multi-part movies
if that matters for some media management software...
2021-07-15 00:57:03 +08:00
loveritsu929@home
6e855cca01 download extrafanart and trailer only once 2021-07-15 00:48:57 +08:00
Yoshiko2
8c5d85298e Merge pull request #538 from lededev/py3.8-ae
simplified using assignment expressions, new feature in python 3.8
2021-07-14 16:02:14 +08:00
Yoshiko2
71d0adecf0 Update github actions to Python 3.8 2021-07-14 15:56:03 +08:00
Yoshiko2
7880f81f40 Merge pull request #543 from loveritsu929/master
Fixing extrafanart downloading issue
2021-07-14 15:54:08 +08:00
loveritsu929@home
cf924d5319 fixing extrafanart downloading issue
remove leading slash in filename
2021-07-13 15:00:01 +08:00
yoshiko2
03630738d9 Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-07-10 15:02:07 +08:00
yoshiko2
5a80f13bec Fix crash caused by custom site 2021-07-10 15:01:56 +08:00
lededev
34b0d37748 simplified using assignment expressions, new feature in python 3.8 2021-07-10 14:51:25 +08:00
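A minimal before/after sketch of the assignment expression (walrus operator) adopted above:

```python
import re

line = "ABP-321.mp4"

# Before 3.8: assign first, then test.
m = re.search(r"\d+", line)
if m:
    print(m.group())

# With an assignment expression the two steps collapse into one.
if (m := re.search(r"\d+", line)):
    print(m.group())
```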
Yoshiko2
423ede4584 Add files via upload 2021-07-09 00:18:55 +08:00
Yoshiko2
0a71ec1606 Add files via upload 2021-07-09 00:17:38 +08:00
Yoshiko2
877d890ca8 Update Linux.sh 2021-07-08 22:56:16 +08:00
Yoshiko2
e9a7f6a099 Update FUNDING.yml 2021-07-08 19:59:27 +08:00
Yoshiko2
74a1e63bd6 Update FUNDING.yml 2021-07-08 19:58:41 +08:00
Yoshiko2
935362fe60 Update FUNDING.yml 2021-07-08 19:58:19 +08:00
Yoshiko2
9d5c2b5480 Update FUNDING.yml 2021-07-08 19:58:00 +08:00
Yoshiko2
c67cfea593 Update FUNDING.yml 2021-07-08 19:57:23 +08:00
Yoshiko2
da5946d981 Update FUNDING.yml 2021-07-08 19:56:23 +08:00
Yoshiko2
14b271eb08 Update FUNDING.yml 2021-07-08 19:46:36 +08:00
Yoshiko2
fe5fe4544c Update FUNDING.yml 2021-07-08 19:45:57 +08:00
Yoshiko2
5738cfbd89 Create FUNDING.yml 2021-07-08 19:44:53 +08:00
yoshiko2
80b973ef59 Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-07-06 16:48:11 +08:00
yoshiko2
458e951d42 Update source javdb #2 2021-07-06 16:47:41 +08:00
Yoshiko2
bcc0e5aa7a Update README.md 2021-07-06 03:57:50 +08:00
Yoshiko2
88c378bf47 Rename readme_tc to readme_tc.md 2021-07-06 03:57:24 +08:00
Yoshiko2
c1aceafe0c Create readme_tc 2021-07-06 03:56:00 +08:00
yoshiko2
99832367a2 Update source javdb 2021-07-05 21:51:56 +08:00
Yoshiko2
1a365457cd Merge pull request #534 from lededev/vscw
tuple as return values in type annotation
2021-07-05 20:37:49 +08:00
lededev
996ee5609f up 2021-07-04 17:36:32 +08:00
lededev
d14b315e53 tuple as return values in type annotation 2021-07-04 17:29:17 +08:00
Yoshiko2
5f9441411e Merge pull request #532 from lededev/oun
javdb:only accept unique number
2021-06-29 12:06:07 +08:00
lededev
5bd044dc61 javdb:only accept unique number 2021-06-29 02:13:46 +08:00
Yoshiko2
b85977e16b Merge pull request #529 from lededev/joindir
use os.path.join() to avoid // path complaints
2021-06-27 21:22:57 +08:00
yoshiko2
9934267077 Update 4.6.8 2021-06-27 21:16:14 +08:00
lededev
15aa0a5198 use path.join() to avoid // path complaints 2021-06-26 18:21:01 +08:00
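A quick demo of the fix above: os.path.join() avoids the doubled separator that naive concatenation produces; the paths are placeholders:

```python
import os.path

base = "output/"
name = "ABP-321"

print(base + "/" + name)         # output//ABP-321 -- doubled separator
print(os.path.join(base, name))  # output/ABP-321  -- joined cleanly
```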
Yoshiko2
ff72e2713b Merge pull request #528 from lededev/mvfailed
core.py:moveFailedFolder() just the same action as in AV_Data_Capture.py
2021-06-26 16:46:11 +08:00
lededev
9cd4b35ea0 call Config() one time and assign refence 2021-06-26 16:18:43 +08:00
lededev
a31673d07c core.py:remove useless shuttle parameters 2021-06-26 02:45:43 +08:00
lededev
eb1d577f00 core.py:moveFailedFolder() just the same action as in ADC_function.py 2021-06-26 01:55:26 +08:00
Yoshiko2
f41c965f40 Merge pull request #525 from sdy623/master
Revised the Japanese version of the disclaimer in README.md for more accurate wording and a more thorough translation
2021-06-20 17:28:23 +08:00
sdy623
967a5f293c ---- Documentation proofreading ----
Revised the Japanese version of the disclaimer in README.md for more accurate wording.
2021-06-19 23:48:58 +08:00
Yoshiko2
b40c0407b3 Merge pull request #524 from lededev/cut-1
cover copy path fix
2021-06-18 17:33:41 +08:00
lededev
d37b06f38b cover copy path fix 2021-06-18 16:22:48 +08:00
Yoshiko2
b13b85c057 Merge pull request #519 from lededev/localpic
improve processing speed by prioritizing local watermark images
2021-06-15 00:34:33 +08:00
lededev
fa49a14a0f simplify again 2021-06-14 13:29:58 +08:00
lededev
8020d24226 optimize code logic and tighten restrictions 2021-06-14 12:56:43 +08:00
lededev
4c1a0f80c1 improve processing speed by prioritizing local watermark images 2021-06-14 12:28:39 +08:00
Yoshiko2
a965b35fa0 Merge pull request #517 from lededev/javbus-url
convert image relative url to absolute url
2021-06-14 02:41:22 +08:00
lededev
3cce315100 convert image relative url to absolute url 2021-06-11 13:53:08 +08:00
yoshiko2
8fbe101196 Update to 4.7.7 2021-06-08 02:26:09 +08:00
Yoshiko2
4d34940fb3 Merge pull request #512 from lededev/chrome-agent
upgrade User-Agent string to current Chrome version
2021-06-06 17:55:58 +08:00
Yoshiko2
9544d338e4 Merge pull request #511 from lededev/bus-uri
javbus:fix cover uri
2021-06-06 17:55:50 +08:00
lededev
36c734ca9e upgrade User-Agent string to current Chrome version 2021-06-06 05:57:22 +08:00
lededev
9072f8b5ec use str startswith() 2021-06-06 05:14:13 +08:00
lededev
ac22fcdf05 optimize code logic 2021-06-06 05:06:42 +08:00
lededev
e2cd1f09df check uri not start with / 2021-06-06 04:57:24 +08:00
lededev
5abeb360af if not start with http 2021-06-06 04:46:59 +08:00
lededev
0989195008 javbus:fix uri 2021-06-06 04:31:12 +08:00
Yoshiko2
9538a873bc Merge pull request #509 from lededev/lnkck
video files must be regular files, links will be ignored
2021-06-06 02:25:30 +08:00
lededev
15198a62c2 mode 3 allow links because no file moving 2021-06-05 21:12:31 +08:00
lededev
9f1e6d5206 video files must be regular files, links will be ignored 2021-06-05 21:00:35 +08:00
Yoshiko2
d46cd291c0 Merge pull request #508 from xingfanxia/master
fix import of #506
2021-06-04 17:59:27 +08:00
xingfan_xia
93cb465bde fix import 2021-06-04 02:12:01 -07:00
Yoshiko2
e913925279 Merge pull request #507 from Suwmlee/master
Refine Proxy
2021-06-04 14:10:04 +08:00
Yoshiko2
f97c6f8748 Merge pull request #506 from xingfanxia/master
add airav outline to javdb crawler (fetch the storyline via javdb)
2021-06-04 14:08:27 +08:00
Mathhew
7fe5b69ad2 Refine Proxy 2021-06-04 13:43:55 +08:00
xingfan_xia
b88b2ead7e add airav outline to javdb crawler 2021-06-03 12:29:19 -07:00
Yoshiko2
863dd3bb81 Update requirements.txt 2021-06-04 02:02:10 +08:00
yoshiko2
88cef56870 Fix source avsox 2021-06-04 01:27:02 +08:00
yoshiko2
191c1f99ed Update to 4.6.6 2021-06-03 02:20:20 +08:00
yoshiko2
488f1efcca Update requirements.txt 2021-06-02 16:28:38 +08:00
Yoshiko2
57d19c1709 Update requirements.txt 2021-06-02 02:07:34 +08:00
Yoshiko2
99d4ff999f Update README.md 2021-06-01 17:21:39 +08:00
Yoshiko2
a98199f065 Update README.md 2021-06-01 17:20:55 +08:00
Yoshiko2
98f50e76cc Merge pull request #499 from lededev/fc2-nbr
javdb:FC2 PPV number precise matching
2021-05-31 21:41:48 +08:00
lededev
a5363430b1 detail page number as number 2021-05-31 19:34:06 +08:00
lededev
cf45a8bf60 simplified statement 2021-05-31 19:24:21 +08:00
lededev
2ab40201b5 javdb:FC2 PPV number precise matching 2021-05-31 19:12:06 +08:00
Yoshiko2
a3ceaf42b2 Update README.md 2021-05-30 19:15:16 +08:00
Yoshiko2
1b1ac5e308 Update README.md 2021-05-30 18:57:20 +08:00
Yoshiko2
2546da86a0 Update README.md 2021-05-30 18:56:21 +08:00
Yoshiko2
6437f55701 Add files via upload 2021-05-30 17:37:36 +08:00
Yoshiko2
2f38efe3d4 Delete alipay.png 2021-05-30 17:37:08 +08:00
Yoshiko2
fb96921378 Add files via upload 2021-05-30 17:33:35 +08:00
Yoshiko2
c582d2f50b Update config.ini 2021-05-29 23:51:21 +08:00
yoshiko2
32fe7e3e5d Remove jav321 source 2021-05-29 23:40:39 +08:00
yoshiko2
626e51d22d Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-05-29 23:35:17 +08:00
yoshiko2
0fa1e482cd Fix javbus source 2021-05-29 23:35:07 +08:00
Yoshiko2
c4c229bbc5 Update main.yml 2021-05-29 22:03:49 +08:00
yoshiko2
36f5f80ccb Update to 4.6.5 2021-05-29 00:59:12 +08:00
yoshiko2
a3eb94ddcb Fix source output 'list out of range' 2021-05-29 00:50:47 +08:00
yoshiko2
2a7c2b5095 Fix search function for javdb source #2 2021-05-28 03:16:42 +08:00
yoshiko2
047da6baa8 Fix search function for javdb source 2021-05-28 03:16:21 +08:00
Yoshiko2
c533a78357 Merge pull request #497 from sino1641/patch-2
Ignore http proxy warning
2021-05-27 20:38:18 +08:00
Sin
815a6a73a8 Ignore http proxy warning
#425 #453
2021-05-26 02:33:04 +08:00
yoshiko2
7ef59cfcf0 Remove the repetitive code in multi-threaded parts 2021-05-24 21:13:29 +08:00
yoshiko2
dcdf68ae2e Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-05-23 01:51:30 +08:00
yoshiko2
34a563562e Fix remove empty folder 2021-05-23 01:50:38 +08:00
Yoshiko2
d99aa9baa2 Update FreeBSD.sh 2021-05-22 20:27:15 +08:00
Yoshiko2
d82e591585 Update Linux.sh 2021-05-22 16:50:56 +08:00
Yoshiko2
c57fe53073 Update config.ini 2021-05-22 16:15:14 +08:00
Yoshiko2
f55a02a851 Merge pull request #493 from lededev/cmw
cookies expiration warning
2021-05-22 16:14:41 +08:00
lededev
fa14dcc492 cookies expiration warning 2021-05-22 15:13:01 +08:00
Yoshiko2
be9aafdc5b Update main.yml 2021-05-22 02:07:21 +08:00
Yoshiko2
d83dc61dd9 Update main.yml 2021-05-22 02:03:50 +08:00
Yoshiko2
878479cfdb Update main.yml 2021-05-22 01:56:07 +08:00
Yoshiko2
6af721b79f Update main.yml 2021-05-21 22:03:28 +08:00
yoshiko2
b7b327b9c4 Update 4.6.4 #2 2021-05-21 18:45:06 +08:00
Yoshiko2
8dcf34a1ab Merge pull request #491 from lededev/javdb_fix
javdb: fix title, cookie expiration detection
2021-05-20 21:30:02 +08:00
lededev
8908bb8892 javdb: fix title, cookie expiration detection 2021-05-20 21:25:17 +08:00
yoshiko2
ab50ec6b30 Update javdb source 2021-05-20 20:41:14 +08:00
yoshiko2
210b2cf6b8 Update to 4.6.4 2021-05-20 20:33:46 +08:00
yoshiko2
9a7ca2cf32 Fix remove empty folder 2021-05-20 20:32:43 +08:00
Yoshiko2
264c63be4e Merge pull request #490 from lededev/javdb_gdr
fix javdb actors section
2021-05-20 19:31:50 +08:00
lededev
a900019099 fix javdb actors section 2021-05-19 15:41:55 +08:00
Yoshiko2
78e725f603 Merge pull request #483 from lededev/cookies-1
javdb enable user login cookies
2021-05-17 19:52:50 +08:00
Yoshiko2
97651ba6f0 Merge pull request #486 from lededev/gedo-1
add tokyo hot gedo
2021-05-16 21:25:00 +08:00
lededev
415b0334f2 add tokyo hot gedo
match 2 formats: gedo01 .. gedo59, gedo0060 .. gedoXXXX
2021-05-12 00:38:08 +08:00
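A hedged sketch of a pattern covering the two gedo formats listed above; the actual expression in number_parser.py may differ:

```python
import re

# gedo01..gedo59 (two digits) or gedo0060 onward (four or more digits)
GEDO_RE = re.compile(r"gedo(?:[0-5][0-9]|\d{4,})", re.IGNORECASE)

for name in ("gedo01", "gedo59", "gedo0060", "gedo1234", "gedo99"):
    print(name, bool(GEDO_RE.fullmatch(name)))
```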
lededev
0ffe355cad change priority for fc2 2021-05-09 13:41:59 +08:00
lededev
3042001df5 javdb enable user login cookies 2021-05-09 12:23:21 +08:00
Yoshiko2
9941543d49 Update config.ini 2021-05-09 01:00:12 +08:00
yoshiko2
d039a3c44e Update to 4.6.3 2021-05-09 00:46:31 +08:00
yoshiko2
a4a679fc2f Adjust source order for uncensored 2021-05-08 16:44:47 +08:00
Yoshiko2
f181707edd Merge pull request #482 from lededev/mu-2
recover the code overwritten by the commit order
2021-05-08 16:43:20 +08:00
lededev
d6c40a0859 recover the code overwritten by the commit order 2021-05-08 03:27:02 +08:00
yoshiko2
d7c2c49e49 Update to 4.6.3 2021-05-08 00:50:51 +08:00
Yoshiko2
3e6763f7af Merge pull request #478 from lededev/carib-enable-proxy
carib.py: use proxy config settings
2021-05-08 00:42:42 +08:00
Yoshiko2
f02fe16254 Merge pull request #479 from lededev/10mu-1
case insensitive strip, add 10musume and pacopacomama detect
2021-05-08 00:42:30 +08:00
Yoshiko2
4e40928d94 Merge pull request #480 from lededev/sft-win
fix empty number causing program stop; failed symlinks not created on Windows
2021-05-08 00:42:07 +08:00
lededev
d9b936bf62 stricter conditions 2021-05-07 09:05:52 +08:00
lededev
c8c02c4911 case insensitive strip, poco detect 2021-05-07 08:35:06 +08:00
lededev
b860788281 commit only single file 2021-05-06 22:22:02 +08:00
lededev
f3ffb5753f fix empty number causing program stop; failed symlinks not created on Windows 2021-05-06 22:09:28 +08:00
lededev
e47c17a57f add 10musume detect 2021-05-06 05:31:36 +08:00
lededev
1460e2962d carib.py: use proxy config settings 2021-05-06 02:07:53 +08:00
yoshiko2
624c828968 Fix custom source order in config file for multi-threading 2021-05-05 21:30:39 +08:00
Yoshiko2
031ada6f34 Merge pull request #475 from lededev/carib-1pon-1
fix carib and 1pond number issues
2021-05-05 21:24:26 +08:00
Yoshiko2
b9ab3f4856 Merge pull request #477 from yoshiko2/origin
Revert "Adjust source order for single threading"
2021-05-05 21:24:00 +08:00
yoshiko2
5258b52f4b Revert "Adjust source order for single threading"
This reverts commit 4778e6360e.
2021-05-05 21:09:48 +08:00
yoshiko2
4778e6360e Adjust source order for single threading 2021-05-05 16:23:20 +08:00
lededev
c816dca54b minor fix 2021-05-05 13:52:33 +08:00
lededev
7e11b26ea3 add missing vj in dlsite 2021-05-05 12:27:37 +08:00
lededev
10e35cbd92 fix carib and 1pond number issues 2021-05-05 12:00:10 +08:00
yoshiko2
07026f89f8 Update to 4.6.2 2021-05-04 21:09:00 +08:00
yoshiko2
031bab5219 Remove custom config file path #2 2021-05-04 20:10:58 +08:00
yoshiko2
7f51ba1514 Remove custom config file path 2021-05-04 20:08:55 +08:00
yoshiko2
3f5a7a0f52 Remove replace '_' to '-' 2021-05-04 19:06:31 +08:00
yoshiko2
05e2ac5574 Disable multi threading default 2021-05-04 16:34:08 +08:00
yoshiko2
3087f514b9 Adjust source order 2021-05-04 16:33:21 +08:00
yoshiko2
39b6aabd21 Fix Single File mode 2021-05-04 03:06:51 +08:00
yoshiko2
f2eebff163 Fix jav321 Exception handling #2 2021-05-04 01:32:17 +08:00
yoshiko2
a088992e74 Fix jav321 Exception handling 2021-05-04 00:29:38 +08:00
yoshiko2
b73758640e Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-05-04 00:02:20 +08:00
yoshiko2
f8a3756173 Fix FC2 fake success 2021-05-04 00:02:02 +08:00
Yoshiko2
22dda137ca Update README.md 2021-05-03 22:57:40 +08:00
Yoshiko2
8223a5e04c Update README.md 2021-05-03 22:54:38 +08:00
yoshiko2
94c5598fde Fix all source Exception handling 2021-05-03 22:27:09 +08:00
yoshiko2
c8e61936b3 Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2021-05-03 22:14:41 +08:00
yoshiko2
fceed1d04e Fix mgstage source error Exception handling 2021-05-03 22:14:22 +08:00
yoshiko2
ef87a626da Fix 'local variable 'url_json' referenced before assignment' 2021-05-03 22:03:32 +08:00
Yoshiko2
49f5615252 Merge pull request #473 from CosmosAtlas/master
Use large image as cover in mgstage
2021-05-03 21:33:31 +08:00
Wenhan Zhu (Cosmos)
3ee94a2f24 Merge remote-tracking branch 'origin/master' into HEAD 2021-05-03 20:35:38 +08:00
Wenhan Zhu (Cosmos)
efb4531034 changed mgstage to use high-res cover 2021-05-03 13:21:19 +08:00
yoshiko2
de8cb063e3 Remove javlib source 2021-05-03 02:45:48 +08:00
Yoshiko2
cef296fbf1 Merge pull request #470 from SharpX2016/master
Fix leaked-type movies whose filenames lack '-流出' and don't match the other file names
2021-04-29 22:34:30 +08:00
SharpX2016
1a9fd18f15 Fix leaked-type movies whose filenames lack '-流出' and don't match the other file names
2021-04-29 18:03:50 +08:00
Yoshiko2
8b47fdaa6c Update README.md 2021-04-29 16:51:47 +08:00
Yoshiko2
d15490d772 Add files via upload 2021-04-29 16:32:28 +08:00
yoshiko2
b5abdcee4f Update config.ini 2021-04-29 10:41:51 +08:00
yoshiko2
ed044e0be3 Update config.py for Multi Threading 2021-04-29 10:00:54 +08:00
yoshiko2
043789dd45 Fix get_html() & post_html() 2021-04-29 09:59:26 +08:00
yoshiko2
6e8ebb97bb Update Version to 4.6.1 2021-04-29 09:58:36 +08:00
yoshiko2
cd8afb3353 Add Multi Threading crawler 2021-04-29 09:58:11 +08:00
Yoshiko2
b4cb2e1f4e Merge pull request #469 from SharpX2016/master
Fix subtitle files that couldn't be moved, an extra javbus tag, fully swapped Chinese/Japanese tag translations in the #fc2 section, and other translation errors
2021-04-28 19:41:41 +08:00
SharpX2016
e63983e046 Update ADC_function.py
Fix some translation errors
2021-04-28 17:31:11 +08:00
SharpX2016
9ceb6d747a Update ADC_function.py
'無修正': '无修正'
2021-04-28 14:30:02 +08:00
SharpX2016
98cee6255a Update ADC_function.py
Fix swapped Chinese/Japanese tag translations in the #fc2 section
2021-04-28 14:23:28 +08:00
SharpX2016
c9407e2df7 Update javbus.py
Remove the stray '多選提交' (multi-select submit) entry picked up when scraping javbus genre tags
2021-04-28 13:41:12 +08:00
SharpX2016
dc551de957 Update core.py
Fix subtitle files not being moved automatically
2021-04-28 10:36:09 +08:00
Yoshiko2
8ebe2ae7d5 Merge pull request #465 from SharpX2016/master
Update core.py
2021-04-27 17:38:08 +08:00
SharpX2016
07af676bd6 Update core.py
1. When processing metadata, uncensored titles get a '无码' (uncensored) tag and genre
2. Leaked titles get a '流出' (leaked) genre
3. The '系列' (series) value is also added as a genre
4. Keep tags and genres in matching order
2021-04-27 13:29:17 +08:00
yoshiko2
ca198ff8be Fix javdb source 2021-04-25 20:26:42 +08:00
yoshiko2
746602f6d6 Update to 4.5.2 2021-04-25 20:03:56 +08:00
yoshiko2
f968eb11e6 Fix BIG BUG 2021-04-25 20:03:09 +08:00
yoshiko2
7e7412ea6f Fix update check #2 2021-04-23 15:08:38 +08:00
yoshiko2
f37ed87e62 Fix update check 2021-04-23 14:33:44 +08:00
Yoshiko2
79f17f3510 Update config.ini 2021-04-23 01:38:49 +08:00
yoshiko2
1cd1849eb9 Update version to 4.5.1 2021-04-22 03:28:29 +08:00
yoshiko2
3ffef99e31 fix small bug 2021-04-22 03:25:17 +08:00
yoshiko2
f761e5bccc javbus, javlib use outline in airav 2021-04-22 03:22:24 +08:00
yoshiko2
98c8585327 change host for func get_javlib_cookie 2021-04-22 02:50:39 +08:00
Yoshiko2
bac1b05263 Merge pull request #455 from lededev/rel-1
Create symlinks with relative paths so videos open correctly when accessed over the network
2021-04-22 02:42:19 +08:00
lededev
809067fdaf Create symlinks with relative paths so videos open correctly when accessed over the network 2021-04-19 17:19:21 +08:00
Yoshiko2
d2b5eea042 Merge pull request #448 from lededev/master
After moving a file, leave a traceable symlink at its original location pointing to the new one
2021-04-19 03:07:34 +08:00
Yoshiko2
b0d3e79508 Merge branch 'master' into master 2021-04-19 03:07:27 +08:00
Yoshiko2
abc5f6301e Merge pull request #451 from lededev/th-1
support tokyo hot filename rules
2021-04-19 03:03:19 +08:00
Yoshiko2
29c9385b0a Merge pull request #447 from mcdull/master
Add the '流出' (leaked) naming rule
2021-04-19 03:02:45 +08:00
lededev
d58115a699 support tokyo hot filename rules 2021-04-03 07:17:40 +08:00
lededev
b971b05a2d After moving a file, leave a traceable symlink at its original location pointing to the new one 2021-03-31 07:55:14 +08:00
mcdull
d4247c967a Update core.py
Filenames containing 'uncensored' are also recognized as leaked
Append -流出 when naming leaked files
2021-03-28 20:02:43 +08:00
mcdull
53487bbfe6 Merge pull request #1 from yoshiko2/master
update forks
2021-03-28 19:40:43 +08:00
Yoshiko2
68eb05c93b Update README.md 2021-03-14 14:09:20 +08:00
Yoshiko2
61a07668e9 Update version to 4.4.2 2021-03-02 10:45:54 +08:00
Yoshiko2
a1153366f2 Change website priority 2021-03-02 10:41:52 +08:00
Yoshiko2
a7e2eb210d Merge pull request #434 from mcdull/master
Get AIRAV actress names from javbus instead
2021-03-02 10:40:25 +08:00
mcdull
3d45e27f21 Update airav.py
change to use javbus to obtain actor name
2021-02-24 07:17:04 +08:00
mcdull
d831053203 Update airav.py
change actor name source to javbus
2021-02-24 05:48:32 +08:00
Yoshiko2
fa1d99be4c Merge pull request #427 from rosystain/patch-2
Update core.py
2021-02-11 18:28:13 +08:00
Chock
c4538e2b31 Update core.py
Remove the extra '+' inside <studio></studio>
2021-02-11 17:40:43 +08:00
Yoshiko2
95b07ec60a Merge pull request #426 from pikachullk/patch-2
Update jav321.py
2021-02-10 18:49:07 +08:00
Yoshiko2
238c8e693c Merge pull request #423 from Feng4/patch-15
Update core.py
2021-02-10 18:48:58 +08:00
Yoshiko2
02dbce5b51 Merge pull request #422 from Feng4/patch-14
Update jav321.py
2021-02-10 18:48:48 +08:00
pikachullk
3efd722808 Update jav321.py
Update the trailer download link
2021-02-10 11:30:36 +08:00
Feng4
6505d95829 Update core.py
Fix covers being cropped too much.
2021-02-09 09:42:33 +08:00
Feng4
20ee3e3ba4 Update jav321.py
jav321 no longer offers Simplified Chinese, so some fields could not be fetched; update the code accordingly
2021-02-09 08:40:35 +08:00
Yoshiko2
c093b73940 Merge pull request #419 from szwhy/master
Fix the Japanese terms in the README
2021-02-08 16:30:59 +08:00
szwhy
6a4900174c Update README.md 2021-02-05 02:06:24 +08:00
Yoshiko2
123646dbd4 Update README.md 2021-02-01 07:37:59 -08:00
yoshiko2
672dcde49a Welcome message output location adjustment 2 2021-01-31 19:36:05 +08:00
yoshiko2
e6fe864073 Welcome message output location adjustment 2021-01-31 19:32:59 +08:00
yoshiko2
094d170cf2 Update check optimization 2021-01-31 15:56:28 +08:00
yoshiko2
f86ed3b180 Change version to 4.4.2 2021-01-30 03:15:17 +08:00
Yoshiko2
23923b9917 Merge pull request #412 from Feng4/patch-13
Update javlib.py
2021-01-19 02:43:20 +08:00
Feng4
3e849ddc4a Update javlib.py
Fix some IDs failing to match on javlib
2021-01-16 18:33:27 +08:00
yoshiko2
a734725678 Remove baidu translate (For users privacy) 2021-01-14 16:44:01 +08:00
Yoshiko2
62b3097e4e Merge pull request #408 from Ercsion/master
Add parameter -p, specify the path
2021-01-12 20:50:23 +08:00
Yoshiko2
e93cedc3d4 Merge pull request #407 from PikachuLiker/master
Add machine translation based on the Azure and Baidu translation APIs
2021-01-12 20:45:19 +08:00
Ercsion
d086436c3d Add parameter -p, specify the path 2021-01-11 23:27:50 +08:00
PikachuLiker
e40623ce1a Add machine translation based on the Azure API 2021-01-11 15:28:12 +08:00
Yoshiko2
e0117a7a16 Merge pull request #406 from Ercsion/master
Add search function of airav
2021-01-11 01:23:26 +08:00
Ercsion
3198925284 Add proxy certificate 2021-01-10 23:38:05 +08:00
Ercsion
e22a1f917a Add log module 2021-01-10 23:19:34 +08:00
Ercsion
d1c94f04f9 Add search function of airav 2021-01-10 22:53:55 +08:00
PikachuLiker
cca6ed937e Add machine translation based on the Baidu API 2021-01-10 22:44:47 +08:00
Yoshiko2
1ed1717c0d Fix trailer input bugs in function for NFO print.
Update core.py
2021-01-10 00:54:15 +08:00
Yoshiko2
f46718596e Fix site jav321 trailer capture
Update jav321.py
2021-01-10 00:52:09 +08:00
Yoshiko2
9925a03af6 Fix trailer and extrafanart capture Exception handling
Update core.py
2021-01-10 00:50:14 +08:00
Feng4
92d9786257 Update core.py
Plex can't use the trailer feature: a trailer entry makes Plex fail to recognize the .nfo file, so add a check here
2021-01-09 22:32:33 +08:00
Feng4
5a6204b699 Update jav321.py
Fix broken video links on the site that caused download failures
2021-01-09 17:28:05 +08:00
Feng4
5386388771 Update core.py 2021-01-09 17:26:20 +08:00
Yoshiko2
d54f293ee8 Merge pull request #399 from Feng4/patch-8
Update javdb.py
2021-01-05 11:07:15 +08:00
Feng4
063de7c030 Update javdb.py 2021-01-05 10:47:09 +08:00
Yoshiko2
4a99b57244 Merge pull request #398 from Feng4/patch-7
Fix mode 3 not working on macOS and Linux
2021-01-04 21:52:27 +08:00
Feng4
2c2aafdb72 Fix mode 3 not working on macOS and Linux 2021-01-04 20:22:12 +08:00
Yoshiko2
4ea5e9d6cb Update FreeBSD.sh 2021-01-04 15:35:03 +08:00
yoshiko2
94526fb45d Change version to 4.3.2 2021-01-03 02:11:01 +08:00
yoshiko2
34a16e8585 Change media_rule to media_type 2021-01-03 02:09:01 +08:00
yoshiko2
558b901e06 Fix func paste_file_to_folder, add_to_pic 2021-01-03 02:07:12 +08:00
yoshiko2
6e89d2f8b3 Fix func movie_lists bugs, Remove all os.getcwd() 2021-01-03 02:05:35 +08:00
yoshiko2
7955777882 Repair Linux wrapper file 2021-01-02 01:45:07 +08:00
yoshiko2
ad872b3093 Update wrapper file 2021-01-02 00:39:01 +08:00
Yoshiko2
2fd7b3e3df Update main.yml 2021-01-02 00:35:26 +08:00
yoshiko2
13844b9934 Change the way to reading movie types and sub types to config file 2021-01-02 00:31:55 +08:00
yoshiko2
7135b29abb Change the way to reading movie types and sub types to config file 2021-01-02 00:30:24 +08:00
yoshiko2
7490c27e6b Disable extrafanart, trailer, failed_move default values in config.ini 2021-01-02 00:15:40 +08:00
yoshiko2
23e12e9ab2 Change func get_html 2021-01-02 00:13:45 +08:00
yoshiko2
837bd2844b Change the way of reading image to HTTPS 2021-01-02 00:13:13 +08:00
Yoshiko2
ef7a4a25f0 Update main.yml 2021-01-01 21:35:09 +08:00
yoshiko2
050970dc04 Update version to 4.3.1 2020-12-31 15:21:02 +08:00
Yoshiko2
8264eac1a9 Merge pull request #390 from hacker94/feature-389-fix
Fix: actor values in the .nfo were not stripped of surrounding spaces, causing duplicate categories in Plex
2020-12-30 19:33:42 +08:00
Yoshiko2
f2a657ba42 Merge pull request #387 from Feng4/patch-6
Add a new mode: scrape files only, leaving their original paths unchanged
2020-12-30 19:33:14 +08:00
hacker94
ce6140ad1b strip actor_list 2020-12-30 01:41:09 +08:00
Feng4
566cd7461f Merge branch 'master' into patch-6 2020-12-29 17:35:49 +08:00
Yoshiko2
3cc66021e9 Merge pull request #386 from Feng4/master
Add new features and revise earlier ones
2020-12-28 21:28:27 +08:00
Yoshiko2
dc1e3ed2c6 Merge pull request #385 from ddtyjmyjm/ben_dev
fix program crash and wrong folder created
2020-12-28 12:07:07 +08:00
Feng4
cbb2b2a0ca Update core.py 2020-12-27 15:16:37 +08:00
Feng4
ad133fb064 Update core.py 2020-12-27 14:34:16 +08:00
Feng4
21eb792568 Update core.py
[Added]
A new mode: scrape files only, leaving their original paths unchanged
2020-12-27 12:41:58 +08:00
Feng4
e2db08a02a Update core.py 2020-12-27 11:34:47 +08:00
Feng4
1c66f775ed Update core.py 2020-12-27 11:32:17 +08:00
Feng4
322e55847f Update core.py 2020-12-27 11:19:38 +08:00
Feng4
e47509c6da Delete test 2020-12-27 11:01:47 +08:00
Feng4
51f5d111db Add files via upload 2020-12-27 11:01:21 +08:00
Feng4
70002fa2a9 Create test 2020-12-27 11:00:38 +08:00
Feng4
2ac4f550ca Update core.py 2020-12-27 10:54:44 +08:00
Feng4
461f1ba2f5 Update core.py 2020-12-27 10:53:50 +08:00
Feng4
8f4c2b6241 Update ADC_function.py 2020-12-27 10:47:58 +08:00
Feng4
9f015a16e3 Update config.ini 2020-12-27 10:43:53 +08:00
Feng4
3742294ce9 Update core.py 2020-12-27 01:41:07 +08:00
Feng4
07bb677822 Update config.py 2020-12-27 01:15:40 +08:00
Feng4
b0646ab6da Update ADC_function.py 2020-12-27 00:31:26 +08:00
Feng4
fc15e119ca Update xcity.py 2020-12-27 00:29:25 +08:00
Feng4
4816affc38 Update mgstage.py 2020-12-27 00:28:47 +08:00
Feng4
4c9cef5ec7 Update javdb.py 2020-12-27 00:27:49 +08:00
Feng4
da5a590bde Update javbus.py 2020-12-27 00:26:04 +08:00
Feng4
83e8e8cb44 Update jav321.py 2020-12-26 23:57:34 +08:00
Feng4
b35c87919d Update fc2.py 2020-12-26 23:45:11 +08:00
Feng4
0b729c3d5e Update fanza.py 2020-12-26 23:43:50 +08:00
Feng4
9335b446f9 Update airav.py 2020-12-26 23:39:57 +08:00
Feng4
ef7d4fb5fa Merge pull request #1 from yoshiko2/master
Sync code
2020-12-26 23:36:43 +08:00
oxygenkun
62d1f37565 fix: wrong root path detected 2020-12-26 15:34:44 +08:00
oxygenkun
ad541a3756 fix: failed load config.ini in function moveFailedFolder 2020-12-26 15:07:30 +08:00
yoshiko2
4c31eee978 Update to 4.2.2, remove docker-entrypoint.sh and encapsulation.sh, use Python3_Crosss_Wrapper 2020-12-25 12:12:09 +08:00
yoshiko2
38739feee2 Update to 4.2.2, remove docker-entrypoint.sh and encapsulation.sh, use Python3_Crosss_Wrapper 2020-12-25 12:11:54 +08:00
Yoshiko2
b8d5aa2beb Merge pull request #380 from ddtyjmyjm/ben_dev
fix some bugs on softlink mode
2020-12-24 14:49:32 +08:00
benjamin
fdceb5d1b8 Merge remote-tracking branch 'origin/ben_dev' into ben_dev 2020-12-24 12:37:41 +08:00
benjamin
2fe2305226 Merge remote-tracking branch 'upstream/master' into ben_dev
# Conflicts:
#	core.py
2020-12-24 12:36:58 +08:00
benjamin
7e94b024ff fix: revert the api of moveFailedFolder 2020-12-24 12:32:06 +08:00
Yoshiko2
02a1036672 Merge pull request #382 from bigfoxtail/master
Fix the failed_move option not taking effect
2020-12-24 08:42:21 +08:00
bigfoxtail
a49de4cb12 Fix the failed_move option not taking effect 2020-12-23 18:47:13 +08:00
benjamin
2a00cc5a48 fix: program will crash when the update checker function is failed. 2020-12-23 14:44:35 +08:00
benjamin
7183041cbe fix: when processing failed with soft_link mode on, the file was still moved to the failed folder. 2020-12-22 00:45:41 +08:00
benjamin
8b6c40375c fix: wrong symlink created.
Using absolute path instead of relative path to reduce (potential) problems
2020-12-21 23:50:17 +08:00
benjamin
d29ad47f7b fix: fix the bug that if fail or success folder only has empty folders, it will also remove the fail or success folder. 2020-12-21 21:41:32 +08:00
benjamin
63b76d02b5 remove redundant functions and usage: the same function already exists and is used in AV_Data_Capture.py 2020-12-21 20:15:40 +08:00
benjamin
144203ad3e rename fun name 'CEF' to 'rm_empty_folder' 2020-12-21 20:10:22 +08:00
Yoshiko2
9340f57b96 change version to 4.2.1 2020-12-20 21:14:09 +08:00
Yoshiko2
1aaf05b06d Merge pull request #377 from Feng4/patch-5
Fix the release-date issue
2020-12-20 21:13:07 +08:00
Feng4
4c8665f633 Fix the release-date issue 2020-12-20 21:10:43 +08:00
Yoshiko2
a1c7d644b1 change Version to 4.1.2 2020-12-20 20:51:17 +08:00
Yoshiko2
a73df0379a Merge pull request #375 from Feng4/patch-4
Add rules for Western IDs, which are named like xxx.20.12.20
2020-12-20 20:49:10 +08:00
Feng4
23281a4a64 Update javdb.py 2020-12-20 00:49:55 +08:00
Feng4
fc4cc4c122 Add scraping detection for Western titles 2020-12-20 00:42:58 +08:00
Feng4
c94fcd47fa Update number_parser.py 2020-12-20 00:37:03 +08:00
Feng4
7af0951b82 Update number_parser.py 2020-12-20 00:34:42 +08:00
Yoshiko2
86db4b132d Create FreeBSD-amd64.sh 2020-12-19 23:31:44 +08:00
Yoshiko2
c2a9261ccb Create encapsulation.sh 2020-12-19 23:27:41 +08:00
Yoshiko2
fcd782f72d Merge pull request #373 from Feng4/patch-2
Fix the release date not being scraped
2020-12-19 21:56:55 +08:00
Yoshiko2
bc85a19eda Merge pull request #374 from Feng4/patch-3
Fix cover_small breaking normal scraping
2020-12-19 21:56:42 +08:00
Feng4
32fb9a0d7f Fix cover_small breaking normal scraping 2020-12-19 21:52:55 +08:00
Feng4
7efc96f27d Fix the release date not being scraped 2020-12-19 20:40:41 +08:00
Yoshiko2
2769f0d95b Update docker-entrypoint.sh 2020-12-14 11:11:16 +08:00
Yoshiko2
f998403e82 Update docker-entrypoint.sh 2020-12-14 10:31:33 +08:00
yoshiko2
d79f6ce009 Update to 4.1.1, add auto exit argparse function, change os._exit(0) and exit() to sys.exit() 2020-12-13 02:44:33 +08:00
yoshiko2
77b4431bde core.py access dictionary (json_data) method is changed to get() 2020-12-13 02:42:07 +08:00
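A quick sketch of the json_data change above: dict.get() returns a default instead of raising KeyError when a crawler omits a field; the keys are illustrative:

```python
json_data = {"title": "ABP-321", "actor": "..."}

# json_data["trailer"] would raise KeyError; get() degrades gracefully.
trailer = json_data.get("trailer", "")
print(repr(trailer))  # ''
```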
Yoshiko2
3de29285b8 Merge pull request #367 from Suwmlee/master
Fix file name too long in Linux
2020-12-08 16:14:48 +08:00
Yoshiko2
ab673b3272 Merge pull request #365 from newQian/master
Support files with the .iso extension
2020-12-08 16:14:38 +08:00
Yoshiko2
97517e1ddc Merge pull request #366 from Feng4/Feng4-patch-1
Feng4 patch 1
2020-12-08 16:14:05 +08:00
root
580d57cd75 Fix file name too long in Linux 2020-12-08 13:56:08 +08:00
Feng4
5edba21558 Update core.py 2020-12-06 20:05:05 +08:00
Feng4
b6ee20b88c Update core.py 2020-12-06 19:59:29 +08:00
Feng4
1c9e2fc822 Update config.py 2020-12-06 19:58:14 +08:00
Feng4
ecd8aebda6 Update config.ini 2020-12-06 19:53:53 +08:00
Feng4
70b111fe0b Update config.ini 2020-12-06 19:53:02 +08:00
Feng4
fea17ea7a2 Update config.py 2020-12-06 19:51:37 +08:00
Feng4
9bed887316 Update core.py 2020-12-06 19:50:46 +08:00
Feng4
9aa19463dd Update core.py 2020-12-06 19:48:36 +08:00
Feng4
9454a38ce6 Create airav.py 2020-12-06 19:45:19 +08:00
newQian
6f5bd41eb0 Support files with the .iso extension 2020-12-06 17:31:44 +08:00
yoshiko2
799e506a5a Update 4.0.3 2020-11-30 10:44:58 +08:00
Yoshiko2
467e4d0a86 Merge pull request #361 from Mosney/patch-2
fix proxy config not work
2020-11-30 10:35:32 +08:00
Mosney
180ad74948 fix proxy config not work 2020-11-29 18:32:55 +08:00
Yoshiko2
ac547e878f Update docker-entrypoint.sh 2020-11-16 14:58:17 +08:00
Yoshiko2
fc7e31be44 Update docker-entrypoint.sh 2020-11-16 11:09:45 +08:00
Yoshiko2
12b070e0fa Update docker-entrypoint.sh 2020-11-16 11:05:43 +08:00
yoshiko2
3e62698245 Update 4.0.2 2020-11-16 10:45:11 +08:00
yoshiko2
edfa9c2435 Update 4.0.2 2020-11-16 10:24:40 +08:00
Yoshiko2
024c703124 Merge pull request #349 from 68cdrBxM8YdoJ/fix-actions
Fix actions error
2020-11-11 16:21:22 +08:00
68cdrBxM8YdoJ
2b2632a478 Fix actions error 2020-11-11 17:19:15 +09:00
Yoshiko2
9141d17975 Update docker-entrypoint.sh 2020-11-11 14:36:52 +08:00
Yoshiko2
81638eb870 Update docker-entrypoint.sh 2020-11-10 20:17:32 +08:00
Yoshiko2
4138c692e7 Update docker-entrypoint.sh 2020-11-10 20:02:40 +08:00
Yoshiko2
895a909c84 Delete cloudscraper because macos build error 2020-11-10 19:58:28 +08:00
yoshiko2
afe17a0182 Update docker-entrypoint.sh 2020-11-10 19:55:27 +08:00
Yoshiko2
2cd849f556 Update docker-entrypoint.sh 2020-11-10 19:46:18 +08:00
Yoshiko2
56a1688cf5 Update docker-entrypoint.sh 2020-11-10 19:05:28 +08:00
Yoshiko2
f75f538343 Update ADC_function.py 2020-11-10 17:38:58 +08:00
Yoshiko2
ef49a304d6 Update main.yml 2020-11-05 20:01:27 +08:00
Yoshiko2
1bd510e531 Update docker-entrypoint.sh 2020-11-05 19:46:13 +08:00
Yoshiko2
30c86b292f Update AV_Data_Capture.py 2020-11-05 19:37:50 +08:00
Yoshiko2
e28440ed74 Merge pull request #346 from 68cdrBxM8YdoJ/fix-actions-deprecate-warning
Fix actions deprecation warning
2020-11-05 19:17:10 +08:00
68cdrBxM8YdoJ
1645583c4b Fix actions deprecation warning 2020-11-05 12:51:04 +09:00
68cdrBxM8YdoJ
07b0137e24 Printing version 2020-11-05 12:13:08 +09:00
Yoshiko2
d9bcecb116 Merge pull request #343 from PikachuLiker/master
Add an optional translation feature
2020-11-03 08:22:42 +08:00
PikachuLiker
fb453763df Add an optional translation feature 2020-11-02 22:18:17 +08:00
Yoshiko2
c1039872b9 Update docker-entrypoint.sh 2020-10-30 20:47:14 +08:00
Yoshiko2
c9e87257a7 Update Makefile 2020-10-29 14:32:17 +08:00
Yoshiko2
efcf38b0cc Add docker-entrypoint.sh for All-platform Package 2020-10-28 21:30:22 +08:00
yoshiko2
d5c0f439a0 Add docker-entrypoint.sh for All-platform Package 3 2020-10-28 20:38:47 +08:00
yoshiko2
905e46b22d Add docker-entrypoint.sh for All-platform Package 2020-10-28 20:31:47 +08:00
yoshiko2
44a37df428 Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2020-10-28 20:27:24 +08:00
yoshiko2
83e9dc235d Add docker-entrypoint.sh for All-platform Package 2020-10-28 20:26:56 +08:00
Yoshiko2
ebb236af5d Update Makefile 2020-10-27 09:07:43 +08:00
Yoshiko2
3b7a8560e3 Merge pull request #338 from VergilGao/master
Add the pyinstaller dependency before pip install
2020-10-26 17:18:16 +08:00
羽先生
87b15b6021 Add the pyinstaller dependency before pip install 2020-10-26 11:37:08 +08:00
Yoshiko2
b34b4f3630 Update requirements.txt 2020-10-26 07:37:04 +08:00
Yoshiko2
b5e68e9633 Merge pull request #335 from VergilGao/master
Update 3.9.2
2020-10-25 20:11:46 +08:00
羽先生
5576fbac7e Update 3.9.2 2020-10-25 16:47:37 +08:00
yoshiko2
ccde72f654 Update 3.9.2 2020-10-24 20:41:28 +08:00
Yoshiko2
f589f28f7e Update README.md 2020-10-18 02:03:50 +08:00
Yoshiko2
802db851b3 Update README.md 2020-10-18 02:02:39 +08:00
Yoshiko2
7de10ff9ab Update README.md 2020-10-09 01:43:11 +08:00
Yoshiko2
0474db416d Update README.md 2020-10-09 01:39:54 +08:00
Yoshiko2
f6dea46769 Update README.md 2020-10-09 01:37:26 +08:00
Yoshiko2
6f65a87fee Update README.md 2020-10-07 03:44:14 +08:00
yoshiko2
397d9afc45 Update 3.9.1-3 2020-10-06 19:40:01 +08:00
yohane
85986e60b5 Update 3.9.1-2 2020-10-06 19:37:59 +08:00
yohane
cb2089a3db Update 3.9.1 2020-10-06 19:33:45 +08:00
Yoshiko2
bfb96c880c Update README.md 2020-09-29 17:39:54 +08:00
yoshiko2
a2408716ea Fix some bugs: 2020-09-29 16:49:19 +08:00
Yoshiko2
d625201942 Update main.yml 2020-09-26 20:09:03 +08:00
Yoshiko2
d03e832b98 Update 3.9 2020-09-26 17:40:42 +08:00
root
36e11e7f83 Merge branch 'master' of https://github.com/yoshiko2/AV_Data_Capture 2020-09-26 16:58:29 +08:00
root
c319671c5d Translate to Simplified Chinese 2020-09-26 16:57:27 +08:00
Yoshiko2
417d64b1e8 Merge pull request #322 from Suwmlee/master
Refine location_rule with title #314
2020-09-26 16:52:54 +08:00
Mathhew
8873a4a7bd Update config.ini 2020-09-25 16:45:16 +08:00
Mathhew
cc2d44b728 Add max_title_len config
default value is 50
2020-09-25 11:49:15 +08:00
Mathhew
0f16ddc198 Fix folder name 2020-09-23 09:44:14 +08:00
Mathhew
3d7bcf927c Refine location_rule with title #314 2020-09-23 09:34:14 +08:00
yoshiko2
4d7bf88ba2 Delete Useless file 2020-09-18 08:24:51 +08:00
yoshiko2
069d2c8410 Delete linux_ake.sh 2020-09-18 08:21:22 +08:00
Yoshiko2
d84703bf2d Merge pull request #318 from Suwmlee/master
Clear blank
2020-09-18 08:16:27 +08:00
Yoshiko2
165a102c07 Merge pull request #316 from yobailover/master
Revise javbus.py & core.py
2020-09-18 08:16:00 +08:00
Mathhew
c3768a6306 Clear blank 2020-09-16 21:46:21 +08:00
yoshiko2
c153e37ccf Add Makefile 2020-09-15 09:12:18 +08:00
yobailover
3421d1ecbf Add a mapping table for Studio that maps common katakana to English
* Add a mapping table for Studio that maps common katakana to English
* Adjust the cover-image crop formula
2020-09-15 00:21:03 +08:00
yobailover
5d4bc3454a Switch to 🗾 Japanese metadata, with fine-tuning of the supervision and Studio scraping
2020-09-15 00:05:17 +08:00
yoshiko2
a2793e2723 Change version name to 3.8 and fix some maddening errors in GitHub Actions 2020-09-08 20:56:17 +08:00
yoshiko2
ee7f14ef20 Update 3.7.2 2020-09-08 20:44:55 +08:00
Yoshiko2
1745c6ac3e Update main.yml 2020-09-07 21:07:07 +08:00
Yoshiko2
0d9c70625c Update main.yml 2020-09-07 21:03:21 +08:00
Yoshiko2
f45f9e243a Update main.yml 2020-09-07 20:37:42 +08:00
Yoshiko2
524d52b714 Merge pull request #310 from SharerMax/javdb
[WebCrawler/javdb] refine
2020-09-07 20:30:53 +08:00
Yoshiko2
d0a3105da2 Update main.yml 2020-09-07 20:29:57 +08:00
Yoshiko2
3db36d0499 Update Test 2020-09-07 20:18:36 +08:00
Max Zhao
42e646e92c [WebCrawler/javdb] refine title value 2020-09-06 18:09:49 +08:00
Max Zhao
3d9c92aac5 [WebCrawler/javdb] remove actor when actor is 'N/A' 2020-09-06 17:36:17 +08:00
Max Zhao
b7e0845582 [WebCrawler/javdb] cut cover as poster when gray image exists 2020-09-06 16:57:58 +08:00
Yoshiko2
38ffc56a1a Merge pull request #303 from itswait/master
Script fix
2020-08-23 00:27:49 +08:00
Yoshiko2
41780dff64 Update README.md 2020-08-23 00:25:35 +08:00
Wait
e0677c9d24 Update linux_make.sh 2020-08-22 09:32:32 +08:00
itswait
f2f5df33d6 Script fix 2020-08-22 09:20:55 +08:00
Wait
7348645410 Update main.yml 2020-08-22 07:50:53 +08:00
Yoshiko2
df88ba7bbf Merge pull request #302 from itswait/patch-2
Fix the GitHub Actions version number issue
2020-08-21 16:14:00 +08:00
Wait
0ba9b25b38 Update main.yml 2020-08-21 12:39:33 +08:00
itswait
51ad58445e Some fix 2020-08-21 10:31:27 +08:00
Yoshiko2
3a019a0430 Update main.yml 2020-08-21 04:16:02 +08:00
Yoshiko2
65d32cf645 Update main.yml 2020-08-21 04:14:26 +08:00
Yoshiko2
0ea1ece26a Update main.yml 2020-08-21 03:55:17 +08:00
Yoshiko2
46aac97657 Update main.yml 2020-08-21 03:52:49 +08:00
Yoshiko2
d98266471d Update main.yml 2020-08-21 03:50:04 +08:00
Yoshiko2
760aa86d3b Update main.yml 2020-08-21 03:49:13 +08:00
Yoshiko2
d109e2e4da Update main.yml 2020-08-21 03:29:03 +08:00
Yoshiko2
febe65db3f Update main.yml 2020-08-21 03:26:30 +08:00
Yoshiko2
cd545592c8 Update main.yml 2020-08-21 03:25:18 +08:00
Yoshiko2
5b18969a5d Update main.yml 2020-08-21 03:24:38 +08:00
Yoshiko2
c5e8904ee7 Update main.yml 2020-08-21 03:21:59 +08:00
Yoshiko2
ae7900cab6 Update main.yml 2020-08-21 03:21:06 +08:00
Yoshiko2
5e17e6b251 Update main.yml 2020-08-21 03:20:10 +08:00
Yoshiko2
52bc8eca07 Update main.yml 2020-08-21 03:09:32 +08:00
Yoshiko2
e194ac641a Update main.yml 2020-08-21 03:07:27 +08:00
Yoshiko2
c3a26d9c4d Update main.yml 2020-08-21 03:05:44 +08:00
Yoshiko2
ceef83f11a Update main.yml 2020-08-21 03:03:32 +08:00
Yoshiko2
7bc2c6d7b6 Update main.yml 2020-08-21 03:02:13 +08:00
Yoshiko2
a15d8d11da Update main.yml 2020-08-21 03:01:29 +08:00
Yoshiko2
90da9f669f Update main.yml 2020-08-21 02:57:38 +08:00
Yoshiko2
043c08534e Update main.yml 2020-08-21 02:49:12 +08:00
Yoshiko2
d195d9572b Update main.yml 2020-08-21 02:44:56 +08:00
Yoshiko2
46871ad993 Update main.yml 2020-08-21 02:41:40 +08:00
Yoshiko2
96f7f00098 Update main.yml 2020-08-21 02:41:19 +08:00
Yoshiko2
a485c61ff6 Update main.yml 2020-08-21 02:33:10 +08:00
Yoshiko2
0b80d5f172 Update main.yml 2020-08-21 02:28:30 +08:00
Yoshiko2
c0f319ee48 Update main.yml 2020-08-21 02:28:03 +08:00
root
64161c43b2 Update 3.7.1 2020-08-21 02:25:29 +08:00
Yoshiko2
1fe1cb6d02 Merge pull request #300 from itswait/patch-2
Fix the problem of a scraping site not existing
2020-08-21 02:22:55 +08:00
Wait
25da3d16d6 Fix the problem of a scraping site not existing
If a site is removed from the config file, the sites matched by a rule cannot be re-sorted.
2020-08-20 13:16:33 +08:00
Yoshiko2
5ca490b829 Merge pull request #299 from itswait/patch-2
Fix AVSOX redirecting to its release page
2020-08-19 03:46:54 +08:00
Wait
e7ad2b085a Fix AVSOX redirecting to its release page
AVSOX's hard-coded address sometimes 302-redirects to the release page, so it is better to fetch the latest address from the release page directly.
2020-08-19 03:07:56 +08:00
Yoshiko2
b6dd790d19 Update README.md 2020-08-16 02:02:29 +08:00
Yoshiko2
71da3535bd Update README.md 2020-08-16 02:01:36 +08:00
Yoshiko2
5e3963a978 Update main.yml 2020-08-15 20:04:57 +08:00
Yoshiko2
ad7efbd90f Update main.yml 2020-08-15 20:02:54 +08:00
Yoshiko2
bfa4f010e5 Update main.yml 2020-08-15 19:15:20 +08:00
Yoshiko2
0635fa803e Update main.yml 2020-08-15 19:06:38 +08:00
Yoshiko2
8cfbc4db6c Update main.yml 2020-08-15 19:05:09 +08:00
Yoshiko2
4ccb571266 Update main.yml 2020-08-15 19:00:25 +08:00
Yoshiko2
bc8ab75c7c Update main.yml 2020-08-15 18:58:40 +08:00
Yoshiko2
ffd312a679 Update main.yml 2020-08-15 18:51:08 +08:00
Yoshiko2
7a1ccd3ec6 Update main.yml 2020-08-15 18:43:17 +08:00
Yoshiko2
c808566b9a Update main.yml 2020-08-15 17:19:21 +08:00
root
a9c03a44a8 Update 3.7-finaly2 2020-08-15 17:16:30 +08:00
root
43b0bf7c34 Update 3.7-finaly 2020-08-15 17:14:38 +08:00
root
e687035722 Update 3.7-5 DEBUG ONLY 2020-08-14 17:00:31 +08:00
Yoshiko2
c5a68715ea Add files via upload 2020-08-12 18:34:13 +08:00
root
f6da5db276 Update 3.7-2 2020-08-12 18:27:58 +08:00
root
72a9790858 Update Pre-release 3.7 2020-08-12 18:24:46 +08:00
Yoshiko2
e7a7e17e52 Update README.md 2020-08-11 01:59:58 +08:00
Yoshiko2
c9d2d14e8b Merge pull request #290 from SharerMax/issue288
add `type` section in default setting (fix #288)
2020-08-07 02:07:27 +08:00
Yoshiko2
1cd609d200 Merge pull request #289 from SharerMax/socks5h
add socks5h proxy support
2020-08-04 14:49:35 +08:00
Max Zhao
696b2632e9 add type section in default setting (fix #288) 2020-08-04 00:59:01 +08:00
Max Zhao
362539bbda add socks5h proxy support 2020-08-03 22:57:24 +08:00
Yoshiko2
6d87b05fee Update README.md 2020-07-29 16:17:37 +08:00
Yoshiko2
d85531eda5 Merge pull request #286 from jnozsc/fix_fanza
Fix fanza
2020-07-27 03:48:00 +08:00
Yoshiko2
b0accadc02 Update 3.6 Beta 2020-07-27 03:38:44 +08:00
jnozsc
5f20f61fc5 one more fix on #261 2020-07-25 14:44:37 -07:00
jnozsc
3b79cf31dc fix #251 2020-07-25 14:27:24 -07:00
jnozsc
97182f9249 fix #261 2020-07-25 14:20:01 -07:00
jnozsc
121c20864b import cloudscraper only when necessary 2020-07-25 14:14:56 -07:00
jnozsc
aca32b3cdc fix fanza 2020-07-25 14:13:49 -07:00
Yoshiko2
efd1152d56 Update 3.5.2 2020-07-18 02:36:06 +08:00
Yoshiko2
fc4eaad44a Update update_check.json 2020-07-17 15:07:30 +08:00
Yoshiko2
82315472f6 Update 3.5.1 2020-07-17 15:06:42 +08:00
Yoshiko2
df3a959852 Update main.yml 2020-06-30 21:14:23 +08:00
Yoshiko2
f70e30d129 Update main.yml 2020-06-30 20:57:56 +08:00
Yoshiko2
1562cf8ad5 Update main.yml 2020-06-30 20:53:46 +08:00
Yoshiko2
a78dde9dd9 Update main.yml 2020-06-30 20:52:05 +08:00
Yoshiko2
c044ef087d Update main.yml 2020-06-30 20:51:17 +08:00
Yoshiko2
2065a80496 Update main.yml 2020-06-30 20:35:37 +08:00
Yoshiko2
2d0d58fc79 Update main.yml 2020-06-30 20:15:11 +08:00
Yoshiko2
7a4c84ad13 Update main.yml 2020-06-30 20:11:56 +08:00
Yoshiko2
98a311c53a Update main.yml 2020-06-30 20:08:57 +08:00
Yoshiko2
416d2bea3f Update README.md 2020-06-30 13:58:22 +08:00
Yoshiko2
a3ab6d2878 Update README.md 2020-06-30 13:56:10 +08:00
Yoshiko2
d40843e373 Update README.md 2020-06-29 19:38:28 +08:00
Yoshiko2
3acdeea94b Update README.md 2020-06-29 19:37:42 +08:00
Yoshiko2
8ddc675955 Update README.md 2020-06-29 19:37:05 +08:00
Yoshiko2
0d435d5568 Update 3.5 2020-06-22 15:59:24 +08:00
Yoshiko2
67353cde87 Merge pull request #267 from marcushsu/patch-2
fix 'series' error
2020-06-22 15:56:43 +08:00
Marcus Hsu
a782074f98 fix 'series' error 2020-06-22 15:48:17 +08:00
Yoshiko2
83476c7bc0 Update README.md 2020-06-22 00:38:14 +08:00
Yoshiko2
62bb06e207 Update main.yml 2020-06-22 00:21:28 +08:00
Yoshiko2
2a08c9a014 Update 3.5 2020-06-22 00:12:27 +08:00
Yoshiko2
2c6169b340 Update 3.5 2020-06-21 23:53:08 +08:00
Yoshiko2
b016113fc1 Merge pull request #266 from Suwmlee/master
Add socks proxy support
2020-06-21 20:27:04 +08:00
Mathhew
43d0250614 Add socks proxy support 2020-06-19 15:51:09 +08:00
Yoshiko2
562f030c3c Update 3.4.3 2020-06-14 22:54:52 +08:00
Yoshiko2
107a83e62a Merge pull request #263 from biaji/revert-257-patch-2
Revert "修正 fc2 数据无法获取的bug"
2020-06-14 20:38:27 +08:00
biaji
6fec1601c9 Revert "Fix the bug where fc2 data could not be fetched" 2020-06-14 09:46:13 +08:00
Yoshiko2
22cdf8e637 Update update_check.json 2020-06-11 17:28:55 +08:00
Yoshiko2
2c794d6d79 Update AV_Data_Capture.py 2020-06-11 17:28:34 +08:00
Yoshiko2
4a10512fbb Add files via upload 2020-06-11 16:48:12 +08:00
Yoshiko2
a73a583abe Merge pull request #257 from biaji/patch-2
Fix the bug where fc2 data could not be fetched
2020-06-11 08:19:17 +08:00
biaji
b4398985fd Fix the bug where fc2 data could not be fetched
The fc2club URL contains FC2
2020-06-06 21:51:25 +08:00
Yoshiko2
98bc660604 Update update_check.json 2020-05-25 18:10:45 +08:00
Yoshiko2
ac825fa50f Merge pull request #248 from houfukude/master
Fix the crash when the update-check API rate limit is exceeded
2020-05-25 18:10:21 +08:00
houfukude
d9c00d3d26 fix check_update failed when API rate limit 2020-05-23 23:02:22 +08:00
Yoshiko2
4089c0f80a Update README.md 2020-05-07 19:46:27 +08:00
Yoshiko2
3851f397c2 Update README.md 2020-05-07 17:50:15 +08:00
Yoshiko2
07c5fb19f4 Merge pull request #236 from 68cdrBxM8YdoJ/change-pyinstaller-add-data
Change to use dynamically fetched paths
2020-05-06 22:49:30 +08:00
68cdrBxM8YdoJ
85cfcee4bb Change to use dynamically fetched paths 2020-05-05 18:17:48 +09:00
Yoshiko2
2ca1bafc41 Update update_check.json 2020-05-04 16:52:17 +08:00
Yoshiko2
1d786abe33 Update 3.4 2020-05-04 16:42:20 +08:00
Yoshiko2
1869c8fc44 Merge pull request #234 from SayNothingToday/master
Add simple Docker support
2020-05-04 16:41:11 +08:00
Yoshiko2
747152be5d Merge pull request #232 from 68cdrBxM8YdoJ/fix-location-rule
Change handling location_rule
2020-05-04 16:40:42 +08:00
SayNothingToday
a73ab7d306 Update README.md 2020-05-04 12:28:36 +08:00
SayNothingToday
e8b2f6b8b7 Create docker-compose.yaml 2020-05-04 12:20:45 +08:00
SayNothingToday
d6a2464381 Create config.ini 2020-05-04 12:17:31 +08:00
SayNothingToday
0aba387807 Create Dockerfile 2020-05-04 12:16:55 +08:00
68cdrBxM8YdoJ
390bf32e43 Change handling location_rule 2020-05-01 13:20:34 +09:00
Yoshiko2
6b0f25a67e Merge pull request #226 from 68cdrBxM8YdoJ/fix-number-parsing
Fix get_number func
2020-04-29 22:16:46 +08:00
68cdrBxM8YdoJ
ed45c70e58 Fix get_number func
fixes #225
2020-04-25 15:02:48 +09:00
Yoshiko2
42591d8cda Merge pull request #220 from 68cdrBxM8YdoJ/add-support-javlib
Add support javlib
2020-04-25 01:30:16 +08:00
Yoshiko2
131702a3fa Merge pull request #221 from 68cdrBxM8YdoJ/change-check-update-func
Change check_update func
2020-04-25 01:30:03 +08:00
68cdrBxM8YdoJ
22f25ed331 Change check_update func
When using a pre-release, the previous release
version always appears as the new update.
2020-04-23 17:57:05 +09:00
68cdrBxM8YdoJ
b65581de47 Add support javlib 2020-04-23 11:11:29 +09:00
Yoshiko2
274891fc21 Update 3.3 2020-04-22 02:18:51 +08:00
Yoshiko2
ce2b6fe852 Update 3.3 2020-04-22 02:18:24 +08:00
Yoshiko2
820bb7673e Merge pull request #216 from jnozsc/fix_fanza_title
return raw title for fanza
2020-04-22 02:17:49 +08:00
Yoshiko2
57ff9cdb26 Merge pull request #218 from 68cdrBxM8YdoJ/fix-single-file-input
Fix single movie input
2020-04-22 02:17:26 +08:00
68cdrBxM8YdoJ
96cc434884 Fix single movie input 2020-04-19 19:42:19 +09:00
jnozsc
37e035d19e return raw title for fanza 2020-04-18 19:34:28 -07:00
Yoshiko2
7be685702b Merge pull request #214 from 68cdrBxM8YdoJ/remove-part-from-title-tag
Change generated nfo filename to match movie filename
2020-04-18 20:56:46 +08:00
68cdrBxM8YdoJ
da502a3d6d Change generated nfo filename to match movie filename 2020-04-18 21:05:05 +09:00
68cdrBxM8YdoJ
aba24882bf Remove part from title tag
fixes #212
2020-04-18 20:54:49 +09:00
Yoshiko2
d65e75dc0c Merge pull request #213 from 68cdrBxM8YdoJ/dev-config-manager
Create config.py
2020-04-18 18:45:09 +08:00
68cdrBxM8YdoJ
8332d65783 Create config.py 2020-04-18 14:53:42 +09:00
Yoshiko2
1f4b7e6633 Update 3.2 2020-04-15 15:15:24 +08:00
Yoshiko2
92e631ff66 Update README.md 2020-04-14 00:57:51 +08:00
Yoshiko2
698053dbdf Update README.md 2020-04-14 00:57:09 +08:00
Yoshiko2
b81c7254ab Update AV_Data_Capture.py 2020-04-13 17:54:32 +08:00
Yoshiko2
8714548d22 Update update_check.json 2020-04-13 17:49:08 +08:00
Yoshiko2
e8f046c431 Update update_check.json 2020-04-13 17:42:50 +08:00
Yoshiko
69cf3560b1 Merge pull request #208 from 68cdrBxM8YdoJ/bundle-config-to-artifact
Bundle config.ini to artifact
2020-04-13 02:05:33 +08:00
Yoshiko
5bea15c5f3 Fix "------"
fix #206
2020-04-13 02:05:13 +08:00
68cdrBxM8YdoJ
c749d185a8 Bundle config.ini to artifact 2020-04-12 11:17:38 +09:00
jnozsc
65f0536ff1 fix #206 2020-04-11 15:59:49 -07:00
Yoshiko
e24f3e0abf Update main.yml 2020-04-11 18:25:46 +08:00
Yoshiko
3c5a1534b3 Update README.md 2020-04-11 17:45:23 +08:00
Yoshiko
1792c01ad8 Update README.md 2020-04-11 17:44:07 +08:00
Yoshiko
952b010362 Update README.md 2020-04-11 17:41:50 +08:00
Yoshiko
f9b1f715f5 Update README.md 2020-04-11 17:38:58 +08:00
Yoshiko
251dbd47dc Update README.md 2020-04-11 17:33:59 +08:00
Yoshiko
ac2c3c2af7 Update main.yml 2020-04-11 15:27:30 +08:00
Yoshiko
642412a8ec Update README.md 2020-04-11 03:35:17 +08:00
Yoshiko
d514985640 Update README.md 2020-04-11 03:34:12 +08:00
Yoshiko
2f855795e5 Update README.md 2020-04-11 03:26:40 +08:00
Yoshiko
7823a1e0ce Update README.md 2020-04-11 03:24:13 +08:00
Yoshiko
73b582951c Update main.yml 2020-04-11 03:08:41 +08:00
Yoshiko
2da383ebda Update 3.1 2020-04-11 03:06:09 +08:00
Yoshiko
bd58c3de71 Merge pull request #204 from 68cdrBxM8YdoJ/fix-checkout-branch
Fix checkout branch
2020-04-11 02:19:16 +08:00
68cdrBxM8YdoJ
3b97e550a9 Fix checkout branch 2020-04-10 11:46:42 +09:00
Yoshiko
6122790eea Merge pull request #201 from rel000/patch-2
fix: path error
2020-04-10 00:58:46 +08:00
Yoshiko
eb105844f2 Merge pull request #200 from 68cdrBxM8YdoJ/master
Add github actions to build executable
2020-04-10 00:57:36 +08:00
rel000
a67ab8534e fix: path error 2020-04-08 01:23:27 +08:00
68cdrBxM8YdoJ
a5c09e95f8 Add github actions to build 2020-04-07 21:31:25 +09:00
Yoshiko
b3b18c04e8 Merge pull request #199 from 68cdrBxM8YdoJ/modify-successful-determination
Changed to include numbers in valid-data determination
2020-04-07 15:37:44 +08:00
Yoshiko
bb518d5e3e Merge pull request #198 from 68cdrBxM8YdoJ/fix-jav321-not-load-correctly
Add jav321 to source list
2020-04-07 15:37:15 +08:00
68cdrBxM8YdoJ
a0eba2272f Changed to include numbers in valid-data determination 2020-04-06 22:20:51 +09:00
68cdrBxM8YdoJ
30e7ee563b Add jav321 to source list 2020-04-06 21:32:19 +09:00
Yoshiko
67d99964af Update README.md 2020-04-05 22:53:25 +08:00
Yoshiko
06f86ed7c4 Update update_check.json 2020-04-05 16:42:49 +08:00
Yoshiko
d2d7d367bc Create update_check.json 2020-04-05 16:42:36 +08:00
Yoshiko
dd803fffbc Merge pull request #190 from 68cdrBxM8YdoJ/fix-update-check
Use github's release api for version checking
2020-04-05 16:41:50 +08:00
68cdrBxM8YdoJ
8499f7ac58 Use github release api for version checking 2020-04-03 19:11:43 +09:00
Yoshiko
04ab051c67 Merge pull request #193 from zuozishi/master
Fix fc2-ppv ID recognition
2020-04-03 17:50:47 +08:00
Zuozishi
f931ffda97 vscode debug 2020-04-03 17:33:00 +08:00
Zuozishi
c3e37d8374 fix fc2-ppv 2020-04-03 17:32:52 +08:00
Yoshiko
203a410ecf Merge pull request #189 from J3n5en/patch-2
fix: Fix JSON format error
2020-04-02 22:18:09 +08:00
Yoshiko
09b8271cbf Update README.md 2020-04-02 02:00:51 +08:00
Yoshiko
9398d22ab3 Update Readme 2020-04-02 02:00:17 +08:00
Yoshiko
e58c09d445 Update README.md 2020-04-02 01:42:13 +08:00
J3n5en
787cb34f40 fix: Fix JSON format error 2020-04-01 06:58:01 -05:00
yoshiko2
809183777d Del 2020-04-01 02:37:13 +08:00
yoshiko2
77bc4b0c2e Merge branch 'master' of https://github.com/yoshiko2/av_data_capture 2020-04-01 02:35:47 +08:00
yoshiko2
23ab8d516b Del 2020-04-01 02:35:01 +08:00
Yoshiko
31894c0f85 Delete clone.sh 2020-04-01 02:33:30 +08:00
Yoshiko
9b806b49ba Delete pull_movie_from_output 2020-04-01 02:33:19 +08:00
Yoshiko
d27e013224 Delete ssni-229.mmp4 2020-04-01 02:33:04 +08:00
yoshiko2
c110165373 Update 3.0 2020-04-01 02:30:00 +08:00
yoshiko2
961f25bb9a Merge branch 'master' of https://github.com/yoshiko2/av_data_capture 2020-04-01 01:50:29 +08:00
yoshiko2
67a908a054 test 2020-04-01 01:47:37 +08:00
Yoshiko
60a87b4f50 Update README.md 2020-03-31 18:38:03 +08:00
Yoshiko
7c68a33d89 Merge pull request #181 from mtxcab/patch-4
Speed up by skipping scan in escape folders
2020-03-29 00:53:48 +08:00
mtxcab
ea95bed85a Speed up by skipping scan in escape folders
Try again using older Python APIs. Let me know the detailed error if this doesn't work.
2020-03-27 17:20:55 -07:00
Yoshiko
ccee4ebb6b Update 2.9 2020-03-27 21:33:44 +08:00
Yoshiko
1f57bd2b23 Update 2.9 2020-03-27 21:12:17 +08:00
Yoshiko
4bc59a48ea Merge pull request #180 from jnozsc/normalize_line_ending
try again, normalize working tree line endings in git
2020-03-27 18:57:18 +08:00
jnozsc
f4463b5bca try again, normalize working tree line endings in git 2020-03-26 12:58:09 -07:00
yoshiko2
2c701a7e78 Update 2.9 2020-03-27 02:37:04 +08:00
yoshiko2
a65e907c2c Update 2.9 2020-03-27 02:35:26 +08:00
yoshiko2
8536c6ddfb Merge branch 'master' of https://github.com/yoshiko2/av_data_capture 2020-03-27 02:32:30 +08:00
yoshiko2
bc76c4f88f Update 2.9 2020-03-27 02:30:04 +08:00
Yoshiko
a0d2097072 Merge pull request #179 from yoshiko2/revert-175-patch-3
Revert "Speed up by avoiding scan in escape folders"
2020-03-27 02:19:06 +08:00
Yoshiko
7ff71fe880 Revert "Speed up by avoiding scan in escape folders" 2020-03-27 02:18:00 +08:00
Yoshiko
ffe99bf500 Merge pull request #176 from pastay/patch-2
Update core.py
2020-03-27 02:02:35 +08:00
Yoshiko
16cd10e0d7 Merge pull request #175 from mtxcab/patch-3
Speed up by avoiding scan in escape folders
2020-03-27 02:02:16 +08:00
Yoshiko
e6d5c20afb Merge pull request #171 from 68cdrBxM8YdoJ/add-support-jav321
Add support jav321.com
2020-03-27 02:01:41 +08:00
Yoshiko
9c396e1f7a Merge pull request #169 from mtxcab/patch-2
Fix javdb sometimes failing to fetch the cover
2020-03-27 02:01:30 +08:00
Yoshiko
8469a721ab Merge pull request #177 from yoshiko2/revert-168-normalize_EOL
Revert " normalize working tree line endings in Git"
2020-03-27 02:01:10 +08:00
Yoshiko
54d6310426 Revert " normalize working tree line endings in Git" 2020-03-27 02:00:22 +08:00
Yoshiko
b701d7b6af Merge pull request #168 from jnozsc/normalize_EOL
normalize working tree line endings in Git
2020-03-27 01:58:18 +08:00
pastay
dad8b443f6 Update core.py 2020-03-26 17:44:08 +08:00
mtxcab
3a267ecc87 Speed up by avoiding scan in escape folders 2020-03-25 18:49:27 -07:00
68cdrBxM8YdoJ
63f64fcaea Add support jav321.com 2020-03-25 19:30:59 +09:00
mtxcab
8848d10d00 Fix javdb sometimes failing to fetch the cover
class="column is-three-fifths column-video-cover"
2020-03-24 21:39:04 -07:00
jnozsc
08be49c998 Introduce end-of-line normalization 2020-03-24 15:59:01 -07:00
Yoshiko
8d1b1eb84d Update 2.8.3 2020-03-24 12:51:26 +08:00
Yoshiko
0acffd15e3 Update 2.8.3 2020-03-24 11:55:24 +08:00
Yoshiko
cfbe778229 Update 2.8.3 2020-03-24 11:09:36 +08:00
yoshiko2
5441254b05 Update 2.8.3 2020-03-24 11:08:28 +08:00
yoshiko2
1116d6deaa Update 2.8.3 2020-03-24 11:07:02 +08:00
yoshiko2
9e8d96f5c0 Update 2.8.3 2020-03-24 10:58:53 +08:00
yoshiko2
2ebc74fd8d Update 2.8.3 2020-03-24 10:53:58 +08:00
Yoshiko
645b30cf38 Merge pull request #166 from yoshiko2/revert-163-master
Revert "Add support jav321"
2020-03-23 21:49:04 +08:00
Yoshiko
73a0610730 Revert "Add support jav321" 2020-03-23 21:48:32 +08:00
Yoshiko
32a19bb989 Merge pull request #163 from 68cdrBxM8YdoJ/master
Add support jav321
2020-03-21 21:44:40 +08:00
68cdrBxM8YdoJ
9a530f4e46 Add support jav321 2020-03-20 14:04:00 +09:00
Yoshiko
780e47ffba Merge pull request #158 from oweaF/master
Create ruquirments.txt
2020-03-16 15:37:13 +08:00
oweaF
b31f27de97 Create ruquirments.txt 2020-03-13 11:54:50 +08:00
Yoshiko
0f720acd8a Update 2.8.2 2020-03-13 01:19:07 +08:00
Yoshiko
09cf8206a9 Update README.md 2020-03-13 01:07:01 +08:00
Yoshiko
9120937398 Merge pull request #155 from jnozsc/eol_lf
use LF instead of CR+LF for all python files
2020-03-13 00:57:01 +08:00
Yoshiko
7a66695eea Update 2.8 2020-03-13 00:54:02 +08:00
jnozsc
8d60cdbb30 use LF instead of CR+LF for all python files 2020-03-09 14:50:36 -07:00
Yoshiko
c22863ece4 Update 2.8 2020-03-08 20:38:25 +08:00
Yoshiko
73cdd797c5 Merge pull request #152 from halo9999/fix-matching-number
Fix matching chinese or japanese as number
2020-03-08 20:27:45 +08:00
Yoshiko
7eec310929 Merge pull request #151 from jnozsc/version_2.7
bump version to 2.7
2020-03-08 20:27:13 +08:00
Yoshiko
aeebfc753b Merge pull request #148 from jnozsc/fix_fanza_getCover
tweak fanza getCover()
2020-03-08 20:26:45 +08:00
Yoshiko
aafb493a17 Merge pull request #147 from halo9999/bug-fix
use failed_output_folder in config instead of hard-coding
2020-03-08 20:24:21 +08:00
halo9999
9d87d9769d Fix matching chinese or japanese as number 2020-03-07 20:15:30 +09:00
jnozsc
8b36cfb35c bump version to 2.7 2020-03-05 09:38:02 -08:00
jnozsc
3b85ebfa51 tweak fanza getCover() 2020-03-04 15:25:33 -08:00
halo9999
f415b4664a use failed_output_folder in config instead of hard-coding 2020-03-05 03:12:40 +09:00
Yoshiko
bedd76bc60 Update 2.7 2020-03-05 01:40:50 +08:00
Yoshiko
88075d7dd8 Merge pull request #134 from vicnoah/master
Fix the WebM file extension failing to match
2020-02-25 21:37:49 +08:00
vicnoah
4b59f94e75 Fix the suffix matching problem of WebM file 2020-02-22 14:58:06 +08:00
Yoshiko
6b4e501180 Update README.md 2020-02-18 22:59:27 +08:00
Yoshiko
6a1af89596 Merge pull request #124 from jnozsc/fix_issue_119
fix #119
2020-02-18 15:11:38 +08:00
Yoshiko
a9fb890639 Merge pull request #130 from jnozsc/refactor-search-logic
refactor search logic
2020-02-18 15:11:24 +08:00
Yoshiko
57cdd79003 Merge pull request #129 from jnozsc/fix_javdb
fix javdb issue when it returns multiple results
2020-02-18 15:11:17 +08:00
Yoshiko
a989382888 Merge pull request #128 from jnozsc/fix_fc2fans_club
add a try catch logic for fc2fans_club.py
2020-02-18 15:11:08 +08:00
Yoshiko
4ca2d957a3 Merge pull request #126 from jnozsc/rewrite_fanza
rewrite fanza.py
2020-02-18 15:10:59 +08:00
Yoshiko
706d920d65 Merge pull request #123 from jnozsc/remove_core2
remove core2.py
2020-02-18 15:10:41 +08:00
jnozsc
a4c8bcf2b4 refactor search logic 2020-02-17 22:09:10 -08:00
jnozsc
06de0232a1 typo 2020-02-17 21:51:09 -08:00
jnozsc
8dc9be12cc fix javdb issue when it returns multiple results 2020-02-17 21:50:34 -08:00
jnozsc
53fe85e607 add a try catch logic for fc2fans_club.py 2020-02-17 21:45:13 -08:00
jnozsc
5f46f3f25d rewrite fanza.py 2020-02-17 10:47:11 -08:00
jnozsc
229066fa99 fix #119 2020-02-17 09:24:38 -08:00
jnozsc
6b6c884c47 remove core2.py 2020-02-17 09:16:39 -08:00
Yoshiko
690557f878 Update README.md 2020-02-18 00:18:00 +08:00
Yoshiko
7d59456597 2.7 Update (future)
2020-02-17 16:00:02 +08:00
Yoshiko
fb9c8201f5 2.7 Update (future)
2020-02-17 15:59:49 +08:00
Yoshiko
d647ddfe07 Merge pull request #122 from jnozsc/fix-fanza-hinban-issue
2.7 Update (future)
2020-02-17 15:59:36 +08:00
jnozsc
dabe1f2da6 add edge case 2020-02-16 15:00:29 -08:00
jnozsc
8dda8da2b3 fix fanza hinban issue 2020-02-16 14:45:40 -08:00
jnozsc
064d8a8349 refactor config proxy 2020-02-16 13:20:10 -08:00
jnozsc
847e79c6a0 typo 2020-02-16 11:17:41 -08:00
jnozsc
b628e11811 fix typo 2020-02-16 11:11:51 -08:00
jnozsc
440577b943 change code block 2020-02-16 11:06:09 -08:00
jnozsc
65741bc5cb add 2 space 2020-02-16 11:05:20 -08:00
jnozsc
fd0c18a220 fix docs 2020-02-16 11:03:48 -08:00
jnozsc
b5de3942ae fix more doc 2020-02-16 10:56:23 -08:00
jnozsc
d6d6fc5a95 fix more typo 2020-02-16 10:48:25 -08:00
jnozsc
8436a3871c fix more doc 2020-02-16 10:45:46 -08:00
jnozsc
202673dd32 fix more links 2020-02-16 10:38:19 -08:00
jnozsc
5fdda13320 fix markdown format and content 2020-02-16 10:33:57 -08:00
Yoshiko
e0d2058fa0 Update 2.6 2020-02-14 18:46:11 +08:00
Yoshiko
dd3d394d58 Merge pull request #103 from vicnoah/master
add WebM support
2020-02-12 17:19:15 +08:00
Yoshiko
6e5831d7d6 Update README.md 2020-02-09 17:48:34 +08:00
Yoshiko
ea54a149a8 Update README.md 2020-02-07 19:50:49 +08:00
Yoshiko
9e7c798cd1 Update README.md 2020-02-07 19:50:01 +08:00
Yoshiko
b4e2530b6f Update README.md 2020-02-07 17:53:52 +08:00
Yoshiko
6b5af440f1 Update README.md 2020-02-07 17:53:22 +08:00
Yoshiko
c362e7a4d7 Update README.md 2020-02-07 17:53:02 +08:00
vicnoah
068dc86480 WebM support 2020-02-05 21:26:15 +08:00
Yoshiko
87db5b5426 Update 2.5 2020-02-04 01:10:10 +08:00
Yoshiko
9b6cd74caa Update 2.5 2020-02-04 01:09:12 +08:00
Yoshiko
6cbfd2ab0e Update 2.5 2020-02-04 01:04:03 +08:00
Yoshiko
a46391c6b2 Update 2.5 2020-02-04 01:02:51 +08:00
Yoshiko
2c2867e3c6 Update README.md 2020-02-01 15:59:23 +08:00
Yoshiko
9102c28247 Update update_check.json 2020-02-01 03:32:37 +08:00
Yoshiko
05d1ac50c1 Update 2.4 2020-01-31 18:01:01 +08:00
Yoshiko
7d5da45567 Update 2.4 2020-01-31 17:47:40 +08:00
Yoshiko
03a4669e48 Merge pull request #85 from moyy996/master
Update 2.4, by moyy996
2020-01-29 18:44:48 +08:00
mo_yy
123b2f4cfc 2.3 - Fix the out-of-range bug 2020-01-29 14:29:13 +08:00
mo_yy
cab36be6a2 2.3 - Add excluded directories 2020-01-29 14:28:30 +08:00
mo_yy
4660b1cdf2 2.3 - Change all global variables to function parameters 2020-01-29 14:27:37 +08:00
mo_yy
8190da3d3e Exclude specified directories 2020-01-29 13:24:45 +08:00
mo_yy
5105051f53 Exclude specified directories 2020-01-29 13:21:42 +08:00
Yoshiko
de1c9231ad Update 2.3 2020-01-27 12:58:57 +08:00
Yoshiko
fe72afc1cf Update 2.3 2020-01-27 01:28:59 +08:00
Yoshiko
70f1b16b3c Update 2.2 2020-01-22 02:19:40 +08:00
Yoshiko
0b5d68732f Merge pull request #77 from moyy996/master
Update 2.2
2020-01-21 14:20:13 +08:00
mo_yy
241ff00a0b 2.1
Fix the javdb scraping bug
2020-01-21 01:20:40 +08:00
Yoshiko
4131b00965 Update 2.1 2020-01-19 17:59:52 +08:00
Yoshiko
6f6ed31b31 Update 2.1 2020-01-19 17:30:35 +08:00
Yoshiko
09006865b9 Update README.md 2020-01-19 02:24:51 +08:00
Yoshiko
c3519a2db2 Update README.md 2020-01-19 02:21:02 +08:00
Yoshiko
b21d47c55c Update 2.0 2020-01-19 02:16:14 +08:00
mo_yy
c073a1e253 Remove the ID from the title, use the small cover, fix 'actor/studio' fetching 2020-01-10 18:29:38 +08:00
mo_yy
3545705283 Update javdb.py 2020-01-10 18:25:35 +08:00
mo_yy
9c71e5976a Remove the ID from the title, use the small cover, fix 'actor/studio' fetching 2020-01-10 14:09:35 +08:00
mo_yy
e25da134da Update core.py
Replace the '/' character in release; fix 'LUXU-xxxx' failing to be scraped.
2020-01-10 13:29:09 +08:00
mo_yy
b990c974b6 Update javdb.py
Fix actors not being fetched and ID mismatches
2020-01-10 13:17:58 +08:00
Yoshiko
54db38511d Update config.ini 2020-01-02 11:13:21 +08:00
Yoshiko
23c72b3f07 Update 1.9 2019-12-19 09:39:15 +08:00
Yoshiko
98faa64260 Update 1.9 2019-12-17 10:56:03 +08:00
Yoshiko
7868bd0b77 Update 1.8 2019-12-17 10:52:40 +08:00
Yoshiko
33db5f6f8e Update AV_Data_Capture.py 2019-12-15 22:16:38 +08:00
Yoshiko
86e95589a9 Update 1.8 2019-12-15 20:32:23 +08:00
Yoshiko
3b84877a53 Update 1.8 2019-12-15 20:30:20 +08:00
Yoshiko
ec9957c75c Update config.ini 2019-12-14 13:35:12 +08:00
Yoshiko
f434cbb16d Update config.ini 2019-12-14 13:33:35 +08:00
Yoshiko
697d9a9fd0 Update README.md 2019-12-14 13:20:49 +08:00
Yoshiko
899a0b18eb Update config.ini 2019-12-11 15:07:37 +08:00
Yoshiko
7ccc718b4f Update 1.7 2019-11-24 20:35:36 +08:00
Yoshiko
5e34602836 Update update_check.json 2019-11-12 23:18:57 +08:00
Yoshiko
588c161fef Update README.md 2019-11-10 16:59:00 +08:00
Yoshiko
e34c54b8a6 Update 1.5 2019-11-09 02:36:15 +08:00
Yoshiko
11a9ab6b51 Delete girl.py 2019-11-07 14:52:46 +08:00
Yoshiko
a8255adbb8 Update README.md 2019-11-07 14:51:30 +08:00
Yoshiko
ab992a3d89 Update 1.5 2019-11-06 22:59:17 +08:00
Yoshiko
d84767c609 Update 1.5 2019-11-06 21:08:57 +08:00
Yoshiko
fe12a05f8f Update README.md 2019-11-05 21:08:02 +08:00
Yoshiko
018c9b7773 Update README.md 2019-11-05 19:18:35 +08:00
Yoshiko
851e9720ae Update README.md 2019-11-05 13:07:06 +08:00
Yoshiko
1f5bc5f029 Update README.md 2019-11-04 13:48:54 +08:00
Yoshiko
e76d3b74c7 Update README.md 2019-11-04 13:48:06 +08:00
Yoshiko
f0e23ebc73 Update update_check.json 2019-11-04 13:34:44 +08:00
Yoshiko
71d44bf90c Update 1.4 2019-11-04 13:34:07 +08:00
Yoshiko
7beeee29e4 Delete readme5.png 2019-11-04 08:18:09 +08:00
Yoshiko
8eb31f94b5 Delete 1.jpg 2019-11-04 08:00:49 +08:00
Yoshiko
5fe2874892 Merge pull request #53 from ninjadogz/master
Fix cover read failures, add path escape characters, add PowerShell preprocessing
2019-11-01 21:24:32 +08:00
Yoshiko
460c6ee868 Merge branch 'master' into master 2019-11-01 21:24:15 +08:00
ninjadogz
9663f7b473 Sync core.py
2019-10-31 22:15:43 +09:00
ninjadogz
fb666feec5 Update the duplicate-name skip
2019-10-27 17:35:24 +09:00
Yoshiko
b980538946 Update README.md 2019-10-27 16:29:38 +08:00
Yoshiko
3323400937 Update LICENSE 2019-10-27 16:28:51 +08:00
Yoshiko
2f29fde426 Update README.md 2019-10-27 16:26:05 +08:00
Yoshiko
bc5c08a54c Update README.md 2019-10-27 16:21:33 +08:00
Yoshiko
a5b0931bce Update README.md 2019-10-27 16:19:56 +08:00
ninjadogz
5a6361897a Preprocessing update
(1) Files in the source video folder are processed after being moved into the AV_Data_Capture folder.
(2) Add a [movie][path] setting to config.ini
2019-10-27 00:09:41 +09:00
Yoshiko
e7090682c8 Merge pull request #52 from biaji/patch-1
Fix errors caused by site data in certain cases
2019-10-26 17:39:58 +08:00
Yoshiko
724f3a0d32 Merge pull request #51 from lhiqwj173/master
Add a Kodi option; improve Kodi image display and multi-part handling
2019-10-26 17:37:51 +08:00
biaji
3c6741ec82 Update core.py 2019-10-25 23:07:16 +08:00
biaji
a0260d96cb Update core.py 2019-10-25 22:59:39 +08:00
ninjadogz
bab350e8dd Create moveVideos.ps1
Move files
2019-10-25 00:02:18 +09:00
ninjadogz
72fafef059 Update
Changes:
1. When the path read from the DB contains escape characters, replace them with nothing automatically; append the escape character list in config.ini
2. When the cover downloads successfully but cannot be read, classify it with a dummy cover
3. Strip Chinese strings when generating folders
2019-10-22 20:08:28 +09:00
lhiqwj173
021830b31b Add a Kodi option; improve Kodi image display; improve Kodi multi-part handling 2019-10-19 23:50:13 +08:00
lhiqwj173
e082cf336e Add a Kodi option; improve Kodi image display; improve Kodi multi-part handling 2019-10-19 23:49:52 +08:00
lhiqwj173
3b9ab41dcc Add a Kodi option; improve Kodi image display; improve Kodi multi-part handling 2019-10-19 23:43:18 +08:00
lhiqwj173
f285a1ab6b Add a Kodi option; improve Kodi multi-part handling 2019-10-19 20:05:25 +08:00
Yoshiko
9fb813ae97 Update README.md 2019-10-19 13:16:22 +08:00
Yoshiko
29259b7e5a Update README.md 2019-10-12 22:17:28 +08:00
Yoshiko
952f759e90 Update README.md 2019-10-12 22:04:25 +08:00
Yoshiko
e8f823f34c Update README.md 2019-10-07 11:07:21 +08:00
Yoshiko
52e9d407be Update README.md 2019-10-07 00:13:09 +08:00
Yoshiko
62bb672a83 Update README.md 2019-10-07 00:08:56 +08:00
Yoshiko
1eef0fa48d Update README.md 2019-10-07 00:08:04 +08:00
Yoshiko
63bc397599 Update README.md 2019-10-07 00:07:17 +08:00
Yoshiko
f99def64bb Update README.md 2019-10-06 01:24:11 +08:00
Yoshiko
94c4838b42 Update update_check.json 2019-10-06 01:15:05 +08:00
Yoshiko
73c0126fb8 Update 1.3 2019-10-06 01:14:12 +08:00
Yoshiko
ae99c652f5 Update README.md 2019-09-23 09:27:29 +08:00
Yoshiko
2b9ce63601 Version 1.2 Update 2019-08-29 22:21:59 +08:00
Yoshiko
6928df8c3f Version 1.2 Update 2019-08-29 22:20:54 +08:00
Yoshiko
8ccdf7dc5a Merge pull request #36 from moyy996/master
AV_Data_Capture-1.1 (multi-part support)
2019-08-19 22:04:22 +08:00
mo_yy
b438312c97 Update core.py 2019-08-19 12:23:45 +08:00
mo_yy
fd05706636 Update core.py
Support multi-part files
2019-08-19 10:18:32 +08:00
mo_yy
1e407ef962 Update core.py 2019-08-19 00:50:43 +08:00
Yoshiko
9898932f09 Update update_check.json 2019-08-18 22:40:37 +08:00
Yoshiko
c4fc22054b Update 1.1 2019-08-18 22:40:11 +08:00
Yoshiko
449e900837 Update README.md 2019-08-18 14:52:02 +08:00
Yoshiko
e3ebbec947 Update README.md 2019-08-14 21:56:25 +08:00
Yoshiko
65a9521ab1 Update README.md 2019-08-14 21:55:51 +08:00
Yoshiko
b79a600c0d Update README.md 2019-08-14 19:29:18 +08:00
Yoshiko
30d33fe8f7 Update README.md 2019-08-14 11:50:42 +08:00
Yoshiko
b325fc1f01 Update README.md 2019-08-14 11:49:00 +08:00
Yoshiko
954fb02c0c Update README.md 2019-08-14 00:28:39 +08:00
Yoshiko
5ee398d6b5 Update Beta 11.9 2019-08-12 01:23:57 +08:00
Yoshiko
b754c11814 Update update_check.json 2019-08-12 01:21:46 +08:00
Yoshiko
5d19ae594d Update Beta 11.9 2019-08-12 01:21:34 +08:00
Yoshiko
bfa8ed3144 Update README.md 2019-08-11 00:41:01 +08:00
Yoshiko
0ec23aaa38 Update README.md 2019-08-11 00:39:31 +08:00
Yoshiko
878ae46d77 Update README.md 2019-08-11 00:31:24 +08:00
Yoshiko
766e6bbd88 Update update_check.json 2019-08-11 00:29:15 +08:00
Yoshiko
0107c7d624 Update README.md 2019-08-11 00:28:25 +08:00
Yoshiko
d0cf2d2193 Update README.md 2019-08-11 00:27:36 +08:00
Yoshiko
d1403af548 Update README.md 2019-08-10 22:28:35 +08:00
Yoshiko
bc20b09f60 Update Beta 11.8 2019-08-10 21:45:36 +08:00
Yoshiko
8e2c0c3686 Version fallback to Beta 11.6 2019-08-09 00:32:57 +08:00
Yoshiko
446e1bf7d0 Version fallback to Beta 11.6 2019-08-09 00:32:04 +08:00
Yoshiko
54437236f0 Update Beta 11.7 2019-08-07 00:19:22 +08:00
Yoshiko
9ed57a8ae9 Update README.md 2019-08-07 00:15:33 +08:00
Yoshiko
c66a53ade1 Update Beta 11.7 2019-08-06 16:46:21 +08:00
Yoshiko
7aec4c4b84 Update update_check.json 2019-08-06 16:37:16 +08:00
Yoshiko
cfb3511360 Update Beta 11.7 2019-08-06 16:36:45 +08:00
Yoshiko
2adcfacf27 Merge pull request #26 from RRRRRm/master
Fix the path error under Linux and specify Python3 as the runtime.
2019-08-05 22:52:57 +08:00
RRRRRm
09dc684ff6 Fix some bugs. 2019-08-05 20:39:41 +08:00
RRRRRm
1bc924a6ac Update README.md 2019-08-05 15:57:46 +08:00
RRRRRm
00db4741bc Calling core.py asynchronously. Allow to specify input and output paths. 2019-08-05 15:48:44 +08:00
RRRRRm
1086447369 Fix the path error under Linux. Specify Python3 as the runtime. 2019-08-05 03:00:35 +08:00
Yoshiko
642c8103c7 Update README.md 2019-07-24 08:51:40 +08:00
Yoshiko
b053ae614c Update README.md 2019-07-23 21:18:22 +08:00
Yoshiko
b7583afc9b Merge pull request #20 from biaji/master
Add encoding info to source
2019-07-21 10:28:03 +08:00
biAji
731b08f843 Add encoding info to source
According to PEP-263, add encoding info to source code
2019-07-18 09:22:28 +08:00
Yoshiko
64f235aaff Update README.md 2019-07-15 12:41:14 +08:00
Yoshiko
f0d5a2a45d Update 11.6 2019-07-14 15:07:04 +08:00
Yoshiko
01521fe390 Update 11.6 2019-07-14 10:06:49 +08:00
Yoshiko
a33b882592 Update update_check.json 2019-07-14 09:59:56 +08:00
Yoshiko
150b81453c Update 11.6 2019-07-14 09:58:46 +08:00
Yoshiko
a6df479b78 Update 11.6 2019-07-14 09:45:53 +08:00
Yoshiko
dd6445b2ba Update 11.6 2019-07-14 09:38:26 +08:00
Yoshiko
41051a915b Update README.md 2019-07-12 18:13:09 +08:00
Yoshiko
32ce390939 Update README.md 2019-07-12 18:08:45 +08:00
Yoshiko
8deec6a6c0 Update README.md 2019-07-12 18:08:20 +08:00
Yoshiko
0fab70ff3d Update README.md 2019-07-12 18:07:23 +08:00
Yoshiko
53bbb99a64 Update README.md 2019-07-12 17:59:46 +08:00
Yoshiko
0e712de805 Update README.md 2019-07-11 10:43:55 +08:00
Yoshiko
6f74254e96 Update README.md 2019-07-11 00:58:16 +08:00
Yoshiko
4220bd708b Update README.md 2019-07-11 00:49:23 +08:00
Yoshiko
3802d88972 Update README.md 2019-07-11 00:46:22 +08:00
Yoshiko
8cddbf1e1b Update README.md 2019-07-11 00:41:40 +08:00
Yoshiko
332326e5f6 Update README.md 2019-07-09 18:52:36 +08:00
Yoshiko
27f64a81d0 Update README.md 2019-07-09 17:57:09 +08:00
Yoshiko
7e3fa5ade8 Update README.md 2019-07-09 17:56:48 +08:00
Yoshiko
cc362a2a26 Beta 11.5 Update 2019-07-09 17:47:43 +08:00
Yoshiko
dde6167b05 Update update_check.json 2019-07-09 17:47:02 +08:00
Yoshiko
fe69f42f92 Update README.md 2019-07-09 17:11:09 +08:00
Yoshiko
6b050cef43 Update README.md 2019-07-09 17:09:32 +08:00
Yoshiko
c721c3c769 Update README.md 2019-07-09 16:51:06 +08:00
Yoshiko
9f8702ca12 Update README.md 2019-07-09 16:50:35 +08:00
Yoshiko
153b3a35b8 Update README.md 2019-07-09 15:58:44 +08:00
Yoshiko
88e543a16f Update README.md 2019-07-09 13:51:52 +08:00
Yoshiko
5906af6d95 Update README.md 2019-07-09 13:43:09 +08:00
Yoshiko
39953f1870 Update README.md 2019-07-09 13:17:41 +08:00
Yoshiko
047618a0df Update README.md 2019-07-09 01:55:43 +08:00
Yoshiko
2da51a51d0 Update README.md 2019-07-09 01:45:35 +08:00
Yoshiko
8c0e0a296d Update README.md 2019-07-09 01:45:05 +08:00
Yoshiko
ce0ac607c2 Update README.md 2019-07-04 14:41:29 +08:00
Yoshiko
f0437cf6af Delete py to exe.bat 2019-07-04 03:01:18 +08:00
Yoshiko
32bfc57eed Update README.md 2019-07-04 02:58:48 +08:00
Yoshiko
909ca96915 Update README.md 2019-07-04 02:57:21 +08:00
Yoshiko
341ab5b2bf Update README.md 2019-07-04 02:55:09 +08:00
Yoshiko
d899a19419 Update README.md 2019-07-04 02:54:24 +08:00
Yoshiko
61b0bc40de Update README.md 2019-07-04 02:42:31 +08:00
Yoshiko
6fde3f98dd Delete proxy.ini 2019-07-04 02:26:42 +08:00
Yoshiko
838eb9c8db Update config.ini 2019-07-04 02:26:23 +08:00
Yoshiko
687bbfce10 Update update_check.json 2019-07-04 02:26:00 +08:00
Yoshiko
4b35113932 Beta 11.4 Update 2019-07-04 02:25:40 +08:00
Yoshiko
d672d4d0d7 Update README.md 2019-07-04 02:23:57 +08:00
Yoshiko
1d3845bb91 Update README.md 2019-07-04 02:22:06 +08:00
wenead99
e5effca854 Update README.md 2019-06-30 18:25:54 +08:00
wenead99
bae82898da Update README.md 2019-06-30 02:04:22 +08:00
wenead99
2e8e7151e3 Update README.md 2019-06-30 02:01:17 +08:00
wenead99
8db74bc34d Update README.md 2019-06-30 01:00:50 +08:00
wenead99
e18392d7d3 Update README.md 2019-06-30 00:58:08 +08:00
wenead99
e4e32c06df Update README.md 2019-06-30 00:54:56 +08:00
wenead99
09802c5632 Update README.md 2019-06-30 00:52:43 +08:00
wenead99
584db78fd0 Update README.md 2019-06-30 00:44:46 +08:00
wenead99
56a41604cb Update AV_Data_Capture.py 2019-06-29 19:03:27 +08:00
wenead99
8228084a1d Update README.md 2019-06-29 18:58:39 +08:00
wenead99
f16def5f3a Update update_check.json 2019-06-29 18:49:30 +08:00
wenead99
c0303a57a1 Beta 11.2 Update 2019-06-29 18:43:45 +08:00
wenead99
07c8a7fb0e Update README.md 2019-06-29 17:02:03 +08:00
wenead99
71691e1fe9 Beta 11.1 Update 2019-06-29 16:19:58 +08:00
wenead99
e2569e4541 Add files via upload 2019-06-29 10:37:29 +08:00
wenead99
51385491de Add files via upload 2019-06-29 10:34:40 +08:00
wenead99
bb049714cf Update update_check.json 2019-06-29 10:30:41 +08:00
wenead99
5dcaa20a6c Update README.md 2019-06-28 23:29:38 +08:00
wenead99
26652bf2ed Update README.md 2019-06-24 15:12:22 +08:00
wenead99
352d2fa28a Update README.md 2019-06-24 15:09:48 +08:00
wenead99
ff5ac0d599 Update README.md 2019-06-24 15:08:32 +08:00
wenead99
f34888d2e7 Update README.md 2019-06-23 14:27:39 +08:00
wenead99
f609e647b5 Update README.md 2019-06-23 14:26:27 +08:00
wenead99
ffc280a01c Update README.md 2019-06-23 14:24:13 +08:00
wenead99
fee0ae95b3 Update README.md 2019-06-23 11:18:26 +08:00
wenead99
cd7e254d2e Update README.md 2019-06-23 11:11:32 +08:00
wenead99
ce2995123d Update README.md 2019-06-23 01:08:27 +08:00
wenead99
46e676b592 Update README.md 2019-06-23 01:08:06 +08:00
wenead99
a435d645e4 Update README.md 2019-06-23 01:00:57 +08:00
wenead99
76eecd1e6f Update README.md 2019-06-23 01:00:33 +08:00
wenead99
3c296db204 Update README.md 2019-06-23 00:57:01 +08:00
wenead99
7d6408fe29 Update README.md 2019-06-23 00:56:44 +08:00
wenead99
337c84fd1c Update README.md 2019-06-23 00:55:02 +08:00
wenead99
ad220c1ca6 Update README.md 2019-06-23 00:54:48 +08:00
wenead99
37df711cdc Update README.md 2019-06-23 00:54:28 +08:00
wenead99
92dd9cb734 Update README.md 2019-06-23 00:51:40 +08:00
wenead99
64445b5105 Update README.md 2019-06-23 00:46:11 +08:00
wenead99
bfdb094ee3 Update README.md 2019-06-23 00:35:35 +08:00
wenead99
b38942a326 Update README.md 2019-06-23 00:34:55 +08:00
wenead99
7d03a1f7f9 Update README.md 2019-06-23 00:34:12 +08:00
wenead99
f9c0df7e06 Update README.md 2019-06-23 00:32:30 +08:00
wenead99
b1783d8c75 Update AV_Data_Capture.py 2019-06-22 19:22:23 +08:00
wenead99
908da6d006 Add files via upload 2019-06-22 19:20:54 +08:00
wenead99
9ec99143d4 Update update_check.json 2019-06-22 16:16:45 +08:00
wenead99
575a710ef8 Beta 10.6 update 2019-06-22 16:16:18 +08:00
wenead99
7c16307643 Update README.md 2019-06-22 16:11:07 +08:00
wenead99
e816529260 Update README.md 2019-06-22 16:10:40 +08:00
wenead99
8282e59a39 Update README.md 2019-06-22 16:08:20 +08:00
wenead99
a96bdb8d13 Update README.md 2019-06-22 16:05:29 +08:00
wenead99
f7f1c3e871 Update README.md 2019-06-22 16:05:01 +08:00
wenead99
632250083f Update README.md 2019-06-22 16:04:18 +08:00
wenead99
0ebfe43133 Update README.md 2019-06-22 16:03:03 +08:00
wenead99
bb367fe79e Update README.md 2019-06-22 15:56:56 +08:00
wenead99
3a4d405c8e Update README.md 2019-06-22 15:53:30 +08:00
wenead99
8f8adcddbb Update README.md 2019-06-22 15:52:06 +08:00
wenead99
394c831b05 Update README.md 2019-06-22 15:47:53 +08:00
wenead99
bb8b3a3bc3 Update update_check.json 2019-06-22 13:19:10 +08:00
wenead99
6c5c932b98 Fix the invalid-directory-name bug caused by the ini file 2019-06-22 13:16:37 +08:00
wenead99
9a151a5d4c Update README.md 2019-06-22 01:44:28 +08:00
wenead99
f24595687b Beta 10.5 update 2019-06-22 01:29:42 +08:00
wenead99
aa130d2d25 Update README.md 2019-06-22 01:18:44 +08:00
wenead99
bccc49508e Update README.md 2019-06-22 01:12:33 +08:00
wenead99
ad6db7ca97 Update README.md 2019-06-22 01:05:15 +08:00
wenead99
b95d35d6fa Update README.md 2019-06-22 01:04:38 +08:00
wenead99
3bf0cf5fbc Update README.md 2019-06-22 00:58:28 +08:00
wenead99
dbdc0c818d Update README.md 2019-06-22 00:57:45 +08:00
wenead99
e156c34e23 Update README.md 2019-06-22 00:55:46 +08:00
wenead99
ee782e3794 Update README.md 2019-06-22 00:55:01 +08:00
wenead99
90aa77a23a Update AV_Data_Capture.py 2019-06-22 00:46:43 +08:00
wenead99
d4251c8b44 Beta 10.5 update 2019-06-22 00:46:06 +08:00
wenead99
6f684e67e2 Beta 0.15 update 2019-06-22 00:34:36 +08:00
wenead99
18cf202b5b Update README.md 2019-06-21 23:59:15 +08:00
wenead99
54b2b71472 Update README.md 2019-06-21 23:58:12 +08:00
wenead99
44ba47bafc Update README.md 2019-06-21 23:55:23 +08:00
wenead99
7eb72634d8 Update README.md 2019-06-21 20:07:44 +08:00
wenead99
5787d3470a Update README.md 2019-06-21 20:05:53 +08:00
wenead99
1fce045ac2 Update README.md 2019-06-21 20:05:09 +08:00
wenead99
794aa74782 Update README.md 2019-06-21 20:03:07 +08:00
wenead99
b2e49a99a7 Update README.md 2019-06-21 20:01:58 +08:00
wenead99
d208d53375 Update README.md 2019-06-21 20:00:15 +08:00
wenead99
7158378eca Update README.md 2019-06-21 19:59:55 +08:00
wenead99
0961d8cbe4 Update README.md 2019-06-21 19:59:41 +08:00
wenead99
6ef5d11742 Update README.md 2019-06-21 19:57:03 +08:00
wenead99
45e1d8370c Beta 10.4 update 2019-06-21 18:27:21 +08:00
wenead99
420f995977 Update README.md 2019-06-21 18:26:25 +08:00
wenead99
dbe1f91bd9 Update README.md 2019-06-21 18:23:59 +08:00
wenead99
770c5fcb1f Update update_check.json 2019-06-21 17:54:41 +08:00
wenead99
665d1ffe43 Beta 10.4 2019-06-21 15:40:02 +08:00
wenead99
14ed221152 Update README.md 2019-06-21 10:53:34 +08:00
wenead99
c41b9c1e32 Update README.md 2019-06-21 10:16:14 +08:00
wenead99
17d4d68cbe Update README.md 2019-06-21 10:00:25 +08:00
wenead99
b5a23fe430 Beta 10.3 Update 2019.6.20 2019-06-21 00:03:43 +08:00
wenead99
2747be4a21 Update README.md 2019-06-20 20:49:40 +08:00
wenead99
02da503a2f Update update_check.json 2019-06-20 19:13:38 +08:00
wenead99
31c5d5c314 Update update_check.json 2019-06-20 19:10:28 +08:00
wenead99
22e5b9aa44 Update update_check.json 2019-06-20 19:07:42 +08:00
wenead99
400e8c9678 Update update_check.json 2019-06-20 19:03:24 +08:00
wenead99
b06e744c0c Beta 0.10.3 update check 2019-06-19 20:53:10 +08:00
wenead99
ddbfe7765b Beta 10.3 update check 2019-06-19 20:50:44 +08:00
wenead99
c0f47fb712 Update README.md 2019-06-19 18:22:31 +08:00
wenead99
7b0e8bf5f7 Beta 10.2 Update 2019-06-19 18:21:19 +08:00
wenead99
fa8ea58fe6 Beta 10.2 Update 2019-06-19 18:20:30 +08:00
wenead99
8c824e5d29 Beta 10.2 Update 2019-06-19 18:20:02 +08:00
wenead99
764fba74ec Beta 10.2 Update 2019-06-19 18:19:34 +08:00
wenead99
36c436772c Update README.md 2019-06-19 13:43:04 +08:00
wenead99
897a621adc Update README.md 2019-06-19 13:42:19 +08:00
wenead99
1f5802cdb4 Update README.md 2019-06-19 13:41:05 +08:00
wenead99
0a57e2bab6 Update README.md 2019-06-19 11:03:44 +08:00
wenead99
3ddfe94f2b Update README.md 2019-06-19 11:02:31 +08:00
wenead99
c6fd5ac565 Update README.md 2019-06-19 00:05:01 +08:00
wenead99
2a7cdcf12d Update README.md 2019-06-18 23:56:34 +08:00
wenead99
759e546534 Beta 10.1 Fix the FC2 metadata extraction exception 2019-06-18 18:11:04 +08:00
wenead99
222337a5f0 Fix the FC2 extraction exception 2019-06-18 18:02:01 +08:00
wenead99
9fb6122a9d Update AV_Data_Capture.py 2019-06-18 16:58:32 +08:00
wenead99
9f0c01d62e Update README.md 2019-06-18 16:57:39 +08:00
wenead99
6ed79d8fcb Update README.md 2019-06-18 16:56:22 +08:00
wenead99
abb53c3219 Update README.md 2019-06-18 16:55:43 +08:00
wenead99
6578d807ca Update README.md 2019-06-18 16:55:10 +08:00
wenead99
e9acd32fd7 Update README.md 2019-06-18 16:54:49 +08:00
wenead99
0c64165b49 Update README.md 2019-06-18 16:53:45 +08:00
wenead99
6278659e55 Update README.md 2019-06-18 16:53:11 +08:00
wenead99
ca2c97a98f Update README.md 2019-06-17 23:45:00 +08:00
wenead99
164cc464dc Update README.md 2019-06-17 23:40:17 +08:00
wenead99
faa99507ad Update README.md 2019-06-17 19:11:54 +08:00
wenead99
d7a48d2829 Update README.md 2019-06-17 19:11:35 +08:00
wenead99
c40936f1c4 Update README.md 2019-06-17 19:10:22 +08:00
wenead99
38b26d4161 Update README.md 2019-06-17 19:09:55 +08:00
wenead99
e17dffba4e Update README.md 2019-06-17 18:34:26 +08:00
wenead99
ae1a91bf28 Update README.md 2019-06-17 18:31:46 +08:00
wenead99
208c24b606 Update README.md 2019-06-17 18:31:10 +08:00
wenead99
751450ebad Update README.md 2019-06-17 18:30:46 +08:00
wenead99
e429ca3c7d Update README.md 2019-06-17 18:29:31 +08:00
wenead99
9e26558666 Update README.md 2019-06-17 18:26:11 +08:00
wenead99
759b30ec5c Update README.md 2019-06-17 18:24:20 +08:00
wenead99
b7c195b76e Update README.md 2019-06-17 18:17:37 +08:00
wenead99
7038fcf8ed Update README.md 2019-06-17 18:12:38 +08:00
wenead99
54041313dc Add files via upload 2019-06-17 18:04:04 +08:00
wenead99
47a29f6628 Update README.md 2019-06-17 18:03:14 +08:00
wenead99
839610d230 Update README.md 2019-06-17 16:53:03 +08:00
wenead99
a0b324c1a8 Update README.md 2019-06-17 16:52:23 +08:00
wenead99
1996807702 Add files via upload 2019-06-17 16:28:07 +08:00
wenead99
e91b7a85bf 0.10 Beta10 Update 2019-06-17 16:14:17 +08:00
wenead99
dddaf5c74f Update README.md 2019-06-16 17:08:58 +08:00
wenead99
2a3935b221 Update README.md 2019-06-16 17:07:36 +08:00
wenead99
a5becea6c9 Update README.md 2019-06-16 15:39:06 +08:00
wenead99
1381b66619 Update README.md 2019-06-16 12:40:21 +08:00
wenead99
eb946d948f Update 0.9 2019-06-15 20:40:13 +08:00
wenead99
46087ba886 Update README.md 2019-06-11 19:10:57 +08:00
wenead99
f8764d1b81 Update README.md 2019-06-11 19:10:01 +08:00
wenead99
b9095452da Update README.md 2019-06-11 19:09:34 +08:00
wenead99
be8d23e782 Update README.md 2019-06-11 19:08:45 +08:00
77 changed files with 42144 additions and 1061 deletions

1
.gitattributes vendored Normal file

@@ -0,0 +1 @@
*.py text=auto eol=lf
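
This single attribute is what the line-ending work above (#155, #168, #180) converged on: Git stores and checks out every tracked .py file with LF endings. A minimal verification sketch, assuming it is run from the repository root:

# Check that no tracked .py file still contains CR+LF endings.
# Assumption: run from the repository root after a fresh checkout.
from pathlib import Path

for path in Path(".").rglob("*.py"):
    if b"\r\n" in path.read_bytes():
        print(f"CRLF remains in {path}")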

3
.github/FUNDING.yml vendored Normal file

@@ -0,0 +1,3 @@
# These are supported funding model platforms
custom: ['https://i.postimg.cc/qBmD1v9p/donate.png']


@@ -0,0 +1,37 @@
---
name: Bug report BUG报告
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug 错误描述**
Describe clearly and concisely what the error is
清晰简洁地描述错误是什么
**To Reproduce 如何重现BUG**
Steps to reproduce the behavior:
重现行为的步骤:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior 预期结果**
A clear and concise description of what you expected to happen.
对您期望发生的事情进行清晰简洁的描述
**Screenshots BUG发生截图**
**Logs 日志**
Copy all or key content of the file to this
复制文件的全部或关键内容到此
Logs location:
Windows : C:/Users/username/.mlogs/
Linux/MacOS/BSD: /home/username/.mlogs/
**Running Env 运行环境**
- OS :
- Python Version : 3.x


@@ -0,0 +1,16 @@
---
name: Feature request 新功能建议
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
For the new functions you need, you can contact me through the contact information below for a paid solution. I will respond as soon as possible and deliver it in the next version.
对于需要的新功能,可通过下方联系方式联系我有偿解决(支持众筹),我会尽快响应,会在下一个版本交付
# 联系方式 Contact
* Email : yoshiko2.dev@gmail.com
* Telegram : https://t.me/yoshiko2

103
.github/workflows/main.yml vendored Normal file

@@ -0,0 +1,103 @@
name: PyInstaller
on:
push:
branches:
- master
pull_request:
branches:
- master
jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [windows-latest, macos-latest, ubuntu-latest]
steps:
- uses: actions/checkout@v2
- name: Install UPX
uses: crazy-max/ghaction-upx@v2
if: matrix.os == 'windows-latest' || matrix.os == 'ubuntu-latest'
with:
install-only: true
- name: UPX version
if: matrix.os == 'windows-latest' || matrix.os == 'ubuntu-latest'
run: upx --version
- name: Setup Python 3.10
uses: actions/setup-python@v2
with:
python-version: '3.10'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install face_recognition --no-deps
pip install pyinstaller
- name: Test number_parser.get_number
run: |
python number_parser.py -v
- name: Build with PyInstaller for macos/ubuntu
if: matrix.os == 'macos-latest' || matrix.os == 'ubuntu-latest'
run: |
pyinstaller \
--onefile Movie_Data_Capture.py \
--python-option u \
--hidden-import "ImageProcessing.cnn" \
--add-data "$(python -c 'import cloudscraper as _; print(_.__path__[0])' | tail -n 1):cloudscraper" \
--add-data "$(python -c 'import opencc as _; print(_.__path__[0])' | tail -n 1):opencc" \
--add-data "$(python -c 'import face_recognition_models as _; print(_.__path__[0])' | tail -n 1):face_recognition_models" \
--add-data "Img:Img" \
--add-data "scrapinglib:scrapinglib" \
--add-data "config.ini:." \
- name: Build with PyInstaller for windows
if: matrix.os == 'windows-latest'
run: |
pyinstaller `
--onefile Movie_Data_Capture.py `
--python-option u `
--hidden-import "ImageProcessing.cnn" `
--add-data "$(python -c 'import cloudscraper as _; print(_.__path__[0])' | tail -n 1);cloudscraper" `
--add-data "$(python -c 'import opencc as _; print(_.__path__[0])' | tail -n 1);opencc" `
--add-data "$(python -c 'import face_recognition_models as _; print(_.__path__[0])' | tail -n 1);face_recognition_models" `
--add-data "Img;Img" `
--add-data "scrapinglib;scrapinglib" `
--add-data "config.ini;." `
- name: Copy config.ini
run: |
cp config.ini dist/
- name: Set VERSION variable for macos/ubuntu
if: matrix.os == 'macos-latest' || matrix.os == 'ubuntu-latest'
run: |
echo "VERSION=$(python Movie_Data_Capture.py --version)" >> $GITHUB_ENV
- name: Set VERSION variable for windows
if: matrix.os == 'windows-latest'
run: |
echo "VERSION=$(python Movie_Data_Capture.py --version)" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
- name: Upload build artifact
uses: actions/upload-artifact@v1
with:
name: MDC-${{ env.VERSION }}-${{ runner.os }}-amd64
path: dist
- name: Run test (Ubuntu & MacOS)
if: matrix.os == 'ubuntu-latest' || matrix.os == 'macos-latest'
run: |
cd dist
touch IPX-292.mp4
touch STAR-437-C.mp4
touch 122922_001.mp4
./Movie_Data_Capture
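
The $(python -c 'import ... print(_.__path__[0])') fragments above resolve each package's installed location at build time (the "dynamically fetched paths" change from #236), so PyInstaller bundles the package's data files no matter where pip placed them. A sketch of what one fragment evaluates to, assuming cloudscraper is installed:

# Prints the directory pip installed cloudscraper into, e.g.
# .../site-packages/cloudscraper; PyInstaller copies that directory into the
# bundle under the name given after ':' (or ';' on Windows).
import cloudscraper
print(cloudscraper.__path__[0])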

10
.gitignore vendored

@@ -102,3 +102,13 @@ venv.bak/
# mypy
.mypy_cache/
# movie files
*.mp4
# success/failed folder
JAV_output/**/*
failed/*
.vscode/launch.json
.idea

33
.vscode/launch.json vendored Normal file

@@ -0,0 +1,33 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python: 当前文件",
"type": "python",
"request": "launch",
"console": "integratedTerminal",
"env": {
"PYTHONIOENCODING": "utf-8"
},
"program": "${workspaceFolder}/Movie_Data_capture.py",
"program1": "${workspaceFolder}/WebCrawler/javbus.py",
"program2": "${workspaceFolder}/WebCrawler/javdb.py",
"program3": "${workspaceFolder}/WebCrawler/xcity.py",
"program4": "${workspaceFolder}/number_parser.py",
"program5": "${workspaceFolder}/config.py",
"cwd0": "${fileDirname}",
"cwd1": "${workspaceFolder}/dist",
"cwd2": "${env:HOME}${env:USERPROFILE}/.mdc",
"args0": ["-a","-p","J:/Downloads","-o","J:/log"],
"args1": ["-g","-m","3","-c","1","-d","0"],
"args2": ["-igd0", "-m3", "-p", "J:/output", "-q", "121220_001"],
"args3": ["-agd0","-m3", "-q", ".*","-p","J:/#output"],
"args4": ["-gic1", "-d0", "-m3", "-o", "avlog", "-p", "I:/output"],
"args5": ["-gic1", "-d0", "-m1", "-o", "avlog", "-p", "J:/Downloads"],
"args6": ["-z", "-o", "J:/log"]
}
]
}


@@ -1,10 +1,602 @@
# build-in lib
import os.path
import os
import re
import uuid
import json
import time
import typing
from unicodedata import category
from concurrent.futures import ThreadPoolExecutor
# third party lib
import requests
from requests.adapters import HTTPAdapter
import mechanicalsoup
from pathlib import Path
from urllib3.util.retry import Retry
from lxml import etree
from cloudscraper import create_scraper
# project wide
import config
def get_xpath_single(html_code: str, xpath):
html = etree.fromstring(html_code, etree.HTMLParser())
result1 = str(html.xpath(xpath)).strip(" ['']")
return result1
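# Usage sketch for get_xpath_single(), with hypothetical markup: the xpath
# result list is flattened into a single stripped string.
#   code = "<html><body><span class='tag'>IPX-292</span></body></html>"
#   get_xpath_single(code, "//span[@class='tag']/text()")  # -> 'IPX-292'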
G_USER_AGENT = r'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.133 Safari/537.36'
def get_html(url, cookies: dict = None, ua: str = None, return_type: str = None, encoding: str = None, json_headers=None):
"""
Core web-request function
"""
verify = config.getInstance().cacert_file()
config_proxy = config.getInstance().proxy()
errors = ""
headers = {"User-Agent": ua or G_USER_AGENT} # noqa
if json_headers is not None:
headers.update(json_headers)
for i in range(config_proxy.retry):
try:
if config_proxy.enable:
proxies = config_proxy.proxies()
result = requests.get(str(url), headers=headers, timeout=config_proxy.timeout, proxies=proxies,
verify=verify,
cookies=cookies)
else:
result = requests.get(str(url), headers=headers, timeout=config_proxy.timeout, cookies=cookies)
if return_type == "object":
return result
elif return_type == "content":
return result.content
else:
result.encoding = encoding or result.apparent_encoding
return result.text
except Exception as e:
print("[-]Connect retry {}/{}".format(i + 1, config_proxy.retry))
errors = str(e)
if "getaddrinfo failed" in errors:
print("[-]Connect Failed! Please Check your proxy config")
debug = config.getInstance().debug()
if debug:
print("[-]" + errors)
else:
print("[-]" + errors)
print('[-]Connect Failed! Please check your Proxy or Network!')
raise Exception('Connect Failed')
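# Usage sketch for get_html(), with a hypothetical URL; a loaded config.ini
# supplies the proxy/retry/CA settings. return_type selects the result form:
#   get_html("https://example.com")                         # decoded text (default)
#   get_html("https://example.com", return_type="object")   # requests.Response
#   get_html("https://example.com", return_type="content")  # raw bytes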
def post_html(url: str, query: dict, headers: dict = None) -> requests.Response:
config_proxy = config.getInstance().proxy()
errors = ""
headers_ua = {"User-Agent": G_USER_AGENT}
if headers is None:
headers = headers_ua
else:
headers.update(headers_ua)
for i in range(config_proxy.retry):
try:
if config_proxy.enable:
proxies = config_proxy.proxies()
result = requests.post(url, data=query, proxies=proxies, headers=headers, timeout=config_proxy.timeout)
else:
result = requests.post(url, data=query, headers=headers, timeout=config_proxy.timeout)
return result
except Exception as e:
print("[-]Connect retry {}/{}".format(i + 1, config_proxy.retry))
errors = str(e)
print("[-]Connect Failed! Please check your Proxy or Network!")
print("[-]" + errors)
G_DEFAULT_TIMEOUT = 10 # seconds
class TimeoutHTTPAdapter(HTTPAdapter):
def __init__(self, *args, **kwargs):
self.timeout = G_DEFAULT_TIMEOUT
if "timeout" in kwargs:
self.timeout = kwargs["timeout"]
del kwargs["timeout"]
super().__init__(*args, **kwargs)
def send(self, request, **kwargs):
timeout = kwargs.get("timeout")
if timeout is None:
kwargs["timeout"] = self.timeout
return super().send(request, **kwargs)
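# Design note: requests.Session has no session-wide timeout, so this adapter
# injects G_DEFAULT_TIMEOUT into every send() unless the caller passes one.
# Minimal standalone mount, as a sketch:
#   s = requests.Session()
#   s.mount("https://", TimeoutHTTPAdapter(timeout=10, max_retries=3))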
# with keep-alive feature
def get_html_session(url: str = None, cookies: dict = None, ua: str = None, return_type: str = None,
encoding: str = None):
config_proxy = config.getInstance().proxy()
session = requests.Session()
if isinstance(cookies, dict) and len(cookies):
requests.utils.add_dict_to_cookiejar(session.cookies, cookies)
retries = Retry(total=config_proxy.retry, connect=config_proxy.retry, backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
session.mount("http://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
if config_proxy.enable:
session.verify = config.getInstance().cacert_file()
session.proxies = config_proxy.proxies()
headers = {"User-Agent": ua or G_USER_AGENT}
session.headers = headers
try:
if isinstance(url, str) and len(url):
result = session.get(str(url))
else:  # an empty url argument returns the reusable session object directly; no need to set return_type
return session
if not result.ok:
return None
if return_type == "object":
return result
elif return_type == "content":
return result.content
elif return_type == "session":
return result, session
else:
result.encoding = encoding or "utf-8"
return result.text
except requests.exceptions.ProxyError:
print("[-]get_html_session() Proxy error! Please check your Proxy")
except requests.exceptions.RequestException:
pass
except Exception as e:
print(f"[-]get_html_session() failed. {e}")
return None
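# Usage sketch: calling get_html_session() with no url returns the configured
# Session itself (retries, timeout, proxy, keep-alive) for reuse:
#   session = get_html_session()
#   if session is not None:
#       text = session.get("https://example.com").text  # hypothetical target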
def get_html_by_browser(url: str = None, cookies: dict = None, ua: str = None, return_type: str = None,
encoding: str = None, use_scraper: bool = False):
config_proxy = config.getInstance().proxy()
s = create_scraper(browser={'custom': ua or G_USER_AGENT, }) if use_scraper else requests.Session()
if isinstance(cookies, dict) and len(cookies):
requests.utils.add_dict_to_cookiejar(s.cookies, cookies)
retries = Retry(total=config_proxy.retry, connect=config_proxy.retry, backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504])
s.mount("https://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
s.mount("http://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
if config_proxy.enable:
s.verify = config.getInstance().cacert_file()
s.proxies = config_proxy.proxies()
try:
browser = mechanicalsoup.StatefulBrowser(user_agent=ua or G_USER_AGENT, session=s)
if isinstance(url, str) and len(url):
result = browser.open(url)
else:
return browser
if not result.ok:
return None
if return_type == "object":
return result
elif return_type == "content":
return result.content
elif return_type == "browser":
return result, browser
else:
result.encoding = encoding or "utf-8"
return result.text
except requests.exceptions.ProxyError:
print("[-]get_html_by_browser() Proxy error! Please check your Proxy")
except Exception as e:
print(f'[-]get_html_by_browser() Failed! {e}')
return None
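# Usage sketch: return_type="browser" also hands back the StatefulBrowser so
# the caller can keep navigating; use_scraper=True swaps the backing session
# for cloudscraper to pass Cloudflare checks. Hypothetical target:
#   got = get_html_by_browser("https://example.com", return_type="browser")
#   if got is not None:
#       result, browser = got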
def get_html_by_form(url, form_select: str = None, fields: dict = None, cookies: dict = None, ua: str = None,
return_type: str = None, encoding: str = None):
config_proxy = config.getInstance().proxy()
s = requests.Session()
if isinstance(cookies, dict) and len(cookies):
requests.utils.add_dict_to_cookiejar(s.cookies, cookies)
retries = Retry(total=config_proxy.retry, connect=config_proxy.retry, backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504])
s.mount("https://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
s.mount("http://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
if config_proxy.enable:
s.verify = config.getInstance().cacert_file()
s.proxies = config_proxy.proxies()
try:
browser = mechanicalsoup.StatefulBrowser(user_agent=ua or G_USER_AGENT, session=s)
result = browser.open(url)
if not result.ok:
return None
form = browser.select_form() if form_select is None else browser.select_form(form_select)
if isinstance(fields, dict):
for k, v in fields.items():
browser[k] = v
response = browser.submit_selected()
if return_type == "object":
return response
elif return_type == "content":
return response.content
elif return_type == "browser":
return response, browser
else:
            response.encoding = encoding or "utf-8"
return response.text
except requests.exceptions.ProxyError:
print("[-]get_html_by_form() Proxy error! Please check your Proxy")
except Exception as e:
print(f'[-]get_html_by_form() Failed! {e}')
return None
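
# Illustrative sketch (hypothetical URL, selector and field names): fill in and submit
# a search form, then read the resulting HTML.
#
#   html = get_html_by_form(
#       "https://example.com/search",     # assumed page containing the form
#       form_select="form#search",        # CSS selector of the form to submit
#       fields={"q": "ABC-123"},          # form input name -> value
#   )                                     # default return_type: response text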
def get_html_by_scraper(url: str = None, cookies: dict = None, ua: str = None, return_type: str = None,
encoding: str = None):
config_proxy = config.getInstance().proxy()
session = create_scraper(browser={'custom': ua or G_USER_AGENT, })
if isinstance(cookies, dict) and len(cookies):
requests.utils.add_dict_to_cookiejar(session.cookies, cookies)
retries = Retry(total=config_proxy.retry, connect=config_proxy.retry, backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
session.mount("http://", TimeoutHTTPAdapter(max_retries=retries, timeout=config_proxy.timeout))
if config_proxy.enable:
session.verify = config.getInstance().cacert_file()
session.proxies = config_proxy.proxies()
try:
if isinstance(url, str) and len(url):
result = session.get(str(url))
        else:  # an empty url returns the reusable scraper object itself; no return_type needed
return session
if not result.ok:
return None
if return_type == "object":
return result
elif return_type == "content":
return result.content
elif return_type == "scraper":
return result, session
else:
result.encoding = encoding or "utf-8"
return result.text
except requests.exceptions.ProxyError:
print("[-]get_html_by_scraper() Proxy error! Please check your Proxy")
except Exception as e:
print(f"[-]get_html_by_scraper() failed. {e}")
return None
# def get_javlib_cookie() -> [dict, str]:
# import cloudscraper
# switch, proxy, timeout, retry_count, proxytype = config.getInstance().proxy()
# proxies = get_proxy(proxy, proxytype)
#
# raw_cookie = {}
# user_agent = ""
#
# # Get __cfduid/cf_clearance and user-agent
# for i in range(retry_count):
# try:
# if switch == 1 or switch == '1':
# raw_cookie, user_agent = cloudscraper.get_cookie_string(
# "http://www.javlibrary.com/",
# proxies=proxies
# )
# else:
# raw_cookie, user_agent = cloudscraper.get_cookie_string(
# "http://www.javlibrary.com/"
# )
# except requests.exceptions.ProxyError:
# print("[-] ProxyError, retry {}/{}".format(i + 1, retry_count))
# except cloudscraper.exceptions.CloudflareIUAMError:
# print("[-] IUAMError, retry {}/{}".format(i + 1, retry_count))
#
# return raw_cookie, user_agent
def translate(
src: str,
target_language: str = config.getInstance().get_target_language(),
engine: str = config.getInstance().get_translate_engine(),
app_id: str = "",
key: str = "",
delay: int = 0,
) -> str:
"""
translate japanese kana to simplified chinese
翻译日语假名到简体中文
:raises ValueError: Non-existent translation engine
"""
trans_result = ""
    # Google Translate truncates Chinese sentences at symbols such as '&', and translating
    # Chinese into Chinese is pointless, so only text containing Japanese kana is translated.
    if not is_japanese(src) and "zh_" in target_language:
        return src
if engine == "google-free":
gsite = config.getInstance().get_translate_service_site()
        if not re.match(r'^translate\.google\.(com|com\.\w{2}|\w{2})$', gsite):
gsite = 'translate.google.cn'
url = (
f"https://{gsite}/translate_a/single?client=gtx&dt=t&dj=1&ie=UTF-8&sl=auto&tl={target_language}&q={src}"
)
result = get_html(url=url, return_type="object")
        if result is None or not result.ok:
print('[-]Google-free translate web API calling failed.')
return ''
translate_list = [i["trans"] for i in result.json()["sentences"]]
trans_result = trans_result.join(translate_list)
elif engine == "azure":
url = "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=" + target_language
headers = {
'Ocp-Apim-Subscription-Key': key,
'Ocp-Apim-Subscription-Region': "global",
'Content-type': 'application/json',
'X-ClientTraceId': str(uuid.uuid4())
}
body = json.dumps([{'text': src}])
result = post_html(url=url, query=body, headers=headers)
translate_list = [i["text"] for i in result.json()[0]["translations"]]
trans_result = trans_result.join(translate_list)
elif engine == "deeplx":
url = config.getInstance().get_translate_service_site()
res = requests.post(f"{url}/translate", json={
'text': src,
'source_lang': 'auto',
'target_lang': target_language,
})
if res.text.strip():
trans_result = res.json().get('data')
else:
raise ValueError("Non-existent translation engine")
time.sleep(delay)
return trans_result
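
# Illustrative sketch: translating a Japanese title via the free Google endpoint.
# In real use target_language/engine default to config.ini values; they are spelled
# out here only for demonstration.
#
#   title_cn = translate("日本語のタイトル", target_language="zh_cn", engine="google-free")
#   print(title_cn)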
def load_cookies(cookie_json_filename: str) -> typing.Tuple[typing.Optional[dict], typing.Optional[str]]:
"""
加载cookie,用于以会员方式访问非游客内容
:filename: cookie文件名。获取cookie方式从网站登录后通过浏览器插件(CookieBro或EdittThisCookie)或者直接在地址栏网站链接信息处都可以复制或者导出cookie内容以JSON方式保存
# 示例: FC2-755670 url https://javdb9.com/v/vO8Mn
# json 文件格式
# 文件名: 站点名.json示例 javdb9.json
# 内容(文件编码:UTF-8)
{
"over18":"1",
"redirect_to":"%2Fv%2FvO8Mn",
"remember_me_token":"***********",
"_jdb_session":"************",
"locale":"zh",
"__cfduid":"*********",
"theme":"auto"
}
"""
filename = os.path.basename(cookie_json_filename)
if not len(filename):
return None, None
path_search_order = (
Path.cwd() / filename,
Path.home() / filename,
Path.home() / f".mdc/{filename}",
Path.home() / f".local/share/mdc/{filename}"
)
cookies_filename = None
try:
for p in path_search_order:
if p.is_file():
cookies_filename = str(p.resolve())
break
if not cookies_filename:
return None, None
return json.loads(Path(cookies_filename).read_text(encoding='utf-8')), cookies_filename
    except:
        print("[-]Failed to read cookies file.")
        return None, None
def file_modification_days(filename: str) -> int:
"""
文件修改时间距此时的天数
"""
mfile = Path(filename)
if not mfile.is_file():
return 9999
mtime = int(mfile.stat().st_mtime)
now = int(time.time())
days = int((now - mtime) / (24 * 60 * 60))
if days < 0:
return 9999
return days
def file_not_exist_or_empty(filepath) -> bool:
return not os.path.isfile(filepath) or os.path.getsize(filepath) == 0
def is_japanese(raw: str) -> bool:
    """
    Simple detection of Japanese kana characters
    """
    return bool(re.search(r'[\u3040-\u309F\u30A0-\u30FF\uFF66-\uFF9F]', raw, re.UNICODE))
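# Examples (kana is detected; CJK-ideograph-only text is not):
#   is_japanese("こんにちは")  -> True
#   is_japanese("中文字符串")  -> False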
def download_file_with_filename(url: str, filename: str, path: str) -> None:
    """
    Download a file from the given url and save it under the given path with the given name
    """
    conf = config.getInstance()
    config_proxy = conf.proxy()
    for i in range(config_proxy.retry):
        try:
            # Both the proxy and non-proxy cases go through get_html(), which already
            # honors the proxy settings, so a single code path suffices.
            if not os.path.exists(path):
                try:
                    os.makedirs(path)
                except:
                    print(f"[-]Fatal error! Can not make folder '{path}'")
                    os._exit(0)
            r = get_html(url=url, return_type='content')
            if r == '':
                print('[-]Movie Download Data not found!')
                return
            with open(os.path.join(path, filename), "wb") as code:
                code.write(r)
            return
        except requests.exceptions.RequestException:
            # covers ProxyError, ConnectTimeout and ConnectionError, which are subclasses
            print('[-]Download : Connect retry {}/{}'.format(i + 1, config_proxy.retry))
        except IOError:
            raise ValueError(f"[-]Create Directory '{path}' failed!")
    print('[-]Connect Failed! Please check your Proxy or Network!')
    raise ValueError('[-]Connect Failed! Please check your Proxy or Network!')
def download_one_file(args) -> str:
"""
download file save to given path from given url
wrapped for map function
"""
(url, save_path, json_headers) = args
if json_headers is not None:
filebytes = get_html(url, return_type='content', json_headers=json_headers['headers'])
else:
filebytes = get_html(url, return_type='content')
if isinstance(filebytes, bytes) and len(filebytes):
with save_path.open('wb') as fpbyte:
if len(filebytes) == fpbyte.write(filebytes):
return str(save_path)
def parallel_download_files(dn_list: typing.Iterable[typing.Sequence], parallel: int = 0, json_headers=None):
"""
download files in parallel 多线程下载文件
用法示例: 2线程同时下载两个不同文件并保存到不同路径路径目录可未创建但需要具备对目标目录和文件的写权限
parallel_download_files([
('https://site1/img/p1.jpg', 'C:/temp/img/p1.jpg'),
('https://site2/cover/n1.xml', 'C:/tmp/cover/n1.xml')
])
:dn_list: 可以是 tuple或者list: ((url1, save_fullpath1),(url2, save_fullpath2),) fullpath可以是str或Path
:parallel: 并行下载的线程池线程数为0则由函数自己决定
"""
mp_args = []
for url, fullpath in dn_list:
if url and isinstance(url, str) and url.startswith('http') \
and fullpath and isinstance(fullpath, (str, Path)) and len(str(fullpath)):
fullpath = Path(fullpath)
fullpath.parent.mkdir(parents=True, exist_ok=True)
mp_args.append((url, fullpath, json_headers))
if not len(mp_args):
return []
if not isinstance(parallel, int) or parallel not in range(1, 200):
parallel = min(5, len(mp_args))
with ThreadPoolExecutor(parallel) as pool:
results = list(pool.map(download_one_file, mp_args))
return results
def delete_all_elements_in_list(string: str, lists: typing.Iterable[str]):
"""
delete same string in given list
"""
new_lists = []
for i in lists:
if i != string:
new_lists.append(i)
return new_lists
def delete_all_elements_in_str(string_delete: str, string: str):
"""
delete same string in given list
"""
for i in string:
if i == string_delete:
string = string.replace(i, "")
return string
# space padding for print-format alignment when the content contains CJK characters
def cn_space(v: str, n: int) -> int:
return n - [category(c) for c in v].count('Lo')
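
# Illustrative sketch: align mixed CJK/ASCII strings in a terminal where each 'Lo'
# glyph occupies two cells, so the effective field width must shrink by one per glyph.
#
#   for name in ("ABC-123", "中文标题"):
#       print(f"{name:<{cn_space(name, 20)}}|")   # both lines end at the same column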
"""
Usage: python ./ADC_function.py https://cn.bing.com/
Purpose: benchmark get_html_session
benchmark get_html_by_scraper
benchmark get_html_by_browser
benchmark get_html
TODO: maybe this should be moved to the unittest directory
"""
if __name__ == "__main__":
import sys, timeit
from http.client import HTTPConnection
def benchmark(times: int, url):
print(f"HTTP GET Benchmark times:{times} url:{url}")
tm = timeit.timeit(f"_ = session1.get('{url}')",
"from __main__ import get_html_session;session1=get_html_session()",
number=times)
print(f' *{tm:>10.5f}s get_html_session() Keep-Alive enable')
tm = timeit.timeit(f"_ = scraper1.get('{url}')",
"from __main__ import get_html_by_scraper;scraper1=get_html_by_scraper()",
number=times)
print(f' *{tm:>10.5f}s get_html_by_scraper() Keep-Alive enable')
tm = timeit.timeit(f"_ = browser1.open('{url}')",
"from __main__ import get_html_by_browser;browser1=get_html_by_browser()",
number=times)
print(f' *{tm:>10.5f}s get_html_by_browser() Keep-Alive enable')
tm = timeit.timeit(f"_ = get_html('{url}')",
"from __main__ import get_html",
number=times)
print(f' *{tm:>10.5f}s get_html()')
# target_url = "https://www.189.cn/"
target_url = "http://www.chinaunicom.com"
HTTPConnection.debuglevel = 1
html_session = get_html_session()
_ = html_session.get(target_url)
HTTPConnection.debuglevel = 0
# times
t = 100
if len(sys.argv) > 1:
target_url = sys.argv[1]
benchmark(t, target_url)

@@ -1,60 +0,0 @@
import glob
import os
import time
import re
def movie_lists():
#MP4
a2 = glob.glob(os.getcwd() + r"\*.mp4")
# AVI
b2 = glob.glob(os.getcwd() + r"\*.avi")
# RMVB
c2 = glob.glob(os.getcwd() + r"\*.rmvb")
# WMV
d2 = glob.glob(os.getcwd() + r"\*.wmv")
# MOV
e2 = glob.glob(os.getcwd() + r"\*.mov")
# MKV
f2 = glob.glob(os.getcwd() + r"\*.mkv")
# FLV
g2 = glob.glob(os.getcwd() + r"\*.flv")
total = a2+b2+c2+d2+e2+f2+g2
return total
def lists_from_test(custom_number):  # movie list for testing
    a = []
    a.append(custom_number)
    return a
def CEF(path):
    files = os.listdir(path)  # list the children (files and folders) of the path
    for file in files:
        try:  # try to remove empty directories; removing a non-empty one raises
            os.removedirs(path + '/' + file)  # remove this empty folder
            print('[+]Deleting empty folder', path + '/' + file)
        except:
            pass
def rreplace(self, old, new, *max):
    # Replace from the right: source string, substring to be replaced, new replacement
    # string, optional maximum number of replacements
    count = len(self)
    if max and str(max[0]).isdigit():
        count = max[0]
    return new.join(self.rsplit(old, count))
if __name__ == '__main__':
    os.chdir(os.getcwd())
    for i in movie_lists():  # iterate the movie list, handing each file to core
        if '_' in i:
            os.rename(re.search(r'[^\\/:*?"<>|\r\n]+$', i).group(), rreplace(re.search(r'[^\\/:*?"<>|\r\n]+$', i).group(), '_', '-', 1))
            i = rreplace(re.search(r'[^\\/:*?"<>|\r\n]+$', i).group(), '_', '-', 1)
        os.system('python core.py' + ' "' + i + '"')  # launch via the .py file (source version)
        # os.system('core.exe' + ' "' + i + '"')  # launch via the .exe file (EXE version)
    print("[*]=====================================")
    print("[!]Cleaning empty folders")
    CEF('JAV_output')
    print("[+]All finished!!!")
    time.sleep(3)

ImageProcessing/__init__.py Normal file (114 lines)
@@ -0,0 +1,114 @@
import sys
sys.path.append('../')
import logging
import os
import config
import importlib
from pathlib import Path
from PIL import Image
import shutil
from ADC_function import file_not_exist_or_empty
def face_crop_width(filename, width, height):
aspect_ratio = config.getInstance().face_aspect_ratio()
    # the poster width is 2/3 of the height, so half the crop width is height/3
    cropWidthHalf = int(height/3)
try:
locations_model = config.getInstance().face_locations_model().lower().split(',')
locations_model = filter(lambda x: x, locations_model)
for model in locations_model:
center, top = face_center(filename, model)
            # use the first model that finds a face
if center:
cropLeft = center-cropWidthHalf
cropRight = center+cropWidthHalf
                # clamp the crop window to the image bounds
if cropLeft < 0:
cropLeft = 0
cropRight = cropWidthHalf * aspect_ratio
elif cropRight > width:
cropLeft = width - cropWidthHalf * aspect_ratio
cropRight = width
return (cropLeft, 0, cropRight, height)
except:
        print('[-]Face not found! ' + filename)
    # fall back to cropping the right side
return (width-cropWidthHalf * aspect_ratio, 0, width, height)
def face_crop_height(filename, width, height):
cropHeight = int(width*3/2)
try:
locations_model = config.getInstance().face_locations_model().lower().split(',')
locations_model = filter(lambda x: x, locations_model)
for model in locations_model:
center, top = face_center(filename, model)
            # use the first model that finds a face
if top:
                # keep the head near the top of the crop
cropTop = top
cropBottom = cropHeight + top
if cropBottom > height:
cropTop = 0
cropBottom = cropHeight
return (0, cropTop, width, cropBottom)
except:
        print('[-]Face not found! ' + filename)
    # fall back to cropping from the top down
return (0, 0, width, cropHeight)
def cutImage(imagecut, path, thumb_path, poster_path, skip_facerec=False):
conf = config.getInstance()
fullpath_fanart = os.path.join(path, thumb_path)
fullpath_poster = os.path.join(path, poster_path)
aspect_ratio = conf.face_aspect_ratio()
if conf.face_aways_imagecut():
imagecut = 1
elif conf.download_only_missing_images() and not file_not_exist_or_empty(fullpath_poster):
return
    # imagecut == 4 marks a censored movie; face recognition is still used to crop the cover
    if imagecut == 1 or imagecut == 4:  # crop the large cover (fanart) into a poster
try:
img = Image.open(fullpath_fanart)
width, height = img.size
            if width/height > 2/3:  # wider than the 2:3 poster ratio
                if imagecut == 4:
                    # crop centered on the detected face
                    img2 = img.crop(face_crop_width(fullpath_fanart, width, height))
                elif skip_facerec:
                    # censored covers default to a right-side crop
                    img2 = img.crop((width - int(height / 3) * aspect_ratio, 0, width, height))
                else:
                    # crop centered on the detected face
                    img2 = img.crop(face_crop_width(fullpath_fanart, width, height))
            elif width/height < 2/3:  # taller than the 2:3 poster ratio
                # face-aware vertical crop
                img2 = img.crop(face_crop_height(fullpath_fanart, width, height))
            else:  # already exactly 2:3
                img2 = img
img2.save(fullpath_poster)
print(f"[+]Image Cutted! {Path(fullpath_poster).name}")
except Exception as e:
print(e)
print('[-]Cover cut failed!')
    elif imagecut == 0:  # copy the cover unchanged
        shutil.copyfile(fullpath_fanart, fullpath_poster)
        print(f"[+]Image Copied! {Path(fullpath_poster).name}")
def face_center(filename, model):
try:
mod = importlib.import_module('.' + model, 'ImageProcessing')
return mod.face_center(filename, model)
    except Exception as e:
        print('[-]Model ' + model + ' failed to find a face in ' + filename)
if config.getInstance().debug() == 1:
logging.error(e)
return (0, 0)
if __name__ == '__main__':
cutImage(1,'z:/t/','p.jpg','o.jpg')
#cutImage(1,'H:\\test\\','12.jpg','test.jpg')

ImageProcessing/cnn.py Normal file (8 lines)
@@ -0,0 +1,8 @@
import sys
sys.path.append('../')
from ImageProcessing.hog import face_center as hog_face_center
def face_center(filename, model):
    # delegate to the shared implementation; face_recognition switches to its CNN
    # detector when model == 'cnn'
    return hog_face_center(filename, model)

ImageProcessing/hog.py Normal file (17 lines)
@@ -0,0 +1,17 @@
import face_recognition
def face_center(filename, model):
image = face_recognition.load_image_file(filename)
face_locations = face_recognition.face_locations(image, 1, model)
    print('[+]Found person [' + str(len(face_locations)) + '] By model ' + model)
    maxRight = 0
    maxTop = 0
    # keep the right-most face center (and the top edge of that face)
    for face_location in face_locations:
        top, right, bottom, left = face_location
        # horizontal center of this face
        x = int((right + left) / 2)
        if x > maxRight:
            maxRight = x
            maxTop = top
    return maxRight, maxTop

BIN Img/4K.png Normal file (binary, 34 KiB, not shown)
BIN Img/HACK.png Normal file (binary, 20 KiB, not shown)
BIN Img/ISO.png Normal file (binary, 43 KiB, not shown)
BIN Img/LEAK.png Normal file (binary, 29 KiB, not shown)
BIN Img/SUB.png Normal file (binary, 13 KiB, not shown)
BIN Img/UMR.png Normal file (binary, 20 KiB, not shown)
BIN Img/UNCENSORED.png Normal file (binary, 11 KiB, not shown)

LICENSE (1047 lines): file diff suppressed because it is too large

Makefile Normal file (38 lines)
@@ -0,0 +1,38 @@
#.PHONY: help prepare-dev test lint run doc
#VENV_NAME?=venv
#VENV_ACTIVATE=. $(VENV_NAME)/bin/activate
#PYTHON=${VENV_NAME}/bin/python3
SHELL = /bin/bash
.DEFAULT: make
make:
@echo "[+]make prepare-dev"
#sudo apt-get -y install python3 python3-pip
pip3 install -r requirements.txt
pip3 install pyinstaller
#@echo "[+]Set CLOUDSCRAPER_PATH variable"
#export cloudscraper_path=$(python3 -c 'import cloudscraper as _; print(_.__path__[0])' | tail -n 1)
@echo "[+]Pyinstaller make"
pyinstaller --onefile Movie_Data_Capture.py --hidden-import ADC_function.py --hidden-import core.py \
--hidden-import "ImageProcessing.cnn" \
--python-option u \
--add-data "`python3 -c 'import cloudscraper as _; print(_.__path__[0])' | tail -n 1`:cloudscraper" \
--add-data "`python3 -c 'import opencc as _; print(_.__path__[0])' | tail -n 1`:opencc" \
--add-data "`python3 -c 'import face_recognition_models as _; print(_.__path__[0])' | tail -n 1`:face_recognition_models" \
--add-data "Img:Img" \
--add-data "config.ini:." \
@echo "[+]Move to bin"
if [ ! -d "./bin" ];then mkdir bin; fi
mv dist/* bin/
cp config.ini bin/
rm -rf dist/
@echo "[+]Clean cache"
@find . -name '*.pyc' -delete
@find . -name '__pycache__' -type d | xargs rm -fr
@find . -name '.pytest_cache' -type d | xargs rm -fr
rm -rf build/

MappingTable/c_number.json Normal file (24750 lines): file diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,411 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Note: this file can be opened and edited by hand in a text editor.
"keyword" matches tag/director/series/studio/label keywords; every name must have a comma on both sides.
When a scraped keyword appears in the list, the word for the configured language (zh_cn/zh_tw/jp) is output.
An output word of "删除" means the keyword is deleted from the corresponding field. -->
<info>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",成人奖,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",觸摸打字,触摸打字,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",10枚組,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",Don Cipote's choice,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",DVD多士爐,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",R-18,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",Vシネマ,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",イメージビデオ(男性),"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",サンプル動画,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",其他,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",放置,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",獨立製作,独立制作,独占配信,配信専用,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",特典ありAVベースボール,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",天堂TV,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",性愛,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",限時降價,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",亞洲女演員,亚洲女演员,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",字幕,中文字幕,中文,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",AV女优,女优,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",HDTV,HD DVD,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",MicroSD,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",R-15,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",UMD,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",VHS,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",愛好,文化,爱好、文化,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",訪問,访问,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",感官作品,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",高畫質,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",高清,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",素人作品,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",友誼,友谊,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",正常,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",蓝光,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",冒險,冒险,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",模擬,模拟,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",年輕女孩,年轻女孩,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",去背影片,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",天賦,天赋,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",形象俱樂部,形象俱乐部,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",懸疑,悬疑,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",亞洲,亚洲,"/>
<a zh_cn="删除" zh_tw="删除" jp="删除" keyword=",ハロウィーンキャンペーン,"/>
<a zh_cn="16小时+" zh_tw="16小時+" jp="16時間以上作品" keyword=",16小時以上作品,16小时以上作品,16時間以上作品,16小时+,16小時+,"/>
<a zh_cn="3D" zh_tw="3D" jp="3D" keyword=",3D,"/>
<a zh_cn="3D卡通" zh_tw="3D卡通" jp="3Dエロアニメ" keyword=",3D卡通,3Dエロアニメ,"/>
<a zh_cn="4K" zh_tw="4K" jp="4K" keyword=",4K,"/>
<a zh_cn="DMM独家" zh_tw="DMM獨家" jp="DMM獨家" keyword=",DMM獨家,DMM独家,DMM專屬,DMM专属,"/>
<a zh_cn="M女" zh_tw="M女" jp="M女" keyword=",M女,"/>
<a zh_cn="SM" zh_tw="SM" jp="SM" keyword=",SM,"/>
<a zh_cn="轻虐" zh_tw="輕虐" jp="微SM" keyword=",微SM,轻虐,輕虐,"/>
<a zh_cn="VR" zh_tw="VR" jp="VR" keyword=",VR,VR専用,高品质VR,ハイクオリティVR,"/>
<a zh_cn="武术格斗" zh_tw="武術格鬥" jp="アクション" keyword=",格鬥家,格斗家,戰鬥行動,战斗行动,アクション,武术格斗,武術格鬥,"/>
<a zh_cn="绝顶高潮" zh_tw="絕頂高潮" jp="アクメ・オーガズム" keyword=",极致·性高潮,アクメ・オーガズム,绝顶高潮,絕頂高潮,"/>
<a zh_cn="运动" zh_tw="運動" jp="アスリート" keyword=",运动员,アスリート,運動,运动,"/>
<a zh_cn="COSPLAY" zh_tw="COSPLAY" jp="COSPLAY" keyword=",COSPLAY,COSPLAY服飾,COSPLAY服饰,アニメ,"/>
<a zh_cn="动画角色" zh_tw="動畫角色" jp="動畫人物" keyword=",动漫,動画,動畫人物,动画人物,动画角色,動畫角色,"/>
<a zh_cn="角色扮演" zh_tw="角色扮演" jp="角色扮演" keyword=",角色扮演者,角色扮演,コスプレ,"/>
<a zh_cn="萝莉Cos" zh_tw="蘿莉Cos" jp="蘿莉Cos" keyword=",蘿莉角色扮演,萝莉角色扮演,萝莉Cos,蘿莉Cos,"/>
<a zh_cn="纯欲" zh_tw="純欲" jp="エロス" keyword=",エロス,纯欲,純欲,"/>
<a zh_cn="御宅族" zh_tw="御宅族" jp="オタク" keyword=",御宅族,オタク,"/>
<a zh_cn="辅助自慰" zh_tw="輔助自慰" jp="オナサポ" keyword=",自慰辅助,オナサポ,辅助自慰,輔助自慰,"/>
<a zh_cn="自慰" zh_tw="自慰" jp="自慰" keyword=",自慰,オナニー,"/>
<a zh_cn="洗浴" zh_tw="洗浴" jp="お風呂" keyword=",淋浴,お風呂,洗浴,洗澡,"/>
<a zh_cn="温泉" zh_tw="溫泉" jp="溫泉" keyword=",温泉,溫泉,"/>
<a zh_cn="寝取" zh_tw="寢取" jp="寝取られ" keyword=",寝取,寢取,寝取られ,寝取り·寝取られ·ntr,寝取り·寝取られ·NTR,"/>
<a zh_cn="老太婆" zh_tw="老太婆" jp="お婆ちゃん" keyword=",お婆ちゃん,老太婆,"/>
<a zh_cn="老年男性" zh_tw="老年男性" jp="お爺ちゃん" keyword=",高龄男,お爺ちゃん,老年男性,"/>
<a zh_cn="接吻" zh_tw="接吻" jp="キス・接吻" keyword=",接吻,キス・接吻,"/>
<a zh_cn="女同接吻" zh_tw="女同接吻" jp="女同接吻" keyword=",女同接吻,"/>
<a zh_cn="介绍影片" zh_tw="介紹影片" jp="コミック雑誌" keyword=",コミック雑誌,介绍影片,介紹影片,"/>
<a zh_cn="心理惊悚" zh_tw="心理驚悚" jp="サイコ・スリラー" keyword=",サイコ・スリラー,心理惊悚,心理驚悚,"/>
<a zh_cn="打屁股" zh_tw="打屁股" jp="スパンキング" keyword=",虐打,スパンキング,打屁股,"/>
<a zh_cn="夫妻交换" zh_tw="夫妻交換" jp="スワッピング・夫婦交換" keyword=",夫妻交换,スワッピング・夫婦交換,夫妻交換,"/>
<a zh_cn="性感" zh_tw="性感" jp="セクシー" keyword=",性感的,性感的x,セクシー,"/>
<a zh_cn="性感内衣" zh_tw="性感内衣" jp="性感内衣" keyword=",性感内衣,內衣,内衣,ランジェリー,"/>
<a zh_cn="养尊处优" zh_tw="養尊處優" jp="セレブ" keyword=",セレブ,养尊处优,養尊處優,"/>
<a zh_cn="拉拉队" zh_tw="拉拉隊" jp="チアガール" keyword=",拉拉队长,チアガール,拉拉隊,"/>
<a zh_cn="假阳具" zh_tw="假陽具" jp="ディルド" keyword=",ディルド,假阳具,假陽具,"/>
<a zh_cn="约会" zh_tw="約會" jp="デート" keyword=",约会,デート,約會,"/>
<a zh_cn="巨根" zh_tw="巨根" jp="デカチン・巨根" keyword=",巨大陰莖,巨大阴茎,デカチン・巨根,"/>
<a zh_cn="不戴套" zh_tw="不戴套" jp="生ハメ" keyword=",不戴套,生ハメ,"/>
<a zh_cn="不穿内裤" zh_tw="不穿內褲" jp="ノーパン" keyword=",无内裤,ノーパン,不穿内裤,不穿內褲,"/>
<a zh_cn="不穿胸罩" zh_tw="不穿胸罩" jp="ノーブラ" keyword=",无胸罩,ノーブラ,不穿胸罩,"/>
<a zh_cn="后宫" zh_tw="後宮" jp="ハーレム" keyword=",ハーレム,后宫,後宮,"/>
<a zh_cn="后入" zh_tw="後入" jp="バック" keyword=",背后,バック,后入,後入,"/>
<a zh_cn="妓女" zh_tw="妓女" jp="ビッチ" keyword=",ビッチ,妓女,风俗女郎(性工作者),"/>
<a zh_cn="感谢祭" zh_tw="感謝祭" jp="ファン感謝・訪問" keyword=",粉丝感谢,ファン感謝・訪問,感谢祭,感謝祭,"/>
<a zh_cn="大保健" zh_tw="大保健" jp="ヘルス・ソープ" keyword=",ヘルス・ソープ,大保健,按摩,マッサージ,"/>
<a zh_cn="按摩棒" zh_tw="按摩棒" jp="按摩棒" keyword=",女優按摩棒,女优按摩棒,按摩棒,电动按摩棒,電動按摩棒,電マ,バイブ,"/>
<a zh_cn="男同性恋" zh_tw="男同性戀" jp="ボーイ ズラブ" keyword=",ボーイズラブ,男同,男同性戀,男同性恋,"/>
<a zh_cn="酒店" zh_tw="酒店" jp="ホテル" keyword=",ホテル,酒店,飯店,"/>
<a zh_cn="酒店小姐" zh_tw="酒店小姐" jp="キャバ嬢" keyword=",キャバ嬢,酒店小姐,"/>
<a zh_cn="妈妈的朋友" zh_tw="媽媽的朋友" jp="ママ友" keyword=",ママ友,妈妈的朋友,媽媽的朋友,"/>
<a zh_cn="喜剧" zh_tw="喜劇" jp="ラブコメ" keyword=",喜剧,爱情喜剧,ラブコメ,喜劇,滑稽模仿,堵嘴·喜劇,整人・喜剧,"/>
<a zh_cn="恶搞" zh_tw="惡搞" jp="パロディ" keyword=",パロディ,惡搞,整人,"/>
<a zh_cn="白眼失神" zh_tw="白眼失神" jp="白目・失神" keyword=",翻白眼・失神,白目・失神,白眼失神,"/>
<a zh_cn="白人" zh_tw="白人" jp="白人" keyword=",白人,"/>
<a zh_cn="招待小姐" zh_tw="招待小姐" jp="受付嬢" keyword=",招待小姐,受付嬢,接待员,"/>
<a zh_cn="薄马赛克" zh_tw="薄馬賽克" jp="薄馬賽克" keyword=",薄馬賽克,薄马赛克,"/>
<a zh_cn="鼻钩" zh_tw="鼻鉤" jp="鼻フック" keyword=",鼻勾,鼻フック,鼻钩,鼻鉤,"/>
<a zh_cn="变性人" zh_tw="變性人" jp="變性者" keyword=",變性者,变性者,变性人,變性人,"/>
<a zh_cn="医院诊所" zh_tw="醫院診所" jp="病院・クリニック" keyword=",医院・诊所,病院・クリニック,医院诊所,醫院診所,"/>
<a zh_cn="社团经理" zh_tw="社團經理" jp="部活・マネージャー" keyword=",社团・经理,部活・マネージャー,社团经理,社團經理,"/>
<a zh_cn="下属·同事" zh_tw="下屬·同事" jp="部下・同僚" keyword=",下属・同事,部下・同僚,下属·同事,下屬·同事,同事,下屬,下属,"/>
<a zh_cn="残忍" zh_tw="殘忍" jp="殘忍" keyword=",殘忍,殘忍畫面,残忍画面,奇異的,奇异的,"/>
<a zh_cn="插入异物" zh_tw="插入異物" jp="插入異物" keyword=",插入異物,插入异物,"/>
<a zh_cn="超乳" zh_tw="超乳" jp="超乳" keyword=",超乳,"/>
<a zh_cn="潮吹" zh_tw="潮吹" jp="潮吹" keyword=",潮吹,潮吹き,"/>
<a zh_cn="男优潮吹" zh_tw="男優潮吹" jp="男の潮吹き" keyword=",男潮吹,男の潮吹き,男优潮吹,男優潮吹,"/>
<a zh_cn="巴士导游" zh_tw="巴士導遊" jp="車掌小姐" keyword=",車掌小姐,车掌小姐,巴士乘务员,巴士乘務員,巴士导游,巴士導遊,バスガイド,"/>
<a zh_cn="熟女" zh_tw="熟女" jp="熟女" keyword=",熟女,成熟的女人,"/>
<a zh_cn="出轨" zh_tw="出軌" jp="出軌" keyword=",出軌,出轨,"/>
<a zh_cn="白天出轨" zh_tw="白天出軌" jp="白天出轨" keyword=",白天出軌,白天出轨,通姦,"/>
<a zh_cn="处男" zh_tw="處男" jp="處男" keyword=",處男,处男,"/>
<a zh_cn="处女" zh_tw="處女" jp="處女" keyword=",處女,处女,処女,童貞,"/>
<a zh_cn="触手" zh_tw="觸手" jp="觸手" keyword=",觸手,触手,"/>
<a zh_cn="胁迫" zh_tw="胁迫" jp="胁迫" keyword=",魔鬼系,粗暴,胁迫,"/>
<a zh_cn="催眠" zh_tw="催眠" jp="催眠" keyword=",催眠,"/>
<a zh_cn="打手枪" zh_tw="打手槍" jp="打手槍" keyword=",手淫,打手枪,打手槍,手コキ,"/>
<a zh_cn="单体作品" zh_tw="單體作品" jp="單體作品" keyword=",单体作品,單體作品,単体作品,AV女优片,"/>
<a zh_cn="荡妇" zh_tw="蕩婦" jp="蕩婦" keyword=",蕩婦,荡妇,"/>
<a zh_cn="搭讪" zh_tw="搭訕" jp="搭訕" keyword=",倒追,女方搭讪,女方搭訕,搭讪,搭訕,ナンパ,"/>
<a zh_cn="女医师" zh_tw="女醫師" jp="女醫師" keyword=",女医师,女醫師,女医,"/>
<a zh_cn="主观视角" zh_tw="主觀視角" jp="主觀視角" keyword=",第一人稱攝影,第一人称摄影,主观视角,主觀視角,第一人称视点,主観,"/>
<a zh_cn="多P" zh_tw="多P" jp="多P" keyword=",多P,"/>
<a zh_cn="恶作剧" zh_tw="惡作劇" jp="惡作劇" keyword=",惡作劇,恶作剧,"/>
<a zh_cn="放尿" zh_tw="放尿" jp="放尿" keyword=",放尿,"/>
<a zh_cn="女服务生" zh_tw="女服務生" jp="ウェイトレス" keyword=",服務生,服务生,女服务生,女服務生,ウェイトレス,"/>
<a zh_cn="蒙面" zh_tw="蒙面" jp="覆面・マスク" keyword=",蒙面・面罩,蒙面・面具,覆面・マスク,"/>
<a zh_cn="肛交" zh_tw="肛交" jp="肛交" keyword=",肛交,アナル,"/>
<a zh_cn="肛内中出" zh_tw="肛內中出" jp="肛內中出" keyword=",肛内中出,肛內中出,"/>
<a zh_cn="个子高" zh_tw="個子高" jp="个子高" keyword=",高,个子高,個子高,"/>
<a zh_cn="高中生" zh_tw="高中生" jp="高中生" keyword=",高中女生,高中生,"/>
<a zh_cn="歌德萝莉" zh_tw="歌德蘿莉" jp="哥德蘿莉" keyword=",歌德萝莉,哥德蘿莉,歌德蘿莉,"/>
<a zh_cn="各种职业" zh_tw="各種職業" jp="各種職業" keyword=",各種職業,各种职业,多種職業,多种职业,職業色々,"/>
<a zh_cn="职业装" zh_tw="職業裝" jp="職業裝" keyword=",OL,洽公服装,职业装,職業裝,ビジネススーツ,"/>
<a zh_cn="女性向" zh_tw="女性向" jp="女性向け" keyword=",給女性觀眾,给女性观众,女性向,女性向け,"/>
<a zh_cn="公主" zh_tw="公主" jp="公主" keyword=",公主,"/>
<a zh_cn="故事集" zh_tw="故事集" jp="故事集" keyword=",故事集,"/>
<a zh_cn="寡妇" zh_tw="寡婦" jp="寡婦" keyword=",寡婦,寡妇,"/>
<a zh_cn="灌肠" zh_tw="灌腸" jp="灌腸" keyword=",灌腸,灌肠,"/>
<a zh_cn="进口" zh_tw="進口" jp="國外進口" keyword=",海外,進口,进口,國外進口,国外进口,"/>
<a zh_cn="流汗" zh_tw="流汗" jp="汗だく" keyword=",流汗,汗だく,"/>
<a zh_cn="共演" zh_tw="共演" jp="合作作品" keyword=",合作作品,共演,"/>
<a zh_cn="和服・丧服" zh_tw="和服・喪服" jp="和服・喪服" keyword=",和服・丧服,和服,喪服,和服、丧服,和服・喪服,和服·丧服,和服·喪服,"/>
<a zh_cn="和服・浴衣" zh_tw="和服・浴衣" jp="和服・浴衣" keyword=",浴衣,和服・浴衣,和服、浴衣,"/>
<a zh_cn="调教・奴隶" zh_tw="調教・奴隸" jp="調教・奴隸" keyword=",奴隸,奴隶,奴隷,調教・奴隷,調教,调教,调教・奴隶,调教·奴隶,調教·奴隸,調教・奴隸."/>
<a zh_cn="黑帮成员" zh_tw="黑幫成員" jp="黑幫成員" keyword=",黑幫成員,黑帮成员,"/>
<a zh_cn="黑人" zh_tw="黑人" jp="黑人演員" keyword=",黑人,黑人演員,黑人演员,黒人男優,"/>
<a zh_cn="护士" zh_tw="護士" jp="ナース" keyword=",護士,护士,ナース,"/>
<a zh_cn="痴汉" zh_tw="痴漢" jp="痴漢" keyword=",痴漢,痴汉,"/>
<a zh_cn="痴女" zh_tw="癡女" jp="癡女" keyword=",花癡,痴女,癡女,"/>
<a zh_cn="新娘" zh_tw="新娘" jp="新娘" keyword=",花嫁,新娘,新娘,年輕妻子,新娘、年轻妻子,新娘、年輕妻子,新娘、少妇,新娘、少婦,花嫁・若妻,"/>
<a zh_cn="少妇" zh_tw="少婦" jp="少婦" keyword=",少妇,少婦,"/>
<a zh_cn="妄想" zh_tw="妄想" jp="妄想" keyword=",幻想,妄想,妄想族,"/>
<a zh_cn="肌肉" zh_tw="肌肉" jp="肌肉" keyword=",肌肉,"/>
<a zh_cn="及膝袜" zh_tw="及膝襪" jp="及膝襪" keyword=",及膝襪,及膝袜,"/>
<a zh_cn="纪录片" zh_tw="紀錄片" jp="纪录片" keyword=",紀錄片,纪录片,"/>
<a zh_cn="家庭教师" zh_tw="家庭教師" jp="家庭教師" keyword=",家教,家庭教师,家庭教師,"/>
<a zh_cn="娇小" zh_tw="嬌小" jp="嬌小的" keyword=",迷你系,迷你係列,娇小,嬌小,瘦小身型,嬌小的,迷你系‧小隻女,ミニ系・小柄,"/>
<a zh_cn="性教学" zh_tw="性教學" jp="性教學" keyword=",教學,教学,性教学,性教學,"/>
<a zh_cn="姐姐" zh_tw="姐姐" jp="姐姐" keyword=",姐姐,姐姐系,お姉さん,"/>
<a zh_cn="姐·妹" zh_tw="姐·妹" jp="姐·妹" keyword=",妹妹,姐妹,姐·妹,姊妹,"/>
<a zh_cn="穿衣幹砲" zh_tw="穿衣幹砲" jp="着エロ" keyword=",穿衣幹砲,着エロ,"/>
<a zh_cn="紧缚" zh_tw="緊縛" jp="緊縛" keyword=",緊縛,紧缚,縛り・緊縛,紧缚,"/>
<a zh_cn="紧身衣" zh_tw="緊身衣" jp="緊身衣" keyword=",緊身衣,紧身衣,紧缚皮衣,緊縛皮衣,紧身衣激凸,緊身衣激凸,ボディコン,"/>
<a zh_cn="经典老片" zh_tw="經典老片" jp="經典" keyword=",經典,经典,经典老片,經典老片,"/>
<a zh_cn="拘束" zh_tw="拘束" jp="拘束" keyword=",拘束,"/>
<a zh_cn="监禁" zh_tw="監禁" jp="監禁" keyword=",監禁,监禁,"/>
<a zh_cn="强奸" zh_tw="強姦" jp="強姦" keyword=",強姦,强奸,強暴,强暴,レイプ,"/>
<a zh_cn="轮奸" zh_tw="輪姦" jp="輪姦" keyword=",輪姦,轮奸,轮姦,"/>
<a zh_cn="私处近拍" zh_tw="私處近拍" jp="私處近拍" keyword=",私处近拍,私處近拍,局部特寫,局部特写,局部アップ,"/>
<a zh_cn="巨尻" zh_tw="巨尻" jp="巨尻" keyword=",大屁股,巨大屁股,巨尻,"/>
<a zh_cn="美尻" zh_tw="美尻" jp="美尻" keyword=",美尻,"/>
<a zh_cn="巨乳" zh_tw="巨乳" jp="巨乳" keyword=",巨乳,巨乳爆乳,爱巨乳,愛巨乳,巨乳フェチ,"/>
<a zh_cn="窈窕" zh_tw="窈窕" jp="スレンダー" keyword=",窈窕,スレンダー,"/>
<a zh_cn="美腿" zh_tw="美腿" jp="美腿" keyword=",美腿,美脚,爱美腿,愛美腿,脚フェチ,"/>
<a zh_cn="修长" zh_tw="修長" jp="長身" keyword=",修長,長身,"/>
<a zh_cn="爱美臀" zh_tw="愛美臀" jp="尻フェチ" keyword=",爱美臀,愛美臀,尻フェチ,"/>
<a zh_cn="奇幻" zh_tw="奇幻" jp="科幻" keyword=",科幻,奇幻,"/>
<a zh_cn="空姐" zh_tw="空姐" jp="スチュワーデス" keyword=",空中小姐,空姐,スチュワーデス,"/>
<a zh_cn="恐怖" zh_tw="恐怖" jp="恐怖" keyword=",恐怖,"/>
<a zh_cn="口交" zh_tw="口交" jp="フェラ" keyword=",口交,フェラ,双重口交,雙重口交,Wフェラ,"/>
<a zh_cn="强迫口交" zh_tw="強迫口交" jp="強迫口交" keyword=",强迫口交,強迫口交,イラマチオ,"/>
<a zh_cn="偷拍" zh_tw="偷拍" jp="盗撮" keyword=",偷拍,盗撮,"/>
<a zh_cn="蜡烛" zh_tw="蠟燭" jp="蝋燭" keyword=",蜡烛,蝋燭,蠟燭,"/>
<a zh_cn="滥交" zh_tw="濫交" jp="濫交" keyword=",濫交,滥交,乱交,亂交,"/>
<a zh_cn="酒醉" zh_tw="酒醉" jp="爛醉如泥的" keyword=",爛醉如泥的,烂醉如泥的,酒醉,"/>
<a zh_cn="立即插入" zh_tw="立即插入" jp="立即插入" keyword=",立即口交,即兴性交,立即插入,马上幹,馬上幹,即ハメ,"/>
<a zh_cn="连裤袜" zh_tw="連褲襪" jp="連褲襪" keyword=",連褲襪,连裤袜,"/>
<a zh_cn="连发" zh_tw="連發" jp="連発" keyword=",连发,連發,連発,"/>
<a zh_cn="恋爱" zh_tw="戀愛" jp="戀愛" keyword=",戀愛,恋爱,恋愛,"/>
<a zh_cn="恋乳癖" zh_tw="戀乳癖" jp="戀乳癖" keyword=",戀乳癖,恋乳癖,"/>
<a zh_cn="恋腿癖" zh_tw="戀腿癖" jp="戀腿癖" keyword=",戀腿癖,恋腿癖,"/>
<a zh_cn="猎艳" zh_tw="獵艷" jp="獵豔" keyword=",獵豔,猎艳,獵艷,"/>
<a zh_cn="乱伦" zh_tw="亂倫" jp="亂倫" keyword=",亂倫,乱伦,"/>
<a zh_cn="萝莉" zh_tw="蘿莉" jp="蘿莉塔" keyword=",蘿莉塔,萝莉塔,ロリ,"/>
<a zh_cn="裸体围裙" zh_tw="裸體圍裙" jp="裸體圍裙" keyword=",裸體圍裙,裸体围裙,真空围裙,真空圍裙,裸エプロン,"/>
<a zh_cn="旅行" zh_tw="旅行" jp="旅行" keyword=",旅行,"/>
<a zh_cn="骂倒" zh_tw="罵倒" jp="罵倒" keyword=",罵倒,骂倒,"/>
<a zh_cn="蛮横娇羞" zh_tw="蠻橫嬌羞" jp="蠻橫嬌羞" keyword=",蠻橫嬌羞,蛮横娇羞,"/>
<a zh_cn="猫耳" zh_tw="貓耳" jp="貓耳女" keyword=",貓耳女,猫耳女,"/>
<a zh_cn="美容院" zh_tw="美容院" jp="美容院" keyword=",美容院,エステ,"/>
<a zh_cn="短裙" zh_tw="短裙" jp="短裙" keyword=",短裙,"/>
<a zh_cn="美少女" zh_tw="美少女" jp="美少女" keyword=",美少女,美少女電影,美少女电影,"/>
<a zh_cn="迷你裙" zh_tw="迷你裙" jp="迷你裙" keyword=",迷你裙,ミニスカ,"/>
<a zh_cn="迷你裙警察" zh_tw="迷你裙警察" jp="迷你裙警察" keyword=",迷你裙警察,"/>
<a zh_cn="秘书" zh_tw="秘書" jp="秘書" keyword=",秘書,秘书,"/>
<a zh_cn="面试" zh_tw="面試" jp="面接" keyword=",面试,面接,面試,"/>
<a zh_cn="苗条" zh_tw="苗條" jp="苗條" keyword=",苗條,苗条,"/>
<a zh_cn="明星脸" zh_tw="明星臉" jp="明星臉" keyword=",明星臉,明星脸,"/>
<a zh_cn="模特" zh_tw="模特" jp="模特兒" keyword=",模特兒,模特儿,モデル,"/>
<a zh_cn="魔法少女" zh_tw="魔法少女" jp="魔法少女" keyword=",魔法少女,"/>
<a zh_cn="母亲" zh_tw="母親" jp="母親" keyword=",母親,母亲,妈妈系,媽媽系,お母さん,"/>
<a zh_cn="义母" zh_tw="義母" jp="母親" keyword=",义母,義母,"/>
<a zh_cn="母乳" zh_tw="母乳" jp="母乳" keyword=",母乳,"/>
<a zh_cn="女强男" zh_tw="女强男" jp="逆レイプ" keyword=",逆レイプ,女强男,"/>
<a zh_cn="养女" zh_tw="養女" jp="娘・養女" keyword=",养女,娘・養女,"/>
<a zh_cn="女大学生" zh_tw="女大學生" jp="女子大生" keyword=",女大學生,女大学生,女子大生,"/>
<a zh_cn="女祭司" zh_tw="女祭司" jp="女祭司" keyword=",女祭司,"/>
<a zh_cn="女搜查官" zh_tw="女搜查官" jp="女檢察官" keyword=",女檢察官,女检察官,女搜查官,"/>
<a zh_cn="女教师" zh_tw="女教師" jp="女教師" keyword=",女教師,女教师,"/>
<a zh_cn="女忍者" zh_tw="女忍者" jp="女忍者" keyword=",女忍者,くノ一,"/>
<a zh_cn="女上司" zh_tw="女上司" jp="女上司" keyword=",女上司,"/>
<a zh_cn="骑乘位" zh_tw="騎乘位" jp="騎乗位" keyword=",女上位,骑乘,騎乘,骑乘位,騎乘位,騎乗位,"/>
<a zh_cn="辣妹" zh_tw="辣妹" jp="辣妹" keyword=",女生,辣妹,ギャル,"/>
<a zh_cn="女同性恋" zh_tw="女同性戀" jp="女同性戀" keyword=",女同性戀,女同性恋,女同志,レズ,"/>
<a zh_cn="女王" zh_tw="女王" jp="女王様" keyword=",女王,女王様,"/>
<a zh_cn="女医生" zh_tw="女醫生" jp="女醫生" keyword=",女醫生,女医生,"/>
<a zh_cn="女仆" zh_tw="女僕" jp="メイド" keyword=",女傭,女佣,女仆,女僕,メイド,"/>
<a zh_cn="女优最佳合集" zh_tw="女優最佳合集" jp="女優ベスト・総集編" keyword=",女優ベスト・総集編,女优最佳合集,女優最佳合集,"/>
<a zh_cn="女战士" zh_tw="女戰士" jp="超級女英雄" keyword=",行動,行动,超級女英雄,女战士,女戰士,"/>
<a zh_cn="女主播" zh_tw="女主播" jp="女子アナ" keyword=",女主播,女子アナ,"/>
<a zh_cn="女主人" zh_tw="女主人" jp="老闆娘" keyword=",女主人,老闆娘,女主人,老板娘、女主人,女主人、女老板,女将・女主人,"/>
<a zh_cn="女装人妖" zh_tw="女裝人妖" jp="女裝人妖" keyword=",女裝人妖,女装人妖,"/>
<a zh_cn="呕吐" zh_tw="嘔吐" jp="嘔吐" keyword=",呕吐,嘔吐,"/>
<a zh_cn="粪便" zh_tw="糞便" jp="糞便" keyword=",排便,粪便,糞便,食糞,食粪,"/>
<a zh_cn="坦克" zh_tw="坦克" jp="胖女人" keyword=",胖女人,坦克,"/>
<a zh_cn="泡泡袜" zh_tw="泡泡襪" jp="泡泡襪" keyword=",泡泡袜,泡泡襪,"/>
<a zh_cn="泡沫浴" zh_tw="泡沫浴" jp="泡沫浴" keyword=",泡沫浴,"/>
<a zh_cn="美臀" zh_tw="美臀" jp="屁股" keyword=",美臀,屁股,"/>
<a zh_cn="平胸" zh_tw="平胸" jp="貧乳・微乳" keyword=",平胸,貧乳・微乳,"/>
<a zh_cn="丈母娘" zh_tw="丈母娘" jp="婆婆" keyword=",婆婆,后母,丈母娘,"/>
<a zh_cn="恋物癖" zh_tw="戀物癖" jp="戀物癖" keyword="戀物癖,恋物癖,其他戀物癖,其他恋物癖,"/>
<a zh_cn="其他癖好" zh_tw="其他癖好" jp="その他フェチ" keyword="其他癖好,その他フェチ,"/>
<a zh_cn="旗袍" zh_tw="旗袍" jp="旗袍" keyword=",旗袍,"/>
<a zh_cn="企画" zh_tw="企畫" jp="企畫" keyword=",企畫,企画,"/>
<a zh_cn="车震" zh_tw="車震" jp="汽車性愛" keyword=",汽車性愛,汽车性爱,车震,車震,车床族,車床族,カーセックス,"/>
<a zh_cn="大小姐" zh_tw="大小姐" jp="千金小姐" keyword=",大小姐,千金小姐,"/>
<a zh_cn="情侣" zh_tw="情侶" jp="情侶" keyword=",情侶,情侣,伴侶,伴侣,カップル,"/>
<a zh_cn="拳交" zh_tw="拳交" jp="拳交" keyword=",拳交,"/>
<a zh_cn="晒黑" zh_tw="曬黑" jp="日焼け" keyword=",曬黑,晒黑,日焼け,"/>
<a zh_cn="美乳" zh_tw="美乳" jp="美乳" keyword=",乳房,美乳,"/>
<a zh_cn="乳交" zh_tw="乳交" jp="乳交" keyword=",乳交,パイズリ,"/>
<a zh_cn="乳液" zh_tw="乳液" jp="乳液" keyword=",乳液,ローション・オイル,ローション·オイル,"/>
<a zh_cn="软体" zh_tw="軟體" jp="軟体" keyword=",软体,軟体,軟體,"/>
<a zh_cn="搔痒" zh_tw="搔癢" jp="瘙癢" keyword=",搔痒,瘙癢,搔癢,"/>
<a zh_cn="设计环节" zh_tw="設計環節" jp="設置項目" keyword=",設置項目,设计环节,設計環節,"/>
<a zh_cn="丰乳肥臀" zh_tw="豐乳肥臀" jp="身體意識" keyword=",身體意識,身体意识,丰乳肥臀,豐乳肥臀,"/>
<a zh_cn="深喉" zh_tw="深喉" jp="深喉" keyword=",深喉,"/>
<a zh_cn="时间停止" zh_tw="時間停止" jp="時間停止" keyword=",时间停止,時間停止,"/>
<a zh_cn="插入手指" zh_tw="插入手指" jp="手指插入" keyword=",手指插入,插入手指,"/>
<a zh_cn="首次亮相" zh_tw="首次亮相" jp="首次亮相" keyword=",首次亮相,"/>
<a zh_cn="叔母" zh_tw="叔母" jp="叔母さん" keyword=",叔母,叔母さん,"/>
<a zh_cn="数位马赛克" zh_tw="數位馬賽克" jp="數位馬賽克" keyword=",數位馬賽克,数位马赛克,"/>
<a zh_cn="双性人" zh_tw="雙性人" jp="雙性人" keyword=",雙性人,双性人,"/>
<a zh_cn="韵律服" zh_tw="韻律服" jp="レオタード" keyword=",韵律服,韻律服,レオタード,"/>
<a zh_cn="水手服" zh_tw="水手服" jp="セーラー服" keyword=",水手服,セーラー服,"/>
<a zh_cn="丝袜" zh_tw="絲襪" jp="絲襪" keyword=",丝袜,絲襪,パンスト,"/>
<a zh_cn="特摄" zh_tw="特攝" jp="特攝" keyword=",特效,特摄,特攝,"/>
<a zh_cn="经历告白" zh_tw="經歷告白" jp="體驗懺悔" keyword=",體驗懺悔,经历告白,經歷告白,"/>
<a zh_cn="体操服" zh_tw="體操服" jp="體育服" keyword=",体操服,體育服,體操服,"/>
<a zh_cn="舔阴" zh_tw="舔陰" jp="舔陰" keyword=",舔陰,舔阴,舔鲍,クンニ,"/>
<a zh_cn="跳蛋" zh_tw="跳蛋" jp="ローター" keyword=",跳蛋,ローター,"/>
<a zh_cn="跳舞" zh_tw="跳舞" jp="跳舞" keyword=",跳舞,"/>
<a zh_cn="青梅竹马" zh_tw="青梅竹馬" jp="童年朋友" keyword=",童年朋友,青梅竹马,青梅竹馬,"/>
<a zh_cn="偷窥" zh_tw="偷窺" jp="偷窥" keyword=",偷窺,偷窥,"/>
<a zh_cn="投稿" zh_tw="投稿" jp="投稿" keyword=",投稿,"/>
<a zh_cn="赛车女郎" zh_tw="賽車女郎" jp="レースクィーン" keyword=",賽車女郎,赛车女郎,レースクィーン,"/>
<a zh_cn="兔女郎" zh_tw="兔女郎" jp="兔女郎" keyword=",兔女郎,バニーガール,"/>
<a zh_cn="吞精" zh_tw="吞精" jp="吞精" keyword=",吞精,ごっくん,"/>
<a zh_cn="成人动画" zh_tw="成人動畫" jp="アニメ" keyword=",成人动画,成人動畫,アニメ,"/>
<a zh_cn="成人娃娃" zh_tw="成人娃娃" jp="娃娃" keyword=",娃娃,成人娃娃,"/>
<a zh_cn="玩物" zh_tw="玩物" jp="玩具" keyword=",玩具,玩物,"/>
<a zh_cn="适合手机垂直播放" zh_tw="適合手機垂直播放" jp="為智能手機推薦垂直視頻" keyword=",スマホ専用縦動画,為智能手機推薦垂直視頻,适合手机垂直播放,適合手機垂直播放,"/>
<a zh_cn="猥亵穿着" zh_tw="猥褻穿着" jp="猥褻穿著" keyword=",猥褻穿著,猥亵穿着,猥褻穿着,"/>
<a zh_cn="无码流出" zh_tw="無碼流出" jp="无码流出" keyword=",無碼流出,无码流出,"/>
<a zh_cn="无码破解" zh_tw="無碼破解" jp="無碼破解" keyword=",無碼破解,无码破解,"/>
<a zh_cn="无毛" zh_tw="無毛" jp="無毛" keyword=",無毛,无毛,剃毛,白虎,パイパン,"/>
<a zh_cn="剧情" zh_tw="劇情" jp="戲劇" keyword=",戲劇,戏剧,剧情,劇情,戲劇x,戏剧、连续剧,戲劇、連續劇,ドラマ,"/>
<a zh_cn="性转换·男变女" zh_tw="性轉換·男變女" jp="性別轉型·女性化" keyword=",性转换・女体化,性別轉型·女性化,性转换·男变女,性轉換·男變女,"/>
<a zh_cn="性奴" zh_tw="性奴" jp="性奴" keyword=",性奴,"/>
<a zh_cn="性骚扰" zh_tw="性騷擾" jp="性騷擾" keyword=",性騷擾,性骚扰,"/>
<a zh_cn="故意露胸" zh_tw="故意露胸" jp="胸チラ" keyword=",胸チラ,故意露胸,"/>
<a zh_cn="羞耻" zh_tw="羞恥" jp="羞恥" keyword=",羞恥,羞耻,"/>
<a zh_cn="学生" zh_tw="學生" jp="學生" keyword=",學生,其他學生,其他学生,學生(其他),学生,"/>
<a zh_cn="学生妹" zh_tw="學生妹" jp="學生妹" keyword=",学生妹,學生妹,女子校生,"/>
<a zh_cn="学生服" zh_tw="學生服" jp="學生服" keyword=",学生服,學生服,"/>
<a zh_cn="学生泳装" zh_tw="學生泳裝" jp="學校泳裝" keyword=",學校泳裝,学校泳装,学生泳装,學生泳裝,校园泳装,校園泳裝,競泳・スクール水着,"/>
<a zh_cn="泳装" zh_tw="泳裝" jp="水着" keyword=",泳裝,泳装,水着,"/>
<a zh_cn="校园" zh_tw="校園" jp="學校作品" keyword=",學校作品,学校作品,校园,校園,校园物语,校園物語,学園もの,"/>
<a zh_cn="肛检" zh_tw="肛檢" jp="鴨嘴" keyword=",鴨嘴,鸭嘴,肛检,肛檢,"/>
<a zh_cn="骑脸" zh_tw="騎臉" jp="顏面騎乘" keyword=",騎乗位,颜面骑乘,顏面騎乘,骑脸,騎臉,"/>
<a zh_cn="颜射" zh_tw="顏射" jp="顔射" keyword=",顏射,颜射,顏射x,顔射,"/>
<a zh_cn="眼镜" zh_tw="眼鏡" jp="眼鏡" keyword=",眼鏡,眼镜,メガネ,"/>
<a zh_cn="药物" zh_tw="藥物" jp="藥物" keyword=",藥物,药物,药物、迷姦,藥物、迷姦,ドラッグ,"/>
<a zh_cn="野外露出" zh_tw="野外露出" jp="野外・露出" keyword=",野外・露出,野外露出,野外,"/>
<a zh_cn="业余" zh_tw="業餘" jp="業餘" keyword=",業餘,业余,素人,"/>
<a zh_cn="人妻" zh_tw="人妻" jp="已婚婦女" keyword=",已婚婦女,已婚妇女,人妻,"/>
<a zh_cn="近亲相姦" zh_tw="近親相姦" jp="近親相姦" keyword=",近亲相姦,近親相姦,"/>
<a zh_cn="自拍" zh_tw="自拍" jp="ハメ撮り" keyword=",自拍,ハメ撮り,個人撮影,个人撮影,"/>
<a zh_cn="淫语" zh_tw="淫語" jp="淫語" keyword=",淫語,淫语,"/>
<a zh_cn="酒会" zh_tw="酒會" jp="飲み会・合コン" keyword=",饮酒派对,飲み会・合コン,酒会,酒會,"/>
<a zh_cn="饮尿" zh_tw="飲尿" jp="飲尿" keyword=",飲尿,饮尿,"/>
<a zh_cn="游戏改" zh_tw="遊戲改" jp="遊戲的真人版" keyword=",遊戲的真人版,游戏改,遊戲改,"/>
<a zh_cn="漫改" zh_tw="漫改" jp="原作コラボ" keyword=",原作改編,原作改编,原作コラボ,漫改,"/>
<a zh_cn="受孕" zh_tw="受孕" jp="孕ませ" keyword=",受孕,孕ませ,"/>
<a zh_cn="孕妇" zh_tw="孕婦" jp="孕婦" keyword=",孕婦,孕妇,"/>
<a zh_cn="早泄" zh_tw="早泄" jp="早漏" keyword=",早洩,早漏,早泄,"/>
<a zh_cn="Show Girl" zh_tw="Show Girl" jp="展場女孩" keyword=",展場女孩,展场女孩,Show Girl,"/>
<a zh_cn="正太控" zh_tw="正太控" jp="正太控" keyword=",正太控,"/>
<a zh_cn="制服" zh_tw="制服" jp="制服" keyword=",制服,"/>
<a zh_cn="中出" zh_tw="中出" jp="中出" keyword=",中出,中出し,"/>
<a zh_cn="子宫颈" zh_tw="子宮頸" jp="子宮頸" keyword=",子宮頸,子宫颈,"/>
<a zh_cn="足交" zh_tw="足交" jp="足交" keyword=",足交,足コキ,"/>
<a zh_cn="4小时+" zh_tw="4小時+" jp="4小時以上作品" keyword=",4小時以上作品,4小时以上作品,4小时+,4小時+,"/>
<a zh_cn="69" zh_tw="69" jp="69" keyword=",69,"/>
<a zh_cn="学生" zh_tw="學生" jp="學生" keyword=",C学生,學生,"/>
<a zh_cn="M男" zh_tw="M男" jp="M男" keyword=",M男,"/>
<a zh_cn="暗黑系" zh_tw="暗黑系" jp="暗黑系" keyword=",暗黑系,黑暗系統,"/>
<a zh_cn="成人电影" zh_tw="成人電影" jp="成人電影" keyword=",成人電影,成人电影,"/>
<a zh_cn="成人动漫" zh_tw="成人動漫" jp="成人動漫" keyword=",成人动漫,成人動漫,"/>
<a zh_cn="导尿" zh_tw="導尿" jp="導尿" keyword=",導尿,导尿,"/>
<a zh_cn="法国" zh_tw="法國" jp="法國" keyword=",法国,法國,"/>
<a zh_cn="飞特族" zh_tw="飛特族" jp="飛特族" keyword=",飛特族,飞特族,"/>
<a zh_cn="韩国" zh_tw="韓國" jp="韓國" keyword=",韓國,韩国,"/>
<a zh_cn="户外" zh_tw="戶外" jp="戶外" keyword=",戶外,户外,"/>
<a zh_cn="角色对换" zh_tw="角色對換" jp="角色對換" keyword=",角色对换,角色對換,"/>
<a zh_cn="精选综合" zh_tw="精選綜合" jp="合集" keyword=",精選,綜合,精选、综合,合集,精选综合,精選綜合,"/>
<a zh_cn="捆绑" zh_tw="捆綁" jp="捆綁" keyword=",捆綁,捆绑,折磨,"/>
<a zh_cn="礼仪小姐" zh_tw="禮儀小姐" jp="禮儀小姐" keyword=",禮儀小姐,礼仪小姐,"/>
<a zh_cn="历史剧" zh_tw="歷史劇" jp="歷史劇" keyword=",歷史劇,历史剧,"/>
<a zh_cn="露出" zh_tw="露出" jp="露出" keyword=",露出,"/>
<a zh_cn="母狗" zh_tw="母狗" jp="母狗" keyword=",母犬,母狗,"/>
<a zh_cn="男优介绍" zh_tw="男優介紹" jp="男優介紹" keyword=",男性,男优介绍,男優介紹,"/>
<a zh_cn="女儿" zh_tw="女兒" jp="女兒" keyword=",女兒,女儿,"/>
<a zh_cn="全裸" zh_tw="全裸" jp="全裸" keyword=",全裸,"/>
<a zh_cn="窥乳" zh_tw="窺乳" jp="窺乳" keyword=",乳房偷窺,窥乳,窺乳,"/>
<a zh_cn="羞辱" zh_tw="羞辱" jp="辱め" keyword=",凌辱,羞辱,辱め,辱骂,辱罵,"/>
<a zh_cn="脱衣" zh_tw="脫衣" jp="脫衣" keyword=",脫衣,脱衣,"/>
<a zh_cn="西洋片" zh_tw="西洋片" jp="西洋片" keyword=",西洋片,"/>
<a zh_cn="写真偶像" zh_tw="寫真偶像" jp="寫真偶像" keyword=",寫真偶像,写真偶像,"/>
<a zh_cn="修女" zh_tw="修女" jp="修女" keyword=",修女,"/>
<a zh_cn="偶像艺人" zh_tw="偶像藝人" jp="アイドル芸能人" keyword=",藝人,艺人,偶像,偶像藝人,偶像艺人,偶像‧藝人,偶像‧艺人,アイドル・芸能人,"/>
<a zh_cn="淫乱真实" zh_tw="淫亂真實" jp="淫亂真實" keyword=",淫亂,真實,淫乱、真实,淫乱真实,淫亂真實,淫乱・ハード系,"/>
<a zh_cn="瑜伽·健身" zh_tw="瑜伽·健身" jp="瑜伽·健身" keyword=",瑜伽,瑜伽·健身,ヨガ,講師,讲师"/>
<a zh_cn="运动短裤" zh_tw="運動短褲" jp="運動短褲" keyword=",運動短褲,运动短裤,"/>
<a zh_cn="JK制服" zh_tw="JK制服" jp="JK制服" keyword=",制服外套,JK制服,校服,"/>
<a zh_cn="重制版" zh_tw="重製版" jp="複刻版" keyword=",重印版,複刻版,重制版,重製版,"/>
<a zh_cn="综合短篇" zh_tw="綜合短篇" jp="綜合短篇" keyword=",綜合短篇,综合短篇,"/>
<a zh_cn="被外国人干" zh_tw="被外國人乾" jp="被外國人乾" keyword=",被外國人幹,被外国人干,被外國人乾,"/>
<a zh_cn="二穴同入" zh_tw="二穴同入" jp="二穴同入" keyword=",二穴同時挿入,二穴同入,"/>
<a zh_cn="美脚" zh_tw="美腳" jp="美腳" keyword=",美腳,美脚,"/>
<a zh_cn="过膝袜" zh_tw="過膝襪" jp="過膝襪" keyword=",絲襪、過膝襪,过膝袜,"/>
<a zh_cn="名人" zh_tw="名人" jp="名人" keyword=",名人,"/>
<a zh_cn="黑白配" zh_tw="黑白配" jp="黑白配" keyword=",黑白配,"/>
<a zh_cn="欲女" zh_tw="欲女" jp="エマニエル" keyword=",エマニエル,欲女,"/>
<a zh_cn="高筒靴" zh_tw="高筒靴" jp="高筒靴" keyword=",靴子,高筒靴,"/>
<a zh_cn="双飞" zh_tw="雙飛" jp="雙飛" keyword=",兩女一男,双飞,雙飛,"/>
<a zh_cn="两女两男" zh_tw="兩女兩男" jp="兩女兩男" keyword=",兩男兩女,两女两男,兩女兩男,"/>
<a zh_cn="两男一女" zh_tw="兩男一女" jp="兩男一女" keyword=",兩男一女,两男一女,"/>
<a zh_cn="3P" zh_tw="3P" jp="3P" keyword=",3P,3p,P,p,"/>
<a zh_cn="唾液敷面" zh_tw="唾液敷面" jp="唾液敷面" keyword=",唾液敷面,"/>
<a zh_cn="kira☆kira" zh_tw="kira☆kira" jp="kira☆kira" keyword=",kira☆kira,"/>
<a zh_cn="S1 NO.1 STYLE" zh_tw="S1 NO.1 STYLE" jp="S1 NO.1 STYLE" keyword=",S1 Style,エスワン,エスワン ナンバーワンスタイル,エスワンナンバーワンスタイル,S1 NO.1 STYLE,S1NO.1STYLE,"/>
<a zh_cn="一本道" zh_tw="一本道" jp="一本道" keyword=",一本道,"/>
<a zh_cn="加勒比" zh_tw="加勒比" jp="加勒比" keyword=",加勒比,カリビアンコム,"/>
<a zh_cn="东京热" zh_tw="東京熱" jp="TOKYO-HOT" keyword=",东京热,東京熱,東熱,TOKYO-HOT,"/>
<a zh_cn="SOD" zh_tw="SOD" jp="SOD" keyword=",SOD,SODクリエイト,"/>
<a zh_cn="PRESTIGE" zh_tw="PRESTIGE" jp="PRESTIGE" keyword=",PRESTIGE,プレステージ,"/>
<a zh_cn="MOODYZ" zh_tw="MOODYZ" jp="MOODYZ" keyword=",MOODYZ,ムーディーズ,"/>
<a zh_cn="ROCKET" zh_tw="ROCKET" jp="ROCKET" keyword=",ROCKET,"/>
<a zh_cn="S级素人" zh_tw="S級素人" jp="S級素人" keyword=",S級素人,アイデアポケット,"/>
<a zh_cn="HEYZO" zh_tw="HEYZO" jp="HEYZO" keyword=",HEYZO,"/>
<a zh_cn="玛丹娜" zh_tw="瑪丹娜" jp="Madonna" keyword=",玛丹娜,瑪丹娜,マドンナ,Madonna,"/>
<a zh_cn="MAXING" zh_tw="MAXING" jp="MAXING" keyword=",MAXING,マキシング,"/>
<a zh_cn="JAPANKET" zh_tw="ALICE JAPAN" jp="ALICE JAPAN" keyword=",ALICE JAPAN,アリスJAPAN,"/>
<a zh_cn="E-BODY" zh_tw="E-BODY" jp="E-BODY" keyword=",E-BODY,"/>
<a zh_cn="Natural High" zh_tw="Natural High" jp="Natural High" keyword=",Natural High,ナチュラルハイ,"/>
<a zh_cn="美" zh_tw="美" jp="美" keyword=",美,"/>
<a zh_cn="K.M.P" zh_tw="K.M.P" jp="K.M.P" keyword=",K.M.P,ケイ・エム・プロデュース,"/>
<a zh_cn="Hunter" zh_tw="Hunter" jp="Hunter" keyword=",Hunter,"/>
<a zh_cn="OPPAI" zh_tw="OPPAI" jp="OPPAI" keyword=",OPPAI,"/>
<a zh_cn="熘池五郎" zh_tw="溜池五郎" jp="溜池ゴロー" keyword=",熘池五郎,溜池五郎,溜池ゴロー,"/>
<a zh_cn="kawaii" zh_tw="kawaii" jp="kawaii" keyword=",kawaii,"/>
<a zh_cn="PREMIUM" zh_tw="PREMIUM" jp="PREMIUM" keyword=",PREMIUM,プレミアム,"/>
<a zh_cn="ヤル男" zh_tw="ヤル男" jp="ヤル男" keyword=",ヤル男,"/>
<a zh_cn="ラグジュTV" zh_tw="ラグジュTV" jp="ラグジュTV" keyword=",ラグジュTV,"/>
<a zh_cn="シロウトTV" zh_tw="シロウトTV" jp="シロウトTV" keyword=",シロウトTV,"/>
<a zh_cn="本中" zh_tw="本中" jp="本中" keyword=",本中,"/>
<a zh_cn="WANZ" zh_tw="WANZ" jp="WANZ" keyword=",WANZ,ワンズファクトリー,"/>
<a zh_cn="BeFree" zh_tw="BeFree" jp="BeFree" keyword=",BeFree,"/>
<a zh_cn="MAX-A" zh_tw="MAX-A" jp="MAX-A" keyword=",MAX-A,マックスエー,"/>
</info>

Movie_Data_Capture.py Normal file (724 lines)
@@ -0,0 +1,724 @@
import argparse
import json
import os
import random
import re
import sys
import time
import shutil
import typing
import urllib3
import signal
import platform
import config
from datetime import datetime, timedelta
from lxml import etree
from pathlib import Path
from opencc import OpenCC
from scraper import get_data_from_json
from ADC_function import file_modification_days, get_html, parallel_download_files
from number_parser import get_number
from core import core_main, core_main_no_net_op, moveFailedFolder, debug_print
def check_update(local_version):
htmlcode = get_html("https://api.github.com/repos/yoshiko2/Movie_Data_Capture/releases/latest")
data = json.loads(htmlcode)
remote = int(data["tag_name"].replace(".", ""))
local_version = int(local_version.replace(".", ""))
if local_version < remote:
print("[*]" + ("* New update " + str(data["tag_name"]) + " *").center(54))
print("[*]" + "↓ Download ↓".center(54))
print("[*]https://github.com/yoshiko2/Movie_Data_Capture/releases")
print("[*]======================================================")
def argparse_function(ver: str) -> typing.Tuple[str, str, str, str, bool, bool, str, str]:
conf = config.getInstance()
parser = argparse.ArgumentParser(epilog=f"Load Config file '{conf.ini_path}'.")
parser.add_argument("file", default='', nargs='?', help="Single Movie file path.")
parser.add_argument("-p", "--path", default='', nargs='?', help="Analysis folder path.")
parser.add_argument("-m", "--main-mode", default='', nargs='?',
help="Main mode. 1:Scraping 2:Organizing 3:Scraping in analysis folder")
parser.add_argument("-n", "--number", default='', nargs='?', help="Custom file number of single movie file.")
# parser.add_argument("-C", "--config", default='config.ini', nargs='?', help="The config file Path.")
parser.add_argument("-L", "--link-mode", default='', nargs='?',
help="Create movie file link. 0:moving movie file, do not create link 1:soft link 2:try hard link first")
default_logdir = str(Path.home() / '.mlogs')
parser.add_argument("-o", "--log-dir", dest='logdir', default=default_logdir, nargs='?',
help=f"""Duplicate stdout and stderr to logfiles in logging folder, default on.
default folder for current user: '{default_logdir}'. Change default folder to an empty file,
or use --log-dir= to turn log off.""")
parser.add_argument("-q", "--regex-query", dest='regexstr', default='', nargs='?',
help="python re module regex filepath filtering.")
parser.add_argument("-d", "--nfo-skip-days", dest='days', default='', nargs='?',
help="Override nfo_skip_days value in config.")
parser.add_argument("-c", "--stop-counter", dest='cnt', default='', nargs='?',
help="Override stop_counter value in config.")
parser.add_argument("-R", "--rerun-delay", dest='delaytm', default='', nargs='?',
help="Delay (eg. 1h10m30s or 60 (second)) time and rerun, until all movies proceed. Note: stop_counter value in config or -c must none zero.")
parser.add_argument("-i", "--ignore-failed-list", action="store_true", help="Ignore failed list '{}'".format(
os.path.join(os.path.abspath(conf.failed_folder()), 'failed_list.txt')))
parser.add_argument("-a", "--auto-exit", action="store_true",
help="Auto exit after program complete")
parser.add_argument("-g", "--debug", action="store_true",
help="Turn on debug mode to generate diagnostic log for issue report.")
parser.add_argument("-N", "--no-network-operation", action="store_true",
help="No network query, do not get metadata, for cover cropping purposes, only takes effect when main mode is 3.")
parser.add_argument("-w", "--website", dest='site', default='', nargs='?',
help="Override [priority]website= in config.")
parser.add_argument("-D", "--download-images", dest='dnimg', action="store_true",
help="Override [common]download_only_missing_images=0 force invoke image downloading.")
parser.add_argument("-C", "--config-override", dest='cfgcmd', action='append', nargs=1,
help="Common use config override. Grammar: section:key=value[;[section:]key=value] eg. 'de:s=1' or 'debug_mode:switch=1' override[debug_mode]switch=1 Note:this parameters can be used multiple times")
parser.add_argument("-z", "--zero-operation", dest='zero_op', action="store_true",
help="""Only show job list of files and numbers, and **NO** actual operation
is performed. It may help you correct wrong numbers before real job.""")
parser.add_argument("-v", "--version", action="version", version=ver)
parser.add_argument("-s", "--search", default='', nargs='?', help="Search number")
parser.add_argument("-ss", "--specified-source", default='', nargs='?', help="specified Source.")
parser.add_argument("-su", "--specified-url", default='', nargs='?', help="specified Url.")
args = parser.parse_args()
def set_natural_number_or_none(sk, value):
if isinstance(value, str) and value.isnumeric() and int(value) >= 0:
conf.set_override(f'{sk}={value}')
def set_str_or_none(sk, value):
if isinstance(value, str) and len(value):
conf.set_override(f'{sk}={value}')
def set_bool_or_none(sk, value):
if isinstance(value, bool) and value:
conf.set_override(f'{sk}=1')
set_natural_number_or_none("common:main_mode", args.main_mode)
set_natural_number_or_none("common:link_mode", args.link_mode)
set_str_or_none("common:source_folder", args.path)
set_bool_or_none("common:auto_exit", args.auto_exit)
set_natural_number_or_none("common:nfo_skip_days", args.days)
set_natural_number_or_none("advenced_sleep:stop_counter", args.cnt)
set_bool_or_none("common:ignore_failed_list", args.ignore_failed_list)
set_str_or_none("advenced_sleep:rerun_delay", args.delaytm)
set_str_or_none("priority:website", args.site)
if isinstance(args.dnimg, bool) and args.dnimg:
conf.set_override("common:download_only_missing_images=0")
set_bool_or_none("debug_mode:switch", args.debug)
if isinstance(args.cfgcmd, list):
for cmd in args.cfgcmd:
conf.set_override(cmd[0])
no_net_op = False
if conf.main_mode() == 3:
no_net_op = args.no_network_operation
if no_net_op:
conf.set_override("advenced_sleep:stop_counter=0;advenced_sleep:rerun_delay=0s;face:aways_imagecut=1")
return args.file, args.number, args.logdir, args.regexstr, args.zero_op, no_net_op, args.search, args.specified_source, args.specified_url
class OutLogger(object):
def __init__(self, logfile) -> None:
self.term = sys.stdout
self.log = open(logfile, "w", encoding='utf-8', buffering=1)
self.filepath = logfile
def __del__(self):
self.close()
def __enter__(self):
pass
def __exit__(self, *args):
self.close()
def write(self, msg):
self.term.write(msg)
self.log.write(msg)
def flush(self):
if 'flush' in dir(self.term):
self.term.flush()
if 'flush' in dir(self.log):
self.log.flush()
if 'fileno' in dir(self.log):
os.fsync(self.log.fileno())
def close(self):
if self.term is not None:
sys.stdout = self.term
self.term = None
if self.log is not None:
self.log.close()
self.log = None
class ErrLogger(OutLogger):
def __init__(self, logfile) -> None:
self.term = sys.stderr
self.log = open(logfile, "w", encoding='utf-8', buffering=1)
self.filepath = logfile
def close(self):
if self.term is not None:
sys.stderr = self.term
self.term = None
if self.log is not None:
self.log.close()
self.log = None
def dupe_stdout_to_logfile(logdir: str):
if not isinstance(logdir, str) or len(logdir) == 0:
return
log_dir = Path(logdir)
if not log_dir.exists():
try:
log_dir.mkdir(parents=True, exist_ok=True)
except:
pass
if not log_dir.is_dir():
        return  # tip: logging can be disabled by replacing the log directory with an empty regular file of the same name
abslog_dir = log_dir.resolve()
log_tmstr = datetime.now().strftime("%Y%m%dT%H%M%S")
logfile = abslog_dir / f'mdc_{log_tmstr}.txt'
errlog = abslog_dir / f'mdc_{log_tmstr}_err.txt'
sys.stdout = OutLogger(logfile)
sys.stderr = ErrLogger(errlog)
def close_logfile(logdir: str):
if not isinstance(logdir, str) or len(logdir) == 0 or not os.path.isdir(logdir):
return
    # remember the log file path before closing the logs
filepath = None
try:
filepath = sys.stdout.filepath
except:
pass
sys.stdout.close()
sys.stderr.close()
log_dir = Path(logdir).resolve()
if isinstance(filepath, Path):
print(f"Log file '{filepath}' saved.")
assert (filepath.parent.samefile(log_dir))
    # remove empty error-log files
for f in log_dir.glob(r'*_err.txt'):
if f.stat().st_size == 0:
try:
f.unlink(missing_ok=True)
except:
pass
    # Merge logs. Only text logs directly inside the log directory are examined; subdirectories are
    # ignored. Logs from more than three days ago are merged into one file per day, logs from more
    # than three months ago into one file per month, and last year's (and older) monthly logs are
    # merged into one file per year starting in April of the current year.
    # Test steps:
"""
LOGDIR=/tmp/mlog
mkdir -p $LOGDIR
for f in {2016..2020}{01..12}{01..28};do;echo $f>$LOGDIR/mdc_${f}T235959.txt;done
for f in {01..09}{01..28};do;echo 2021$f>$LOGDIR/mdc_2021${f}T235959.txt;done
for f in {00..23};do;echo 20211001T$f>$LOGDIR/mdc_20211001T${f}5959.txt;done
echo "$(ls -1 $LOGDIR|wc -l) files in $LOGDIR"
# 1932 files in /tmp/mlog
mdc -zgic1 -d0 -m3 -o $LOGDIR
# python3 ./Movie_Data_Capture.py -zgic1 -o $LOGDIR
ls $LOGDIR
# rm -rf $LOGDIR
"""
today = datetime.today()
    # Step 1: merge into days. Logs older than 3 days that share the same date in their file names are merged into a single log.
for i in range(1):
txts = [f for f in log_dir.glob(r'*.txt') if re.match(r'^mdc_\d{8}T\d{6}$', f.stem, re.A)]
if not txts or not len(txts):
break
e = [f for f in txts if '_err' in f.stem]
txts.sort()
tmstr_3_days_ago = (today.replace(hour=0) - timedelta(days=3)).strftime("%Y%m%dT99")
deadline_day = f'mdc_{tmstr_3_days_ago}'
day_merge = [f for f in txts if f.stem < deadline_day]
if not day_merge or not len(day_merge):
break
cutday = len('T235959.txt') # cut length mdc_20201201|T235959.txt
for f in day_merge:
try:
day_file_name = str(f)[:-cutday] + '.txt' # mdc_20201201.txt
with open(day_file_name, 'a', encoding='utf-8') as m:
m.write(f.read_text(encoding='utf-8'))
f.unlink(missing_ok=True)
except:
pass
    # Step 2: merge into months
    for i in range(1):  # a single-pass loop whose break skips to the next step, avoiding deep if-nesting (Python has no goto)
txts = [f for f in log_dir.glob(r'*.txt') if re.match(r'^mdc_\d{8}$', f.stem, re.A)]
if not txts or not len(txts):
break
txts.sort()
tmstr_3_month_ago = (today.replace(day=1) - timedelta(days=3 * 30)).strftime("%Y%m32")
deadline_month = f'mdc_{tmstr_3_month_ago}'
month_merge = [f for f in txts if f.stem < deadline_month]
if not month_merge or not len(month_merge):
break
tomonth = len('01.txt') # cut length mdc_202012|01.txt
for f in month_merge:
try:
month_file_name = str(f)[:-tomonth] + '.txt' # mdc_202012.txt
with open(month_file_name, 'a', encoding='utf-8') as m:
m.write(f.read_text(encoding='utf-8'))
f.unlink(missing_ok=True)
except:
pass
    # Step 3: merge months into years
for i in range(1):
if today.month < 4:
break
mons = [f for f in log_dir.glob(r'*.txt') if re.match(r'^mdc_\d{6}$', f.stem, re.A)]
if not mons or not len(mons):
break
mons.sort()
deadline_year = f'mdc_{today.year - 1}13'
year_merge = [f for f in mons if f.stem < deadline_year]
if not year_merge or not len(year_merge):
break
toyear = len('12.txt') # cut length mdc_2020|12.txt
for f in year_merge:
try:
year_file_name = str(f)[:-toyear] + '.txt' # mdc_2020.txt
with open(year_file_name, 'a', encoding='utf-8') as y:
y.write(f.read_text(encoding='utf-8'))
f.unlink(missing_ok=True)
except:
pass
    # Step 4: compressing yearly logs is left to manual work or an external scheduled script.
    # GNU lzip is recommended: for text logs of this granularity it currently has the best
    # compression ratio; 'lzip -9' beats 'xz -9' while using less memory, utilizes multiple
    # cores better (plzip), and decompresses faster. Compressed size is roughly 2.4%-3.7%
    # of the original, so a 100MB log shrinks to about 3.7MB.
return filepath
def signal_handler(*args):
print('[!]Ctrl+C detected, Exit.')
os._exit(9)
def sigdebug_handler(*args):
conf = config.getInstance()
conf.set_override(f"debug_mode:switch={int(not conf.debug())}")
print(f"[!]Debug {('oFF', 'On')[int(conf.debug())]}")
# Added: skip files on the failed list, skip by .nfo modification age with a count of skipped
# movies, detailed skip output in debug mode (-g), and skipping of small junk/ad clips
def movie_lists(source_folder, regexstr: str) -> typing.List[str]:
conf = config.getInstance()
main_mode = conf.main_mode()
debug = conf.debug()
nfo_skip_days = conf.nfo_skip_days()
link_mode = conf.link_mode()
file_type = conf.media_type().lower().split(",")
trailerRE = re.compile(r'-trailer\.', re.IGNORECASE)
cliRE = None
if isinstance(regexstr, str) and len(regexstr):
try:
cliRE = re.compile(regexstr, re.IGNORECASE)
except:
pass
failed_list_txt_path = Path(conf.failed_folder()).resolve() / 'failed_list.txt'
failed_set = set()
if (main_mode == 3 or link_mode) and not conf.ignore_failed_list():
try:
flist = failed_list_txt_path.read_text(encoding='utf-8').splitlines()
failed_set = set(flist)
if len(flist) != len(failed_set):  # dedupe and write back without reordering failed_list.txt entries; for duplicates only the last occurrence is kept (e.g. [a, b, a] -> [b, a])
fset = failed_set.copy()
for i in range(len(flist) - 1, -1, -1):
fset.remove(flist[i]) if flist[i] in fset else flist.pop(i)
failed_list_txt_path.write_text('\n'.join(flist) + '\n', encoding='utf-8')
assert len(fset) == 0 and len(flist) == len(failed_set)
except:
pass
if not Path(source_folder).is_dir():
print('[-]Source folder not found!')
return []
total = []
source = Path(source_folder).resolve()
skip_failed_cnt, skip_nfo_days_cnt = 0, 0
escape_folder_set = set(re.split("[,]", conf.escape_folder()))
for full_name in source.glob(r'**/*'):
if main_mode != 3 and set(full_name.parent.parts) & escape_folder_set:
continue
if not full_name.is_file():
continue
if not full_name.suffix.lower() in file_type:
continue
absf = str(full_name)
if absf in failed_set:
skip_failed_cnt += 1
if debug:
print('[!]Skip failed movie:', absf)
continue
is_sym = full_name.is_symlink()
if main_mode != 3 and (is_sym or (full_name.stat().st_nlink > 1 and not conf.scan_hardlink())):  # short-circuit: symlinks skip stat(), since a symlink may point to a missing target
continue  # outside mode 3, skip symlinks, and skip hardlinks unless hardlink scraping is enabled
# 0-byte debug samples are allowed through; the disabled check below would drop sub-120MB ads such as '苍老师强力推荐.mp4'(102.2MB), '黑道总裁.mp4'(98.4MB), '有趣的妹子激情表演.MP4'(95MB), '有趣的臺灣妹妹直播.mp4'(15.1MB)
movie_size = 0 if is_sym else full_name.stat().st_size  # as above, symlinks skip stat()/st_size; 0 bypasses the small-video check
# if 0 < movie_size < 125829120: # 1024*1024*120=125829120
# continue
if (cliRE and not cliRE.search(absf)) or trailerRE.search(full_name.name):
continue
if main_mode == 3:
nfo = full_name.with_suffix('.nfo')
if not nfo.is_file():
if debug:
print(f"[!]Metadata {nfo.name} not found for '{absf}'")
elif nfo_skip_days > 0 and file_modification_days(nfo) <= nfo_skip_days:
skip_nfo_days_cnt += 1
if debug:
print(f"[!]Skip movie by it's .nfo which modified within {nfo_skip_days} days: '{absf}'")
continue
total.append(absf)
if skip_failed_cnt:
print(f"[!]Skip {skip_failed_cnt} movies in failed list '{failed_list_txt_path}'.")
if skip_nfo_days_cnt:
print(
f"[!]Skip {skip_nfo_days_cnt} movies in source folder '{source}' whose .nfo was modified within {nfo_skip_days} days.")
if nfo_skip_days <= 0 or not link_mode or main_mode == 3:
return total
# In link mode, titles already scraped successfully must also be checked in the success folder for .nfo age; skip numbers whose .nfo was updated within N days
skip_numbers = set()
success_folder = Path(conf.success_folder()).resolve()
for f in success_folder.glob(r'**/*'):
if not re.match(r'\.nfo$', f.suffix, re.IGNORECASE):
continue
if file_modification_days(f) > nfo_skip_days:
continue
number = get_number(False, f.stem)
if not number:
continue
skip_numbers.add(number.lower())
rm_list = []
for f in total:
n_number = get_number(False, os.path.basename(f))
if n_number and n_number.lower() in skip_numbers:
rm_list.append(f)
for f in rm_list:
total.remove(f)
if debug:
print(f"[!]Skip file successfully processed within {nfo_skip_days} days: '{f}'")
if len(rm_list):
print(
f"[!]Skip {len(rm_list)} movies in success folder '{success_folder}' whose .nfo was modified within {nfo_skip_days} days.")
return total
def create_failed_folder(failed_folder: str):
"""
新建failed文件夹
"""
if not os.path.exists(failed_folder):
try:
os.makedirs(failed_folder)
except:
print(f"[-]Fatal error! Can not make folder '{failed_folder}'")
os._exit(0)
def rm_empty_folder(path):
abspath = os.path.abspath(path)
deleted = set()
for current_dir, subdirs, files in os.walk(abspath, topdown=False):
try:
still_has_subdirs = any(os.path.join(current_dir, subdir) not in deleted for subdir in subdirs)
if not any(files) and not still_has_subdirs and not os.path.samefile(path, current_dir):
os.rmdir(current_dir)
deleted.add(current_dir)
print('[+]Deleting empty folder', current_dir)
except:
pass
def create_data_and_move(movie_path: str, zero_op: bool, no_net_op: bool, oCC):
# Normalized number, eg: 111xxx-222.mp4 -> xxx-222.mp4
debug = config.getInstance().debug()
n_number = get_number(debug, os.path.basename(movie_path))
movie_path = os.path.abspath(movie_path)
if debug is True:  # debug path runs without try/except, so failures surface with full tracebacks
print(f"[!] [{n_number}] As Number Processing for '{movie_path}'")
if zero_op:
return
if n_number:
if no_net_op:
core_main_no_net_op(movie_path, n_number)
else:
core_main(movie_path, n_number, oCC)
else:
print("[-] number empty ERROR")
moveFailedFolder(movie_path)
print("[*]======================================================")
else:
try:
print(f"[!] [{n_number}] As Number Processing for '{movie_path}'")
if zero_op:
return
if n_number:
if no_net_op:
core_main_no_net_op(movie_path, n_number)
else:
core_main(movie_path, n_number, oCC)
else:
raise ValueError("number empty")
print("[*]======================================================")
except Exception as err:
print(f"[-] [{movie_path}] ERROR:")
print('[-]', err)
try:
moveFailedFolder(movie_path)
except Exception as err:
print('[!]', err)
def create_data_and_move_with_custom_number(file_path: str, custom_number, oCC, specified_source, specified_url):
conf = config.getInstance()
file_name = os.path.basename(file_path)
try:
print("[!] [{1}] As Number Processing for '{0}'".format(file_path, custom_number))
if custom_number:
core_main(file_path, custom_number, oCC, specified_source, specified_url)
else:
print("[-] number empty ERROR")
print("[*]======================================================")
except Exception as err:
print("[-] [{}] ERROR:".format(file_path))
print('[-]', err)
if conf.link_mode():
print("[-]Link {} to failed folder".format(file_path))
os.symlink(file_path, os.path.join(conf.failed_folder(), file_name))
else:
try:
print("[-]Move [{}] to failed folder".format(file_path))
shutil.move(file_path, os.path.join(conf.failed_folder(), file_name))
except Exception as err:
print('[!]', err)
def main(args: tuple) -> Path:
(single_file_path, custom_number, logdir, regexstr, zero_op, no_net_op, search, specified_source,
specified_url) = args
conf = config.getInstance()
main_mode = conf.main_mode()
folder_path = ""
if main_mode not in (1, 2, 3):
print(f"[-]Main mode must be 1 or 2 or 3! You can run '{os.path.basename(sys.argv[0])} --help' for more help.")
os._exit(4)
signal.signal(signal.SIGINT, signal_handler)
if sys.platform == 'win32':
signal.signal(signal.SIGBREAK, sigdebug_handler)
else:
signal.signal(signal.SIGWINCH, sigdebug_handler)
dupe_stdout_to_logfile(logdir)
platform_total = str(
' - ' + platform.platform() + ' \n[*] - ' + platform.machine() + ' - Python-' + platform.python_version())
print('[*]================= Movie Data Capture =================')
print('[*]' + version.center(54))
print('[*]======================================================')
print('[*]' + platform_total)
print('[*]======================================================')
print('[*] - 严禁在墙内宣传本项目 - ')
print('[*]======================================================')
start_time = time.time()
print('[+]Start at', time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"[+]Load Config file '{conf.ini_path}'.")
if conf.debug():
print('[+]Enable debug')
if conf.link_mode() in (1, 2):
print('[!]Enable {} link'.format(('soft', 'hard')[conf.link_mode() - 1]))
if len(sys.argv) > 1:
print('[!]CmdLine:', " ".join(sys.argv[1:]))
print('[+]Main Working mode ## {}: {} ## {}{}{}'
.format(*(main_mode, ['Scraping', 'Organizing', 'Scraping in analysis folder'][main_mode - 1],
"" if not conf.multi_threading() else ", multi_threading on",
"" if conf.nfo_skip_days() == 0 else f", nfo_skip_days={conf.nfo_skip_days()}",
"" if conf.stop_counter() == 0 else f", stop_counter={conf.stop_counter()}"
) if not single_file_path else ('-', 'Single File', '', '', ''))
)
if conf.update_check():
try:
check_update(version)
# Download Mapping Table, parallel version
def fmd(f) -> typing.Tuple[str, Path]:
return ('https://raw.githubusercontent.com/yoshiko2/Movie_Data_Capture/master/MappingTable/' + f,
Path.home() / '.local' / 'share' / 'mdc' / f)
map_tab = (fmd('mapping_actor.xml'), fmd('mapping_info.xml'), fmd('c_number.json'))
for k, v in map_tab:
if v.exists():
if file_modification_days(str(v)) >= conf.mapping_table_validity():
print("[+]Mapping Table Out of date! Remove", str(v))
os.remove(str(v))
res = parallel_download_files(((k, v) for k, v in map_tab if not v.exists()))
for i, fp in enumerate(res, start=1):
if fp and len(fp):
print(f"[+] [{i}/{len(res)}] Mapping Table Downloaded to {fp}")
else:
print(f"[-] [{i}/{len(res)}] Mapping Table Download failed")
except:
print("[!]" + " WARNING ".center(54, "="))
print('[!]' + '-- GITHUB CONNECTION FAILED --'.center(54))
print('[!]' + 'Failed to check for updates'.center(54))
print('[!]' + '& update the mapping table'.center(54))
print("[!]" + "".center(54, "="))
try:
etree.parse(str(Path.home() / '.local' / 'share' / 'mdc' / 'mapping_actor.xml'))
except:
print('[!]' + "Failed to load mapping table".center(54))
print('[!]' + "".center(54, "="))
create_failed_folder(conf.failed_folder())
# create OpenCC converter
ccm = conf.cc_convert_mode()
try:
oCC = None if ccm == 0 else OpenCC('t2s.json' if ccm == 1 else 's2t.json')
except:
# some OSes lack the OpenCC CPython build; fall back to opencc-python-reimplemented.
# pip uninstall opencc && pip install opencc-python-reimplemented
oCC = None if ccm == 0 else OpenCC('t2s' if ccm == 1 else 's2t')
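# e.g. with ccm == 1 (Traditional -> Simplified), oCC.convert('繁體中文') returns '繁体中文'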
if search != '':
search_list = search.split(",")
for i in search_list:
json_data = get_data_from_json(i, oCC, None, None)
debug_print(json_data)
time.sleep(int(config.getInstance().sleep()))
os._exit(0)
if single_file_path != '':  # Single File
print('[+]==================== Single File =====================')
if custom_number == '':
create_data_and_move_with_custom_number(single_file_path,
get_number(conf.debug(), os.path.basename(single_file_path)), oCC,
specified_source, specified_url)
else:
create_data_and_move_with_custom_number(single_file_path, custom_number, oCC,
specified_source, specified_url)
else:
folder_path = conf.source_folder()
if not isinstance(folder_path, str) or folder_path == '':
folder_path = os.path.abspath(".")
movie_list = movie_lists(folder_path, regexstr)
count = 0
count_all = str(len(movie_list))
print('[+]Find', count_all, 'movies.')
print('[*]======================================================')
stop_count = conf.stop_counter()
if stop_count < 1:
stop_count = 999999
else:
count_all = str(min(len(movie_list), stop_count))
for movie_path in movie_list:  # iterate over the movie list and hand each file to the core
count = count + 1
percentage = str(count / int(count_all) * 100)[:4] + '%'
print('[!] {:>30}{:>21}'.format('- ' + percentage + ' [' + str(count) + '/' + count_all + '] -',
time.strftime("%H:%M:%S")))
create_data_and_move(movie_path, zero_op, no_net_op, oCC)
if count >= stop_count:
print("[!]Stop counter triggered!")
break
sleep_seconds = random.randint(conf.sleep(), conf.sleep() + 2)
time.sleep(sleep_seconds)
if conf.del_empty_folder() and not zero_op:
rm_empty_folder(conf.success_folder())
rm_empty_folder(conf.failed_folder())
if len(folder_path):
rm_empty_folder(folder_path)
end_time = time.time()
total_time = str(timedelta(seconds=end_time - start_time))
print("[+]Running time", total_time[:len(total_time) if total_time.rfind('.') < 0 else -3],
" End at", time.strftime("%Y-%m-%d %H:%M:%S"))
print("[+]All finished!!!")
return close_logfile(logdir)
def 分析日志文件(logfile):
try:
if not (isinstance(logfile, Path) and logfile.is_file()):
raise FileNotFoundError('log file not found')
logtxt = logfile.read_text(encoding='utf-8')
扫描电影数 = int(re.findall(r'\[\+]Find (.*) movies\.', logtxt)[0])
已处理 = int(re.findall(r'\[1/(.*?)] -', logtxt)[0])
完成数 = logtxt.count(r'[+]Wrote!')
return 扫描电影数, 已处理, 完成数
except:
return None, None, None
def period(delta, pattern):
d = {'d': delta.days}
d['h'], rem = divmod(delta.seconds, 3600)
d['m'], d['s'] = divmod(rem, 60)
return pattern.format(**d)
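# e.g. period(timedelta(seconds=5025), "{h}:{m:02}:{s:02}") -> '1:23:45'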
if __name__ == '__main__':
version = '6.6.7'
urllib3.disable_warnings() # Ignore http proxy warning
app_start = time.time()
# Read config.ini first, in argparse_function() need conf.failed_folder()
conf = config.getInstance()
# Parse command line args
args = tuple(argparse_function(version))
再运行延迟 = conf.rerun_delay()
if 再运行延迟 > 0 and conf.stop_counter() > 0:
while True:
try:
logfile = main(args)
(扫描电影数, 已处理, 完成数) = 分析结果元组 = tuple(分析日志文件(logfile))
if all(isinstance(v, int) for v in 分析结果元组):
剩余个数 = 扫描电影数 - 已处理
总用时 = timedelta(seconds = time.time() - app_start)
print(f'All movies:{扫描电影数} processed:{已处理} successes:{完成数} remain:{剩余个数}' +
' Elapsed time {}'.format(
period(总用时, "{d} day {h}:{m:02}:{s:02}") if 总用时.days == 1
else period(总用时, "{d} days {h}:{m:02}:{s:02}") if 总用时.days > 1
else period(总用时, "{h}:{m:02}:{s:02}")))
if 剩余个数 == 0:
break
下次运行 = datetime.now() + timedelta(seconds=再运行延迟)
print(f'Next run time: {下次运行.strftime("%H:%M:%S")}, rerun_delay={再运行延迟}, press Ctrl+C stop run.')
time.sleep(再运行延迟)
else:
break
except:
break
else:
main(args)
if not conf.auto_exit():
if sys.platform == 'win32':
input("Press enter key exit, you can check the error message before you exit...")

README.md

@@ -1,77 +1,49 @@
# 日本AV元数据抓取工具 (刮削器)
<h1 align="center">Movie Data Capture</h1>
## 关于本软件 ~路star谢谢
[English](https://github.com/yoshiko2/Movie_Data_Capture/blob/master/README_EN.md)
**#0.5重大更新新增对FC2,259LUXU,SIRO,300MAAN系列影片抓取支持,优化对无码视频抓取**
![](https://img.shields.io/badge/build-passing-brightgreen.svg?style=flat)
![](https://img.shields.io/github/license/yoshiko2/Movie_data_capture.svg?style=flat)
![](https://img.shields.io/github/release/yoshiko2/Movie_data_capture.svg?style=flat)
![](https://img.shields.io/badge/Python-3.9-yellow.svg?style=flat&logo=python)<br>
[Docker 版本](https://github.com/vergilgao/docker-mdc)
![](https://img.shields.io/badge/build-passing-brightgreen.svg?style=flat)
![](https://img.shields.io/github/license/VergilGao/docker-mdc.svg?style=flat)
![](https://img.shields.io/github/release/VergilGao/docker-mdc.svg?style=flat)
![](https://img.shields.io/badge/Python-3.9-yellow.svg?style=flat&logo=python)<br>
目前我下的AV越来越多也意味着AV要集中地管理形成媒体库。现在有两款主流的AV元数据获取器"EverAver"和"Javhelper"。前者的优点是元数据获取比较全,缺点是不能批量处理;后者优点是可以批量处理,但是元数据不够全。
**本地电影元数据 抓取工具 | 刮削器**,配合本地影片管理软件 Emby, Jellyfin, Kodi 等管理本地影片该软件起到分类与元数据metadata抓取作用利用元数据信息来分类仅供本地影片分类整理使用。
为此综合上述软件特点我写出了本软件为了方便的管理本地AV和更好的手冲体验。没女朋友怎么办ʅ(‾◡◝)ʃ
**严禁在墙内的社交平台上宣传此项目**
**预计本周末适配DS Video暂时只支持Kodi,EMBY**
* [官方Twitter](https://twitter.com/mdac_official)
**tg官方电报群:https://t.me/AV_Data_Capture_Official**
# 文档
* [官方教程WIKI](https://github.com/yoshiko2/Movie_Data_Capture/wiki)
* [VergilGao's Docker部署](https://github.com/VergilGao/docker-mdc)
### **请认真阅读下面使用说明再使用** * [如何使用](#如何使用)
# 申明
当你查阅、下载了本项目源代码或二进制程序,即代表你接受了以下条款
* 本项目和项目成果仅供技术学术交流和Python3性能测试使用
* 用户必须确保获取影片的途径在用户当地是合法的
* 运行时和运行后所获取的元数据和封面图片等数据的版权,归版权持有人持有
* 本项目贡献者编写该项目旨在学习Python3 ,提高编程水平
* 本项目不提供任何影片下载的线索
* 请勿提供运行时和运行后获取的数据提供给可能有非法目的的第三方,例如用于非法交易、侵犯未成年人的权利等
* 用户仅能在自己的私人计算机或者测试环境中使用该工具,禁止将获取到的数据用于商业目的或其他目的,如销售、传播等
* 用户在使用本项目和项目成果前,请用户了解并遵守当地法律法规,如果本项目及项目成果使用过程中存在违反当地法律法规的行为,请勿使用该项目及项目成果
* 法律后果及使用后果由使用者承担
* [GPL LICENSE](https://github.com/yoshiko2/Movie_Data_Capture/blob/master/LICENSE)
* 若用户不同意上述条款任意一条,请勿使用本项目和项目成果
![](https://i.loli.net/2019/06/02/5cf2b5d0bbecf69019.png)
## 软件流程图
![](https://i.loli.net/2019/06/02/5cf2bb9a9e2d997635.png)
## 如何使用
### **请认真阅读下面使用说明**
**release的程序可脱离python环境运行可跳过第一步仅限windows平台)**
**下载地址(Windows):https://github.com/wenead99/AV_Data_Capture/releases**
1. 请安装requests,pyquery,lxml,Beautifulsoup4,pillow模块,可在CMD逐条输入以下命令安装
```
pip install requests
pip install pyquery
pip install lxml
pip install Beautifulsoup4
pip install pillow
```
2. 你的AV在被软件管理前最好命名为番号:
```
COSQ-004.mp4
```
或者
```
COSQ_004.mp4
```
文件名中间要有下划线或者减号"_","-",没有多余的内容只有番号为最佳,可以让软件更好获取元数据
对于多影片重命名可以用ReNamer来批量重命名
软件官网:http://www.den4b.com/products/renamer
![](https://i.loli.net/2019/06/02/5cf2b5cfbfe1070559.png)
3. 把软件拷贝到AV的所在目录下运行程序中国大陆用户必须挂VPNShadowsocks开全局代理
4. 运行AV_Data_capture.py
5. **你也可以把单个影片拖动到core程序**
![](https://i.loli.net/2019/06/02/5cf2b5d03640e73201.gif)
6. 软件会自动把元数据获取成功的电影移动到JAV_output文件夹中根据女优分类失败的电影移动到failed文件夹中。
7. 把JAV_output文件夹导入到EMBY,KODI中根据封面选片子享受手冲乐趣
![](https://i.loli.net/2019/06/02/5cf2b5cfd1b0226763.png)
![](https://i.loli.net/2019/06/02/5cf2b5cfd1b0246492.png)
![](https://i.loli.net/2019/06/02/5cf2b5d009e4930666.png)
# 下载
* [Releases](https://github.com/yoshiko2/Movie_Data_Capture/releases/latest)
# 贡献者
[![](https://opencollective.com/movie_data_capture/contributors.svg?width=890)](https://github.com/yoshiko2/movie_data_Capture/graphs/contributors)
# 友情链接
* [CloudDrive](https://www.clouddrive2.com/)
# Star History
[![Star History Chart](https://api.star-history.com/svg?repos=yoshiko2/Movie_Data_Capture&type=Date)](https://star-history.com/#yoshiko2/Movie_Data_Capture&Date)

README_EN.md

@@ -0,0 +1,49 @@
<h1 align="center">Movie Data Capture</h1>
![](https://img.shields.io/badge/build-passing-brightgreen.svg?style=flat)
![](https://img.shields.io/github/license/yoshiko2/Movie_data_capture.svg?style=flat)
![](https://img.shields.io/github/release/yoshiko2/Movie_data_capture.svg?style=flat)
![](https://img.shields.io/badge/Python-3.9-yellow.svg?style=flat&logo=python)<br>
[Docker Edition](https://github.com/VergilGao/docker-mdc)
![](https://img.shields.io/badge/build-passing-brightgreen.svg?style=flat)
![](https://img.shields.io/github/license/VergilGao/docker-mdc.svg?style=flat)
![](https://img.shields.io/github/release/VergilGao/docker-mdc.svg?style=flat)
![](https://img.shields.io/badge/Python-3.9-yellow.svg?style=flat&logo=python)<br>
**Movie Metadata Scraper**. Used with local media managers such as Emby, Jellyfin and Kodi, this project
classifies local movies and grabs their metadata, using that metadata only for local classification and organization.
[中文 | Chinese](https://github.com/yoshiko2/Movie_Data_Capture/blob/master/README.md)
# Documents
* [Official WIKI](https://github.com/yoshiko2/Movie_Data_Capture/wiki/English)
* [VergilGao's Docker Edition](https://github.com/VergilGao/docker-mdc)
# NOTICE
When you view and download the source code or binary program of this project, it means that you have accepted the following terms:
* **You must be over 18 years old, or leave the page immediately.**
* This project and its results are for technical, academic exchange and Python3 performance testing purposes only.
* The contributors to this project have written this project to learn Python3 and improve programming.
* This project does not provide any clues for downloading movies.
* Legal consequences and the consequences of use are borne by the user.
* [GPL LICENSE](https://github.com/yoshiko2/Movie_Data_Capture/blob/master/LICENSE)
* If you do not agree to any of the above terms, please do not use the project and project results.
# Download
* [Releases](https://github.com/yoshiko2/Movie_Data_Capture/releases/latest)
# Contributors
[![](https://opencollective.com/movie_data_capture/contributors.svg?width=890)](https://github.com/yoshiko2/movie_data_Capture/graphs/contributors)
# Sponsor
I am a college student with high living and tuition costs, and I want to reduce my financial dependence on my family.
If the program helps you, you can sponsor by:
## Crypto
* USDT TRC-20: `TCVvFxeMuHFaECVMiHrxWD9b5QGX8DVQNV`
* BTC: `3MyXrRyKbCG6mrB3KiWoYnifsPWNCiprwe`
New functions and features can be commissioned for a fee through the channels above, and you are welcome to offer me work.
Thanks!

README_ZH.md

@@ -0,0 +1,38 @@
<h1 align="center">Movie Data Capture</h1>
![](https://img.shields.io/badge/build-passing-brightgreen.svg?style=flat)
![](https://img.shields.io/github/license/yoshiko2/Movie_data_capture.svg?style=flat)
![](https://img.shields.io/github/release/yoshiko2/Movie_data_capture.svg?style=flat)
![](https://img.shields.io/badge/Python-3.9-yellow.svg?style=flat&logo=python)<br>
[Docker 版本](https://github.com/VergilGao/docker-mdc)
![](https://img.shields.io/badge/build-passing-brightgreen.svg?style=flat)
![](https://img.shields.io/github/license/VergilGao/docker-mdc.svg?style=flat)
![](https://img.shields.io/github/release/VergilGao/docker-mdc.svg?style=flat)
![](https://img.shields.io/badge/Python-3.9-yellow.svg?style=flat&logo=python)<br>
**本地电影元数据 抓取工具 | 刮削器**,配合本地影片管理软件 Emby, Jellyfin, Kodi 等管理本地影片该软件起到分类与元数据metadata抓取作用利用元数据信息来分类仅供本地影片分类整理使用。
### 请勿在墙内的社交平台上宣传此项目
# 文档
* [官方教程WIKI](https://github.com/yoshiko2/Movie_Data_Capture/wiki)
* [VergilGao's Docker部署](https://github.com/VergilGao/docker-mdc)
# 申明
当你查阅、下载了本项目源代码或二进制程序,即代表你接受了以下条款
* 本项目和项目成果仅供技术学术交流和Python3性能测试使用
* 本项目贡献者编写该项目旨在学习Python3 ,提高编程水平
* 本项目不提供任何影片下载的线索
* 用户在使用本项目和项目成果前,请用户了解并遵守当地法律法规,如果本项目及项目成果使用过程中存在违反当地法律法规的行为,请勿使用该项目及项目成果
* 法律后果及使用后果由使用者承担
* [GPL LICENSE](https://github.com/yoshiko2/Movie_Data_Capture/blob/master/LICENSE)
* 若用户不同意上述条款任意一条,请勿使用本项目和项目成果
# 下载
* [Releases](https://github.com/yoshiko2/Movie_Data_Capture/releases/latest)
# 贡献者
[![](https://opencollective.com/movie_data_capture/contributors.svg?width=890)](https://github.com/yoshiko2/movie_data_Capture/graphs/contributors)
# Star History
[![Star History Chart](https://api.star-history.com/svg?repos=yoshiko2/Movie_Data_Capture&type=Date)](https://star-history.com/#yoshiko2/Movie_Data_Capture&Date)

config.ini

@@ -0,0 +1,156 @@
# Full guide:
# - https://github.com/yoshiko2/Movie_Data_Capture/wiki/%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6
[common]
main_mode = 1
source_folder = ./
failed_output_folder = failed
success_output_folder = JAV_output
link_mode = 0
; 0: do not scrape hardlinked files  1: scrape hardlinked files
scan_hardlink = 0
failed_move = 0
auto_exit = 0
translate_to_sc = 0
multi_threading = 0
;actor_gender value: female(♀) or male(♂) or both(♀ ♂) or all(♂ ♀ ⚧)
actor_gender = female
del_empty_folder = 1
; Skip .nfo files modified within the last N days (default: 30). This keeps organize mode (main_mode=3)
; and link mode from repeatedly re-scraping the video files at the front of the list; 0 processes all video files
nfo_skip_days = 30
ignore_failed_list = 0
download_only_missing_images = 1
mapping_table_validity = 7
; Jellyfin-specific settings (0: off, 1: on), e.g.:
; in Jellyfin, tags and genres are duplicated, so only genres need to be saved into the nfo;
; Jellyfin only needs thumb, not fanart
jellyfin = 0
; When enabled, tags and genres contain only the actor names
actor_only_tag = 0
sleep = 3
anonymous_fill = 1
[advenced_sleep]
; Stop after processing this many video files; 0 processes all video files
stop_counter = 0
; Rerun delay; units: h hours, m minutes, s seconds. Examples: 1h30m45s (1 hour 30 min 45 s), 45 (45 seconds)
; Only effective when stop_counter is non-zero: after every stop_counter movies, wait rerun_delay seconds and run again
rerun_delay = 0
; Used together, these options let you scrape or organize thousands of files in small batches without triggering bans from translation or metadata sites
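; e.g. stop_counter = 100 with rerun_delay = 1h scrapes 100 files, waits an hour, then reruns until nothing remains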
[proxy]
;proxytype: http or socks5 or socks5h switch: 0 1
switch = 0
type = socks5h
proxy = 127.0.0.1:1080
timeout = 20
retry = 3
cacert_file =
[Name_Rule]
location_rule = actor+"/"+number
naming_rule = number+"-"+title
max_title_len = 50
; Whether scraped images are named after the movie number
image_naming_with_number = 0
; Uppercase numbers 1 | 0; conversion is applied only when writing data, the search/scrape flow is unaffected
number_uppercase = 0
; Custom regular expressions, multiple regexes separated by spaces; the first capture group is the extracted number; if no custom regex matches, the default rules are used
; example: ([A-Za-z]{2,4}\-\d{3}) ([A-Za-z]{2,4}00\d{3})
number_regexs =
[update]
update_check = 1
[priority]
website = javbus,airav,fanza,xcity,mgstage,avsox,jav321,madou,javday,javmenu,javdb
[escape]
literals = \()/
folders = failed,JAV_output
[debug_mode]
switch = 0
[translate]
switch = 0
; engine: google-free,azure,deeplx
engine = google-free
; en_us fr_fr de_de... (only google-free now)
target_language = zh_cn
; Azure translate API key
key =
; Delay between translation requests; bigger is safer
delay = 3
; title,outline,actor,tag
values = title,outline
; Google translate site, or Deeplx site
service_site = translate.google.com
; Trailer
[trailer]
switch = 0
[uncensored]
uncensored_prefix = PT-,S2M,BT,LAF,SMD,SMBD,SM3D2DBD,SKY-,SKYHD,CWP,CWDV,CWBD,CW3D2DBD,MKD,MKBD,MXBD,MK3D2DBD,MCB3DBD,MCBD,RHJ,MMDV
[media]
media_type = .mp4,.avi,.rmvb,.wmv,.mov,.mkv,.flv,.ts,.webm,.iso,.mpg,.m4v
sub_type = .smi,.srt,.idx,.sub,.sup,.psb,.ssa,.ass,.usf,.xss,.ssf,.rt,.lrc,.sbv,.vtt,.ttml
; Watermark
[watermark]
switch = 1
water = 2
; top-left 0, top-right 1, bottom-right 2, bottom-left 3
; Extra fanart (movie stills)
[extrafanart]
switch = 1
parallel_download = 5
extrafanart_folder = extrafanart
; Storyline (plot synopsis)
[storyline]
switch = 1
; When website is javbus, javdb, avsox, xcity or carib, the site / censored_site / uncensored_site lists are the
; optional data sources for the storyline. All listed sites are queried concurrently; priority is the number before
; the colon, ascending, and a higher-numbered site's data is used only when every lower-numbered site has none.
; airavwiki, airav, avno1 and 58avgo return Chinese storylines: airav covers censored titles only, avno1 and
; airavwiki cover both censored and uncensored, and 58avgo covers only uncensored or leaked/decensored titles (this capability is unused).
; xcity and amazon are Japanese; since the Amazon store carries no movie-number field, picking the matching DVD is
; only about 99.6% accurate. If all three lists are empty nothing is queried, which greatly speeds up scraping.
; site=
site = airav,avno1,airavwiki
censored_site = airav,avno1,xcity,amazon
uncensored_site = 58avgo
; Run mode  0: sequential (slowest)  1: thread pool (default)  2: process pool (higher startup cost; faster the more sites are queried)
run_mode = 1
; show_result: storyline debug info  0 off  1 brief  2 verbose (the verbose part is not logged); turn on 2 to diagnose a failing storyline
show_result = 0
; Traditional/Simplified Chinese conversion  mode=0: off  1: Traditional -> Simplified  2: Simplified -> Traditional
[cc_convert]
mode = 1
vars = outline,series,studio,tag,title
[javdb]
sites = 521
; Face detection  locations_model=hog: histogram of oriented gradients (less accurate, fast)  cnn: deep-learning model (accurate, needs GPU/CUDA, slow)
; uncensored_only=0: run face detection on all covers  1: only on uncensored covers; censored covers are simply cropped to the right half
; aways_imagecut=0: per-site default behavior  1: always crop the cover; enabling this ignores [common]download_only_missing_images=1 and always overwrites covers
; The crop aspect ratio is configurable; the formula is aspect_ratio/3. The default aspect_ratio=2.12 fits most censored covers; the previous default was 2/3, i.e. aspect_ratio=2
[face]
locations_model = hog
uncensored_only = 1
aways_imagecut = 0
aspect_ratio = 2.12
[jellyfin]
multi_part_fanart = 0
[actor_photo]
download_for_kodi = 0
[direct]
switch = 1

config.py

@@ -0,0 +1,648 @@
import os
import re
import sys
import configparser
import time
import typing
from pathlib import Path
G_conf_override = {
# index 0 save Config() first instance for quick access by using getInstance()
0: None,
# register override config items
# no need anymore
}
def getInstance():
if isinstance(G_conf_override[0], Config):
return G_conf_override[0]
return Config()
class Config:
def __init__(self, path: str = "config.ini"):
path_search_order = (
Path(path),
Path.cwd() / "config.ini",
Path.home() / "mdc.ini",
Path.home() / ".mdc.ini",
Path.home() / ".mdc/config.ini",
Path.home() / ".config/mdc/config.ini"
)
ini_path = None
for p in path_search_order:
if p.is_file():
ini_path = p.resolve()
break
if ini_path:
self.conf = configparser.ConfigParser()
self.ini_path = ini_path
try:
if self.conf.read(ini_path, encoding="utf-8-sig"):
if G_conf_override[0] is None:
G_conf_override[0] = self
except UnicodeDecodeError:
if self.conf.read(ini_path, encoding="utf-8"):
if G_conf_override[0] is None:
G_conf_override[0] = self
except Exception as e:
print("ERROR: Config file can not read!")
print("读取配置文件出错!")
print('=================================')
print(e)
print("======= Auto exit in 60s ======== ")
time.sleep(60)
os._exit(-1)
else:
print("ERROR: Config file not found!")
print("Please put config file into one of the following path:")
print('\n'.join([str(p.resolve()) for p in path_search_order[2:]]))
# When no config file is found, ship the matching default config inside the bundle and generate one
# on the search path when needed; that is more reliable than the user grabbing a config file from some
# mismatched version. A single executable is then fully self-contained and can safely run from any path.
res_path = None
# pyinstaller bundle: look for config.ini inside the bundle
if hasattr(sys, '_MEIPASS') and (Path(getattr(sys, '_MEIPASS')) / 'config.ini').is_file():
res_path = Path(getattr(sys, '_MEIPASS')) / 'config.ini'
# otherwise look next to the script itself
elif (Path(__file__).resolve().parent / 'config.ini').is_file():
res_path = Path(__file__).resolve().parent / 'config.ini'
if res_path is None:
os._exit(2)
ins = input("Or, Do you want me create a config file for you? (Yes/No)[Y]:")
if re.search('n', ins, re.I):
os._exit(2)
# Only the home directory is guaranteed writable, so ~/mdc.ini is chosen as the generated config path
# rather than the current directory, which may not be writable. Placing the config in the current
# directory is no longer encouraged and remains only as a trick for switching between multiple configs.
write_path = path_search_order[2] # Path.home() / "mdc.ini"
write_path.write_text(res_path.read_text(encoding='utf-8'), encoding='utf-8')
print("Config file '{}' created.".format(write_path.resolve()))
input("Press Enter key exit...")
os._exit(0)
# self.conf = self._default_config()
# try:
# self.conf = configparser.ConfigParser()
# try: # From single crawler debug use only
# self.conf.read('../' + path, encoding="utf-8-sig")
# except:
# self.conf.read('../' + path, encoding="utf-8")
# except Exception as e:
# print("[-]Config file not found! Use the default settings")
# print("[-]",e)
# os._exit(3)
# #self.conf = self._default_config()
def set_override(self, option_cmd: str):
"""
通用的参数覆盖选项 -C 配置覆盖串
配置覆盖串语法:小节名:键名=值[;[小节名:]键名=值][;[小节名:]键名+=值] 多个键用分号分隔 名称可省略部分尾部字符
或 小节名:键名+=值[;[小节名:]键名=值][;[小节名:]键名+=值] 在已有值的末尾追加内容,多个键的=和+=可以交叉出现
例子: face:aspect_ratio=2;aways_imagecut=1;priority:website=javdb
小节名必须出现在开头至少一次,分号后可只出现键名=值,不再出现小节名,如果后续全部键名都属于同一个小节
例如配置文件存在两个小节[proxy][priority]那么pro可指代proxypri可指代priority
[face] ;face小节下方有4个键名locations_model= uncensored_only= aways_imagecut= aspect_ratio=
l,lo,loc,loca,locat,locati...直到locations_model完整名称都可以用来指代locations_model=键名
u,un,unc...直到uncensored_only完整名称都可以用来指代uncensored_only=键名
aw,awa...直到aways_imagecut完整名称都可以用来指代aways_imagecut=键名
as,asp...aspect_ratio完整名称都可以用来指代aspect_ratio=键名
a则因为二义性不是合法的省略键名
"""
def err_exit(str):
print(str)
os._exit(2)
sections = self.conf.sections()
sec_name = None
for cmd in option_cmd.split(';'):
syntax_err = True
rex = re.findall(r'^(.*?):(.*?)(=|\+=)(.*)$', cmd, re.U)
if len(rex) and len(rex[0]) == 4:
(sec, key, assign, val) = rex[0]
sec_lo = sec.lower().strip()
key_lo = key.lower().strip()
syntax_err = False
elif sec_name:  # a section name has already appeared; later keys in the same section may omit it
rex = re.findall(r'^(.*?)(=|\+=)(.*)$', cmd, re.U)
if len(rex) and len(rex[0]) == 3:
(key, assign, val) = rex[0]
sec_lo = sec_name.lower()
key_lo = key.lower().strip()
syntax_err = False
if syntax_err:
err_exit(f"[-]Config override syntax incorrect. example: 'd:s=1' or 'debug_mode:switch=1'. cmd='{cmd}' all='{option_cmd}'")
if not len(sec_lo):
err_exit(f"[-]Config override Section name '{sec}' is empty! cmd='{cmd}'")
if not len(key_lo):
err_exit(f"[-]Config override Key name '{key}' is empty! cmd='{cmd}'")
if not len(val.strip()):
print(f"[!]Conig overide value '{val}' is empty! cmd='{cmd}'")
sec_name = None
for s in sections:
if not s.lower().startswith(sec_lo):
continue
if sec_name:
err_exit(f"[-]Conig overide Section short name '{sec_lo}' is not unique! dup1='{sec_name}' dup2='{s}' cmd='{cmd}'")
sec_name = s
if sec_name is None:
err_exit(f"[-]Conig overide Section name '{sec}' not found! cmd='{cmd}'")
key_name = None
keys = self.conf[sec_name]
for k in keys:
if not k.lower().startswith(key_lo):
continue
if key_name:
err_exit(f"[-]Conig overide Key short name '{key_lo}' is not unique! dup1='{key_name}' dup2='{k}' cmd='{cmd}'")
key_name = k
if key_name is None:
err_exit(f"[-]Conig overide Key name '{key}' not found! cmd='{cmd}'")
if assign == "+=":
val = keys[key_name] + val
if self.debug():
print(f"[!]Set config override [{sec_name}]{key_name}={val} by cmd='{cmd}'")
self.conf.set(sec_name, key_name, val)
def main_mode(self) -> int:
try:
return self.conf.getint("common", "main_mode")
except ValueError:
self._exit("common:main_mode")
def source_folder(self) -> str:
return self.conf.get("common", "source_folder").replace("\\\\", "/").replace("\\", "/")
def failed_folder(self) -> str:
return self.conf.get("common", "failed_output_folder").replace("\\\\", "/").replace("\\", "/")
def success_folder(self) -> str:
return self.conf.get("common", "success_output_folder").replace("\\\\", "/").replace("\\", "/")
def actor_gender(self) -> str:
return self.conf.get("common", "actor_gender")
def link_mode(self) -> int:
return self.conf.getint("common", "link_mode")
def scan_hardlink(self) -> bool:
return self.conf.getboolean("common", "scan_hardlink", fallback=False)#未找到配置选项,默认不刮削
def failed_move(self) -> bool:
return self.conf.getboolean("common", "failed_move")
def auto_exit(self) -> bool:
return self.conf.getboolean("common", "auto_exit")
def translate_to_sc(self) -> bool:
return self.conf.getboolean("common", "translate_to_sc")
def multi_threading(self) -> bool:
return self.conf.getboolean("common", "multi_threading")
def del_empty_folder(self) -> bool:
return self.conf.getboolean("common", "del_empty_folder")
def nfo_skip_days(self) -> int:
return self.conf.getint("common", "nfo_skip_days", fallback=30)
def ignore_failed_list(self) -> bool:
return self.conf.getboolean("common", "ignore_failed_list")
def download_only_missing_images(self) -> bool:
return self.conf.getboolean("common", "download_only_missing_images")
def mapping_table_validity(self) -> int:
return self.conf.getint("common", "mapping_table_validity")
def jellyfin(self) -> int:
return self.conf.getint("common", "jellyfin")
def actor_only_tag(self) -> bool:
return self.conf.getboolean("common", "actor_only_tag")
def sleep(self) -> int:
return self.conf.getint("common", "sleep")
def anonymous_fill(self) -> bool:
return self.conf.getint("common", "anonymous_fill")
def stop_counter(self) -> int:
return self.conf.getint("advenced_sleep", "stop_counter", fallback=0)
def rerun_delay(self) -> int:
value = self.conf.get("advenced_sleep", "rerun_delay")
if not (isinstance(value, str) and re.match(r'^[\dsmh]+$', value, re.I)):
return 0 # not match '1h30m45s' or '30' or '1s2m1h4s5m'
if value.isnumeric() and int(value) >= 0:
return int(value)
sec = 0
sec += sum(int(v) for v in re.findall(r'(\d+)s', value, re.I))
sec += sum(int(v) for v in re.findall(r'(\d+)m', value, re.I)) * 60
sec += sum(int(v) for v in re.findall(r'(\d+)h', value, re.I)) * 3600
return sec
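# e.g. '1h30m45s' -> 3600 + 30*60 + 45 = 5445 seconds; a plain '90' -> 90; malformed values fall back to 0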
def is_translate(self) -> bool:
return self.conf.getboolean("translate", "switch")
def is_trailer(self) -> bool:
return self.conf.getboolean("trailer", "switch")
def is_watermark(self) -> bool:
return self.conf.getboolean("watermark", "switch")
def is_extrafanart(self) -> bool:
return self.conf.getboolean("extrafanart", "switch")
def extrafanart_thread_pool_download(self) -> int:
try:
v = self.conf.getint("extrafanart", "parallel_download")
return v if v >= 0 else 5
except:
return 5
def watermark_type(self) -> int:
return int(self.conf.get("watermark", "water"))
def get_uncensored(self):
try:
sec = "uncensored"
uncensored_prefix = self.conf.get(sec, "uncensored_prefix")
# uncensored_poster = self.conf.get(sec, "uncensored_poster")
return uncensored_prefix
except ValueError:
self._exit("uncensored")
def get_extrafanart(self):
try:
extrafanart_download = self.conf.get("extrafanart", "extrafanart_folder")
return extrafanart_download
except ValueError:
self._exit("extrafanart_folder")
def get_translate_engine(self) -> str:
return self.conf.get("translate", "engine")
def get_target_language(self) -> str:
return self.conf.get("translate", "target_language")
# def get_translate_appId(self) ->str:
# return self.conf.get("translate","appid")
def get_translate_key(self) -> str:
return self.conf.get("translate", "key")
def get_translate_delay(self) -> int:
return self.conf.getint("translate", "delay")
def translate_values(self) -> str:
return self.conf.get("translate", "values")
def get_translate_service_site(self) -> str:
return self.conf.get("translate", "service_site")
def proxy(self):
try:
sec = "proxy"
switch = self.conf.get(sec, "switch")
proxy = self.conf.get(sec, "proxy")
timeout = self.conf.getint(sec, "timeout")
retry = self.conf.getint(sec, "retry")
proxytype = self.conf.get(sec, "type")
iniProxy = IniProxy(switch, proxy, timeout, retry, proxytype)
return iniProxy
except ValueError:
self._exit("common")
def cacert_file(self) -> str:
return self.conf.get('proxy', 'cacert_file')
def media_type(self) -> str:
return self.conf.get('media', 'media_type')
def sub_rule(self) -> typing.Set[str]:
return set(self.conf.get('media', 'sub_type').lower().split(','))
def naming_rule(self) -> str:
return self.conf.get("Name_Rule", "naming_rule")
def location_rule(self) -> str:
return self.conf.get("Name_Rule", "location_rule")
def max_title_len(self) -> int:
"""
Maximum title length
"""
try:
return self.conf.getint("Name_Rule", "max_title_len")
except:
return 50
def image_naming_with_number(self) -> bool:
try:
return self.conf.getboolean("Name_Rule", "image_naming_with_number")
except:
return False
def number_uppercase(self) -> bool:
try:
return self.conf.getboolean("Name_Rule", "number_uppercase")
except:
return False
def number_regexs(self) -> str:
try:
return self.conf.get("Name_Rule", "number_regexs")
except:
return ""
def update_check(self) -> bool:
try:
return self.conf.getboolean("update", "update_check")
except ValueError:
self._exit("update:update_check")
def sources(self) -> str:
return self.conf.get("priority", "website")
def escape_literals(self) -> str:
return self.conf.get("escape", "literals")
def escape_folder(self) -> str:
return self.conf.get("escape", "folders")
def debug(self) -> bool:
return self.conf.getboolean("debug_mode", "switch")
def get_direct(self) -> bool:
return self.conf.getboolean("direct", "switch")
def is_storyline(self) -> bool:
try:
return self.conf.getboolean("storyline", "switch")
except:
return True
def storyline_site(self) -> str:
try:
return self.conf.get("storyline", "site")
except:
return "1:avno1,4:airavwiki"
def storyline_censored_site(self) -> str:
try:
return self.conf.get("storyline", "censored_site")
except:
return "2:airav,5:xcity,6:amazon"
def storyline_uncensored_site(self) -> str:
try:
return self.conf.get("storyline", "uncensored_site")
except:
return "3:58avgo"
def storyline_show(self) -> int:
v = self.conf.getint("storyline", "show_result", fallback=0)
return v if v in (0, 1, 2) else 2 if v > 2 else 0
def storyline_mode(self) -> int:
return 1 if self.conf.getint("storyline", "run_mode", fallback=1) > 0 else 0
def cc_convert_mode(self) -> int:
v = self.conf.getint("cc_convert", "mode", fallback=1)
return v if v in (0, 1, 2) else 2 if v > 2 else 0
def cc_convert_vars(self) -> str:
return self.conf.get("cc_convert", "vars",
fallback="actor,director,label,outline,series,studio,tag,title")
def javdb_sites(self) -> str:
return self.conf.get("javdb", "sites", fallback="38,39")
def face_locations_model(self) -> str:
return self.conf.get("face", "locations_model", fallback="hog")
def face_uncensored_only(self) -> bool:
return self.conf.getboolean("face", "uncensored_only", fallback=True)
def face_aways_imagecut(self) -> bool:
return self.conf.getboolean("face", "aways_imagecut", fallback=False)
def face_aspect_ratio(self) -> float:
return self.conf.getfloat("face", "aspect_ratio", fallback=2.12)
def jellyfin_multi_part_fanart(self) -> bool:
return self.conf.getboolean("jellyfin", "multi_part_fanart", fallback=False)
def download_actor_photo_for_kodi(self) -> bool:
return self.conf.getboolean("actor_photo", "download_for_kodi", fallback=False)
@staticmethod
def _exit(sec: str) -> None:
print("[-] Read config error! Please check the {} section in config.ini", sec)
input("[-] Press ENTER key to exit.")
exit()
@staticmethod
def _default_config() -> configparser.ConfigParser:
conf = configparser.ConfigParser()
sec1 = "common"
conf.add_section(sec1)
conf.set(sec1, "main_mode", "1")
conf.set(sec1, "source_folder", "./")
conf.set(sec1, "failed_output_folder", "failed")
conf.set(sec1, "success_output_folder", "JAV_output")
conf.set(sec1, "link_mode", "0")
conf.set(sec1, "scan_hardlink", "0")
conf.set(sec1, "failed_move", "1")
conf.set(sec1, "auto_exit", "0")
conf.set(sec1, "translate_to_sc", "1")
# actor_gender value: female or male or both or all (including trans)
conf.set(sec1, "actor_gender", "female")
conf.set(sec1, "del_empty_folder", "1")
conf.set(sec1, "nfo_skip_days", "30")
conf.set(sec1, "ignore_failed_list", "0")
conf.set(sec1, "download_only_missing_images", "1")
conf.set(sec1, "mapping_table_validity", "7")
conf.set(sec1, "jellyfin", "0")
conf.set(sec1, "actor_only_tag", "0")
conf.set(sec1, "sleep", "3")
conf.set(sec1, "anonymous_fill", "0")
sec2 = "advenced_sleep"
conf.add_section(sec2)
conf.set(sec2, "stop_counter", "0")
conf.set(sec2, "rerun_delay", "0")
sec3 = "proxy"
conf.add_section(sec3)
conf.set(sec3, "proxy", "")
conf.set(sec3, "timeout", "5")
conf.set(sec3, "retry", "3")
conf.set(sec3, "type", "socks5")
conf.set(sec3, "cacert_file", "")
sec4 = "Name_Rule"
conf.add_section(sec4)
conf.set(sec4, "location_rule", "actor + '/' + number")
conf.set(sec4, "naming_rule", "number + '-' + title")
conf.set(sec4, "max_title_len", "50")
conf.set(sec4, "image_naming_with_number", "0")
conf.set(sec4, "number_uppercase", "0")
conf.set(sec4, "number_regexs", "")
sec5 = "update"
conf.add_section(sec5)
conf.set(sec5, "update_check", "1")
sec6 = "priority"
conf.add_section(sec6)
conf.set(sec6, "website", "airav,javbus,javdb,fanza,xcity,mgstage,fc2,fc2club,avsox,jav321,xcity")
sec7 = "escape"
conf.add_section(sec7)
conf.set(sec7, "literals", "\()/") # noqa
conf.set(sec7, "folders", "failed, JAV_output")
sec8 = "debug_mode"
conf.add_section(sec8)
conf.set(sec8, "switch", "0")
sec9 = "translate"
conf.add_section(sec9)
conf.set(sec9, "switch", "0")
conf.set(sec9, "engine", "google-free")
conf.set(sec9, "target_language", "zh_cn")
# conf.set(sec8, "appid", "")
conf.set(sec9, "key", "")
conf.set(sec9, "delay", "1")
conf.set(sec9, "values", "title,outline")
conf.set(sec9, "service_site", "translate.google.cn")
sec10 = "trailer"
conf.add_section(sec10)
conf.set(sec10, "switch", "0")
sec11 = "uncensored"
conf.add_section(sec11)
conf.set(sec11, "uncensored_prefix", "S2M,BT,LAF,SMD")
sec12 = "media"
conf.add_section(sec12)
conf.set(sec12, "media_type",
".mp4,.avi,.rmvb,.wmv,.mov,.mkv,.flv,.ts,.webm,iso")
conf.set(sec12, "sub_type",
".smi,.srt,.idx,.sub,.sup,.psb,.ssa,.ass,.usf,.xss,.ssf,.rt,.lrc,.sbv,.vtt,.ttml")
sec13 = "watermark"
conf.add_section(sec13)
conf.set(sec13, "switch", "1")
conf.set(sec13, "water", "2")
sec14 = "extrafanart"
conf.add_section(sec14)
conf.set(sec14, "switch", "1")
conf.set(sec14, "extrafanart_folder", "extrafanart")
conf.set(sec14, "parallel_download", "1")
sec15 = "storyline"
conf.add_section(sec15)
conf.set(sec15, "switch", "1")
conf.set(sec15, "site", "1:avno1,4:airavwiki")
conf.set(sec15, "censored_site", "2:airav,5:xcity,6:amazon")
conf.set(sec15, "uncensored_site", "3:58avgo")
conf.set(sec15, "show_result", "0")
conf.set(sec15, "run_mode", "1")
conf.set(sec15, "cc_convert", "1")
sec16 = "cc_convert"
conf.add_section(sec16)
conf.set(sec16, "mode", "1")
conf.set(sec16, "vars", "actor,director,label,outline,series,studio,tag,title")
sec17 = "javdb"
conf.add_section(sec17)
conf.set(sec17, "sites", "33,34")
sec18 = "face"
conf.add_section(sec18)
conf.set(sec18, "locations_model", "hog")
conf.set(sec18, "uncensored_only", "1")
conf.set(sec18, "aways_imagecut", "0")
conf.set(sec18, "aspect_ratio", "2.12")
sec19 = "jellyfin"
conf.add_section(sec19)
conf.set(sec19, "multi_part_fanart", "0")
sec20 = "actor_photo"
conf.add_section(sec20)
conf.set(sec20, "download_for_kodi", "0")
return conf
class IniProxy():
""" Proxy Config from .ini
"""
SUPPORT_PROXY_TYPE = ("http", "socks5", "socks5h")
enable = False
address = ""
timeout = 5
retry = 3
proxytype = "socks5"
def __init__(self, switch, address, timeout, retry, proxytype) -> None:
""" Initial Proxy from .ini
"""
if switch == '1' or switch == 1:
self.enable = True
self.address = address
self.timeout = timeout
self.retry = retry
self.proxytype = proxytype
def proxies(self):
"""
Get proxy params; defaults to http proxy
"""
if self.address:
if self.proxytype in self.SUPPORT_PROXY_TYPE:
proxies = {"http": self.proxytype + "://" + self.address,
"https": self.proxytype + "://" + self.address}
else:
proxies = {"http": "http://" + self.address, "https": "https://" + self.address}
else:
proxies = {}
return proxies
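# e.g. with switch=1, type=socks5h, proxy=127.0.0.1:1080, proxies() returns
# {'http': 'socks5h://127.0.0.1:1080', 'https': 'socks5h://127.0.0.1:1080'},
# ready to pass to requests: requests.get(url, proxies=ini_proxy.proxies(), timeout=ini_proxy.timeout)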
if __name__ == "__main__":
def evprint(evstr):
code = compile(evstr, "<string>", "eval")
print('{}: "{}"'.format(evstr, eval(code)))
config = Config()
mfilter = {'conf', 'proxy', '_exit', '_default_config', 'ini_path', 'set_override'}
for _m in [m for m in dir(config) if not m.startswith('__') and m not in mfilter]:
evprint(f'config.{_m}()')
pfilter = {'proxies', 'SUPPORT_PROXY_TYPE'}
# test getInstance()
assert (getInstance() == config)
for _p in [p for p in dir(getInstance().proxy()) if not p.startswith('__') and p not in pfilter]:
evprint(f'getInstance().proxy().{_p}')
# Create new instance
conf2 = Config()
assert getInstance() != conf2
assert getInstance() == config
conf2.set_override("d:s=1;face:asp=2;f:aw=0;pri:w=javdb;f:l=")
assert conf2.face_aspect_ratio() == 2
assert conf2.face_aways_imagecut() == False
assert conf2.sources() == "javdb"
print(f"Load Config file '{conf2.ini_path}'.")

core.py (diff suppressed: file too large)

docker/Dockerfile

@@ -0,0 +1,16 @@
FROM python:slim
RUN sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list \
&& sed -i 's/security.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pip -U \
&& pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN apt-get update \
&& apt-get install -y wget ca-certificates \
&& wget -O - 'https://github.com/yoshiko2/AV_Data_Capture/archive/master.tar.gz' | tar xz \
&& mv AV_Data_Capture-master /jav \
&& cd /jav \
&& ( pip install --no-cache-dir -r requirements.txt || true ) \
&& pip install --no-cache-dir requests lxml Beautifulsoup4 pillow \
&& apt-get purge -y wget
WORKDIR /jav

docker/config.ini

@@ -0,0 +1,27 @@
[common]
main_mode=1
failed_output_folder=data/failure_output
success_output_folder=data/organized
link_mode=0
[proxy]
proxy=
timeout=10
retry=3
[Name_Rule]
location_rule=actor+'/'+number
naming_rule=number+'-'+title
[update]
update_check=0
[escape]
literals=\()/
folders=data/failure_output,data/organized
[debug_mode]
switch=0
[media]
media_warehouse=plex


@@ -0,0 +1,13 @@
version: "2.2"
services:
jav:
user: "${JAVUID}:${JAVGID}"
image: jav:local
build: .
volumes:
- ./config.ini:/jav/config.ini
- ${JAV_PATH}:/jav/data
command:
- python
- /jav/AV_Data_Capture.py
- -a

donate.png (binary image, 163 KiB; not shown)


@@ -1,58 +0,0 @@
import re
from lxml import etree#need install
import json
import ADC_function
def getTitle(htmlcode): # get title
#print(htmlcode)
html = etree.fromstring(htmlcode,etree.HTMLParser())
result = str(html.xpath('/html/body/div[2]/div/div[1]/h3/text()')).strip(" ['']")
return result
def getStudio(htmlcode): # get studio
html = etree.fromstring(htmlcode,etree.HTMLParser())
result = str(html.xpath('/html/body/div[2]/div/div[1]/h5[3]/a[1]/text()')).strip(" ['']")
return result
def getNum(htmlcode): # get number
html = etree.fromstring(htmlcode, etree.HTMLParser())
result = str(html.xpath('/html/body/div[5]/div[1]/div[2]/p[1]/span[2]/text()')).strip(" ['']")
return result
def getRelease(number):
a=ADC_function.get_html('http://adult.contents.fc2.com/article_search.php?id='+str(number).lstrip("FC2-").lstrip("fc2-").lstrip("fc2_").lstrip("fc2-")+'&utm_source=aff_php&utm_medium=source_code&utm_campaign=from_aff_php')
html=etree.fromstring(a,etree.HTMLParser())
result = str(html.xpath('//*[@id="container"]/div[1]/div/article/section[1]/div/div[2]/dl/dd[4]/text()')).strip(" ['']")
return result
def getCover(htmlcode,number): # get cover
a = ADC_function.get_html('http://adult.contents.fc2.com/article_search.php?id=' + str(number).lstrip("FC2-").lstrip("fc2-").lstrip("fc2_").lstrip("fc2-") + '&utm_source=aff_php&utm_medium=source_code&utm_campaign=from_aff_php')
html = etree.fromstring(a, etree.HTMLParser())
result = str(html.xpath('//*[@id="container"]/div[1]/div/article/section[1]/div/div[1]/a/img/@src')).strip(" ['']")
return 'http:'+result
def getOutline(htmlcode,number): # get outline
a = ADC_function.get_html('http://adult.contents.fc2.com/article_search.php?id=' + str(number).lstrip("FC2-").lstrip("fc2-").lstrip("fc2_").lstrip("fc2-") + '&utm_source=aff_php&utm_medium=source_code&utm_campaign=from_aff_php')
html = etree.fromstring(a, etree.HTMLParser())
result = str(html.xpath('//*[@id="container"]/div[1]/div/article/section[4]/p/text()')).replace("\\n",'',10000).strip(" ['']").replace("'",'',10000)
return result
# def getTag(htmlcode,number): #获取番号
# a = ADC_function.get_html('http://adult.contents.fc2.com/article_search.php?id=' + str(number).lstrip("FC2-").lstrip("fc2-").lstrip("fc2_").lstrip("fc2-") + '&utm_source=aff_php&utm_medium=source_code&utm_campaign=from_aff_php')
# html = etree.fromstring(a, etree.HTMLParser())
# result = str(html.xpath('//*[@id="container"]/div[1]/div/article/section[4]/p/text()')).replace("\\n",'',10000).strip(" ['']").replace("'",'',10000)
# return result
def main(number):
str(number).lstrip("FC2-").lstrip("fc2-").lstrip("fc2_").lstrip("fc2-")
htmlcode = ADC_function.get_html('http://fc2fans.club/html/FC2-' + number + '.html')
dic = {
'title': getTitle(htmlcode),
'studio': getStudio(htmlcode),
'year': getRelease(number),
'outline': getOutline(htmlcode,number),
'runtime': '',
'director': getStudio(htmlcode),
'actor': '',
'release': getRelease(number),
'number': number,
'cover': getCover(htmlcode,number),
'imagecut': 0,
'tag':" ",
}
js = json.dumps(dic, ensure_ascii=False, sort_keys=True, indent=4, separators=(',', ':'),)#.encode('UTF-8')
return js

javbus.py

@@ -1,172 +0,0 @@
import re
import requests #need install
from pyquery import PyQuery as pq#need install
from lxml import etree#need install
import os
import os.path
import shutil
from bs4 import BeautifulSoup#need install
from PIL import Image#need install
import time
import json
def get_html(url): # core web request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'}
try:
getweb = requests.get(str(url),timeout=5,headers=headers).text
return getweb
except:
print("[-]Connect Failed! Please check your Proxy.")
def getTitle(htmlcode): # get title
doc = pq(htmlcode)
title=str(doc('div.container h3').text()).replace(' ','-')
return title
def getStudio(htmlcode): # get studio
html = etree.fromstring(htmlcode,etree.HTMLParser())
result = str(html.xpath('/html/body/div[5]/div[1]/div[2]/p[5]/a/text()')).strip(" ['']")
return result
def getYear(htmlcode): # get year
html = etree.fromstring(htmlcode,etree.HTMLParser())
result = str(html.xpath('/html/body/div[5]/div[1]/div[2]/p[2]/text()')).strip(" ['']")
return result
def getCover(htmlcode): # get cover URL
doc = pq(htmlcode)
image = doc('a.bigImage')
return image.attr('href')
def getRelease(htmlcode): # get release date
html = etree.fromstring(htmlcode, etree.HTMLParser())
result = str(html.xpath('/html/body/div[5]/div[1]/div[2]/p[2]/text()')).strip(" ['']")
return result
def getRuntime(htmlcode): # get runtime in minutes
soup = BeautifulSoup(htmlcode, 'lxml')
a = soup.find(text=re.compile('分鐘'))
return a
def getActor(htmlcode): # get actresses
b=[]
soup=BeautifulSoup(htmlcode,'lxml')
a=soup.find_all(attrs={'class':'star-name'})
for i in a:
b.append(i.get_text())
return b
def getNum(htmlcode): # get number
html = etree.fromstring(htmlcode, etree.HTMLParser())
result = str(html.xpath('/html/body/div[5]/div[1]/div[2]/p[1]/span[2]/text()')).strip(" ['']")
return result
def getDirector(htmlcode): # get director
html = etree.fromstring(htmlcode, etree.HTMLParser())
result = str(html.xpath('/html/body/div[5]/div[1]/div[2]/p[4]/a/text()')).strip(" ['']")
return result
def getOutline(htmlcode): # get outline
doc = pq(htmlcode)
result = str(doc('tr td div.mg-b20.lh4 p.mg-b20').text())
return result
def getTag(htmlcode): # get tags
tag = []
soup = BeautifulSoup(htmlcode, 'lxml')
a = soup.find_all(attrs={'class': 'genre'})
for i in a:
if 'onmouseout' in str(i):
continue
tag.append(i.get_text())
return tag
def main(number):
htmlcode=get_html('https://www.javbus.com/'+number)
dww_htmlcode=get_html("https://www.dmm.co.jp/mono/dvd/-/detail/=/cid=" + number.replace("-", ''))
dic = {
'title': getTitle(htmlcode),
'studio': getStudio(htmlcode),
'year': getYear(htmlcode),
'outline': getOutline(dww_htmlcode),
'runtime': getRuntime(htmlcode),
'director': getDirector(htmlcode),
'actor': getActor(htmlcode),
'release': getRelease(htmlcode),
'number': getNum(htmlcode),
'cover': getCover(htmlcode),
'imagecut': 1,
'tag':getTag(htmlcode)
}
js = json.dumps(dic, ensure_ascii=False, sort_keys=True, indent=4, separators=(',', ':'),)#.encode('UTF-8')
return js
def main_uncensored(number):
htmlcode = get_html('https://www.javbus.com/' + number)
dww_htmlcode = get_html("https://www.dmm.co.jp/mono/dvd/-/detail/=/cid=" + number.replace("-", ''))
#print('un')
#print('https://www.javbus.com/' + number)
dic = {
'title': getTitle(htmlcode),
'studio': getStudio(htmlcode),
'year': getYear(htmlcode),
'outline': getOutline(htmlcode),
'runtime': getRuntime(htmlcode),
'director': getDirector(htmlcode),
'actor': getActor(htmlcode),
'release': getRelease(htmlcode),
'number': getNum(htmlcode),
'cover': getCover(htmlcode),
'tag': getTag(htmlcode),
'imagecut': 0,
}
js = json.dumps(dic, ensure_ascii=False, sort_keys=True, indent=4, separators=(',', ':'), ) # .encode('UTF-8')
if getYear(htmlcode) == '':
#print('un2')
number2 = number.replace('-', '_')
htmlcode = get_html('https://www.javbus.com/' + number2)
#print('https://www.javbus.com/' + number2)
dww_htmlcode = get_html("https://www.dmm.co.jp/mono/dvd/-/detail/=/cid=" + number2.replace("_", ''))
dic = {
'title': getTitle(htmlcode),
'studio': getStudio(htmlcode),
'year': getYear(htmlcode),
'outline': getOutline(htmlcode),
'runtime': getRuntime(htmlcode),
'director': getDirector(htmlcode),
'actor': getActor(htmlcode),
'release': getRelease(htmlcode),
'number': getNum(htmlcode),
'cover': getCover(htmlcode),
'tag': getTag(htmlcode),
'imagecut': 0,
}
js = json.dumps(dic, ensure_ascii=False, sort_keys=True, indent=4, separators=(',', ':'), ) # .encode('UTF-8')
#print(js)
return js
else:
bbb=''
# def return1():
# json_data=json.loads(main('ipx-292'))
#
# title = str(json_data['title'])
# studio = str(json_data['studio'])
# year = str(json_data['year'])
# outline = str(json_data['outline'])
# runtime = str(json_data['runtime'])
# director = str(json_data['director'])
# actor = str(json_data['actor'])
# release = str(json_data['release'])
# number = str(json_data['number'])
# cover = str(json_data['cover'])
# tag = str(json_data['tag'])
#
# print(title)
# print(studio)
# print(year)
# print(outline)
# print(runtime)
# print(director)
# print(actor)
# print(release)
# print(number)
# print(cover)
# print(tag)
# return1()

number_parser.py

@@ -0,0 +1,287 @@
import os
import re
import sys
import config
import typing
G_spat = re.compile(
"^\w+\.(cc|com|net|me|club|jp|tv|xyz|biz|wiki|info|tw|us|de)@|^22-sht\.me|"
"^(fhd|hd|sd|1080p|720p|4K)(-|_)|"
"(-|_)(fhd|hd|sd|1080p|720p|4K|x264|x265|uncensored|hack|leak)",
re.IGNORECASE)
def get_number(debug: bool, file_path: str) -> str:
"""
Extract the movie number from a file path.  from number_parser import get_number
>>> get_number(False, "/Users/Guest/AV_Data_Capture/snis-829.mp4")
'snis-829'
>>> get_number(False, "/Users/Guest/AV_Data_Capture/snis-829-C.mp4")
'snis-829'
>>> get_number(False, "/Users/Guest/AV_Data_Capture/[脸肿字幕组][PoRO]牝教師4穢された教壇 「生意気ドジっ娘女教師・美結高飛車ハメ堕ち2濁金」[720p][x264_aac].mp4")
'牝教師4穢された教壇 「生意気ドジっ娘女教師・美結高飛車ハメ堕ち2濁金」'
>>> get_number(False, "C:¥Users¥Guest¥snis-829.mp4")
'snis-829'
>>> get_number(False, "C:¥Users¥Guest¥snis-829-C.mp4")
'snis-829'
>>> get_number(False, "./snis-829.mp4")
'snis-829'
>>> get_number(False, "./snis-829-C.mp4")
'snis-829'
>>> get_number(False, ".¥snis-829.mp4")
'snis-829'
>>> get_number(False, ".¥snis-829-C.mp4")
'snis-829'
>>> get_number(False, "snis-829.mp4")
'snis-829'
>>> get_number(False, "snis-829-C.mp4")
'snis-829'
"""
filepath = os.path.basename(file_path)
# The debug=True/False code paths are merged because this module does pure string work with no I/O; when debug is on, printing the exception is enough
try:
# Try the user-defined regexes first
if len(config.getInstance().number_regexs().split()) > 0:
for regex in config.getInstance().number_regexs().split():
try:
if re.search(regex, filepath):
return re.search(regex, filepath).group()
except Exception as e:
print(f'[-]custom regex exception: {e} [{regex}]')
file_number = get_number_by_dict(filepath)
if file_number:
return file_number
elif '字幕组' in filepath or 'SUB' in filepath.upper() or re.match(r'[\u30a0-\u30ff]+', filepath):
filepath = G_spat.sub("", filepath)
filepath = re.sub("\[.*?\]","",filepath)
filepath = filepath.replace(".chs", "").replace(".cht", "")
file_number = str(re.findall(r'(.+?)\.', filepath)).strip(" [']")
return file_number
elif '-' in filepath or '_' in filepath:  # normal extraction, mainly for numbers containing '-' or '_'
filepath = G_spat.sub("", filepath)
filename = str(re.sub("\[\d{4}-\d{1,2}-\d{1,2}\] - ", "", filepath))  # strip '[YYYY-MM-DD] - ' dates from the filename
lower_check = filename.lower()
if 'fc2' in lower_check:
filename = lower_check.replace('--', '-').replace('_', '-').upper()
filename = re.sub("[-_]cd\d{1,2}", "", filename, flags=re.IGNORECASE)
if not re.search("-|_", filename):  # no '-' left after stripping -CD1, e.g. n1012-CD1.wmv
return str(re.search(r'\w+', filename[:filename.find('.')], re.A).group())
file_number = os.path.splitext(filename)
filename = re.search(r'[\w\-_]+', filename, re.A)
if filename:
file_number = str(filename.group())
else:
file_number = file_number[0]
new_file_number = file_number
if re.search("-c", file_number, flags=re.IGNORECASE):
new_file_number = re.sub("(-|_)c$", "", file_number, flags=re.IGNORECASE)
elif re.search("-u$", file_number, flags=re.IGNORECASE):
new_file_number = re.sub("(-|_)u$", "", file_number, flags=re.IGNORECASE)
elif re.search("-uc$", file_number, flags=re.IGNORECASE):
new_file_number = re.sub("(-|_)uc$", "", file_number, flags=re.IGNORECASE)
elif re.search("\d+ch$", file_number, flags=re.I):
new_file_number = file_number[:-2]
return new_file_number.upper()
else: # numbers without '-' (FANZA CID)
# western release naming pattern
oumei = re.search(r'[a-zA-Z]+\.\d{2}\.\d{2}\.\d{2}', filepath)
if oumei:
return oumei.group()
try:
return str(
re.findall(r'(.+?)\.',
str(re.search('([^<>/\\\\|:""\\*\\?]+)\\.\\w+$', filepath).group()))).strip(
"['']").replace('_', '-')
except:
return str(re.search(r'(.+?)\.', filepath)[0])
except Exception as e:
if debug:
print(f'[-]Number Parser exception: {e} [{file_path}]')
return None
# extract the number following the javdb data source naming rules
G_TAKE_NUM_RULES = {
'tokyo.*hot': lambda x: str(re.search(r'(cz|gedo|k|n|red-|se)\d{2,4}', x, re.I).group()),
'carib': lambda x: str(re.search(r'\d{6}(-|_)\d{3}', x, re.I).group()).replace('_', '-'),
'1pon|mura|paco': lambda x: str(re.search(r'\d{6}(-|_)\d{3}', x, re.I).group()).replace('-', '_'),
'10mu': lambda x: str(re.search(r'\d{6}(-|_)\d{2}', x, re.I).group()).replace('-', '_'),
'x-art': lambda x: str(re.search(r'x-art\.\d{2}\.\d{2}\.\d{2}', x, re.I).group()),
'xxx-av': lambda x: ''.join(['xxx-av-', re.findall(r'xxx-av[^\d]*(\d{3,5})[^\d]*', x, re.I)[0]]),
'heydouga': lambda x: 'heydouga-' + '-'.join(re.findall(r'(\d{4})[\-_](\d{3,4})[^\d]*', x, re.I)[0]),
'heyzo': lambda x: 'HEYZO-' + re.findall(r'heyzo[^\d]*(\d{4})', x, re.I)[0],
'mdbk': lambda x: str(re.search(r'mdbk(-|_)(\d{4})', x, re.I).group()),
'mdtm': lambda x: str(re.search(r'mdtm(-|_)(\d{4})', x, re.I).group()),
'caribpr': lambda x: str(re.search(r'\d{6}(-|_)\d{3}', x, re.I).group()).replace('_', '-'),
}
def get_number_by_dict(filename: str) -> typing.Optional[str]:
try:
for k, v in G_TAKE_NUM_RULES.items():
if re.search(k, filename, re.I):
return v(filename)
except:
pass
return None
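# Worked example (illustrative): the 'carib' rule extracts the date-style
# number and normalizes '_' to '-':
#   >>> get_number_by_dict('carib-020317_001.nfo')
#   '020317-001'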
class Cache_uncensored_conf:
prefix = None
def is_empty(self):
return bool(self.prefix is None)
def set(self, v: list):
if not v or not len(v) or not len(v[0]):
raise ValueError('input prefix list empty or None')
s = v[0]
if len(v) > 1:
for i in v[1:]:
s += f"|{i}.+"
self.prefix = re.compile(s, re.I)
def check(self, number):
if self.prefix is None:
raise ValueError('No init re compile')
return self.prefix.match(number)
G_cache_uncensored_conf = Cache_uncensored_conf()
# ======================================================================== uncensored check
def is_uncensored(number) -> bool:
if re.match(
r'[\d-]{4,}|\d{6}_\d{2,3}|(cz|gedo|k|n|red-|se)\d{2,4}|heyzo.+|xxx-av-.+|heydouga-.+|x-art\.\d{2}\.\d{2}\.\d{2}',
number,
re.I
):
return True
if G_cache_uncensored_conf.is_empty():
G_cache_uncensored_conf.set(config.getInstance().get_uncensored().split(','))
return bool(G_cache_uncensored_conf.check(number))
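# Worked example (a sketch; the prefix list ['DSAM', 'SM'] is a made-up config value):
# Cache_uncensored_conf.set(['DSAM', 'SM']) compiles the pattern 'DSAM|SM.+',
# so is_uncensored('010121-001') is True via the hard-coded date pattern above,
# while is_uncensored('DSAM-36') is True only through the cached config prefixes.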
if __name__ == "__main__":
# import doctest
# doctest.testmod(raise_on_error=True)
test_use_cases = (
"MEYD-594-C.mp4",
"SSIS-001_C.mp4",
"SSIS100-C.mp4",
"SSIS101_C.mp4",
"ssni984.mp4",
"ssni666.mp4",
"SDDE-625_uncensored_C.mp4",
"SDDE-625_uncensored_leak_C.mp4",
"SDDE-625_uncensored_leak_C_cd1.mp4",
"Tokyo Hot n9001 FHD.mp4", # 无-号,以前无法正确提取
"TokyoHot-n1287-HD SP2006 .mp4",
"caribean-020317_001.nfo", # -号误命名为_号的
"257138_3xplanet_1Pondo_080521_001.mp4",
"ADV-R0624-CD3.wmv", # 多碟影片
"XXX-AV 22061-CD5.iso", # 支持片商格式 xxx-av-22061 命名规则来自javdb数据源
"xxx-av 20589.mp4",
"Muramura-102114_145-HD.wmv", # 支持片商格式 102114_145 命名规则来自javdb数据源
"heydouga-4102-023-CD2.iso", # 支持片商格式 heydouga-4102-023 命名规则来自javdb数据源
"HeyDOuGa4236-1048 Ai Qiu - .mp4", # heydouga-4236-1048 命名规则来自javdb数据源
"pacopacomama-093021_539-FHD.mkv", # 支持片商格式 093021_539 命名规则来自javdb数据源
"sbw99.cc@heyzo_hd_2636_full.mp4",
"hhd800.com@STARS-566-HD.mp4",
"jav20s8.com@GIGL-677_4K.mp4",
"sbw99.cc@iesp-653-4K.mp4",
"4K-ABP-358_C.mkv",
"n1012-CD1.wmv",
"[]n1012-CD2.wmv",
"rctd-460ch.mp4", # 除支持-C硬字幕外新支持ch硬字幕
"rctd-461CH-CD2.mp4", # ch后可加CDn
"rctd-461-Cd3-C.mp4", # CDn后可加-C
"rctd-461-C-cD4.mp4", # cD1 Cd1 cd1 CD1 最终生成.nfo时统一为大写CD1
"MD-123.ts",
"MDSR-0001-ep2.ts",
"MKY-NS-001.mp4"
)
def evprint(evstr):
code = compile(evstr, "<string>", "eval")
print("{1:>20} # '{0}'".format(evstr[18:-2], eval(code)))
for t in test_use_cases:
evprint(f'get_number(True, "{t}")')
if len(sys.argv) <= 1 or not re.search('^[A-Z]:?', sys.argv[1], re.IGNORECASE):
sys.exit(0)
# Use Everything's ES command-line tool to collect video file names from whole disks as number-parser test cases. Argument: a drive letter A..Z, or a path starting with a drive letter.
# https://www.voidtools.com/support/everything/command_line_interface/
# The ES tool requires the Everything search engine to be running; the single es.exe binary must be on the PATH.
# Everything is freeware.
# Examples:
# python.exe .\number_parser.py ALL # search all disks for videos
# python.exe .\number_parser.py D # search drive D
# python.exe .\number_parser.py D: # same as above
# python.exe .\number_parser.py D:\download\JAVs # search \download\JAVs on drive D (the path must include a drive letter)
# ==================
# Linux/WSL1|2: use mlocate (Ubuntu/Debian) or plocate (Debian sid) to collect video file names as number test cases.
# Install via 'sudo apt install mlocate' (or plocate), then run 'sudo updatedb' once to build the index.
# macOS: use glocate from findutils; install via 'brew install findutils', then run 'sudo gupdatedb' once to build the index.
# Example:
# python3 ./number_parser.py ALL
import subprocess
ES_search_path = "ALL disks"
if sys.argv[1] == "ALL":
if sys.platform == "win32":
# ES_prog_path = 'C:/greensoft/es/es.exe'
ES_prog_path = 'es.exe' # es.exe must be on the PATH
ES_cmdline = f'{ES_prog_path} -name size:gigantic ext:mp4;avi;rmvb;wmv;mov;mkv;flv;ts;webm;iso;mpg;m4v'
out_bytes = subprocess.check_output(ES_cmdline.split(' '))
out_text = out_bytes.decode('gb18030') # Chinese Windows 10 x64 outputs GB18030 by default; it maps to Unicode losslessly
out_list = out_text.splitlines()
elif sys.platform in ("linux", "darwin"):
ES_prog_path = 'locate' if sys.platform == 'linux' else 'glocate'
ES_cmdline = r"{} -b -i --regex '\.mp4$|\.avi$|\.rmvb$|\.wmv$|\.mov$|\.mkv$|\.webm$|\.iso$|\.mpg$|\.m4v$'".format(
ES_prog_path)
out_bytes = subprocess.check_output(ES_cmdline.split(' '))
out_text = out_bytes.decode('utf-8')
out_list = [os.path.basename(line) for line in out_text.splitlines()]
else:
print('[-]Unsupported platform! Please run on OS Windows/Linux/MacOSX. Exit.')
sys.exit(1)
else: # Windows single disk
if sys.platform != "win32":
print('[!]Usage: python3 ./number_parser.py ALL')
sys.exit(0)
# ES_prog_path = 'C:/greensoft/es/es.exe'
ES_prog_path = 'es.exe' # es.exe must be on the PATH
if os.path.isdir(sys.argv[1]):
ES_search_path = sys.argv[1]
else:
ES_search_path = sys.argv[1][0] + ':/'
if not os.path.isdir(ES_search_path):
ES_search_path = 'C:/'
ES_search_path = os.path.normcase(ES_search_path)
ES_cmdline = f'{ES_prog_path} -path {ES_search_path} -name size:gigantic ext:mp4;avi;rmvb;wmv;mov;mkv;webm;iso;mpg;m4v'
out_bytes = subprocess.check_output(ES_cmdline.split(' '))
out_text = out_bytes.decode('gb18030') # Chinese Windows 10 x64 outputs GB18030 by default; it maps to Unicode losslessly
out_list = out_text.splitlines()
print(f'\n[!]{ES_prog_path} is searching {ES_search_path} for movies as number parser test cases...')
print(f'[+]Find {len(out_list)} Movies.')
for filename in out_list:
try:
n = get_number(True, filename)
if n:
print(' [{0}] {2}# {1}'.format(n, filename, '#无码' if is_uncensored(n) else ''))
else:
print(f'[-]Number return None. # {filename}')
except Exception as e:
print(f'[-]Number Parser exception: {e} [{filename}]')
sys.exit(0)


@@ -1,2 +0,0 @@
pyinstaller --onefile AV_Data_Capture.py
pyinstaller --onefile core.py --hidden-import ADC_function.py --hidden-import fc2fans_club.py --hidden-import javbus.py --hidden-import siro.py

py_to_exe.ps1 Normal file

@@ -0,0 +1,26 @@
# If you can't run this script, please execute the following command in PowerShell.
# Set-ExecutionPolicy RemoteSigned -Scope CurrentUser -Force
$CLOUDSCRAPER_PATH = $( python -c 'import cloudscraper as _; print(_.__path__[0])' | select -Last 1 )
$OPENCC_PATH = $( python -c 'import opencc as _; print(_.__path__[0])' | select -Last 1 )
$FACE_RECOGNITION_MODELS = $( python -c 'import face_recognition_models as _; print(_.__path__[0])' | select -Last 1 )
mkdir build
mkdir __pycache__
pyinstaller --onefile Movie_Data_Capture.py `
--hidden-import "ImageProcessing.cnn" `
--python-option u `
--add-data "$FACE_RECOGNITION_MODELS;face_recognition_models" `
--add-data "$CLOUDSCRAPER_PATH;cloudscraper" `
--add-data "$OPENCC_PATH;opencc" `
--add-data "Img;Img" `
--add-data "config.ini;." `
--add-data "scrapinglib;scrapinglib" `
rmdir -Recurse -Force build
rmdir -Recurse -Force __pycache__
rmdir -Recurse -Force Movie_Data_Capture.spec
echo "[Make]Finish"
pause

Binary file not shown. (removed image: 457 KiB)

requirements.txt Normal file

@@ -0,0 +1,14 @@
requests
dlib-bin
Click
numpy
face-recognition-models
lxml
beautifulsoup4
pillow==10.0.1
cloudscraper
pysocks==1.7.1
urllib3==1.26.18
certifi
MechanicalSoup
opencc-python-reimplemented

scraper.py Normal file

@@ -0,0 +1,322 @@
# build-in lib
import json
import secrets
import typing
from pathlib import Path
# third party lib
import opencc
from lxml import etree
# project wide definitions
import config
from ADC_function import (translate,
load_cookies,
file_modification_days,
delete_all_elements_in_str,
delete_all_elements_in_list
)
from scrapinglib.api import search
def get_data_from_json(
file_number: str,
open_cc: opencc.OpenCC,
specified_source: str, specified_url: str) -> typing.Optional[dict]:
"""
Iterate through all services and fetch metadata for the given movie number.
:param file_number: the movie number / title
:param open_cc: OpenCC converter for Simplified/Traditional Chinese
:param specified_source: the metadata source to use exclusively
:param specified_url: an explicit detail-page URL (currently unused)
:return: metadata of the given movie, or None
"""
try:
actor_mapping_data = etree.parse(str(Path.home() / '.local' / 'share' / 'mdc' / 'mapping_actor.xml'))
info_mapping_data = etree.parse(str(Path.home() / '.local' / 'share' / 'mdc' / 'mapping_info.xml'))
except:
actor_mapping_data = etree.fromstring("<html></html>", etree.HTMLParser())
info_mapping_data = etree.fromstring("<html></html>", etree.HTMLParser())
conf = config.getInstance()
# default fetch order list, from the beginning to the end
sources = conf.sources()
# TODO prepare the parameters
# - clean up ADC_function, webcrawler
proxies: dict = None
config_proxy = conf.proxy()
if config_proxy.enable:
proxies = config_proxy.proxies()
# javdb website logic
# javdb mirror sites carry a numeric suffix
javdb_sites = conf.javdb_sites().split(',')
for i in javdb_sites:
javdb_sites[javdb_sites.index(i)] = "javdb" + i
javdb_sites.append("javdb")
# skip stale cookies: the javdb login page advertises a 7-day login-free session, so assume cookies stay valid for 7 days
has_valid_cookie = False
for cj in javdb_sites:
javdb_site = cj
cookie_json = javdb_site + '.json'
cookies_dict, cookies_filepath = load_cookies(cookie_json)
if isinstance(cookies_dict, dict) and isinstance(cookies_filepath, str):
cdays = file_modification_days(cookies_filepath)
if cdays < 7:
javdb_cookies = cookies_dict
has_valid_cookie = True
break
elif cdays != 9999:
print(
f'[!]Cookies file {cookies_filepath} was updated {cdays} days ago, it will not be used for HTTP requests.')
if not has_valid_cookie:
# pick a cryptographically random site from javdb_sites; plain random is predictable when the seed is known
javdb_site = secrets.choice(javdb_sites)
javdb_cookies = None
ca_cert = None
if conf.cacert_file():
ca_cert = conf.cacert_file()
json_data = search(file_number, sources, proxies=proxies, verify=ca_cert,
dbsite=javdb_site, dbcookies=javdb_cookies,
morestoryline=conf.is_storyline(),
specifiedSource=specified_source, specifiedUrl=specified_url,
debug = conf.debug())
# Return if data not found in all sources
if not json_data:
print('[-]Movie Number not found!')
return None
# Strict number check: guards against faulty sources that answer every query with the
# same mismatched record (e.g. always returning "本橋実来 ADZ335").
# The current number naming rules follow javdb.com (Domain Creation Date: 2013-06-19T18:34:27Z);
# other conventions such as airav.wiki (Domain Creation Date: 2019-08-28T07:18:42.0Z) may also be worth tracking.
# If javdb.com-style numbers ever collide across studios, switch rules and update the number parsing and scraping code accordingly.
if str(json_data.get('number')).upper() != file_number.upper():
try:
if json_data.get('allow_number_change'):
pass
except:
print('[-]Movie number has changed! [{}]->[{}]'.format(file_number, str(json_data.get('number'))))
return None
# ================================================网站规则添加结束================================================
if json_data.get('title') == '':
print('[-]Movie Number or Title not found!')
return None
title = json_data.get('title')
actor_list = str(json_data.get('actor')).strip("[ ]").replace("'", '').split(',') # string to list
actor_list = [actor.strip() for actor in actor_list] # strip whitespace
director = json_data.get('director')
release = json_data.get('release')
number = json_data.get('number')
studio = json_data.get('studio')
source = json_data.get('source')
runtime = json_data.get('runtime')
outline = json_data.get('outline')
label = json_data.get('label')
series = json_data.get('series')
year = json_data.get('year')
if json_data.get('cover_small'):
cover_small = json_data.get('cover_small')
else:
cover_small = ''
if json_data.get('trailer'):
trailer = json_data.get('trailer')
else:
trailer = ''
if json_data.get('extrafanart'):
extrafanart = json_data.get('extrafanart')
else:
extrafanart = ''
imagecut = json_data.get('imagecut')
tag = str(json_data.get('tag')).strip("[ ]").replace("'", '').replace(" ", '').split(',') # string to list
while 'XXXX' in tag:
tag.remove('XXXX')
while 'xxx' in tag:
tag.remove('xxx')
if json_data['source'] == 'pissplay': # pissplay actor names are English; keep the spaces
actor = str(actor_list).strip("[ ]").replace("'", '')
else:
actor = str(actor_list).strip("[ ]").replace("'", '').replace(" ", '')
# if imagecut == '3':
# DownloadFileWithFilename()
# ==================== sanitize forbidden characters ====================== #\/:*?"<>|
actor = special_characters_replacement(actor)
actor_list = [special_characters_replacement(a) for a in actor_list]
title = special_characters_replacement(title)
label = special_characters_replacement(label)
outline = special_characters_replacement(outline)
series = special_characters_replacement(series)
studio = special_characters_replacement(studio)
director = special_characters_replacement(director)
tag = [special_characters_replacement(t) for t in tag]
release = release.replace('/', '-')
tmpArr = cover_small.split(',')
if len(tmpArr) > 0:
cover_small = tmpArr[0].strip('\"').strip('\'')
# ==================== sanitize forbidden characters END ================== #\/:*?"<>|
# uppercase the number if configured
if conf.number_uppercase():
json_data['number'] = number.upper()
# write the processed values back into json_data
json_data['title'] = title
json_data['original_title'] = title
json_data['actor'] = actor
json_data['release'] = release
json_data['cover_small'] = cover_small
json_data['tag'] = tag
json_data['year'] = year
json_data['actor_list'] = actor_list
json_data['trailer'] = trailer
json_data['extrafanart'] = extrafanart
json_data['label'] = label
json_data['outline'] = outline
json_data['series'] = series
json_data['studio'] = studio
json_data['director'] = director
if conf.is_translate():
translate_values = conf.translate_values().split(",")
for translate_value in translate_values:
if json_data[translate_value] == "":
continue
if translate_value == "title":
title_dict = json.loads(
(Path.home() / '.local' / 'share' / 'mdc' / 'c_number.json').read_text(encoding="utf-8"))
try:
json_data[translate_value] = title_dict[number]
continue
except:
pass
if conf.get_translate_engine() == "azure":
t = translate(
json_data[translate_value],
target_language="zh-Hans",
engine=conf.get_translate_engine(),
key=conf.get_translate_key(),
)
else:
if len(json_data[translate_value]):
if type(json_data[translate_value]) == str:
json_data[translate_value] = special_characters_replacement(json_data[translate_value])
json_data[translate_value] = translate(json_data[translate_value])
else:
for i in range(len(json_data[translate_value])):
json_data[translate_value][i] = special_characters_replacement(
json_data[translate_value][i])
list_in_str = ",".join(json_data[translate_value])
json_data[translate_value] = translate(list_in_str).split(',')
if open_cc:
cc_vars = conf.cc_convert_vars().split(",")
ccm = conf.cc_convert_mode()
def convert_list(mapping_data, language, vars):
total = []
for i in vars:
if len(mapping_data.xpath('a[contains(@keyword, $name)]/@' + language, name=f",{i},")) != 0:
i = mapping_data.xpath('a[contains(@keyword, $name)]/@' + language, name=f",{i},")[0]
total.append(i)
return total
def convert(mapping_data, language, vars):
if len(mapping_data.xpath('a[contains(@keyword, $name)]/@' + language, name=vars)) != 0:
return mapping_data.xpath('a[contains(@keyword, $name)]/@' + language, name=vars)[0]
else:
raise IndexError('keyword not found')
for cc in cc_vars:
if json_data[cc] == "" or len(json_data[cc]) == 0:
continue
if cc == "actor":
try:
if ccm == 1:
json_data['actor_list'] = convert_list(actor_mapping_data, "zh_cn", json_data['actor_list'])
json_data['actor'] = convert(actor_mapping_data, "zh_cn", json_data['actor'])
elif ccm == 2:
json_data['actor_list'] = convert_list(actor_mapping_data, "zh_tw", json_data['actor_list'])
json_data['actor'] = convert(actor_mapping_data, "zh_tw", json_data['actor'])
elif ccm == 3:
json_data['actor_list'] = convert_list(actor_mapping_data, "jp", json_data['actor_list'])
json_data['actor'] = convert(actor_mapping_data, "jp", json_data['actor'])
except:
json_data['actor_list'] = [open_cc.convert(aa) for aa in json_data['actor_list']]
json_data['actor'] = open_cc.convert(json_data['actor'])
elif cc == "tag":
try:
if ccm == 1:
json_data[cc] = convert_list(info_mapping_data, "zh_cn", json_data[cc])
json_data[cc] = delete_all_elements_in_list("删除", json_data[cc])
elif ccm == 2:
json_data[cc] = convert_list(info_mapping_data, "zh_tw", json_data[cc])
json_data[cc] = delete_all_elements_in_list("删除", json_data[cc])
elif ccm == 3:
json_data[cc] = convert_list(info_mapping_data, "jp", json_data[cc])
json_data[cc] = delete_all_elements_in_list("删除", json_data[cc])
except:
json_data[cc] = [open_cc.convert(t) for t in json_data[cc]]
else:
try:
if ccm == 1:
json_data[cc] = convert(info_mapping_data, "zh_cn", json_data[cc])
json_data[cc] = delete_all_elements_in_str("删除", json_data[cc])
elif ccm == 2:
json_data[cc] = convert(info_mapping_data, "zh_tw", json_data[cc])
json_data[cc] = delete_all_elements_in_str("删除", json_data[cc])
elif ccm == 3:
json_data[cc] = convert(info_mapping_data, "jp", json_data[cc])
json_data[cc] = delete_all_elements_in_str("删除", json_data[cc])
except IndexError:
json_data[cc] = open_cc.convert(json_data[cc])
except:
pass
naming_rule = ""
original_naming_rule = ""
for i in conf.naming_rule().split("+"):
if i not in json_data:
naming_rule += i.strip("'").strip('"')
original_naming_rule += i.strip("'").strip('"')
else:
item = json_data.get(i)
naming_rule += item if type(item) is not list else "&".join(item)
# PATCH: when [title] has been translated, the NFO's original_name would otherwise
# just reuse naming_rule, so original_name would no longer hold the original title.
# Ideally both naming_rule and original_naming_rule should be handled at translation time.
if i == 'title':
item = json_data.get('original_title')
original_naming_rule += item if type(item) is not list else "&".join(item)
json_data['naming_rule'] = naming_rule
json_data['original_naming_rule'] = original_naming_rule
return json_data
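# Worked example (sketch; assumes naming_rule = number+'-'+title in config.ini):
# for json_data = {'number': 'ABC-123', 'title': 'Sample'} the loop emits
# 'ABC-123' for the known key, '-' for the quoted literal and 'Sample',
# so json_data['naming_rule'] becomes 'ABC-123-Sample'.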
def special_characters_replacement(text) -> str:
if not isinstance(text, str):
return text
return (text.replace('\\', '∖').  # U+2216 SET MINUS @ Basic Multilingual Plane
replace('/', '∕').  # U+2215 DIVISION SLASH @ Basic Multilingual Plane
replace(':', '꞉').  # U+A789 MODIFIER LETTER COLON @ Latin Extended-D
replace('*', '∗').  # U+2217 ASTERISK OPERATOR @ Basic Multilingual Plane
replace('?', '?').  # U+FF1F FULLWIDTH QUESTION MARK @ Basic Multilingual Plane
replace('"', '"').  # U+FF02 FULLWIDTH QUOTATION MARK @ Basic Multilingual Plane
replace('<', 'ᐸ').  # U+1438 CANADIAN SYLLABICS PA @ Basic Multilingual Plane
replace('>', 'ᐳ').  # U+1433 CANADIAN SYLLABICS PO @ Basic Multilingual Plane
replace('|', 'ǀ').  # U+01C0 LATIN LETTER DENTAL CLICK @ Basic Multilingual Plane
replace('&lsquo;', '‘').  # U+2018 LEFT SINGLE QUOTATION MARK
replace('&rsquo;', '’').  # U+2019 RIGHT SINGLE QUOTATION MARK
replace('&hellip;', '…').
replace('&amp;', '＆').
replace("&", '＆')
)
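# Worked example: special_characters_replacement('Love? <Part/2>') returns
# 'Love? ᐸPart∕2ᐳ' -- forbidden filesystem characters are swapped for
# visually similar Unicode so a title stays usable as a file name.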

scrapinglib/__init__.py Normal file

@@ -0,0 +1,2 @@
# -*- coding: utf-8 -*-
from .api import search, getSupportedSources
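# Illustrative usage (a sketch: assumes config.ini is present and the network
# is reachable; 'snis-829' is just a sample query):
#   from scrapinglib import search, getSupportedSources
#   print(getSupportedSources())              # 'javlibrary,javdb,javbus,...'
#   meta = search('snis-829', 'javbus,javdb') # dict on success, None otherwise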

scrapinglib/airav.py Normal file

@@ -0,0 +1,171 @@
# -*- coding: utf-8 -*-
import json
import re
from .parser import Parser
from .javbus import Javbus
class Airav(Parser):
source = 'airav'
expr_title = '/html/head/title/text()'
expr_number = '/html/head/title/text()'
expr_studio = '//a[contains(@href,"?video_factory=")]/text()'
expr_release = '//li[contains(text(),"發片日期")]/text()'
expr_outline = "string(//div[@class='d-flex videoDataBlock']/div[@class='synopsis']/p)"
expr_actor = '//ul[@class="videoAvstarList"]/li/a[starts-with(@href,"/idol/")]/text()'
expr_cover = '//img[contains(@src,"/storage/big_pic/")]/@src'
expr_tags = '//div[@class="tagBtnMargin"]/a/text()'
expr_extrafanart = '//div[@class="mobileImgThumbnail"]/a/@href'
def extraInit(self):
# for javbus
self.specifiedSource = None
self.addtion_Javbus = True
def search(self, number):
self.number = number
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = "https://www.airav.wiki/api/video/barcode/" + self.number.upper() + "?lng=zh-CN"
if self.addtion_Javbus:
engine = Javbus()
javbusinfo = engine.scrape(self.number, self)
if javbusinfo == 404:
self.javbus = {"title": ""}
else:
self.javbus = json.loads(javbusinfo)
self.htmlcode = self.getHtml(self.detailurl)
# htmltree = etree.fromstring(self.htmlcode, etree.HTMLParser())
#result = self.dictformat(htmltree)
htmltree = json.loads(self.htmlcode)["result"]
result = self.dictformat(htmltree)
return result
# def queryNumberUrl(self, number):
# queryUrl = "https://cn.airav.wiki/?search=" + number
# queryTree = self.getHtmlTree(queryUrl)
# results = self.getTreeAll(queryTree, '//div[contains(@class,"videoList")]/div/a')
# for i in results:
# num = self.getTreeElement(i, '//div/div[contains(@class,"videoNumber")]/p[1]/text()')
# if num.replace('-','') == number.replace('-','').upper():
# self.number = num
# return "https://cn.airav.wiki" + i.attrib['href']
# return 'https://cn.airav.wiki/video/' + number
def getNum(self, htmltree):
# if self.addtion_Javbus:
# result = self.javbus.get('number')
# if isinstance(result, str) and len(result):
# return result
# number = super().getNum(htmltree)
# result = str(re.findall('^\[(.*?)]', number)[0])
result = htmltree["barcode"]
return result
def getTitle(self, htmltree):
# title = super().getTitle(htmltree)
# result = str(re.findall('](.*?)- AIRAV-WIKI', title)[0]).strip()
result = htmltree["name"]
return result
def getStudio(self, htmltree):
if self.addtion_Javbus:
result = self.javbus.get('studio')
if isinstance(result, str) and len(result):
return result
return super().getStudio(htmltree)
def getRelease(self, htmltree):
if self.addtion_Javbus:
result = self.javbus.get('release')
if isinstance(result, str) and len(result):
return result
try:
return re.search(r'\d{4}-\d{2}-\d{2}', str(super().getRelease(htmltree))).group()
except:
return ''
def getYear(self, htmltree):
if self.addtion_Javbus:
result = self.javbus.get('year')
if isinstance(result, str) and len(result):
return result
release = self.getRelease(htmltree)
return str(re.findall('\d{4}', release)).strip(" ['']")
def getOutline(self, htmltree):
# return self.getTreeAll(htmltree, self.expr_outline).replace('\n','').strip()
try:
result = htmltree["description"]
except:
result = ""
return result
def getRuntime(self, htmltree):
if self.addtion_Javbus:
result = self.javbus.get('runtime')
if isinstance(result, str) and len(result):
return result
return ''
def getDirector(self, htmltree):
if self.addtion_Javbus:
result = self.javbus.get('director')
if isinstance(result, str) and len(result):
return result
return ''
def getActors(self, htmltree):
# a = super().getActors(htmltree)
# b = [ i.strip() for i in a if len(i)]
# if len(b):
# return b
# if self.addtion_Javbus:
# result = self.javbus.get('actor')
# if isinstance(result, list) and len(result):
# return result
# return []
a = htmltree["actors"]
if a:
b = []
for i in a:
b.append(i["name"])
else:
b = []
return b
def getCover(self, htmltree):
if self.addtion_Javbus:
result = self.javbus.get('cover')
if isinstance(result, str) and len(result):
return result
result = htmltree['img_url']
if isinstance(result, str) and len(result):
return result
return super().getCover(htmltree)
def getSeries(self, htmltree):
if self.addtion_Javbus:
result = self.javbus.get('series')
if isinstance(result, str) and len(result):
return result
return ''
def getExtrafanart(self,htmltree):
try:
result = htmltree["images"]
except:
result = ""
return result
def getTags(self, htmltree):
try:
tag = htmltree["tags"]
tags = []
for i in tag:
tags.append(i["name"])
except:
tags = []
return tags

scrapinglib/api.py Normal file

@@ -0,0 +1,259 @@
# -*- coding: utf-8 -*-
import re
import json
from .parser import Parser
import config
import importlib
def search(number, sources: str = None, **kwargs):
""" 根据`番号/电影`名搜索信息
:param number: number/name depends on type
:param sources: sources string with `,` Eg: `avsox,javbus`
:param type: `adult`, `general`
"""
sc = Scraping()
return sc.search(number, sources, **kwargs)
def getSupportedSources(tag='adult'):
"""
:param tag: `adult`, `general`
"""
sc = Scraping()
if tag == 'adult':
return ','.join(sc.adult_full_sources)
else:
return ','.join(sc.general_full_sources)
class Scraping:
"""
"""
adult_full_sources = ['javlibrary', 'javdb', 'javbus', 'airav', 'fanza', 'xcity', 'jav321',
'mgstage', 'fc2', 'avsox', 'dlsite', 'carib', 'madou', 'msin',
'getchu', 'gcolle', 'javday', 'pissplay', 'javmenu', 'pcolle', 'caribpr'
]
general_full_sources = ['tmdb', 'imdb']
debug = False
proxies = None
verify = None
specifiedSource = None
specifiedUrl = None
dbcookies = None
dbsite = None
# use the storyline module to fetch a fuller plot synopsis
morestoryline = False
def search(self, number, sources=None, proxies=None, verify=None, type='adult',
specifiedSource=None, specifiedUrl=None,
dbcookies=None, dbsite=None, morestoryline=False,
debug=False):
self.debug = debug
self.proxies = proxies
self.verify = verify
self.specifiedSource = specifiedSource
self.specifiedUrl = specifiedUrl
self.dbcookies = dbcookies
self.dbsite = dbsite
self.morestoryline = morestoryline
if type == 'adult':
return self.searchAdult(number, sources)
else:
return self.searchGeneral(number, sources)
def searchGeneral(self, name, sources):
""" 查询电影电视剧
imdb,tmdb
"""
if self.specifiedSource:
sources = [self.specifiedSource]
else:
sources = self.checkGeneralSources(sources, name)
json_data = {}
for source in sources:
try:
if self.debug:
print('[+]select', source)
try:
module = importlib.import_module('.' + source, 'scrapinglib')
parser_type = getattr(module, source.capitalize())
parser: Parser = parser_type()
data = parser.scrape(name, self)
if data == 404:
continue
json_data = json.loads(data)
except Exception as e:
if config.getInstance().debug():
print(e)
# if any service returns valid data, break
if self.get_data_state(json_data):
if self.debug:
print(f"[+]Find movie [{name}] metadata on website '{source}'")
break
except:
continue
# Return if data not found in all sources
if not json_data or json_data['title'] == "":
return None
# If no actor was found, fill in Anonymous
if len(json_data['actor']) == 0:
if config.getInstance().anonymous_fill() == True:
if "zh_" in config.getInstance().get_target_language() or "ZH" in config.getInstance().get_target_language():
json_data['actor'] = "佚名"
else:
json_data['actor'] = "Anonymous"
return json_data
def searchAdult(self, number, sources):
if self.specifiedSource:
sources = [self.specifiedSource]
elif type(sources) is list:
pass
else:
sources = self.checkAdultSources(sources, number)
json_data = {}
for source in sources:
try:
if self.debug:
print('[+]select', source)
try:
module = importlib.import_module('.' + source, 'scrapinglib')
parser_type = getattr(module, source.capitalize())
parser: Parser = parser_type()
data = parser.scrape(number, self)
if data == 404:
continue
json_data = json.loads(data)
except Exception as e:
if config.getInstance().debug():
print(e)
# json_data = self.func_mapping[source](number, self)
# if any service returns valid data, break
if self.get_data_state(json_data):
if self.debug:
print(f"[+]Find movie [{number}] metadata on website '{source}'")
break
except:
continue
# javdb covers carry a watermark; if possible, replace them with a cover from another source
if 'source' in json_data and json_data['source'] == 'javdb':
# search other sources
# if no other source provides a cover, fall back to the javdb cover
try:
other_sources = sources[sources.index('javdb') + 1:]
other_json_data = self.searchAdult(number, other_sources)
if other_json_data is not None and 'cover' in other_json_data and other_json_data['cover'] != '':
json_data['cover'] = other_json_data['cover']
if self.debug:
print(f"[+]Find movie [{number}] cover on website '{other_json_data['cover']}'")
except:
pass
# Return if data not found in all sources
if not json_data or json_data['title'] == "":
return None
# If no actor was found, fill in Anonymous
if len(json_data['actor']) == 0:
if config.getInstance().anonymous_fill() == True:
if "zh_" in config.getInstance().get_target_language() or "ZH" in config.getInstance().get_target_language():
json_data['actor'] = "佚名"
else:
json_data['actor'] = "Anonymous"
return json_data
def checkGeneralSources(self, c_sources, name):
if not c_sources:
sources = self.general_full_sources
else:
sources = c_sources.split(',')
# drop sources that are not supported
todel = []
for s in sources:
if not s in self.general_full_sources:
print('[!] Source Not Exist : ' + s)
todel.append(s)
for d in todel:
print('[!] Remove Source : ' + d)
sources.remove(d)
return sources
def checkAdultSources(self, c_sources, file_number):
if not c_sources:
sources = self.adult_full_sources
else:
sources = c_sources.split(',')
def insert(sources, source):
if source in sources:
sources.insert(0, sources.pop(sources.index(source)))
return sources
if len(sources) <= len(self.adult_full_sources):
# if the input file name matches certain rules,
# move some web service to the beginning of the list
lo_file_number = file_number.lower()
if "carib" in sources:
sources = insert(sources, "caribpr")
sources = insert(sources, "carib")
elif "item" in file_number or "GETCHU" in file_number.upper():
sources = ["getchu"]
elif "rj" in lo_file_number or "vj" in lo_file_number:
sources = ["dlsite"]
elif re.search(r"[\u3040-\u309F\u30A0-\u30FF]+", file_number):
sources = ["dlsite", "getchu"]
elif "pcolle" in sources and "pcolle" in lo_file_number:
sources = ["pcolle"]
elif "fc2" in lo_file_number:
sources = ["fc2", "avsox", "msin"]
elif (re.search(r"\d+\D+-", file_number) or "siro" in lo_file_number):
if "mgstage" in sources:
sources = insert(sources, "mgstage")
elif "gcolle" in sources and (re.search("\d{6}", file_number)):
sources = insert(sources, "gcolle")
elif re.search(r"^\d{5,}", file_number) or \
(re.search(r"^\d{6}-\d{3}", file_number)) or "heyzo" in lo_file_number:
sources = ["avsox", "carib", "caribpr", "javbus", "xcity", "javdb"]
elif re.search(r"^[a-z0-9]{3,}$", lo_file_number):
if "xcity" in sources:
sources = insert(sources, "xcity")
if "madou" in sources:
sources = insert(sources, "madou")
# drop sources that are not supported
todel = []
for s in sources:
if not s in self.adult_full_sources and config.getInstance().debug():
print('[!] Source Not Exist : ' + s)
todel.append(s)
for d in todel:
if config.getInstance().debug():
print('[!] Remove Source : ' + d)
sources.remove(d)
return sources
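# Worked examples: checkAdultSources(None, 'FC2-PPV-123456') narrows the list
# to ['fc2', 'avsox', 'msin'], while checkAdultSources(None, 'heyzo_hd_2636')
# switches to the uncensored-oriented list starting with 'avsox'.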
def get_data_state(self, data: dict) -> bool: # detect whether the metadata fetch failed
if "title" not in data or "number" not in data:
return False
if data["title"] is None or data["title"] == "" or data["title"] == "null":
return False
if data["number"] is None or data["number"] == "" or data["number"] == "null":
return False
if (data["cover"] is None or data["cover"] == "" or data["cover"] == "null") \
and (data["cover_small"] is None or data["cover_small"] == "" or
data["cover_small"] == "null"):
return False
return True
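# Worked example: a result counts as a hit only with a non-empty title, number
# and at least one cover field:
#   get_data_state({'title': 'T', 'number': 'N', 'cover': '', 'cover_small': 's.jpg'})  # True
#   get_data_state({'title': '', 'number': 'N'})                                        # False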

scrapinglib/avsox.py Normal file

@@ -0,0 +1,94 @@
# -*- coding: utf-8 -*-
from .parser import Parser
class Avsox(Parser):
source = 'avsox'
expr_number = '//span[contains(text(),"识别码:")]/../span[2]/text()'
expr_actor = '//a[@class="avatar-box"]'
expr_actorphoto = '//a[@class="avatar-box"]'
expr_title = '/html/body/div[2]/h3/text()'
expr_studio = '//p[contains(text(),"制作商: ")]/following-sibling::p[1]/a/text()'
expr_release = '//span[contains(text(),"发行时间:")]/../text()'
expr_cover = '/html/body/div[2]/div[1]/div[1]/a/img/@src'
expr_smallcover = '//*[@id="waterfall"]/div/a/div[1]/img/@src'
expr_tags = '/html/head/meta[@name="keywords"]/@content'
expr_label = '//p[contains(text(),"系列:")]/following-sibling::p[1]/a/text()'
expr_series = '//span[contains(text(),"系列:")]/../span[2]/text()'
def extraInit(self):
self.imagecut = 3
self.originalnum = ''
def queryNumberUrl(self, number: str):
upnum = number.upper()
if 'FC2' in upnum and 'FC2-PPV' not in upnum:
number = upnum.replace('FC2', 'FC2-PPV')
self.number = number
qurySiteTree = self.getHtmlTree('https://tellme.pw/avsox')
site = self.getTreeElement(qurySiteTree, '//div[@class="container"]/div/a/@href')
self.searchtree = self.getHtmlTree(site + '/cn/search/' + number)
result1 = self.getTreeElement(self.searchtree, '//*[@id="waterfall"]/div/a/@href')
if result1 == '' or result1 == 'null' or result1 == 'None' or result1.find('movie') == -1:
self.searchtree = self.getHtmlTree(site + '/cn/search/' + number.replace('-', '_'))
result1 = self.getTreeElement(self.searchtree, '//*[@id="waterfall"]/div/a/@href')
if result1 == '' or result1 == 'null' or result1 == 'None' or result1.find('movie') == -1:
self.searchtree = self.getHtmlTree(site + '/cn/search/' + number.replace('_', ''))
result1 = self.getTreeElement(self.searchtree, '//*[@id="waterfall"]/div/a/@href')
if result1 == '' or result1 == 'null' or result1 == 'None' or result1.find('movie') == -1:
return None
return "https:" + result1
def getNum(self, htmltree):
new_number = self.getTreeElement(htmltree, self.expr_number)
if new_number.upper() != self.number.upper():
raise ValueError('number not found in ' + self.source)
self.originalnum = new_number
if 'FC2-PPV' in new_number.upper():
new_number = new_number.upper().replace('FC2-PPV', 'FC2')
self.number = new_number
return self.number
def getTitle(self, htmltree):
return super().getTitle(htmltree).replace('/', '').strip(self.originalnum).strip()
def getStudio(self, htmltree):
return super().getStudio(htmltree).replace("', '", ' ')
def getSmallCover(self, htmltree):
""" 使用搜索页面的预览小图
"""
try:
return self.getTreeElement(self.searchtree, self.expr_smallcover)
except:
self.imagecut = 1
return ''
def getTags(self, htmltree):
tags = self.getTreeElement(htmltree, self.expr_tags).split(',')
return [i.strip() for i in tags[2:]] if len(tags) > 2 else []
def getOutline(self, htmltree):
if self.morestoryline:
from .storyline import getStoryline
return getStoryline(self.number, proxies=self.proxies, verify=self.verify)
return ''
def getActors(self, htmltree):
a = super().getActors(htmltree)
d = []
for i in a:
d.append(i.find('span').text)
return d
def getActorPhoto(self, htmltree):
a = self.getTreeAll(htmltree, self.expr_actorphoto)
d = {}
for i in a:
l = i.find('.//img').attrib['src']
t = i.find('span').text
p2 = {t: l}
d.update(p2)
return d

scrapinglib/carib.py Normal file

@@ -0,0 +1,106 @@
# -*- coding: utf-8 -*-
import re
from urllib.parse import urljoin
from lxml import html
from .parser import Parser
class Carib(Parser):
source = 'carib'
expr_title = "//div[@class='movie-info section']/div[@class='heading']/h1[@itemprop='name']/text()"
expr_release = "//li[2]/span[@class='spec-content']/text()"
expr_runtime = "//span[@class='spec-content']/span[@itemprop='duration']/text()"
expr_actor = "//span[@class='spec-content']/a[@itemprop='actor']/span/text()"
expr_tags = "//span[@class='spec-content']/a[@itemprop='genre']/text()"
expr_extrafanart = "//*[@id='sampleexclude']/div[2]/div/div[@class='grid-item']/div/a/@href"
expr_label = "//span[@class='spec-title'][contains(text(),'シリーズ')]/../span[@class='spec-content']/a/text()"
expr_series = "//span[@class='spec-title'][contains(text(),'シリーズ')]/../span[@class='spec-content']/a/text()"
expr_outline = "//div[@class='movie-info section']/p[@itemprop='description']/text()"
def extraInit(self):
self.imagecut = 1
self.uncensored = True
def search(self, number):
self.number = number
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = f'https://www.caribbeancom.com/moviepages/{number}/index.html'
htmlcode = self.getHtml(self.detailurl)
if htmlcode == 404 or 'class="movie-info section"' not in htmlcode:
return 404
htmltree = html.fromstring(htmlcode)
result = self.dictformat(htmltree)
return result
def getStudio(self, htmltree):
return '加勒比'
def getActors(self, htmltree):
r = []
actors = super().getActors(htmltree)
for act in actors:
if str(act) != '':
r.append(act)
return r
def getNum(self, htmltree):
return self.number
def getCover(self, htmltree):
return f'https://www.caribbeancom.com/moviepages/{self.number}/images/l_l.jpg'
def getExtrafanart(self, htmltree):
r = []
genres = self.getTreeAll(htmltree, self.expr_extrafanart)
for g in genres:
jpg = str(g)
if '/member/' in jpg:
break
else:
r.append('https://www.caribbeancom.com' + jpg)
return r
def getTrailer(self, htmltree):
return f'https://smovie.caribbeancom.com/sample/movies/{self.number}/1080p.mp4'
def getActorPhoto(self, htmltree):
htmla = htmltree.xpath("//*[@id='moviepages']/div[@class='container']/div[@class='inner-container']/div[@class='movie-info section']/ul/li[@class='movie-spec']/span[@class='spec-content']/a[@itemprop='actor']")
names = htmltree.xpath("//*[@id='moviepages']/div[@class='container']/div[@class='inner-container']/div[@class='movie-info section']/ul/li[@class='movie-spec']/span[@class='spec-content']/a[@itemprop='actor']/span[@itemprop='name']/text()")
t = {}
for name, a in zip(names, htmla):
if name.strip() == '':
continue
p = {name.strip(): a.attrib['href']}
t.update(p)
o = {}
for k, v in t.items():
if '/search_act/' not in v:
continue
r = self.getHtml(urljoin('https://www.caribbeancom.com', v), type='object')
if not r.ok:
continue
html = r.text
pos = html.find('.full-bg')
if pos<0:
continue
css = html[pos:pos+100]
cssBGjpgs = re.findall(r'background: url\((.+\.jpg)', css, re.I)
if not cssBGjpgs or not len(cssBGjpgs[0]):
continue
p = {k: urljoin(r.url, cssBGjpgs[0])}
o.update(p)
return o
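# Worked example of the CSS scrape above: the portrait is embedded as a CSS
# background, so
#   re.findall(r'background: url\((.+\.jpg)', '.full-bg{background: url(/img/a.jpg);}', re.I)
# yields ['/img/a.jpg'], which is then resolved against the page URL.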
def getOutline(self, htmltree):
if self.morestoryline:
from .storyline import getStoryline
result = getStoryline(self.number, uncensored=self.uncensored,
proxies=self.proxies, verify=self.verify)
if len(result):
return result
return super().getOutline(htmltree)

scrapinglib/caribpr.py Normal file

@@ -0,0 +1,106 @@
# -*- coding: utf-8 -*-
import re
from urllib.parse import urljoin
from lxml import html
from .parser import Parser
class Caribpr(Parser):
source = 'caribpr'
expr_title = "//div[@class='movie-info']/div[@class='section is-wide']/div[@class='heading']/h1/text()"
expr_release = "//li[2]/span[@class='spec-content']/text()"
expr_runtime = "//li[3]/span[@class='spec-content']/text()"
expr_actor = "//li[1]/span[@class='spec-content']/a[@class='spec-item']/text()"
expr_tags = "//li[5]/span[@class='spec-content']/a[@class='spec-item']/text()"
expr_extrafanart = "//div[@class='movie-gallery']/div[@class='section is-wide']/div[2]/div[@class='grid-item']/div/a/@href"
# expr_label = "//span[@class='spec-title'][contains(text(),'シリーズ')]/../span[@class='spec-content']/a/text()"
# expr_series = "//span[@class='spec-title'][contains(text(),'シリーズ')]/../span[@class='spec-content']/a/text()"
expr_outline = "//div[@class='movie-info']/div[@class='section is-wide']/p/text()"
def extraInit(self):
self.imagecut = 1
self.uncensored = True
def search(self, number):
self.number = number
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = f'https://www.caribbeancompr.com/moviepages/{number}/index.html'
htmlcode = self.getHtml(self.detailurl)
if htmlcode == 404 or 'class="movie-info"' not in htmlcode:
return 404
htmltree = html.fromstring(htmlcode)
result = self.dictformat(htmltree)
return result
def getStudio(self, htmltree):
return '加勒比'
def getActors(self, htmltree):
r = []
actors = super().getActors(htmltree)
for act in actors:
if str(act) != '':
r.append(act)
return r
def getNum(self, htmltree):
return self.number
def getCover(self, htmltree):
return f'https://www.caribbeancompr.com/moviepages/{self.number}/images/l_l.jpg'
def getExtrafanart(self, htmltree):
r = []
genres = self.getTreeAll(htmltree, self.expr_extrafanart)
for g in genres:
jpg = str(g)
if '/member/' in jpg:
break
else:
r.append(jpg)
return r
def getTrailer(self, htmltree):
return f'https://smovie.caribbeancompr.com/sample/movies/{self.number}/480p.mp4'
def getActorPhoto(self, htmltree):
htmla = htmltree.xpath("//*[@id='moviepages']/div[@class='container']/div[@class='inner-container']/div[@class='movie-info section']/ul/li[@class='movie-spec']/span[@class='spec-content']/a[@itemprop='actor']")
names = htmltree.xpath("//*[@id='moviepages']/div[@class='container']/div[@class='inner-container']/div[@class='movie-info section']/ul/li[@class='movie-spec']/span[@class='spec-content']/a[@itemprop='actor']/span[@itemprop='name']/text()")
t = {}
for name, a in zip(names, htmla):
if name.strip() == '':
continue
p = {name.strip(): a.attrib['href']}
t.update(p)
o = {}
for k, v in t.items():
if '/search_act/' not in v:
continue
r = self.getHtml(urljoin('https://www.caribbeancompr.com', v), type='object')
if not r.ok:
continue
html = r.text
pos = html.find('.full-bg')
if pos<0:
continue
css = html[pos:pos+100]
cssBGjpgs = re.findall(r'background: url\((.+\.jpg)', css, re.I)
if not cssBGjpgs or not len(cssBGjpgs[0]):
continue
p = {k: urljoin(r.url, cssBGjpgs[0])}
o.update(p)
return o
def getOutline(self, htmltree):
if self.morestoryline:
from .storyline import getStoryline
result = getStoryline(self.number, uncensored=self.uncensored,
proxies=self.proxies, verify=self.verify)
if len(result):
return result
return super().getOutline(htmltree)

scrapinglib/dlsite.py Normal file

@@ -0,0 +1,104 @@
# -*- coding: utf-8 -*-
import re
from .parser import Parser
class Dlsite(Parser):
source = 'dlsite'
expr_title = '/html/head/title/text()'
expr_actor = '//th[contains(text(),"声优")]/../td/a/text()'
expr_studio = '//th[contains(text(),"商标名")]/../td/span[1]/a/text()'
expr_studio2 = '//th[contains(text(),"社团名")]/../td/span[1]/a/text()'
expr_runtime = '//strong[contains(text(),"時長")]/../span/text()'
expr_runtime2 = '//strong[contains(text(),"時長")]/../span/a/text()'
expr_outline = '//*[@class="work_parts_area"]/p/text()'
expr_series = '//th[contains(text(),"系列名")]/../td/a/text()'
expr_series2 = '//th[contains(text(),"社团名")]/../td/span[1]/a/text()'
expr_director = '//th[contains(text(),"剧情")]/../td/a/text()'
expr_release = '//th[contains(text(),"贩卖日")]/../td/a/text()'
expr_cover = '//*[@id="work_left"]/div/div/div[2]/div/div[1]/div[1]/ul/li[1]/picture/source/@srcset'
expr_tags = '//th[contains(text(),"分类")]/../td/div/a/text()'
expr_label = '//th[contains(text(),"系列名")]/../td/a/text()'
expr_label2 = '//th[contains(text(),"社团名")]/../td/span[1]/a/text()'
expr_extrafanart = '//*[@id="work_left"]/div/div/div[1]/div/@data-src'
def extraInit(self):
self.imagecut = 4
self.allow_number_change = True
def search(self, number):
self.cookies = {'locale': 'zh-cn'}
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
# TODO: the number should be taken from the page itself
self.number = str(re.findall("\wJ\w+", self.detailurl)).strip(" [']")
htmltree = self.getHtmlTree(self.detailurl)
elif "RJ" in number or "VJ" in number:
self.number = number.upper()
self.detailurl = 'https://www.dlsite.com/maniax/work/=/product_id/' + self.number + '.html/?locale=zh_CN'
htmltree = self.getHtmlTree(self.detailurl)
else:
self.detailurl = f'https://www.dlsite.com/maniax/fsr/=/language/jp/sex_category/male/keyword/{number}/order/trend/work_type_category/movie'
htmltree = self.getHtmlTree(self.detailurl)
search_result = self.getTreeAll(htmltree, '//*[@id="search_result_img_box"]/li[1]/dl/dd[2]/div[2]/a/@href')
if len(search_result) == 0:
number = number.replace("THE ANIMATION", "").replace("he Animation", "").replace("t", "").replace("T","")
htmltree = self.getHtmlTree(f'https://www.dlsite.com/maniax/fsr/=/language/jp/sex_category/male/keyword/{number}/order/trend/work_type_category/movie')
search_result = self.getTreeAll(htmltree, '//*[@id="search_result_img_box"]/li[1]/dl/dd[2]/div[2]/a/@href')
if len(search_result) == 0:
if "" in number:
number = number.replace("","")
elif "" in number:
number = number.replace("","")
htmltree = self.getHtmlTree(f'https://www.dlsite.com/maniax/fsr/=/language/jp/sex_category/male/keyword/{number}/order/trend/work_type_category/movie')
search_result = self.getTreeAll(htmltree, '//*[@id="search_result_img_box"]/li[1]/dl/dd[2]/div[2]/a/@href')
if len(search_result) == 0:
number = number.replace('上巻', '').replace('下巻', '').replace('前編', '').replace('後編', '')
htmltree = self.getHtmlTree(f'https://www.dlsite.com/maniax/fsr/=/language/jp/sex_category/male/keyword/{number}/order/trend/work_type_category/movie')
search_result = self.getTreeAll(htmltree, '//*[@id="search_result_img_box"]/li[1]/dl/dd[2]/div[2]/a/@href')
self.detailurl = search_result[0]
htmltree = self.getHtmlTree(self.detailurl)
self.number = str(re.findall("\wJ\w+", self.detailurl)).strip(" [']")
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
return self.number
def getTitle(self, htmltree):
result = super().getTitle(htmltree)
result = result[:result.rfind(' | DLsite')]
result = result[:result.rfind(' [')]
if 'OFF】' in result:
result = result[result.find('】')+1:]
result = result.replace('【HD版】', '')
return result
def getOutline(self, htmltree):
total = []
result = self.getTreeAll(htmltree, self.expr_outline)
total = [ x.strip() for x in result if x.strip()]
return '\n'.join(total)
def getRelease(self, htmltree):
return super().getRelease(htmltree).replace('年', '-').replace('月', '-').replace('日', '')
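# e.g. a DLsite date like '2023年01月02日' becomes '2023-01-02'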
def getCover(self, htmltree):
return 'https:' + super().getCover(htmltree).replace('.webp', '.jpg')
def getExtrafanart(self, htmltree):
try:
result = []
for i in self.getTreeAll(htmltree, self.expr_extrafanart):
result.append("https:" + i)
except:
result = ''
return result
def getTags(self, htmltree):
tags = super().getTags(htmltree)
tags.append("DLsite")
return tags

scrapinglib/fanza.py Normal file

@@ -0,0 +1,175 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from urllib.parse import urlencode
from .parser import Parser
class Fanza(Parser):
source = 'fanza'
expr_title = '//*[starts-with(@id, "title")]/text()'
expr_actor = "//td[contains(text(),'出演者')]/following-sibling::td/span/a/text()"
# expr_cover = './/head/meta[@property="og:image"]/@content'
# expr_extrafanart = '//a[@name="sample-image"]/img/@src'
expr_outline = "//div[@class='mg-b20 lh4']/text()"
expr_outline2 = "//div[@class='mg-b20 lh4']//p/text()"
expr_outline_og = '//head/meta[@property="og:description"]/@content'
expr_runtime = "//td[contains(text(),'収録時間')]/following-sibling::td/text()"
def search(self, number):
self.number = number
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
durl = "https://www.dmm.co.jp/age_check/=/declared=yes/?"+ urlencode({"rurl": self.detailurl})
self.htmltree = self.getHtmlTree(durl)
result = self.dictformat(self.htmltree)
return result
# fanza allows letters + digits + underscore; normalize the input here
# @note: the only underscore usage seen so far is h_test123456789
fanza_search_number = number
# AV_Data_Capture.py getNumber() over-normalizes the input; restore the h_ prefix
if fanza_search_number.startswith("h-"):
fanza_search_number = fanza_search_number.replace("h-", "h_")
fanza_search_number = re.sub(r"[^0-9a-zA-Z_]", "", fanza_search_number).lower()
fanza_urls = [
"https://www.dmm.co.jp/digital/videoa/-/detail/=/cid=",
"https://www.dmm.co.jp/mono/dvd/-/detail/=/cid=",
"https://www.dmm.co.jp/digital/anime/-/detail/=/cid=",
"https://www.dmm.co.jp/mono/anime/-/detail/=/cid=",
"https://www.dmm.co.jp/digital/videoc/-/detail/=/cid=",
"https://www.dmm.co.jp/digital/nikkatsu/-/detail/=/cid=",
"https://www.dmm.co.jp/rental/-/detail/=/cid=",
]
for url in fanza_urls:
self.detailurl = url + fanza_search_number
url = "https://www.dmm.co.jp/age_check/=/declared=yes/?"+ urlencode({"rurl": self.detailurl})
self.htmlcode = self.getHtml(url)
if self.htmlcode != 404 \
and 'Sorry! This content is not available in your region.' not in self.htmlcode:
self.htmltree = etree.HTML(self.htmlcode)
if self.htmltree is not None:
result = self.dictformat(self.htmltree)
return result
return 404
def getNum(self, htmltree):
# for some old page, the input number does not match the page
# for example, the url will be cid=test012
# but the hinban on the page is test00012
# so get the hinban first, and then pass it to following functions
self.fanza_hinban = self.getFanzaString('品番:')
number_lo = self.number.lower()
if (re.sub('-|_', '', number_lo) == self.fanza_hinban or
number_lo.replace('-', '00') == self.fanza_hinban or
number_lo.replace('-', '') + 'so' == self.fanza_hinban
):
self.number = self.fanza_hinban
return self.number
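# Worked example: querying 'test-012' can land on a page whose 品番 is
# 'test00012'; since 'test-012'.replace('-', '00') == 'test00012', the page
# value is adopted as the canonical number.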
def getStudio(self, htmltree):
return self.getFanzaString('メーカー')
def getOutline(self, htmltree):
try:
result = self.getTreeElement(htmltree, self.expr_outline).replace("\n", "")
if result == '':
result = self.getTreeElement(htmltree, self.expr_outline2).replace("\n", "")
if "※ 配信方法によって収録内容が異なる場合があります。" == result:
result = self.getTreeElement(htmltree, self.expr_outline_og)
return result
except:
return ''
def getRuntime(self, htmltree):
return str(re.search(r'\d+', super().getRuntime(htmltree)).group()).strip(" ['']")
def getDirector(self, htmltree):
if "anime" not in self.detailurl:
return self.getFanzaString('監督:')
return ''
def getActors(self, htmltree):
if "anime" not in self.detailurl:
return super().getActors(htmltree)
return ''
def getRelease(self, htmltree):
result = self.getFanzaString('発売日:')
if result == '' or result == '----':
result = self.getFanzaString('配信開始日:')
return result.replace("/", "-").strip('\\n')
def getTags(self, htmltree):
return self.getFanzaStrings('ジャンル:')
def getLabel(self, htmltree):
ret = self.getFanzaString('レーベル')
if ret == "----":
return ''
return ret
def getSeries(self, htmltree):
ret = self.getFanzaString('シリーズ:')
if ret == "----":
return ''
return ret
def getCover(self, htmltree):
cover_number = self.number
try:
result = htmltree.xpath('//*[@id="' + cover_number + '"]/@href')[0]
except:
# sometimes fanza replaces '_' with '\u005f' in the image id
if "_" in cover_number:
cover_number = cover_number.replace("_", r"\u005f")
try:
result = htmltree.xpath('//*[@id="' + cover_number + '"]/@href')[0]
except:
# (TODO) handle more edge cases
# print(html)
# raise an exception here, same behavior as before;
# the main requirement is fetching the picture
raise ValueError("can not find image")
return result
def getExtrafanart(self, htmltree):
htmltext = re.search(r'<div id=\"sample-image-block\"[\s\S]*?<br></div>\s*?</div>', self.htmlcode)
if htmltext:
htmltext = htmltext.group()
extrafanart_images = re.findall(r'<img.*?src=\"(.*?)\"', htmltext)
if extrafanart_images:
sheet = []
for img_url in extrafanart_images:
url_cuts = img_url.rsplit('-', 1)
sheet.append(url_cuts[0] + 'jp-' + url_cuts[1])
return sheet
return ''
def getTrailer(self, htmltree):
htmltext = re.search(r'<script type=\"application/ld\+json\">[\s\S].*}\s*?</script>', self.htmlcode)
if htmltext:
htmltext = htmltext.group()
url = re.search(r'\"contentUrl\":\"(.*?)\"', htmltext)
if url:
url = url.group(1)
url = url.rsplit('_', 2)[0] + '_mhb_w.mp4'
return url
return ''
def getFanzaString(self, expr):
result1 = str(self.htmltree.xpath("//td[contains(text(),'"+expr+"')]/following-sibling::td/a/text()")).strip(" ['']")
result2 = str(self.htmltree.xpath("//td[contains(text(),'"+expr+"')]/following-sibling::td/text()")).strip(" ['']")
return result1+result2
def getFanzaStrings(self, string):
result1 = self.htmltree.xpath("//td[contains(text(),'" + string + "')]/following-sibling::td/a/text()")
if len(result1) > 0:
return result1
result2 = self.htmltree.xpath("//td[contains(text(),'" + string + "')]/following-sibling::td/text()")
return result2
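# Sketch: both helpers read the <td> following a label cell; for a row like
# <td>発売日:</td><td>2023/01/02</td>, getFanzaString('発売日:') returns
# '2023/01/02' (linked values, when present, come from the <a> variant first).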

scrapinglib/fc2.py Normal file

@@ -0,0 +1,67 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from urllib.parse import urljoin
from .parser import Parser
class Fc2(Parser):
source = 'fc2'
expr_title = '/html/head/title/text()'
expr_studio = '//*[@id="top"]/div[1]/section[1]/div/section/div[2]/ul/li[3]/a/text()'
expr_release = '//*[@id="top"]/div[1]/section[1]/div/section/div[2]/div[2]/p/text()'
expr_runtime = "//p[@class='items_article_info']/text()"
expr_director = '//*[@id="top"]/div[1]/section[1]/div/section/div[2]/ul/li[3]/a/text()'
expr_actor = '//*[@id="top"]/div[1]/section[1]/div/section/div[2]/ul/li[3]/a/text()'
expr_cover = "//div[@class='items_article_MainitemThumb']/span/img/@src"
expr_extrafanart = '//ul[@class="items_article_SampleImagesArea"]/li/a/@href'
expr_tags = "//a[@class='tag tagTag']/text()"
def extraInit(self):
self.imagecut = 0
self.allow_number_change = True
def search(self, number):
self.number = number.lower().replace('fc2-ppv-', '').replace('fc2-', '')
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = 'https://adult.contents.fc2.com/article/' + self.number + '/'
self.htmlcode = self.getHtml(self.detailurl)
if self.htmlcode == 404:
return 404
htmltree = etree.HTML(self.htmlcode)
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
return 'FC2-' + self.number
def getRelease(self, htmltree):
return super().getRelease(htmltree).strip(" ['販売日 : ']").replace('/','-')
def getActors(self, htmltree):
actors = super().getActors(htmltree)
if not actors:
actors = '素人'
return actors
def getCover(self, htmltree):
return urljoin('https://adult.contents.fc2.com', super().getCover(htmltree))
def getTrailer(self, htmltree):
video_pather = re.compile(r'\'[a-zA-Z0-9]{32}\'')
video = video_pather.findall(self.htmlcode)
if video:
try:
video_url = video[0].replace('\'', '')
video_url = 'https://adult.contents.fc2.com/api/v2/videos/' + self.number + '/sample?key=' + video_url
url_json = eval(self.getHtml(video_url))['path'].replace('\\', '')
return url_json
except:
return ''
else:
return ''

scrapinglib/gcolle.py Normal file

@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from .httprequest import request_session
from .parser import Parser
class Gcolle(Parser):
source = 'gcolle'
expr_r18 = '//*[@id="main_content"]/table[1]/tbody/tr/td[2]/table/tbody/tr/td/h4/a[2]/@href'
expr_number = '//td[contains(text(),"商品番号")]/../td[2]/text()'
expr_title = '//*[@id="cart_quantity"]/table/tr[1]/td/h1/text()'
expr_studio = '//td[contains(text(),"アップロード会員名")]/b/text()'
expr_director = '//td[contains(text(),"アップロード会員名")]/b/text()'
expr_actor = '//td[contains(text(),"アップロード会員名")]/b/text()'
expr_label = '//td[contains(text(),"アップロード会員名")]/b/text()'
expr_series = '//td[contains(text(),"アップロード会員名")]/b/text()'
expr_release = '//td[contains(text(),"商品登録日")]/../td[2]/time/@datetime'
expr_cover = '//*[@id="cart_quantity"]/table/tr[3]/td/table/tr/td/a/@href'
expr_tags = '//*[@id="cart_quantity"]/table/tr[4]/td/a/text()'
expr_outline = '//*[@id="cart_quantity"]/table/tr[3]/td/p/text()'
expr_extrafanart = '//*[@id="cart_quantity"]/table/tr[3]/td/div/img/@src'
expr_extrafanart2 = '//*[@id="cart_quantity"]/table/tr[3]/td/div/a/img/@src'
def extraInit(self):
self.imagecut = 4
def search(self, number: str):
self.number = number.upper().replace('GCOLLE-', '')
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = 'https://gcolle.net/product_info.php/products_id/' + self.number
session = request_session(cookies=self.cookies, proxies=self.proxies, verify=self.verify)
htmlcode = session.get(self.detailurl).text
htmltree = etree.HTML(htmlcode)
r18url = self.getTreeElement(htmltree, self.expr_r18)
if r18url and r18url.startswith('http'):
htmlcode = session.get(r18url).text
htmltree = etree.HTML(htmlcode)
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
num = super().getNum(htmltree)
if self.number != num:
raise Exception(f'[!] {self.number}: find [{num}] in gcolle, not match')
return "GCOLLE-" + str(num)
def getOutline(self, htmltree):
result = self.getTreeAll(htmltree, self.expr_outline)
try:
return "\n".join(result)
except:
return ""
def getRelease(self, htmltree):
return re.findall(r'\d{4}-\d{2}-\d{2}', super().getRelease(htmltree))[0]
def getCover(self, htmltree):
return "https:" + super().getCover(htmltree)
def getExtrafanart(self, htmltree):
extrafanart = self.getTreeAll(htmltree, self.expr_extrafanart)
if len(extrafanart) == 0:
extrafanart = self.getTreeAll(htmltree, self.expr_extrafanart2)
# prepend "https:" to each extrafanart url
return ['https:' + url for url in extrafanart]
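# Usage sketch (hypothetical product id): 'GCOLLE-840000' maps to
# https://gcolle.net/product_info.php/products_id/840000, and the R18 redirect,
# when present, is followed before parsing:
#   result = Gcolle().scrape('GCOLLE-840000', None)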

175
scrapinglib/getchu.py Normal file

@@ -0,0 +1,175 @@
# -*- coding: utf-8 -*-
import re
import json
from urllib.parse import quote
from scrapinglib import httprequest
from .parser import Parser
class Getchu():
source = 'getchu'
def scrape(self, number, core=None):
dl = dlGetchu()
www = wwwGetchu()
number = number.replace("-C", "")
dic = {}
# dl.getchu numbers contain "item"; try the matching site first, the other as fallback
if "item" in number:
order = [dl.scrape, www.scrape]
else:
order = [www.scrape, dl.scrape]
for scrape in order:
try:
dic = scrape(number, core)
if dic is not None and json.loads(dic).get('title') != '':
break
except:
pass
return dic
class wwwGetchu(Parser):
expr_title = '//*[@id="soft-title"]/text()'
expr_cover = '//head/meta[@property="og:image"]/@content'
expr_director = "//td[contains(text(),'ブランド')]/following-sibling::td/a[1]/text()"
expr_studio = "//td[contains(text(),'ブランド')]/following-sibling::td/a[1]/text()"
expr_actor = "//td[contains(text(),'ブランド')]/following-sibling::td/a[1]/text()"
expr_label = "//td[contains(text(),'ジャンル:')]/following-sibling::td/text()"
expr_release = "//td[contains(text(),'発売日:')]/following-sibling::td/a/text()"
expr_tags = "//td[contains(text(),'カテゴリ')]/following-sibling::td/a/text()"
expr_outline = "//div[contains(text(),'商品紹介')]/following-sibling::div/text()"
expr_extrafanart = "//div[contains(text(),'サンプル画像')]/following-sibling::div/a/@href"
expr_series = "//td[contains(text(),'ジャンル:')]/following-sibling::td/text()"
def extraInit(self):
self.imagecut = 0
self.allow_number_change = True
self.cookies = {'getchu_adalt_flag': 'getchu.com', "adult_check_flag": "1"}
self.GETCHU_WWW_SEARCH_URL = 'http://www.getchu.com/php/search.phtml?genre=anime_dvd&search_keyword=_WORD_&check_key_dtl=1&submit='
def queryNumberUrl(self, number):
if 'GETCHU' in number.upper():
idn = re.findall(r'\d+', number)[0]
return "http://www.getchu.com/soft.phtml?id=" + idn
else:
queryUrl = self.GETCHU_WWW_SEARCH_URL.replace("_WORD_", quote(number, encoding="euc_jp"))
# NOTE: unclear why, but the search page sometimes needs a second attempt
retry = 2
for i in range(retry):
queryTree = self.getHtmlTree(queryUrl)
detailurl = self.getTreeElement(queryTree, '//*[@id="detail_block"]/div/table/tr[1]/td/a[1]/@href')
if detailurl:
break
if detailurl == "":
return None
return detailurl.replace('../', 'http://www.getchu.com/')
def getHtml(self, url, type = None):
""" 访问网页(指定EUC-JP)
"""
resp = httprequest.get(url, cookies=self.cookies, proxies=self.proxies, extra_headers=self.extraheader, encoding='euc_jis_2004', verify=self.verify, return_type=type)
if '<title>404 Page Not Found' in resp \
or '<title>未找到页面' in resp \
or '404 Not Found' in resp \
or '<title>404' in resp \
or '<title>お探しの商品が見つかりません' in resp:
return 404
return resp
def getNum(self, htmltree):
return 'GETCHU-' + re.findall(r'\d+', self.detailurl.replace("http://www.getchu.com/soft.phtml?id=", ""))[0]
def getActors(self, htmltree):
# the brand name doubles as the actor field on www.getchu.com
return super().getDirector(htmltree)
def getOutline(self, htmltree):
outline = ''
_list = self.getTreeAll(htmltree, self.expr_outline)
for i in _list:
outline = outline + i.strip()
return outline
def getCover(self, htmltree):
url = super().getCover(htmltree)
if "getchu.com" in url:
return url
return "http://www.getchu.com" + url
def getExtrafanart(self, htmltree):
arts = super().getExtrafanart(htmltree)
extrafanart = []
for i in arts:
i = "http://www.getchu.com" + i.replace("./", '/')
if 'jpg' in i:
extrafanart.append(i)
return extrafanart
def extradict(self, dic: dict):
""" 额外新增的 headers
"""
dic['headers'] = {'referer': self.detailurl}
return dic
def getTags(self, htmltree):
tags = super().getTags(htmltree)
tags.append("Getchu")
return tags
class dlGetchu(wwwGetchu):
""" 二者基本一致
headers extrafanart 略有区别
"""
expr_title = "//div[contains(@style,'color: #333333; padding: 3px 0px 0px 5px;')]/text()"
expr_director = "//td[contains(text(),'作者')]/following-sibling::td/text()"
expr_studio = "//td[contains(text(),'サークル')]/following-sibling::td/a/text()"
expr_label = "//td[contains(text(),'サークル')]/following-sibling::td/a/text()"
expr_runtime = "//td[contains(text(),'画像数&ページ数')]/following-sibling::td/text()"
expr_release = "//td[contains(text(),'配信開始日')]/following-sibling::td/text()"
expr_tags = "//td[contains(text(),'趣向')]/following-sibling::td/a/text()"
expr_outline = "//*[contains(text(),'作品内容')]/following-sibling::td/text()"
expr_extrafanart = "//td[contains(@style,'background-color: #444444;')]/a/@href"
expr_series = "//td[contains(text(),'サークル')]/following-sibling::td/a/text()"
def extraInit(self):
self.imagecut = 4
self.allow_number_change = True
self.cookies = {"adult_check_flag": "1"}
self.extraheader = {"Referer": "https://dl.getchu.com/"}
self.GETCHU_DL_SEARCH_URL = 'https://dl.getchu.com/search/search_list.php?dojin=1&search_category_id=&search_keyword=_WORD_&btnWordSearch=%B8%A1%BA%F7&action=search&set_category_flag=1'
self.GETCHU_DL_URL = 'https://dl.getchu.com/i/item_WORD_'
def queryNumberUrl(self, number):
if "item" in number or 'GETCHU' in number.upper():
self.number = re.findall(r'\d+', number)[0]
else:
queryUrl = self.GETCHU_DL_SEARCH_URL.replace("_WORD_", quote(number, encoding="euc_jp"))
queryTree = self.getHtmlTree(queryUrl)
detailurl = self.getTreeElement(queryTree, '/html/body/div[1]/table/tr/td/table[4]/tr/td[2]/table/tr[2]/td/table/tr/td/table/tr/td[2]/div/a[1]/@href')
if detailurl == "":
return None
self.number = re.findall(r'\d+', detailurl)[0]
return self.GETCHU_DL_URL.replace("_WORD_", self.number)
def getNum(self, htmltree):
return 'GETCHU-' + re.findall(r'\d+', self.number)[0]
def extradict(self, dic: dict):
return dic
def getExtrafanart(self, htmltree):
arts = self.getTreeAll(htmltree, self.expr_extrafanart)
extrafanart = []
for i in arts:
i = "https://dl.getchu.com" + i
extrafanart.append(i)
return extrafanart
def getTags(self, htmltree):
tags = super().getTags(htmltree)
tags.append("Getchu")
return tags
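# Usage sketch (hypothetical IDs): the Getchu wrapper tries dl.getchu.com first for
# "item"-style numbers and www.getchu.com otherwise, falling back to the other site:
#   Getchu().scrape('item4034793', None)    # dl first
#   Getchu().scrape('GETCHU-1280616', None) # www first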

193
scrapinglib/httprequest.py Normal file

@@ -0,0 +1,193 @@
# -*- coding: utf-8 -*-
import mechanicalsoup
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from cloudscraper import create_scraper
import config
G_USER_AGENT = r'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.133 Safari/537.36'
G_DEFAULT_TIMEOUT = 10
def get(url: str, cookies=None, ua: str = None, extra_headers=None, return_type: str = None, encoding: str = None,
retry: int = 3, timeout: int = G_DEFAULT_TIMEOUT, proxies=None, verify=None):
"""
Core HTTP GET helper.
Whether to go through a proxy is decided by the caller.
"""
errors = ""
headers = {"User-Agent": ua or G_USER_AGENT}
if extra_headers is not None:
headers.update(extra_headers)
for i in range(retry):
try:
result = requests.get(url, headers=headers, timeout=timeout, proxies=proxies,
verify=verify, cookies=cookies)
if return_type == "object":
return result
elif return_type == "content":
return result.content
else:
result.encoding = encoding or result.apparent_encoding
return result.text
except Exception as e:
if config.getInstance().debug():
print(f"[-]Connect: {url} retry {i + 1}/{retry}")
errors = str(e)
if config.getInstance().debug():
if "getaddrinfo failed" in errors:
print("[-]Connect Failed! Please Check your proxy config")
print("[-]" + errors)
else:
print("[-]" + errors)
print('[-]Connect Failed! Please check your Proxy or Network!')
raise Exception('Connect Failed')
def post(url: str, data: dict=None, files=None, cookies=None, ua: str=None, return_type: str=None, encoding: str=None,
retry: int=3, timeout: int=G_DEFAULT_TIMEOUT, proxies=None, verify=None):
"""
Core HTTP POST helper. Whether to go through a proxy is decided by the caller.
"""
errors = ""
headers = {"User-Agent": ua or G_USER_AGENT}
for i in range(retry):
try:
result = requests.post(url, data=data, files=files, headers=headers, timeout=timeout, proxies=proxies,
verify=verify, cookies=cookies)
if return_type == "object":
return result
elif return_type == "content":
return result.content
else:
result.encoding = encoding or result.apparent_encoding
# NOTE: unlike get(), this returns the Response object so callers can read resp.url
return result
except Exception as e:
if config.getInstance().debug():
print(f"[-]Connect: {url} retry {i + 1}/{retry}")
errors = str(e)
if config.getInstance().debug():
if "getaddrinfo failed" in errors:
print("[-]Connect Failed! Please Check your proxy config")
print("[-]" + errors)
else:
print("[-]" + errors)
print('[-]Connect Failed! Please check your Proxy or Network!')
raise Exception('Connect Failed')
class TimeoutHTTPAdapter(HTTPAdapter):
def __init__(self, *args, **kwargs):
self.timeout = G_DEFAULT_TIMEOUT
if "timeout" in kwargs:
self.timeout = kwargs["timeout"]
del kwargs["timeout"]
super().__init__(*args, **kwargs)
def send(self, request, **kwargs):
timeout = kwargs.get("timeout")
if timeout is None:
kwargs["timeout"] = self.timeout
return super().send(request, **kwargs)
def request_session(cookies=None, ua: str=None, retry: int=3, timeout: int=G_DEFAULT_TIMEOUT, proxies=None, verify=None):
"""
keep-alive session with retry and default-timeout adapters mounted
"""
session = requests.Session()
retries = Retry(total=retry, connect=retry, backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", TimeoutHTTPAdapter(max_retries=retries, timeout=timeout))
session.mount("http://", TimeoutHTTPAdapter(max_retries=retries, timeout=timeout))
if isinstance(cookies, dict) and len(cookies):
requests.utils.add_dict_to_cookiejar(session.cookies, cookies)
if verify:
session.verify = verify
if proxies:
session.proxies = proxies
session.headers = {"User-Agent": ua or G_USER_AGENT}
return session
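# Example (hypothetical URL): one keep-alive session carrying cookies, retrying with
# backoff on 429/5xx, and the default timeout injected by TimeoutHTTPAdapter:
#   session = request_session(cookies={'over18': '1'}, retry=3, timeout=10)
#   html = session.get('https://example.com/detail/1').text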
# storyline xcity only
def get_html_by_form(url, form_select: str = None, fields: dict = None, cookies: dict = None, ua: str = None,
return_type: str = None, encoding: str = None,
retry: int = 3, timeout: int = G_DEFAULT_TIMEOUT, proxies=None, verify=None):
session = requests.Session()
if isinstance(cookies, dict) and len(cookies):
requests.utils.add_dict_to_cookiejar(session.cookies, cookies)
retries = Retry(total=retry, connect=retry, backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", TimeoutHTTPAdapter(max_retries=retries, timeout=timeout))
session.mount("http://", TimeoutHTTPAdapter(max_retries=retries, timeout=timeout))
if verify:
session.verify = verify
if proxies:
session.proxies = proxies
try:
browser = mechanicalsoup.StatefulBrowser(user_agent=ua or G_USER_AGENT, session=session)
result = browser.open(url)
if not result.ok:
return None
form = browser.select_form() if form_select is None else browser.select_form(form_select)
if isinstance(fields, dict):
for k, v in fields.items():
browser[k] = v
response = browser.submit_selected()
if return_type == "object":
return response
elif return_type == "content":
return response.content
elif return_type == "browser":
return response, browser
else:
response.encoding = encoding or "utf-8"
return response.text
except requests.exceptions.ProxyError:
print("[-]get_html_by_form() Proxy error! Please check your Proxy")
except Exception as e:
print(f'[-]get_html_by_form() Failed! {e}')
return None
# storyline javdb only
def get_html_by_scraper(url: str = None, cookies: dict = None, ua: str = None, return_type: str = None,
encoding: str = None, retry: int = 3, proxies=None, timeout: int = G_DEFAULT_TIMEOUT, verify=None):
session = create_scraper(browser={'custom': ua or G_USER_AGENT, })
if isinstance(cookies, dict) and len(cookies):
requests.utils.add_dict_to_cookiejar(session.cookies, cookies)
retries = Retry(total=retry, connect=retry, backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", TimeoutHTTPAdapter(max_retries=retries, timeout=timeout))
session.mount("http://", TimeoutHTTPAdapter(max_retries=retries, timeout=timeout))
if verify:
session.verify = verify
if proxies:
session.proxies = proxies
try:
if isinstance(url, str) and len(url):
result = session.get(str(url))
else: # with an empty url, return the reusable scraper session directly; no return_type needed
return session
if not result.ok:
return None
if return_type == "object":
return result
elif return_type == "content":
return result.content
elif return_type == "scraper":
return result, session
else:
result.encoding = encoding or "utf-8"
return result.text
except requests.exceptions.ProxyError:
print("[-]get_html_by_scraper() Proxy error! Please check your Proxy")
except Exception as e:
print(f"[-]get_html_by_scraper() failed. {e}")
return None

24
scrapinglib/imdb.py Normal file

@@ -0,0 +1,24 @@
# -*- coding: utf-8 -*-
from .parser import Parser
class Imdb(Parser):
source = 'imdb'
imagecut = 0
expr_title = '//h1[@data-testid="hero-title-block__title"]/text()'
expr_release = '//a[contains(text(),"Release date")]/following-sibling::div[1]/ul/li/a/text()'
expr_cover = '//head/meta[@property="og:image"]/@content'
expr_outline = '//head/meta[@property="og:description"]/@content'
expr_actor = '//h3[contains(text(),"Top cast")]/../../../following-sibling::div[1]/div[2]/div/div/a/text()'
expr_tags = '//div[@data-testid="genres"]/div[2]/a/ul/li/text()'
def queryNumberUrl(self, number):
"""
TODO distinguish IMDb IDs from titles
"""
movieid = number
movieUrl = "https://www.imdb.com/title/" + movieid
return movieUrl
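# Usage sketch: numbers are treated as IMDb title IDs for now, e.g. (hypothetical call)
#   result = Imdb().scrape('tt0111161', None)
# which resolves to https://www.imdb.com/title/tt0111161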

60
scrapinglib/jav321.py Normal file

@@ -0,0 +1,60 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from . import httprequest
from .parser import Parser
class Jav321(Parser):
source = 'jav321'
expr_title = "/html/body/div[2]/div[1]/div[1]/div[1]/h3/text()"
expr_cover = "/html/body/div[2]/div[2]/div[1]/p/a/img/@src"
expr_outline = "/html/body/div[2]/div[1]/div[1]/div[2]/div[3]/div/text()"
expr_number = '//b[contains(text(),"品番")]/following-sibling::node()'
expr_actor = '//b[contains(text(),"出演者")]/following-sibling::a[starts-with(@href,"/star")]/text()'
expr_label = '//b[contains(text(),"メーカー")]/following-sibling::a[starts-with(@href,"/company")]/text()'
expr_tags = '//b[contains(text(),"ジャンル")]/following-sibling::a[starts-with(@href,"/genre")]/text()'
expr_studio = '//b[contains(text(),"メーカー")]/following-sibling::a[starts-with(@href,"/company")]/text()'
expr_release = '//b[contains(text(),"配信開始日")]/following-sibling::node()'
expr_runtime = '//b[contains(text(),"収録時間")]/following-sibling::node()'
expr_series = '//b[contains(text(),"シリーズ")]/following-sibling::node()'
expr_extrafanart = '//div[@class="col-md-3"]/div[@class="col-xs-12 col-md-12"]/p/a/img/@src'
def queryNumberUrl(self, number):
return 'https://www.jav321.com/search'
def getHtmlTree(self, url):
"""
Special case: the detail page is fetched only once, via a POST search
"""
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
resp = httprequest.get(self.detailurl, cookies=self.cookies, proxies=self.proxies, verify=self.verify)
self.detailhtml = resp
return etree.fromstring(resp, etree.HTMLParser())
resp = httprequest.post(url, data={"sn": self.number}, cookies=self.cookies, proxies=self.proxies, verify=self.verify)
if "/video/" in resp.url:
self.detailurl = resp.url
self.detailhtml = resp.text
return etree.fromstring(resp.text, etree.HTMLParser())
return None
def getNum(self, htmltree):
return super().getNum(htmltree).split(": ")[1]
def getTrailer(self, htmltree):
videourl_pattern = re.compile(r'<source src="(.*?)"')
videourl = videourl_pattern.findall(self.detailhtml)
if videourl:
url = videourl[0].replace('awscc3001.r18.com', 'cc3001.dmm.co.jp').replace('cc3001.r18.com', 'cc3001.dmm.co.jp')
return url
else:
return ''
def getRelease(self, htmltree):
return super().getRelease(htmltree).split(": ")[1]
def getRuntime(self, htmltree):
return super().getRuntime(htmltree).split(": ")[1]
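# Usage sketch (hypothetical ID): jav321 is searched with a POST of {"sn": number};
# a redirect to a /video/ URL is taken as the detail page, otherwise the lookup fails:
#   result = Jav321().scrape('abc-00123', None)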

140
scrapinglib/javbus.py Normal file

@@ -0,0 +1,140 @@
# -*- coding: utf-8 -*-
import re
import os
import secrets
import inspect
from lxml import etree
from urllib.parse import urljoin
from .parser import Parser
class Javbus(Parser):
source = 'javbus'
expr_number = '/html/head/meta[@name="keywords"]/@content'
expr_title = '/html/head/title/text()'
expr_studio = '//span[contains(text(),"製作商:")]/../a/text()'
expr_studio2 = '//span[contains(text(),"メーカー:")]/../a/text()'
expr_director = '//span[contains(text(),"導演:")]/../a/text()'
expr_directorJa = '//span[contains(text(),"監督:")]/../a/text()'
expr_series = '//span[contains(text(),"系列:")]/../a/text()'
expr_series2 = '//span[contains(text(),"シリーズ:")]/../a/text()'
expr_label = '//span[contains(text(),"系列:")]/../a/text()'
expr_cover = '//a[@class="bigImage"]/@href'
expr_release = '/html/body/div[5]/div[1]/div[2]/p[2]/text()'
expr_runtime = '/html/body/div[5]/div[1]/div[2]/p[3]/text()'
expr_actor = '//div[@class="star-name"]/a'
expr_actorphoto = '//div[@class="star-name"]/../a/img'
expr_extrafanart = '//div[@id="sample-waterfall"]/a/@href'
expr_tags = '/html/head/meta[@name="keywords"]/@content'
expr_uncensored = '//*[@id="navbar"]/ul[1]/li[@class="active"]/a[contains(@href,"uncensored")]'
def search(self, number):
self.number = number
try:
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
htmltree = self.getHtmlTree(self.detailurl)
result = self.dictformat(htmltree)
return result
try:
self.detailurl = 'https://www.javbus.com/' + number
self.htmlcode = self.getHtml(self.detailurl)
except:
# the main domain failed; retry via a random mirror
mirror_url = "https://www." + secrets.choice([
'buscdn.fun', 'busdmm.fun', 'busfan.fun', 'busjav.fun',
'cdnbus.fun',
'dmmbus.fun', 'dmmsee.fun',
'seedmm.fun',
]) + "/"
self.detailurl = mirror_url + number
self.htmlcode = self.getHtml(self.detailurl)
if self.htmlcode == 404:
return 404
htmltree = etree.fromstring(self.htmlcode,etree.HTMLParser())
result = self.dictformat(htmltree)
return result
except:
# every javbus lookup failed; retry against the uncensored site
return self.searchUncensored(number)
def searchUncensored(self, number):
""" 二次搜索无码
"""
self.imagecut = 0
self.uncensored = True
w_number = number.replace('.', '-')
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = 'https://www.javbus.red/' + w_number
self.htmlcode = self.getHtml(self.detailurl)
if self.htmlcode == 404:
return 404
htmltree = etree.fromstring(self.htmlcode, etree.HTMLParser())
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
return super().getNum(htmltree).split(',')[0]
def getTitle(self, htmltree):
title = super().getTitle(htmltree)
title = str(re.findall(r'^.+?\s+(.*) - JavBus$', title)[0]).strip()
return title
def getStudio(self, htmltree):
if self.uncensored:
return self.getTreeElement(htmltree, self.expr_studio2)
else:
return self.getTreeElement(htmltree, self.expr_studio)
def getCover(self, htmltree):
return urljoin("https://www.javbus.com", super().getCover(htmltree))
def getRuntime(self, htmltree):
return super().getRuntime(htmltree).strip(" ['']分鐘")
def getActors(self, htmltree):
actors = super().getActors(htmltree)
return [a.attrib['title'] for a in actors]
def getActorPhoto(self, htmltree):
actors = self.getTreeAll(htmltree, self.expr_actorphoto)
d = {}
for i in actors:
p = i.attrib['src']
if "nowprinting.gif" in p:
continue
t = i.attrib['title']
d[t] = urljoin("https://www.javbus.com", p)
return d
def getDirector(self, htmltree):
if self.uncensored:
return self.getTreeElement(htmltree, self.expr_directorJa)
else:
return self.getTreeElement(htmltree, self.expr_director)
def getSeries(self, htmltree):
if self.uncensored:
return self.getTreeElement(htmltree, self.expr_series2)
else:
return self.getTreeElement(htmltree, self.expr_series)
def getTags(self, htmltree):
tags = self.getTreeElement(htmltree, self.expr_tags).split(',')
return tags[2:]
def getOutline(self, htmltree):
if self.morestoryline:
if any(caller for caller in inspect.stack() if os.path.basename(caller.filename) == 'airav.py'):
return '' # calls originating from airav.py skip the outline to avoid duplicate scraping that slows processing
from .storyline import getStoryline
return getStoryline(self.number , uncensored = self.uncensored,
proxies=self.proxies, verify=self.verify)
return ''
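# Usage sketch (hypothetical ID): censored numbers resolve on www.javbus.com or a
# random mirror; if that whole path fails, searchUncensored() retries on javbus.red:
#   result = Javbus().scrape('ABC-123', None)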

46
scrapinglib/javday.py Normal file

@@ -0,0 +1,46 @@
# -*- coding: utf-8 -*-
from lxml import etree
from .parser import Parser
class Javday(Parser):
source = 'javday'
expr_url = '/html/head/meta[@property="og:url"]/@content'
expr_cover = '/html/head/meta[@property="og:image"]/@content'
expr_tags = '/html/head/meta[@name="keywords"]/@content'
expr_title = "/html/head/title/text()"
expr_actor = "//span[@class='vod_actor']/a/text()"
expr_studio = '//span[@class="producer"]/a/text()'
expr_number = '//span[@class="jpnum"]/text()'
def extraInit(self):
self.imagecut = 4
self.uncensored = True
def search(self, number):
self.number = number.strip().upper()
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = "https://javday.tv/videos/" + self.number.replace("-","") + "/"
self.htmlcode = self.getHtml(self.detailurl)
if self.htmlcode == 404:
return 404
htmltree = etree.fromstring(self.htmlcode, etree.HTMLParser())
self.detailurl = self.getTreeElement(htmltree, self.expr_url)
result = self.dictformat(htmltree)
return result
def getTitle(self, htmltree):
title = super().getTitle(htmltree)
# strip the ID and the site name from the title
result = title.replace(self.number,"").replace("- JAVDAY.TV","").strip()
return result
def getTags(self, htmltree) -> list:
tags = super().getTags(htmltree)
return [tag for tag in tags if 'JAVDAY.TV' not in tag]

242
scrapinglib/javdb.py Normal file

@@ -0,0 +1,242 @@
# -*- coding: utf-8 -*-
import re
from urllib.parse import urljoin
from lxml import etree
from .httprequest import request_session
from .parser import Parser
class Javdb(Parser):
source = 'javdb'
expr_number = '//strong[contains(text(),"番號")]/../span/text()'
expr_number2 = '//strong[contains(text(),"番號")]/../span/a/text()'
expr_title = "/html/head/title/text()"
expr_title_no = '//*[contains(@class,"movie-list")]/div/a/div[contains(@class, "video-title")]/text()'
expr_runtime = '//strong[contains(text(),"時長")]/../span/text()'
expr_runtime2 = '//strong[contains(text(),"時長")]/../span/a/text()'
expr_uncensored = '//strong[contains(text(),"類別")]/../span/a[contains(@href,"/tags/uncensored?") or contains(@href,"/tags/western?")]'
expr_actor = '//span[@class="value"]/a[contains(@href,"/actors/")]/text()'
expr_actor2 = '//span[@class="value"]/a[contains(@href,"/actors/")]/../strong/@class'
expr_release = '//strong[contains(text(),"日期")]/../span/text()'
expr_release_no = '//*[contains(@class,"movie-list")]/div/a/div[contains(@class, "meta")]/text()'
expr_studio = '//strong[contains(text(),"片商")]/../span/a/text()'
expr_studio2 = '//strong[contains(text(),"賣家:")]/../span/a/text()'
expr_director = '//strong[contains(text(),"導演")]/../span/text()'
expr_director2 = '//strong[contains(text(),"導演")]/../span/a/text()'
expr_cover = "//div[contains(@class, 'column-video-cover')]/a/img/@src"
expr_cover2 = "//div[contains(@class, 'column-video-cover')]/img/@src"
expr_cover_no = '//*[contains(@class,"movie-list")]/div/a/div[contains(@class, "cover")]/img/@src'
expr_trailer = '//span[contains(text(),"預告片")]/../../video/source/@src'
expr_extrafanart = "//article[@class='message video-panel']/div[@class='message-body']/div[@class='tile-images preview-images']/a[contains(@href,'/samples/')]/@href"
expr_tags = '//strong[contains(text(),"類別")]/../span/a/text()'
expr_tags2 = '//strong[contains(text(),"類別")]/../span/text()'
expr_series = '//strong[contains(text(),"系列")]/../span/text()'
expr_series2 = '//strong[contains(text(),"系列")]/../span/a/text()'
expr_label = '//strong[contains(text(),"系列")]/../span/text()'
expr_label2 = '//strong[contains(text(),"系列")]/../span/a/text()'
expr_userrating = '//span[@class="score-stars"]/../text()'
expr_uservotes = '//span[@class="score-stars"]/../text()'
expr_actorphoto = '//strong[contains(text(),"演員:")]/../span/a[starts-with(@href,"/actors/")]'
def extraInit(self):
self.fixstudio = False
self.noauth = False
def updateCore(self, core):
if core.proxies:
self.proxies = core.proxies
if core.verify:
self.verify = core.verify
if core.morestoryline:
self.morestoryline = True
if core.specifiedSource == self.source:
self.specifiedUrl = core.specifiedUrl
# special
if core.dbcookies:
self.cookies = core.dbcookies
else:
self.cookies = {'over18':'1', 'theme':'auto', 'locale':'zh'}
if core.dbsite:
self.dbsite = core.dbsite
else:
self.dbsite = 'javdb'
def search(self, number: str):
self.number = number
self.session = request_session(cookies=self.cookies, proxies=self.proxies, verify=self.verify)
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = self.queryNumberUrl(number)
self.detailpage = self.session.get(self.detailurl).text
# these markers mean the page sits behind a login / VIP wall
if '此內容需要登入才能查看或操作' in self.detailpage or '需要VIP權限才能訪問此內容' in self.detailpage:
self.noauth = True
self.imagecut = 0
result = self.dictformat(self.querytree)
else:
htmltree = etree.fromstring(self.detailpage, etree.HTMLParser())
result = self.dictformat(htmltree)
return result
def queryNumberUrl(self, number):
javdb_url = 'https://' + self.dbsite + '.com/search?q=' + number + '&f=all'
try:
resp = self.session.get(javdb_url)
except Exception as e:
#print(e)
raise Exception(f'[!] {self.number}: page not found in javdb')
self.querytree = etree.fromstring(resp.text, etree.HTMLParser())
# javdb sometimes returns multiple results,
# and the first element may not be the one we are looking for,
# so iterate over all candidates and pick the exact match
urls = self.getTreeAll(self.querytree, '//*[contains(@class,"movie-list")]/div/a/@href')
# note: western-style ids, e.g. ['Blacked','Blacked'], match the dotted-date pattern below
if re.search(r'[a-zA-Z]+\.\d{2}\.\d{2}\.\d{2}', number):
correct_url = urls[0]
else:
ids = self.getTreeAll(self.querytree, '//*[contains(@class,"movie-list")]/div/a/div[contains(@class, "video-title")]/strong/text()')
try:
self.queryid = ids.index(number)
correct_url = urls[self.queryid]
except:
# to avoid picking up a wrong ID, accept only an exact match
if ids[0].upper() != number.upper():
raise ValueError("number not found in javdb")
correct_url = urls[0]
return urljoin(resp.url, correct_url)
def getNum(self, htmltree):
if self.noauth:
return self.number
# the ID is split across two elements; join them to get the full number
part1 = self.getTreeElement(htmltree, self.expr_number)
part2 = self.getTreeElement(htmltree, self.expr_number2)
dp_number = part2 + part1
# NOTE: verify the match and update self.number
if dp_number.upper() != self.number.upper():
raise Exception(f'[!] {self.number}: find [{dp_number}] in javdb, not match')
self.number = dp_number
return self.number
def getTitle(self, htmltree):
if self.noauth:
return self.getTreeElement(htmltree, self.expr_title_no, self.queryid)
browser_title = super().getTitle(htmltree)
title = browser_title[:browser_title.find(' | JavDB')].strip()
return title.replace(self.number, '').strip()
def getCover(self, htmltree):
if self.noauth:
return self.getTreeElement(htmltree, self.expr_cover_no, self.queryid)
return super().getCover(htmltree)
def getRelease(self, htmltree):
if self.noauth:
return self.getTreeElement(htmltree, self.expr_release_no, self.queryid).strip()
return super().getRelease(htmltree)
def getDirector(self, htmltree):
return self.getTreeElementbyExprs(htmltree, self.expr_director, self.expr_director2)
def getSeries(self, htmltree):
# NOTE: unclear whether javdb ever lists multiple series for one film; keep the join for now
results = self.getTreeAllbyExprs(htmltree, self.expr_series, self.expr_series2)
result = ''.join(results)
if not result and self.fixstudio:
result = self.getStudio(htmltree)
return result
def getLabel(self, htmltree):
results = self.getTreeAllbyExprs(htmltree, self.expr_label, self.expr_label2)
result = ''.join(results)
if not result and self.fixstudio:
result = self.getStudio(htmltree)
return result
def getActors(self, htmltree):
actors = self.getTreeAll(htmltree, self.expr_actor)
genders = self.getTreeAll(htmltree, self.expr_actor2)
r = []
idx = 0
# NOTE: keep only female performers; we do not care about the others
actor_gender = 'female'
for act in actors:
if((actor_gender == 'all')
or (actor_gender == 'both' and genders[idx] in ['symbol female', 'symbol male'])
or (actor_gender == 'female' and genders[idx] == 'symbol female')
or (actor_gender == 'male' and genders[idx] == 'symbol male')):
r.append(act)
idx = idx + 1
if re.match(r'FC2-[\d]+', self.number, re.A) and not r:
r = '素人'
self.fixstudio = True
return r
def getOutline(self, htmltree):
if self.morestoryline:
from .storyline import getStoryline
return getStoryline(self.number, self.getUncensored(htmltree),
proxies=self.proxies, verify=self.verify)
return ''
def getTrailer(self, htmltree):
video = super().getTrailer(htmltree)
# guard against an empty result
if video:
if 'https:' not in video:
video_url = 'https:' + video
else:
video_url = video
else:
video_url = ''
return video_url
def getTags(self, htmltree):
return self.getTreeAllbyExprs(htmltree, self.expr_tags, self.expr_tags2)
def getUserRating(self, htmltree):
try:
numstrs = self.getTreeElement(htmltree, self.expr_userrating)
nums = re.findall('[0-9.]+', numstrs)
return float(nums[0])
except:
return ''
def getUserVotes(self, htmltree):
try:
result = self.getTreeElement(htmltree, self.expr_uservotes)
v = re.findall('[0-9.]+', result)
return int(v[1])
except:
return ''
def getaphoto(self, url, session):
html_page = session.get(url).text
img_url = re.findall(r'<span class="avatar" style="background-image: url\((.*?)\)', html_page)
return img_url[0] if img_url else ''
def getActorPhoto(self, htmltree):
actorall = self.getTreeAll(htmltree, self.expr_actorphoto)
if not actorall:
return {}
actors = self.getActors(htmltree)
actor_photo = {}
for i in actorall:
x = re.findall(r'/actors/(.*)', i.attrib['href'], re.A)
if not len(x) or not len(x[0]) or i.text not in actors:
continue
# NOTE: https://c1.jdbstatic.com changes frequently, so take the avatar URL from the page itself
# actor_id = x[0]
# pic_url = f"https://c1.jdbstatic.com/avatars/{actor_id[:2].lower()}/{actor_id}.jpg"
# if not self.session.head(pic_url).ok:
try:
pic_url = self.getaphoto(urljoin('https://javdb.com', i.attrib['href']), self.session)
if len(pic_url):
actor_photo[i.text] = pic_url
except:
pass
return actor_photo
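# Usage sketch (hedged): without login cookies a paywalled detail page triggers the
# noauth path, which falls back to the data already present in the search results:
#   result = Javdb().scrape('ABC-123', core)  # core supplies dbcookies/dbsite/proxies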

84
scrapinglib/javlibrary.py Normal file

@@ -0,0 +1,84 @@
# -*- coding: utf-8 -*-
from lxml import etree
from .httprequest import request_session
from .parser import Parser
class Javlibrary(Parser):
source = 'javlibrary'
expr_number = '//div[@id="video_id"]/table/tr/td[@class="text"]/text()'
expr_title = '//div[@id="video_title"]/h3/a/text()'
expr_actor = '//div[@id="video_cast"]/table/tr/td[@class="text"]/span/span[@class="star"]/a/text()'
expr_tags = '//div[@id="video_genres"]/table/tr/td[@class="text"]/span/a/text()'
expr_cover = '//img[@id="video_jacket_img"]/@src'
expr_release = '//div[@id="video_date"]/table/tr/td[@class="text"]/text()'
expr_studio = '//div[@id="video_maker"]/table/tr/td[@class="text"]/span/a/text()'
expr_runtime = '//div[@id="video_length"]/table/tr/td/span[@class="text"]/text()'
expr_userrating = '//div[@id="video_review"]/table/tr/td/span[@class="score"]/text()'
expr_director = '//div[@id="video_director"]/table/tr/td[@class="text"]/span/a/text()'
expr_extrafanart = '//div[@class="previewthumbs"]/img/@src'
def extraInit(self):
self.htmltree = None
def updateCore(self, core):
if core.proxies:
self.proxies = core.proxies
if core.verify:
self.verify = core.verify
if core.morestoryline:
self.morestoryline = True
if core.specifiedSource == self.source:
self.specifiedUrl = core.specifiedUrl
self.cookies = {'over18':'1'}
def search(self, number):
self.number = number.upper()
self.session = request_session(cookies=self.cookies, proxies=self.proxies, verify=self.verify)
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = self.queryNumberUrl(self.number)
if not self.detailurl:
return 404
if self.htmltree is None:
details = self.session.get(self.detailurl)
self.htmltree = etree.fromstring(details.text, etree.HTMLParser())
result = self.dictformat(self.htmltree)
return result
def queryNumberUrl(self, number:str):
queryUrl = "http://www.javlibrary.com/cn/vl_searchbyid.php?keyword=" + number
queryResult = self.session.get(queryUrl)
if queryResult and "/?v=jav" in queryResult.url:
self.htmltree = etree.fromstring(queryResult.text, etree.HTMLParser())
return queryResult.url
else:
queryTree = etree.fromstring(queryResult.text, etree.HTMLParser())
numbers = queryTree.xpath('//div[@class="id"]/text()')
if number in numbers:
urls = queryTree.xpath('//div[@class="id"]/../@href')
detailurl = urls[numbers.index(number)]
return "http://www.javlibrary.com/cn" + detailurl.strip('.')
return None
def getTitle(self, htmltree):
title = super().getTitle(htmltree)
title = title.replace(self.getNum(htmltree), '').strip()
return title
def getCover(self, htmltree):
url = super().getCover(htmltree)
if not url.startswith('http'):
url = 'https:' + url
return url
def getOutline(self, htmltree):
if self.morestoryline:
from .storyline import getStoryline
return getStoryline(self.number, self.getUncensored(htmltree),
proxies=self.proxies, verify=self.verify)
return ''

61
scrapinglib/javmenu.py Normal file

@@ -0,0 +1,61 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from urllib.parse import urljoin
from .parser import Parser
class Javmenu(Parser):
source = 'javmenu'
expr_title = '/html/head/meta[@property="og:title"]/@content'
expr_cover = '/html/head/meta[@property="og:image"]/@content'
expr_number = '//span[contains(text(),"番號") or contains(text(),"番号")]/../a/text()'
expr_number2 = '//span[contains(text(),"番號") or contains(text(),"番号")]/../span[2]/text()'
expr_runtime = '//span[contains(text(),"時長;") or contains(text(),"时长")]/../span[2]/text()'
expr_release = '//span[contains(text(),"日期")]/../span[2]/text()'
expr_studio = '//span[contains(text(),"製作")]/../span[2]/a/text()'
expr_actor = '//a[contains(@class,"actress")]/text()'
expr_tags = '//a[contains(@class,"genre")]/text()'
def extraInit(self):
self.imagecut = 4
self.uncensored = True
def search(self, number):
self.number = number
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = 'https://javmenu.com/zh/' + self.number + '/'
self.htmlcode = self.getHtml(self.detailurl)
if self.htmlcode == 404:
return 404
htmltree = etree.HTML(self.htmlcode)
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
# the ID is split across two elements; join them to get the full number
part1 = self.getTreeElement(htmltree, self.expr_number)
part2 = self.getTreeElement(htmltree, self.expr_number2)
dp_number = part1 + part2
# NOTE: verify the match and update self.number
if dp_number.upper() != self.number.upper():
raise Exception(f'[!] {self.number}: find [{dp_number}] in javmenu, not match')
self.number = dp_number
return self.number
def getTitle(self, htmltree):
browser_title = super().getTitle(htmltree)
# strip the ID from the title
number = re.findall(r"\d+", self.number)[1]
title = browser_title.split(number,1)[-1]
title = title.replace(' | JAV目錄大全 | 每日更新',"")
title = title.replace(' | JAV目录大全 | 每日更新',"").strip()
return title.replace(self.number, '').strip()

94
scrapinglib/madou.py Normal file

@@ -0,0 +1,94 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from urllib.parse import urlparse, unquote
from .parser import Parser
NUM_RULES3=[
r'(mmz{2,4})-?(\d{2,})(-ep\d*|-\d*)?.*',
r'(msd)-?(\d{2,})(-ep\d*|-\d*)?.*',
r'(yk)-?(\d{2,})(-ep\d*|-\d*)?.*',
r'(pm)-?(\d{2,})(-ep\d*|-\d*)?.*',
r'(mky-[a-z]{2,2})-?(\d{2,})(-ep\d*|-\d*)?.*',
]
# extract and normalize the number for madou IDs
def change_number(number):
number = number.lower().strip()
m = re.search(r'(md[a-z]{0,2})-?(\d{2,})(-ep\d*|-\d*)?.*', number, re.I)
if m:
return f'{m.group(1)}{m.group(2).zfill(4)}{m.group(3) or ""}'
for rules in NUM_RULES3:
m = re.search(rules, number, re.I)
if m:
return f'{m.group(1)}{m.group(2).zfill(3)}{m.group(3) or ""}'
return number
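# Worked examples for the rules above (hedged, derived from the regexes):
#   change_number('MD-0001')     -> 'md0001'      (md* ids are zero-padded to 4 digits)
#   change_number('MSD-022-ep1') -> 'msd022-ep1'  (other prefixes pad to 3 digits)
#   change_number('xyz-99')      -> 'xyz-99'      (no rule matches; returned unchanged)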
class Madou(Parser):
source = 'madou'
expr_url = '//a[@class="share-weixin"]/@data-url'
expr_title = "/html/head/title/text()"
expr_studio = '//a[@rel="category tag"]/text()'
expr_tags = '/html/head/meta[@name="keywords"]/@content'
def extraInit(self):
self.imagecut = 4
self.uncensored = True
self.allow_number_change = True
def search(self, number):
self.number = change_number(number)
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = "https://madou.club/" + number + ".html"
self.htmlcode = self.getHtml(self.detailurl)
if self.htmlcode == 404:
return 404
htmltree = etree.fromstring(self.htmlcode, etree.HTMLParser())
self.detailurl = self.getTreeElement(htmltree, self.expr_url)
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
try:
# decode the url
filename = unquote(urlparse(self.detailurl).path)
# trim the leading "/" and the ".html" suffix
result = filename[1:-5].upper().strip()
# strip Chinese characters
if result.upper() != self.number.upper():
result = re.split(r'[^\x00-\x7F]+', result, 1)[0]
# strip stray separators
return result.strip('-')
except:
return ''
def getTitle(self, htmltree):
# <title>MD0140-2 / 家有性事EP2 爱在身边-麻豆社</title>
# <title>MAD039 机灵可爱小叫花 强诱僧人迫犯色戒-麻豆社</title>
# <title>MD0094贫嘴贱舌中出大嫂坏嫂嫂和小叔偷腥内射受孕-麻豆社</title>
# <title>TM0002-我的痴女女友-麻豆社</title>
browser_title = str(super().getTitle(htmltree))
title = str(re.findall(r'^[A-Z0-9 /\-]*(.*)-麻豆社$', browser_title)[0]).strip()
return title
def getCover(self, htmltree):
try:
url = str(re.findall("shareimage : '(.*?)'", self.htmlcode)[0])
return url.strip()
except:
return ''
def getTags(self, htmltree):
studio = self.getStudio(htmltree)
tags = super().getTags(htmltree)
return [tag for tag in tags if studio not in tag and '麻豆' not in tag]

55
scrapinglib/mgstage.py Normal file

@@ -0,0 +1,55 @@
# -*- coding: utf-8 -*-
from .parser import Parser
class Mgstage(Parser):
source = 'mgstage'
expr_number = '//th[contains(text(),"品番:")]/../td/a/text()'
expr_title = '//*[@id="center_column"]/div[1]/h1/text()'
expr_studio = '//th[contains(text(),"メーカー:")]/../td/a/text()'
expr_outline = '//dl[@id="introduction"]/dd/p/text()'
expr_runtime = '//th[contains(text(),"収録時間:")]/../td/a/text()'
expr_director = '//th[contains(text(),"シリーズ")]/../td/a/text()'
expr_actor = '//th[contains(text(),"出演:")]/../td/a/text()'
expr_release = '//th[contains(text(),"配信開始日:")]/../td/a/text()'
expr_cover = '//*[@id="EnlargeImage"]/@href'
expr_label = '//th[contains(text(),"レーベル:")]/../td/a/text()'
expr_tags = '//th[contains(text(),"ジャンル:")]/../td/a/text()'
expr_tags2 = '//th[contains(text(),"ジャンル:")]/../td/text()'
expr_series = '//th[contains(text(),"シリーズ")]/../td/a/text()'
expr_extrafanart = '//a[@class="sample_image"]/@href'
def extraInit(self):
self.imagecut = 4
def search(self, number):
self.number = number.upper()
self.cookies = {'adc': '1'}
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = 'https://www.mgstage.com/product/product_detail/' + str(self.number) + '/'
htmltree = self.getHtmlTree(self.detailurl)
result = self.dictformat(htmltree)
return result
def getTitle(self, htmltree):
return super().getTitle(htmltree).replace('/', ',').strip()
def getTags(self, htmltree):
return self.getTreeAllbyExprs(htmltree, self.expr_tags, self.expr_tags2)
def getTreeAll(self, tree, expr):
alls = super().getTreeAll(tree, expr)
return [ x.strip() for x in alls if x.strip()]
def getTreeElement(self, tree, expr, index=0):
# some fields live in the <td> directly rather than in <td><a>, so query both forms
if expr == '':
return ''
result1 = ''.join(self.getTreeAll(tree, expr))
result2 = ''.join(self.getTreeAll(tree, expr.replace('td/a/', 'td/')))
if result1 == result2:
return result1
return result1 + result2
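# Usage sketch (hypothetical ID): the adc=1 cookie passes the age gate, and the
# detail url is built straight from the number:
#   result = Mgstage().scrape('ABC-123', None)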

70
scrapinglib/msin.py Normal file

@@ -0,0 +1,70 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from .httprequest import request_session
from .parser import Parser
class Msin(Parser):
source = 'msin'
expr_number = '//div[@class="mv_fileName"]/text()'
expr_title = '//div[@class="mv_title"]/text()'
expr_title_unsubscribe = '//div[@class="mv_title unsubscribe"]/text()'
expr_studio = '//a[@class="mv_writer"]/text()'
expr_director = '//a[@class="mv_writer"]/text()'
expr_actor = '//div[contains(text(),"出演者:")]/following-sibling::div[1]/div/div[@class="performer_text"]/a/text()'
expr_label = '//a[@class="mv_mfr"]/text()'
expr_series = '//a[@class="mv_mfr"]/text()'
expr_release = '//a[@class="mv_createDate"]/text()'
expr_cover = '//div[@class="movie_top"]/img/@src'
expr_tags = '//div[@class="mv_tag"]/label/text()'
expr_genres = '//div[@class="mv_genre"]/label/text()'
# expr_outline = '//p[@class="fo-14"]/text()'
# expr_extrafanart = '//*[@class="item-nav"]/ul/li/a/img/@src'
# expr_extrafanart2 = '//*[@id="cart_quantity"]/table/tr[3]/td/div/a/img/@src'
def extraInit(self):
self.imagecut = 4
def search(self, number: str):
self.number = number.lower().replace('fc2-ppv-', '').replace('fc2-', '')
self.cookies = {"age": "off"}
self.detailurl = 'https://db.msin.jp/search/movie?str=fc2-ppv-' + self.number
session = request_session(cookies=self.cookies, proxies=self.proxies, verify=self.verify)
htmlcode = session.get(self.detailurl).text
htmltree = etree.HTML(htmlcode)
# if the title is empty, fall back to the unsubscribe title
if super().getTitle(htmltree) == "":
self.expr_title = self.expr_title_unsubscribe
# if tags are empty, fall back to genres
if len(super().getTags(htmltree)) == 0:
self.expr_tags = self.expr_genres
# if no actors are listed, fall back to the uploader field
if len(super().getActors(htmltree)) == 0:
self.expr_actor = self.expr_director
result = self.dictformat(htmltree)
return result
def getActors(self, htmltree):
actors = super().getActors(htmltree)
# drop the "FC2動画" suffix from performer names
return [a.replace("FC2動画", "") for a in actors]
def getTags(self, htmltree) -> list:
return super().getTags(htmltree)
def getRelease(self, htmltree):
# normalize a Japanese "YYYY年MM月DD日" date to "YYYY-MM-DD"
return super().getRelease(htmltree).replace('年', '-').replace('月', '-').replace('日', '')
def getCover(self, htmltree):
if ".gif" in super().getCover(htmltree) and len(super().getExtrafanart(htmltree)) != 0:
return super().getExtrafanart(htmltree)[0]
return super().getCover(htmltree)
def getNum(self, htmltree):
return 'FC2-' + self.number

323
scrapinglib/parser.py Normal file

@@ -0,0 +1,323 @@
# -*- coding: utf-8 -*-
import json
import re
from lxml import etree, html
import config
from . import httprequest
from .utils import getTreeElement, getTreeAll
class Parser:
""" 基础刮削类
"""
source = 'base'
# xpath expr
expr_number = ''
expr_title = ''
expr_studio = ''
expr_studio2 = ''
expr_runtime = ''
expr_runtime2 = ''
expr_release = ''
expr_outline = ''
expr_director = ''
expr_actor = ''
expr_tags = ''
expr_label = ''
expr_label2 = ''
expr_series = ''
expr_series2 = ''
expr_cover = ''
expr_cover2 = ''
expr_smallcover = ''
expr_extrafanart = ''
expr_trailer = ''
expr_actorphoto = ''
expr_uncensored = ''
expr_userrating = ''
expr_uservotes = ''
def init(self):
""" 初始化参数
"""
# 推荐剪切poster封面:
# `0` 复制cover
# `1` 裁剪cover
# `3` 下载小封面
self.imagecut = 1
self.uncensored = False
self.allow_number_change = False
# update
self.proxies = None
self.verify = None
self.extraheader = None
self.cookies = None
self.morestoryline = False
self.specifiedUrl = None
self.extraInit()
def extraInit(self):
""" 自定义初始化内容
"""
pass
def scrape(self, number, core=None):
""" Scrape the given ID
"""
# reset parameters on every call
self.init()
self.updateCore(core)
result = self.search(number)
return result
def search(self, number):
""" 查询番号
查询主要流程:
1. 获取 url
2. 获取详情页面
3. 解析
4. 返回 result
"""
self.number = number
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
self.detailurl = self.queryNumberUrl(number)
if not self.detailurl:
return 404
htmltree = self.getHtmlTree(self.detailurl)
result = self.dictformat(htmltree)
return result
def updateCore(self, core):
""" 从`core`内更新参数
针对需要传递的参数: cookies, proxy等
子类继承后修改
"""
if not core:
return
if core.proxies:
self.proxies = core.proxies
if core.verify:
self.verify = core.verify
if core.morestoryline:
self.morestoryline = True
if core.specifiedSource == self.source:
self.specifiedUrl = core.specifiedUrl
def queryNumberUrl(self, number):
""" 根据番号查询详细信息url
需要针对不同站点修改,或者在上层直接获取
备份查询页面,预览图可能需要
"""
url = "http://detailurl.ai/" + number
return url
def getHtml(self, url, type = None):
""" 访问网页
"""
resp = httprequest.get(url, cookies=self.cookies, proxies=self.proxies, extra_headers=self.extraheader, verify=self.verify, return_type=type)
if '<title>404 Page Not Found' in resp \
or '<title>未找到页面' in resp \
or '404 Not Found' in resp \
or '<title>404' in resp \
or '<title>お探しの商品が見つかりません' in resp:
return 404
return resp
def getHtmlTree(self, url, type = None):
""" 访问网页,返回`etree`
"""
resp = self.getHtml(url, type)
if resp == 404:
return 404
ret = etree.fromstring(resp, etree.HTMLParser())
return ret
def dictformat(self, htmltree):
try:
dic = {
'number': self.getNum(htmltree),
'title': self.getTitle(htmltree),
'studio': self.getStudio(htmltree),
'release': self.getRelease(htmltree),
'year': self.getYear(htmltree),
'outline': self.getOutline(htmltree),
'runtime': self.getRuntime(htmltree),
'director': self.getDirector(htmltree),
'actor': self.getActors(htmltree),
'actor_photo': self.getActorPhoto(htmltree),
'cover': self.getCover(htmltree),
'cover_small': self.getSmallCover(htmltree),
'extrafanart': self.getExtrafanart(htmltree),
'trailer': self.getTrailer(htmltree),
'tag': self.getTags(htmltree),
'label': self.getLabel(htmltree),
'series': self.getSeries(htmltree),
'userrating': self.getUserRating(htmltree),
'uservotes': self.getUserVotes(htmltree),
'uncensored': self.getUncensored(htmltree),
'website': self.detailurl,
'source': self.source,
'imagecut': self.getImagecut(htmltree),
}
dic = self.extradict(dic)
except Exception as e:
if config.getInstance().debug():
print(e)
dic = {"title": ""}
js = json.dumps(dic, ensure_ascii=False, sort_keys=True, separators=(',', ':'))
return js
def extradict(self, dic:dict):
""" 额外修改dict
"""
return dic
def getNum(self, htmltree):
""" 增加 strip 过滤
"""
return self.getTreeElement(htmltree, self.expr_number)
def getTitle(self, htmltree):
return self.getTreeElement(htmltree, self.expr_title).strip()
def getRelease(self, htmltree):
return self.getTreeElement(htmltree, self.expr_release).strip().replace('/','-')
def getYear(self, htmltree):
""" year基本都是从release中解析的
"""
try:
release = self.getRelease(htmltree)
return str(re.findall(r'\d{4}', release)).strip(" ['']")
except:
# getRelease may fail above, in which case `release` is unbound
return ''
def getRuntime(self, htmltree):
return self.getTreeElementbyExprs(htmltree, self.expr_runtime, self.expr_runtime2).strip().rstrip('mi')
def getOutline(self, htmltree):
return self.getTreeElement(htmltree, self.expr_outline).strip()
def getDirector(self, htmltree):
return self.getTreeElement(htmltree, self.expr_director).strip()
def getActors(self, htmltree) -> list:
return self.getTreeAll(htmltree, self.expr_actor)
def getTags(self, htmltree) -> list:
alls = self.getTreeAll(htmltree, self.expr_tags)
tags = []
for t in alls:
for tag in t.strip().split(','):
tag = tag.strip()
if tag:
tags.append(tag)
return tags
def getStudio(self, htmltree):
return self.getTreeElementbyExprs(htmltree, self.expr_studio, self.expr_studio2)
def getLabel(self, htmltree):
return self.getTreeElementbyExprs(htmltree, self.expr_label, self.expr_label2)
def getSeries(self, htmltree):
return self.getTreeElementbyExprs(htmltree, self.expr_series, self.expr_series2)
def getCover(self, htmltree):
return self.getTreeElementbyExprs(htmltree, self.expr_cover, self.expr_cover2)
def getSmallCover(self, htmltree):
return self.getTreeElement(htmltree, self.expr_smallcover)
def getExtrafanart(self, htmltree) -> list:
return self.getTreeAll(htmltree, self.expr_extrafanart)
def getTrailer(self, htmltree):
return self.getTreeElement(htmltree, self.expr_trailer)
def getActorPhoto(self, htmltree) -> dict:
return {}
def getUncensored(self, htmltree) -> bool:
"""
tag: 無码 無修正 uncensored 无码
title: 無碼 無修正 uncensored
"""
if self.uncensored:
return self.uncensored
tags = [x.lower() for x in self.getTags(htmltree) if len(x)]
title = self.getTitle(htmltree)
if self.expr_uncensored:
u = self.getTreeAll(htmltree, self.expr_uncensored)
self.uncensored = bool(u)
elif '無码' in tags or '無修正' in tags or 'uncensored' in tags or '无码' in tags:
self.uncensored = True
elif '無码' in title or '無修正' in title or 'uncensored' in title.lower():
self.uncensored = True
return self.uncensored
def getImagecut(self, htmltree):
""" 修正 无码poster不裁剪cover
"""
# if self.imagecut == 1 and self.getUncensored(htmltree):
# self.imagecut = 0
return self.imagecut
def getUserRating(self, htmltree):
numstrs = self.getTreeElement(htmltree, self.expr_userrating)
nums = re.findall('[0-9.]+', numstrs)
if len(nums) == 1:
return float(nums[0])
return ''
def getUserVotes(self, htmltree):
votestrs = self.getTreeElement(htmltree, self.expr_uservotes)
votes = re.findall('[0-9]+', votestrs)
if len(votes) == 1:
return int(votes[0])
return ''
def getTreeElement(self, tree: html.HtmlElement, expr, index=0):
""" 根据表达式从`xmltree`中获取匹配值,默认 index 为 0
"""
return getTreeElement(tree, expr, index)
def getTreeAll(self, tree: html.HtmlElement, expr):
""" 根据表达式从`xmltree`中获取全部匹配值
"""
return getTreeAll(tree, expr)
def getTreeElementbyExprs(self, tree: html.HtmlElement, expr, expr2=''):
""" 多个表达式获取element
使用内部的 getTreeElement 防止继承修改后出现问题
"""
try:
first = self.getTreeElement(tree, expr).strip()
if first:
return first
second = self.getTreeElement(tree, expr2).strip()
if second:
return second
return ''
except:
return ''
def getTreeAllbyExprs(self, tree: html.HtmlElement, expr, expr2=''):
""" 多个表达式获取所有element
合并并剔除重复元素
"""
try:
result1 = self.getTreeAll(tree, expr)
result2 = self.getTreeAll(tree, expr2)
clean = [ x.strip() for x in result1 if x.strip() and x.strip() != ',']
clean2 = [ x.strip() for x in result2 if x.strip() and x.strip() != ',']
result = list(set(clean + clean2))
return result
except:
return []
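# Minimal subclass sketch (hypothetical site, for illustration only): a new source
# usually just declares xpath expressions and a url scheme, inheriting the rest:
#   class Example(Parser):
#       source = 'example'
#       expr_title = '//h1/text()'
#       def queryNumberUrl(self, number):
#           return 'https://example.com/movie/' + number
#   Example().scrape('ABC-123', None) then returns the JSON string from dictformat().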

58
scrapinglib/pcolle.py Normal file

@@ -0,0 +1,58 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from .httprequest import request_session
from .parser import Parser
class Pcolle(Parser):
source = 'pcolle'
expr_number = '//th[contains(text(),"商品ID")]/../td/text()'
expr_title = '//div[@class="title-04"]/div/text()'
expr_studio = '//th[contains(text(),"販売会員")]/../td/a/text()'
expr_director = '//th[contains(text(),"販売会員")]/../td/a/text()'
expr_actor = '//th[contains(text(),"販売会員")]/../td/a/text()'
expr_label = '//th[contains(text(),"カテゴリー")]/../td/ul/li/a/text()'
expr_series = '//th[contains(text(),"カテゴリー")]/../td/ul/li/a/text()'
expr_release = '//th[contains(text(),"販売開始日")]/../td/text()'
expr_cover = '/html/body/div[1]/div/div[4]/div[2]/div/div[1]/div/article/a/img/@src'
expr_tags = '//p[contains(text(),"商品タグ")]/../ul/li/a/text()'
expr_outline = '//p[@class="fo-14"]/text()'
expr_extrafanart = '//*[@class="item-nav"]/ul/li/a/img/@src'
# expr_extrafanart2 = '//*[@id="cart_quantity"]/table/tr[3]/td/div/a/img/@src'
def extraInit(self):
self.imagecut = 4
def search(self, number: str):
self.number = number.upper().replace('PCOLLE-', '')
self.detailurl = 'https://www.pcolle.com/product/detail/?product_id=' + self.number
session = request_session(cookies=self.cookies, proxies=self.proxies, verify=self.verify)
htmlcode = session.get(self.detailurl).text
htmltree = etree.HTML(htmlcode)
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
num = super().getNum(htmltree).upper()
if self.number != num:
raise Exception(f'[!] {self.number}: find [{num}] in pcolle, not match')
return "PCOLLE-" + str(num)
def getOutline(self, htmltree):
result = self.getTreeAll(htmltree, self.expr_outline)
try:
return "\n".join(result)
except:
return ""
def getRelease(self, htmltree):
# normalize a Japanese "YYYY年MM月DD日" date to "YYYY-MM-DD"
return super().getRelease(htmltree).replace('年', '-').replace('月', '-').replace('日', '')
def getCover(self, htmltree):
if ".gif" in super().getCover(htmltree) and len(super().getExtrafanart(htmltree)) != 0:
return super().getExtrafanart(htmltree)[0]
return super().getCover(htmltree)

87
scrapinglib/pissplay.py Normal file

@@ -0,0 +1,87 @@
# -*- coding: utf-8 -*-
import re
from lxml import etree
from .parser import Parser
from datetime import datetime
# Scrapes videos from https://pissplay.com/
# pissplay videos have no IDs, so they are searched by filename
# a video can only be scraped when the filename exactly matches the site's video title
class Pissplay(Parser):
source = 'pissplay'
expr_number = '//*[@id="video_title"]/text()' # videos on this site have no IDs, so the title stands in
expr_title = '//*[@id="video_title"]/text()'
expr_cover = '/html/head//meta[@property="og:image"]/@content'
expr_tags = '//div[@id="video_tags"]/a/text()'
expr_release = '//div[@class="video_date"]/text()'
expr_outline = '//*[@id="video_description"]/p//text()'
def extraInit(self):
self.imagecut = 0 # do not crop the cover
self.specifiedSource = None
def search(self, number):
self.number = number.strip().upper()
if self.specifiedUrl:
self.detailurl = self.specifiedUrl
else:
newName = re.sub(r"[^a-zA-Z0-9 ]", "", number) # strip special characters
self.detailurl = "https://pissplay.com/videos/" + newName.lower().replace(" ","-") + "/"
self.htmlcode = self.getHtml(self.detailurl)
if self.htmlcode == 404:
return 404
htmltree = etree.fromstring(self.htmlcode, etree.HTMLParser())
result = self.dictformat(htmltree)
return result
def getNum(self, htmltree):
title = self.getTitle(htmltree)
return title
def getTitle(self, htmltree):
title = super().getTitle(htmltree)
title = re.sub(r"[^a-zA-Z0-9 ]", "", title) # strip special characters
return title
def getCover(self, htmltree):
url = super().getCover(htmltree)
if not url.startswith('http'):
url = 'https:' + url
return url
def getRelease(self, htmltree):
releaseDate = super().getRelease(htmltree)
# convert e.g. "02 Mar 2023" to ISO "2023-03-02"
isoDate = datetime.strptime(releaseDate, '%d %b %Y').strftime('%Y-%m-%d')
return isoDate
def getStudio(self, htmltree):
return 'PissPlay'
def getTags(self, htmltree):
tags = self.getTreeAll(htmltree, self.expr_tags)
if 'Guests' in tags:
if tags[0] == 'Collaboration' or tags[0] == 'Toilet for a Day':
del tags[1]
else:
tags = tags[1:]
return tags
def getActors(self, htmltree) -> list:
tags = self.getTreeAll(htmltree, self.expr_tags)
if 'Guests' in tags:
if tags[0] == 'Collaboration' or tags[0] == 'Toilet for a Day':
return [tags[1]]
else:
return [tags[0]]
else:
return ['Bruce and Morgan']
def getOutline(self, htmltree):
outline = self.getTreeAll(htmltree, self.expr_outline)
if ' Morgan xx' in outline:
num = outline.index(' Morgan xx')
outline = outline[:num]
rstring = ''.join(outline).replace("&","and")
return rstring

274
scrapinglib/storyline.py Normal file

@@ -0,0 +1,274 @@
# -*- coding: utf-8 -*-
"""
This part has not been reworked yet
"""
import json
import os
import re
import time
import secrets
import builtins
import config
from urllib.parse import urljoin
from lxml.html import fromstring
from multiprocessing.dummy import Pool as ThreadPool
from .airav import Airav
from .xcity import Xcity
from .httprequest import get_html_by_form, get_html_by_scraper, request_session
# the Amazon source has been dropped
G_registered_storyline_site = {"airavwiki", "airav", "avno1", "xcity", "58avgo"}
G_mode_txt = ('sequential', 'thread pool')
def is_japanese(raw: str) -> bool:
"""
Naive Japanese-text detection
"""
return bool(re.search(r'[\u3040-\u309F\u30A0-\u30FF\uFF66-\uFF9F]', raw, re.UNICODE))
class noThread(object):
def map(self, fn, param):
return list(builtins.map(fn, param))
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
pass
# fetch the storyline: query the listed sites concurrently; earlier sites take priority
def getStoryline(number, title=None, sites: list=None, uncensored=None, proxies=None, verify=None):
start_time = time.time()
debug = False
storyline_sites = config.getInstance().storyline_site().split(",") # "1:airav,4:airavwiki".split(',')
if uncensored:
storyline_sites = config.getInstance().storyline_uncensored_site().split(
",") + storyline_sites # "3:58avgo".split(',')
else:
storyline_sites = config.getInstance().storyline_censored_site().split(
",") + storyline_sites # "2:airav,5:xcity".split(',')
r_dup = set()
sort_sites = []
for s in storyline_sites:
if s in G_registered_storyline_site and s not in r_dup:
sort_sites.append(s)
r_dup.add(s)
# sort_sites.sort()
mp_args = ((site, number, title, debug, proxies, verify) for site in sort_sites)
cores = min(len(sort_sites), os.cpu_count())
if cores == 0:
return ''
run_mode = 1
with ThreadPool(cores) if run_mode > 0 else noThread() as pool:
results = pool.map(getStoryline_mp, mp_args)
sel = ''
# the debug summary below is written to the log
s = f'[!]Storyline ran {len(sort_sites)} tasks in {G_mode_txt[run_mode]} mode, taking {time.time() - start_time:.3f}s total (incl. startup), finished at {time.strftime("%H:%M:%S")}'
sel_site = ''
for site, desc in zip(sort_sites, results):
if isinstance(desc, str) and len(desc):
if not is_japanese(desc):
sel_site, sel = site, desc
break
if not len(sel_site):
sel_site, sel = site, desc
for site, desc in zip(sort_sites, results):
sl = len(desc) if isinstance(desc, str) else 0
s += f'[selected {site} length:{sl}]' if site == sel_site else f'{site} length:{sl}' if sl else f'{site}: empty'
if config.getInstance().debug():
print(s)
return sel
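# Usage sketch (hedged): sites come from the storyline_* config options; the first
# non-Japanese description wins, otherwise the first non-empty one is returned:
#   outline = getStoryline('ABC-123', uncensored=False)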
def getStoryline_mp(args):
(site, number, title, debug, proxies, verify) = args
start_time = time.time()
storyline = None
if not isinstance(site, str):
return storyline
elif site == "airavwiki":
storyline = getStoryline_airavwiki(number, debug, proxies, verify)
elif site == "airav":
storyline = getStoryline_airav(number, debug, proxies, verify)
elif site == "avno1":
storyline = getStoryline_avno1(number, debug, proxies, verify)
elif site == "xcity":
storyline = getStoryline_xcity(number, debug, proxies, verify)
elif site == "58avgo":
storyline = getStoryline_58avgo(number, debug, proxies, verify)
if not debug:
return storyline
if config.getInstance().debug():
print("[!]MP 线程[{}]运行{:.3f}秒,结束于{}返回结果: {}".format(
site,
time.time() - start_time,
time.strftime("%H:%M:%S"),
storyline if isinstance(storyline, str) and len(storyline) else '[空]')
)
return storyline
def getStoryline_airav(number, debug, proxies, verify):
try:
site = secrets.choice(('airav.cc','airav4.club'))
url = f'https://{site}/searchresults.aspx?Search={number}&Type=0'
session = request_session(proxies=proxies, verify=verify)
res = session.get(url)
if not res:
raise ValueError(f"get_html_by_session('{url}') failed")
lx = fromstring(res.text)
urls = lx.xpath('//div[@class="resultcontent"]/ul/li/div/a[@class="ga_click"]/@href')
txts = lx.xpath('//div[@class="resultcontent"]/ul/li/div/a[@class="ga_click"]/h3[@class="one_name ga_name"]/text()')
detail_url = None
for txt, url in zip(txts, urls):
if re.search(number, txt, re.I):
detail_url = urljoin(res.url, url)
break
if detail_url is None:
raise ValueError("number not found")
res = session.get(detail_url)
if not res.ok:
raise ValueError(f"session.get('{detail_url}') failed")
lx = fromstring(res.text)
t = str(lx.xpath('/html/head/title/text()')[0]).strip()
airav_number = str(re.findall(r'^\s*\[(.*?)]', t)[0])
if not re.search(number, airav_number, re.I):
raise ValueError(f"page number ->[{airav_number}] not match")
desc = str(lx.xpath('//span[@id="ContentPlaceHolder1_Label2"]/text()')[0]).strip()
return desc
except Exception as e:
if debug:
print(f"[-]MP getStoryline_airav Error: {e},number [{number}].")
pass
return None
def getStoryline_airavwiki(number, debug, proxies, verify):
try:
kwd = number[:6] if re.match(r'\d{6}[\-_]\d{2,3}', number) else number
airavwiki = Airav()
airavwiki.addtion_Javbus = False
airavwiki.proxies = proxies
airavwiki.verify = verify
jsons = airavwiki.search(kwd)
outline = json.loads(jsons).get('outline')
return outline
except Exception as e:
if debug:
print(f"[-]MP def getStoryline_airavwiki Error: {e}, number [{number}].")
pass
return ''
def getStoryline_58avgo(number, debug, proxies, verify):
try:
url = 'http://58avgo.com/cn/index.aspx' + secrets.choice([
'', '?status=3', '?status=4', '?status=7', '?status=9', '?status=10', '?status=11', '?status=12',
'?status=1&Sort=Playon', '?status=1&Sort=dateupload', 'status=1&Sort=dateproduce'
]) # 随机选一个避免网站httpd日志中单个ip的请求太过单一
kwd = number[:6] if re.match(r'\d{6}[\-_]\d{2,3}', number) else number
result, browser = get_html_by_form(url,
fields = {'ctl00$TextBox_SearchKeyWord' : kwd},
proxies=proxies, verify=verify,
return_type = 'browser')
if not result:
raise ValueError(f"get_html_by_form('{url}','{number}') failed")
if f'searchresults.aspx?Search={kwd}' not in browser.url:
raise ValueError("number not found")
s = browser.page.select('div.resultcontent > ul > li.listItem > div.one-info-panel.one > a.ga_click')
link = None
for a in s:
title = a.h3.text.strip()
list_number = title[title.rfind(' ')+1:].strip()
if re.search(number, list_number, re.I):
link = a
break
if link is None:
raise ValueError("number not found")
result = browser.follow_link(link)
if not result.ok or 'playon.aspx' not in browser.url:
raise ValueError("detail page not found")
title = browser.page.select_one('head > title').text.strip()
detail_number = str(re.findall('\[(.*?)]', title)[0])
if not re.search(number, detail_number, re.I):
raise ValueError(f"detail page number not match, got ->[{detail_number}]")
return browser.page.select_one('#ContentPlaceHolder1_Label2').text.strip()
except Exception as e:
if debug:
print(f"[-]MP getOutline_58avgo Error: {e}, number [{number}].")
pass
return ''
def getStoryline_avno1(number, debug, proxies, verify): #获取剧情介绍 从avno1.cc取得
try:
site = secrets.choice(['1768av.club','2nine.net','av999.tv','avno1.cc',
'hotav.biz','iqq2.xyz','javhq.tv',
'www.hdsex.cc','www.porn18.cc','www.xxx18.cc',])
url = f'http://{site}/cn/search.php?kw_type=key&kw={number}'
lx = fromstring(get_html_by_scraper(url, proxies=proxies, verify=verify))
descs = lx.xpath('//div[@class="type_movie"]/div/ul/li/div/@data-description')
titles = lx.xpath('//div[@class="type_movie"]/div/ul/li/div/a/h3/text()')
if not descs or not len(descs):
raise ValueError(f"number not found")
partial_num = bool(re.match(r'\d{6}[\-_]\d{2,3}', number))
for title, desc in zip(titles, descs):
page_number = title[title.rfind(' ')+1:].strip()
if not partial_num:
# 不选择title中带破坏版的简介
if re.match(f'^{number}$', page_number, re.I) and title.rfind('破坏版')== -1:
return desc.strip()
elif re.search(number, page_number, re.I):
return desc.strip()
raise ValueError(f"page number ->[{page_number}] not match")
except Exception as e:
if debug:
print(f"[-]MP getOutline_avno1 Error: {e}, number [{number}].")
pass
return ''
def getStoryline_avno1OLD(number, debug, proxies, verify): #获取剧情介绍 从avno1.cc取得
try:
url = 'http://www.avno1.cc/cn/' + secrets.choice(['usercenter.php?item=' +
secrets.choice(['pay_support', 'qa', 'contact', 'guide-vpn']),
'?top=1&cat=hd', '?top=1', '?cat=hd', 'porn', '?cat=jp', '?cat=us', 'recommend_category.php'
]) # 随机选一个避免网站httpd日志中单个ip的请求太过单一
result, browser = get_html_by_form(url,
form_select='div.wrapper > div.header > div.search > form',
fields = {'kw' : number},
proxies=proxies, verify=verify,
return_type = 'browser')
if not result:
raise ValueError(f"get_html_by_form('{url}','{number}') failed")
s = browser.page.select('div.type_movie > div > ul > li > div')
for div in s:
title = div.a.h3.text.strip()
page_number = title[title.rfind(' ')+1:].strip()
if re.search(number, page_number, re.I):
return div['data-description'].strip()
raise ValueError(f"page number ->[{page_number}] not match")
except Exception as e:
if debug:
print(f"[-]MP getOutline_avno1 Error: {e}, number [{number}].")
pass
return ''
def getStoryline_xcity(number, debug, proxies, verify): #获取剧情介绍 从xcity取得
try:
xcityEngine = Xcity()
xcityEngine.proxies = proxies
xcityEngine.verify = verify
jsons = xcityEngine.search(number)
outline = json.loads(jsons).get('outline')
return outline
except Exception as e:
if debug:
print(f"[-]MP getOutline_xcity Error: {e}, number [{number}].")
pass
return ''
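For orientation, a minimal sketch of calling the dispatcher above. The import path follows the file's location, but the movie number is hypothetical, and a populated config.ini (storyline_site entries) plus network access are assumed:

    from scrapinglib.storyline import getStoryline

    desc = getStoryline('ABC-123', uncensored=False)  # 'ABC-123' is illustrative
    print(desc if desc else '[no storyline found]')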

35
scrapinglib/tmdb.py Normal file

@@ -0,0 +1,35 @@
# -*- coding: utf-8 -*-
from .parser import Parser


class Tmdb(Parser):
    """
    Two variants are possible: with an API key and without one.
    """
    source = 'tmdb'
    imagecut = 0
    apikey = None

    expr_title = '//head/meta[@property="og:title"]/@content'
    expr_release = '//div/span[@class="release"]/text()'
    expr_cover = '//head/meta[@property="og:image"]/@content'
    expr_outline = '//head/meta[@property="og:description"]/@content'

    # def search(self, number):
    #     self.detailurl = self.queryNumberUrl(number)
    #     detailpage = self.getHtml(self.detailurl)

    def queryNumberUrl(self, number):
        """
        TODO: distinguish TMDB IDs from titles.
        """
        movie_id = number
        movieUrl = "https://www.themoviedb.org/movie/" + movie_id + "?language=zh-CN"
        return movieUrl

    def getCover(self, htmltree):
        return "https://www.themoviedb.org" + self.getTreeElement(htmltree, self.expr_cover)

31
scrapinglib/utils.py Normal file

@@ -0,0 +1,31 @@
# -*- coding: utf-8 -*-
from lxml.html import HtmlElement


def getTreeElement(tree: HtmlElement, expr='', index=0):
    """ Return the match at `index` (default 0) for `expr` against `tree`,
    or '' when the expression is empty or yields no such match.
    :param tree (html.HtmlElement)
    :param expr
    :param index
    """
    if expr == '':
        return ''
    result = tree.xpath(expr)
    try:
        return result[index]
    except IndexError:
        return ''


def getTreeAll(tree: HtmlElement, expr=''):
    """ Return all matches for `expr` against `tree`,
    or [] when the expression is empty.
    :param tree (html.HtmlElement)
    :param expr
    """
    if expr == '':
        return []
    return tree.xpath(expr)
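A self-contained example of the two helpers on an in-memory document (the HTML snippet is made up for illustration):

    from lxml.html import fromstring
    from scrapinglib.utils import getTreeElement, getTreeAll

    tree = fromstring('<html><body><p>first</p><p>second</p></body></html>')
    print(getTreeElement(tree, '//p/text()'))     # 'first'  (index 0 by default)
    print(getTreeElement(tree, '//p/text()', 5))  # ''       (index out of range)
    print(getTreeAll(tree, '//p/text()'))         # ['first', 'second']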

92
scrapinglib/xcity.py Normal file

@@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
import re
import secrets
from urllib.parse import urljoin
from .httprequest import get_html_by_form
from .parser import Parser


class Xcity(Parser):
    source = 'xcity'

    expr_number = '//*[@id="hinban"]/text()'
    expr_title = '//*[@id="program_detail_title"]/text()'
    expr_actor = '//ul/li[@class="credit-links"]/a/text()'
    expr_actor_link = '//ul/li[@class="credit-links"]/a'
    expr_actorphoto = '//div[@class="frame"]/div/p/img/@src'
    expr_studio = '//*[@id="avodDetails"]/div/div[3]/div[2]/div/ul[1]/li[4]/a/span/text()'
    expr_studio2 = '//strong[contains(text(),"片商")]/../following-sibling::span/a/text()'
    expr_runtime = '//span[@class="koumoku" and text()="収録時間"]/../text()'
    expr_label = '//*[@id="avodDetails"]/div/div[3]/div[2]/div/ul[1]/li[5]/a/span/text()'
    expr_release = '//*[@id="avodDetails"]/div/div[3]/div[2]/div/ul[1]/li[2]/text()'
    expr_tags = '//span[@class="koumoku" and text()="ジャンル"]/../a[starts-with(@href,"/avod/genre/")]/text()'
    expr_cover = '//*[@id="avodDetails"]/div/div[3]/div[1]/p/a/@href'
    expr_director = '//*[@id="program_detail_director"]/text()'
    expr_series = "//span[contains(text(),'シリーズ')]/../a/span/text()"
    expr_series2 = "//span[contains(text(),'シリーズ')]/../span/text()"
    expr_extrafanart = '//div[@id="sample_images"]/div/a/@href'
    expr_outline = '//head/meta[@property="og:description"]/@content'

    def queryNumberUrl(self, number):
        xcity_number = number.replace('-', '')
        query_result, browser = get_html_by_form(
            'https://xcity.jp/' + secrets.choice(['sitemap/', 'policy/', 'law/', 'help/', 'main/']),
            fields={'q': xcity_number.lower()},
            cookies=self.cookies, proxies=self.proxies, verify=self.verify,
            return_type='browser')
        if not query_result or not query_result.ok:
            raise ValueError("xcity.py: page not found")
        prelink = browser.links(r'avod/detail')[0]['href']
        return urljoin('https://xcity.jp', prelink)

    def getStudio(self, htmltree):
        return super().getStudio(htmltree).strip('+').replace("', '", '').replace('"', '')

    def getRuntime(self, htmltree):
        return self.getTreeElement(htmltree, self.expr_runtime, 1).strip()

    def getRelease(self, htmltree):
        try:
            result = self.getTreeElement(htmltree, self.expr_release, 1)
            return re.findall(r'\d{4}/\d{2}/\d{2}', result)[0].replace('/', '-')
        except:
            return ''

    def getCover(self, htmltree):
        try:
            result = super().getCover(htmltree)
            return 'https:' + result
        except:
            return ''

    def getDirector(self, htmltree):
        try:
            result = super().getDirector(htmltree).replace('\n', '').replace('\t', '')
            return result
        except:
            return ''

    def getActorPhoto(self, htmltree):
        treea = self.getTreeAll(htmltree, self.expr_actor_link)
        t = {i.text.strip(): i.attrib['href'] for i in treea}
        o = {}
        for k, v in t.items():
            actorpageUrl = "https://xcity.jp" + v
            try:
                adtree = self.getHtmlTree(actorpageUrl)
                picUrl = self.getTreeElement(adtree, self.expr_actorphoto)
                if 'noimage.gif' in picUrl:
                    continue
                o[k] = urljoin("https://xcity.jp", picUrl)
            except:
                pass
        return o

    def getExtrafanart(self, htmltree):
        arts = self.getTreeAll(htmltree, self.expr_extrafanart)
        return ["https:" + i for i in arts]
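For clarity, the date normalisation inside getRelease above works like this (sample string invented):

    import re

    raw = 'リリース日: 2023/09/17 その他'
    print(re.findall(r'\d{4}/\d{2}/\d{2}', raw)[0].replace('/', '-'))  # 2023-09-17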

81
siro.py

@@ -1,81 +0,0 @@
import re
from lxml import etree
import json
import requests
from bs4 import BeautifulSoup


def get_html(url):  # core web-request helper
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'}
    cookies = {'adc': '1'}
    try:
        return requests.get(str(url), timeout=5, cookies=cookies, headers=headers).text
    except:
        print("[-]Connect Failed! Please check your Proxy.")


def getTitle(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//*[@id="center_column"]/div[2]/h1/text()')).strip(" ['']")
    return result


def getActor(a):  # //*[@id="center_column"]/div[2]/div[1]/div/table/tbody/tr[1]/td/text()
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[1]/td/a/text()')).strip(" ['\\n ']")
    return result


def getStudio(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[2]/td/a/text()')).strip(" ['\\n ']")
    return result


def getRuntime(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[3]/td/text()')).strip(" ['\\n ']")
    return result


def getNum(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[4]/td/text()')).strip(" ['\\n ']")
    return result


def getYear(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[5]/td/text()')).strip(" ['\\n ']")
    return result


def getRelease(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[5]/td/text()')).strip(" ['\\n ']")
    return result


def getTag(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[9]/td/text()')).strip(" ['\\n ']")
    return result


def getCover(htmlcode):
    html = etree.fromstring(htmlcode, etree.HTMLParser())
    result = str(html.xpath('//*[@id="center_column"]/div[2]/div[1]/div/div/h2/img/@src')).strip(" ['']")
    return result


def getDirector(a):
    html = etree.fromstring(a, etree.HTMLParser())
    result = str(html.xpath('//table[2]/tr[7]/td/a/text()')).strip(" ['\\n ']")
    return result


def getOutline(htmlcode):
    html = etree.fromstring(htmlcode, etree.HTMLParser())
    result = str(html.xpath('//*[@id="introduction"]/dd/p[1]/text()')).strip(" ['']")
    return result


def main(number):
    htmlcode = get_html('https://www.mgstage.com/product/product_detail/' + str(number))
    soup = BeautifulSoup(htmlcode, 'lxml')
    a = str(soup.find(attrs={'class': 'detail_data'})).replace('\n ', '')
    dic = {
        'title': getTitle(htmlcode).replace("\\n", '').replace(' ', ''),
        'studio': getStudio(a),
        'year': getYear(a),
        'outline': getOutline(htmlcode),
        'runtime': getRuntime(a),
        'director': getDirector(a),
        'actor': getActor(a),
        'release': getRelease(a),
        'number': number,
        'cover': getCover(htmlcode),
        'imagecut': 0,
        'tag': ' ',
    }
    js = json.dumps(dic, ensure_ascii=False, sort_keys=True, indent=4, separators=(',', ':'), )  # .encode('UTF-8')
    return js

5
update_check.json Normal file

@@ -0,0 +1,5 @@
{
"version": "4.6.7",
"version_show": "4.6.7",
"download": "https://github.com/yoshiko2/AV_Data_Capture/releases"
}
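A hypothetical sketch (not part of the repo) of how a client might consume update_check.json to decide whether to prompt for an upgrade; the local version tuple is invented for illustration:

    import json

    with open('update_check.json', encoding='utf-8') as f:
        info = json.load(f)
    local = (4, 6, 0)  # assumed local version, for illustration only
    remote = tuple(int(x) for x in info['version'].split('.'))
    if remote > local:
        print(f"New version {info['version_show']} available at {info['download']}")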

12
wrapper/FreeBSD.sh Executable file

@@ -0,0 +1,12 @@
#!/bin/sh
pkg install python39 py39-requests py39-pip py39-lxml py39-pillow py39-cloudscraper py39-pysocks git zip py39-beautifulsoup448 py39-mechanicalsoup
pip install pyinstaller
pyinstaller --onefile Movie_Data_Capture.py --hidden-import ADC_function.py --hidden-import core.py \
    --hidden-import "ImageProcessing.cnn" \
    --python-option u \
    --add-data "$(python3.9 -c 'import cloudscraper as _; print(_.__path__[0])' | tail -n 1):cloudscraper" \
    --add-data "$(python3.9 -c 'import opencc as _; print(_.__path__[0])' | tail -n 1):opencc" \
    --add-data "$(python3.9 -c 'import face_recognition_models as _; print(_.__path__[0])' | tail -n 1):face_recognition_models" \
    --add-data "Img:Img" \
    --add-data "config.ini:."
cp config.ini ./dist

24
wrapper/Linux.sh Executable file

@@ -0,0 +1,24 @@
#!/bin/sh
#if [ '$(dpkg --print-architecture)' != 'amd64' ] || [ '$(dpkg --print-architecture)' != 'i386' ]; then
# apt install python3 python3-pip git sudo libxml2-dev libxslt-dev build-essential wget nano libcmocka-dev libcmocka0 -y
# apt install zlib* libjpeg-dev -y
#wget https://files.pythonhosted.org/packages/82/96/21ba3619647bac2b34b4996b2dbbea8e74a703767ce24192899d9153c058/pyinstaller-4.0.tar.gz
#tar -zxvf pyinstaller-4.0.tar.gz
#cd pyinstaller-4.0/bootloader
#sed -i "s/ '-Werror',//" wscript
#python3 ./waf distclean all
#cd ../
#python3 setup.py install
#cd ../
#fi
pip3 install -r requirements.txt
pip3 install cloudscraper==1.2.52
pyinstaller --onefile Movie_Data_Capture.py --hidden-import ADC_function.py --hidden-import core.py \
    --hidden-import "ImageProcessing.cnn" \
    --python-option u \
    --add-data "$(python3 -c 'import cloudscraper as _; print(_.__path__[0])' | tail -n 1):cloudscraper" \
    --add-data "$(python3 -c 'import opencc as _; print(_.__path__[0])' | tail -n 1):opencc" \
    --add-data "$(python3 -c 'import face_recognition_models as _; print(_.__path__[0])' | tail -n 1):face_recognition_models" \
    --add-data "Img:Img" \
    --add-data "config.ini:."
cp config.ini ./dist

329
xlog.py Executable file

@@ -0,0 +1,329 @@
import os
import sys
import time
from datetime import datetime
import traceback
import threading
import json
import shutil

CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0


class Logger:
    def __init__(self, name, buffer_size=0, file_name=None, roll_num=1):
        self.err_color = '\033[0m'
        self.warn_color = '\033[0m'
        self.debug_color = '\033[0m'
        self.reset_color = '\033[0m'
        self.set_console_color = lambda color: sys.stderr.write(color)
        self.name = str(name)
        self.file_max_size = 1024 * 1024
        self.buffer_lock = threading.Lock()
        self.buffer = {}  # id => line
        self.buffer_size = buffer_size
        self.last_no = 0
        self.min_level = NOTSET
        self.log_fd = None
        self.roll_num = roll_num
        if file_name:
            self.set_file(file_name)

    def set_buffer(self, buffer_size):
        with self.buffer_lock:
            self.buffer_size = buffer_size
            buffer_len = len(self.buffer)
            if buffer_len > self.buffer_size:
                for i in range(self.last_no - buffer_len, self.last_no - self.buffer_size):
                    try:
                        del self.buffer[i]
                    except KeyError:
                        pass

    def setLevel(self, level):
        if level == "DEBUG":
            self.min_level = DEBUG
        elif level == "INFO":
            self.min_level = INFO
        elif level == "WARN":
            self.min_level = WARN
        elif level == "ERROR":
            self.min_level = ERROR
        elif level == "FATAL":
            self.min_level = FATAL
        else:
            print("log level not supported: %s" % level)

    def set_color(self):
        self.err_color = None
        self.warn_color = None
        self.debug_color = None
        self.reset_color = None
        self.set_console_color = lambda x: None
        if hasattr(sys.stderr, 'isatty') and sys.stderr.isatty():
            if os.name == 'nt':
                self.err_color = 0x04
                self.warn_color = 0x06
                self.debug_color = 0x02
                self.reset_color = 0x07
                import ctypes
                SetConsoleTextAttribute = ctypes.windll.kernel32.SetConsoleTextAttribute
                GetStdHandle = ctypes.windll.kernel32.GetStdHandle
                self.set_console_color = lambda color: SetConsoleTextAttribute(GetStdHandle(-11), color)
            elif os.name == 'posix':
                self.err_color = '\033[31m'
                self.warn_color = '\033[33m'
                self.debug_color = '\033[32m'
                self.reset_color = '\033[0m'
                self.set_console_color = lambda color: sys.stderr.write(color)

    def set_file(self, file_name):
        self.log_filename = file_name
        if os.path.isfile(file_name):
            self.file_size = os.path.getsize(file_name)
            if self.file_size > self.file_max_size:
                self.roll_log()
                self.file_size = 0
        else:
            self.file_size = 0
        self.log_fd = open(file_name, "a+")

    def roll_log(self):
        # Shift log.1 -> log.2 -> ... then move the live file to log.1.
        for i in range(self.roll_num, 1, -1):
            new_name = "%s.%d" % (self.log_filename, i)
            old_name = "%s.%d" % (self.log_filename, i - 1)
            if not os.path.isfile(old_name):
                continue
            # self.info("roll_log %s -> %s", old_name, new_name)
            shutil.move(old_name, new_name)
        shutil.move(self.log_filename, self.log_filename + ".1")

    def log_console(self, level, console_color, fmt, *args, **kwargs):
        try:
            console_string = '[%s] %s\n' % (level, fmt % args)
            self.set_console_color(console_color)
            sys.stderr.write(console_string)
            self.set_console_color(self.reset_color)
        except:
            pass

    def log_to_file(self, level, console_color, fmt, *args, **kwargs):
        if self.log_fd:
            if level == 'e':
                # Raw write, used for tracebacks (which may contain '%').
                string = fmt % args if args else fmt
            else:
                time_str = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:23]
                string = '%s [%s] [%s] %s\n' % (time_str, self.name, level, fmt % args)
            self.log_fd.write(string)
            try:
                self.log_fd.flush()
            except:
                pass
            self.file_size += len(string)
            if self.file_size > self.file_max_size:
                self.log_fd.close()
                self.log_fd = None
                self.roll_log()
                self.log_fd = open(self.log_filename, "w")
                self.file_size = 0

    def log(self, level, console_color, html_color, fmt, *args, **kwargs):
        self.buffer_lock.acquire()
        try:
            self.log_console(level, console_color, fmt, *args, **kwargs)
            self.log_to_file(level, console_color, fmt, *args, **kwargs)
            if self.buffer_size:
                # Keep a formatted copy for the in-memory ring buffer.
                string = '[%s] %s' % (level, fmt % args)
                self.last_no += 1
                self.buffer[self.last_no] = string
                buffer_len = len(self.buffer)
                if buffer_len > self.buffer_size:
                    del self.buffer[self.last_no - self.buffer_size]
        except Exception as e:
            string = '%s - [%s]LOG_EXCEPT: %s, Except:%s<br> %s' % (
                time.ctime()[4:-5], level, fmt % args, e, traceback.format_exc())
            self.last_no += 1
            self.buffer[self.last_no] = string
            buffer_len = len(self.buffer)
            if buffer_len > self.buffer_size:
                del self.buffer[self.last_no - self.buffer_size]
        finally:
            self.buffer_lock.release()

    def debug(self, fmt, *args, **kwargs):
        if self.min_level > DEBUG:
            return
        self.log('-', self.debug_color, '21610b', fmt, *args, **kwargs)

    def info(self, fmt, *args, **kwargs):
        if self.min_level > INFO:
            return
        self.log('+', self.reset_color, '000000', fmt, *args, **kwargs)

    def warning(self, fmt, *args, **kwargs):
        if self.min_level > WARN:
            return
        self.log('#', self.warn_color, 'FF8000', fmt, *args, **kwargs)

    def warn(self, fmt, *args, **kwargs):
        self.warning(fmt, *args, **kwargs)

    def error(self, fmt, *args, **kwargs):
        if self.min_level > ERROR:
            return
        self.log('!', self.err_color, 'FE2E2E', fmt, *args, **kwargs)

    def exception(self, fmt, *args, **kwargs):
        self.error(fmt, *args, **kwargs)
        self.log_to_file('e', self.err_color, traceback.format_exc())

    def critical(self, fmt, *args, **kwargs):
        if self.min_level > CRITICAL:
            return
        self.log('!', self.err_color, 'D7DF01', fmt, *args, **kwargs)

    def tofile(self, fmt, *args, **kwargs):
        self.log_to_file('@', self.warn_color, fmt, *args, **kwargs)

    # =================================================================
    def set_buffer_size(self, set_size):
        self.buffer_lock.acquire()
        self.buffer_size = set_size
        buffer_len = len(self.buffer)
        if buffer_len > self.buffer_size:
            for i in range(self.last_no - buffer_len, self.last_no - self.buffer_size):
                try:
                    del self.buffer[i]
                except KeyError:
                    pass
        self.buffer_lock.release()

    def get_last_lines(self, max_lines):
        self.buffer_lock.acquire()
        buffer_len = len(self.buffer)
        if buffer_len > max_lines:
            first_no = self.last_no - max_lines
        else:
            first_no = self.last_no - buffer_len + 1
        jd = {}
        if buffer_len > 0:
            for i in range(first_no, self.last_no + 1):
                jd[i] = self.unicode_line(self.buffer[i])
        self.buffer_lock.release()
        return json.dumps(jd)

    def get_new_lines(self, from_no):
        self.buffer_lock.acquire()
        jd = {}
        first_no = self.last_no - len(self.buffer) + 1
        if from_no < first_no:
            from_no = first_no
        if self.last_no >= from_no:
            for i in range(from_no, self.last_no + 1):
                jd[i] = self.unicode_line(self.buffer[i])
        self.buffer_lock.release()
        return json.dumps(jd)

    def unicode_line(self, line):
        try:
            if type(line) is str:
                return line
            else:
                return str(line, errors='ignore')
        except Exception as e:
            print("unicode err:%r" % e)
            print("line can't decode:%s" % line)
            print("Except stack:%s" % traceback.format_exc())
            return ""


loggerDict = {}


def getLogger(name=None, buffer_size=0, file_name=None, roll_num=1):
    global loggerDict, default_log
    if name is None:
        for n in loggerDict:
            name = n
            break
    if name is None:
        name = u"default"
    if isinstance(name, bytes):
        name = name.decode('utf-8')
    if not isinstance(name, str):
        raise TypeError('A logger name must be string or Unicode')
    if name in loggerDict:
        return loggerDict[name]
    else:
        logger_instance = Logger(name, buffer_size, file_name, roll_num)
        loggerDict[name] = logger_instance
        default_log = logger_instance
        return logger_instance


default_log = getLogger()


def debg(fmt, *args, **kwargs):
    default_log.debug(fmt, *args, **kwargs)


def info(fmt, *args, **kwargs):
    default_log.info(fmt, *args, **kwargs)


def warn(fmt, *args, **kwargs):
    default_log.warning(fmt, *args, **kwargs)


def erro(fmt, *args, **kwargs):
    default_log.error(fmt, *args, **kwargs)


def excp(fmt, *args, **kwargs):
    default_log.exception(fmt, *args, **kwargs)


def crit(fmt, *args, **kwargs):
    default_log.critical(fmt, *args, **kwargs)


def tofile(fmt, *args, **kwargs):
    default_log.tofile(fmt, *args, **kwargs)


if __name__ == '__main__':
    log_file = os.path.join(os.path.dirname(sys.argv[0]), "test.log")
    getLogger().set_file(log_file)
    debg("debug")
    info("info")
    warn("warning")
    erro("error")
    crit("critical")
    tofile("write to file only")
    try:
        1 / 0
    except Exception as e:
        excp("An error has occurred")
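Beyond the __main__ demo above, a small usage sketch of a named logger with buffering and rolling enabled (the file name and messages are illustrative):

    log = getLogger('mdc', buffer_size=100, file_name='mdc.log', roll_num=2)
    log.setLevel('INFO')
    log.info('scan started: %s', '/media/movies')          # '+' prefix, INFO level
    log.warn('skipping unreadable file: %s', 'broken.mp4')
    print(log.get_last_lines(10))                          # buffered lines as JSON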