$ whisper ../audio.wav --model tiny
100%|█████████████████████████████████████| 72.1M/72.1M [00:36<00:00, 2.08MiB/s]
/home/jetson/.local/lib/python3.8/site-packages/whisper/__init__.py:146: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
/home/jetson/.local/lib/python3.8/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: Chinese
[00:00.000 --> 00:03.680] 四川美時確實以辣文明 但以有不辣的選擇
[00:03.680 --> 00:07.200] 比如潛水面 賴湯圓 再轟高夜熱八等
[00:07.200 --> 00:11.560] 這些小市口維溫和 然後甜而不膩也很受歡迎
from faster_whisper import WhisperModel

model_size = "large-v2"

# Run on GPU with FP16
# model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Version 9+ of nvidia-cudnn-cu12 appears to cause issues due to its reliance on cuDNN 9 (Faster-Whisper does not currently support cuDNN 9). Ensure your version of the Python package is for cuDNN 8.
So wouldn't installing cuDNN 8 fix it? I went straight for the cuDNN 8 download for CUDA 12.x, but every attempt ended up installing cuDNN 9.4 instead; short of downgrading CUDA itself, there was no way back to cuDNN 8.
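If the cuDNN wheel is coming from pip, one workaround is to constrain the package below version 9 so the resolver cannot silently pick up 9.x. Whether pypi.nvidia.com actually serves an 8.x wheel for your platform (especially aarch64) is an assumption to verify:

# pin the cuDNN wheel to the 8.x series (an 8.x build for your platform is not guaranteed)
$ pip install --extra-index-url https://pypi.nvidia.com "nvidia-cudnn-cu12<9"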
The docs thoughtfully remind us: "For all these methods below, keep in mind the above note regarding CUDA versions. Depending on your setup, you may need to install the CUDA 11 versions of libraries that correspond to the CUDA 12 libraries listed in the instructions below."
$ pip install --extra-index-url https://pypi.nvidia.com nvidia-cudnn-cu11
...
The installation of nvidia-cudnn-cu11 for version 9.0.0.312 failed.
This is a special placeholder package which downloads a real wheel package from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we cannot download the real wheel file to install.
You might try installing this package via
$ pip install --extra-index-url https://pypi.nvidia.com nvidia-cudnn-cu11
Here is some debug information about your platform to include in any bug report:
Python Version: CPython 3.8.10
Operating System: Linux 5.10.104-tegra
CPU Architecture: aarch64
nvidia-smi command not found. Ensure NVIDIA drivers are installed.
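Incidentally, the missing nvidia-smi is normal on Jetson rather than a sign of broken drivers: the L4T stack simply does not ship it. To confirm the GPU is actually alive, the stock tool is tegrastats:

# tegrastats ships with L4T and prints GPU/CPU/memory utilization about once per second
$ sudo tegrastats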
Note: Latest versions of ctranslate2 support CUDA 12 only. For CUDA 11, the current workaround is downgrading to the 3.24.0 version of ctranslate2 (This can be done with pip install --force-reinstall ctranslate2==3.24.0 or specifying the version in a requirements.txt).
More CUDA 11 trouble. The suggested fix is the downgrade route:
$ pip install --force-reinstall ctranslate2==3.24.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mediapipe 0.8.4 requires opencv-contrib-python, which is not installed.
onnx-graphsurgeon 0.3.12 requires onnx, which is not installed.
d2l 0.17.6 requires numpy==1.21.5, but you have numpy 1.24.4 which is incompatible.
d2l 0.17.6 requires requests==2.25.1, but you have requests 2.32.3 which is incompatible.
faster-whisper 1.0.3 requires ctranslate2<5,>=4.0, but you have ctranslate2 3.24.0 which is incompatible.
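To silence the last conflict, both packages need to be pinned to a mutually compatible pair; the pairing below is an assumption based on the two projects' published requirements (faster-whisper only moved to ctranslate2>=4 in its 1.0 release):

# downgrade faster-whisper alongside ctranslate2 so its ctranslate2<4 constraint is satisfied
# (0.10.1 is an assumed compatible version; adjust if its requirements differ on your setup)
$ pip install "ctranslate2==3.24.0" "faster-whisper==0.10.1"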
$ pip3 uninstall ctranslate2 whisper-ctranslate2
$ git clone --recursive https://github.com/OpenNMT/CTranslate2.git
$ cd CTranslate2
$ mkdir build && cd build
$ cmake ..
...
CMake Error at CMakeLists.txt:294 (message):
  Intel OpenMP runtime libiomp5 not found
-- Configuring incomplete, errors occurred!
Where did Intel come into this? Some digging turned up the answer: "By default, the library is compiled with the Intel MKL backend which should be installed separately. See the Build options to select or add another backend." So let's change the build and skip Intel's backend:
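A sketch using CTranslate2's documented build options (WITH_MKL, OPENMP_RUNTIME, WITH_CUDA, WITH_CUDNN); the exact combination for Jetson here is an assumption:

# disable the Intel MKL backend and Intel OpenMP runtime; build the CUDA backend instead
$ cmake .. -DWITH_MKL=OFF -DOPENMP_RUNTIME=COMP -DWITH_CUDA=ON -DWITH_CUDNN=ON
$ make -j4
$ sudo make install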
$ git clone git@github.com:ufal/whisper_streaming.git
$ cd whisper_streaming
$ python3 whisper_online.py ../audio.wav --language zh --min-chunk-size 1
INFO Audio duration is: 11.68 seconds
INFO Loading Whisper large-v2 model for zh...
INFO done. It took 14.19 seconds.
DEBUG PROMPT:
DEBUG CONTEXT:
DEBUG transcribing 1.00 seconds from 0.00
DEBUG >>>>COMPLETE NOW: (None, None, '')
DEBUG INCOMPLETE: (0.0, 0.98, '四川美食群')
DEBUG len of buffer now: 1.00
DEBUG ## last processed 1.00 s, now is 5.30, the latency is 4.29
DEBUG PROMPT:
DEBUG CONTEXT:
DEBUG transcribing 5.30 seconds from 0.00
DEBUG >>>>COMPLETE NOW: (0.0, 0.88, '四川美食')
DEBUG INCOMPLETE: (0.88, 5.26, '确实以辣为名,但也有不辣的选择,比如甜水面赖淘宝。')
DEBUG len of buffer now: 5.30
11643.5227 0 880 四川美食
11643.5227 0 880 四川美食
DEBUG ## last processed 5.30 s, now is 11.64, the latency is 6.35
DEBUG PROMPT:
DEBUG CONTEXT: 四川美食
DEBUG transcribing 11.64 seconds from 0.00
DEBUG >>>>COMPLETE NOW: (None, None, '')
DEBUG INCOMPLETE: (0.88, 11.24, '確實以辣聞名,但也有不辣的選擇,比如甜水麵、瀨湯圓、炸烘糕、葉子粑等,這些小吃口味溫和,然後甜而不膩,也很受歡迎。')
DEBUG len of buffer now: 11.64
DEBUG ## last processed 11.64 s, now is 21.61, the latency is 9.96
DEBUG PROMPT:
DEBUG CONTEXT: 四川美食
DEBUG transcribing 11.68 seconds from 0.00
DEBUG >>>>COMPLETE NOW: (None, None, '')
DEBUG INCOMPLETE: (0.88, 11.32, '确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。')
DEBUG len of buffer now: 11.68
DEBUG ## last processed 21.61 s, now is 31.53, the latency is 9.92
DEBUG last, noncommited: (0.88, 11.32, '确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。')
31528.1091 880 11320 确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。
31528.1091 880 11320 确实以辣闻名,但也有不辣的选择,比如甜水面、赖汤圆、炸烘糕叶、热巴等,这些小吃口味温和,然后甜而不腻,也很受欢迎。
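For reference, whisper_streaming's README describes the committed-output lines such as 11643.5227 0 880 四川美食: the three leading numbers are the emission time measured from the start of processing, followed by the begin and end timestamps of the committed text, all in milliseconds.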
Note: changing the model quantization:
# this worked fast and reliably on NVIDIA L40
# model = WhisperModel(model_size_or_path, device="cuda", compute_type="float16", download_root=cache_dir)

# or run on GPU with INT8
# tested: the transcripts were different, probably worse than with FP16, and it was slightly (appx 20%) slower
model = WhisperModel(model_size_or_path, device="cuda", compute_type="int8_float16")
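These lines sit in the model-loading code of the FasterWhisperASR backend in whisper_online.py; after switching the compute_type there, re-running the whisper_online.py command above picks up the new quantization.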