@JackBoner

TensorFlow runs faster on the CPU than on the GPU. How do I configure it correctly?

I can't figure out why the model trains 2-3 times faster on the CPU than on the GPU.

Windows 10
TensorFlow 1.13.1
Keras 2.2.4
CUDA 10.1

The model:
from keras import models, layers, regularizers

# Two small fully connected layers: 5 inputs -> 5 units -> 2 outputs
network = models.Sequential()
network.add(layers.Dense(5, activation='relu', input_shape=(5,), kernel_regularizer=regularizers.l2(0.05), activity_regularizer=regularizers.l1(0.01)))
network.add(layers.Dense(2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
network.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
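For a sense of scale, the parameter count of this model can be worked out by hand. The sketch below is plain Python and does not touch Keras; `dense_params` is a hypothetical helper introduced only for illustration. With so few parameters, per-step overhead such as kernel launches and host-to-device copies may outweigh the arithmetic itself. That is an assumption about the cause, not something this thread confirms.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


# (inputs, units) for the two Dense layers of the model in the question
layer_sizes = [(5, 5), (5, 2)]

per_layer = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]
total_params = sum(per_layer)

print(per_layer, total_params)  # [30, 12] 42
```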


Log:
Using TensorFlow backend.
2019-04-26 19:42:22.001733: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-04-26 19:42:22.242292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2019-04-26 19:42:22.242786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.858856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.859063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:22.859197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:22.859446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9401497665143581718
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6620742943
locality {
  bus_id: 1
  links {
  }
}
incarnation: 3794371743575443843
physical_device_desc: "device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
2019-04-26 19:42:22.871318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.871539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.871806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:22.871938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:22.872124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:22.874432: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-26 19:42:24.455242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:24.455451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:24.455650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-26 19:42:24.455810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-26 19:42:24.455997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:24.846946: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_90.dll locally


The toolkit and CUPTI paths have been added to %PATH%:
CUDA\lib64
CUDA\include
CUDA\bin


During training the GPU is at about 10% load, yet almost all of its memory is in use.
The CPU, meanwhile, is at 60-70% load, as if training were running there rather than on the GPU.

Where is the computation actually running? If it is on the GPU, why is it several times slower than on the CPU?
Answers to the question: 1
@AkumeiNiHao
Machine learning enthusiast
You need to install tensorflow-gpu.
Then check that everything works:
# test.py
import tensorflow as tf

# Allow memory growth so TensorFlow allocates GPU memory on demand
# instead of reserving almost all of it up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
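Creating a session does not by itself report which devices are in use. One way to check, sketched below, is to list the devices TensorFlow can see with `device_lib.list_local_devices()` from `tensorflow.python.client`. The helper name `gpu_device_names` is hypothetical, and the `ImportError` fallback is only there so the snippet still runs where TensorFlow is not installed.

```python
def gpu_device_names():
    """Return the names of GPU devices visible to TensorFlow.

    Returns an empty list if TensorFlow cannot be imported
    or if no GPU device is visible.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


gpu_names = gpu_device_names()
print(gpu_names)
```

An empty list would suggest a CPU-only build or a CUDA setup that TensorFlow cannot load. On the machine in the question, the log already lists `/device:GPU:0`, so this check would be expected to report it.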