I can't figure out why the model trains 2-3 times faster on the CPU than on the GPU.
Windows 10
TensorFlow 1.13.1
Keras 2.2.4
CUDA 10.1
Here is the model:
from keras import models, layers, regularizers

network = models.Sequential()
network.add(layers.Dense(5, activation='relu', input_shape=(5,), kernel_regularizer=regularizers.l2(0.05), activity_regularizer=regularizers.l1(0.01)))
network.add(layers.Dense(2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
network.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
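The training call itself looks roughly like this (the random data, batch size and epoch count below are placeholders standing in for my real dataset, which likewise has 5 input features and 2 targets):

import time
import numpy as np

# Placeholder data standing in for the real dataset (5 features, 2 targets)
x_train = np.random.rand(100000, 5).astype('float32')
y_train = np.random.rand(100000, 2).astype('float32')

start = time.time()
network.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)
print('Training took %.1f s' % (time.time() - start))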
Log:
Using TensorFlow backend.
2019-04-26 19:42:22.001733: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-04-26 19:42:22.242292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2019-04-26 19:42:22.242786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.858856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.859063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-26 19:42:22.859197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-26 19:42:22.859446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9401497665143581718
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6620742943
locality {
bus_id: 1
links {
}
}
incarnation: 3794371743575443843
physical_device_desc: "device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
2019-04-26 19:42:22.871318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:22.871539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:22.871806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-26 19:42:22.871938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-26 19:42:22.872124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:22.874432: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Program Files\Miniconda\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-26 19:42:24.455242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-26 19:42:24.455451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-26 19:42:24.455650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-26 19:42:24.455810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-26 19:42:24.455997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6314 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-04-26 19:42:24.846946: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_90.dll locally
The CUDA toolkit and CUPTI paths have been added to %PATH%:
CUDA\lib64
CUDA\include
CUDA\bin
During training the GPU is only about 10% utilized, yet practically all of its memory is occupied.
The CPU, on the other hand, sits at 60-70%, as if the training were running on it rather than on the GPU.
Where does the execution actually take place? And if it is on the GPU, why is it several times slower than on the CPU?
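To check where the ops actually end up, I was going to enable device placement logging before building the model; this is only a sketch of what I plan to try, not part of my current script:

import tensorflow as tf
from keras import backend as K

# Print the device each op is placed on when the session runs
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))

# For a CPU-only timing run I would hide the GPU before importing tensorflow:
# import os; os.environ['CUDA_VISIBLE_DEVICES'] = '-1'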