Fix: unrecognised host compiler flags passed to nvcc #161

hodlen · 2024-03-07T13:43:06Z

No description provided.

* name ffn tensors properly
* debug: add debug printings
* refactor mul_mat and axpy subgraph in sparse ffn
* Revert "debug: add debug printings"
  This reverts commit ade011ff82a37ba29220f0f81f097d293cf9eec7.
* bugfix in computational graph
* support basic full gpu offloading for mul_mat and axpy
* wip: sum gpu_idx indicator
* calculate gpu_idx sum on the fly
* wip: code refactoring
* remove gpu axpy impl duplicate
* axpy without gpu_bucket
* minor: clean dead code
* minor on comments
* refactor: mul_mat and axpy should not return NULL
* remove unused lock
* refactor: better naming for mul_mat_idx
* refactor: separate sparse mul_mat from mul_mat_q
* use mul_mat_idx at full gpu
* refactor: reorg sparse mul_mat cuda host code
* support full GPU comp of mul_mat_idx
* refactor: remove llama_dense
* refactor: add new opcode MUL_MAT_SPARSE
* fix: CPU decoding for MUL_MAT_SPARSE
* fix bugs on full-gpu computing
* use op_params to mark sparse mul_mat/axpy fully offloaded or not
* fix: disable cuda sync
* minor bugfix
* chore: gpu perf timing
* refactor: def of gpu split structures
* fix: unknown host compiler flags passed to nvcc (#161)
* calc gpu_idx sum at load time
* refactor sparse ffn building and bugfix
* wip: more assertions
* wip
* fix: access invalid data ptr at MUL_MAT_IDX cpu op
* fix: hidden bug when sparsity idx is computed on GPU
* fix: ffn split when offload_ratio=0
* fix: splitting ffn when tensor offloading incomplete
* fix: bugs in CPU-GPU tensor interplay
* fix: row_lookup pointer
* minor: refactoring CUDA host code
* minor refactor and bugfix on comp. graph
* optimize: hybrid threading off on default; remove cuda sync
* fix: offloading merged tensor
* add assertion on n_threads for hybrid inference
* fix: GPU-CPU sync issue
* improve ffn input tensor placement for lower CPU-GPU sync overhead

fix: unknown host compiler flags passed to nvcc

d43f32d

YixinSong-e approved these changes Mar 7, 2024

YixinSong-e merged commit c36f1df into main Mar 7, 2024
12 of 36 checks passed

hodlen added a commit that referenced this pull request Mar 13, 2024

fix: unknown host compiler flags passed to nvcc (#161)

a6ea338

hodlen deleted the fix/cuda-warning-options branch March 17, 2024 07:17

hodlen mentioned this pull request Apr 6, 2024

nvcc fails due to illegal options #23

Closed
