Fix: unrecognised host compiler flags passed to nvcc #161

Merged
YixinSong-e merged 1 commit into main from fix/cuda-warning-options
Mar 7, 2024

@hodlen (Collaborator) commented Mar 7, 2024

No description provided.

@YixinSong-e merged commit c36f1df into main Mar 7, 2024
12 of 36 checks passed
hodlen added a commit that referenced this pull request Mar 17, 2024
* name ffn tensors properly

* debug: add debug printings

* refactor mul_mat and axpy subgraph in sparse ffn

* Revert "debug: add debug printings"

This reverts commit ade011ff82a37ba29220f0f81f097d293cf9eec7.

* bugfix in computational graph

* support basic full gpu offloading for mul_mat and axpy

* wip: sum gpu_idx indicator

* calculate gpu_idx sum on the fly

* wip: code refactoring

* remove gpu axpy impl duplicate

* axpy without gpu_bucket

* minor: clean dead code

* minor on comments

* refactor: mul_mat and axpy should not return NULL

* remove unused lock

* refactor: better naming for mul_mat_idx

* refactor: separate sparse mul_mat from mul_mat_q

* use mul_mat_idx at full gpu

* refactor: reorg sparse mul_mat cuda host code

* support full GPU comp of mul_mat_idx

* refactor: remove llama_dense

* refactor: add new opcode MUL_MAT_SPARSE

* fix: CPU decoding for MUL_MAT_SPARSE

* fix bugs on full-gpu computing

* use op_params to mark sparse mul_mat/axpy fully offloaded or not

* fix: disable cuda sync

* minor bugfix

* chore: gpu perf timing

* refactor: def of gpu split structures

* fix: unknown host compiler flags passed to nvcc (#161)

* calc gpu_idx sum at load time

* refactor sparse ffn building and bugfix

* wip: more assertions

* wip

* fix: access invalid data ptr at MUL_MAT_IDX cpu op

* fix: hidden bug when sparsity idx is computed on GPU

* fix: ffn split when offload_ratio=0

* fix: splitting ffn when tensor offloading incomplete

* fix: bugs in CPU-GPU tensor interplay

* fix: row_lookup pointer

* minor: refactoring CUDA host code

* minor refactor and bugfix on comp. graph

* optimize: hybrid threading off by default; remove cuda sync

* fix: offloading merged tensor

* add assertion on n_threads for hybrid inference

* fix: GPU-CPU sync issue

* improve ffn input tensor placement for lower CPU-GPU sync overhead
@hodlen deleted the fix/cuda-warning-options branch March 17, 2024 07:17
2 participants