# Circle Hough CUDA Core Code

This repository holds the code for the GPU version of the Circle Hough algorithm. Just the GPU core; the rest is at the internal IKP repository.

## Source

Currently at this BitBucket repo:

## Compilation

After downloading/cloning the source, follow the usual steps for compilation with CMake:

```bash
mkdir -p build && cd build
cmake ../
```

The code is split into the Circle Hough core (`CircleHough.cu`) and the standalone interface (`chStandalone.cu`). CMake takes care of compiling the correct version with the appropriate compiler variable definitions, but note that this project is also intended to be run from within `PandaRoot`.

## Input data

By default, the file containing hit data is in the `data/` directory. A small example file is provided. A much larger one can be found at `/home/ikp1/bianchi/ikpgpu/data-ch/1e3-evt-10x.txt`.

The following parts of this readme seem to be **outdated**.

## Workflow

The `scripts/` folder contains the prototypes for the `run.sh` run script. Each runfile is associated with a `runid` (run01, run02, ...) corresponding to a different configuration (GPU card, type of test, ...). The `CMakeLists.txt` file contains the instructions for CMake to copy the particular runfile to `run.sh` in the build folder.

Usually this means proceeding as follows:

1. Copy/create the runfile in the `scripts/` folder.
2. Edit `CMakeLists.txt` to select the correct file to copy.
3. From the `build` directory, run `make && ./run.sh` to create `run.sh`, compile the CUDA file (if needed) and run the profiling.
4. CSV files are written by default to `${CMAKE_SOURCE_DIR}/csv/`.

## Example runfiles

- `run03` loops over the number of hits read from the input and prints the GPU trace (relevant `nvprof` flag: `--print-gpu-trace`). This creates a timeline of the GPU activity and is the basis for analyzing the timing of the various portions of the code.
- `run08` collects all available kernel metrics (relevant `nvprof` flag: `--metrics all`). This has turned out to be the most convenient method: collect all possible metrics to CSV files, then select the interesting ones later, during the analysis phase.

## Extending to other code

Modifications to the CUDA source are required to run a similar system with other applications.

- To select the portion of the code to be profiled, and to exclude the CUDA initialization timing overhead, use the following in combination with the `--profile-from-start off` flag:

```c++
cudaFree(0);         // First call to the CUDA API: initializes drivers, creates contexts, ...
cudaProfilerStart(); // Exclude the init overhead from the performance metrics

// Portion of the program to be timed

cudaProfilerStop();
```

Other useful flags:

- To use the same time unit consistently: `--normalized-time-unit us` (the same option is unfortunately not available for the size units, which requires a workaround during the analysis phase).
- To avoid the "API call" `nvprof` mode (messier to analyze): `--profile-api-trace none`

## Notes

- Check that the `nvcc` flags in `CMakeLists.txt` are compatible with the particular card (i.e. a supported architecture).
- At the moment, settable parameters (number of angles, number of hits, ...) are set with environment variables, so the program must be run from `run.sh`; otherwise it segfaults. I intend to add logic with default values and/or to set the parameters via command-line flags to circumvent this.
- At the moment, the `remove-nvprof-header()` function, which uses `sed` to remove the lines starting with `===` at the beginning of the CSV file, is essential for reading the files with `pynvprof`. In the future, I'll implement a more robust way of doing so.
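The header-stripping step described in the last note can be sketched as a small shell function. This is a minimal sketch, not the actual `remove-nvprof-header()` from the run scripts, which may differ; the underscore spelling is used here for portability.

```bash
#!/usr/bin/env bash
# Sketch of a header-stripping helper like the one described in the notes.
# nvprof prepends banner lines (here assumed to start with "===") to its
# CSV output; dropping them leaves a file that CSV readers can parse.
remove_nvprof_header() {
    sed -i '/^===/d' "$1"   # delete every line starting with "===", in place
}

# Example (hypothetical path): clean a profiling CSV before analysis
# remove_nvprof_header csv/run08-1000.csv
```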