Commit 6dcab81
[prof] in gux_taptamggux.mad counters.h, improve the handling of counter overhead
These are the results
(1) keep overhead
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4766s
[COUNTERS] Fortran Other ( 0 ) : 0.1202s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0685s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2400s for 1087437 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1007s for 32768 events => throughput is 3.25E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1673s for 16384 events => throughput is 9.79E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0521s for 16384 events => throughput is 3.14E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0687s for 16384 events => throughput is 2.38E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1237s for 1087437 events => throughput is 8.79E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4728s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0269s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3496s for 14136681 events => throughput is 6.02E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4409s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.3144s
[COUNTERS] Fortran Other ( 0 ) : 0.1588s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 4.0191s for 1087437 events => throughput is 2.71E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0996s for 32768 events => throughput is 3.29E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1660s for 16384 events => throughput is 9.87E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0508s for 16384 events => throughput is 3.22E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0704s for 16384 events => throughput is 2.33E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1482s for 1087437 events => throughput is 7.34E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4718s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8646s for 14136681 events => throughput is 4.94E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.2787s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
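The throughput printed on each counter line is the event count divided by the elapsed time, so the RDTSC and std::chrono runs can be compared by re-deriving it from the log. The following is a minimal sketch in Python (used here for illustration only; the counters themselves are not written in Python). The regular expression and the `throughput` helper are hypothetical, inferred from the layout of the lines above rather than from the source code.

```python
import re

# Layout inferred from the [COUNTERS] lines above (an assumption, not taken
# from the code): "<name> ( <id> ) : <secs>s for <events> events ...".
LINE_RE = re.compile(
    r"\[COUNTERS\]\s+(?P<name>.+?)\s+\(\s*(?P<id>\d+)\s*\)\s*:\s*"
    r"(?P<secs>[\d.]+)s\s+for\s+(?P<events>\d+)\s+events"
)

def throughput(line):
    """Return (counter name, events per second), or None if the line
    reports a time without an event count."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("events")) / float(m.group("secs"))

# Two lines copied from case (1) above.
rdtsc = ("[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2400s "
         "for 1087437 events => throughput is 3.36E+05 events/s")
chrono = ("[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 4.0191s "
          "for 1087437 events => throughput is 2.71E+05 events/s")

for label, line in (("rdtsc", rdtsc), ("chrono", chrono)):
    name, rate = throughput(line)
    print(f"{label:6s} {name}: {rate:.2E} events/s")
```

Because the logged times are rounded to four decimals, a recomputed throughput can differ from the printed one in the last digit for some counters.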
(2) remove overhead
CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0338s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.8244s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.8905s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9339s
[COUNTERS] Fortran Other ( 0 ) : 0.2954s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.7332s for 1087437 events => throughput is 3.98E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1003s for 32768 events => throughput is 3.27E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1688s for 16384 events => throughput is 9.71E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0507s for 16384 events => throughput is 3.23E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0695s for 16384 events => throughput is 2.36E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0924s for 1087437 events => throughput is 1.18E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4692s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0263s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.8723s for 14136681 events => throughput is 7.55E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 3.8982s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0637s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.8826s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 1.6786s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.2040s
[COUNTERS] Fortran Other ( 0 ) : 0.4831s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0691s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.9924s for 1087437 events => throughput is 3.63E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0983s for 32768 events => throughput is 3.33E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1669s for 16384 events => throughput is 9.81E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0506s for 16384 events => throughput is 3.24E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0676s for 16384 events => throughput is 2.42E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0698s for 1087437 events => throughput is 1.56E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4712s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.9227s for 14136681 events => throughput is 7.35E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.1690s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s
(3) remove overhead, disable individual timers (so here the overhead is 0)
CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0333s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.1897s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.3330s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8567s
CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0659s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.5119s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.6594s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8525s
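In cases (2) and (3) the reported PROGRAM TOTAL equals PROGRAM TOTAL+COUNTEROVERHEAD minus PROGRAM COUNTEROVERHEAD. The sketch below (Python, for illustration only) checks that arithmetic against the logged values. It also divides the overhead by the 1M-cycle calibration to get an implied number of start/stop cycles; that second step rests on an assumption the commit message does not state, namely that the overhead is estimated as the cycle count times the calibrated cost per cycle. The function names are hypothetical.

```python
# Values copied from the "remove overhead" logs in cases (2) and (3).
# Columns: label, TOTAL+COUNTEROVERHEAD [s], COUNTEROVERHEAD [s],
#          calibration [s per 1M start/stop cycles], reported PROGRAM TOTAL [s]
runs = [
    ("(2) rdtsc",  4.8244, 0.8905, 0.0338, 3.9339),
    ("(2) chrono", 5.8826, 1.6786, 0.0637, 4.2040),
    ("(3) rdtsc",  4.1897, 0.3330, 0.0333, 3.8567),
    ("(3) chrono", 4.5119, 0.6594, 0.0659, 3.8525),
]

def corrected_total(total_plus_overhead, overhead):
    """PROGRAM TOTAL after subtracting the estimated counter overhead."""
    return total_plus_overhead - overhead

def implied_cycles_millions(overhead, secs_per_million):
    """Millions of start/stop cycles implied by the overhead estimate.

    ASSUMPTION (not stated in the commit message): the overhead is the
    calibrated cost per cycle multiplied by the number of cycles.
    """
    return overhead / secs_per_million

for label, tot, ovh, cal, reported in runs:
    total = corrected_total(tot, ovh)
    cycles = implied_cycles_millions(ovh, cal)
    print(f"{label:10s} total={total:.4f}s (log: {reported:.4f}s) "
          f"implied cycles={cycles:.1f}M")
```

Under that assumption the ratio comes out at roughly 26.3M cycles for both timer types in case (2) and roughly 10.0M in case (3).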
(4) do not remove overhead, disable individual timers (remove also the overhead from the estimation of the overhead)
(this test was done on another day on the same machine and build, but the results are compatible with the previous ones)
CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8072s
CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8214s
1 parent 3577a55 commit 6dcab81
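The general idea behind the "COUNTERS overhead ... for 1M start/stop cycles" line can be sketched as follows: time a large number of empty start/stop cycles, derive a per-cycle cost, and subtract that cost times the number of recorded cycles from the measured total. This is a minimal Python illustration under stated assumptions, not the code changed in this commit (which uses RDTSC or std::chrono timers). `Counter`, `calibrate` and `remove_overhead` are hypothetical names.

```python
import time

class Counter:
    """Hypothetical start/stop counter; not the class used in the commit."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._t0 = 0.0
        self.total = 0.0   # accumulated seconds between start() and stop()
        self.cycles = 0    # number of completed start/stop cycles

    def start(self):
        self._t0 = self._clock()

    def stop(self):
        self.total += self._clock() - self._t0
        self.cycles += 1

def calibrate(ncycles=1_000_000, clock=time.perf_counter):
    """Estimate the cost in seconds of one empty start/stop cycle."""
    probe = Counter(clock)
    t0 = clock()
    for _ in range(ncycles):
        probe.start()
        probe.stop()
    return (clock() - t0) / ncycles

def remove_overhead(program_total, counters, per_cycle):
    """Return (corrected total, estimated overhead).

    ASSUMPTION: every start/stop cycle costs the same calibrated amount.
    """
    overhead = per_cycle * sum(c.cycles for c in counters)
    return program_total - overhead, overhead

# Usage: a loop that does nothing except start and stop a counter.
per_cycle = calibrate(100_000)
work = Counter()
t_begin = time.perf_counter()
for _ in range(50_000):
    work.start()
    work.stop()
raw_total = time.perf_counter() - t_begin
total, overhead = remove_overhead(raw_total, [work], per_cycle)
print(f"raw={raw_total:.4f}s overhead={overhead:.4f}s corrected={total:.4f}s")
```

Passing the clock as a parameter keeps the sketch independent of any particular timer, in the same spirit as switching between RDTSC-based and std::chrono timers at runtime.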
1 file changed
Lines changed: 12 additions & 10 deletions