-
Notifications
You must be signed in to change notification settings - Fork 5
Big memory allocations in L1GS #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@termi-official these are the outputs from the two examples in the Also one thing that might be optimized is |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #238 +/- ##
==========================================
+ Coverage 71.70% 71.73% +0.02%
==========================================
Files 78 78
Lines 6482 6491 +9
==========================================
+ Hits 4648 4656 +8
- Misses 1834 1835 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for investigating Abdelrahman! I left some comments on things which I catched on the fly.
Also take a look at CodeCov, it looks like not all of your code is covered in the CI.
termi-official
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes Abdelrahman. There are some minor things left tho.
@AbdAlazezAhmed can you confirm that this works now on the ECG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are missing tests for the symmetric case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thunderbolt.jl/test/test_preconditioners.jl
Lines 157 to 163 in 8f872e3
| @testset "Symmetric A" begin | |
| md = mdopen("HB/bcsstk15") | |
| A = md.A # type here is Symmetric{Float64, SparseMatrixCSC{Float64, Int64}} | |
| b = ones(size(A, 1)) | |
| test_l1gs_prec(A, b) | |
| end | |
| end |
matrix here is already of type Symmetric
also for the test algorithmic logic I added this
Thunderbolt.jl/test/test_preconditioners.jl
Lines 21 to 33 in 8f872e3
| function test_sym(testname, A, x, y_exp, D_Dl1_exp, SLbuffer_exp, partsize) | |
| @testset "$testname Symmetric" begin | |
| total_ncores = 8 # Assuming 8 cores for testing | |
| for ncores in 1:total_ncores # testing for multiple cores to check that the answer is independent of the number of cores | |
| builder = L1GSPrecBuilder(PolyesterDevice(ncores)) | |
| P = A isa Symmetric ? builder(A, partsize) : builder(A, partsize; isSymA=true) | |
| @test P.D_Dl1 ≈ D_Dl1_exp | |
| @test P.SLbuffer ≈ SLbuffer_exp | |
| y = P \ x | |
| @test y ≈ y_exp | |
| end | |
| end | |
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks!
However, we should use CG for the symmetric problems to ensure that the preconditioner actually conserves the symmetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still some issues.
- ThreadedSparseMatrixCSC support is missing
- The preconditioner tanks the symmetry
- There is too much of memory usage
Lets try to make the following work
using Thunderbolt
using SparseArrays,OrderedCollections,TimerOutputs,LinearAlgebra
# TimerOutputs.enable_debug_timings(Thunderbolt)
# TimerOutputs.enable_debug_timings(Thunderbolt.Preconditioners)
geo = Hexahedron
nel = 2
size = 2.
signal_strength = 0.04
nel_heart = (6, 6, 6) .* 1 .* nel
nel_torso = (8, 8, 8) .* Int(size) .* nel
ground_vertex = Vec(0.0, 0.0, 0.0)
electrodes = [
ground_vertex,
Vec(-size, 0., 0.),
Vec( size, 0., 0.),
Vec( 0., -size, 0.),
Vec( 0., size, 0.),
Vec( 0., 0., -size),
Vec( 0., 0., size),
]
electrode_pairs = [[i,1] for i in 2:length(electrodes)]
heart_grid = generate_mesh(geo, nel_heart)
Ferrite.transform_coordinates!(heart_grid, x->Vec{3}(sign.(x) .* x.^2))
κ = ConstantCoefficient(SymmetricTensor{2,3,Float64}((1.0, 0, 0, 1.0, 0, 1.0)))
κᵢ = AnalyticalCoefficient(
(x,t) -> norm(x,Inf) ≤ 1.0 ? SymmetricTensor{2,3,Float64}((1.0, 0, 0, 1.0, 0, 1.0)) : SymmetricTensor{2,3,Float64}((0.0, 0, 0, 0.0, 0, 0.0)),
CartesianCoordinateSystem{3}()
)
heart_model = TransientDiffusionModel(
κᵢ,
NoStimulationProtocol(), # Poisoning to detecte if we accidentally touch these
:φₘ
)
heart_fun = semidiscretize(
heart_model,
FiniteElementDiscretization(
Dict(:φₘ => LagrangeCollection{1}()),
Dirichlet[]
),
heart_grid
)
op = Thunderbolt.setup_assembled_operator(
Thunderbolt.PerColorAssemblyStrategy(Thunderbolt.PolyesterDevice(8)),
Thunderbolt.BilinearDiffusionIntegrator(
κ,
QuadratureRuleCollection(2),
:φₘ,
),
Thunderbolt.SparseMatrixCSC,
heart_fun.dh,
)
Thunderbolt.update_operator!(op, 0.0) # trigger assembly
torso_grid_ = generate_grid(geo, nel_torso, Vec((-size,-size,-size)), Vec((size,size,size)))
addcellset!(torso_grid_, "heart", x->norm(x,Inf) ≤ 1.0)
# addcellset!(torso_grid_, "surrounding-tissue", x->norm(x,Inf) ≥ 1.0)
torso_grid_.cellsets["surrounding-tissue"] = OrderedSet([i for i in 1:getncells(torso_grid_) if i ∉ torso_grid_.cellsets["heart"]])
torso_grid = Thunderbolt.to_mesh(torso_grid_)
u = zeros(Thunderbolt.solution_size(heart_fun))
geselowitz_electrodes = [[electrodes[1], electrodes[i]] for i in 2:length(electrodes)]
TimerOutputs.reset_timer!()
@info prod(nel_torso)
geselowitz_ecg = Thunderbolt.Geselowitz1989ECGLeadCache(
heart_fun,
torso_grid,
κᵢ, κ,
geselowitz_electrodes;
ground = OrderedSet([Thunderbolt.get_closest_vertex(ground_vertex, torso_grid)]),
torso_heart_domain=["heart"],
ipc = LagrangeCollection{1}(),
qrc = QuadratureRuleCollection(3),
linear_solver = Thunderbolt.LinearSolve.KrylovJL_CG(precs = (A, p) -> (L1GSPrecBuilder(PolyesterDevice(8))(A, ceil(Int, SparseArrays.size(A,1)/8); isSymA=true), LinearAlgebra.I)),
# linear_solver = Thunderbolt.LinearSolve.KrylovJL_GMRES(precs = (A, p) -> (L1GSPrecBuilder(PolyesterDevice(8))(A, ceil(Int, 24)), LinearAlgebra.I)),
system_matrix_type = Thunderbolt.SparseMatrixCSR{Float64,Int64}, #Thunderbolt.ThreadedSparseMatrixCSR{Float64,Int64},
strategy = Thunderbolt.PerColorAssemblyStrategy(Thunderbolt.PolyesterDevice(8)),
)
TimerOutputs.print_timer()You need to apply these patches.
diff --git a/src/solver/interface.jl b/src/solver/interface.jl
index 8234949c..4279e657 100644
--- a/src/solver/interface.jl
+++ b/src/solver/interface.jl
@@ -148,13 +148,17 @@ update_constraints_block!(f::NullFunction, i::Block, solver_cache::AbstractTimeS
create_system_matrix(T::Type{<:AbstractMatrix}, f::AbstractSemidiscreteFunction) = create_system_matrix(T, f.dh)
-function create_system_matrix(::Type{<:ThreadedSparseMatrixCSR{Tv,Ti}}, dh::AbstractDofHandler) where {Tv,Ti}
- Acsct = transpose(convert(SparseMatrixCSC{Tv,Ti}, allocate_matrix(dh)))
+function create_system_matrix(::Type{<:ThreadedSparseMatrixCSR}, dh::AbstractDofHandler)
+ Acsct = allocate_matrix(SparseMatrixCSR{1,Float64,Int64}, dh)
return ThreadedSparseMatrixCSR(Acsct)
end
function create_system_matrix(SpMatType::Type{<:SparseMatrixCSC}, dh::AbstractDofHandler)
- A = convert(SpMatType, allocate_matrix(dh))
+ A = allocate_matrix(SparseMatrixCSC{Float64,Int64}, dh)
+ return A
+end
+function create_system_matrix(SpMatType::Type{<:SparseMatrixCSR}, dh::AbstractDofHandler)
+ A = allocate_matrix(SparseMatrixCSR{1,Float64,Int64}, dh)
return A
enddiff --git a/src/modeling/electrophysiology/ecg.jl b/src/modeling/electrophysiology/ecg.jl
index b0f47170..9db426db 100644
--- a/src/modeling/electrophysiology/ecg.jl
+++ b/src/modeling/electrophysiology/ecg.jl
@@ -402,6 +402,7 @@ function Geselowitz1989ECGLeadCache(
linear_solver = LinearSolve.KrylovJL_CG(),
solution_vector_type = Vector{Float64},
system_matrix_type = ThreadedSparseMatrixCSR{Float64,Int64},
+ strategy = SequentialAssemblyStrategy(SequentialCPUDevice()),
)
return Geselowitz1989ECGLeadCache(
heart_fun,
@@ -416,6 +417,7 @@ function Geselowitz1989ECGLeadCache(
linear_solver ,
solution_vector_type,
system_matrix_type ,
+ strategy ,
)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks!
However, we should use CG for the symmetric problems to ensure that the preconditioner actually conserves the symmetry.
| η = convert(Tf, η) | ||
| D_Dl1 = adapt(backend, zeros(Tf, N)) # D + Dˡ | ||
| last_partsize = N - (nparts - 1) * partsize # size of the last partition | ||
| SLbuffer_size = (partsize * (partsize - 1) * (nparts-1)) ÷ 2 + last_partsize * (last_partsize - 1) ÷ 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should store the lower/upper triangular parts in some sparse format. I do not have opinions about which one exactly.
| (; backend) = partitioning | ||
| # The following code is required because there is no assumption on the compatibality of x with the backend. | ||
| @timeit_debug "adapt(backend, y)" _y = adapt(backend, y) | ||
| @timeit_debug "_forward_sweep!" _forward_sweep!(_y, P) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of sweep should be ideally a user option -- and must be symmetric if the problem is symmetric, as we tank the symmetry needed for the solver otherwise.
closes #230