How does classical MD work? - LAMMPS

Classical MD basics
  Each of N particles is a point mass
    atom
    group of atoms (united atom)
    macro- or meso-particle
  Particles interact via empirical force laws
    all physics in energy potential => force
    pair-wise forces (LJ, Coulombic)
    many-body forces (EAM, Tersoff, REBO)
    molecular forces (springs, torsions)
    long-range forces (Ewald)
  Integrate Newton's equations of motion
    F = ma
    set of 3N coupled ODEs
    advance as far in time as possible
  Properties via time-averaging ensemble snapshots (vs MC sampling)

MD timestep
  Velocity-Verlet formulation:
    update V by 1/2 step (using F)
    update X (using V)
    build neighbor lists (occasionally)
    compute F (using X)
    apply constraints & boundary conditions (on F)
    update V by 1/2 step (using new F)
    output and diagnostics
  CPU time break-down:
    inter-particle forces = 80%
    neighbor lists = 15%
    everything else = 5%

Aside on MD integration schemes
  Most MD codes use some form of explicit Störmer-Verlet
  Only second-order: ΔE = |E - E0| ~ Δt^2
  Global stability trumps local accuracy of high-order schemes
  Can be shown that for Hamiltonian equations of motion, Störmer-Verlet exactly conserves a shadow Hamiltonian Es, and E - Es ~ O(Δt^2)
  For users: no energy drift over millions of timesteps
  For developers: easy to decouple integration scheme from efficient algorithms for force evaluation, parallelization
  [Figure: 32 atom LJ cluster, 200M timesteps, Δt = [...]]

Computational issues
  MD is always limited in number of atoms and length of time you can simulate
  These have a large impact on CPU cost of a simulation:
    level of detail in model
    cutoff distance of force field
    long-range Coulombics
    finding neighbors
    timestep size
    parallelism

Coarse-graining of polymer models
  All-atom:
    Δt = [...] fmsec for C-H
    C-C distance = [...] Angs
    cutoff = 10 Angs
  United-atom:
    # of interactions is 9x less
    Δt = [...] fmsec for C-C
    cutoff = 10 Angs
    20-30x savings over all-atom
  Bead-spring:

    2-3 C per bead
    Δt <=> fmsec mapping is T-dependent
    2^(1/6) σ cutoff => 8x in interactions
    can be considerable savings over united-atom

Cutoff in force field
  Forces = 80% of CPU cost
  Short-range forces: O(N) scaling for classical MD
    constant density assumption
    pre-factor is cutoff-dependent
  # of pairs/atom = cubic in cutoff
    2x the cutoff => 8x the work
  Use as short a cutoff as can justify:
    LJ = 2.5 σ (standard)
    all-atom and UA = 8-12 Angstroms
    bead-spring = 2^(1/6) σ (repulsive only)
    Coulombics = 12-20 Angstroms
    solid-state (metals) = few neighbor shells, due to screening
  Test sensitivity of your results to cutoff

Long-range Coulombics
  Systems that need it:
    charged polymers (polyelectrolytes)
    organic & biological molecules
    ionic solids, oxides
    not most metals (screening)
  Computational issue:
    Coulomb energy only falls off as 1/r
  Options:
    cutoff: scales as N, but large contribution at 10 Angs
    Ewald: scales as N^(3/2)
    particle-mesh Ewald: scales as N log(N)
    multipole: scales as N, but doesn't beat PME
    multi-level summation: scales as N, can beat PME for low-accuracy, large proc count

PPPM (Particle-mesh Ewald)
  Hockney & Eastwood, Comp Sim Using Particles (1988).

  Darden, et al, J Chem Phys, 98, p 10089 (1993).
  Like Ewald, except sum over periodic images evaluated:
    interpolate atomic charge to 3d mesh
    solve Poisson's equation on mesh (4 FFTs)
    interpolate E-fields back to atoms
  User-specified accuracy + cutoff => ewald-G + mesh-size
  Scales as N sqrt(log(N)) if grow cutoff with N
  Scales as N log(N) if cutoff held fixed

Parallel FFTs (in LAMMPS)
  3d FFT is 3 sets of 1d FFTs
    in parallel, 3d grid is distributed across procs
    1d FFTs on-processor: native library or FFTW
    multiple transposes of 3d grid
    data transfer can be costly
  FFTs for PPPM can scale poorly on large # of procs and on clusters
  Good news: Cost of PPPM is only 2x more than 8-10 Ang cutoff

Neighbor lists
  Problem: how to efficiently find neighbors within cutoff?
  For each atom, test against all others
    O(N^2) algorithm
  Verlet lists:
    Verlet, Phys Rev, 159, p 98 (1967)
    Rneigh = Rforce + skin
    build list: once every few timesteps
    other timesteps: scan larger list for neighbors within force cutoff
    rebuild: any atom moves 1/2 skin
  Link-cells (bins):
    Hockney et al, J Comp Phys, 14, p 148 (1974)
    grid domain: bins of size Rforce
    each step:

      search 27 bins for neighbors (or 14 bins)

Neighbor lists (continued)
  Verlet list is 6x savings over bins
    Vsphere = 4/3 π r^3
    Vcube = 27 r^3
  Fastest methods do both
    link-cell to build Verlet list
    use Verlet list on non-build timesteps
    O(N) in CPU and memory
    constant-density assumption
    this is what LAMMPS implements

Timescale in classical MD
  Timescale of simulation is most serious bottleneck in MD
  Timestep size limited by atomic oscillations
    C-H bond = 10 fmsec => 1/2 to 1 fmsec timestep
    Debye frequency = 10^13 Hz => 2 fmsec timestep
  Reality is often on a much longer timescale
    protein folding (msec to seconds)
    polymer entanglement (msec and up)
    glass relaxation (seconds to decades)
    rheological experiments (Hz to kHz)
  Even smaller timestep for tight-binding or quantum-MD

Particle-time metric
  Atom * steps = size of your simulation
  Up to 10^12 is desktop scale
    10^6 atoms for 10^6 timesteps
    1 μsec/atom/step on CPU core (cheap LJ potential)
    2 weeks on single core, 1 day on multi-core desktop
  10^12 to 10^14 is cluster scale
  10^14 and up is supercomputer scale
    1 cubic micron (10^10 atoms) for 1-2 nanoseconds (10^6 steps)
    1000 flops per atom per step => 10^19 flops
    MD is 10% of peak => 1 day on a Petaflop machine
  GPUs are changing landscape:
    can be 5-10x faster than multicore CPU

Extending timescale via SHAKE
  Ryckaert, et al, J Comp Phys, 23, p 327 (1977)
  Add constraint forces to freeze bond lengths & angles
    rigid water (TIP3P)
    C-H bonds in polymer or protein
  Extra work to enforce constraints

    solve matrix for each set of non-interacting constraints
    matrix size = # of constraints
  Allows for 2-3 fmsec timestep

Extending timescale via rRESPA
  Tuckerman et al, J Chem Phys, 97, p 1990 (1992)
  reversible REference System Propagator Algorithm
  Rigorous multiple timestep method
    time-reversible
    operator calculus derivation of conserved ensemble quantities
  Sub-cycle on fast degrees of freedom
    innermost loop on bond forces ([...] fmsec)
    next loop on 3-4 body forces
    next loop on van der Waals & short-range Coulombic
    outermost loop on long-range Coulombic (4 fmsec)
  Can yield 2-3x speed-up, less in parallel due to communication

Classical MD in parallel
  MD is inherently parallel
    forces on each atom can be computed simultaneously
    X and V can be updated simultaneously
  Nearly all MD codes are parallelized
    distributed-memory message-passing (MPI) between nodes
    MPI or threads (OpenMP, GPU) within node
  MPI = message-passing interface
    MPICH or OpenMPI
    assembly-language of parallel computing
    lowest-common denominator
    most portable
    runs on all parallel machines, even on multi- and many-core
    more scalable than shared-memory parallel

Goals for parallel algorithms
  Scalable
    short-range MD scales as N
    optimal parallel scaling is N/P
    even on clusters with higher communication costs
  Good for short-range forces
    80% of CPU
    long-range Coulombics have short-range component
  Fast for small systems, not just large
    nano, polymer, bio systems require long timescales
    1M steps of 10K atoms is more useful than 10K steps of 1M atoms
  Efficient at finding neighbors
    liquid state, polymer melts, small-molecule diffusion
    neighbors change rapidly
    atoms on a fixed lattice is simpler to parallelize

Parallel algorithms for MD
  Plimpton, J Comp Phys, 117, p 1 (1995)

  3 classes of algorithms used by all MD codes
    1. atom-decomposition = split and replicate atoms
    2. force-decomposition = partition forces
    3. spatial-decomposition = geometric split of simulation box
  All 3 methods balance computation optimally as N/P
  Differ in key issues for parallel scalability
    communication costs
    load-balance
  Focus on inter-particle force computation, other tasks can be done within any of 3 algorithms
    molecular forces
    time integration (NVE/NVT/NPT)
    thermodynamics, diagnostics, ...

Spatial-decomposition algorithm
  Physical domain divided into 3d boxes, one (or more) per processor
  Each proc computes forces on atoms in its box using ghost info from nearby processors
  Atoms carry along molecular topology as they migrate to new procs
  Communication via 6-way stencil
  Advantages
    communication scales sub-linear as (N/P)^(2/3), for large problems
    memory is optimal N/P
  Disadvantages
    more complex to code efficiently
    load-imbalance can be problematic

Freely available parallel MD codes
  Bio-oriented MD codes
    CHARMM: original protein force fields
    AMBER: original DNA force fields
    NAMD: fast and scalable
    Gromacs: fastest and scalable
  Materials-oriented MD codes (can also do bio problems):
    DLPOLY: distributed by Daresbury Lab, UK
    LAMMPS: distributed by Sandia National Labs
  GPU-centric MD code (materials and bio):
    HOOMD: distributed by U Michigan
    codes above have GPU-capable kernels

What is LAMMPS?

  Large-scale Atomic/Molecular Massively Parallel Simulator: MD code
  Open source (GPL), highly portable C++
  3-legged stool: bio, materials, mesoscale
  Particle simulator at varying length and time scales
    electrons => atomistic => coarse-grained => continuum
  Spatial-decomposition of simulation domain for parallelism
  Energy minimization, dynamics, non-equilibrium MD
  GPU and OpenMP enhanced
  Can be coupled to other scales: QM, kMC, FE, CFD, ...

Reasons to use LAMMPS
  1. Versatile
    bio, materials, mesoscale
      Sat AM: Tour of LAMMPS Features
    atomistic, coarse-grained, continuum
      Sat PM: Coarse-grain Applications with LAMMPS
  2. Good parallel performance
      Sat AM: Tour of LAMMPS Features
  3. Easy to extend
      Sun PM: Modifying LAMMPS and New Developments
  4. Well documented
    extensive web site
    1200 page manual
  5. Active and supportive user community
    40K postings to mail list, 1200 subscribers
    quick turn-around on Qs posted to mail list

Another reason to use LAMMPS
  6. Features for rheology (next 2 days)
  Mesoscale models:
    DPD = dissipative particle dynamics
    SPH = smoothed particle hydrodynamics
    granular = normal & tangential friction
    FLD = fast lubrication dynamics
    PD = peridynamics
    rigid body dynamics
  Aspherical particles
    point ellipsoids
    rigid body collections of points, spheroids, ellipsoids
    rigid bodies of triangles (3d) and lines (2d)
  Coarse-grained solvent models
    rigid water
    polymers (united-atom, bead-spring)
    LJ particles
    stochastic rotation dynamics (SRD)
    implicit

More rheological options in LAMMPS
  Many of these options came from 4-year collaboration with 3M, BASF, Corning, P&G on solvated colloidal modeling
  Particle/particle interactions:
    pair gayberne, resquared, colloid, yukawa/colloid, vincent
    pair brownian, lubricate, lubricateU (implicit)
    pair gran/hooke and gran/hertz
    pair hybrid/overlay for DLVO models
    fix srd for colloids + SRD fluid
  Packages:

    ASPHERE, COLLOID, FLD, GRANULAR, RIGID, SRD, USER-LB
  2 methods for measuring diffusivity
    mean-squared displacement via compute msd
    VACF via post-processing of dump file
  3 methods for measuring shear (or bulk) viscosities
    NEMD via fix deform and fix nvt/sllod or fix wall
    Muller-Plathe via fix viscosity
    Green-Kubo via fix ave/correlate

Examples of rheological simulations
  Polymer aggregation under shear

More examples of rheological simulations
  Diffusion and viscosity of solvated dimers

Still more examples of rheological simulations
  Viscosity of asphericals in SRD fluid

Yet some more examples of rheological simulations
  3 methods of measuring viscosity

Finally, enough of rheological simulations
  Arbitrary-shape asphericals via lines and triangles
  See [...] to view all these animations and for links to input scripts
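
The mean-squared-displacement route to diffusivity listed among the diffusivity methods above can be sketched in a few lines of plain Python. This is an illustration only, not the implementation behind compute msd: the function names mean_squared_displacement and diffusion_coefficient are hypothetical, coordinates are assumed to be unwrapped (no periodic-image jumps), and the Einstein relation MSD(t) = 2 d D t is assumed to hold over the fitted time window.

```python
def mean_squared_displacement(frames):
    """MSD relative to the first frame, averaged over particles.

    frames: list of snapshots; each snapshot is a list of (x, y, z)
    tuples holding unwrapped coordinates (an assumption of this sketch).
    """
    ref = frames[0]
    msd = []
    for snap in frames:
        total = 0.0
        for (x, y, z), (x0, y0, z0) in zip(snap, ref):
            total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        msd.append(total / len(ref))
    return msd


def diffusion_coefficient(times, msd, dim=3):
    """Least-squares slope of MSD vs time, divided by 2*dim.

    Uses the Einstein relation MSD(t) = 2 * dim * D * t, so the caller
    should pass only the linear (diffusive) part of the curve.
    """
    n = len(times)
    t_mean = sum(times) / n
    m_mean = sum(msd) / n
    num = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den / (2 * dim)
```

In practice the early ballistic part of the MSD curve would be discarded before fitting, and the result should be checked against the VACF route as a consistency test.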

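The velocity-Verlet update order from the "MD timestep" slide, and the bounded second-order energy error from the "Aside on MD integration schemes" slide, can be illustrated on a single 1D harmonic oscillator. This is a minimal sketch under stated assumptions: unit mass and spring constant by default, no neighbor lists, constraints, or boundary conditions, and all function names are hypothetical.

```python
def harmonic_force(x, k=1.0):
    """Force of a 1D spring with stiffness k (assumed test potential)."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, dt, nsteps, mass=1.0, k=1.0):
    """Integrate nsteps of size dt; return final x, v and max |E - E0|."""
    e0 = total_energy(x, v, mass, k)
    max_err = 0.0
    f = harmonic_force(x, k)
    for _ in range(nsteps):
        v += 0.5 * dt * f / mass      # update V by 1/2 step (using F)
        x += dt * v                   # update X (using V)
        f = harmonic_force(x, k)      # compute F (using X)
        v += 0.5 * dt * f / mass      # update V by 1/2 step (using new F)
        err = abs(total_energy(x, v, mass, k) - e0)
        max_err = max(max_err, err)   # diagnostics: track energy error
    return x, v, max_err


if __name__ == "__main__":
    for dt in (0.1, 0.05):
        # Same total simulated time for both step sizes.
        _, _, err = velocity_verlet(1.0, 0.0, dt, int(2000 / dt))
        print(f"dt = {dt}: max |E - E0| = {err:.3e}")
```

Halving dt should cut the maximum energy error by roughly a factor of four, consistent with ΔE ~ Δt^2, and the error should stay bounded rather than drift as the run gets longer.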