
Introduction to Hacking PostgreSQL - Neil Conway


Neil Conway, Gavin Sherry
neilc@samurai.com, swm@alcove.com.au


Transcription of Introduction to Hacking PostgreSQL - Neil Conway

Outline
1. Development environment
2. Architecture of PostgreSQL
3. Backend conventions and infrastructure
4. How to submit a patch
5. Example patch: adding WHEN qualification to triggers

Part 1: Development Environment
- Most of the Postgres developers use Unix; you probably should too
- You'll need to know C
  - Fortunately, C is easy
- Unix systems programming knowledge is helpful, depending on what you want to work on
- Learning to understand how a complex system functions is a skill in itself ("code reading")

Development Tools
- Basics: $CC, Bison, Flex, CVS, autotools, gdb
- Configure flags: --enable-depend, --enable-debug, --enable-cassert
- Consider CFLAGS=-O0 for easier debugging, but this suppresses some classes of warnings
- tags or cscope are essential
  - "What is the definition of this function/type?"
  - "What are all the call-sites of this function?"
  - src/tools/make_[ce]tags
- ccache and distcc are useful, especially on slower machines
- valgrind can be useful for debugging memory errors

Text Editor
- If you're not using a good programmer's text editor, start
- Teach your editor to obey the Postgres coding conventions:
  - Hard tabs, with a tab width of 4 spaces
  - Similar to Allman/BSD style; just copy the surrounding code
- Using the Postgres coding conventions makes it more likely that your patch will be promptly reviewed and applied

Part 2: PostgreSQL Architecture
Five main components:
1. The parser - parse the query string
2. The rewriter - apply rewrite rules
3. The optimizer - determine an efficient query plan
4. The executor - execute a query plan
5. The utility processor - process DDL like CREATE TABLE

Architecture Diagram
Postgres backend: PostgresMain()
- PARSE: Parse query string; pg_parse_query()
- ANALYZE: Semantic analysis of query, transform to Query node; parse_analyze()
- REWRITE: Apply rewrite rules; pg_rewrite_queries()
- UTILITY PROCESSOR: Execute DDL; PortalRun() -> ProcessUtility()
- PLAN: Produce a query plan; pg_plan_queries()
- EXECUTOR: Execute DML; PortalRun() -> ExecutePlan()

The Parser
- Lex and parse the query string submitted by the user
- ... the guts; entry point ...
- Produces a "raw parsetree": a linked list of parse nodes
  - Parse nodes are defined in include/...
- There is usually a simple mapping between grammar productions and parse node structure

Semantic Analysis
- In the parser itself, only syntactic analysis is done; basic semantic checks are done in a subsequent "analysis phase"
  - ... related code under parser/
- Resolve column references, considering schema path and query context
  - SELECT a, b, c FROM t1, t2, x IN (SELECT t1 FROM b)
- Verify that target schemas, tables and columns exist
- Check that the types used in expressions are consistent
- In general, check for errors that are impossible or difficult to detect in the parser itself

Rewriter, Planner
- The analysis phase produces a Query, which is the query's parse tree
- The rewriter applies rewrite rules: view definitions and ordinary rules. Input is a Query, output is zero or more Querys
- The planner takes a Query and produces a Plan, which encodes how the query ought to be executed
  - Only needed for "optimizable" statements (INSERT, DELETE, SELECT, UPDATE)

Executor, Utility Processor
- DDL statements are executed via the utility processor, which basically just calls the appropriate function for each different kind of DDL statement
  - ProcessUtility() ...; the implementation of the DDL statements is in commands/
- Optimizable statements are processed via the Executor: given a Plan, it executes the plan and produces any resulting tuples
  - executor/; entry point is ...
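The split between the two execution paths can be illustrated with a small, standalone dispatch function. This is only a sketch: StmtKind, Route, stmt_is_optimizable() and route_statement() are hypothetical names invented here and are not part of the PostgreSQL source; the real backend makes this decision on parse and plan nodes rather than on a simple enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical statement kinds for this sketch; not PostgreSQL's own tags. */
typedef enum
{
    STMT_SELECT,
    STMT_INSERT,
    STMT_UPDATE,
    STMT_DELETE,
    STMT_CREATE_TABLE,
    STMT_DROP_TABLE
} StmtKind;

/* Where a statement goes after parsing, analysis and rewriting. */
typedef enum
{
    ROUTE_UTILITY,       /* DDL: handed to the utility processor */
    ROUTE_PLAN_EXECUTE   /* optimizable: planned, then run by the executor */
} Route;

/* Only INSERT, DELETE, SELECT and UPDATE need a plan. */
bool
stmt_is_optimizable(StmtKind kind)
{
    switch (kind)
    {
        case STMT_SELECT:
        case STMT_INSERT:
        case STMT_UPDATE:
        case STMT_DELETE:
            return true;
        default:
            return false;
    }
}

/* Pick the execution path for one statement. */
Route
route_statement(StmtKind kind)
{
    return stmt_is_optimizable(kind) ? ROUTE_PLAN_EXECUTE : ROUTE_UTILITY;
}
```

With these definitions, route_statement(STMT_UPDATE) selects the plan-and-execute path, while route_statement(STMT_CREATE_TABLE) selects the utility path.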

Part 3: Common Idioms: Nodes
- Postgres uses a very simple object system with support for single inheritance. The root of the class hierarchy is Node:

    typedef struct
    {
        NodeTag type;
    } Node;

    typedef struct
    {
        NodeTag type;
        int a_field;
    } Parent;

    typedef struct
    {
        Parent parent;
        int b_field;
    } Child;

- This relies on a C trick: you can treat a Child * like a Parent * since their initial fields are the same
- The first field of any Node is a NodeTag, which can be used to determine a Node's specific type at runtime

Nodes, Cont.
- Create a new Node: makeNode()
- Run-time type testing via the IsA() macro
- Test if two nodes are equal: equal()
- Deep copy a node: copyObject()
- Serialise a node to text: nodeToString()
- Deserialise a node from text: stringToNode()
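The tag-first layout and the run-time type test can be tried outside the backend with a standalone sketch. Everything below is an illustration under assumptions: the three-value NodeTag enum, new_node(), make_node(), node_tag() and is_a() are hypothetical stand-ins written for this example, and they only imitate the idea behind makeNode() and IsA(); they are not PostgreSQL's definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tags for this sketch; the real NodeTag enum is far larger. */
typedef enum
{
    T_Invalid = 0,
    T_Parent,
    T_Child
} NodeTag;

typedef struct
{
    NodeTag type;
} Node;

typedef struct
{
    NodeTag type;
    int     a_field;
} Parent;

typedef struct
{
    Parent  parent;     /* must stay the first field */
    int     b_field;
} Child;

/* Every node starts with a NodeTag, so the tag can be read from any node. */
#define node_tag(ptr)       (*(const NodeTag *) (ptr))
#define is_a(ptr, tagname)  (node_tag(ptr) == T_##tagname)

/* Allocate a zero-filled node of the given size and stamp its tag. */
void *
new_node(size_t size, NodeTag tag)
{
    void *node = calloc(1, size);

    if (node == NULL)
        abort();
    *(NodeTag *) node = tag;
    return node;
}

/* make_node(Child) allocates a Child whose tag is T_Child. */
#define make_node(type_) ((type_ *) new_node(sizeof(type_), T_##type_))
```

A Child created with make_node(Child) can be passed to code that expects a Parent *, because the Parent sits at offset zero; is_a(ptr, Child) then tells the two apart at run time.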

Nodes: Hints
- When you modify a node or add a new node, remember to ...
  - May have to ... if your Node is to be serialised/deserialised
- Grepping for references to the node's type can be helpful to make sure you don't forget to update anything

Memory Management
- Postgres uses hierarchical, region-based memory management, and it absolutely rocks
  - backend/util/mmgr
- Memory is allocated via palloc()
- All allocations occur inside a memory context
  - Default memory context: CurrentMemoryContext

Memory Management, Cont.
- Allocations can be freed individually via pfree()
- When a memory context is reset, all allocations in the context are released
  - Resetting contexts is both faster and less error-prone than releasing individual allocations
- Contexts are arranged in a tree; deleting/resetting a context deletes/resets its child contexts
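The behaviour of a context tree can be imitated with a toy region allocator. This is a minimal sketch under assumptions, not the code in backend/util/mmgr: ToyContext, toy_context_create(), toy_alloc(), toy_context_reset() and toy_context_delete() are hypothetical names, each allocation is simply a malloc() recorded in a per-context list, and there is no equivalent of CurrentMemoryContext or pfree().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One recorded allocation. */
typedef struct Chunk
{
    struct Chunk *next;
    void         *mem;
} Chunk;

/* A toy memory context: a list of allocations plus child contexts. */
typedef struct ToyContext
{
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
    Chunk             *chunks;
    size_t             nchunks;
} ToyContext;

/* Create a context; a non-NULL parent makes it part of the parent's tree. */
ToyContext *
toy_context_create(ToyContext *parent)
{
    ToyContext *cxt = calloc(1, sizeof(ToyContext));

    if (cxt == NULL)
        abort();
    if (parent != NULL)
    {
        cxt->next_sibling = parent->first_child;
        parent->first_child = cxt;
    }
    return cxt;
}

/* Allocate memory that lives until the context is reset or deleted. */
void *
toy_alloc(ToyContext *cxt, size_t size)
{
    Chunk *chunk = malloc(sizeof(Chunk));

    if (chunk == NULL)
        abort();
    chunk->mem = malloc(size > 0 ? size : 1);
    if (chunk->mem == NULL)
        abort();
    chunk->next = cxt->chunks;
    cxt->chunks = chunk;
    cxt->nchunks++;
    return chunk->mem;
}

/* Release every allocation in this context and in all of its children. */
void
toy_context_reset(ToyContext *cxt)
{
    ToyContext *child;

    for (child = cxt->first_child; child != NULL; child = child->next_sibling)
        toy_context_reset(child);
    while (cxt->chunks != NULL)
    {
        Chunk *chunk = cxt->chunks;

        cxt->chunks = chunk->next;
        free(chunk->mem);
        free(chunk);
    }
    cxt->nchunks = 0;
}

/*
 * Delete a context together with its children. In this sketch it may only
 * be called on a root context, because it does not unlink from a parent.
 */
void
toy_context_delete(ToyContext *cxt)
{
    while (cxt->first_child != NULL)
    {
        ToyContext *child = cxt->first_child;

        cxt->first_child = child->next_sibling;
        toy_context_delete(child);
    }
    toy_context_reset(cxt);
    free(cxt);
}
```

A typical use would be a long-lived context with a short-lived child: per-item allocations go into the child, which is reset after each item, so nothing accumulates in the parent.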

Memory Management Conventions
- You should sometimes pfree() your allocations
  - If the context of allocation is known to be short-lived, don't bother with pfree()
  - If the code might be invoked in an arbitrary memory context (e.g. utility functions), you should pfree()
  - The exact rules are a bit hazy
- Be aware of the memory allocation assumptions made by functions you call
- Memory leaks, per se, are rare in the backend
  - All memory is released eventually
  - A "leak" is when memory is allocated in a too-long-lived memory context: e.g. allocating some per-tuple resource in a per-txn context

Error Handling
- Most errors reported by ereport() or elog()
  - ereport() is for user-visible errors, and allows more fields to be specified (SQLSTATE, detail, hint, etc.)
- Implemented via longjmp(3); conceptually similar to exceptions in other languages
  - elog(ERROR) walks back up the stack to the closest error handling block; that block can either handle the error or re-throw it
  - The top-level error handler aborts the current transaction and resets the transaction's memory context
    - Releases all resources held by the transaction, including files, locks, memory, and buffer pins
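The longjmp-based mechanism can be demonstrated with a toy version of a try/catch block. The sketch below is an assumption-laden illustration, not PostgreSQL's implementation: toy_handler, toy_last_error, toy_throw() and the TOY_TRY/TOY_CATCH/TOY_END_TRY macros are hypothetical names, there is no top-level handler or transaction cleanup, and a throw with no handler installed simply aborts.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Innermost active handler; NULL means no handler is installed. */
jmp_buf    *toy_handler = NULL;
const char *toy_last_error = NULL;

/* Record the message and jump to the closest enclosing handler. */
void
toy_throw(const char *message)
{
    toy_last_error = message;
    if (toy_handler == NULL)
        abort();            /* stand-in for a top-level handler */
    longjmp(*toy_handler, 1);
}

/* Install a handler for the block that follows. */
#define TOY_TRY() \
    do { \
        jmp_buf *saved_handler = toy_handler; \
        jmp_buf  local_buf; \
        if (setjmp(local_buf) == 0) \
        { \
            toy_handler = &local_buf

/* Entered only if the try block threw; the outer handler is active again. */
#define TOY_CATCH() \
        } \
        else \
        { \
            toy_handler = saved_handler

#define TOY_END_TRY() \
        } \
        toy_handler = saved_handler; \
    } while (0)

/* Throws instead of returning an error code. */
int
toy_checked_divide(int num, int den)
{
    if (den == 0)
        toy_throw("division by zero");
    return num / den;
}

/* Returns the quotient, or fallback if the division threw. */
int
toy_divide_or(int num, int den, int fallback)
{
    /* volatile: written in the try block and read after a longjmp */
    volatile int result = fallback;

    TOY_TRY();
    {
        result = toy_checked_divide(num, den);
    }
    TOY_CATCH();
    {
        /* Handle the error here; calling toy_throw() again would re-throw. */
        result = fallback;
    }
    TOY_END_TRY();

    return result;
}
```

Because the catch block restores the outer handler before running, a throw from inside it propagates to the next enclosing block, which is what "re-throw" means in this scheme.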

Error Handling, Cont.
- Error handlers can be defined via PG_TRY()
- Think about error handling!
  - Never ignore the return values of system calls
- Should your function return an error code, or ereport() on failure?
  - Probably ereport() to save callers the trouble of checking for failure
  - Unless they can provide a better (more descriptive) error message, or they might not consider the failure to be an actual error
- Use assertions (Assert) liberally to detect programming errors, but never errors the user might encounter

Part 4: Your First Patch
- Step 1: Research and preparation
  - Is your new feature actually useful? Does it just scratch your itch, or is it of general value?
  - Does it need to be implemented in the backend, or can it live in pgfoundry, contrib/, or elsewhere?
  - Does the SQL standard define similar or equivalent functionality?
    - What about Oracle, DB2, ...?
  - Has someone suggested this idea in the past?
    - Search the archives and TODO list
  - Most ideas are bad

Sending A Proposal
- Step 2: Send a proposal for your feature to pgsql-hackers
  - Patches that appear without prior discussion risk wasting your time
- Discuss your proposed syntax and behavior
  - Consider corner cases, and how the feature will relate to other parts of PostgreSQL (consistency is good)
  - Will any system catalog changes be needed?
  - Backward-compatibility?
- Try to reach a consensus with -hackers on how the feature ought to behave

Implementation
- Step 3: Implement the patch
  - A general strategy is to look at how similar parts of the system function
    - Don't copy and paste (IMHO)
      - Common source of errors
    - Instead, read through similar sections of code to try to understand how they work, and the APIs they are using
  - Implement (just) what you need, refactoring the existing APIs if required
- Ask for implementation advice as needed (-hackers or IRC)
- Consider posting work-in-progress versions of the patch

Testing, Documentation
- Step 4: Update tools
  - For example, if you've modified DDL syntax, update psql's tab completion
  - Add pg_dump support if necessary
- Step 5: Testing
  - Make sure the existing regression tests don't fail
  - No compiler warnings
  - Add new regression tests for the new feature
- Step 6: Update documentation
  - make check in doc/src/sgml does a syntax check that is faster than building the whole SGML docs
  - Check documentation changes visually in a browser

Submitting The Patch
- Step 7: Submit the patch
  - Use context diff format: diff -c
  - Review every hunk of the patch
    - Is this hunk necessary?
    - Does it needlessly change whitespace or existing code?
    - Does it have any errors? Does it fail in corner cases? Is there a more elegant way to do this?
- Work with a code reviewer to make any necessary changes
- If your patch falls through the cracks, be persistent
  - The developers are busy and reviewing patches is difficult, time-consuming, and unglamorous work

