Example: dental hygienist

DReX: A Declarative Language for Efficiently Evaluating ...

DReX: A Declarative Language for EfficientlyEvaluating Regular String Transformations Rajeev AlurLoris D AntoniMukund RaghothamanUniversity of Pennsylvania{alur, lorisdan, presentDReX, a Declarative Language that can express allregular string-to-string transformations, and can still be efficientlyevaluated. The class of regular string transformations has a robusttheoretical foundation including multiple characterizations, closureproperties, and decidable analysis questions, and admits a numberof string operations such as insertion, deletion, substring swap,and reversal.}

Programs that transform plain text are ubiquitous and used for many ... that are enabled by the properties of the underlying transducer model. Due to the focus on analyzability, expressiveness is a limiting ... joint, making the complementation of the domain of funnecessary. Similarly, for the operator combine(f;g), we require the domains

Tags:

  Programs, Making, Enabled

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Advertisement

Transcription of DReX: A Declarative Language for Efficiently Evaluating ...

1 DReX: A Declarative Language for EfficientlyEvaluating Regular String Transformations Rajeev AlurLoris D AntoniMukund RaghothamanUniversity of Pennsylvania{alur, lorisdan, presentDReX, a Declarative Language that can express allregular string-to-string transformations, and can still be efficientlyevaluated. The class of regular string transformations has a robusttheoretical foundation including multiple characterizations, closureproperties, and decidable analysis questions, and admits a numberof string operations such as insertion, deletion, substring swap,and reversal.}

2 Recent research has led to a characterization ofregular string transformations using a primitive set of functioncombinators analogous to the definition of regular languages usingregular expressions. While these combinators form the basis forthe languageDReXproposed in this paper, our main technicalfocus is on the complexity of Evaluating the output of aDReXprogram on a given input string. It turns out that the naturalevaluation algorithm involves dynamic programming, leading tocomplexity that is cubic in the length of the input string.

3 Ourmain contribution is identifying a consistency restriction on theuse of combinators inDReXprograms, and a single-pass evaluationalgorithm for consistent programs with time complexity that is linearin the length of the input string and polynomial in the size of theprogram. We show that the consistency restriction does not limit theexpressiveness, and whether aDReXprogram is consistent can bechecked Efficiently . We report on a prototype implementation, andevaluate it using a representative set of text processing and Subject [ Language Classifica-tions]: Specialized application languages; [Theory of Compu-tation]: Models of Computation, AutomataKeywordsDReX, string transformations, Declarative languages1.

4 IntroductionPrograms that transform plain text are ubiquitous and used for manydifferent tasks, from reformatting documents to translating databetween different formats. String specific utilities such as sed, AWK,and Perl have been used to query and reformat text files for manyyears. Since these tools are Turing complete they can express verycomplex transformations, however this comes at the cost of not beingamenable to algorithmic analysis. To address this issue, restrictedlanguages have been proposed in the context of verification of stringsanitizers [26] and string coders [13], and for the analysis andoptimization of list-manipulating programs [15].

5 These languagesbuild on variants of finite-state transducers, which are automata-based representations of programs mapping strings to strings, and This is the full version of the paper to be presented at POPL 2015, andincludes proofs omitted from the short version of the paper. A prototypeimplementation ofDReXmay be downloaded ~rmukund/drex/. This research was partially supported byNSF Expeditions in Computing grant CCF of these languages supports different algorithmic analysesthat are enabled by the properties of the underlying transducermodel.

6 Due to the focus on analyzability, expressiveness is a limitingfactor in all such languages and many programs , in particular thosethat reorder input chunks, cannot be represented. Moreover, theselanguages are not Declarative and their semantics are tightly coupledto the transducer model, forcing the programmer to reason in termsof finite state machines and process the input the theory of string-to-string transformations the class of reg-ular string transformations is a robust class that strikes a balancebetween decidability and expressiveness.

7 In particular this classcaptures transformations that involve reordering of input chunks, itis closed under composition [9], it has decidable equivalence [21],and has several equivalent characterizations, such as one-way trans-ducers with a finite set of write-only registers [1], two-way trans-ducers [16] and monadic second-order definable graph transforma-tions [10]. Recently Alur et al [3] proposed a set of combinatorsthat captures the class of regular string transformations. In [3] thefocus is on expressiveness and the paper does not provide an effi-cient procedure to evaluate programs written with these evaluation of such programs is the main focus of this with the combinators presented in [3], we developDReX, an expressive Declarative Language to describe string transfor-mations.

8 The base combinator ofDReX, 7 d, maps any characterathat satisfies the predicate to the stringd(a). This combinatorsymbolically extends the one proposed in [3] with predicates andcan therefore succinctly model strings over large (and potentiallyinfinite) alphabets, such as Unicode. The other combinators sup-ported byDReXare: (a)split(f,g)that unambiguously splitsthe input string into two parts and outputs the concatenation of theresults obtained usingfon the first part andgon the second part;(b)iterate(f)that unambiguously splits the input string into mul-tiple parts and outputs the concatenation of evaluatingfon eachof such parts; (c)combine(f,g)that applies bothfandgto theinput string and concatenates the obtained results.

9 (d) the condi-tionalfelsegthat first tries to applyfto the input and iffcannotbe applied it appliesg; (e)chain(f,R)that unambiguously splitsthe string into multiple parts = neach belonging to thelanguage described by the regular expressionR, appliesfto everytwo pair of adjacent chunks i i+1, and finally concatenates theseresults. In order to model operations such as reversing a string, theoperators split, iteration and chained sum also have aleft-additiveversion in which the outputs computed on each split of the stringare concatenated in reverse straightforward algorithm to evaluateDReXprograms in-volves operationalizing the semantics, use dynamic program-ming and evaluate each sub-program on each substring of the , this algorithm takes time cubic in the length of theinput string.

10 And does not scale to strings longer than approximatelya thousand characters. Because of the analogy betweenDReXoper-ators (split sum, conditionals, iteration, etc.) and regular expressions(concatenation, union, Kleene-*, etc.) one approach is to constructan automaton model for evaluatingDReXprograms similarly to theapproach taken to evaluate regular expressions. This is not simplebecause of various reasons, such as: (a) the conditional operator,felseg, applies the transformationgto the input only if the in-put string is not accepted byf.


Related search queries