Title stata.com String functions

Contents

abbrev(s,n)    name s, abbreviated to a length of n
char(n)    the character corresponding to ASCII or extended ASCII code n; "" if n is not in the domain
collatorlocale(loc,type)    the most closely related locale supported by ICU from loc if type is 1; the actual locale where the collation data comes from if type is 2
collatorversion(loc)    the version string of a collator based on locale loc
indexnot(s1,s2)    the position in ASCII string s1 of the first character of s1 not found in ASCII string s2, or 0 if all characters of s1 are found in s2
plural(n,s)    the plural of s if n ≠ 1
plural(n,s1,s2)    the plural of s1, as modified by or replaced with s2, if n ≠ 1
real(s)    s converted to numeric or missing
regexm(s,re)    performs a match of a regular expression and evaluates to 1 if regular expression re is satisfied by the ASCII string s; otherwise, 0
regexr(s1,re,s2)    replaces the first substring within ASCII string s1 that matches re with ASCII string s2 and returns the resulting string
regexs(n)    subexpression n from a previous regexm() match, where 0 ≤ n < 10
soundex(s)    the soundex code for a string, s
soundex_nara(s)    the Census soundex code for a string, s
strcat(s1,s2)    there is no strcat() function; instead the addition operator is used to concatenate strings
strdup(s1,n)    there is no strdup() function; instead the multiplication operator is used to create multiple copies of strings
string(n)    a synonym for strofreal(n)
string(n,s)    a synonym for strofreal(n,s)
stritrim(s)    s with multiple, consecutive internal blanks (ASCII space character char(32)) collapsed to one blank
strlen(s)    the number of characters in ASCII s or length in bytes
strlower(s)    lowercase ASCII characters in string s
strltrim(s)    s without leading blanks (ASCII space character char(32))
strmatch(s1,s2)    1 if s1 matches the pattern s2; otherwise, 0
strofreal(n)    n converted to a string
strofreal(n,s)    n converted to a string using the specified display format
strpos(s1,s2)    the position in s1 at which s2 is first found, 0 if s2 does not occur, and 1 if s2 is empty
strproper(s)    a string with the first ASCII letter and any other letters immediately following characters that are not letters capitalized; all other ASCII letters converted to lowercase
strreverse(s)    reverses the ASCII string s
strrpos(s1,s2)    the position in s1 at which s2 is last found, 0 if s2 does not occur, and 1 if s2 is empty
strrtrim(s)    s without trailing blanks (ASCII space character char(32))
strtoname(s[,p])    s translated into a Stata 13 compatible name
strtrim(s)    s without leading and trailing blanks (ASCII space character char(32)); equivalent to strltrim(strrtrim(s))
strupper(s)    uppercase ASCII characters in string s
subinstr(s1,s2,s3,n)    s1, where the first n occurrences in s1 of s2 have been replaced with s3
subinword(s1,s2,s3,n)    s1, where the first n occurrences in s1 of s2 as a word have been replaced with s3
substr(s,n1,n2)    the substring of s, starting at n1, for a length of n2
tobytes(s[,n])    escaped decimal or hex digit strings of up to 200 bytes of s
uchar(n)    the Unicode character corresponding to Unicode code point n or an empty string if n is beyond the Unicode code-point range
udstrlen(s)    the number of display columns needed to display the Unicode string s in the Stata Results window
udsubstr(s,n1,n2)    the Unicode substring of s, starting at character n1, for n2 display columns
uisdigit(s)    1 if the first Unicode character in s is a Unicode decimal digit; otherwise, 0
uisletter(s)    1 if the first Unicode character in s is a Unicode letter; otherwise, 0
ustrcompare(s1,s2[,loc])    compares two Unicode strings
ustrcompareex(s1,s2,loc,st,case,cslv,norm,num,alt,fr)    compares two Unicode strings
ustrfix(s[,rep])    replaces each invalid UTF-8 sequence with a Unicode character
ustrfrom(s,enc,mode)    converts the string s in encoding enc to a UTF-8 encoded Unicode string
ustrinvalidcnt(s)    the number of invalid UTF-8 sequences in s
ustrleft(s,n)    the first n Unicode characters of the Unicode string s
ustrlen(s)    the number of characters in the Unicode string s
ustrlower(s[,loc])    lowercase all characters of Unicode string s under the given locale loc
ustrltrim(s)    removes the leading Unicode whitespace characters and blanks from the Unicode string s
ustrnormalize(s,norm)    normalizes Unicode string s to one of the five normalization forms specified by norm
ustrpos(s1,s2[,n])    the position in s1 at which s2 is first found; otherwise, 0
ustrregexm(s,re[,noc])    performs a match of a regular expression and evaluates to 1 if regular expression re is satisfied by the Unicode string s; otherwise, 0
ustrregexra(s1,re,s2[,noc])    replaces all substrings within the Unicode string s1 that match re with s2 and returns the resulting string
ustrregexrf(s1,re,s2[,noc])    replaces the first substring within the Unicode string s1 that matches re with s2 and returns the resulting string
ustrregexs(n)    subexpression n from a previous ustrregexm() match
ustrreverse(s)    reverses the Unicode string s
ustrright(s,n)    the last n Unicode characters of the Unicode string s
ustrrpos(s1,s2[,n])    the position in s1 at which s2 is last found; otherwise, 0
ustrrtrim(s)    removes trailing Unicode whitespace characters and blanks from the Unicode string s
ustrsortkey(s[,loc])    generates a null-terminated byte array that can be used by the sort command to produce the same order as ustrcompare()
ustrsortkeyex(s,loc,st,case,cslv,norm,num,alt,fr)    generates a null-terminated byte array that can be used by the sort command to produce the same order as ustrcompare()
ustrtitle(s[,loc])    a string with the first characters of Unicode words titlecased and other characters lowercased
ustrto(s,enc,mode)    converts the Unicode string s in UTF-8 encoding to a string in encoding enc
ustrtohex(s[,n])    escaped hex digit string of s up to 200 Unicode characters
ustrtoname(s[,p])    string s translated into a Stata name
ustrtrim(s)    removes leading and trailing Unicode whitespace characters and blanks from the Unicode string s
ustrunescape(s)    the Unicode string corresponding to the escaped sequences of s
ustrupper(s[,loc])    uppercase all characters in string s under the given locale loc
ustrword(s,n[,loc])    the nth Unicode word in the Unicode string s
ustrwordcount(s[,loc])    the number of nonempty Unicode words in the Unicode string s
usubinstr(s1,s2,s3,n)    replaces the first n occurrences of the Unicode string s2 with the Unicode string s3 in s1
usubstr(s,n1,n2)    the Unicode substring of s, starting at n1, for a length of n2
word(s,n)    the nth word in s; missing ("") if n is missing
wordbreaklocale(loc,type)    the most closely related locale supported by ICU from loc if type is 1, the actual locale where the word-boundary analysis data come from if type is 2; or an empty string is returned for any other type
wordcount(s)    the number of words in s

Functions

In the display below, s indicates a string subexpression (a string literal, a string variable, or another string expression) and n indicates a numeric subexpression (a number, a numeric variable, or another numeric expression).

If your strings contain Unicode characters or you are writing programs that will be used by others who might use Unicode strings, read [U] Handling Unicode strings.

String functions

abbrev(s,n)
Description: name s, abbreviated to a length of n

Length is measured in the number of display columns, not in the number of characters. For most users, the number of display columns equals the number of characters. For a detailed discussion of display columns, see [U] Displaying Unicode characters.

If any of the characters of s are a period, ".", and n < 8, then the value of n defaults to a value of 8. Otherwise, if n < 5, then n defaults to a value of 5.

If n is missing, abbrev() will return the entire string s.

abbrev() is typically used with variable names and variable names with factor-variable or time-series operators (the period case).

abbrev("displacement",8) is displa~t.

Domain s: strings
Domain n: integers 5 to 32
Range: strings

char(n)
Description: the character corresponding to ASCII or extended ASCII code n; "" if n is not in the domain

Note: ASCII codes are from 0 to 127; extended ASCII codes are from 128 to 255.
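In UTF-8, every code from 0 to 127 is a valid one-byte character with the same value as in ASCII, but a lone byte from 128 to 255 is never a valid UTF-8 sequence. The Python sketch below illustrates this using Python's UTF-8 decoder as a stand-in for Stata's display; the helper names char_as_utf8() and uchar_like() are hypothetical, and Python substitutes U+FFFD where Stata displays a question mark.

```python
# Illustration only: Python's codecs stand in for Stata's UTF-8 display.
# char_as_utf8() and uchar_like() are hypothetical helper names.


def char_as_utf8(n: int) -> str:
    """Decode the single byte n as UTF-8, mimicking char()'s 0-255 domain.

    Returns "" outside 0-255; invalid bytes become U+FFFD (Stata shows "?").
    """
    if not 0 <= n <= 255:
        return ""
    return bytes([n]).decode("utf-8", errors="replace")


def uchar_like(n: int) -> str:
    """Character for Unicode code point n, or "" if n is out of range."""
    try:
        return chr(n)
    except (ValueError, OverflowError):
        return ""


for code in (65, 127, 128, 255):
    print(code, repr(char_as_utf8(code)))
print(repr(uchar_like(8364)))  # the Euro sign, U+20AC
```

Bytes 65 and 127 decode to themselves, while 128 and 255 both come back as the replacement character, which is the same distinction between plain and extended ASCII that the note draws.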

Prior to Stata 14, the display of extended ASCII characters was encoding dependent. For example, char(128) on Microsoft Windows using Windows-1252 encoding displayed the Euro symbol, but on Linux using ISO-Latin-1 encoding, char(128) displayed an invalid character symbol. Beginning with Stata 14, Stata's display encoding is UTF-8 on all platforms. Because char(128) is an invalid UTF-8 sequence, it displays as a question mark. There are two Unicode functions corresponding to char(): uchar() and ustrunescape(). You can use uchar(8364) or ustrunescape("\u20AC") to display a Euro sign on all platforms.

Domain n: integers 0 to 255
Range: ASCII characters

uchar(n)
Description: the Unicode character corresponding to Unicode code point n or an empty string if n is beyond the Unicode code-point range

Note that uchar() takes the decimal value of the Unicode code point, whereas ustrunescape() takes an escaped hex digit string of the Unicode code point. For example, both uchar(8364) and ustrunescape("\u20ac") produce the Euro sign.

Domain n: integers ≥ 0
Range: Unicode characters

collatorlocale(loc,type)
Description: the most closely related locale supported by ICU from loc if type is 1; the actual locale where the collation data comes from if type is 2

For any other type, loc is returned in a canonicalized form.

collatorlocale("en_us_texas", 0) = en_US_TEXAS
collatorlocale("en_us_texas", 1) = en_US
collatorlocale("en_us_texas", 2) = root

Domain loc: strings of locale name
Domain type: integers
Range: strings

collatorversion(loc)
Description: the version string of a collator based on locale loc

The Unicode standard is constantly adding more characters, and the sort key format may change as well. This can cause ustrsortkey() and ustrsortkeyex() to produce incompatible sort keys between different versions of International Components for Unicode (ICU). The version string can be used for versioning the sort keys to indicate when saved sort keys must be invalidated.

Domain loc: strings
Range: strings

indexnot(s1,s2)
Description: the position in ASCII string s1 of the first character of s1 not found in ASCII string s2, or 0 if all characters of s1 are found in s2

indexnot() is intended for use with only plain ASCII strings. For Unicode characters beyond the plain ASCII range, the position and character are given in bytes, not characters.

Domain s1: ASCII strings (to be searched)
Domain s2: ASCII strings (to search for)
Range: integers ≥ 0

plural(n,s)
Description: the plural of s if n ≠ 1

The plural is formed by adding "s" to s.

plural(1, "horse") = "horse"
plural(2, "horse") = "horses"

Domain n: real numbers
Domain s: strings
Range: strings

plural(n,s1,s2)
Description: the plural of s1, as modified by or replaced with s2, if n ≠ 1

If s2 begins with the character "+", the plural is formed by adding the remainder of s2 to s1. If s2 begins with the character "-", the plural is formed by subtracting the remainder of s2 from s1. If s2 begins with neither "+" nor "-", then the plural is formed by returning s2.

plural(2, "glass", "+es") = "glasses"
plural(1, "mouse", "mice") = "mouse"
plural(2, "mouse", "mice") = "mice"
plural(2, "abcdefg", "-efg") = "abcd"

Domain n: real numbers
Domain s1: strings
Domain s2: strings
Range: strings

real(s)
Description: s converted to numeric or missing

Also see strofreal().

real("5.2")+1 = 6.2
real("hello") = .

Domain s: strings
Range: -8e+307 to 8e+307 or missing

regexm(s,re)
Description: performs a match of a regular expression and evaluates to 1 if regular expression re is satisfied by the ASCII string s; otherwise, 0

Regular expression syntax is based on Henry Spencer's NFA algorithm, and this is nearly identical to the POSIX.2 standard. s and re may not contain binary 0 (\0).

regexm() is intended for use with only plain ASCII characters. For Unicode characters beyond the plain ASCII range, the match is based on bytes. For a character-based match, see ustrregexm().
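To make "the match is based on bytes" concrete: é (U+00E9) is one character but two bytes in UTF-8, so a pattern that expects exactly one character succeeds under character-based matching and fails under byte-based matching. The sketch below uses Python's re module on bytes and on str as stand-ins for regexm() and ustrregexm(); Python's engine is neither Henry Spencer's nor ICU's, and the helpers match_bytes() and match_chars() are hypothetical.

```python
import re

# Hypothetical helpers; Python's re module stands in for Stata's regex engines.


def match_bytes(s: str, pattern: str) -> int:
    """Byte-based match on the UTF-8 encoding of s (like regexm()): 1 or 0."""
    return int(re.search(pattern.encode("utf-8"), s.encode("utf-8")) is not None)


def match_chars(s: str, pattern: str) -> int:
    """Character-based match on s (like ustrregexm()): 1 or 0."""
    return int(re.search(pattern, s) is not None)


word = "\u00e9"  # "é": one character, two bytes in UTF-8 (0xC3 0xA9)
print(match_bytes(word, "^.$"), match_chars(word, "^.$"))    # 0 1
print(match_bytes(word, "^..$"), match_chars(word, "^..$"))  # 1 0
```

For plain ASCII input the two helpers always agree, because each ASCII character is exactly one UTF-8 byte.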

Domain s: ASCII strings
Domain re: regular expressions
Range: 0 or 1

regexr(s1,re,s2)
Description: replaces the first substring within ASCII string s1 that matches re with ASCII string s2 and returns the resulting string

If s1 contains no substring that matches re, the unaltered s1 is returned. s1 and the result of regexr() may be at most 1,100,000 characters long. s1, re, and s2 may not contain binary 0 (\0).

regexr() is intended for use with only plain ASCII characters. For Unicode characters beyond the plain ASCII range, the match is based on bytes and the result is restricted to 1,100,000 bytes. For a character-based match, see ustrregexrf() or ustrregexra().

Domain s1: ASCII strings
Domain re: regular expressions
Domain s2: ASCII strings
Range: ASCII strings

regexs(n)
Description: subexpression n from a previous regexm() match, where 0 ≤ n < 10

Subexpression 0 is reserved for the entire string that satisfied the regular expression. The returned subexpression may be at most 1,100,000 characters (bytes) long.

Domain n: 0 to 9
Range: ASCII strings

ustrregexm(s,re[,noc])
Description: performs a match of a regular expression and evaluates to 1 if regular expression re is satisfied by the Unicode string s; otherwise, 0

If noc is specified and not 0, a case-insensitive match is performed. The function may return a negative integer if an error occurs.

ustrregexm("12345", "([0-9]){5}") = 1
ustrregexm("de TRÈS près", "rès") = 1
ustrregexm("de TRÈS près", "Rès") = 0
ustrregexm("de TRÈS près", "Rès", 1) = 1

Domain s: Unicode strings
Domain re: Unicode regular expressions
Domain noc: integers
Range: integers

ustrregexrf(s1,re,s2[,noc])
Description: replaces the first substring within the Unicode string s1 that matches re with s2 and returns the resulting string

If noc is specified and not 0, a case-insensitive match is performed. The function may return an empty string if an error occurs.

ustrregexrf("très près", "rès", "X") = "tX près"
ustrregexrf("TRÈS près", "Rès", "X") = "TRÈS près"
ustrregexrf("TRÈS près", "Rès", "X", 1) = "TX près"

Domain s1: Unicode strings
Domain re: Unicode regular expressions
Domain s2: Unicode strings
Domain noc: integers
Range: Unicode strings

ustrregexra(s1,re,s2[,noc])
Description: replaces all substrings within the Unicode string s1 that match re with s2 and returns the resulting string

If noc is specified and not 0, a case-insensitive match is performed.
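The noc flag and the first-versus-all distinction between ustrregexrf() and ustrregexra() can be mimicked with Python's re module. This is a sketch under stated assumptions: Python's engine is not ICU, the helpers regex_match() and regex_replace() are hypothetical, s2 is inserted literally (no backreference handling is modeled), and Stata's error returns are not reproduced. The printed calls mirror the ustrregexm() and ustrregexrf() examples above.

```python
import re

# Hypothetical helpers modeled on ustrregexm(), ustrregexrf(), and ustrregexra().
# Python's re module is used instead of ICU, so edge cases may differ.


def _flags(noc: int) -> int:
    # A nonzero noc requests a case-insensitive match.
    return re.IGNORECASE if noc else 0


def regex_match(s: str, pattern: str, noc: int = 0) -> int:
    """1 if pattern matches anywhere in s, otherwise 0."""
    return int(re.search(pattern, s, flags=_flags(noc)) is not None)


def regex_replace(s1: str, pattern: str, s2: str, noc: int = 0,
                  first_only: bool = True) -> str:
    # count=1 replaces only the first match (like ustrregexrf());
    # count=0 replaces every match (like ustrregexra()).
    # The lambda inserts s2 literally, so backslashes in s2 are not interpreted.
    return re.sub(pattern, lambda _m: s2, s1,
                  count=1 if first_only else 0, flags=_flags(noc))


print(regex_match("de TRÈS près", "Rès"))                        # 0
print(regex_match("de TRÈS près", "Rès", noc=1))                 # 1
print(regex_replace("TRÈS près", "Rès", "X", noc=1))             # TX près
print(regex_replace("très près", "rès", "X", first_only=False))  # tX pX
```

Folding both replacement variants into one helper with a first_only switch keeps the comparison visible: the only difference between the "first" and "all" behaviors is the count passed to re.sub().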
