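Java performance tuning is not a new topic. This article distills tips and techniques from a range of practical examples: general approach, object creation, multithreading and synchronization, list efficiency, String handling, and strength reduction. Combined with the new JDK 1.4 compiler, they let developers create applications almost as fast as C++...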
[i]The following mantra was first stated about two decades ago in Jon Bentley's "Programming Pearls" column: defer optimization and get your code working first. This wisdom has been amplified by numerous writers on object-oriented design, coding, thinking, and more. The reigning philosophy has been stated as, "get it working first, then determine which areas are the critical ones and optimize only those."[/i]
Since 20% of the code is run 80% of the time, this seems like a reasonable idea. Bentley was an early advocate of the use of profiling tools that show which parts of the code are run the most, and targeting the most critical areas first. On the whole, this is good advice, but the lesson has been learned too well. Today it's common practice to ignore efficiency, scattering layers of unnecessary inefficiency everywhere without thought. This article shows that it's just as easy to write faster code without taking extra development time to do it, and teaches something about the way Java optimizes your code.
Method Calls
Object-oriented design is an organizational technique; at the individual method level, code has been written the same way for the past 40 years. The focus on organization merely breaks code down into small manageable units: classes that contain a number of (typically quite small) methods. In a language such as C++ this is not a problem at all, as the language itself has extensive control over the cost of the method calls. In fact, with the inline directive, the cost of method execution in C++ can drop to zero. In Java, however, the default method definition checks to see what object it is and calls the appropriate method, which is equivalent to a virtual function in C++. This involves overhead: the program must first examine the object to determine its type, select the appropriate method, and then call it. Calling a method is quite slow compared to executing instructions within a method. In fact, as my benchmarks show, a loop that executes n times doing nothing but counting is 50 times faster than one that calls a method that does nothing. (See the benchmark on my Web site, www.righttrak.com/javaperformance/benchmarks.)
What is the cost of a method call and how can you reduce it? A static method costs about three units of time on my Pentium 4 PC, where a unit is defined by an empty counting loop. A final method costs roughly the same, and an ordinary nonfinal method is about three times as expensive. The conclusion is obvious: whenever possible, use the final or static qualifiers on methods (in other words, if you don't intend to override the method, say so).
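As a minimal sketch of that advice, the three kinds of method look like this. The class and method names are hypothetical and are not taken from the benchmark; how much the qualifiers help depends on the JVM in use.

```java
// Hypothetical class for illustration; names are not from the article's benchmark.
class CallKinds {
    private int count;

    // static: there is no receiver object, so no dynamic dispatch is needed
    static int twice(int x) {
        return 2 * x;
    }

    // final: cannot be overridden, so the call target is known in advance
    final int next() {
        return ++count;
    }

    // ordinary instance method: may be overridden, so it is dispatched virtually
    int peek() {
        return count;
    }
}
```

Marking a method final or static costs nothing at the call site, which is why it fits the "no extra development time" theme.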
Let's start by saying that if you want a program to run fast, get JDK 1.4 and run it with optimization turned on:
java -server MyClass
The -server option scans the entire loaded program as it's being run, eliminating methods by inlining them, turning methods into native assembler, removing constant evaluations from loops, and other optimizations. It improves performance, often by a factor of 10 in CPU-intensive bits of code. It might surprise you to think about optimizing programs at run time, but considering that Java runs on different machines, the only way to optimize for your particular processor is at run time.
This feature is new in 1.4. There's a "bad feature" in 1.3 that tends to invoke the JIT compiler lazily, often too late. If you compile the following program and run it under 1.3, any code in main is optimized, but code in f is optimized only after it's called once:
public static void f() {...}
public static void main(String[] args) {
    f();
}
In this case, since f is called only once, it's obvious that the compiler had better optimize f before executing it, or not bother.
The compiler makes certain assumptions about what is worth inlining based on the fact that inlining code can take more memory if the code is big. If you have a big routine, it won't be inlined. This can be very unfortunate in some specialized cases, which is where human intelligence comes in.
Suppose you're writing f, which calls g, which calls h. You're doing this just to break up the code and make it easier to read. However, even though f is the only function that calls g, if g is big enough it won't be inlined. It doesn't matter if there's a big loop involved:
void f() { g(); }
void g() {
    for (int i = 0; i < 100000000; i++)
        ...;
}
because the time it takes to perform the loop dwarfs the time it takes to get in and out, and so the percentage cost of the procedure call is tiny.
There is, however, a case to be made for applying human intelligence to inlining code. Frequently a method is large, but the first line of the method is a test that determines whether or not to execute the rest. Consider a logging routine. If debug mode is on, it should write out. But for all those cases where debug is false, why call the routine at all? Here's an example of this in action:
private boolean debug;
In the first case, f7 calls the logging routine regardless of the state of the debugging flag:
public int f7(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        log(i);
    }
    return sum;
}
In the second case, f8 calls the logging routine only if debug mode is on. At the cost of an extra if statement every time you call the log routine, this code runs 25% faster.
public int f8(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (debug)
            log2(i);
    }
    return sum;
}
Multithreading and Synchronization
Since Java is a multithreaded language, the synchronized primitive is provided to make sure that multiple threads of execution do not destroy objects. When a thread enters a synchronized method, it acquires a lock that's associated with the object and prevents any other synchronized method from entering. Acquiring such a lock is a slow machine language instruction. The result is that calling a synchronized method is three times as slow as an ordinary method, which in turn is three times as slow as a static or final method. The computer must first check whether someone already has the lock, and if not, acquire the lock all in one atomic operation.
public synchronized void f() {...}
Multithreading is a complex topic, and the reader is well advised to read one of the many books on the topic for a full understanding. However, a very quick overview of optimization should begin with the observation that since acquiring locks and managing them is a costly business, we should avoid it unless there is some real reason for their use. Furthermore, multithreading will only result in a gain of efficiency if we can overcome the immediate loss of efficiency that results from calling synchronized methods. In a great many situations, multithreading is not called for, and the simplest and best way to handle the situation is to say that the object is not thread-safe, and that programmers should never make two simultaneous calls to the same object.
On the other hand, if multithreading has a significant advantage, the best way to achieve it, where possible, is to have more than one object and give each thread its own object. As perfect examples of this, consider IO streams in a Web environment. While there may be many threads simultaneously writing Web pages, each one is writing to its own Web page. In such cases, the IO stream for servlets could be implemented as a fast, unsynchronized version of PrintStream.
Sometimes, however, there are applications (like a log) where it's vital that multiple threads be able to write to the same object. In such cases, synchronization is vital for correctness. While we can add a class to the library to support un-thread-safe IO, we must always continue to support thread-safe IO for those few cases where it's important.
If you're going to acquire a lock, do so only once. Planning how locks are acquired and released is not only good optimization practice, it's worth really thinking over, as this is one of those tricky areas where badly thought-out designs are not only slow, but often don't work in very subtle, nonrepeatable ways. These are the hardest possible situations to debug. Because acquiring the lock means that no one else can enter, synchronized critical sections should:
Be as short as possible
Not call other synchronized routines (i.e., do whatever needs to be done in a single synchronized section if possible)
Never allow unsynchronized access to critical data
Never deadlock
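The guidelines above can be sketched as follows. SharedLog is a hypothetical class invented for this illustration: the string formatting is done before the lock is taken, the lock is acquired once per call, the section calls no other synchronized code, and every access to the shared fields goes through the same lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log shared by several threads; not a JDK class.
class SharedLog {
    private final List<String> lines = new ArrayList<>();  // unsynchronized container
    private int entries;

    void write(String source, String message) {
        String line = source + ": " + message;   // formatted with no lock held
        synchronized (this) {                    // lock acquired exactly once
            lines.add(line);                     // calls no other synchronized code
            entries++;
        }
    }

    // Reads of the critical data take the same lock.
    synchronized int entryCount() {
        return entries;
    }

    synchronized String lineAt(int index) {
        return lines.get(index);
    }
}
```

With only one lock and no nested acquisition, there is nothing for the section to deadlock against.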
[i]Case Study[/i]
Simply removing all the synchronization from java.io.PrintWriter and writing a class that is functionally equivalent but not thread safe resulted in a 50% improvement in speed. Class PrintWriter contains synchronized methods that call other synchronized methods, in some cases three deep. The long chains of method invocation before getting to any actual code are a large part of what slows down IO.
Calling Native Methods
You might assume that if you really need speed, you can resort to linking in some C code and call that for the ultimate in performance. The answer may surprise you; it certainly surprised me. Even ignoring the obvious disadvantages of using C (the lack of portability, requiring a shared library to deploy an application, etc.), the simple fact is that calling a native method is twice as slow as an ordinary method call.
Having looked a bit at the implementation of the JDK, I can tell you that while it may be tweaked a bit, the reason is essentially sound: to call a C routine, you must first make a native mode call (that's one) and then set up a call to the underlying C routine; twice as much work, twice as much time, right? And to communicate with anything in the Java environment takes further calls as well, so the only way you'll see a significant speed advantage is by staying in the C world for a while. In short, native methods seem to be totally outclassed at this point by a combination of increasingly good optimization in the Java world and the somewhat inefficient code involved in the communication between the two.
Creating Objects
As a C++ programmer originally, I assumed that the biggest cost I was likely to find was the synchronized method call. I was surprised: the slowest operation by far was the creation of an object. In hindsight it makes perfect sense. Creating an object requires the allocation of memory, including all the overhead for identifying the class of the object, its lock, and the amount of memory being used. After using the object for a time and invoking methods, the garbage collector must eventually free the memory that has been allocated. The act of allocating the memory alone, even when optimized in JDK 1.4, is far more expensive than a synchronized method call. The overriding rule in Java code optimization is simple: don't create unnecessary temporary objects.
In the following example, the first version, which creates only a single object and repeatedly queries it, takes 800 ms, versus 26,300 ms for the second one, which repeatedly re-creates the object. That makes the first version more than 30 times faster. This is an extreme example, of course, because what is being done is very simple compared to the object creation, but it gives an idea of just how costly object creation is.
public int f10(int n) {
    int sum = 0;
    TempThing t = new TempThing(0);
    for (int i = 0; i < n; i++)
        sum += t.getV();
    return sum;
}
public int f11(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        TempThing t = new TempThing(0);
        sum += t.getV();
    }
    return sum;
}
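The listing does not define TempThing, so here is a self-contained sketch for trying the comparison yourself. TempThing is assumed to be a trivial holder for one int, and the harness names are hypothetical. On a current JVM the gap may be far smaller than the figures quoted above, since the JIT can sometimes remove short-lived allocations like this one.

```java
// TempThing is not defined in the article; this minimal holder is an assumption.
class TempThing {
    private final int v;
    TempThing(int v) { this.v = v; }
    int getV() { return v; }
}

class ObjectCreationDemo {
    // One object created up front and queried n times.
    static int reuse(int n) {
        int sum = 0;
        TempThing t = new TempThing(1);
        for (int i = 0; i < n; i++)
            sum += t.getV();
        return sum;
    }

    // A fresh object created on every pass through the loop.
    static int recreate(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            TempThing t = new TempThing(1);
            sum += t.getV();
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        int a = reuse(n);
        long t1 = System.nanoTime();
        int b = recreate(n);
        long t2 = System.nanoTime();
        System.out.println("reuse:    " + (t1 - t0) / 1_000_000 + " ms, sum=" + a);
        System.out.println("recreate: " + (t2 - t1) / 1_000_000 + " ms, sum=" + b);
    }
}
```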
[i]Case Study[/i]
While removing synchronization and streamlining the code path of PrintWriter resulted in a factor of two improvement in performance, eliminating the temporary string created in printing an int resulted in a sixfold performance improvement.
String Manipulation
Many programmers have seen the sequence:
String s = "a" + "b" + "c";
and know that StringBuffer is better:
StringBuffer b = new StringBuffer();
b.append("a").append("b").append("c");
This knowledge seems to break down after this point. If you're processing large strings in StringBuffers, don't then turn them back into strings to pass them to another routine unless you're worried about multithreading problems. As long as you're processing single threaded, you're better off continuing to append into the StringBuffer until you're done. The following routine:
public String getAsXML() {
    StringBuffer b = new StringBuffer();
    b.append(...);
    return b.toString();
}
must make an unnecessary copy in order to turn the StringBuffer into a string. Then, if the caller is going to append more text, this must be appended into yet another StringBuffer. This is a big waste. Instead, try:
public void getAsXML(StringBuffer buf) {
    buf.append(...);
}
where the caller allocates the StringBuffer and passes it to the routine, which fills it. The caller can then continue processing. This approach has another advantage, namely that the caller usually has a much better idea of the total size of the StringBuffer at the end of processing. It is vastly more efficient, if you know how many characters are involved, to preallocate them rather than allow the StringBuffer to start at the default 16 and grow, which requires a lot of copying. For example, if you know the eventual size of the string will be as high as 2K, then:
StringBuffer buf = new StringBuffer(2048);
obj.getAsXML(buf);
will typically result in approximately 100% performance improvement over the original code. It's far better to overallocate than to underallocate and require a grow operation. Remember, this works only if the string in question is not being assaulted by multiple threads.
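Put together, the caller-allocated pattern might look like the sketch below. XmlPoint, XmlDemo, the XML layout, and the estimate of 32 characters per element are all assumptions made for illustration; only the StringBuffer calls are real JDK API.

```java
// Hypothetical record class; the field names and XML layout are illustrative only.
class XmlPoint {
    private final int x, y;

    XmlPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Appends this object's XML form to a buffer owned by the caller.
    void getAsXML(StringBuffer buf) {
        buf.append("<point x=\"").append(x)
           .append("\" y=\"").append(y).append("\"/>");
    }
}

class XmlDemo {
    static String render(XmlPoint[] points) {
        // The caller estimates the total size: assume about 32 characters per element.
        StringBuffer buf = new StringBuffer(32 * points.length + 32);
        buf.append("<points>");
        for (int i = 0; i < points.length; i++)
            points[i].getAsXML(buf);   // no intermediate String per element
        buf.append("</points>");
        return buf.toString();         // a single conversion at the very end
    }
}
```

Each element writes straight into the one buffer, so the only String created is the final result.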
Manipulating strings, even optimized ones, takes a fair amount of work and code, even if the length is one. If you're processing a single character, using a char is much faster, so:
buf.append('\n');
is significantly faster than:
buf.append("\n");
Efficient Use of Lists
Java provides a fairly rich set of data structures. They're not all the same, and while they may work interchangeably, that doesn't mean they're all equally good in all circumstances. To build up a list in order, ArrayList is faster than LinkedList by a factor of two. LinkedList is substantially slower because each node requires the creation of an object. Vector is a close second in speed; it's slower because it's a synchronized data structure. However, in situations where values are to be inserted in the middle of the list (or worse still, the beginning), LinkedList is the best by orders of magnitude since it does not have to constantly copy elements to move them aside.
ArrayList v = new ArrayList(n);
for (int i = 0; i < n; i++)
    v.add(new Integer(i));
return v.size();
While both Vector and ArrayList grow geometrically (Vector doubles by default; ArrayList grows by about one and a half times), adaptively grabbing larger and larger chunks every time the size is exceeded, each time they grow an enormous expense is incurred. As with StringBuffer, it's about twice as fast to preallocate as much space as you'll need than to grow later, even if you overallocate.
Last, remembering that object allocation is the slowest activity of all, you can easily see that this list, which must create object wrappers for each int, is vastly inefficient. The following code, using a list written just for int elements (see my Web site for the code), runs a full four-and-a-half times faster than ArrayList.
IntArrayList a = new IntArrayList(n);
for (int i = 0; i < n; i++)
    a.add(i);
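The author's IntArrayList lives on his Web site and is not reproduced here. As a rough idea of what such a class involves, here is a hypothetical stand-in built on a plain int[]; it is an assumption about the general shape, not the author's implementation.

```java
// A hypothetical stand-in for the IntArrayList mentioned above; not the author's code.
class IntArrayList {
    private int[] data;
    private int size;

    IntArrayList(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    // Stores the primitive directly; no Integer wrapper object is created.
    void add(int value) {
        if (size == data.length) {
            // Doubling growth, as a fallback when the caller underestimated.
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        return data[index];
    }

    int size() {
        return size;
    }
}
```

When the initial capacity is right, add is a bounds check and a single array store.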
For scanning through an existing list, ArrayList is the fastest of the JDK list classes; getting an element from an array is a trivial operation, so in a synchronized class such as Vector, synchronization dominates the time. Here, LinkedList can be monstrously slow if you use it incorrectly. Since LinkedList is not a random-access data structure, calling get(i) means it must start from an end of the list and scan node by node until it reaches position i. A loop that scans through the entire list is therefore not an O(n) operation, but O(n^2). For a list of 100,000 elements, my computer performed the ArrayList traversal in 3.25 milliseconds. LinkedList traversal took an astounding 113,657 milliseconds, or 34,971 times slower.
LinkedList l = l1;
for (int i = 0; i < n; i++)
    l.get(i);
The correct way to code traversal through a LinkedList is to use the iterator design pattern:
LinkedList l = l1;
for (Iterator i = l.listIterator(); i.hasNext(); )
    i.next();
The lesson to be learned here is that it pays to understand your data structures well. Just choosing the right data structure for your situation can pay enormous dividends. And using one incorrectly, as in the case of LinkedList traversal, can be very costly.
Last, if you want to store a list of primitives, the best way would be to have classes designed for the purpose, like IntArrayList. No one wants to go to the expense of writing and maintaining all permutations of lists for all the primitive data types; this is one reason Java needs a high-quality template facility like C++. That's a topic for another day, but one that I hope to revisit in a future article. For now, a friend and I are proposing some primitive list classes to add to the Java library, because when you do want a list of primitives, there's no substitute for a decent data structure.
Maps
HashMap is quite a bit faster than the older Hashtable, mostly by virtue of not being synchronized. However, the algorithm used is still less than optimal. To analyze it further, you have to look into HashMap's source code, and know a bit about hashing algorithms. In general, a hash algorithm is fast because it "hashes" the key and turns it directly into the location of the bin where the value is stored, making it an O(1) operation. The problem comes when two different keys happen to hash to the same bin. Statistically, this happens fairly often, and it's the job of the writer of the HashMap to reduce it as much as possible.
Collisions cannot be totally eliminated in the general case, so the design of hash algorithms must allow for them. Therefore, each bin in the HashMap is essentially a linked list for all the keys and values that could hypothetically end up there. This means that every time you add an element to a HashMap, you're once again creating an object that holds the key, the value, and a reference to the next node in case any more values happen to land in the same bin.
Object creation is the most expensive operation possible, so I've tried a different approach and have on my site a couple of experimental classes that perform twice as fast as HashMap (FastHashMap) or four times as fast if your key is an int (FastIntHashMap). They do, however, achieve part of their spectacular speed by not checking the size each time a new element is added, so you must allocate the right size table in advance.
As with all other Java data structures, if you add too many elements to a Hashtable or HashMap, they grow. This is the worst thing you can do, since growing requires painfully reinserting every element. Hashing requires about 25 to 30% more bins than there are elements for efficient operation. Always preallocate what you think is the right size for your Hashtable, be generous, and check at the end to be sure you were right and that the table did not have to grow.
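For java.util.HashMap specifically, preallocation can be sketched as below. The default load factor of 0.75 is documented JDK behavior; the helper name capacityFor and the demo class are hypothetical. With that load factor the table needs roughly a third more bins than entries, a little above the rule of thumb just given.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMapDemo {
    // An initial capacity large enough to hold `expected` entries without a
    // resize, given java.util.HashMap's default load factor of 0.75.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    static Map<Integer, String> build(int n) {
        Map<Integer, String> m = new HashMap<>(capacityFor(n));
        for (int i = 0; i < n; i++)
            m.put(i, "value" + i);
        return m;
    }
}
```

Passing the expected entry count itself as the capacity would still trigger a resize once the map is about three-quarters full, so the division matters.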
Last, because the hash algorithm for strings looks at every character in the string, avoid hashing large strings if at all possible. The smaller the string, the faster the hash.
Strength Reduction
Turning slow machine language instructions into equivalent but faster ones is traditionally the job of a peephole optimizer in a compiler; the optimizer looks at a window of instructions coming out of the code generator and makes judicious substitutions. In the Java environment there are two stages at which peephole optimizations can be done. One is during compile time when the source code is turned into JVM code; the other is when the code is run and the JIT turns JVM code into native assembler. The latter is the approach chosen by Sun, because that way they can optimize code for the particular processor running the code.
Having admitted that most strength reductions are things compilers should do, if your compiler doesn't do them (and Java didn't used to), then it's up to you to do them yourself. In doing so, there are a number of issues: Will the resulting code be as simple as or simpler than the original? Gaining a little speed while losing understandability is not a great bargain. Will the resulting code be faster? Programmers often assume they're optimizing, when in fact they're doing the reverse. This kind of clock cycle counting is certainly better done by a compiler, with knowledge of the target CPU and environment if at all possible. The good news is JDK 1.4 now does some strength reduction. It's up to you to decide how much speed you need now.
First, what not to do. Multiplications by a constant power of 2 are automatically converted to shifts by the compiler:
x * 2    ->  x << 1
x * 16   ->  x << 4
More complex, but not worth it, are multiplications by constants:
x * 10   ->  (x << 3) + (x << 1)
Divisions are not supported at the moment, but will be soon. If you need the speed right now, replacing a division by a power of two with a shift makes the division itself five or six times faster.
x / 2    ->  x >> 1
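A small sketch makes the identities above concrete, along with one caveat that follows from Java's integer semantics: >> rounds toward negative infinity while / truncates toward zero, so the division rewrite is only safe when x is not negative. The class and method names are hypothetical.

```java
class ShiftDemo {
    // x * 10 written as 8x + 2x
    static int timesTen(int x) {
        return (x << 3) + (x << 1);
    }

    // x / 2 written as a shift; matches the division only when x is not negative
    static int halfByShift(int x) {
        return x >> 1;
    }

    public static void main(String[] args) {
        System.out.println(timesTen(7));                      // 70
        System.out.println(halfByShift(7) + " " + (7 / 2));   // 3 3
        System.out.println(halfByShift(-3) + " " + (-3 / 2)); // -2 -1
    }
}
```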
A much more important strength reduction, and one that the JIT is not likely to detect in the near future, also involves division. Often, programmers want to go around a loop, but do something different every n times. One standard trick is to count and take the counter modulo n, as in the following example:
for (int i = 0; i < 100000; i++)
    if (i % 10 == 9) {
        // do something every tenth time
    }
This is slow; the following is four to five times faster:
for (int i = 0, j = 10; i < 100000; i++, j--) {
    if (j == 0) {
        // do something every tenth time
        j = 10; // restart the count
    }
}
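To check that the countdown really does the same work as the modulo test, the two loops can be compared directly. In this sketch (hypothetical class and method names) j starts at 9 rather than 10 so that the branch fires on exactly the same iterations as i % 10 == 9. Starting at 10, as in the listing above, fires one iteration later (i = 10, 20, ...), which is still every tenth time.

```java
class EveryTenthDemo {
    // Counts how often the "every tenth time" branch fires using the modulo test.
    static int withModulo(int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (i % 10 == 9)
                hits++;
        }
        return hits;
    }

    // Same loop with a countdown; j starts at 9 so the branch fires at i = 9, 19, ...
    static int withCountdown(int n) {
        int hits = 0;
        for (int i = 0, j = 9; i < n; i++, j--) {
            if (j == 0) {
                hits++;
                j = 10;   // restart the count
            }
        }
        return hits;
    }
}
```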
Similarly, if you have code in a loop like:
j = (j + 1) % n;
// j should always end up between 0 and n-1
it's much faster to write:
if (++j == n)
    j = 0;
In general, for any positive number x, x % n is equivalent to x & (n-1) if n is a power of two. So x % 8 is the same as x & 7 as long as x is positive. Using the & operator is a lot faster.
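Both tricks fit in a few lines. The sketch below (hypothetical names) guards the mask version with explicit checks, because for negative x the two expressions differ in Java: -1 % 8 is -1, while -1 & 7 is 7.

```java
class MaskDemo {
    // True when n is a positive power of two (exactly one bit set).
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // x mod n for non-negative x and power-of-two n, using a mask instead of %.
    static int modPow2(int x, int n) {
        if (x < 0 || !isPowerOfTwo(n))
            throw new IllegalArgumentException("need x >= 0 and n a power of two");
        return x & (n - 1);
    }

    // Wrap-around increment without %: the result stays in the range 0 .. n-1.
    static int nextIndex(int j, int n) {
        if (++j == n)
            j = 0;
        return j;
    }
}
```

In a hot loop you would inline the mask and drop the checks; they are here only to make the preconditions visible.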
Summary
All the performance enhancements in this article have involved the application of simple techniques to make individual sections of code faster. If you learn these tricks and apply them everywhere as a matter of course, your code can get significantly faster without a lot of effort. These techniques, combined with JDK 1.4 and the next generation of Java compilers, are going to take us within a hair's breadth of being as fast as a well-written C++ application, and most applications in C++ are not well written. The world will enjoy the resulting crisp handling of the programs to come. Get out there and write something great.
Author Bio
Dov Kruger is president of RightTRAK, Inc., a consulting and training company focusing on Java, object-oriented, and Web-based technologies. He's currently working on improving the performance of dynamic Web pages with graphics, internationalization, and some of the Java libraries.