rockbox

History

Jens Arnold d7e4e54bcb Reorder instructions to avoid pipeline stalls on ARMv6 wherever possible (sometimes using different registers to allow this). Speeds up the predictor by almost 20% on ARMv6 (overall speedup for -c1000 is 5%), and might also help a bit on ARMv5. ARMv4 speed is unaffected. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19210 a1c6a512-1295-4272-9138-f99709370657		2008-11-24 23:09:09 +00:00
..
crc.c
decoder.c	Compile-time choice between 16 bit and 32 bit integers for the filters. 32 bit filters are faster on ARMv4 (with assembler code), so use them there. Nice speedup on PP and Gigabeat F/X.	2008-11-19 00:34:48 +00:00
decoder.h
demac.h
demac_config.h	Branch optimisation in both C (giving hints to gcc - verified using -fprofile-arcs and gcov) and asm files. Biggest effect on coldfire (-c1000: +8%, -c2000: +5%), but ARM also profits a bit (less than 1% on ARM7TDMI, around 1% on ARM1136).	2008-11-24 18:40:49 +00:00
entropy.c	Branch optimisation in both C (giving hints to gcc - verified using -fprofile-arcs and gcov) and asm files. Biggest effect on coldfire (-c1000: +8%, -c2000: +5%), but ARM also profits a bit (less than 1% on ARM7TDMI, around 1% on ARM1136).	2008-11-24 18:40:49 +00:00
entropy.h
filter.c	Branch optimisation in both C (giving hints to gcc - verified using -fprofile-arcs and gcov) and asm files. Biggest effect on coldfire (-c1000: +8%, -c2000: +5%), but ARM also profits a bit (less than 1% on ARM7TDMI, around 1% on ARM1136).	2008-11-24 18:40:49 +00:00
filter.h	Compile-time choice between 16 bit and 32 bit integers for the filters. 32 bit filters are faster on ARMv4 (with assembler code), so use them there. Nice speedup on PP and Gigabeat F/X.	2008-11-19 00:34:48 +00:00
filter_16_11.c
filter_32_10.c
filter_64_11.c
filter_256_13.c
filter_1280_15.c
Makefile
parser.c
parser.h	Centralise compile-time configuration.	2008-11-16 17:49:37 +00:00
predictor-arm.S	Reorder instructions to avoid pipeline stalls on ARMv6 wherever possible (sometimes using different registers to allow this). Speeds up the predictor by almost 20% on ARMv6 (overall speedup for -c1000 is 5%), and might also help a bit on ARMv5. ARMv4 speed is unaffected.	2008-11-24 23:09:09 +00:00
predictor-cf.S	Branch optimisation in both C (giving hints to gcc - verified using -fprofile-arcs and gcov) and asm files. Biggest effect on coldfire (-c1000: +8%, -c2000: +5%), but ARM also profits a bit (less than 1% on ARM7TDMI, around 1% on ARM1136).	2008-11-24 18:40:49 +00:00
predictor.c	Branch optimisation in both C (giving hints to gcc - verified using -fprofile-arcs and gcov) and asm files. Biggest effect on coldfire (-c1000: +8%, -c2000: +5%), but ARM also profits a bit (less than 1% on ARM7TDMI, around 1% on ARM1136).	2008-11-24 18:40:49 +00:00
predictor.h
SOURCES	APE codec: Assembler optimised predictor for coldfire. Heavily based on the arm version atm, instruction reordering will probably allow for a bit more speedup soon. Speedup: -c1000: 177% -> 210%, -c2000: 135% -> 147%, -c3000: 97% -> 103%.	2007-10-19 21:35:07 +00:00
vector_math16_armv5te.h	Several tweaks and cleanups: * Use .rept instead of repeated macros for repeating blocks. * Use MUL (variant) instead of MLA (variant) in the first step of the ARM scalarproduct() if there's no loop. * Unroll ARM assembler functions to 32 where not already done, plus the generic scalarproduct().	2008-11-19 21:31:33 +00:00
vector_math16_armv6.h	Tweak the ARMv6 filter assembly a bit further.	2008-11-24 18:40:43 +00:00
vector_math16_cf.h	Several tweaks and cleanups: * Use .rept instead of repeated macros for repeating blocks. * Use MUL (variant) instead of MLA (variant) in the first step of the ARM scalarproduct() if there's no loop. * Unroll ARM assembler functions to 32 where not already done, plus the generic scalarproduct().	2008-11-19 21:31:33 +00:00
vector_math32_armv4.h	Several tweaks and cleanups: * Use .rept instead of repeated macros for repeating blocks. * Use MUL (variant) instead of MLA (variant) in the first step of the ARM scalarproduct() if there's no loop. * Unroll ARM assembler functions to 32 where not already done, plus the generic scalarproduct().	2008-11-19 21:31:33 +00:00
vector_math_generic.h	Several tweaks and cleanups: * Use .rept instead of repeated macros for repeating blocks. * Use MUL (variant) instead of MLA (variant) in the first step of the ARM scalarproduct() if there's no loop. * Unroll ARM assembler functions to 32 where not already done, plus the generic scalarproduct().	2008-11-19 21:31:33 +00:00