rockbox

Author	SHA1	Message	Date
Jens Arnold	a29b659758	Assembler optimised mono predictor for ARM. Speedup for -c1000 mono is ~5% on PP, ~8% on Gigabeat S (less for higher compression levels). Also fix some overlooked comments in the stereo predictor. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19375 a1c6a512-1295-4272-9138-f99709370657	2008-12-09 23:20:59 +00:00
Jens Arnold	c1cd0469ca	Implement mono predictor in assembler for coldfire, yielding a ~6% speedup for mono -c1000. Apply ideas gained from it back to the stereo predictor, saving 4 instructions. No speed increase for stereo, probably due to cache aliasing effects. * 80-column police. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19296 a1c6a512-1295-4272-9138-f99709370657	2008-12-02 02:26:04 +00:00
Jens Arnold	75bd4adbc2	Shuffling the register allocation around allows decoded0 and decoded1 to be kept in registers, for a slight speedup. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19287 a1c6a512-1295-4272-9138-f99709370657	2008-12-01 13:21:06 +00:00
Jens Arnold	89a6fe7ae4	Remove extraneous semicolons, and fix a comment. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19268 a1c6a512-1295-4272-9138-f99709370657	2008-11-30 11:54:20 +00:00
Jens Arnold	797ef6585a	Fix APE 16-bit mono output: mono signals need to be scaled for rockbox. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19264 a1c6a512-1295-4272-9138-f99709370657	2008-11-30 01:01:04 +00:00
Jens Arnold	88270f7622	Resurrect the ARM7 16-bit packed vector addition/subtraction for ARMv5, giving a nice speedup for the higher compression levels (tested on Cowon D2). git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19260 a1c6a512-1295-4272-9138-f99709370657	2008-11-28 23:50:22 +00:00
Jens Arnold	113c285045	On ARM9TDMI (e.g. Gigabeat F) it's faster to use a ldr/str pair than add+ldmia/stmia for 2 registers. On ARM7TDMI a str pair is equally fast, so go for the simpler macro and use it for all ARMv4. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19250 a1c6a512-1295-4272-9138-f99709370657	2008-11-27 22:07:46 +00:00
Jens Arnold	6d34e33b94	Speed up the predictor a little by using ldrd/strd on ARMv5+. This required shuffling around the register allocation somewhat. Performance on ARMv4 is unaffected. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19248 a1c6a512-1295-4272-9138-f99709370657	2008-11-27 20:52:23 +00:00
Jens Arnold	5b0d74a7d3	Get rid of unused return values, except the one from decode_chunk() which will be used in the dual core split. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19236 a1c6a512-1295-4272-9138-f99709370657	2008-11-26 18:01:18 +00:00
Jens Arnold	d7e4e54bcb	Reorder instructions to avoid pipeline stalls on ARMv6 wherever possible (sometimes using different registers to allow this). Speeds up the predictor by almost 20% on ARMv6 (overall speedup for -c1000 is 5%), and might also help a bit on ARMv5. ARMv4 speed is unaffected. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19210 a1c6a512-1295-4272-9138-f99709370657	2008-11-24 23:09:09 +00:00
Jens Arnold	3761c0108c	Branch optimisation in both C (giving hints to gcc - verified using -fprofile-arcs and gcov) and asm files. Biggest effect on coldfire (-c1000: +8%, -c2000: +5%), but ARM also profits a bit (less than 1% on ARM7TDMI, around 1% on ARM1136). git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19199 a1c6a512-1295-4272-9138-f99709370657	2008-11-24 18:40:49 +00:00
Jens Arnold	66c0cf2eb1	Tweak the ARMv6 filter assembly a bit further. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19198 a1c6a512-1295-4272-9138-f99709370657	2008-11-24 18:40:43 +00:00
Björn Stenberg	c6b3d38a15	New makefile solution: A single invocation of 'make' to build the entire tree. Fully controlled dependencies give faster and more correct recompiles. Many #include lines adjusted to conform to the new standards. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19146 a1c6a512-1295-4272-9138-f99709370657	2008-11-20 11:27:31 +00:00
Jens Arnold	2a5053f58c	Several tweaks and cleanups: * Use .rept instead of repeated macros for repeating blocks. * Use MUL (variant) instead of MLA (variant) in the first step of the ARM scalarproduct() if there's no loop. * Unroll ARM assembler functions to 32 where not already done, plus the generic scalarproduct(). git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19144 a1c6a512-1295-4272-9138-f99709370657	2008-11-19 21:31:33 +00:00
Jens Arnold	77934cbc96	Compile-time choice between 16 bit and 32 bit integers for the filters. 32 bit filters are faster on ARMv4 (with assembler code), so use them there. Nice speedup on PP and Gigabeat F/X. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19140 a1c6a512-1295-4272-9138-f99709370657	2008-11-19 00:34:48 +00:00
Jens Arnold	1b14167861	Centralise compile-time configuration. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19121 a1c6a512-1295-4272-9138-f99709370657	2008-11-16 17:49:37 +00:00
Jens Arnold	b5c0afc442	Move the contents of rangecoding.h into entropy.c, and remove the former. It was only used there, and defined some variables in the .h git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19116 a1c6a512-1295-4272-9138-f99709370657	2008-11-16 12:58:15 +00:00
Jens Arnold	5ba11af855	Avoid unnecessary register copies on ARMv5. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19112 a1c6a512-1295-4272-9138-f99709370657	2008-11-16 10:12:38 +00:00
Dave Chapman	3e8a2bfa12	Make the standalone demac program compile again git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19107 a1c6a512-1295-4272-9138-f99709370657	2008-11-15 00:35:07 +00:00
Jens Arnold	9a0224fd28	Fix comments. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19102 a1c6a512-1295-4272-9138-f99709370657	2008-11-12 18:20:25 +00:00
Jens Arnold	60e16e8e7a	Tiny speedup by simplifying the filter wrap check. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19101 a1c6a512-1295-4272-9138-f99709370657	2008-11-12 18:16:27 +00:00
Jens Arnold	1600e4918e	Tiny performance improvement for the (not yet usable) compression levels >= -c2000 on ARM7TDMI, utilizing the multiplier's early termination. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19099 a1c6a512-1295-4272-9138-f99709370657	2008-11-12 09:18:36 +00:00
Jens Arnold	fe04e40be7	Further optimised (vs. libgcc) unsigned 32 bit division for ARMv4 (based on the ARMv5(+) version from libgcc), in IRAM on PP for better performance on PP5002, and put into the codeclib for possible reuse. APE -c1000 is now usable on both PP502x and PP5002 (~138% realtime, they're on par now). Gigabeat F/X should also see an APE speedup. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19009 a1c6a512-1295-4272-9138-f99709370657	2008-11-05 00:10:05 +00:00
Jens Arnold	7a835ee0c6	Some entropy decoder tweaks. Also removed unnecessary 'tmp' variables. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@19008 a1c6a512-1295-4272-9138-f99709370657	2008-11-04 23:46:04 +00:00
Jens Arnold	dd7cacdc88	Another minor improvement: better pipelining and one less register used in vector addition/subtraction. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18739 a1c6a512-1295-4272-9138-f99709370657	2008-10-07 20:52:42 +00:00
Jens Arnold	6b84f60046	APE: Further ARMv6 filter optimisations: Save 4 'ror's per round by utilising the shift feature of the 'pack halfword' instructions in the unaligned vector addition/subtraction, better pipelining in the aligned scalarproduct(), and a new method to calculate the unaligned scalarproduct(). git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18736 a1c6a512-1295-4272-9138-f99709370657	2008-10-07 19:40:17 +00:00
Jens Arnold	6219f4c862	Fix warnings on non-arm targets. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18702 a1c6a512-1295-4272-9138-f99709370657	2008-10-03 21:52:47 +00:00
Jens Arnold	d1b19be423	Various speedups: (1) Put actual decoding functions into IRAM on PP5002. (2) Put the insane filter buffer into IRAM on coldfire and PP502x (just for completeness, as long as there's no better use). (3) Use the ARMv6 'ssat' instruction for saturation on Gigabeat S. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18701 a1c6a512-1295-4272-9138-f99709370657	2008-10-03 21:40:32 +00:00
Jens Arnold	5d29f5188f	Put the rangecoder struct into IRAM for a nice speedup on coldfire. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18699 a1c6a512-1295-4272-9138-f99709370657	2008-10-03 15:54:34 +00:00
Jens Arnold	d456460707	Further speedup for ARMv6 by better pipelining in scalarproduct(). git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18697 a1c6a512-1295-4272-9138-f99709370657	2008-10-03 12:30:18 +00:00
Jens Arnold	67554591d9	Fix static noise on armv6. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18694 a1c6a512-1295-4272-9138-f99709370657	2008-10-03 10:09:57 +00:00
Jens Arnold	46bf6bd493	Add preliminary ARMv5te optimisations (verified working, but can probably be sped up further), and fix a comment in the ARMv6 code. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18693 a1c6a512-1295-4272-9138-f99709370657	2008-10-03 09:33:36 +00:00
Jens Arnold	6fcf2765dd	Add armv6 specific asm code for the APE filters, speeding up -c2000..-c5000 a bit on Gigabeat S. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@18692 a1c6a512-1295-4272-9138-f99709370657	2008-10-03 08:54:34 +00:00
Dave Chapman	f026c0fc82	Remove unnecessary #include - this fixes compilation of the standalone demac tool git-svn-id: svn://svn.rockbox.org/rockbox/trunk@15320 a1c6a512-1295-4272-9138-f99709370657	2007-10-26 21:17:37 +00:00
Jens Arnold	35f23267bf	Further optimised the filter vector math assembly for coldfire, and added assembly filter vector math for ARM. Both make use of the fact that the first argument of the vector functions is longword aligned. * The ARM version is tailored for ARM7TDMI, and would slow down arm9 or higher. Introduced a new CPU_ macro for ARM7TDMI. Speedup for coldfire: -c3000 104%->109%, -c4000 43%->46%, -c5000 1.7%->2.0%. Speedup for PP502x: -c2000 66%->75%, -c3000 37%->48%, -c4000 11%->18%, -c5000 2.5%->3.7% git-svn-id: svn://svn.rockbox.org/rockbox/trunk@15302 a1c6a512-1295-4272-9138-f99709370657	2007-10-25 18:58:16 +00:00
Jens Arnold	3ea3caf341	* Flip argument order for scalarproduct() so that the first argument is always 32 bit aligned, as is already the case for vector_add() and vector_sub(), for upcoming optimisations. * Un-inline the apply_filter functions for better cache utilisation. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@15301 a1c6a512-1295-4272-9138-f99709370657	2007-10-25 18:45:28 +00:00
Jens Arnold	87f5359d60	Shuffle some instructions around for that extra percent of performance. Fix a bunch of comments. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@15216 a1c6a512-1295-4272-9138-f99709370657	2007-10-19 22:57:19 +00:00
Jens Arnold	5d066590cc	APE codec: Assembler optimised predictor for coldfire. Heavily based on the ARM version at the moment; instruction reordering will probably allow for a bit more speedup soon. Speedup: -c1000: 177% -> 210%, -c2000: 135% -> 147%, -c3000: 97% -> 103%. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@15211 a1c6a512-1295-4272-9138-f99709370657	2007-10-19 21:35:07 +00:00
Jens Arnold	2e9c77cc2a	APE codec: Further optimised filtering yields 3..4% speedup for -c2000 (now 135% realtime), -c3000 (now 97% realtime) and higher modes. Single 32 bit stores are faster than movem/lea in IRAM. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@15200 a1c6a512-1295-4272-9138-f99709370657	2007-10-19 07:30:55 +00:00
Jens Arnold	2640bdb262	APE codec: Assembler optimised vector math routines for coldfire. -c2000 is now usable at 130% realtime (was 107%), -c3000 is near realtime (93%, was 64%). -c1000 doesn't change. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@15194 a1c6a512-1295-4272-9138-f99709370657	2007-10-18 22:37:33 +00:00
Dave Chapman	cee61b57c8	Remove some unused code git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13630 a1c6a512-1295-4272-9138-f99709370657	2007-06-14 22:35:01 +00:00
Dave Chapman	283738086d	Oops, forgot to set keywords prop git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13627 a1c6a512-1295-4272-9138-f99709370657	2007-06-13 22:03:28 +00:00
Dave Chapman	6b713820c1	ARM assembler predictor decoding function. This increases my -c1000 test track from around 94% realtime on an ipod to around 104% realtime, but yields only a tiny speedup (453% to 455%) on the Gigabeat. Including this optimisation, total decoding time for my 245.70s -c1000 test track on an ipod is 236.06s, with the predictor decoding taking 51.40s of that time - meaning the predictor decoding is only about 22% of the total decoding time. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13626 a1c6a512-1295-4272-9138-f99709370657	2007-06-13 22:02:34 +00:00
Dave Chapman	601ede7f9c	C optimisations to the predictor decoding - create a single function for decoding stereo streams, and reorganise to minimise the number of variables used. My -c1000 test track now decodes at 93% realtime on PortalPlayer (was 78%), 187% on Coldfire (was 170%) and 447% on Gigabeat (was 408%). git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13608 a1c6a512-1295-4272-9138-f99709370657	2007-06-10 08:55:16 +00:00
Dave Chapman	6131996538	Define and use a local APE_MAX function to make the standalone demac decoder compile again. git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13601 a1c6a512-1295-4272-9138-f99709370657	2007-06-09 00:58:15 +00:00
Dave Chapman	7b1d90a851	Seeking and resume support for Monkey's Audio git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13597 a1c6a512-1295-4272-9138-f99709370657	2007-06-08 22:35:26 +00:00
Dave Chapman	c995ae8026	Make v3.97 APE files work in Rockbox git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13571 a1c6a512-1295-4272-9138-f99709370657	2007-06-06 17:46:49 +00:00
Dave Chapman	8dcd6058c8	Correct a comment (thanks to Markun for spotting) git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13569 a1c6a512-1295-4272-9138-f99709370657	2007-06-06 08:53:55 +00:00
Dave Chapman	520274219a	Initial commit of Monkey's Audio (.ape/.mac) support. Note that Monkey's is an extremely CPU-intensive codec, and that the decoding speed is directly related to the compression level (-c1000, -c2000, -c3000, -c4000 or -c5000) used when encoding the file. Current performance is: -c1000 to -c3000 are realtime on a Gigabeat, -c1000 is realtime on Coldfire targets (H100, H300 and Cowon), and nothing is realtime on PortalPlayer targets (iPods, H10, Sansa). Hopefully this can be improved. More information at FS #7256 . git-svn-id: svn://svn.rockbox.org/rockbox/trunk@13562 a1c6a512-1295-4272-9138-f99709370657	2007-06-05 16:58:29 +00:00

49 commits