Conversation
There was a problem hiding this comment.
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (6)
src/jpeg/bit_reader.rs:62
- [nitpick] The change from returning u16 to u32 in the read function should be reviewed for consistency and correctness.
pub fn read(&mut self, bits_to_read: u32) -> Result<u32> {
src/jpeg/bit_reader.rs:62
- Ensure that the new behavior introduced in the read function is covered by tests.
pub fn read(&mut self, bits_to_read: u32) -> Result<u32> {
src/helpers.rs:105
- The condition
(value << 1) > shiftedmight not correctly handle edge cases wherevalueis close to the maximum value foru32. Consider usingvalue & (shifted >> 1) != 0as in the original code.
if (value << 1) > shifted {
src/jpeg/jpeg_read.rs:687
- The conversion of the result of
bit_reader.read(1)?tousizeshould be carefully reviewed to ensure it does not introduce any issues.
node = ctree.node[usize::from(node)][bit_reader.read(1)? as usize];
src/jpeg/jpeg_read.rs:903
- The change from
u16tou32in thedecode_eobrun_bitsfunction should be reviewed to ensure it does not introduce any issues.
fn decode_eobrun_bits(s: u8, n: u32) -> u32 {
src/jpeg/jpeg_read.rs:722
- The conversion of
literal_bitstou32in theread_dcfunction should be reviewed to ensure it does not introduce any issues.
let value = bit_reader.read(literal_bits)?;
| /// with as many bits as possible, leaving the possibility of unwinding via the undo_read_ahead | ||
| /// function in certain corner cases. | ||
| #[inline(never)] | ||
| #[cold] |
There was a problem hiding this comment.
Is this really helps with performance? By the rule of thumb refill should be a bit too frequent to being cold. I'll check on my box.
There was a problem hiding this comment.
The tradeoff was with having more registers free for the inner loop, but it's possible that it's faster without the cold. fill_register is definitely cold since it gets called almost never.
There was a problem hiding this comment.
With cold:
ivan@ivan-5950:~/lepton_jpeg_rust$ sudo rm images/img_52MP_7k2.lep; sudo perf stat -B -e cache-references,cache-misses,cycles,ic_fetch_stall.ic_stall_back_pressure,stalled-cycles-frontend,instructions,branch-instructions,branch-misses,ic_fetch_stall.ic_stall_any,ic_fetch_stall.ic_stall_dq_empty,l2_cache_misses_from_ic_miss,l2_latency.l2_cycles_waiting_on_fills,faults,migrations taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep
2025-01-12T14:26:43.217Z INFO [lepton_jpeg::structs::lepton_file_writer] compressing to Lepton format
2025-01-12T14:26:43.583Z INFO [lepton_jpeg::structs::lepton_file_writer] Number of threads: 8
2025-01-12T14:26:44.712Z INFO [lepton_jpeg::structs::lepton_file_writer] worker threads 7732ms of CPU time in 1127ms of wall time
2025-01-12T14:26:44.712Z INFO [lepton_jpeg::structs::lepton_file_writer] decompressing to verify contents
2025-01-12T14:26:46.304Z INFO [lepton_jpeg_util] compressed input 22171278, output 17324074 bytes (compression = 28.0%)
2025-01-12T14:26:46.304Z INFO [lepton_jpeg_util] Main thread CPU: 3095ms, Worker thread CPU: 18774 ms, walltime: 3095 ms
Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep':
857 527 770 cache-references (25,32%)
76 481 649 cache-misses # 8,92% of all cache refs (25,47%)
14 095 457 313 cycles (25,47%)
865 595 905 ic_fetch_stall.ic_stall_back_pressure (25,51%)
935 546 969 stalled-cycles-frontend # 6,64% frontend cycles idle (25,31%)
34 775 075 664 instructions # 2,47 insn per cycle
# 0,03 stalled cycles per insn (25,24%)
3 665 427 290 branch-instructions (25,05%)
140 713 890 branch-misses # 3,84% of all branches (25,07%)
5 325 860 720 ic_fetch_stall.ic_stall_any (25,28%)
39 528 616 ic_fetch_stall.ic_stall_dq_empty (25,38%)
62 527 706 l2_cache_misses_from_ic_miss (25,24%)
1 842 115 928 l2_latency.l2_cycles_waiting_on_fills (25,12%)
188 277 faults
1 migrations
3,129010978 seconds time elapsed
2,821312000 seconds user
0,302604000 seconds sys
Without cold:
ivan@ivan-5950:~/lepton_jpeg_rust$ sudo rm images/img_52MP_7k2.lep; sudo perf stat -B -e cache-references,cache-misses,cycles,ic_fetch_stall.ic_stall_back_pressure,stalled-cycles-frontend,instructions,branch-instructions,branch-misses,ic_fetch_stall.ic_stall_any,ic_fetch_stall.ic_stall_dq_empty,l2_cache_misses_from_ic_miss,l2_latency.l2_cycles_waiting_on_fills,faults,migrations taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep
2025-01-12T14:25:52.492Z INFO [lepton_jpeg::structs::lepton_file_writer] compressing to Lepton format
2025-01-12T14:25:52.974Z INFO [lepton_jpeg::structs::lepton_file_writer] Number of threads: 8
2025-01-12T14:25:54.453Z INFO [lepton_jpeg::structs::lepton_file_writer] worker threads 10251ms of CPU time in 1478ms of wall time
2025-01-12T14:25:54.453Z INFO [lepton_jpeg::structs::lepton_file_writer] decompressing to verify contents
2025-01-12T14:25:56.028Z INFO [lepton_jpeg_util] compressed input 22171278, output 17324074 bytes (compression = 28.0%)
2025-01-12T14:25:56.028Z INFO [lepton_jpeg_util] Main thread CPU: 3547ms, Worker thread CPU: 21139 ms, walltime: 3547 ms
Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep':
854 375 836 cache-references (25,26%)
74 116 401 cache-misses # 8,67% of all cache refs (25,55%)
14 048 593 475 cycles (25,55%)
487 211 314 ic_fetch_stall.ic_stall_back_pressure (25,48%)
869 359 514 stalled-cycles-frontend # 6,19% frontend cycles idle (25,31%)
35 076 159 662 instructions # 2,50 insn per cycle
# 0,02 stalled cycles per insn (25,18%)
3 638 127 413 branch-instructions (25,14%)
140 480 887 branch-misses # 3,86% of all branches (25,13%)
5 203 535 058 ic_fetch_stall.ic_stall_any (25,11%)
36 559 121 ic_fetch_stall.ic_stall_dq_empty (24,97%)
61 473 795 l2_cache_misses_from_ic_miss (24,92%)
1 770 575 776 l2_latency.l2_cycles_waiting_on_fills (24,98%)
188 278 faults
1 migrations
3,585066548 seconds time elapsed
3,233798000 seconds user
0,346549000 seconds sys
|
Nice work, congrats to speed up devli more! |
|
But main is faster (all compiled on Similar results I got on my Intel laptop. Maybe you use different RUST version? UPD: The same on |
Test Results8 tests - 163 7 ✅ - 164 1s ⏱️ - 19m 7s For more details on these failures, see this check. Results for commit 2c98bb2. ± Comparison against base commit 621555c. This pull request removes 163 tests. |
Improve speed of JPEG reading